Binary babel: The programming language problem

New Scientist, 8 June 2013



Modern software is a bug-infested swamp, says Michael Brooks. Luckily there are paths to a better world


I’VE been using computers for decades now. It’s probably time I taught myself how to program one, but first I have to find the answer to a simple question. There are thousands of programming languages out there – which one should I learn? Fortunately, the internet has lots of answers: programming blogs and forums are filled with people asking that very question. Unfortunately, those answers are often less than helpful. The Java language “sucks”, apparently, and “all Java programmers are morons”. C++ is “baroque and ugly”. Critics of Ruby are more plain-spoken; to them, this language is simply “a piece of shit”. To the uninitiated the bile is a little bit frightening.


“It’s just amazing how tribal people get with their programming languages,” says Crista Lopes, a professor in the informatics department at the University of California, Irvine. But beneath all this bilious tribalism lurks a home truth: most of the languages are poorly designed, making programming incredibly abstract and difficult. And they require extensive patching every time a new web application appears. Though computer code runs the modern world, it’s actually a shambles.

You might argue that as long as your machine runs Angry Birds and Facebook, why bother about the online ramblings of angry herds? But that would be naive. The writhing mess that is computer programming in the 21st century is everyone’s business. It goes beyond losing all your work when Windows shows you the Blue Screen of Death. As more and more of the world is digitised, software bugs carry ever bigger consequences, from the highly inconvenient to the undeniably tragic. They ground planes and trigger financial meltdowns. They have even been known to kill when they pop up in healthcare equipment. Fortunately, things may be about to change.

Let’s start with the obvious question: why are there so many languages? Ask this question on the internet and your first answer will be as snarky as it is rhetorical: “Why are there so many kinds of bicycle? Why are there so many different wrenches?” The reason, you will learn, is that different machines speak different languages. Your printer and an F-22 fighter jet don’t respond to the same commands, probably for good reason.

In reality, though, it’s much more complicated than that, even though all of these languages do basically the same thing: move strings of binary digits – bits – between the circuits of a microprocessor. Until the 1950s, computer programmers did this by operating levers and switches, sometimes with punch cards.
Before they could do that, however, they would first need to carefully translate what they wanted the machine to do – in other words, the program – into the correct string of 0s and 1s that a computer could understand. This so-called “machine code” would tell the computer how to correctly assemble the logic in its bowels. If this sounds complicated, abstract and difficult, that’s because it was. Computing was time-consuming and expensive, and the programs unsophisticated. It’s not surprising that only a tiny number of people knew how to speak this machine code.

Then along came Fortran. Invented by researchers at IBM’s Watson Laboratory, this “high-level” language made programming

more intuitive, helping more people program and do so with fewer mistakes. According to the original launch report, Fortran would “virtually eliminate coding and debugging” by replacing the opaque 0s and 1s with simple text commands such as “READ”, “WRITE” and “GO TO”. A “compiler” specific to each machine would take care of the translation, converting the programmer’s textual statements into something a computer could understand. This allowed the programmers to think in their own language rather than tailoring their thoughts to a particular set of microchips.
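The compiler idea can be sketched in a few lines. This toy translator (written in Python rather than anything from the 1950s, with opcodes invented purely for illustration) turns simple text commands into the kind of numbers a machine might actually run:

```python
# Toy "compiler": translates simple text commands into numeric codes.
# The opcodes here are invented for illustration -- real machine code
# is specific to each processor, which is why every machine needed
# its own compiler to do this translation.

OPCODES = {"READ": 0b0001, "WRITE": 0b0010, "GOTO": 0b0011}

def compile_program(lines):
    """Translate 'COMMAND operand' statements into (opcode, operand)
    pairs -- standing in for the 0s and 1s the machine executes."""
    machine_code = []
    for line in lines:
        command, operand = line.split()
        machine_code.append((OPCODES[command], int(operand)))
    return machine_code

program = ["READ 10", "WRITE 10", "GOTO 0"]
print(compile_program(program))  # [(1, 10), (2, 10), (3, 0)]
```

The programmer writes READ and WRITE; the compiler, not the human, worries about which numbers a particular chip understands.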

Language explosion

Nearly 60 years later, there has been a kind of Cambrian explosion in languages – to the detriment of comprehension. READ and WRITE have evolved into ever less intuitive and more opaque commands reflecting more complex logic. Move from 0s and 1s to words, and people quickly become dissatisfied with the limitations those words place on their ability to express themselves. Mark Pagel, who studies linguistic evolution at the University of Reading, UK, says humans – and the way they translate their thoughts into instructions for computers – are so varied that no one language is sufficient.

As a consequence, there are probably 3000 or 4000 high-level languages around today, says Alex Payne, a former Twitter engineer and curator of the annual Emerging Languages Conference, and many of them work in tandem. Facebook alone uses C++, Java, PHP, Perl, Python and Erlang, among others. The expansion shows no signs of slowing. A year ago, for example, Google released its Dart language as a replacement for JavaScript. According to a leaked 2010 Google email, “Javascript has fundamental flaws that cannot be fixed merely by evolving the language”.

And yet the variety is misleading. Despite the linguistic divergence, from a functional point of view most languages are still just thinly veiled versions of Fortran. “There’s not a big difference from where programming was in the 1960s,” Lopes says. Bret Victor, who used to design interfaces for Apple and now works freelance, agrees. “Python, Ruby, JavaScript, Java, C++ – they’re all the same language,” he says. “They’re different dialects of fundamentally the same way of speaking.” Jonathan Edwards at the Massachusetts Institute of Technology goes further. “We never actually throw anything away, we never raze to the ground, we simply build a new storey on top of it,” he says. “We have these enormously tall towers reaching down to ancient technology.”

And being stuck with so many pidgin versions of the early programming languages is a big headache. The first problem is that they are hard to write – “inhumanly hard”, as Edwards puts it. They are largely composed of patches and layers that provide cheap, quick fixes to bugs or jerry-rigged functionality. As the languages have grown more complicated, the programs have grown longer. The on-board systems of a Boeing 787 Dreamliner chew through about 6.5 million lines of code; in a 2010 Mercedes S-Class, that number jumps to 20 million. Even a program as comparatively simple as Microsoft Word is estimated to run to several million lines.

And that’s where the real trouble sets in. Most programs are full of tiny grammatical mistakes – unintended indentations, a stray bracket, a comma in the wrong place and so on. According to a widely cited estimate, there are 15 to 50 errors per 1000 lines of delivered code. These minuscule errors can render entire programs useless. Did you introduce one? You won’t know for sure until you allow the computer to compile and run the code. But the compiler can’t tell you if your code works until after you have written the entire program. Manual debugging would be a fool’s errand: just counting all 20 million lines of code in that Mercedes would take you the better part of a year – and that’s if you forgo meals, sleep and bathroom breaks.

“This stuff is so complicated that we’re constantly making mistakes – often disastrous mistakes,” Edwards says. “We go back and fix them but the results are often very unsatisfactory.” Unsatisfactory barely covers it – the consequences can be fatal. A 2010 investigation by The New York Times uncovered software and programming errors that caused death and serious injury in a slew of radiation therapy programmes. Programming errors have wrecked multimillion-dollar transnational projects – remember the 1996 test launch of the European Space Agency’s Ariane 5 rocket? A bug in the control software, written in the language Ada, caused the rocket to self-destruct 37 seconds after blast-off. In 2012, a computer glitch cost investment firm Knight Capital half a billion dollars in half an hour. Thankfully, most of us won’t experience such catastrophes, but the simple truth is that, as more of our lives become digitised, software bugs will worm their way into every corner of our lives. “Every major industry is grappling with the enormous and unsolved challenges of software dependability, design and productivity,” says Kevin Sullivan at the University of Virginia in Charlottesville.
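To see how small a program-wrecking mistake can be, consider a hypothetical example (in Python; the function is invented for illustration). One stray slice, and the language raises no complaint at all – the program runs happily and simply gives the wrong answer:

```python
def mean(values):
    # Intended: sum(values) / len(values)
    # A tiny slip -- slicing with [1:] -- silently drops the first
    # element. No compiler or interpreter flags it; the program is
    # grammatically perfect and arithmetically wrong.
    return sum(values[1:]) / len(values)

print(mean([2, 4, 6]))  # 3.333... -- the correct average is 4.0
```

Errors like this are why "15 to 50 per 1000 lines" matters: each one is invisible until the code actually runs, and sometimes long after.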

Software crisis

In April, a software error meant that American Airlines had to ground their entire US fleet. Banking software has already failed many people in the UK, leaving them without access to their money for days on end, derailing house purchases and other major life events. Tomorrow’s driverless cars will run on code, but will they be any more bulletproof than the cars that had to be recalled by General Motors and Toyota because of software errors? “It’s sometimes called the software crisis,” Edwards says, “but that term was coined in the early 60s and it’s not clear we’ve made any progress since then.”

But what can be done? One seemingly obvious solution might be to build a single universal master language from scratch. In the 1960s and 70s there was a concerted effort to do just that. Academics, industry experts and scientists from all over the world converged on a venue, then tried to thrash out the best way to standardise programming. “The transcripts of those meetings are hilarious – people got

into really big fights,” Payne says. They also got nowhere, as the panoply of languages today attests. Pagel draws parallels with Esperanto, the prototype universal human language. The reason it has never caught on, he says, is that it simply isn’t needed. A universal programming language is not only unnecessary, it might even be an impediment. “I like lots of languages,” Lopes says. “I find it hard to think about being stuck with just one.”

Whether a universal programming language is desirable or even workable is still up for grabs. However, salvation may be at hand in a nascent endeavour in computer science: user-friendly languages that rethink the compiler. Yes, these are yet more languages teetering atop a 60-year-old tower of Binary Babel. But what makes them different is that their designers are building them to allow programmers to see, in real time, exactly what they are constructing as they write their code.

Bizarrely, the outcome might look rather familiar. Edwards is aiming to reproduce something like Microsoft Excel. “It’s a programming language,” he says. In fact, according to Payne, it is the most widely used programming language of all. You plug in some numbers and tell the computer what to do with those numbers, depending on the row and column they are in. “You can do sophisticated computation without even realising that the formulas you enter are a form of programming,” says Timothy Lethbridge at the University of Ottawa.

This realisation led Edwards to a simple question: why shouldn’t building web applications be as easy as using a spreadsheet? He is now building a new language called Subtext, written by copying and pasting existing routines and making small alterations that change the outcome. The different routines are nested together in ways that allow you to see their relative position within the program. The result is a visual representation of what the program does.
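The spreadsheet-as-language point is easy to demonstrate. In this minimal sketch (Python; the cell names and formulas are invented for illustration), each formula is a tiny program, and changing one input instantly changes everything that depends on it – the kind of immediate feedback the new languages are chasing:

```python
# Minimal spreadsheet: cells hold either numbers or formulas
# (functions of the sheet). Reading a cell evaluates its formula,
# so every change to an input is reflected downstream at once.
sheet = {
    "A1": 120,                                      # price
    "A2": 0.2,                                      # tax rate
    "A3": lambda s: get(s, "A1") * get(s, "A2"),    # tax
    "A4": lambda s: get(s, "A1") + get(s, "A3"),    # total
}

def get(sheet, cell):
    """Return a cell's value, evaluating its formula if it has one."""
    value = sheet[cell]
    return value(sheet) if callable(value) else value

print(get(sheet, "A4"))  # 144.0

sheet["A1"] = 100        # change one input...
print(get(sheet, "A4"))  # ...and the total updates: 120.0
```

Nobody entering formulas like A3 thinks of themselves as programming, yet that is exactly what they are doing.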
While Subtext is still a “thought experiment in language design” and “just a vignette of what such a language would look like”, Edwards thinks it is a good first step. Like Edwards, Victor is pioneering a new way to “let people see what they’re doing”. In his scheme, a programmer can see intermediate steps in a program and check what happens in real time as they tinker with the code. These approaches have inspired programmer Chris Granger to come up with another innovation, called Light Table. This allows a coder to work as if everything they need is in front of them on a desk, all easy

to grab and move around. Assembling the elements of a program creates instantaneous results, and the results of any changes to the code – good or bad – are immediately obvious.

But the most radical option might be to let the program write itself, based on what you want it to do – entirely bypassing your linguistic peccadilloes. In some ways, that’s already happening. These days, choosing a language for a task involves considering what pre-written modules and libraries you can pull off the web. “The first thing people do is a Google search to see if anyone’s done it before,” Lopes says. “Even experts like me do it: professional software developers use Google liberally.” Then, when they get stuck, they turn once more to Google: over the past five years, rich bases of accumulated knowledge have sprung up all over the web. “In the 60s it was really hard to get unstuck,” she says. “You had to

read through manuals or call the experts. What took a week now takes two minutes.” So why not take advantage of that, and build a language composed entirely out of questions? Let computers develop their own languages and programs in response to the objectives you give them. Lopes’s dream is to create exactly that: a system to build a program from a sequence of internet search queries. “It is a dream, but lots of it is already here,” she says.

Granted, a better compiler alone won’t be the end of software glitches. You would also need to address manufacturers’ never-ending quest to entice consumers to upgrade by bloating their products with unnecessary extra features, or maybe reduce the pressure to rush products to market at the expense of quality. That’s why none of the reformers are under any illusions that the industry will push aside C++ and Java and switch to a language based on Excel or a Google search. “What we have right now works – if you’re willing to put effort into it,” Edwards admits. But that doesn’t mean developing these new languages is a pointless exercise. “Wouldn’t it be great if we could open this up to normal, average people, not just the savants?” There’s hope for me yet.

Michael Brooks is a writer and New Scientist consultant based in Sussex, UK