Wednesday, March 09, 2011

Truth That Hurts

Edsger W. Dijkstra
You're doing it wrong.
Edsger W. Dijkstra wrote a very short classic paper, How do we tell truths that might hurt?, in 1975. In it, he lists a number of things that he imagined computer scientists of the day held to be true "without hesitation". They're great sound bites. That is, they have no context. They have no explanation. But they're delivered by someone in authority (in this case, Dijkstra). For the record, though Dijkstra was amazing, he said a number of things that are no longer true, or were overly general.

For example, at the time, in languages such as machine language, FORTran, or BASIC, one of the primary control structures available was the "GO TO" command. There was no real choice. You had to use it frequently. And it could be very difficult to follow the flow of a program when there were lots of them. You'd try to trace where things were going, but it was like following spaghetti noodles. By the 1980s, block structured languages were abundantly available. That included C, Pascal, and even Fortran, with the 1977 standard. Oddly, one of Dijkstra's favorite languages, Algol 60, was on its way out. Anyway, in 1968, he wrote an article against it, Go To Statement Considered Harmful.

In the 80's, i had managers go totally crazy at seeing even a single "go to" in my code. Now, while a single "go to" could in principle be difficult to follow, when it is clearly used to exit a doubly nested loop, there's no such issue. The spaghetti argument can't be made. How much of a rat's nest can you make if you are limited to one goto? My argument wasn't that he was wrong. My argument was that by using absolute authoritarian statements, he caused as many problems in the industry as he solved. The C language has the break statement to exit a loop. But one must use "goto" to get out of two nested loops. One of the typical ways to get the same effect is to introduce a state variable. This has two problems. First, state variables can be just as hard to follow as spaghetti "go to"s, or even harder. Second, checking a run time variable increases code size and slows execution.
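
To make this concrete, here's a minimal sketch in C (the grid, the target, and the function names are invented purely for illustration). The first version exits a doubly nested loop with a single "goto"; the second is the state-variable workaround, which is longer, does extra tests on every pass, and is no clearer.

    /* Illustration only: search a small grid for a target value.
       A single goto exits both loops at once; no spaghetti possible. */
    int find(int grid[3][3], int target, int *row, int *col)
    {
        int i, j;

        for (i = 0; i < 3; i++) {
            for (j = 0; j < 3; j++) {
                if (grid[i][j] == target)
                    goto found;      /* break would only exit the inner loop */
            }
        }
        return 0;                    /* not found */

    found:
        *row = i;
        *col = j;
        return 1;
    }

    /* The "approved" alternative: a state variable checked on every pass. */
    int find_with_flag(int grid[3][3], int target, int *row, int *col)
    {
        int i, j, done = 0;

        for (i = 0; i < 3 && !done; i++) {
            for (j = 0; j < 3 && !done; j++) {
                if (grid[i][j] == target) {
                    *row = i;
                    *col = j;
                    done = 1;
                }
            }
        }
        return done;
    }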

The same managers who balked at a single "go to" would scold me for daring to use recursion to solve a problem. Really. Recursion looks strange when you see it for the first time, but it's very powerful. If you're a programmer and don't know it, it's something you must learn. Get out of your comfortable space and do some growing. Dijkstra would have approved. I haven't yet spotted a reference where Dijkstra was a big Lisp fan, but he was into proving program correctness, and Lisp was and is the language to do that in. You can write Lisp without recursion, but Lisp lends itself to recursion so much that many programs use it in place of loops, and it's quite natural.
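
For anyone who hasn't run into it, here's a tiny sketch in C (the tree structure is invented for illustration). Counting the nodes of a binary tree is a few lines with recursion; without it you'd be managing an explicit stack by hand.

    #include <stddef.h>

    /* Illustration only: a minimal binary tree node. */
    struct node {
        struct node *left;
        struct node *right;
    };

    /* Count the nodes in a tree. The function calls itself on each
       subtree; the empty tree is the base case that stops the recursion. */
    size_t count_nodes(const struct node *t)
    {
        if (t == NULL)
            return 0;                /* base case: empty tree */
        return 1 + count_nodes(t->left) + count_nodes(t->right);
    }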

Anyway, back to the paper. It's pretty funny. But in his paper, How do we tell truths that might hurt? he never answers the question. It's a rant. If you tell people that they're doing things wrong, you'll ruffle feathers. After all, Galileo was a giant - improving the telescope and discovering all sorts of interesting things with it. But he suffered house arrest by the Vatican for his arrogance. Yet, it doesn't seem to occur to Dijkstra that there's only a little difference between "You're doing it wrong" and "there is a better way to do it". And yet that little difference makes all the difference. There is plenty of evidence that the Vatican was already cool with a Sun centered solar system. It wasn't the facts so much as the way they were delivered.

I'm currently learning COBOL - a language i've been avoiding for decades. It's not that it's difficult. It's a little clunky. But it can certainly get the job done. And in the late 70's, everyone claimed that it was on its way out. But more than thirty years later, it's still going strong. So, what did people know back then? Anyway, the paper starts by calling COBOL a disease to either fight or ignore. Hysterical. But, let's see how the other sound bites hold up.

Programming is one of the most difficult branches of applied mathematics; the poorer mathematicians had better remain pure mathematicians. I agree that programming is difficult. I have an engineering degree. When you design a car, you reuse the same bolt design over and over. In programming, if you're doing the same thing again, you make it a subroutine and call it twice. Ideally, there's no repetition, no repeated parts. And each of those parts works with, reacts to, and counter-reacts with, at least potentially, every other part in the system. So the complexity goes up faster in programming. Now, only one in four students who start an engineering degree graduates. And you have to be really sharp to get enrolled. Back in 1975, programmers came from the ranks of mathematicians. Today, it's its own discipline. It's not that mathematicians were poor programmers. Programming requires unique skills. And not everyone gets it. Not all programmers with a Computer Science degree today would meet Dijkstra's standards. The standard of competence today is that the programs created work. Dijkstra also required elegance, which leads to maintainability, and sometimes performance, in addition to a sort of art appreciation.

The easiest machine applications are the technical/scientific computations. This was likely true at the time. Computers are good at math. If you need to figure out how much a beam will bend under load, it's a pretty easy program to write. But these days, the simulations you need to do for a car crash aren't exactly easy. It's just that computers in 1975 weren't up to the task. I'd say that business applications are the easiest. That's not to say that they're trivial.
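
By "easy" i mean something like this minimal sketch in C - the textbook cantilever formula, deflection = F*L^3 / (3*E*I), with made-up numbers, not a real design:

    #include <stdio.h>

    /* Illustration only: deflection at the free end of a cantilever beam
       with a point load at the tip, delta = F * L^3 / (3 * E * I).
       The numbers below are made up, not a real design. */
    int main(void)
    {
        double F = 1000.0;    /* load at the tip, newtons           */
        double L = 2.0;       /* beam length, meters                */
        double E = 200e9;     /* Young's modulus for steel, pascals */
        double I = 8.0e-6;    /* area moment of inertia, m^4        */

        double delta = (F * L * L * L) / (3.0 * E * I);

        printf("tip deflection: %.6f meters\n", delta);
        return 0;
    }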

The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities. The language you learn, be it English or whatever, has cultural prejudices embedded. And, solving a problem in Lisp will almost always lead to a very different approach than solving it in C. Having done both, my initial guess that one language would be better than the other at some tasks has been validated. So with any two languages, each will have its strengths. I wrote both solutions, using the available language features and styles. So it's not necessarily thinking habits. Habits can be broken. Prejudices can be fought. The process is the same. Think about everything. Don't take anything for granted. Anything less is lazy. So, Dijkstra is right, but not in any absolute sense. In particular, we should not all switch to Lisp, even if that is the language where it's easiest to "prove" the correctness of our programs.

FORTRAN --"the infantile disorder"--, by now nearly 20 years old, is hopelessly inadequate for whatever computer application you have in mind today: it is now too clumsy, too risky, and too expensive to use. FORTran, now about 50 years old, has evolved. One of the quotes going around in the 80's was, "I don't know what language we'll be using in the year 2000, but it will be called FORTran." It's not my favorite language. Last i used it, it was difficult (but not strictly impossible) to write code that manipulated text symbolically. It was great for math computations, but not great at symbolic math. When i got to use C, which is pretty good at math, pretty good with text, and OK with symbols, i pretty much only used FORTran if i had to. But it's still in use today. It's just not as clumsy.

PL/I --"the fatal disease"-- belongs more to the problem set than to the solution set. PL/I was an IBM language. I wrote one of my very first programs in a subset of PL/I, called PL/C. I was maybe 12 or 13. It was block structured. It seemed OK. But i didn't stress the language, so i have no idea what Dijkstra might have been on about. It couldn't have been as bad at the time as BASIC, from Dijkstra's perspective. Last i heard, PL/I was in use at IBM internally only. It may have been abandoned by now.

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration. Well, BASIC seemed to encourage GO TOs. All lines were numbered, and these numbers could be used as labels for GO TO. Numbered lines made editing easier. But BASIC had for loops. And these loops added some block structure. Another issue was that all variables had global scope. That limited the size of programs by making the complexity needlessly high. Later versions of BASIC fixed these issues by adding real block structure, local variables for subroutines, and even recursion.

The real problem with BASIC wasn't that it damaged anyone. It was that the tools it provided were good enough. And programmers who learned it tended to write code in other languages as if those other languages were BASIC. It was comfortable. But Bruce Lee had it right: “There are no limits. There are plateaus, and you must not stay there; you must go beyond them. If it kills you, it kills you.” Bruce studied martial arts. It's hard to imagine that learning Lisp will kill you. And, Lisp might have been what Dijkstra was thinking about as an alternative. It gave you block structure, recursion, complex data structures, and things that are difficult to imagine if all you've seen before is BASIC. The Lisp/Algol course i took in school gave us five weeks for Lisp and two weeks for Algol. Algol is more similar to BASIC. Five weeks was not nearly enough for Lisp. Two weeks was luxury for Algol.

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence. I've just started with COBOL in detail. I suppose that Dijkstra would have considered this a crime where, as with suicide, the perpetrator and the victim are the same person. So he'd likely consider it an unpardonable sin. Maybe i'd get special dispensation for learning it as my fiftieth language. But from what i've seen of COBOL, all of the complaints that Dijkstra had for BASIC and FORTran apply to COBOL. There may be more complaints. COBOL's verbosity is perhaps one of them.

APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums. I haven't seen APL (A Programming Language) in decades. Perhaps it's dead now. It had some warts. It required a non-ASCII character set full of Greek letters and special symbols. That made it difficult to use in an era when most computer terminals could only display UPPER CASE. APL used single Greek letters and symbols for built-in function names. And one would string dozens of these symbols together without spaces to form a new function. Calling it hard to read is an understatement. I thought of it as a write-only language. However, it was terse. Whole programs were often a single line, and a short one at that. And, terse can be good.

The problems of business administration in general and data base management in particular are much too difficult for people that think in IBMerese, compounded with sloppy English. I believe Dijkstra is talking about COBOL again, though maybe also JCL - the other language i'm learning currently. I don't think he was talking about SQL - the Structured Query Language, used by most databases today, invented at IBM. I have issues with SQL, but i'll save that for some other rant. Obviously, business administration has worked using COBOL. I'll grant that other languages would have been OK too, possibly better. But the computer scientists of the 70's turned out to be wrong on many, many fronts when predicting the future. Their logic was sound, generally. It was the assumptions that were mostly wrong. Can't fault them too much for that. Things happened in computers in the past 30 years that would have been hard to believe. So they probably wouldn't have given much thought to the ramifications. But memory is more than a million times larger. Disk storage is a million times larger. Everything is 10,000 to a million times faster. Everything is cheaper. The rules and goals have changed.

About the use of language: it is impossible to sharpen a pencil with a blunt axe. It is equally vain to try to do it with ten blunt axes instead. I agree with him here. Well, he was vague enough that he could actually get away with being general.

Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer. I'd agree if i thought that mathematical inclination was very important for computer programming. Native language skill is paramount, to be sure. But i know people who are very good at math who absolutely do not grasp programming concepts in any practical way. I was pretty good at math, but i don't see that it helped me overly much. Skills are skills. Is there one skill that everyone will find more difficult to master than every other? I doubt it.

Many companies that have made themselves dependent on IBM-equipment (and in doing so have sold their soul to the devil) will collapse under the sheer weight of the unmastered complexity of their data processing systems. This simply hasn't turned out to be the case. Perhaps all the big companies have made themselves dependent on IBM or Microsoft, and so are on equal footing with each other. Digital Equipment, which was a clear alternative to IBM in the 70's, is gone. IBM is still with us. Of course, Digital's demise didn't necessarily have anything to do with the quality of their technology, for good or ill.

Simplicity is prerequisite for reliability. The real prerequisites for reliability are program correctness and maintainability. Generally, these are achieved through simplicity. But i've written programs that i could maintain but had much difficulty explaining. They were as simple as i could make them, but that didn't turn out to be very simple. The requirements demanded a certain minimum level of complexity. Yet, these programs had long lifetimes. They were configurable by anyone, and that was pretty much the only maintenance required.

We can found no scientific discipline, nor a hearty profession on the technical mistakes of the Department of Defense and, mainly, one computer manufacturer. Obviously, it was done. I agree that it was a bad idea. And, the industry has managed to move away from that model to some extent. Unix, and the open source movement have had an incredible effect on the industry.

The use of anthropomorphic terminology when dealing with computing systems is a symptom of professional immaturity. I'm not entirely sure where Dijkstra is coming from with this. It's certainly a mistake to talk about computers as "thinking", at least at the moment. Take the example of the chess player. Humans and computers do use some of the same logic. But they don't approach strategy from the same perspective. Humans do better at recognizing situations as similar to ones they've seen before, and work with classes of problems. Computers tend to work at the tactical level so deeply that strategy emerges. So it's different. And that's just one example. It gets worse with more complicated problems. But maybe he meant the way we explain bugs: there's a bug, and because the computer "saw" this datum, it "came to" this erroneous conclusion. It happens. And Dijkstra may call it immature because that's what parents do with their infants, when no such claim could possibly be scientifically made.

By claiming that they can contribute to software engineering, the soft scientists make themselves even more ridiculous. (Not less dangerous, alas!) In spite of its name, software engineering requires (cruelly) hard science for its support. If by soft sciences, Dijkstra includes philosophy or psychology, then i agree. Otherwise, i've no idea what he's talking about.

In the good old days physicists repeated each other's experiments, just to be sure. Today they stick to FORTRAN, so that they can share each other's programs, bugs included. In the 1980's, when C++ was gaining momentum, the claim was that you'd write a good class (a set of methods combined with a data representation), and it would simply be reused. You were done with that problem. And the joke was, "Now we can reuse all of our mistakes". My good friend Karl lamented that he was rewriting his subroutine library (the equivalent of a class) for the third time, to solve some issue. I thought version two was pretty damned good. So i told him that "all software needs to be rewritten". (And in this sense, Dijkstra was right, you are doing it wrong. It can always be better in some sense.) But in the open source arena, you can use the existing code to stand on, and fix it if it's broken. And, you can contribute your fixes, so that when the next version comes out, it has your fixes, but also everyone else's. And there's open competition to have the best code to steal. Physicists still repeat each other's experiments, when at all practical. Otherwise, they examine each other's data.

Projects promoting programming in "natural language" are intrinsically doomed to fail. I think, with the victory of IBM's Watson in the game Jeopardy, natural language is within the grasp of computers in the near future. But Dijkstra was likely talking about COBOL, which was touted as readable by (non-technical) managers.

PS. If the conjecture "You would rather that I had not disturbed you by sending you this." is correct, you may add it to the list of uncomfortable truths. The most disturbing thing i've heard recently is from Carl Sagan's 1980 Cosmos series. He talks about how carbon dioxide has created a greenhouse effect on Venus that keeps the surface hot enough to melt lead. Then he goes on to say that we are engaged in a similar, but uncontrolled, experiment with our own atmosphere. Back in 1980, there wasn't much talk about it. We're talking about it now. Sagan didn't say anything in his series that wasn't already solid science. That's why so much of the series is still so relevant, 30 years later. What's most disturbing is that many are still in denial about climate change.

If Dijkstra wanted computer engineering to be practiced by an elite, then perhaps politics, on which our shared planetary environment depends, should be dictated by the best scientists. Just because industries of all sorts have survived with mediocre computer programming, doesn't mean that the Earth will make it. Consider that moving to Mars isn't a solution. Moving to Antarctica is much easier. You don't need to manufacture your own air.
