Tuesday, March 29, 2011

Top Ten, 20 of 10

We last left our hero in the top ten list of ways to be Screwed by "C". Today's entry is 64 Bit Madness.

No example is given, just a general warning that integers are often signed by default, and therefore do not always behave as expected when moving from 32 bit architectures to 64 bit architectures. Since i learned C on the PDP-11, i'm predisastered. The PDP-11 is a 16 bit architecture and commonly used 16 bit integers. C's long type was 32 bits, but was not generally used for pointer arithmetic on this system. The VAX 11/780 was a commonly accessible 32 bit computer, contemporary with my learning C. So there was a considerable amount of code that i ported from the 32 bit VAX to the 16 bit PDP-11. I had fewer issues moving from 16 bits to 32 bits than from 32 bits to 16.

The sign extension problem was not much of an issue. If there was a sign issue on the 16 bit PDP-11, it generally showed up right away. That's because using more than 2^15 = 32,768 bytes was common on these machines. These machines often had between a quarter of a megabyte of physical RAM and as much as four megabytes.

On 32 bit machines, until very recently, it wasn't very common to have more than 2 GB of RAM, which is what you need to run into sign extension problems. However, it's already very common to have more than 4 GB of RAM on 64 bit computers. I already have one with 8 GB.

Computer           char  short  int  long  long long
PDP-11 (16 bit)       8     16   16    32        n/a
32 bit                8     16   32    32         64
64 bit                8     16   32    64         64
64 bit                8     16   64    64         64

(All widths in bits.)

Dennis Ritchie's C compiler was nearly always the one used on the PDP-11. And the int type was explicitly stated to be the most natural size for the machine. For most computers, this is the width of the CPU registers. And, for most machines, this is also the width of an address, so that general registers could be used to compute addresses. The gcc compiler introduced the long long data type. It was a way to allow the int type to remain flexible while allowing access to a new, longer integer type, whose time had come. It also had the feature of not breaking the language. No new keywords were introduced. Old compilers did not break on new code (though they usually did not produce what was desired). But when 64 bit computers came out, there were two competing conventions, often for the same hardware. Depending on the compiler, the int data type may be 32 or 64 bits in length. So, some C compilers decided that int is no longer the natural register size but a size of its own, and further, that it is not the same size as a pointer, as was historically the case.

So while porting C code from the PDP-11 to the 32 bit VAX was generally easy, it can be awkward to port code from one compiler to another on the same 64 bit machine. There are workarounds.

Could Dennis have planned better when he designed his language? While 64 bit integers weren't much in demand on the PDP-11, 64 bit architectures had existed in hardware since 1961. That's much older than the C language. It wouldn't have hurt much to add 64 bit support for the PDP-11. Indeed, Unix was eventually ported to the 1976 Cray-1 computer architecture. That's well after the roughly 1969 vintage C language.

Monday, March 28, 2011

Top Ten, 19 of 10

We last left our hero some time ago in the top ten list of ways to be Screwed by "C". I happened to be looking at the old posts, and noticed that the author has added a couple. Today's entry is Accidental Integers. The first example is:

int a = 2 && 4 && 8; // what is the value of "a" ?

The issue here is that integers are also Booleans in the C language. So this expression is expected to use the constant "2" as a Boolean, which is clearly not false and therefore true, perform a logical and with "4", which is also true, and then a logical and with "8", which is also true. The answer is true, and the variable a is set to 1. On read, integer values that are zero are false, and anything else is true. But when the language has to produce a truth value, it uses 1. Since this example has only constants in the expression, the compiler evaluates it at compile time and simply assigns the value 1 to a. The second example is a little curious. It's supposed to be the same.

int value = a && b && fn(a->x,b->x);

In this example a and b are structure pointers, and fn is a function which returns an integer. The author wants to check that the structure pointers aren't NULL before calling the function. NULL is typically zero cast to (void *). If you check the Boolean truth value as in this code example, the value is compared with zero. Also, the && operator is a short-circuit operator: if the left hand side is false, then the right hand side is not evaluated. So, the idea in this code fragment is: if pointer a isn't NULL, check if pointer b isn't NULL, and if it isn't, call function fn and set value to the truth value of the integer returned by fn. One suspects that the unexpected result is how the return value is assigned. So this is probably what the author wanted:

int value; /* return value from fn() */

if (a && b) {
    value = fn(a->x, b->x);
}

Which is to say that the author didn't want the truth value of the function, but rather the integer value. From a stylistic point of view, i would not be tempted to compress this to

int value; if (a && b) value = fn(a->x, b->x);

or even

int value = (a && b) ? fn(a->x, b->x);

This last is particularly odd. It explicitly sets value only if a and b are both non zero. I can't think of any production code that uses the question mark operator without a colon - in fact, standard C requires the colon, so this fragment won't even compile. The question mark operator is not that commonly used as it is, but one could imagine wanting to conditionally set a variable in this way. Perhaps in some real life example, the variable was initialized. But in this case, it's difficult to imagine how the return that is set to value is used. Since value was uninitialized, how could the code dependably know if it's the return of fn or some random value? Well, perhaps fn has some side effect, such as setting a global variable that the following code could check. It's a poor example, in my opinion.

The idea that NULL is zero is pervasive in C code. However, i've worked on a machine where the NULL pointer was not, in fact, zero. That's because this unusual 16 bit segmented architecture added byte addressability late in its development. So pointers point at 16 bit words. They added an address bit for the even/odd bytes, though for compatibility reasons, it's not the low bit in a pointer. And worse, the bit is set to "1" for the even numbered bytes. So address zero has a "1" set in it somewhere. The C compiler for this machine defined NULL correctly, but it isn't zero. Yes, there are other complications for C on this unique and special architecture. But we did get a Unix kernel to boot on it. So the above code fragment wouldn't work on this machine. You'd need to explicitly compare with NULL, which properly documents intent, and has no performance consequences. It might look like this:

int value = ((a != NULL) && (b != NULL)) ? fn(a->x, b->x) : 0;

It should also be noted that this topic is directly related to the second topic in this series. That is Accidental assignment/Accidental booleans. Use of "=" when you meant "==" is a pretty common error. And it's true that it wouldn't happen at all if a strict Boolean type existed in C and people actually used it, and if the C language did not allow integers to be used as Booleans. My own opinion is that in assembler language, which C compiles to, integers are used in exactly this way for Booleans. And, it was up to the programmer to document the meanings of variables. For code size and performance, integers and Booleans are routinely mixed. Therefore it is up to the programmer to document their variables. My own coding style is to do this near the declaration. Most C programmers do not explicitly declare local loop variables with comments where the use is common and obvious. But in my standards, each variable is declared on its own line so that it can be documented. After all, this makes no difference whatsoever to the compiled code.

Friday, March 25, 2011

Filters in Emacs

In the early 80's, i wrote a filter in C called 'onespc' (V7 Unix had 14 char filenames, i tended to omit vowels). 'onespc' by default would read stdin, compress groups of blank lines (with optional white space) to a single blank line, and write to stdout. I'd use it from the command line, sometimes in scripts that changed many files. I'd use it from Emacs on the whole buffer or regions. 'onespc' has a bunch of options to do similar things, like remove all blank lines.

Emacs was sluggish starting up until the 386/33 or 486/25. This may be ancient history, but it was more than a decade for me. I haven't come up with a good way to use Emacs to edit hundreds of files. My current plan is to learn elisp. That should fix everything. Lisp isn't easy to learn. But i've used Lisp and Scheme in the past.

In the late 80's, i wanted something easier in scripts than

for i in *.txt; do
 onespc $i > x
 mv x $i
done

It's not a good solution, since x might exist as a file. So, i wrote a version of 'into'. The syntax is

for i in *.txt; do
 onespc $i | into $i
done

So, 'into' copies stdin to a temporary file, first choosing a name that does not already exist. On EOF, it renames the temp file to the argument, deleting the existing file if it can.
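A sketch of the same idea in modern shell, using mktemp(1) to pick the unused name (the original 'into' predates mktemp and found a free name by hand):

```shell
#!/bin/sh
# into: copy stdin to a temp file, then rename it over the target.
# Usage: some_filter < file | into file
target=$1
tmp=$(mktemp "$target.XXXXXX") || exit 1
cat > "$tmp" && mv -f "$tmp" "$target"
```

Because the rename happens only after EOF, the target is never left half-written, and the temp name is guaranteed fresh.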

I was about to publish all these utilities, but then someone published the 'getopt' functions for command line parsing. I liked my own, less ambiguous, command line parser better. I never liked the '--' standard. I stalled.

Who knows? Maybe Emacs has a simple file mapper. I'd like to be able to do this easily:

for i in `find . $HOME/some/other/place -type f -name \*.txt`; do
 onespc $i | into $i
done

But 'find' and my filter set (including all of the *ix filters, such as 'sed' and 'perl') make for a pretty powerful and flexible combination.

Despite emacs's internal docs, 'info', 'man', and google, i can't always find docs for what i want to do just at this moment. One learns by putting in effort. One can always learn more.

Monday, March 14, 2011

No Child Left Untested

The news is that President Obama is to push an overhaul of the No Child Left Behind (NCLB) program. As a parent of school age children, i'd like to get rid of it.

From my perspective, what we need is evidence based education. The way that would work is that we would come up with ideas on how to educate better. The first step would be to implement them in small pilot programs. If the new idea works better than the standards, then it would be moved to a larger pilot. Really good programs would be expanded nation wide. Every change implemented at large scale would have costs and benefits understood beforehand. Was there ever a pilot before introduction of No Child Left Behind (NCLB)?

NCLB suggests that we can't test teachers to determine their competence at teaching. I can understand that. Testing is rarely a good test of competence. Managers in industry mostly can't tell competent employees from dead wood. So, the NCLB idea is to test students. But why do we then think that testing students determines their level of competence? Didn't we just say that testing rarely is a good test of competence?

And, NCLB does not address course approach and content. For example, teaching astronomy and English together allows students to research history, make observations, etc., and write papers about these things - graded for content and form together. It's been shown to be more efficient. And why wouldn't it be? Students put in a little extra effort to make their papers better, but don't have to do as many. That's more efficient for the students. It's more efficient for the teachers. There's no additional teacher training cost. You use an English teacher and an Astronomy teacher. You just use them at the same time. Teachers alternate classroom time. Both teachers grade papers. This is just one of a zillion examples.

There are lots of cheap programs that have worked well in pilots that have not been fielded at large scales. It's so sad.

Thursday, March 10, 2011


If you look at all the colors of the light given off by a star in fine detail, you'll see lines that are characteristic of atoms and molecules that make up the star. These lines can be compared with similar lines when the same atoms or molecules are heated in the lab back here on Earth. So, you can tell what objects are made of anywhere in the Universe. If an object is moving toward you, these lines move to higher frequencies - towards the blue end of the spectrum (at least in visible light). If the object moves away, these lines move to lower frequencies - towards the red end of the spectrum. It's similar to the way a fire truck's siren is higher pitched when it comes towards you, but falls to a lower pitch when it has passed by and moves away from you. Looking at the detail of the colors of light is called spectroscopy.

The first planet discovered around a star other than the Sun was announced in 1995. That's about 16 years ago. Since then, 528 such planets have been discovered (a number that changes almost daily). The method used at the time was the "wobble" method. Most of these discoveries were made using this technique. It's based on spectroscopy. The idea is that as a planet orbits its star, it tugs on the star with gravity. So the star wobbles. With spectroscopy, the movement of the star towards or away from us can be detected - not just whether it's moving away or towards us, but how fast. So, if the planet is closer to us, it tugs the star towards us. If it's on the far side, it tugs it away. The planet has to go around its star at least once, but more is better.
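A rough sketch of the numbers involved, using the standard non-relativistic Doppler relation (the Jupiter figure is approximate):

```latex
\frac{\Delta\lambda}{\lambda_0} = \frac{v}{c}
\qquad\text{e.g. } v \approx 13\ \mathrm{m/s}
\;\Rightarrow\;
\frac{\Delta\lambda}{\lambda_0} \approx \frac{13}{3\times 10^{8}} \approx 4\times 10^{-8}
```

That 13 m/s is roughly the wobble Jupiter induces in the Sun, which gives a feel for how precise the spectroscopy has to be to find anything smaller.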

Now, it's easier to detect movements of a star if the planet is big. And, it's easier to detect a planet if it is close to its star. That's because closer planets pull on the star with more force. And, shorter, quicker orbits mean that you can observe one or more full orbits sooner. So, most of the planets detected this way are large - as big as Jupiter, and close in to their parent star - sometimes closer than Mercury is to the Sun. And, it's easier if the star is smaller than the Sun. A planet can move a small star more easily than a big one.

What we'd like to find is Earth sized planets in orbit around stars like the Sun. That's because what we'd really like to know is if there are planets like ours. At the moment, the only place we know of for sure with life on it is the Earth. We'd like to know if there are other places with life. Do aliens exist? (If they're on their own worlds, they aren't aliens - they're natives).

There are other ways to discover planets around other stars. One might expect that a picture could be taken of a star at high resolution, and all the planets would show up. Unfortunately, stars shine by their own light, and planets shine mostly by reflected light. So, stars are something like a billion times brighter than planets. It's like looking for a firefly next to a search light, only harder. But it has been done. At least twice. This technique favors planets that are far from their host star and big, and it helps if they're very young, so they can shine in infrared light by virtue of being hot. You have to take at least two images to show that the planet moves with the star. Three images give you more confidence, and can show the planet arc around in its orbit.

Another way to detect a planet around another star is to watch a star often, and look for a small drop in light as the planet passes in front of the star. You have to look very often to catch it in the act. You have to have pretty good sensitivity, like a part per 50,000. Both of these issues suggest that you need a telescope in space. In space, you can look at the same spot on the sky 24x7. You don't have to worry about poor weather. And, you don't have a boiling Earth atmosphere changing your star's apparent brightness every few seconds. But there's another issue. Most planets won't happen to pass in front of their stars from our point of view. It's a geometry thing. If we're looking down on the pole of the star, then we'll never see any planets cross in front. And the farther the planet is from its star, the fewer stars will be aligned closely enough for us to see one cross in front. So, if you want to detect planets this way, you have to look at lots of stars. This method, called the 'transit method', is used by the Kepler spacecraft. It's looking at the same patch of sky with a keen interest in about 150,000 stars.
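Two back-of-the-envelope numbers, assuming the usual small-planet approximations, make both points concrete: the fractional dimming is the ratio of disk areas, and the chance of a transit-friendly alignment is roughly the stellar radius over the orbit radius:

```latex
\text{depth} = \left(\frac{R_p}{R_*}\right)^2
\approx \left(\frac{6371\ \mathrm{km}}{696{,}000\ \mathrm{km}}\right)^2
\approx 8.4\times 10^{-5}
\qquad
P_{\text{transit}} \approx \frac{R_*}{a}
\approx \frac{696{,}000\ \mathrm{km}}{1.5\times 10^{8}\ \mathrm{km}}
\approx 0.5\%
```

So an Earth crossing a Sun dims it by less than a part in 10,000, and only about one such system in 200 is aligned for us to see it - hence the 150,000 stars.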

And, the Kepler mission has 15 confirmed planet discoveries. In February of 2011, the team announced 1,235 planet candidates. Estimates are that perhaps 80% of these candidates will be confirmed as real planets. That suggests that perhaps 976 new planets will have been found (80% of the 1,220 unconfirmed candidates). If confirmed, that more than doubles the current number of known planets. And, this preliminary data is from when the Kepler mission had more or less just gotten started, nothing like sixteen years. Since Kepler hasn't been looking very long, the data favors planets that orbit close to their stars. Bigger planets are easier to spot. But planets as small as the Earth and smaller are among the candidates. Better yet, fifty four of the candidates orbit their star at a distance where liquid water might exist on the surface. These kinds of orbits are smaller for smaller stars. Five of these planets are near the size of the Earth.

The Kepler mission is currently funded for an initial mission of 3 1/2 years. That's because the goal is to find Earth-like planets in orbit around Sun-like stars. We can tell if a star is Sun-like through spectroscopy. We can tell if a planet is Earth-sized by the amount of dimming. The spacecraft needs to detect three transit events to give us confidence that it's really a planet. Three orbits of a planet in an Earth-like orbit around a Sun-like star will take three years. You'll need a little extra to make sure you get three. The mission could easily be extended longer. The spacecraft doesn't run out of fuel at the 3 1/2 year mark. It is hoped that the data from Kepler will lead to solid statistics on how common habitable planets are in the Universe. Or, at least, in our part of the galaxy.

What can we do with such statistics? Well, in 1961, Frank Drake proposed a simple formula for estimating how many civilizations there might be in the galaxy. At the time, we didn't know much about what numbers to plug into the formula. The formula has terms like "the fraction of stars with planets". This is a number where Kepler data can help. Better numbers help give us a better estimate.
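For reference, the formula in its usual form - Kepler bears directly on the planet terms:

```latex
N = R_* \cdot f_p \cdot n_e \cdot f_l \cdot f_i \cdot f_c \cdot L
```

Here R_* is the rate of star formation, f_p the fraction of stars with planets, n_e the number of potentially habitable planets per such star, f_l, f_i, and f_c the fractions of those that develop life, intelligence, and detectable communication, and L the lifetime of a communicating civilization.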

We also might be able to discover not only if there might be water, but if there actually is water on these planets. The Spitzer infrared space telescope was used to detect a variety of compounds in the atmospheres of a couple extrasolar planets. And while it is no longer capable of this feat, it demonstrates that it can be done. Transit data tells you when to look to pull it off.

The Kepler discovered planets will miss something we would really like. And that is that none of them will be very near to the Earth. If a planet discovered by Kepler has intelligent life on it, communication (by radio) would still take thousands of years, each way. It's tough to hold up much of a conversation with that sort of delay. We'd really like to find habitable planets that happen to be closer to us. But to do that, you have to look in almost every direction at once. And, we'll likely have to use direct imaging. That's going to require very large telescopes in space. In principle, it can be done. In practice, it will be expensive. But the results will definitely be exciting.

Wednesday, March 09, 2011

Truth That Hurts

Edsger W. Dijkstra
You're doing it wrong.
Edsger W. Dijkstra wrote a very short classic paper, How do we tell truths that might hurt? in 1975. In it, he lists a number of things that he imagined computer scientists of the day thought were true "without hesitation". They're great sound bites. That is, they have no context. They have no explanation. But they're delivered by someone (in this case Dijkstra) in authority. For the record, though Dijkstra was amazing, he said a number of things that are no longer true, and/or were overly general.

For example, in the languages of the day, such as machine language, FORTran, or BASIC, one of the primary control structures available was the "GO TO" statement. There was no real choice. You had to use it frequently. And it could be very difficult to follow the flow of a program when there were lots of them. You'd try to follow where things were going, but it was like following spaghetti noodles. By the 1980s, block structured languages were abundantly available. That included C, Pascal, and even Fortran, with the 1977 standard. Oddly, one of Dijkstra's favorite languages, Algol 60, was on its way out. Anyway, in 1968, he wrote an article against the GO TO.

In the 80's, i had managers totally go crazy seeing even a single "go to" in my code. Now, while a single "go to" could in principle be difficult to follow, when it is clearly used to exit a doubly nested loop, there's no such issue. The spaghetti argument can't be made. How much of a rat's nest can you make if you are limited to one loop? My argument wasn't that he was wrong. My argument was that by using absolute authoritarian statements, he caused as many problems in the industry as he solved. The C language has the break statement to exit a loop. But one must use "goto" to get out of two nested loops. One of the typical ways to get the same effect is to introduce a state variable. This has two problems. First, state variables can be just as hard to follow, or even worse than spaghetti "go to"'s. Second, checking a run time variable increases code size and slows execution.

The same managers who balked at a single "go to" would scold me for daring to use recursion to solve a problem. Really. Recursion looks strange when you see it for the first time, but it's very powerful. If you're a programmer and don't know it, then it's something you must learn. Get out of your comfortable space and do some growth. Dijkstra would have approved. I haven't yet spotted a reference where Dijkstra was a big Lisp fan, but he was into proving program correctness, and Lisp was and is the language to do that in. You can do Lisp without recursion, but Lisp lends itself to recursion so much that many programs use it instead of using loops, and it's quite natural.

Anyway, back to the paper. It's pretty funny. But in his paper, How do we tell truths that might hurt? he never answers the question. It's a rant. If you tell people that they're doing things wrong, you'll ruffle feathers. After all, Galileo was a giant - improving the telescope, discovering all sorts of interesting things with it. But he suffered house arrest by the Vatican for his arrogance. Yet, it doesn't seem to occur to Dijkstra that there's only a little difference between "You're doing it wrong", and "there is a better way to do it". And yet the difference makes all the difference. There is plenty of evidence that the Vatican was already cool with a Sun centered solar system. It wasn't the facts so much as the way they were delivered.

I'm currently learning COBOL - a language i've been avoiding for decades. It's not that it's difficult. It's a little clunky. But it can certainly get the job done. And in the late 70's, everyone claimed that it was on its way out. But decades later, it's still going strong. So, what did people know back then? Anyway, the paper starts by calling COBOL a disease to either fight or ignore. Hysterical. But, let's see how the other sound bites hold up.

Programming is one of the most difficult branches of applied mathematics; the poorer mathematicians had better remain pure mathematicians. I agree that programming is difficult. I have an engineering degree. When you design a car, you reuse the same bolt design over and over. In programming, if you're doing the same thing again, you make it a subroutine and call it twice. Ideally, there's no repetition, no repeated parts. And each of those parts works with, reacts, and counter reacts, at least potentially, with every other part in the system. So the complexity goes up faster in programming. Now, only one in four students who start an engineering degree graduate. And, you have to be really sharp to get enrolled. Back in 1975, programmers came from the ranks of mathematicians. Today, it's its own discipline. It's not that mathematicians were poor programmers. Programming requires unique skills. And not everyone gets it. Not all programmers with a Computer Science degree today will meet Dijkstra's standards. The standard of competence today is that the programs created work. Dijkstra also required elegance, which leads to maintainability, and sometimes performance, in addition to a sort of art appreciation.

The easiest machine applications are the technical/scientific computations. This was likely true at the time. Computers are good at math. If you need to figure out how much the beam will bend under load, it's a pretty easy program to write. But these days, the simulations you need to do for a car crash aren't exactly easy. It's just that computers in 1975 weren't up to the task. I'd say that business applications are the easiest. That's not to say that they're trivial.

The tools we use have a profound (and devious!) influence on our thinking habits, and, therefore, on our thinking abilities. The language you learn, be it English or whatever, has cultural prejudices embedded. And, solving a problem in Lisp will almost always lead to a very different approach than solving it in C. Having done both, my initial guess, that one language would be better than the other at some tasks, has been validated. So with any two languages, each will have its strengths. I wrote both solutions, using the available language features and styles. So it's not necessarily thinking habits. Habits can be broken. Prejudices can be fought. The process is the same. Think about everything. Don't take anything for granted. Anything less is lazy. So, Dijkstra is right, but not in any absolute sense. In particular, we should not all switch to Lisp, even if that is the language where it's easiest to "prove" the correctness of our programs.

FORTRAN --"the infantile disorder"--, by now nearly 20 years old, is hopelessly inadequate for whatever computer application you have in mind today: it is now too clumsy, too risky, and too expensive to use. FORTran, about 50 years old, has evolved. One of the quotes going around in the 80's was I don't know what language we'll be using in the year 2000, but it will be called FORTran. It's not my favorite language. Last i used it, it was difficult (but not strictly impossible) to write code that manipulated text symbolically. It was great for math computations, but not great at symbolic math. When i got to use C, which is pretty good at math, pretty good with text, and OK with symbols, i pretty much only used FORTran if i had to. But it's still in use today. It's just not as clumsy.

PL/I --"the fatal disease"-- belongs more to the problem set than to the solution set. PL/I was an IBM language. I wrote one of my very first programs in a subset of PL/I, called PL/C. I was maybe 12 or 13. It was block structured. It seemed OK. But i didn't stress the language, so i have no idea what Dijkstra might have been on about. It couldn't have been as bad at the time as BASIC, from Dijkstra's perspective. Last i heard, PL/I was in use at IBM internally only. It may have been abandoned by now.

It is practically impossible to teach good programming to students that have had a prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration. Well, BASIC seemed to encourage GO TOs. All lines were numbered, and these numbers could be used as labels for GO TO. Numbered lines made editing easier. But BASIC had for loops. And these loops added some block structure. Another issue was that all variables had global scope. That limited the size of programs by making the complexity needlessly high. Later versions of BASIC fixed these issues by adding real block structure, local variables for subroutines, and even recursion.

The real problem with BASIC wasn't that it damaged anyone. It was that the tools it provided were good enough. And programmers that learned it, tended to write code in other languages as if those other languages were BASIC. It was comfortable. But Bruce Lee had it right: “There are no limits. There are plateaus, and you must not stay there; you must go beyond them. If it kills you, it kills you.” Bruce studied martial arts. It's hard to imagine that learning Lisp will kill you. And, Lisp might have been what Dijkstra was thinking about as an alternative. It gave you block structure, recursion, complex data structures, and things that are difficult to imagine if all you've seen before is BASIC. The Lisp/Algol course i took in school gave us five weeks for Lisp and two weeks for Algol. Algol is more similar to BASIC. Five weeks was not nearly enough for Lisp. Two weeks was like luxury for Algol.

The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence. I've just started with COBOL in detail. I suppose that Dijkstra would have considered this, like suicide, to have the perpetrator and the victim be the same person. So he'd likely consider it an unpardonable sin. Maybe i'd get special dispensation for learning it as my fiftieth language. But from what i've seen of COBOL, all of the complaints that Dijkstra had for BASIC and FORTran apply to COBOL. There may be more complaints. Perhaps the idea that COBOL is so verbose is one of them.

APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums. I haven't seen APL (A Programming Language) in decades. Perhaps it's dead now. It had some warts. It required a non-ASCII character set that included Greek letters and many special symbols. That made it difficult to use in an era when most computer terminals could only display UPPER CASE. APL used single symbols for built-in function names. And, one would string dozens of these symbols together without spaces to form a new function. Calling it hard to read is an understatement. I thought of it as a write-only language. However, it was terse. Whole programs were often a single line, and a short one at that. And, terse can be good.

The problems of business administration in general and data base management in particular are much too difficult for people that think in IBMerese, compounded with sloppy English. I believe Dijkstra is talking about COBOL again, though maybe with JCL - the other language i'm learning currently. I don't think he was talking about SQL - the Structured Query Language, used by most databases today, invented at IBM. I have issues with SQL, but i'll save that for some other rant. Obviously, business administration has worked using COBOL. I'll grant that other languages would have been OK too, possibly better. But the computer scientists of the 70's turned out to be wrong on many, many fronts, when predicting the future. Their logic was sound, generally. It was the assumptions that were mostly wrong. Can't fault them too much for that. Things happened in computers in the past 30 years that would have been hard to believe. So they probably wouldn't have given much thought to the ramifications. But memory is more than a million times larger. Disk storage is a million times larger. Everything is 10,000 to a million times faster. Everything is cheaper. The rules and goals have changed.

About the use of language: it is impossible to sharpen a pencil with a blunt axe. It is equally vain to try to do it with ten blunt axes instead. I agree with him here. Well, he was vague enough that he could actually get away with being general.

Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer. I'd agree if i thought that mathematical inclination was very important for computer programming. Native language skill is paramount, to be sure. But i know people who are very good at math that absolutely do not grasp programming concepts in any practical way. I was pretty good at math, but i don't see that it helped me overly much. Skills are skills. Is there one skill that everyone will find more difficult to master than every other? I doubt it.

Many companies that have made themselves dependent on IBM-equipment (and in doing so have sold their soul to the devil) will collapse under the sheer weight of the unmastered complexity of their data processing systems. This simply hasn't turned out to be the case. Perhaps all the big companies have made themselves dependent on IBM or Microsoft, and so are on equal footing with each other. Digital Equipment, which was a clear alternative to IBM in the 70's, is gone. IBM is still with us. Of course Digital's demise didn't necessarily have anything to do with the quality of their technology, for good or ill.

Simplicity is prerequisite for reliability. What is a prerequisite for reliability is program correctness and maintainability. Generally, these are achieved through simplicity. But i've written programs that i could maintain, but had much difficulty in explaining. They were as simple as i could make them, but that didn't turn out to be very simple. The requirements demanded a certain minimum level of complexity. Yet, these programs had long lifetimes. They were configurable by anyone, and that was pretty much the only maintenance required.

We can found no scientific discipline, nor a hearty profession on the technical mistakes of the Department of Defense and, mainly, one computer manufacturer. Obviously, it was done. I agree that it was a bad idea. And, the industry has managed to move away from that model to some extent. Unix, and the open source movement have had an incredible effect on the industry.

The use of anthropomorphic terminology when dealing with computing systems is a symptom of professional immaturity. I'm not entirely sure where Dijkstra is coming from with this. It's certainly a mistake to talk about computers as "thinking", at least at the moment. Take the example of the chess player. Humans and computers do use some of the same logic. But they don't get at strategy from the same perspective. Humans do better at recognizing situations as similar to historic situations, and work with classes of problems. Computers tend to work at the tactical level so deeply that strategy emerges. So it's different. And that's just one example. It gets worse with more complicated problems. But maybe he meant this: So, there's a bug, and because the computer saw this datum, it came to this erroneous conclusion. It happens. And Dijkstra may call it immature because that's what parents do with their infants, when no such claim could possibly be scientifically made.

By claiming that they can contribute to software engineering, the soft scientists make themselves even more ridiculous. (Not less dangerous, alas!) In spite of its name, software engineering requires (cruelly) hard science for its support. If by soft sciences, Dijkstra includes philosophy or psychology, then i agree. Otherwise, i've no idea what he's talking about.

In the good old days physicists repeated each other's experiments, just to be sure. Today they stick to FORTRAN, so that they can share each other's programs, bugs included. In the 1980's, when C++ was gaining momentum, the claim was that you'd write a good class (a set of methods combined with a data representation), and it would simply be reused. You were done with that problem. And the joke was, "Now we can reuse all of our mistakes". My good friend Karl lamented that he was rewriting his subroutine library (the equivalent of a class) for the third time, to solve some issue. I thought version two was pretty damned good. So i told him that "all software needs to be rewritten". (And in this sense, Dijkstra was right, you are doing it wrong. It can always be better in some sense.) But in the open source arena, you can use the existing code to stand on, and fix it if it's broken. And, you can contribute your fixes, so that when the next version comes out, it has your fixes, but also everyone else's. And there's open competition to have the best code to steal. Physicists still repeat each other's experiments, when at all practical. Otherwise, they examine each other's data.

Projects promoting programming in "natural language" are intrinsically doomed to fail. I think, with the victory of IBM's Watson in the game Jeopardy, natural language is within the grasp of computers in the near future. But Dijkstra was likely talking about COBOL, which was touted as readable by (non-technical) managers.

PS. If the conjecture "You would rather that I had not disturbed you by sending you this." is correct, you may add it to the list of uncomfortable truths. The current most disturbing thing i've heard is from Carl Sagan's 1980 Cosmos series. He talks about how carbon dioxide has created a greenhouse effect on Venus that keeps the surface temperatures hot enough to melt lead. Then he goes on to say that we are engaged in a similar, but uncontrolled experiment with our own atmosphere. Back in 1980, there wasn't much talk about it. We're talking about it now. Sagan didn't say anything in his series that wasn't already solid science. That's why so much of the series is still so relevant, 30 years later. What's most disturbing is that many are still in denial about climate change.

If Dijkstra wanted computer engineering to be practiced by an elite, then perhaps politics, on which our shared planetary environment depends, should be dictated by the best scientists. Just because industries of all sorts have survived with mediocre computer programming, doesn't mean that the Earth will make it. Consider that moving to Mars isn't a solution. Moving to Antarctica is much easier. You don't need to manufacture your own air.

Tuesday, March 08, 2011

Sounds Fast

My newest Sansa mp3 player has slow, normal, and fast modes for podcast play. The fast mode increases the rate at which sound is played back. And the idea is clearly to get through material faster. A side effect is that the pitch of the sound is increased. It's like playing a tape faster than it was recorded. There are techniques for playing back a digital recording faster, without changing the pitch, but this unit isn't doing that. There are both computational and quality advantages to the way the Sansa does it.

I measured the speed at which fast mode plays. It takes 80% of normal time. So, a track which would normally take an hour will take 48 minutes. That's a 12 minute savings. The speed is therefore 1/0.8 = 1.25 times normal. As an aside, in marketing, people do the sale percent math inconsistently with each other, often making it difficult or impossible to predict what the final price will be. This case might be reported as either 20% faster or 25% faster. I'm trying not to be sloppy.
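The arithmetic is simple enough to check in a couple of lines (a quick Python sketch of the math above, nothing Sansa-specific):

```python
# Fast mode plays a track in 80% of its normal time,
# so the speed factor is the reciprocal of the time factor.
time_factor = 0.8
speed = 1 / time_factor              # 1.25 times normal speed

normal_minutes = 60                  # a one-hour track
fast_minutes = normal_minutes * time_factor
print(speed, fast_minutes)           # 1.25 48.0
```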

Some of the podcasts either have music introductions, or the show is actually about music. So, for example, i listen to my friend Craig's Open MetalCast. He interviews artists, but also plays tracks from their albums. I was curious how the pitch changes for music and voices. How would one figure this out? It should be a case of a little math.

A piano keyboard is organized into octaves. Each octave going up in pitch has a frequency that is exactly twice that of the previous octave. Now, you might think that there are eight notes in an octave - since the oct prefix means eight. The notes are labeled A through G, which is seven notes. The eight comes from counting the note that you start on. So, in this case, A is counted twice. And, indeed, there are eight. But, there are 5 black keys interspersed. So there are a total of 12 distinct notes to an octave, or 13 keys if you again count the starting note twice. It's a fence post problem. The way to solve fence post problems is to think about each one carefully. Otherwise, you'll be off by one.

In the modern equal tempered scale, each half step - adjacent keys on a piano - has the exact same frequency ratio as any other pair of adjacent keys. Since there are 12 steps in an octave, and an octave gives you a factor of 2 frequency change, the ratio for a half step is the 12th root of 2, or 2^(1/12).
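That ratio is easy to check numerically (plain Python; this is just the math above, not anything instrument-specific):

```python
# Each half step multiplies the frequency by the 12th root of 2.
semitone = 2 ** (1 / 12)
print(semitone)        # about 1.0594631
print(semitone ** 12)  # about 2.0 - twelve half steps make an octave
```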

Now, remember that the speed change is a factor of 1.25. If we want to find out how many half steps that is, we need to know how many times we must multiply the 12th root of 2 by itself to get 1.25. So, here it is in algebra. We just need to solve for x.

(2^(1/12))^x = 1.25

We can take the logarithm of both sides and preserve the equality. It doesn't matter what base log you use. Your calculator may have a base 10 logarithm function labeled "log", and a natural logarithm (base 2.718...) button labeled "ln". The Windows calculator, in scientific mode (use the View menu), has these.

log(2^(1/12)) * x = log(1.25)

We can divide both sides by the constant log(2^(1/12)) and get:

x = log(1.25) / log(2^(1/12))

Plugging this into a calculator:

x = 3.8631371386483481744438331538727

These are the first thirty-two digits of the answer. I'd be very surprised if the full answer has fewer than an infinite number of decimal digits. Since i only measured the speedup to about a part per 1000 (to the second, over about 17 minutes), the result should be rounded to 3 significant digits, or 3.86. So, the speedup is more than a minor 3rd, and closer to a major 3rd. It's not exact. So, all music played this way will not be in an A 440 based key.
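The whole derivation fits in a few lines of Python. Here math.log is the natural log, but as noted above, any base gives the same x:

```python
import math

# Solve (2**(1/12))**x = 1.25 for x, the number of half steps
# in a 1.25x speedup, by taking logs of both sides.
x = math.log(1.25) / math.log(2 ** (1 / 12))
print(round(x, 3))   # 3.863 - between a minor 3rd (3) and a major 3rd (4)
```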

As an aside, the quartz crystal used to regulate how the sound is produced is likely accurate to at least nine significant digits. A part per billion. It's quite possible that the manufacturer deliberately designed the speedup to be exactly 25% faster. In that case, nine digits, or a value of 3.86313714, might be justifiable. Or, again, who knows, the manufacturer could have wanted A 440 music to stay A 440 (but transposed), and set the speedup to be about 1.25992105 times faster (to get an exact major 3rd).
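That second candidate design target is a one-liner to compute (again, just Python applied to the formulas above; i have no idea what the manufacturer actually intended):

```python
# An exact major 3rd is 4 half steps: a speed factor of 2**(4/12),
# which is the cube root of 2.
major_third_speed = 2 ** (4 / 12)
print(round(major_third_speed, 8))   # 1.25992105
```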

Anyway, more or less a major third. What's interesting is how a major third changes everything. Many voices are nearly unrecognizable. The tonal quality of voices generally changes dramatically. Very few spoken voices that you thought of as typical deep male radio voices sound at all deep when played faster. Many adult voices sound like children. And music vocals, singing or rap, change in character similarly. But instrumental music usually sounds pretty normal right away, or pretty normal after a few seconds.

What amazes me about this is that my musical sense of pitch is quite relative, not so much absolute in nature. I can't tell you if a piano is a half step flat, generally speaking, even if i play it. Sometimes i'll accidentally play a left hand part in the wrong octave without noticing. Yet, the character of the voice is quite apparent.

Another aside, when i play violin, i carefully tune each string with an electronic tuner. If a string goes out of tune, it generally goes out of tune with respect to the other strings. Once my brain has an absolute reference, i can get the instrument to play accurately pitched notes. There are no frets on a violin. You have to put your fingers in the right spots to get the right pitches.

In the meantime, there is a little music in my podcasts. This music, when played up a major 3rd, often sounds odd for a bit, then i get used to it. There is the odd piece that doesn't translate. For example, Beethoven wrote the Moonlight Sonata in C# minor. You can get a sheet music version transcribed (transposed down a half step) in C minor, which, having fewer sharps and flats, some people find easier to play. If i hear it played in C# minor, then hear it in C minor, it sounds disturbing to me. But if my piano is consistently tuned a half step down, and there's no absolute reference, i'm totally cool with it.

Anyway, this is a little thing. There's a function on my mp3 player. A curious thing, and i was curious. Was there more to think about for this function? Maybe. This musing was well within my comfort zone. I just sort of rambled around and played with what appeared interesting in my most distracted ADHD sort of way. But Bruce Lee had a different idea. He said, There are no limits. There are plateaus, and you must not stay there; you must go beyond them. If it kills you, it kills you. Bruce was into martial arts. A little curiosity won't kill you, unless you're a cat.

Wednesday, March 02, 2011

Pope: Jews not to be blamed for death of Jesus

Pope: Jewish people must never again be blamed for crucifixion

So, Pope Benedict XVI says that Jews are not responsible for the Death of Jesus. Must be a slow news day. The Vatican has maintained this view for decades. But i'd heard the view as a kid and rejected it out of hand.

There's a passage in the Gospel of Matthew, where the Jews shout to Pilate, Let his blood be on us and on our children. For one thing, if this was said at all, these people didn't have the authority to say such a thing. How could they? But since Christianity didn't exist, all Christians at the time were Jews. Further, despite the rampant ramblings of Paul, the Christian Church is founded on Peter, who taught at the synagogue. Christianity is built upon Judaism. Get over it. So, if the Jews are somehow responsible for the death of Christ, so are Christians. And probably Islam. Everyone else is innocent. BTW, if i'm responsible for the death of Jesus, i'm cool with it.

One expects that Biblical scholars have thought about this before. And, they probably have thought about the next puzzle. After all, Biblical scholars should be thinking about the basest blasphemy possible, if only to strengthen the faith. And that is, this. Given that Jesus partakes in the divinity, he could certainly have avoided his own death. He clearly knew about its coming in advance. Scripture says he was tempted to do something about it, and chose not to. So, was Jesus responsible for his own death? Was it assisted suicide?