Friday, January 11, 2008

Top Ten, 17 of 10

When we last left our hero, we were looking at ten ways to get screwed by the "C" programming language. Today's entry is Signed Characters/Unsigned bytes.

There is no byte datatype in C.

Yes, integers are signed unless you say unsigned. Yes, integer arithmetic that overflows or underflows is silent. That's largely because that's how integer arithmetic works on most machines. (Strictly speaking, the C standard only promises wraparound for unsigned types; signed overflow is undefined.) But char is a special case: the language leaves it up to the implementation whether plain char is signed or unsigned. On an x86, char is signed. However, at least on an ARM processor, char appears to be unsigned. So it's worse than an inconsistency on one machine: behavior can change when you port to another machine. In this case, the x86 and the ARM processors are both 32 bit machines. They have the same byte order within words, which is to say that they are both little endian. The same compiler, gcc, was used in both cases.

Consider a program that reads a file a character at a time:

#include <stdio.h>

int main(int argc, char *argv[]) {
    char c;
    while ((c = getchar()) != EOF) {
        putchar(c);
    }
    return 0;
}

On an ARM, gcc reports that the variable c will never equal EOF. And, indeed, if you attempt to run the program, it reads the file, then writes a bogus character (0377) forever. It does not see the End of File signal. That's because EOF is a negative int, typically -1. Squeezed into an unsigned char, it becomes 255, which never compares equal to -1. On an x86, this program behaves like cat(1), unless the file has a 0377 character somewhere in the stream. If it does, that byte turns into -1 in a signed char, and the program stops processing at that point, even if there is more data to read. If the declaration of c is changed from char to int, the problem vanishes. Now, getchar() is documented to return int, and EOF is an int. It's supposed to be a value that is never a character that could be in a file. However, conceptually, the program is reading a file one character at a time. So, it's easy to forget.

By contrast, floating point overflows can generate exceptions. That's because that's how floating point hardware works.

In the Lisp language, integers never overflow. That's because if you add one to a small integer, it simply becomes a larger integer. When it becomes too big for a 32 bit integer, it magically becomes something larger than 32 bits. You always have the right answer. And yet, it turns out that this feature isn't something that I use much in Lisp. I'd be happier if integers were all small and fast. Especially fast. The good news is that you can now get the performance of an original Pentium using the Lisp language, if you have a modern computer to work with.
