Tuesday, January 08, 2008

Top Ten, 14 of 10

When we last left our hero, we were looking at ten ways to get screwed by the "C" programming language. Today's entry is Under-constrained fundamental types.

I learned C working on 16 bit PDP-11's in a world of mostly 32 bit Vaxen. These days, I work mostly with 32 bit machines even though 64 bit machines are commodity items. My code tends to be portable and efficient, and porting it to a new architecture is largely a simple recompilation. In my opinion, there are no 8 bit machines. Pointers on 8080's, Z80's, 6800's, 6809's, 6502's and 1800's were all 16 bits. These are 16 bit machines.

In C, an int must be at least 16 bits, but compilers can make them larger. It's reasonably easy to write code on a 16 bit machine and port it to a larger 32 bit architecture. It takes more effort to do the reverse and get it right. I've personally seen compilers that provide 16, 32, 36 and 64 bit integers when asked for an int. This was clearly a language design feature. It allows the same code to work reasonably efficiently on machines of different sizes.
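To make the reverse-porting hazard concrete, here is a small, hypothetical example of code that is correct where int is 32 bits but breaks where int is 16 bits:

    #include <stdio.h>

    /* Fine where int is 32 bits: sum ends up as 799980000.
       Where int is 16 bits (maximum 32767), the constant 40000
       does not fit in an int, the counter overflows before
       reaching it, and signed overflow is undefined behavior. */
    int main(void)
    {
        int i;
        long sum = 0;
        for (i = 0; i < 40000; i++)
            sum += i;
        printf("%ld\n", sum);
        return 0;
    }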

I've been asked if, every time I use an int, I really think of it as being only "at least 16 bits". The answer is yes, I really do think about it, but also no: sometimes my expectation is that this int will be able to index my largest arrays on the local machine. Since strings are arrays of single bytes, the natural expectation is that sizeof(int) == sizeof(char *). The language does not guarantee this, however. It should.

When the Digital Equipment Corp Alpha processor team announced that their C compiler had a 16 bit short, a 32 bit int, and a 64 bit long, I was disappointed. A 32 bit int only gives you four billion entries into an array, yet the machine is capable of having arrays quite a bit larger. I had also expected int to be a redundant type, always matching either short or long; I doubt any of my code depends on that, though. The discipline of thinking about something every time is a real asset to programmers. It is the only known solution to the fencepost problem, for example.
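That expectation is easy to check. Here is a minimal sketch that prints the actual widths on the local machine and flags the mismatch; on an Alpha-style compiler (16 bit short, 32 bit int, 64 bit long and pointers) it reports that int falls short:

    #include <stdio.h>

    int main(void)
    {
        /* The assumption under discussion: an int can index any
           array, i.e. sizeof(int) == sizeof(char *). C does not
           guarantee it. */
        printf("short %u, int %u, long %u, char * %u bits\n",
               (unsigned)(sizeof(short) * 8),
               (unsigned)(sizeof(int) * 8),
               (unsigned)(sizeof(long) * 8),
               (unsigned)(sizeof(char *) * 8));
        if (sizeof(int) != sizeof(char *))
            printf("int cannot index the whole address space here\n");
        return 0;
    }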

Now, gcc's long long int is great for having 32 bit machines provide 64 bit integers. Here, int is 32 bits, which matches the pointer size and is efficient on such a machine. The long long int type is strictly for larger integers, and I've used them for computing hashes. Very efficient. It should be noted, however, that the keyword that originally meant an integer twice as big as a pointer was long: on the PDP-11, long was 32 bits, while int and char * were 16 bits.
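The post doesn't say which hash, but as one sketch, here is 64 bit FNV-1a written with unsigned long long, so the arithmetic is 64 bits even where int and long are only 32:

    #include <stdio.h>

    /* One example of such a hash: 64 bit FNV-1a, using unsigned
       long long so the multiply is 64 bits even on a 32 bit
       machine. The constants are the standard 64 bit FNV
       parameters. */
    unsigned long long fnv1a64(const char *s)
    {
        unsigned long long h = 14695981039346656037ULL; /* offset basis */
        while (*s) {
            h ^= (unsigned char)*s++;
            h *= 1099511628211ULL;                      /* FNV prime */
        }
        return h;
    }

    int main(void)
    {
        printf("%llx\n", fnv1a64("hello"));
        return 0;
    }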

Without a doubt, C needs a new integer size keyword that allows for 64 bit integers. So if I were God and could go back in time and fix this in C's design, it might look like this (type widths in bits, one row per pointer size):

    char *  char  short  word  sword  pen  long  int
    16      8     16     32    64     128  32    16
    32      8     16     32    64     128  64    32
    64      8     16     32    64     128  128   64

That is, char, short, word, sword and pen are fixed at 8, 16, 32, 64 and 128 bits; int always matches char *; and long is always twice the pointer size.
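As a rough sketch of what those rows mean in practice, here is the 64 bit row written as typedefs, assuming gcc on an ordinary LP64 machine; the 128 bit pen (and the proposal's 128 bit long) has no spelling in standard C, so gcc's __int128 extension stands in:

    /* The 64 bit row of the table as typedefs (gcc, LP64 assumed).
       word, sword and pen are fixed widths; int already matches
       char * on LP64. An illustration, not portable C. */
    typedef int      word;   /* always 32 bits  */
    typedef long     sword;  /* always 64 bits  */
    typedef __int128 pen;    /* always 128 bits, gcc extension */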

In the meantime, perhaps one can #include <sys/types.h> and get typedefs for constrained integer sizes up to 64 bits. But long is largely broken.
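For what it's worth, C99's <stdint.h> standardizes exactly such typedefs, including an intptr_t that restores the int-matches-pointer idea under another name; a minimal sketch:

    #include <stdio.h>
    #include <stdint.h> /* C99; many Unix <sys/types.h> headers carry similar names */

    int main(void)
    {
        int16_t  a = 0; /* exactly 16 bits */
        int32_t  b = 0; /* exactly 32 bits */
        int64_t  c = 0; /* exactly 64 bits */
        intptr_t p = 0; /* wide enough to hold a pointer */

        printf("%u %u %u %u bits\n",
               (unsigned)(sizeof a * 8), (unsigned)(sizeof b * 8),
               (unsigned)(sizeof c * 8), (unsigned)(sizeof p * 8));
        return 0;
    }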

The migration from 16 to 32 bit machines, and now from 32 to 64 bit machines, represents growing pains for C. It's a hard problem, and it remains painful.
