Wednesday, January 02, 2008

Top Ten, 9 of 10

When we last left our hero, we were looking at ten ways to get screwed by the "C" programming language. Today's entry is Permissive compilation.

Three examples are presented. The first has to do with the comma operator; gcc's -Wall option at least warns about it. The second has to do with initialization of variables. The third is an incompletely specified array. I have to say that I agree that all these problems are serious language or implementation issues.


I once modified some code that called a function via a macro:

CALLIT(functionName,(arg1,arg2,arg3));

CALLIT did more than just call the function. I didn't want to do the extra stuff so I removed the macro invocation, yielding:

functionName,(arg1,arg2,arg3);

Oops. This does not call the function. It's a comma expression that:
  • Evaluates and then discards the address of functionName
  • Evaluates the parenthesized comma expression (arg1,arg2,arg3)


There are two issues here. First, C compilers are not required to complain about a statement that produces no code. With gcc's -Wall option you get "warning: left-hand operand of comma expression has no effect". If there is no comma operator, the message is instead "warning: statement with no effect". The second problem is the comma operator itself, which is what makes the mangled line legal C in the first place. The lesson is that you should be using -Wall.

We've already seen that commas inside a function call are not order-keeping comma operators. It turns out that people really do use semicolon statement terminators, but seldom use comma operators. They are handy in the creation of macros. The comma operator can chain several expressions so that they are evaluated together, in order, without introducing additional block structure. This turns out to be important, because there are places where an expression statement can go that a new block can't. With the advent of compilers that can inline functions, there is at least less need for this kind of macro.

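#include <stdio.h>

int main(void) {
    int var = 2;
    int a = 0; /* any value will do; only the default label exists */
    switch (a) {
        int var = 1; /* This initialization typically does not happen. */
    default:
        printf("var=%d\n", var);
    }
    return 0;
}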

This is stunning. The -Wall option complains that the first var is unused, which is true enough. It also complains that the second var is used uninitialized, and sure enough, the program prints stack garbage. I recall vaguely that the original C standard did not allow initialization of variables after the start of a function. That is, you could declare them, but not also give them a value. What could the compiler do? The inner var could be allocated at the same time that the outer variable is declared, at the start of the function. It could be set just before the switch is executed. Given that the compiler accepts the syntax, that is what it should do. If it isn't going to do anything, it shouldn't accept the syntax. To be fair to gcc (4.2.1), the C standard itself permits a switch to jump past a declaration, leaving the variable in existence but uninitialized, so the bug is arguably in the language rather than the compiler. Either way, no possible good can come from this.

Why is this a problem? C has always worked this way, right? These days, however, there are many C-like languages (C++, Java, and so on), and many do allow this syntax. Many of us program in multiple languages, so it's much more likely that this is an issue. It is NOT a bug in these other languages.

#define DEVICE_COUNT 4
uint8 *szDevNames[DEVICE_COUNT] = {
"SelectSet 5000",
"SelectSet 7000"}; /* table has two entries of junk */

While it is unfortunate that two entries in the allocated table are junk, they are no more junk than if no entries were initialized. I've personally used this language feature to initialize part of an array that is later filled in completely. Now, when I've done it, the array has been allocated as static or global memory. The unused entries don't have junk, but rather have zeroes. And, of course, my program had logic that prevented misuse. So, while C lets you shoot yourself in the foot, it also lets you do things that are difficult in other languages for some benefit, perhaps a performance reason.
