Monday, March 28, 2011

Top Ten, 19 of 10

We last left our hero some time ago in the top ten list of ways to be Screwed by "C". I happened to be looking at the old posts, and noticed that the author has added a couple. Today's entry is Accidental Integers. The first example is:


int a = 2 && 4 && 8; // what is the value of "a" ?
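
The value can be checked directly. What follows is a minimal sketch, not code from the original list; the function names accidental_integer, truth_value and check_examples are made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper: the expression from the example, unchanged. */
int accidental_integer(void)
{
    int a = 2 && 4 && 8;  /* (2 && 4) yields 1, then (1 && 8) yields 1 */
    return a;
}

/* Hypothetical helper: the truth value of any integer, always 0 or 1. */
int truth_value(int n)
{
    return n && 1;
}

/* Usage: none of these assertions should fire. */
void check_examples(void)
{
    assert(accidental_integer() == 1);
    assert(truth_value(8) == 1);
    assert(truth_value(-3) == 1);
    assert(truth_value(0) == 0);
}
```

Calling check_examples() from any test program is enough to confirm the behavior on a given compiler.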

The issue here is that integers are also Booleans in the C language. So this expression is expected to use the constant "2" as a Boolean, which is clearly not false and therefore true, perform a logical AND with "4", which is also true, and perform a logical AND with "8", which is also true. The answer is true, and the variable a is set to 1. When read as a Boolean, an integer value of zero is false, and anything else is true. But when the language has to produce a true value itself, it uses 1. Since this example has only constants in the expression, the compiler would evaluate it at compile time and simply assign the value of 1 to a. The second example is a little curious. It's supposed to be the same.

int value = a && b && fn(a->x,b->x);

In this example a and b are structure pointers, and fn is a function which returns an integer. The author wants to check that the structure pointers aren't NULL before calling the function. NULL is typically a zero cast to (void *), so if you use a pointer as a Boolean, as in this code example, you are expecting it to be compared with zero. In addition, && is a short-circuit operator: if the left hand side of the operator is false, the right hand side is not evaluated. So the idea in this code fragment is: if pointer a isn't NULL, check whether pointer b isn't NULL, and if it isn't, call function fn and set value to the truth value of the integer returned by fn. One suspects that the unexpected result is how the return value is assigned. So this is probably what the author wanted:

int value; /* return value from fn() */

if (a && b) {
    value = fn(a->x, b->x);
}

Which is to say that the author didn't want the truth value of the function, but rather the integer value. From a stylistic point of view, i would not be tempted to compress this to

int value; if (a && b) value = fn(a->x, b->x);

or even


int value = (a && b) ? fn(a->x, b->x);

This last is particularly odd. It is meant to set value only if a and b are both nonzero, but the question mark operator requires a colon and a third operand, so a C compiler rejects this line as written. I can't think of any production code that uses the question mark operator without a colon. The question mark operator is not that commonly used as it is, but one could imagine it being used to conditionally set a variable in this way. Perhaps in some real life example, the variable was initialized. But in this case, it's difficult to imagine how the return that is set to value is used. Since value was uninitialized, how could the code dependably know if it's the return of fn or some random value? Well, perhaps fn has some side effect, such as setting a global variable that the following code could check. It's a poor example, in my opinion.
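
The difference between the truth value and the integer value is easy to see in a small sketch. Everything here is an assumption made for illustration: struct item, its single field x, a stand-in fn that simply adds its arguments, and the helper names. The colon and the fallback argument are spelled out, since the question mark operator needs both.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure; the original only tells us there is a member x. */
struct item {
    int x;
};

/* Hypothetical stand-in for fn: returns the sum of its arguments. */
int fn(int ax, int bx)
{
    return ax + bx;
}

/* The form from the list: the result is the truth value, 0 or 1.
 * fn is only called when both pointers are non-null. */
int truth_of_call(const struct item *a, const struct item *b)
{
    return a && b && fn(a->x, b->x);
}

/* What the author probably wanted: the integer returned by fn,
 * or a caller-supplied fallback when either pointer is null. */
int value_of_call(const struct item *a, const struct item *b, int fallback)
{
    return (a && b) ? fn(a->x, b->x) : fallback;
}

/* Usage: none of these assertions should fire. */
void check_calls(void)
{
    struct item p = { 20 };
    struct item q = { 22 };

    assert(truth_of_call(&p, &q) == 1);       /* 42 is nonzero, so true */
    assert(value_of_call(&p, &q, -1) == 42);  /* the integer itself */
    assert(truth_of_call(NULL, &q) == 0);     /* fn is never called */
    assert(value_of_call(NULL, &q, -1) == -1);
}
```

Passing the fallback in explicitly also answers the question of how later code can tell a real return value from garbage, as long as the fallback is a value fn never returns.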

The idea that NULL is zero is pervasive in C code. However, i've worked on a machine where the NULL pointer was not, in fact, zero. That's because this unusual 16 bit segmented architecture added byte addressability late in its development. So pointers point at 16 bit words. They added an address bit for the even/odd bytes, though for compatibility reasons, it's not the low bit in a pointer. And worse, the bit is set to "1" for the even numbered bytes. So address zero has a "1" set in it somewhere. The C compiler for this machine defined NULL correctly, but it isn't zero. Yes, there are other complications for C on this unique and special architecture. But we did get a Unix kernel to boot on it. So the above code fragment wouldn't work on this machine. Strictly speaking, a standard-conforming compiler has to treat a pointer used as a Boolean as a comparison against the null pointer, whatever its bit pattern, so the trouble comes from compilers that simply test for all zero bits. Either way, you'd want to explicitly compare with NULL, which properly documents intent, and has no performance consequences. It might look like this:


int value = ((a != NULL) && (b != NULL)) ? fn(a->x, b->x) : 0; /* 0 is an assumed fallback */

It should also be noted that this topic is directly related to the second topic in this series, Accidental assignment/Accidental booleans. Use of "=" when you meant "==" is a pretty common error. And it's true that it wouldn't happen at all if a strict Boolean type existed in C and people actually used it, and if the C language did not allow integers to be used as Booleans. My own opinion is that in assembler language, which C compiles to, integers are used in exactly this way for Booleans, and it was always up to the programmer to document the meanings of variables. For code size and performance, integers and Booleans are routinely mixed in C as well, so it is still up to the programmer to document their variables. My own coding style is to do this near the declaration. Most C programmers do not comment the declarations of local loop variables where the use is common and obvious. But in my standards, each variable is declared on its own line so that it can be documented. After all, this makes no difference whatsoever to the compiled code.
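
A short sketch of the "=" versus "==" slip, with declarations documented one per line. The function names is_five, is_five_buggy, count_nonzero and check_booleans are invented for illustration, and bool comes from C99's <stdbool.h>, which still converts freely from integers, so it is not the strict Boolean type wished for above.

```c
#include <assert.h>
#include <stdbool.h>

/* Intended test: true only when code equals 5. */
bool is_five(int code)
{
    return code == 5;
}

/* Accidental assignment: code becomes 5, and 5 is nonzero, so this is
 * true for every input. The extra parentheses only quiet the usual
 * compiler warning so the mistake can be shown on purpose. */
bool is_five_buggy(int code)
{
    if ((code = 5))
        return true;
    return false;
}

/* One declaration per line, each with a comment saying what it means. */
int count_nonzero(const int *flags, int length)
{
    int count = 0;  /* integer: how many nonzero entries were seen */
    int i;          /* loop index into flags */

    for (i = 0; i < length; i++) {
        if (flags[i])  /* integer used as a Boolean */
            count++;
    }
    return count;
}

/* Usage: none of these assertions should fire. */
void check_booleans(void)
{
    int flags[4] = { 0, 2, 0, 7 };  /* sample data: two nonzero entries */

    assert(is_five(5));
    assert(!is_five(3));
    assert(is_five_buggy(3));       /* the bug: true for any input */
    assert(count_nonzero(flags, 4) == 2);
}
```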
