Thursday, January 03, 2008

Top Ten, 10 of 10

When we last left our hero, we were looking at ten ways to get screwed by the "C" programming language. Today's entry is Unsafe returned values.

char *f() {
char result[80];
sprintf(result,"anything will do");
return(result); /* Oops! result is allocated on the stack. */
}

The issue is that the variable results is an automatic variable. It is allocated on the stack. It is not initialized, and has whatever garbage happened to be there. That's fine. The function writes to it before it is returned. However, when the function returns, the memory on the stack is free to be used by other routines. It may get overwritten before it can be used. For example, if the the caller then calls printf() to print the result, that may use enough of the stack memory to corrupt the value, with unpredictable results. If you're lucky, you get a segmentation violation. If not, you get the wrong results from a program that appears to work.

These days, gcc reports warning: function returns address of local variable even without -Wall for this code. It has been a problem in the past. Indeed, a code review of production code showed a library had this issue with essentially every function in the library. The solution was to make the buffer static in every case.

char *f() {
static char result[80];
...

In this case, the result buffer is allocated on the heap, and will remain untouched until that routine is called again. That routine may overwrite the value. But since the static keyword has scope implications as well as storage class implications, one knows that only that routine can do anything to it. That is, the caller knows that if it avoids calling that routine again, the value won't change. There are call graph utilities that can help the programmer be sure. Of course, the caller has a pointer to the buffer and can do whatever they want with it. But surely, if the programmer does that, they'll know they're changing it.

In Andrew Koenig's book, C Traps and Pitfalls, there is a subtle variant of this issue. On the PDP-11 (a sixteen bit machine), memory was often tight. To get a few extra bytes of heap, programmers would often declare a buffer on the stack, and call setbuf(3) to tell the standard library to use it instead of one on the heap for one of the standard streams, like stdout. Since main() has scope and life for the life of the program, it sounds as if this should always work. It never failed. However, the standard library also has onexit(3), where you can register a function to be called whenever exit(3) is called. The trouble is, these functions are called with the stack unwound to the caller of main - traditionally called crt0. This was a small chunk of machine code that set up the stack and called main(). The upshot is that functions might be called after main() has exited and it's stack variables are free, and possibly corrupt. If you have enjoyed the top ten, you'll likely enjoy Andrew's book. This second reference, has a table of contents.

No comments: