Thursday, February 21, 2008

Scheme For Enlightenment Part Twenty One - Hackable Closures


In the case of Scheme, particular features that make programming easier - and more fun! - are its powerful mechanisms for abstracting parts of programs (closures - see About Closure) and for iteration (see while do).

The Guile Reference Manual defines closure as the ability to remember the local environment when creating a function. The manual goes on to list four uses of closure.

The first corresponds to C's static local variables.
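As a rough sketch of what that correspondence looks like on the C side, here is a function with a static local variable. The name count_calls is made up for illustration and does not come from the Guile manual.

```c
/* Hypothetical example: count_calls is an illustrative name only. */
unsigned long count_calls(void)
{
    /* A static local is initialized once and keeps its value between
       calls, yet it is visible only inside this function. */
    static unsigned long calls = 0;

    calls += 1;
    return calls;
}
```

Each call returns one more than the previous call did, and no code outside the function can touch the counter.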

The second, called Shared Persistent Variable, corresponds roughly to C's file static variables. This is a variable that is not accessible to the caller, except in the ways that the provided functions grant. While i grant that it is handy for a library to have state variables that the caller won't accidentally change, many languages have extended this to absurdity. For example, in Java, one is supposed to never touch a class variable directly. One should use accessor functions to set and get values. This is true even if the function is supposed to get data from a database. That means that the class must have a zillion accessor functions to set and get values. The class is huge, and the caller's syntax is ponderous. The compiler must either optimize the calls away or leave the extra code in. It buys neither the class nor the caller anything of value. In C, Shared Persistent Variables are supported, and they tend to get used as needed, without being elevated to the level of the Holy Grail.
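For comparison, a minimal sketch of a file static variable shared by two functions in C. The variable and function names (log_level, set_log_level, get_log_level) and the 0 to 3 range are assumptions invented for this example.

```c
/* Hypothetical module; all names and the 0..3 range are illustrative. */

/* File scope with internal linkage: code in other source files cannot
   name this variable, so only the functions below can change it. */
static int log_level = 1;

void set_log_level(int level)
{
    /* Out-of-range requests are ignored. */
    if (level >= 0 && level <= 3)
        log_level = level;
}

int get_log_level(void)
{
    return log_level;
}
```

The two functions share the state, and a caller can only reach it in the ways they allow. That is one pair of accessors, added because the module needed them, not because a rule demanded them.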

Third, The Callback Closure Problem. The argument is that with closures in Scheme, a callback function can be registered that carries all the environment it needs. The code that invokes the callback doesn't need to know about the arguments and such; it just has a function to call, and calls it.

C supports callback functions. They have more syntax associated with them, but that's mostly because C is a statically typed language. In my opinion, the library is where the complicated stuff should be. The caller shouldn't have to build a callback function factory to simplify the library. The library should do as much heavy lifting as possible. And i've never found this bit of library creation to be error-prone, frequent or difficult.
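One common C convention for this, sketched here under assumptions, is for the library to accept a function pointer plus an opaque context pointer that it hands back untouched. The names for_each_int, visit_fn, sum_state and add_to_sum are hypothetical and not taken from any real library.

```c
#include <stddef.h>

/* Hypothetical callback type: receives one value plus the caller's context. */
typedef void (*visit_fn)(int value, void *context);

/* Library side: knows nothing about what the context points to. */
void for_each_int(const int *items, size_t count, visit_fn visit, void *context)
{
    for (size_t i = 0; i < count; i++)
        visit(items[i], context);
}

/* Caller side: the state a Scheme closure would capture lives in a struct. */
struct sum_state {
    long total;
};

void add_to_sum(int value, void *context)
{
    struct sum_state *state = context;  /* void * converts implicitly in C */
    state->total += value;
}
```

A caller would declare a struct sum_state, zero its total, and pass its address along with add_to_sum to for_each_int. The extra syntax is the typedef and the context parameter; the library itself stays simple.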

Function factories need to be addressed in more detail. This is as good a place as any. Let's say that one needs a serial number generator. Each number is unique. A function with internal persistent state can simply have a counter that can be incremented, and the new number can be returned. One nice thing in Scheme is that the number can be said to never overflow. Scheme makes the number bigger if needed. But let's say that you need two serial number sequences, one for each of two products. In Scheme, one makes a serial number function factory. Each returned object can be used to generate a sequence that is independent from every other sequence. Can this be done in C? You bet. The difference is that in C, what you do is return a token that the sequence generator uses to determine which sequence is desired. So, the caller must pass a parameter to the generator, rather than have a function that can be called with no parameter. In Scheme, the caller still must remember the returned function (a pointer, in effect), which is a parameter of its own. I just don't see this as a big deal.
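A minimal sketch of the token approach in C, assuming the token is simply a small struct that the caller keeps and passes back. The names serial_seq, serial_seq_start and serial_next are hypothetical.

```c
/* Hypothetical token: one of these per independent sequence. */
struct serial_seq {
    unsigned long next;
};

/* The "factory": returns a fresh token starting at the given number. */
struct serial_seq serial_seq_start(unsigned long first)
{
    struct serial_seq seq = { first };
    return seq;
}

/* The generator: the token tells it which sequence to advance.
   Unlike a Scheme integer, unsigned long wraps around to 0 on overflow. */
unsigned long serial_next(struct serial_seq *seq)
{
    return seq->next++;
}
```

Two tokens started at different numbers advance independently, one per product. The only cost to the caller is passing the token on each call.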

Fourth, Object Orientation. Object and method encapsulation can be a good thing. C libraries can do this. But there isn't any syntax specifically there to support it. Now, having a language that provides object oriented syntax can be a good thing. Objective-C is an example of a language that does this without going over the top, as, IMO, was done in C++ and Java. And mixing C style code with object oriented code is very natural. Many libraries make sense if written in an object oriented manner. Objective-C allows calls to such libraries at will. Even inheritance can be supported in C. It's just that supporting virtual functions takes more work than most C programmers are willing to do, so very few C applications do it. I've only written code that way perhaps twice in a quarter century of C. So only those functions that really needed it paid the required performance penalty.
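To give an idea of the extra work involved, here is one way virtual functions are often done by hand in C: a table of function pointers, and a base struct embedded as the first member of each derived struct. This is a sketch of a common convention, not the code mentioned above, and every name in it (shape, shape_ops, rect and so on) is hypothetical.

```c
struct shape;

/* Hand-built "vtable": one function pointer per virtual function. */
struct shape_ops {
    double (*area)(const struct shape *self);
};

/* Base "class": carries only a pointer to its table of operations. */
struct shape {
    const struct shape_ops *ops;
};

/* Derived "class": the base must be the first member so that a pointer
   to the base can be converted back to a pointer to the whole struct. */
struct rect {
    struct shape base;
    double width;
    double height;
};

static double rect_area(const struct shape *self)
{
    const struct rect *r = (const struct rect *)self;
    return r->width * r->height;
}

static const struct shape_ops rect_ops = { rect_area };

void rect_init(struct rect *r, double width, double height)
{
    r->base.ops = &rect_ops;
    r->width = width;
    r->height = height;
}

/* The virtual call: one extra indirection through the table. */
double shape_area(const struct shape *s)
{
    return s->ops->area(s);
}
```

The performance penalty is the indirect call in shape_area, and only code that goes through the table pays it.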

In C, Object Orientation can be achieved by convention, rather than be strictly enforced by syntax. So, the maintainer of a chunk of such C code does not have syntax to quickly prove that, for example, only these functions make any changes to this variable. Such proofs are good, because if one can limit the scope of where to look for behavior, there is less to look at. Remember that humans are slow. But C has scope rules. And even if some convention is violated in the code, one can still rely on C's scope rules. For example, i often use the keyword static for file global variables. That means that the text editor can be used to quickly search for all uses of that variable. In C, one knows that this variable is only accessible from its declaration to the end of the file. Less, if there are blocks where it is shadowed by local variables. I essentially never intentionally shadow global variables with locals, as it can cause confusion later.

So, the argument is that you can try little bits of Scheme easily. I recall doing that in C. What i was trying to do was figure out the exact semantics of the C language. The White Book, The C Programming Language, talks about what the language does, but isn't very precise. So i'd write something quick, like

for (i = 0; i < 5; i++) {
    call_something(i);
}

compile it, and step through it with a debugger. I'd also ask the compiler for the assembly language to see what it really did. I had the good fortune to have written ten or twenty thousand lines of PDP-11 assembly language, and had access to Unix running on a PDP-11 with the language writer's own compiler - the Ritchie C compiler. At the time, this was the gold standard for how C should behave. This gave me insights not only into what the exact semantics were, but also why Dennis Ritchie designed them that way. No tricks like that are available for Scheme. The best you might do is read the Scheme standard - which at least is short. A C standard now exists. However, for real understanding, the translation to assembly can hardly be matched.
