Thursday, March 09, 2006

The GNU patch disk

Feh. A database import process at work has been failing on a record that some upstream user has found a way to corrupt. They've managed to type, inside a field, the very character that the file format uses to separate fields in a data record. The delimiter in question is the seldom-used ASCII character '~' (tilde). In this record it appears to have been used to stand in for the German double 's' character (ß). At least this is my guess, and even this guess strains the limits of my knowledge of the German language. I confess to being one of those Americans who only really knows one language, and, by the way, thinks that the word American refers only to citizens of the United States. If i meant Western Hemispherian then that's what i'd have said. Thank you very much.

Back to the problem. I work for a large multinational corporation, so finding someone who can fix this in the upstream feed is going to take some time. In the meantime, i need a fix for my users. Fortunately, there is a unique string in the broken field, and the data does not change from day to day. A quick substitution filter might provide a temporary fix.

So, i copied the data file to my development box, wrote and tested a quick Unix sed script - and it worked like a champ. I had the foresight to test the script on the production machine, outside the production area, before installing it. It failed. The new file differed from the original file by more than just the one record. Many, many records were changed. Further, it wasn't immediately clear what exactly was different about the records. A quick check of the file sizes showed that the converted file was much smaller than the original.

Feh. The original file contains null characters (the ASCII code is zero), and the Unix command, sed, was silently stripping them. No one asked it to do this.
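The size check can be made explicit. What follows is a minimal sketch in Python, not anything that was run at the time: it counts the null bytes in a chunk of data and shows that stripping them accounts exactly for the shrinkage. The sample record and the function names are made up for illustration.

```python
def count_nulls(data: bytes) -> int:
    """Return how many NUL (0x00) bytes appear in data."""
    return data.count(b"\x00")


def strip_nulls(data: bytes) -> bytes:
    """Mimic the damage described above: drop every NUL byte."""
    return data.replace(b"\x00", b"")


# Hypothetical tilde-delimited records with embedded NUL bytes.
sample = b"id1~name\x00\x00~value\nid2~other\x00~value\n"
damaged = strip_nulls(sample)

# If the only damage is the missing NULs, the size difference
# equals the number of NULs in the original.
shrinkage = len(sample) - len(damaged)
print(count_nulls(sample), shrinkage)
```

On a real file, read it with open(path, "rb") so that nothing is decoded or translated on the way in.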

So, why did it work on my development system? Simple. The production machine is a big expensive Sun box running Solaris. The development machine is a PC that is so old that it doesn't run modern Windows very well, and so no one wanted it. I installed Linux on it. The Linux version of sed has no bug dealing with nulls. It has no line length limitations. The version of grep on Linux was easily capable of finding the offending record. It would be the one with 43 tilde characters in it, scattered here and there. The Solaris version of grep doesn't even get started.
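The same hunt can be done without a regular expression at all. Here is a rough Python sketch that reports every record whose tilde count differs from the expected one. It assumes records end in a newline, and the expected count is a parameter, because the real field count isn't stated here. The names and sample data are hypothetical.

```python
def find_bad_records(data: bytes, expected: int, delim: bytes = b"~"):
    """Return (record_number, delimiter_count) for each record whose
    delimiter count differs from expected. Records are assumed to be
    newline-terminated; empty records are skipped."""
    bad = []
    for number, record in enumerate(data.split(b"\n"), start=1):
        if not record:
            continue
        found = record.count(delim)
        if found != expected:
            bad.append((number, found))
    return bad


# Hypothetical data: three fields per record, so two delimiters expected.
# The second record has a stray tilde inside a field.
data = b"a~b~c\na~Stra~e~c\nx~y\x00~z\n"
suspects = find_bad_records(data, expected=2)
print(suspects)
```

Working on bytes rather than text means null characters and long lines are just more bytes to count.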

As an aside, i've been asked, by managers, etc., why i have two PCs. Well, i need Linux to get any actual work done. Multiple times per day i perform tasks under Linux that would take days under Windows, at best. Look at my bill rate and figure out how many such days pay for a PC. I'd like to get rid of Windows, really. But, last i checked, the version of Lotus Notes for Linux doesn't have all the functionality that the Windows version has (is IBM listening?). And, the corporation has a few other tools that are only supported on Windows. In particular, many of the internal web sites are optimized for Internet Explorer, the only single-platform browser (now that the Mac version has been orphaned). Why the corporation has an explicit policy of locking itself into using the buggiest and least secure operating system in common use anywhere on Earth is beyond me. Was there a golden handshake somewhere in the past? Is Microsoft marketing that good?

Production is on Solaris 8. Yes, i know, version 10 of Solaris is out, and the corporation has started using it for new projects. This system could be upgraded, but such upgrades are expensive in time and effort, and this production system is being rewritten from scratch piece by piece. In another year, the plug will be pulled on this system. We call it a Sundown.

One of the things that i really like about Solaris is the backward compatibility delivered. Utilities like grep and sed seem to have undergone zero changes since the early eighties. In an industry where six months is an eternity, this is software from the previous millennium. So scripts that worked back then still work. This is a good thing. But, the bugs haven't been fixed. This behavior with nulls is a bug. I know, Rich Kulawiec said that "Any sufficiently advanced bug is indistinguishable from a feature." This feature has many legs. The grep behavior where it never finishes compiling the regular expression is also a bug. The easy fix would be to use the versions of these utilities that come with Linux. They are open source, and freely usable. Sun could simply use them. Why don't they? For one, they have been rewritten, from scratch, and may not have the exact behavior of the old versions. For example, the new regular expressions have been expanded in power. Can it be proven that all old valid regular expressions will behave under the new versions as they did under the old? A dozen years or so of experience with the GNU utilities shows that these fears are largely unfounded. For another, the new utilities have many new features, and people will come to depend on them. This could make moving scripts from newer to older systems more difficult. This is a minor point, and no progress can be made if it is allowed to stand in the way.

When one does not have backward compatibility, for example when using a language that keeps changing, like Visual Basic, or the early days of Java, one eventually finds that one is spending so much time upgrading, that no new code can be written. This is unacceptable. In contrast to Microsoft, Sun does not impose a forced march of frequent upgrades on its customers. If it were my decision, the first time such an upgrade was forced, i'd upgrade off of Windows. That would have been before 1995. Yet companies follow this logic all the time. They'll even lease systems rather than buy them, forcing multiple data and code migrations for production systems that do not require the expanded performance and capacity of the newer systems. What a waste of time and money.

Sun does offer installation packages that have the Linux utilities. They work. My company hasn't installed them. I don't have permission to install them.

The last version of Solaris that i used that i really liked was Solaris 7 running on a Pentium II (Solaris x86). It wasn't better than version 8. It is just that i had administrative rights to the box (root), and over the course of a year, i installed almost all the open source utilities that i had come to expect from Unix (from Linux use). Sun provided handy packages for painless installation (and still does - thanks, Sun). The Solaris x86 system had the feature that when i compiled something and got it to run, it would compile and run on a SPARC-based Solaris system without modification, every time. However, the time it took to get all this stuff installed was aggravating, and i often wished i'd just installed Linux from the start.

Without upgrades of some kind, any product stagnates. Frequent upgrades are expensive in support. So, infrequent upgrades are the current goal. Backward compatibility is important. However, the bugs really need to be addressed. And, no, it isn't good enough to load Solaris, then apply the GNU patches (the GNU versions of these utilities), perhaps ending up with behavior that depends on path order. Solaris needs to make a clean break from the hordes of bugs and limitations of the old utilities, in the name of reduced customer headaches. These bugs include arbitrary line length limits, troubles handling characters with the high bit set, troubles handling null characters, problems coping with complicated, or just long, regular expressions, and on, and on, and on.

The solution, by the way, was to use Perl. But Perl is one of those newfangled open source products. Though it is non-standard for this company, it is required for the application, and so was available. It may not be there next time.
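The actual Perl isn't reproduced here. As a stand-in, this is a minimal Python sketch of the same idea: a substitution filter that treats the file as raw bytes, so null characters and long lines pass through untouched. The broken and fixed strings are invented for illustration, as are the function names.

```python
def patch_bytes(data: bytes, broken: bytes, fixed: bytes) -> bytes:
    """Replace every occurrence of broken with fixed; leave all other bytes alone."""
    return data.replace(broken, fixed)


def patch_file(src: str, dst: str, broken: bytes, fixed: bytes) -> int:
    """Copy src to dst with the substitution applied.
    Returns the number of substitutions made."""
    with open(src, "rb") as f:  # binary mode: no decoding, no newline translation
        data = f.read()
    hits = data.count(broken)
    with open(dst, "wb") as f:
        f.write(patch_bytes(data, broken, fixed))
    return hits


# Hypothetical record: a stray '~' inside a field, NUL bytes in another field.
before = b"1001~Stra~e 5~\x00\x00~Berlin\n"
after = patch_bytes(before, b"Stra~e", b"Strasse")
print(after.count(b"\x00"), after.count(b"~"))
```

A cheap sanity check after running such a filter: the output size should differ from the input size by exactly the number of substitutions times the length difference between the two strings. Any other difference means something else got changed.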
