Wednesday, March 22, 2006

Using Pricewatch Effectively

Summary: I ordered a disk drive from a small distributor and got what i wanted, when it was expected, at the price offered: roughly half the cost elsewhere, and without rebates or other nonsense. Can't beat that with a stick.

Last week, i ordered a 160 GB hard disk through Pricewatch. I settled on an offering from a company called 3btech. They had the low bid, and are reasonably nearby. Next day delivery from a big name on-line store would have cost twice as much, as would a local store (a big chain). The idea is that instead of buying into a nationwide distribution chain, you go with a mom and pop shop within easy reach of UPS Ground. 3btech happens to be about two hours by car from my house. Indeed, the disk drive was ordered on Friday afternoon and arrived on Tuesday, as expected and hoped.

The disk drive itself does not come from a big name company. I had never heard of White Label Magnetic Data Tech. The drive comes with a six month warranty, which covers infant mortality, the early failures new drives are most prone to. That's as much as, or more than, one would expect elsewhere.

It turns out that i already own a Western Digital 160 GB hard disk. It's about three years old, and it has been fine. Two ideas are of note. First, for 24x7 use, disk drives are really reliable for only about as long as this WD drive has been running, so it is near the end of its most reliable years. Second, only a small fraction of my data is backed up anywhere. So, either the new drive will be production and the old drive the mostly-offline backup, or i'll implement software mirroring. For now, i'll use the mostly-offline backup idea.

OEM drives come with zero documentation. A sticker on the drive shows the drive selection jumper settings. Once the WD drive was set to master, and this new drive to slave, the BIOS saw them both, and Linux was happy. That's all that matters.

The new drive is identical to my Western Digital 160 GB drive in size. It has the same disk geometry. This will allow easy mirroring.

The new drive appears to be the same speed as my old one. I set them both to use 32-bit I/O transfers (not DMA, which hdparm controls separately with -d) in my system startup with a command like:

hdparm -c 1 /dev/hda

On 3btech's web site, this White Label drive is listed with a 2 MB on-disk cache, rather than the WD's 8 MB cache. Will this be an issue? No. On-drive cache hit ratios are tiny under Unix systems, because the OS checks its own cache first, and that cache is much larger. I ran studies of this in the 1980s. On-drive caches were new then, and software could query the drives for cache hit statistics. On a Unix system with 9 MB of RAM, a 1 MB on-disk cache got quite poor cache hit ratios. One could think of the on-disk cache as the 2nd MB of cache in a 2 MB cache system: the first MB gives you most of the advantage. My Linux box has 512 MB RAM, and typically allocates over 250 MB to the disk cache. After 250 MB of cache, it matters little whether there is an additional 2 MB or 8 MB. The important thing for the drive is that it has RAM for track reads and writes, and perhaps read ahead and write behind (if power reserves allow guaranteed writes through a power failure). Consider that after traversing an entire mounted CD, further accesses are instant, and the drive light doesn't even light up.

On-disk caches were very important for DOS, and probably important for Windows 3.x, 95, 98, and Millennium. They are probably unimportant for Windows NT, 2000, XP and Vista. For best speed, go with a 10,000 RPM SATA or 15,000 RPM SCSI drive, buy lots of RAM for your system, and consider RAID options. If you need speed, be prepared to pay for it. I just want space and reliability at low cost.

My requirements are for maximum space per dollar, not maximum speed. The tune2fs -l /dev/hda1 command shows how the old filesystem was built, and the mke2fs man page yields other ideas.

For years, my filesystems have had zero bytes reserved for root. The 5% default reserve is, in my opinion, a pointless tax. On a 160 GB filesystem, that amounts to 8 GB. The reserve seems to be a holdover from the 4.2 BSD filesystem development studies of the early 80s. Performance tests under Linux have shown that the filesystem runs at approximately the same speed when using the last 5% as it does for the rest of the drive.

Another holdover is the logical disk block size. The Red Hat 9 default block size is 4 KB. However, mke2fs supports the creation of filesystems with 1 KB blocks. As files are generally contiguous, and performance generally comes from track reads and writes, block size should make little difference to speed. Smaller blocks waste less space at the ends of files, and the more files, the more wasted space.

Finally, a quick check of the number of files (inodes) actually used can help you tune the new filesystem. Use df -i. My filesystem had 18,000,000 inodes, but used just over 1,000,000. Inodes take up space, whether used or not. Journaling (the -j option, which makes the filesystem ext3) is a cheap way to gain reliability. So, the command for generating the new filesystem was:

mke2fs -j -b 1024 -m 0 -N 2000000 /dev/hdb1
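The inspection steps above can be scripted. This is a minimal sketch, assuming an awk that accepts a regular expression as field separator, the usual "Block count" and "Reserved block count" field names in tune2fs -l output, and the standard df -iP column order. The doubling rule in the hypothetical suggest_inodes helper is only a rule of thumb matching the 1,000,000 used / 2,000,000 allocated choice described here; mke2fs does not require it.

```bash
#!/usr/bin/env bash
# Helpers for sizing a new ext2/ext3 filesystem from an old one.
# Both read command output on stdin, so they can be tried on saved text.

# Reserved-block percentage, from `tune2fs -l` output.
reserved_pct() {
    awk -F':[ \t]+' '
        $1 == "Block count"          { blocks = $2 }
        $1 == "Reserved block count" { reserved = $2 }
        END { if (blocks > 0) printf "%.1f\n", 100 * reserved / blocks }'
}

# Suggested inode count for `mke2fs -N`: twice the inodes in use,
# read from `df -iP` output for a single filesystem (line 2 is the data,
# column 3 is IUsed). The factor of 2 is a rule of thumb, not a requirement.
suggest_inodes() {
    awk 'NR == 2 { print $3 * 2 }'
}

# Typical use (device name as in the text):
#   tune2fs -l /dev/hda1 | reserved_pct
#   df -iP /dev/hda1 | suggest_inodes
```

A reported reserve of 0.0 confirms that -m 0 took effect; the inode suggestion is a starting point to round to taste.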

Once the data was copied from the old drive to the new, identically sized partitions, the 6 GB partition had 1 GB more free space, and the 145 GB partition had 3 GB more free space. Compared with default values (which would also have reserved 8 GB for root), these options yield about 7% more space, or 12 GB.
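For a rough sense of where that space comes from, here is a back-of-the-envelope estimate for the 145 GB partition. It is a sketch under stated assumptions, not a measurement: 128-byte inodes (the ext2/ext3 default of the time), the 18,000,000-inode figure belonging to this partition, a 4 KB old block size, 145 GB taken as 145 * 10^9 bytes, and on average half of each file's last block wasted. The function name is made up for illustration.

```bash
#!/usr/bin/env bash
# Rough estimate of space recovered by `mke2fs -m 0 -b 1024 -N <inodes>`
# versus the defaults (5% reserve, 4 KB blocks, default inode count).
# Assumptions: 128-byte inodes; half a block wasted per file on average.
# Ignored: the extra bitmaps and indirect blocks that 1 KB blocks need.

# Usage: estimate_savings SIZE_BYTES OLD_INODES NEW_INODES FILES
estimate_savings() {
    local size=$1 old_inodes=$2 new_inodes=$3 files=$4
    local inode_size=128 old_bs=4096 new_bs=1024 reserve_pct=5

    local reserve=$(( size * reserve_pct / 100 ))
    local inode_tables=$(( (old_inodes - new_inodes) * inode_size ))
    local tails=$(( files * (old_bs - new_bs) / 2 ))

    # Report in decimal megabytes, as drive vendors do.
    echo "reserve_mb=$(( reserve / 1000000 ))"
    echo "inode_tables_mb=$(( inode_tables / 1000000 ))"
    echo "tails_mb=$(( tails / 1000000 ))"
    echo "total_mb=$(( (reserve + inode_tables + tails) / 1000000 ))"
}

# Figures for the 145 GB partition. Prints:
#   reserve_mb=7250
#   inode_tables_mb=2048
#   tails_mb=1536
#   total_mb=10834
estimate_savings $(( 145 * 1000000000 )) 18000000 2000000 1000000
```

The inode and block-tail terms come to about 3.6 GB, a little above the 3 GB measured, which is plausible given the extra metadata that 1 KB blocks carry and the sketch ignores.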

Conclusion

Great deals can be had through services like Pricewatch and low overhead vendors. Just be sure that the overhead you skip isn't overhead you need. Maybe you really need overnight delivery, vendor support, manufacturer support, or documentation. If so, my advice is to buy it. However, if you can manage to educate yourself, you can get what you want cheaper. As Harry Potter fans know, knowledge is power.

Tuesday, March 21, 2006

Caffeine

I have a high tolerance for pain. I don't even notice it. When i do notice it, i look for a cause and a solution. Observation. Almost five years ago, the pain in my right shoulder was bad enough that I could not raise the arm higher than horizontal. I still reach for things on high shelves with my left hand. I didn't complain, but did notice that it was somewhat better on Sundays, and best on Monday mornings. What could that mean? Well, i'd experienced caffeine withdrawal on Sundays. I didn't just get headaches because i was at church. And when my Mt. Dew consumption was reduced, the headaches went away. So, i went cold turkey, and 10 months later, i had full movement back and the pain was gone. Arthritis in the hands and feet which i'd barely noticed, back pain, shoulder pain, all of it.

And i won't go back. So why would i ever grab a Dew, if it causes such misery? Because caffeine is rather addictive after all, and by comparison, i have no willpower to speak of. But i'm back on the wagon again, and though my hands are better, i know my right foot won't recover until August.

The current additional experiment is to see if non-impact exercise will accelerate the healing process.

Thursday, March 16, 2006

Weapons of Math Instruction

~(AP Newswire)

At New York's Kennedy airport today, an individual later discovered to be a public school teacher was arrested trying to board a flight while in possession of a ruler, a protractor, a set square, a slide rule, and a calculator.

At a morning press conference, the attorney general said he believes the man is a member of the notorious Al-gebra movement. He is being charged by the FBI with carrying weapons of math instruction.

"Al-gebra is a fearsome cult," a Justice Department spokesman said. "They desire average solutions by means and extremes, and sometimes go off on tangents in search of absolute value. They use secret code names like 'x' and 'y' and refer to themselves as 'unknowns', but we have determined they belong to a common denominator of the axis of medieval with coordinates in every country. As the Greek philanderer Isosceles used to say, 'there are 3 sides to every triangle'."

When asked to comment on the arrest, President Bush said, "If God had wanted us to have better weapons of math instruction, He would have given us more fingers and toes."

Thursday, March 09, 2006

The GNU patch disk

Feh. A database import process at work has been failing on a record that some upstream user has found a way to corrupt. They've managed to enter the very character that the file format uses as the delimiter between fields in a data record. The delimiter in question is the seldom used ASCII character '~' (tilde). It appears to have been used to stand in for the German double 's' character (ß). At least this is my guess, and even this guess strains the limits of my knowledge of German. I confess to being one of those Americans who only really knows one language, and, by the way, thinks that the word American refers only to citizens of the United States. If i meant Western Hemispherian then that's what i'd have said. Thank you very much.

Back to the problem. I work for a large multinational corporation, so finding who can fix this in the upstream feed is going to take some time. In the meantime, i need a fix for my users. Fortunately, there is a unique string in the broken field, and the data does not change from day to day. A quick substitution filter might provide a temporary fix.

So, i copied the data file to my development box, wrote and tested a quick Unix sed script, and it worked like a champ. I had the foresight to test this script on the production machine, outside the production area, before installing it. It failed. The new file differed from the original file by more than just the one record. Many, many records were changed. Further, it wasn't immediately clear what exactly was different about the records. A quick check of the file sizes showed that the converted file was much smaller than the original.

Feh. The original file contains null characters (the ASCII code is zero), and the Unix command, sed, was silently stripping them. No one asked it to do this.

So, why did it work on my development system? Simple. The production machine is a big expensive Sun box running Solaris. The development machine is a PC so old that it doesn't run modern Windows very well, so no one wanted it. I installed Linux on it. The Linux version of sed has no trouble with nulls, and it has no line length limitations. The Linux version of grep easily found the offending record: it would be the one with 43 tilde characters in it, scattered here and there. The Solaris grep never even gets past compiling the regular expression.
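Finding the broken record by counting delimiters can be sketched with awk. This assumes a well-formed record always has the same number of '~' delimiters (the text doesn't say how many, so EXPECTED is a placeholder) and an awk that copes with NUL bytes, such as GNU awk. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print "line_number:delimiter_count" for every record whose count of
# '~' delimiters differs from EXPECTED. Blank lines are skipped.
# Usage: odd_records EXPECTED FILE
odd_records() {
    local expected=$1 file=$2
    # LC_ALL=C makes awk treat the data as plain bytes.
    LC_ALL=C awk -F'~' -v want="$expected" \
        'NF > 0 && (NF - 1) != want { print NR ":" (NF - 1) }' "$file"
}

# Example (hypothetical file name and count):
#   odd_records 42 feed.dat
```

With GNU grep, a record with exactly 43 tildes can also be matched directly with `grep -a -n -E '^([^~]*~){43}[^~]*$' feed.dat`; the -a flag keeps grep printing lines even though the file contains nulls.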

As an aside, i've been asked, by managers, etc., why i have two PCs. Well, i need Linux to get any actual work done. Multiple times per day i perform tasks under Linux that would take days under Windows, at best. Look at my bill rate and figure out how many such days pay for a PC. I'd really like to get rid of Windows. But, last i checked, the version of Lotus Notes for Linux doesn't have all the functionality that the Windows version has (is IBM listening?). And, the corporation has a few other tools that are only supported on Windows. In particular, many of the internal web sites are optimized for Internet Explorer, the only single-platform browser (now that the Mac version has been orphaned). Why the corporation has an explicit policy of locking itself into the buggiest and least secure operating system in common use anywhere on Earth is beyond me. Was there a golden handshake somewhere in the past? Is Microsoft marketing that good?

Production is on Solaris 8. Yes, i know, version 10 of Solaris is out, and the corporation has started using it for new projects. This system could be upgraded, but such upgrades are expensive in time and effort, and this production system is being rewritten from scratch piece by piece. In another year, the plug will be pulled on this system. We called it a Sundown.

One of the things that i really like about Solaris is its backward compatibility. Utilities like grep and sed seem to have undergone zero changes since the early eighties. In an industry where six months is an eternity, this is software from the previous millennium. So scripts that worked back then still work. This is a good thing. But the bugs haven't been fixed. This behavior with nulls is a bug. I know, Rich Kulawiec said that "any sufficiently advanced bug is indistinguishable from a feature." This feature has many legs. The grep bug where it never finishes compiling the regular expression is also a bug. The easy fix would be to use the versions of these utilities that come with Linux. They are open source, and freely usable. Sun could simply use them. Why don't they? For one, they have been rewritten from scratch, and may not have the exact behavior of the old versions. For example, the new regular expressions have been expanded in power. Can it be proven that every old valid regular expression will behave as it did with the old versions? A dozen years or so of experience with the GNU utilities shows that these fears are largely unfounded. Also, the new utilities have many new features, and people will come to depend on them. This could make moving scripts from newer to older systems more difficult. That is a minor point, and progress can't be made if it is allowed to block change.

When one does not have backward compatibility, for example when using a language that keeps changing, like Visual Basic or the early days of Java, one eventually finds that one is spending so much time upgrading that no new code can be written. This is unacceptable. In contrast to Microsoft, Sun does not force a march of frequent upgrades on its customers. If it were my decision, the first time such an upgrade was forced, i'd upgrade off of Windows. That would have been before 1995. Yet companies accept forced upgrades all the time. They'll even lease systems rather than buy them, forcing multiple data and code migrations for production systems that do not need the expanded performance and capacity of the newer systems. What a waste of time and money.

Sun does offer installation packages that have the Linux utilities. They work. My company hasn't installed them. I don't have permission to install them.

The last version of Solaris that i really liked using was Solaris 7 on a Pentium II (Solaris x86). It wasn't better than version 8. It's just that i had administrative rights to the box (root), and over the course of a year, i installed almost all the open source utilities that i had come to expect from Unix (from Linux use). Sun provided handy packages for painless installation (and still does; thanks, Sun). The Solaris x86 system had the feature that when i compiled something and got it to run, it would compile and run on a Sparc based Solaris system without modification, every time. However, the time it took to get all this stuff installed was aggravating, and i often wished i'd just installed Linux from the start.

Without upgrades of some kind, any product stagnates. Frequent upgrades are expensive in support. So, infrequent upgrades are the current goal. Backward compatibility is important. However, the bugs really need to be addressed. And, no, it isn't good enough to load Solaris, then install GNU versions of these utilities alongside the originals, with behavior that depends on the order of directories in the PATH. Solaris needs to make a clean break from the hordes of bugs and limitations of the old utilities, in the name of reduced customer headaches. These bugs include arbitrary line length limits, trouble handling characters with the high bit set, trouble handling null characters, problems coping with complicated, or just long, regular expressions, and on, and on, and on.
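To see which copy of a utility a script will actually get when both the original and a GNU version are installed, it helps to list every match along PATH in search order. A minimal sketch; the function name is made up, and bash's built-in `type -a sed` reports similar information.

```bash
#!/usr/bin/env bash
# Print every executable file named CMD along $PATH, in search order.
# The first line is the one a script will actually run; later ones are
# shadowed, which is how the same script can behave differently
# depending only on PATH order.
all_on_path() {
    local cmd=$1 dir
    local IFS=:
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.   # an empty PATH entry means the current directory
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            printf '%s\n' "$dir/$cmd"
        fi
    done
}

# Example:
#   all_on_path sed
```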

The solution, by the way, was to use Perl. But Perl is one of those newfangled open source products. Though it is non-standard for this company, it is required for the application, and so was available. It may not be there next time.

Wednesday, March 08, 2006

COSMOS

I'm currently watching the 2000 DVD version of Sagan's (1980) COSMOS. As an educated layman defending science and debunking stupidity, i have to say that Sagan was really good at this stuff, and having a pro do it is a good thing. Further, it is in the scientist's own interest to make sure science is understood by the masses. Much of the funding for science comes from the public.

Of late, i've come to be of the opinion that i've been quite ineffectual at debunking the stupid. Coming down like a ton of bricks has the same effect that Bible thumpers had on me while i was growing up, which is none. I have to admit that all i've done is look like i'm trying to sound smart, when the smartest i've ever been was when i was asking stupid questions. What does seem to work is teaching in the audience's vocabulary, and with compassion. What you really need to avoid saying is, "i'm really smart, and these are the facts." You need to say, "This is the process that scientists find works best." So, when an ID'er says that Carbon 14 dating isn't any good for talking meaningfully about T-Rex, i agree. Then i explain that there are several lines of evidence, including radiometric dating techniques other than Carbon 14, stratification, and other techniques, which, when combined, yield answers that are consistent with each other and lend confidence to the results. Then i use an example i really know about: the 1998 result in which the team stated that the expansion of the Universe is accelerating was delayed while they rechecked their process in several ways, to avoid saying something that was wrong. It was an extraordinary claim, and such claims demand extraordinary evidence. And the evidence was extraordinary. The result didn't fit any of the three classic Friedmann models that the Universe was supposed to follow. And, yes, this result could be wrong. And yes, there is continuing study.
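To put a number on why Carbon 14 is the wrong tool for T-Rex, here is a worked example. It assumes the commonly cited Carbon 14 half-life of about 5,730 years and an age of roughly 66 million years for T-Rex fossils; the symbols are introduced here for illustration.

```latex
\begin{aligned}
\frac{N(t)}{N_0} &= 2^{-t/T_{1/2}} \\
\frac{t}{T_{1/2}} &\approx \frac{66 \times 10^{6}\ \mathrm{yr}}{5730\ \mathrm{yr}} \approx 11\,500 \\
\frac{N(t)}{N_0} &\approx 2^{-11\,500} = 10^{-11\,500\,\log_{10} 2} \approx 10^{-3462}
\end{aligned}
```

Even starting from a mole of Carbon 14 (about 6 × 10^23 atoms), far less than one atom would remain, which is why dinosaur ages come from radiometric methods based on isotopes with much longer half-lives.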

Easy ways for scientists to promote their causes are to write blogs, and when they complete some task, do a phone interview with a podcaster. Many of the science podcasts are made by educated laymen who have the time to figure out how to assemble podcasts and distribute them. They need content. The scientist has content that needs to get out.

Sunday, March 05, 2006

Growing up

What do you want to be when you grow up?

I want to become so old that i can plan my own surprise birthday party.