Monday, February 04, 2008

Scheme For Enlightenment Part Ten - A Benchmark

This benchmark has been kicking around since around 1982. It was originally written in C for microprocessors that had, by today's standards, very little memory. On modern processors (as new as a 486) the entire program and data fits into the on-chip processor cache. It's too small to induce TLB thrashing. In short, this tiny benchmark screams on modern CPUs. While there is plenty that can be learned from it, it is not representative of the big long running applications that can chew up a modern machine.

#!/usr/bin/guile \
-e main -s
!#
;;; Eratosthenes prime number sieve, Answer is 1899
;;; $ sieve.g loops
;;; Based on benchmark from Byte, Sept 82.
(define (sieve x)
(define flags (make-vector 8191 #t)) ; one-bit flags
(define count 0)
(define prime 0)
(define k 0)
(display "Iterations: ")
(display x)
(newline)
(do ((iter 0 (+ iter 1)))
((>= iter x))
(set! count 0)
(set! flags (make-vector 8191 #t))
(do ((i 0 (+ i 1)))
((>= i 8190))
(if (vector-ref flags i)
(begin
(set! prime (+ i i 3))
(set! k (+ i prime))
(while (<= k 8190)
(vector-set! flags k #f)
(set! k (+ k prime))
)
(set! count (+ count 1))))
))
(display count)
(display " primes")
(newline)
)

(define (main args)
(sieve (string->number (cadr args))))

You tell the program how many times to loop from the command line. The original so-called 8 bit processors, the 8080, the 6800, and so on, which ran at 1 to 4 MHz, took quite awhile to perform one loop, when coded in C. One loop is also sufficient for the 25 MHz Dragonball 68000 CPU in my Handspring Visor Platinum Palm Pilot in Lispme. For modern desktop processors using C, 10,000 loops are used.

The prime number sieve of Eratosthenes discovers the prime numbers by a simple process. First, all the flags are set true. Then all the even numbered flags after the 2nd are set false. The array is checked for the next true flag, which is the third flag. So every third flag is set false after that first one. And so on. However, this version does not represent the even numbered flags. So the formula for what flag to set next is slightly more complicated. It was great for the primitive microprocessors of the 1970's. These processors had no divide instructions, which would usually be needed to test for prime numbers.

The flags variable is an array. It could be of bits. Each element holds a true or false value. Guile supports something called uniform vectors. And vectors of single bits are supported. The Guile Reference manual suggests that uniform vectors should be faster than regular vectors, where every element could be a different type. However, one bit, 8 bit signed and unsigned bytes, and 32 bit signed and unsigned words are all slightly slower than plain vectors. Perhaps some future version will optimize uniform vectors. One bit flags does take less space. Since there are 8,191 flags (2^13 -1), one bit flags should only take 1 KB, instead of 8 KB. There is a shifting and masking penalty, however.

When all is said and done, this sieve doesn't report the actual prime numbers found. It just counts how many numbers are prime, and reports that.

There is kind of an oddity with the vector. First, (define) is used to create the vector, with all flags true. Then, in the loop, (set!) is used to set all the flags to true. There's no way to create a vector without also setting the values. The (make-vector) function requires it.

The do loop is used instead of let loops, recursion, tail recursion, etc., because at least in Guile, the do loop is quickest. However, as much as possible, the original C version has been translated directly. The idea isn't to generate prime numbers. The idea is to create a Scheme version of the benchmark that does the same amount of work done by other versions. Versions of this benchmark exist in C, scheme, BASIC, Perl, Python and other languages. And the idea is to get some idea of how fast each language performs on some given machine.

For one machine, C was 340 times faster than Guile. How does this stack up to other languages? It's faster than Perl or BASIC, but slower than Java. One might think that a factor of 340 is unusably slow. However, some applications don't require much total time. For example, dynamic web pages tend to require a fraction of a second in Perl. A C version isn't much different. If there were some reason to use Scheme for such a task, for example because large integers were needed, then why not? But much more likely for Scheme, some algorithm might lend itself to a functional programming style.

I hesitate publishing this benchmark. I'm a relative novice to Scheme programming. I find it very difficult to judge how long things will take. Scheme, and as i understand it, Lisp in general, is very sensitive to the details of how data is represented. It's possible that a version might be written which is much faster. If you see something that could be changed, let me know.

No comments: