Tuesday, February 05, 2008

Scheme For Enlightenment Part Eleven - Four Colors

One of the destinations that functional programming seems to lead to is decision tree searching. Scheme allows such trees to be searched in a depth first manner fairly easily.

One technique in scheme is based on the call-with-current-continuation function, often abbreviated call/cc. This function allows one to get back to some state and continue computation there. It can act like C's longjmp, which can unwind the stack to some prior point. But additionally, call/cc can rebuild the stack to some deeper depth(!). Guile keeps copies of the whole stack. Other scheme implementations may have some more efficient (quicker) way to implement this magic. And, it's magic, because it allows one to remember not just the call to a function, but all of the context leading up to that call. And then, one can return from that function multiple times. Or one could say that the function can be called multiple times with remembered call state. Between returns, some state may have changed in the heap, so it won't just do the same thing it did last time. It looks like magic, because it is.

They say that the function amb is as old as Lisp. Yet something like no implementations of Scheme come with an implementation. Guile is not an exception here, so an implementation is included. The idea is that amb takes a function and several arguments, and returns an argument where, when passed to the function, returns true. One might expect that it just walks down the list and tries one at a time. However, amb is nestable. That is, one can have several calls pending, and all calls will return arguments so that their functions are true. I don't pretend to know the details of how it works, except that the trial and error uses call/cc to start again. Also, it's a macro, which means that where it is used is text-substituted. It's not a function of it's own. The syntax is odd, and will not be explained here.

Finally, assert uses amb. This function allows one to harbor the illusion that a mathematical assertion is being made. So, first, amb and assert. Put this into the file amb.scm so that the other program will know where to get it.

(define amb-fail '*)

(define initialize-amb-fail
(lambda ()
(set! amb-fail
(lambda ()
(error "amb tree exhausted")))))

(initialize-amb-fail)

(define-macro amb
(lambda alts...
`(let ((+prev-amb-fail amb-fail))
(call-with-current-continuation
(lambda (+sk)
,@(map (lambda (alt)
`(call-with-current-continuation
(lambda (+fk)
(set! amb-fail
(lambda ()
(set! amb-fail +prev-amb-fail)
(+fk 'fail)))
(+sk ,alt))))
alts...)
(+prev-amb-fail))))))

(define (assert pred)
(if (not pred) (amb)))

Map makers have known that only four colors are needed for country maps when the rule is that no two neighbors are shown with the same color. So if France is red, Spain can't also be red. This example uses just the states of New England. These states are chosen because there aren't that many of them.

#!/usr/bin/guile -s
!#
;;; Color the New England states in 4 colors.
(load "amb.scm")

(define (choose-color)
(amb 'red 'green 'blue 'white))

(define (color-new-england) ; Choose colors for each state.
(let ((me (choose-color)) ; MainE
(ma (choose-color)) ; MAssachuesetts
(ct (choose-color)) ; ConnecticuT
(ri (choose-color)) ; Rhode Island
(nh (choose-color)) ; New Hampshire
(vt (choose-color))) ; VermonT

;; Construct the adjacency list for each state:
;; the states' name, its color, the list of its neighbors' colors.
(let ((Connecticut (list 'Connecticut ct (list ri ma)))
(Maine (list 'Maine me (list nh)))
(Massachuesetts (list 'Massachuesetts ma (list ct ri vt nh)))
(NewHampshire (list 'NewHampshire nh (list ma vt me)))
(RhodeIsland (list 'RhodeIsland ri (list ct ma)))
(Vermont (list 'Vermont vt (list ma nh))))

(let ((states (list Connecticut Maine Massachuesetts
NewHampshire RhodeIsland Vermont)))

;; A state's color can't be the same as any of its neighbors.
(for-each (lambda (c)
(assert (not (memq (cadr c)
(caddr c)))))
states)

;; Output the color assignment.
(for-each (lambda (c)
(display (car c))
(display " ")
(display (cadr c))
(newline))
states)))))

(color-new-england) ; run the function to find an answer.

At the top, one notes that guile is invoked and runs the file. The last line calls the function just defined, and so produces the answer. Defining the function is not enough. This example also shows a Scheme convention used by Guile and the emacs editor scheme support module. Left aligned comments use three semi-colons. Comments that are indented with the code use two, and end-of-line comments use just one.

The load syntax is common to Scheme.

The definition of choose-color is very odd. It says that it is just amb with some colors. Remember that amb is a macro and expands, and move on for now. choose-color is then used in the assignment for abbreviations of the states. This is how the work gets done. Yet, we haven't told it what the rules are yet.

The adjacency list has longer names that eventually are printed, the abbreviation and a list of neighboring states. I may not have mentioned it. Lisp stands for LISt Processing. Scheme, a dialect of Lisp also does lists. A zillion operations are available for goofing around with lists. Scheme is good at lists.

Next comes a list of states, for convenience.

Then, a for-each asserts that the color of one state is not the color of a neighboring state in it's list. The cadr of a list is the second element, which is the state abbreviation. The caddr of the list is the third element, which in this case is a list - the list of neighboring states. The memq function returns true if the given element can be found in the given list. We never want to find our color in the list, so not is used.

Finally, the answers are printed in another loop. It takes a bit of work to see that both the state name and the state's assigned color are printed. You may be hard pressed to find an assignment to a list with the state and the color. My best guess is that the state is a function that returns said list. I'm pretty sure i haven't gotten to the bottom of this mystery.

This example does not take much time to run. But i goofed around with it to try to discover if it, for example, can easily be extended to all 48 contiguous states and the District of Columbia. It can't. At least, not without computers many orders of magnitude faster than the ones we have today, or lifetimes much longer than we enjoy, or both. To date, i've been able to color 17 states with this code. And that's pushing it. This decision tree has states with 1, 2, 2, 3, 3, and 4 neighbors. So the tree has 1 * 2 * 2 * 3 * 3 * 4 = 144 branches. If the average neighbors is 3, with 6 states, the compute time is roughly proportional to about 3^6 or 729. But with 15 states, the time is proportional to 3^15 = 14,348,907. So if this example takes a second, 15 states might take five hours. (Or it might find a solution on the first attempt.) But all 48 lower states plus the District of Columbia could take 10,401,835,608,364 years. That's approximately 759 times the current age of the Universe.

Yet map makers have been coloring non-trivial maps for a long time. I have a colored globe of the Earth, including the lower contiguous 48 states with the District of Columbia. How could they possibly do so by hand if the computer, executing billions of instructions per second, can't have yet finished? The answer is that this algorithm performs a depth first search of the solution space. The only optimization is that if it finds an answer, it stops and returns that answer. This is an optimization when one considers that this means that the entire decision tree may not be traversed. Some parts of it are skipped. There may be many answers. If so, it may find one sooner, on average, than searching half of the tree. But it isn't clear that if it assigns red to Maine and notices once that assigning red to New Hampshire that it won't try assigning these colors to this pair later. There are still larger parts of the tree that don't need to be searched. This program doesn't appear to know how to skip them.

One optimization that i've tried is changing the order in which states are assigned. For example, sorting New England states alphabetically is some seven times faster than the order i'd started with. See, i grew up in Massachuesetts, so i started there and branched out. But this algorithm works better if the states with few connections are listed first, and Massachuesetts is adjacent to the most states in New England. The reason the alphabetical order is faster is that it doesn't happen to need to try as many guesses before it stumbles on a correct solution.

If i were to do this by hand, I'd assign red to Massachuesetts. Connecticut could be green. Rhode Island could be blue. That takes care of the southern chunk of states. Vermont could be green, and New Hampshire could be blue. Maine could then be red. By taking this order, i didn't have to backtrack any guesses. That is, after accounting for the states that had colors chosen, and picking a color that wasn't already used, i was done with that state, and didn't need to go back. That's because i picked a color for the most constrained state, Massachuesetts, first. I didn't even need white. Indeed, at least under Guile, this script doesn't use white either.

I have to imagine that all states start out as red, and then get changed when it turns out that it doesn't work. I've no idea which states get changed when. If i come up with a rule which might reduce the size of the search tree, i've no idea how to tell this program this rule. The scheme tutorials suggest that it may be more difficult to write such a program in C. I've written this in C and was able to put in hints. And the result is a program that is a little longer, but generates colors for all 48 states and the District of Columbia in under a second. That's not because C is so fast, because even at a thousand times times faster, this should take the age of the Universe to finish. It's because my C version does less work.

When there are four choices, it can save alot of time if the most promising branch can be taken first. Just sorting the options can be a win. These experiments were done by trial and error, and statically, in the Scheme version, using small numbers of states, tuned so that it would take some time, but not too much.

Another issue is equivalent results. Starting with a red Massachuesetts means that i should never have to change it. Starting with a green Massachuesetts would be equivalent. The other colors would have to change, but it would be a simple substitution, red is green and green is red, will be fine. It would be nice to be able to avoid running down these redundant branches of the tree.

So, at the moment, i'm forced to the conclusion, that while this is an interesting technique, it's only viable for essentially trivial problems. This is a technique that only a mathemetician could love. Mathemeticians are notorious for wanting to be able to state that a solution exists, without having any care to actually obtain some solution. A non-trivial practical problem is coming up with an optimal schedule for 35,000 students, each taking 12 of 200 different courses on a campus with 300 rooms of various sizes and 250 professors. Brute force isn't going to work here.

Perhaps by learning the details of call/cc, expanding the amb macro to see the resulting code and examining how it really works, something might appear that will suggest optimization options. That would be magic beyond the Ordinary Wizarding Level.

2 comments:

dqkennard said...

There may be useful coloring algorithms at http://en.wikipedia.org/wiki/Graph_coloring

Stephen said...

Wow.

I thought i'd managed to drop my entire audience... i'll have to try harder.

Maybe commenting on the Wiki article will do it.