Wednesday, February 06, 2008

Scheme For Enlightenment Part Twelve - Quicker Colors

We last left our hero coloring states on a map. This time, a new version is written in C, but not a translation of the Scheme code. The Scheme version for the entire lower US is 150 lines of code. Much of this is representations of the states and which is near which. The C version for the same lower states is 187 lines long. A similar amount is the data, with the remaining extra code, some 42 lines, comprising more logic. But this isn't terribly fair to C. In Scheme, the data and the program are much more mixed. The data is executed. States become functions, and so on. In the C version, the data and the data structures are just there. And it's also not fair to Scheme. The C version has sort of two copies of the states, rather than three. But the new C version has the feature that it finishes exucution in a reasonable amount of time. Less than a second. Here it is.

#include
/* Color the New England states in 4 colors. */

#ifndef TRUE
#define TRUE 1
#endif
#ifndef FALSE
#define FALSE 0
#endif

char *colors[] = {
#define UNDEF -1
"red",
#define RED 0
"green",
#define GREEN 1
"blue",
#define BLUE 2
"white",
#define WHITE 3
NULL
};

#define NEIGHBORS 8 /* Max neighboring states. */
struct state {
int color;
int neighbors[NEIGHBORS];
char *name;
};

#define UD -1 /* UNDEF */

/* Adjacency is only checked once.
* So MA is next to CT, but CT doesn't also check for MA.
* Later states refer to earlier states only.
*/
struct state states[] = {
#define AL 0
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Alabama" },
#define AZ 1
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Arizona" },
#define AR 2
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Arkansas" },
#define CA 3
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "California" },
#define CO 4
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Colorado" },
#define CT 5
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Connecticut" },
#define DE 6
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Delaware" },
#define DC 7
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "DistrictOfColumbia" },
#define FL 8
{ UNDEF, { AL, UD, UD, UD, UD, UD, UD, UD }, "Florida" },
#define GA 9
{ UNDEF, { FL, AL, UD, UD, UD, UD, UD, UD }, "Georgia" },
#define ID 10
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Idaho" },
#define IL 11
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Illinois" },
#define IN 12
{ UNDEF, { IL, UD, UD, UD, UD, UD, UD, UD }, "Indiana" },
#define IA 13
{ UNDEF, { IL, UD, UD, UD, UD, UD, UD, UD }, "Iowa" },
#define KS 14
{ UNDEF, { CO, UD, UD, UD, UD, UD, UD, UD }, "Kansas" },
#define KY 15
{ UNDEF, { IL, IN, UD, UD, UD, UD, UD, UD }, "Kentucky" },
#define LA 16
{ UNDEF, { AR, UD, UD, UD, UD, UD, UD, UD }, "Louisiana" },
#define ME 17
{ UNDEF, { UD, UD, UD, UD, UD, UD, UD, UD }, "Maine" },
#define MD 18
{ UNDEF, { DC, DE, UD, UD, UD, UD, UD, UD }, "Maryland" },
#define MA 19
{ UNDEF, { CT, UD, UD, UD, UD, UD, UD, UD }, "Massachusetts" },
#define MI 20
{ UNDEF, { IN, IL, UD, UD, UD, UD, UD, UD }, "Michigan" },
#define MN 21
{ UNDEF, { IA, UD, UD, UD, UD, UD, UD, UD }, "Minnesota" },
#define MS 22
{ UNDEF, { LA, AR, AL, FL, UD, UD, UD, UD }, "Mississippi" },
#define MO 23
{ UNDEF, { AR, KS, IA, IL, KY, UD, UD, UD }, "Missouri" },
#define MT 24
{ UNDEF, { ID, UD, UD, UD, UD, UD, UD, UD }, "Montana" },
#define NE 25
{ UNDEF, { KS, CO, IA, MO, UD, UD, UD, UD }, "Nebraska" },
#define NV 26
{ UNDEF, { CA, ID, AZ, UD, UD, UD, UD, UD }, "Nevada" },
#define NH 27
{ UNDEF, { MA, ME, UD, UD, UD, UD, UD, UD }, "NewHampshire" },
#define NJ 28
{ UNDEF, { DE, UD, UD, UD, UD, UD, UD, UD }, "NewJersey" },
#define NM 29
{ UNDEF, { AZ, CO, UD, UD, UD, UD, UD, UD }, "NewMexico" },
#define NY 30
{ UNDEF, { CT, MA, NJ, UD, UD, UD, UD, UD }, "NewYork" },
#define NC 31
{ UNDEF, { GA, UD, UD, UD, UD, UD, UD, UD }, "NorthCarolina" },
#define ND 32
{ UNDEF, { MN, MT, UD, UD, UD, UD, UD, UD }, "NorthDakota" },
#define OH 33
{ UNDEF, { KY, IN, MI, UD, UD, UD, UD, UD }, "Ohio" },
#define OK 34
{ UNDEF, { NM, CO, KS, MO, AR, UD, UD, UD }, "Oklahoma" },
#define ORG 35
{ UNDEF, { CA, ID, NV, UD, UD, UD, UD, UD }, "Oregon" },
#define PA 36
{ UNDEF, { DE, OH, NY, NJ, UD, UD, UD, UD }, "Pennsylvania" },
#define RI 37
{ UNDEF, { CT, MA, UD, UD, UD, UD, UD, UD }, "RhodeIsland" },
#define SC 38
{ UNDEF, { GA, NC, UD, UD, UD, UD, UD, UD }, "SouthCarolina" },
#define SD 39
{ UNDEF, { ND, MN, IA, NE, MT, UD, UD, UD }, "SouthDakota" },
#define TN 40
{ UNDEF, { NC, GA, AL, MS, AR, MO, KY, UD }, "Tennessee" },
#define TX 41
{ UNDEF, { NM, OK, AR, LA, UD, UD, UD, UD }, "Texas" },
#define UT 42
{ UNDEF, { AZ, NV, ID, CO, UD, UD, UD, UD }, "Utah" },
#define VT 43
{ UNDEF, { MA, NH, NY, UD, UD, UD, UD, UD }, "Vermont" },
#define VA 44
{ UNDEF, { NC, TN, KY, DE, DC, MD, UD, UD }, "Virginia" },
#define WA 45
{ UNDEF, { ID, UD, UD, UD, UD, UD, UD, UD }, "Washington" },
#define WV 46
{ UNDEF, { VA, KY, OH, PA, MD, UD, UD, UD }, "WestVirginia" },
#define WI 47
{ UNDEF, { MI, IL, IA, MN, UD, UD, UD, UD }, "Wisconsin" },
#define WY 48
{ UNDEF, { CO, UT, ID, MT, ND, SD, NE, UD }, "Wyoming" },
};

/* Does this state have any adjacent states of the same color? */
int
adjstate(int st)
{
int adj;

for (adj = 0; adj < NEIGHBORS; adj++) {
if (states[st].neighbors[adj] != UNDEF &&
states[st].color == states[states[st].neighbors[adj]].color) {
return TRUE;
}
}
return FALSE;
}

/* Return TRUE if found a color that works. */
int check_states(int st)
{
int color;

/* Start color where you left off. UNDEF + 1 => RED. */
for (color = states[st].color + 1; color <= WHITE; color++) {
states[st].color = color;
if (!adjstate(st)) {
return TRUE;
}
}
return FALSE;
}

int main()
{
int st;
long rework = 0; /* stats: Rework=158727 */

st = 0;
while (st <= WY) {
if (!check_states(st)) { /* If the state doesn't work */
states[st].color = UNDEF; /* revert */
st--; /* Try the next state again. */
rework++;
} else {
st++;
}
}
/* Print the answers. */
for (st = 0; st <= WY; st++) {
printf("%s = %s\n", states[st].name, colors[states[st].color]);
}
printf("Rework=%ld\n", rework);
return 0;
}

There are some warts i've left in. It didn't turn out that i needed defines for the color table indexes. All i needed to know was the total number of colors, which i could have gotten from sizeof. This would allow the number of colors to be changed, but also would shorten the program by a few lines. In the structure for the states, i could have used a variable length neighboring state list. This would likely increase the number of lines of code a little, but i wouldn't have to know the length of the largest list before starting. In fact, the longest list has 7, not 8 entries. I didn't bother to trim the list. I might decide to add Canada's provinces, for example. That could possibly increase the longest list. The adjstate function is only used once, and can be inserted where it is called. That would also shorten the program. But breaking it out as a function no longer implies a speed penalty, as the compiler can be asked to do this inlining automatically. And having it be a function essentially documents what that chunk of code does.

One difference in the data between Scheme and C is the list of adjacent states. If New Hampshire is next to Vermont, then Vermont is next to New Hampshire. So this relationship only needs to be spelled out once. If it is spelled out twice, it will be checked twice. If it is checked twice, then perhaps the decision tree is larger. In the C version, the state indexes, ME is Maine, and so on, are not defined until the state is defined. That means that no state can refer to adjacent states that come later in the list. That means that i was able to use the C compiler to reduce the list. The original list was the redundant one that i crafted from a map for Scheme. But since TN for Tennessee wasn't defined, the compiler complained when i attempted to use it as an adjacent state to Alabama. No problem. Just change the entry to UD (Undef), and move on. So, you'll see many more adjacent states at the end of the list. For example, Tennessee includes AL for Alabama when it shows up later.

The check_states function assigns a color to a state, and checks to see if any neighbors have that color. The color assigned is the next color available. If there was no color, it starts at the begining of the color list. But it potentially checks all colors. If it finds a color that none of the neighbors have, it stops and returns TRUE. If no colors work for that state, it returns FALSE.

The real difference in code between the Scheme version and the C version is in main. The while loop really traverses the decision tree. And it does it by pruning the tree as it goes. It considers one state at a time, starting with exactly one state. The while loop in main does this wierd walk up and down the list of states. The list starts with all state colors assigned UNDEF. And the first state that is checked is passed to check_states, which assigns it red, checks to see that this works, and returns TRUE. Even in Alabama had any adjacent states listed, these states would all have their color listed as UNDEF, so TRUE would be returned. Then it checks the next state in the list. If it gets to a state that doesn't work, that is none of the possible colors work, it sets that state's color to UNDEF and backs up to reconsider the previous state. The previous state's color is advanced to the next color, since the color it used really doesn't work when future states are considered. And the checking goes on from there. Now it may be that no other colors work for this state, and it will back up some more. But note that when the program needs to back up, it ends up skipping the consideration of the rest of the tree. All the remaining state's combinations and permutations of colors are skipped. This is a real time savings. Because instead of processing something like 328,256,967,394,537,077,627 nodes of decision tree, only 158,727 states were reconsidered. If these reconsiderations all needed 4 colors considered, then 634,908 state colors would be recomputed. I'm sure it was less than that. But in any case, it is much smaller. Actually, map makers must have other tricks (heuristics), because doing anything like 158,727 decisions to make a map by hand would take a long time.

This algorithm isn't taken from the Wikipedia article. That would be, uhm, math. It's adapted from the eight queens problem. That problem is to place eight queens on a chess board so that non of them are attacking any others. Since queens attack in columns, one can't have two queens in the same column, so a queen's position can be represented as her height in her own column as a number. Since there are eight columns of eight, then there are 8^8 = 16,777,216 positions to consider. One could just count in octal and check each number. On modern machines, this may be practical. But if instead, one advances just each queen to the next good move, most of the 16 meganodes don't have to be checked. This problem required less than a tenth of a second on a 1980 PDP-10. Modern desktops are about 10,000 times faster.

That this C version can be written in Scheme is clear. All magic that is used is available in Scheme. As a novice to Scheme, writing more programs can only be instructive. It wouldn't be much of a benchmark, because no matter how fast or slow the Scheme version is, the C version does not take long enough to get an accurate timing. But perhaps a sufficiently slow machine could be used. I have a 486/33... but even then it would only benchmark languages, not hardware/software systems.

No comments: