## Thursday, February 21, 2013

### Eight Queens part one


If you get out a chess board, and put a queen in the lower left square, it's pretty clear that you can't put a second queen in the lowest row of the second column. That's the same row that the first queen is in. So there are really only seven choices available for the second queen. And the third queen only has six choices. It follows that there are 8 * 7 * 6 ... or 8! = 40,320 possible board positions to check for solutions. So combining the not-in-the-same-column rule with the not-in-the-same-row rule reduces the number of board positions to check by a factor of (8^8)/8! = 416.
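Those figures are easy to check with a few lines of Perl. This is just the arithmetic from the paragraph above, not part of the solver:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# 8^8 positions when only the one-queen-per-column rule is applied.
my $columns_only = 8 ** 8;

# 8! positions once the one-queen-per-row rule is added (permutations).
my $permutations = 1;
$permutations *= $_ for (1 .. 8);

printf "8^8 = %d, 8! = %d, reduction factor = %d\n",
    $columns_only, $permutations, $columns_only / $permutations;
```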

There are algorithms that turn a counting integer into a unique permutation of a list. One could use that sort of thing here. There are three reasons for not moving forward in this direction. First, the permutation algorithm is a bit complicated; it's nearly as long as the current program. Second, this permutation algorithm is fairly slow, though not anywhere near a factor of 416 slow, and its cost per call is proportional to the number of items to permute, in this case eight. Third, changing an exponential search space (8^8) into a factorial one (8!) still leaves a running time that explodes: if the board size is increased, the time to solve still grows at least exponentially with the board size. The permutation algorithm is handy, so perhaps it'll get a write up in some other series. This series has another direction, and if this code is converted to counting permutations, then it will be difficult or impossible to go where we're going.
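For the curious, here is roughly what such an index-to-permutation routine looks like. This is my sketch using the factorial number system, not the specific algorithm alluded to above; the name nth_permutation is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a counting integer (0 .. n!-1) to a unique permutation of a list.
# Each step divides by (remaining items - 1)! to pick which item comes next.
sub nth_permutation {
    my ($n, @items) = @_;
    my @perm;
    while (@items) {
        my $f = 1;
        $f *= $_ for (1 .. $#items);    # (number of remaining items - 1)!
        my $i = int($n / $f);           # index of the next item to take
        $n %= $f;
        push @perm, splice(@items, $i, 1);
    }
    return @perm;
}

print join("", nth_permutation(0, 0 .. 7)), "\n";      # first permutation: 01234567
print join("", nth_permutation(40319, 0 .. 7)), "\n";  # last of 8!: 76543210
```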


To make use of Backtracking, you need to be able to generate a first board, generate a next board (and know when there aren't any more), check a partial solution, and print or otherwise identify the solutions you find.

Let's see how it works. The first board routine needs to simply place the first queen. In the check board routine, there's only one queen, so there are no attacks. The next board code knows that there is a possible solution, so it puts a new queen on the board in the next column. From here on, it's only check board and next board. The check board function needs to know how many queens are on the board so it can check rows and diagonals for attacks. In this case there are two queens, and they're in the same row, attacking. The next board function needs to know if the check board found an attack or not. Since there's an attack, it needs to move the rightmost queen up a row. When it finds a queen is at the top already, then it is removed, and the previous queen needs to be moved up. This is Backtracking.

For the Eight Queens problem, the initial solution space was so large that it seemed hopeless. But it turns out that Backtracking is an approach that can be used to attack many of these combinatorial problems. The downside of Backtracking seems to be that it isn't easy to predict how good the performance will be without first trying it. In this case, as queens further to the right of the board are considered, smaller chunks of the problem space are eliminated. With increasing board size, does the problem stay exponential? Or is it something else? Knuth wrote about this issue in 1975. From my perspective, it doesn't matter. That's because Knuth is primarily thinking about the problem as a mathematician. Mathematicians want to think about a problem and come up with a solution. Programmers have another tool. They can write yet another program and see how it works. If the program takes too long, the programmer can tell the computer to count something of interest to help the analysis. Often this shows the programmer where to attack the problem next.

Don't agree with me? Here's an analogy. My favorite sorting algorithm is Shell's Sort. Exactly how long it should take is still an open question. One advantage it has over many other algorithms is that it sorts in place, requiring no extra memory. It usually performs similarly to n*log(n) algorithms. In practice, it's often quicker than Quicksort, which averages n*log(n) time but requires extra space. It doesn't bother me that its computation time is an open question.
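Here's a minimal Shell's Sort in Perl, using the simple halve-the-gap sequence. Better gap sequences exist; this is just to show the shape of the algorithm:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shell's Sort: insertion sort over elements $gap apart, with the gap
# shrinking by halves until it reaches one.
sub shell_sort {
    my ($a) = @_;                   # array reference, sorted in place
    my $n = scalar @$a;
    for (my $gap = int($n / 2); $gap > 0; $gap = int($gap / 2)) {
        for (my $i = $gap; $i < $n; $i++) {
            my $tmp = $a->[$i];
            my $j = $i;
            # Shift larger gap-neighbors right until $tmp fits.
            while ($j >= $gap && $a->[$j - $gap] > $tmp) {
                $a->[$j] = $a->[$j - $gap];
                $j -= $gap;
            }
            $a->[$j] = $tmp;
        }
    }
}

my @data = (62, 83, 18, 53, 7, 17, 95, 86, 47, 69);
shell_sort(\@data);
print "@data\n";  # 7 17 18 47 53 62 69 83 86 95
```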

In any case, let's look at a modification to the 8^8 version of the Perl program that does Backtracking. It's instrumented to find out how many rows needed to be checked (which is also the number of next boards) and how many diagonals were checked. One should note that the board checks are faster than in the previous version. That's because instead of checking all eight queens against all eight queens, it only has to compare the most recent queen to the other queens; the previous queens have already been checked, and none are attacking. Since only 15,720 boards were checked, that's less than one in a thousand compared to the 8^8 solution. Fewer than one in three thousand diagonals needed to be checked. What does that mean for the order of computation? Not much. This code is written for an 8x8 board. It could be generalized to nxn, and then run, and then we'd find out... something. Since the checks are proportional to the size of the board, we'd expect time to rise faster than board size. However, the chunks that would be eliminated would be a larger proportion for larger boards. So, who knows? I'd be willing to bet that someone has written this up somewhere.

```
#!/usr/bin/perl

# Eight queens - incremental backtracking version.
# Chess board where eight queens are not attacking each other.
# Can't have two queens in the same column, so represent row numbers in columns.
# There are 8^8 (1e7) positions to check.
# The check for one queen per row gives you unique digits.
# If there's an attack between queens to the left, there's no
# need to keep checking to the right.
# This reduces the problem further.
# 92 winning boards. 15720 rows checked, 5508 diags checked.
# 0 seconds - likely less than 0.05 seconds.

# main
my $col = 0;  # current column under consideration
my $x;
my @b;        # board, 0 - 7 are columns, values are positions, 0-7
my $cnt = 0;
my $chkh = 0; # horizontal checks
my $chkd = 0; # diagonal checks

# set board to queens at bottom.
for ($x = 0; $x < 8; $x++) {
    $b[$x] = 0;
}
while (1) { # The big loop, count to 8^8-1
    # check for a win
    if (&chkwin($col)) {
        if ($col != 7) {
            $col++; # Don't increment board, as new col is zero.
        } else {
            $cnt++;
            # print "win ";
            &prbrd();
            $col = &incbrd($col); # increment the board
            if ($col == -1) {
                print "$cnt winning boards.\n";
                exit(0);
            }
        }
    } else {
        $col = &incbrd($col); # Increment the board
        if ($col == -1) {
            print "$cnt winning boards. $chkh rows, $chkd diags.\n";
            exit(0);
        }
    }
}

sub incbrd { # Returns $col. -1 when done.
    my $col = shift;

    $b[$col]++;
    while ($b[$col] > 7) {
        if ($col != 0) {
            $b[$col] = 0;
            $col--;
            $b[$col]++;
        } else {
            return -1; # done
        }
    }
    return $col;
}

sub prbrd { # Print the board
    my $x;

    for ($x = 0; $x < 8; $x++) {
        print "$b[$x]";
    }
    print "\n";
    # sleep(1);
}

# Assumes partial incremental board.
# Only have to check the rightmost column.
sub chkwin {
    my $col = shift;
    my ($x, $y, $a);

    # Check for same row.
    $chkh++; # How many horizontal checks.
    for ($x = 0; $x < $col; $x++) {
        if ($b[$x] == $b[$col]) {
            return 0;
        }
    }
    # Check for diagonal.
    $chkd++; # How many diagonal checks.
    $a = 1;
    $y = $b[$col];
    for ($x = $col - 1; $x >= 0; $x--) {
        if (($y == $b[$x] - $a) ||
            ($y == $b[$x] + $a)) {
            return 0;
        }
        $a++;
    }
    return 1; # 1 is good, 0 is not a win.
}
0;
```

Could this code be made faster? Sure. For one thing, any solution that is mirrored top to bottom is also a solution. So the leftmost queen only needs to go from 1 through 4. The mirror solution can be created by subtracting the row number for each column from nine, so one becomes eight. It's an easy change. It would produce all the answers in a little over half the time. Perhaps tricks like that allowed someone to compute the number of distinct solutions for a 26x26 board. Just counting the solutions one at a time would take forever. One expects that they used a compiled language. That would improve performance by a factor of about three hundred. This code could be converted to C without changing much of the look. And it could be converted to run on multiple processors fairly easily.
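As a sketch of the mirror trick: with the 0 through 7 row numbering used by these programs, the mirror of row r is 7 - r (the subtract-from-nine description assumes rows numbered 1 through 8). The starting board below is one known 8-queens solution, written as row numbers by column:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mirror a board top to bottom: row r becomes row 7 - r in each column.
sub mirror {
    my @board = @_;                  # row number for each column, 0 - 7
    return map { 7 - $_ } @board;
}

my @solution = (0, 4, 7, 5, 2, 6, 1, 3);   # one known 8-queens solution
print join("", @solution), "\n";           # 04752613
print join("", mirror(@solution)), "\n";   # 73025164, also a solution
```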

The interested reader might note that the general algorithm for Backtracking is presented as recursive, and the Eight Queens code that Wirth presented is also recursive. My Perl solutions are not. My policy on recursion is to introduce it if it helps with dynamic storage. In this case, restoring the previous state is a simple matter of changing the value of the column under consideration (\$col). A further issue is parallelization. Multiple core processors are now common. We should be considering parallel computations. It looks pretty easy to start with my code and modify it for multiple processors. For example, it could be modified to consider the case where the first queen is set once, and not moved. All of the other boards with this first queen would be considered. Eight instances would be run, possibly at the same time. Perhaps something similar could be done with a recursive version, but it may not be so easy. It's something for future investigation.

This code was timed with a resolution of one second. Why work hard to reduce a fifty second program you need to run once to nearly zero? The answer is that these techniques are worth learning. There are other problems where even modern machines are hopelessly slow. We'll get to one of those soon.

## Wednesday, February 20, 2013

### Eight Queens part zero

How many queens can be put on a chess board where none of them are attacking any others?

A chess board has eight by eight squares. A queen attacks all squares in the same row or the same column, or on either of the two diagonals.

If one considers putting queens on a chess board at random, the maximum number of board positions to consider is 64! That is, one puts a queen on the board, and there are 64 choices for where to put her. For the second position, there are only 63 open squares left. So for two queens, 64 * 63 positions need to be considered. There are 64 squares, so the maximum number of queens is 64. The number of positions to check is 64 * 63 * 62... or a total of 64!. That's about 10^89 positions. All the computers on Earth could not check that many board positions in the current age of the Universe. The overwhelming majority of these board positions can be shown to have at least two queens attacking each other. And there are some simple ideas to eliminate whole chunks of these at a time. For example, here's one idea.

Since a queen attacks all the squares in the same column, one can't have two queens in the same column. Since there are only eight columns on a chess board, it's not possible to have more than eight queens on a chess board without any attacking any others. It can be ruled out. That doesn't mean that there are any board positions with eight queens. It only means there aren't any with nine or more. This is apparently obvious enough that the problem is called the Eight Queens problem.

This not-in-the-same-column rule means that the queen in the first column can be in any of eight positions. For each of those, the queen in the second column can be in any of eight positions, for eight squared combinations. It follows that the total number of board positions to check is eight to the eighth power (8^8), or a bit over 16 million board positions. This is a speed increase over our original idea by a factor of nearly 10^82.
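Both magnitudes can be checked with logarithms, which avoids needing big integer arithmetic. This is just verification of the figures above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# log10(64!) is the sum of log10(k) for k = 1 .. 64.
my $log_fact = 0;
$log_fact += log($_) / log(10) for (1 .. 64);

# log10(8^8) = 8 * log10(8).
my $log_pow = 8 * log(8) / log(10);

printf "64! is about 10^%.1f\n", $log_fact;                 # 10^89.1
printf "8^8 is about 10^%.1f\n", $log_pow;                  # 10^7.2
printf "speedup is about 10^%.1f\n", $log_fact - $log_pow;  # 10^81.9
```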

The Eight Queens problem is an example of a combinatorial problem. It is said to be NP-complete. For problems like these, it can seem that the entire solution space must be searched to find all the solutions. NP-complete problems strike fear into the hearts of computer science students. Having never been a computer science student, it's not much of a worry for me. However, my understanding of gravity is excellent, so acrophobia is available. It's always something.

While at school in 1979, i had a job sitting behind the I/O desk, answering students' computer questions. One day, an advanced computer science student asked me about the problem. The professor had likely read Dijkstra's 1972 article, and assigned it to a class. He prepared for the worst by implementing an 8^8 solution that attempted to get all solutions by brute force. After three days, it had covered a third of the solution space, and he killed it. The estimate was nine days for the full solution. This was on a PDP-10, designed in 1966, that performed at about a fifth of a MIPS.

Anyway, on a modern machine, 10^7 is not such a big number of things to do, so let's just plow into it. How should a computer program represent the board? It could have an eight by eight grid, with a one representing a queen and a zero representing an open space. Since each square holds either a zero or a one, each square needs only one bit, so 64 bits could be used.

A program with this representation needs code to generate a first board position, code to generate the next board position in a way that covers all possible board positions, code to check whether a board position is a solution, and code to print or otherwise note a solution. One can imagine each of these routines. The first board position could be all eight queens at the bottoms of their columns. The next board position could move the rightmost queen up a row. But if that moves the queen off the top of the board, it should move her down to the bottom of that column, and move the queen to the left up a row. When all the queens are at the top, it should report that all positions have been searched. The check for a win must check each queen's row for other queens, and each queen's diagonals for other queens. A board position print might print a space where there are zeros and a queen where there are ones, in an eight by eight square.

Here's another way to represent a board position. In each column, only one queen may be present. So each column can be represented by a single number. This number says what row that queen is in. There are only eight columns, so there are only eight numbers. Each column has eight rows. So each number can have one of eight values, so only three bits are needed. That's a total of 24 bits. In any case, you can think of each column as a digit in a number. As long as the digits are constrained to one of eight unique digits, you can count through all possible board positions. One of the nice features of this representation (compared with a bit per square) is that as you "count" through possible board positions, you don't have to check that two queens are in the same column. That is, the board representation can't violate this constraint, so there's no need to check.

The same routines are needed for the eight numbers as above. If each digit is one through eight, then the initial board could be all ones. The next board looks like counting by one. Checking for a queen in the same row is the same as checking the other queens for the same digit. Checking diagonals is a bit more complicated. Two columns that are next to each other attack along diagonals that are off by one row. If they're one column farther apart, then the diagonals are two rows different. Finally, though a full board can be printed, it's easy enough to simply print the row numbers for each column.

This 8^8 Perl program takes 50 seconds to find all solutions. That's because a modern desktop machine is at least 15,000 times faster than that PDP-10.

```
#!/usr/bin/perl
# Chess boards where 8 queens are not attacking each other.
# Can't have two queens in the same column.
# Represent row numbers in columns.
# There are 8^8 (1e7) positions to check.
# 50 seconds - 335544 board checks per second.

# main and global variables
my $col = 0; # current column under consideration
my $x;
my @b;       # board, 0 - 7 are columns, values 0 - 7 are positions
my $cnt = 0;

# Set board to queens at bottom.
for ($x = 0; $x < 8; $x++) {
    $b[$x] = 0;
}
while (1) {
    if (&chkwin()) { # check for a win
        $cnt++;
        &prbrd();
    }
    if (&incbrd() == -1) { # increment the board
        print "Done $cnt winning boards.\n";
        exit(0);
    }
}

sub incbrd { # Increment board.
    my $col = 7;

    $b[$col]++;
    while ($b[$col] > 7) {
        if ($col != 0) {
            $b[$col] = 0;
            $col--;
            $b[$col]++;
        } else {
            return -1; # Puzzle is done
        }
    }
    return $col;
}

sub chkwin { # Check a board for a win.
    my ($x, $y, $a);

    for ($x = 0; $x < 8; $x++) { # Check row
        for ($y = $x + 1; $y < 8; $y++) {
            if ($b[$x] == $b[$y]) {
                return 0; # not win
            }
        }
    }
    for ($x = 0; $x < 8; $x++) { # Check diagonals
        $a = 1;
        for ($y = $x + 1; $y < 8; $y++) {
            if (($b[$y] - $a == $b[$x]) ||
                ($b[$y] + $a == $b[$x])) {
                return 0; # not win
            }
            $a++;
        }
    }
    return 1; # win
}

sub prbrd { # Show the board.
    my $x;

    for ($x = 0; $x < 8; $x++) {
        print "$b[$x]";
    }
    print "\n";
}
0;
```

The Eight Queens wiki page tells us that the first solutions were found in 1850, about 96 years before the first fully functional computer, depending on who you talk to, and what your definition of fully functional computer is. If any solutions can be found by hand, there must be better ways to approach this problem.

My help to the student was likely limited to some hand waving about reducing the problem space, without any real direction. A couple days later, he showed me that he'd reduced the time required to less than a second. He even removed the print statements showing that most of that time was spent printing the answers. There are only 92 of them.

In the next part, the problem is approached using a better technique. It may seem pointless, since the most time that can be saved is about fifty seconds. But the general technique can be used on many such problems. And many of these problems are considerably more complicated. It's best to practice on easier problems first.

## Monday, February 18, 2013

### Crimson

It was mostly clear last night. I thought it might be fun to see if i could find R Leporis, also called Hind's Crimson Star. Though visible from my back yard, which has better stray light blocking, it did not appear that i could get my telescope's computer aligned in a spot where i could see it. So, i brought the scope to the front sidewalk (the Mercury Vapor Observatory). The MVO has had a much wider sky view than the jungle in the back yard ever since the Emerald Ash Borer took down the front yard's twin gorgeous ash trees. I've never actually seen one of these beasts.

The constellation Lepus (the hare) is below the feet of Orion. This part of the sky looks entirely star free at first glance. 10x50 binoculars show the stars of the constellation easily. But after a few minutes, there seemed to be enough stars visible to the naked eye to sort of fill it in. At the MVO, there really isn't anything like dark adaptation for the eyes. It's more like scanning the area, paying attention, and using averted vision techniques to bring out the dimmer stars. I should note that the constellation looks more like LEPUS on the Orion wiki page than on the R Leporis wiki page.

The ten inch scope cooled down, and i performed an alignment using Rigel and Polaris. Rigel was chosen because it's near the target. Polaris was chosen because it was handy. The computer reports how good the alignment was, and it was excellent. R Leporis is less than a degree above NGC 1710. NGC 1710 is a galaxy that is absolutely invisible from the Mercury Vapor Observatory, since it's not one of the three brightest galaxies in the sky. The idea was to get into the neighborhood. It took about a minute of searching to find it. My wife also got a look at it briefly, before clouds blocked it. It looks like a seriously red LED light. Orion's Betelgeuse is also a red supergiant. And it's nearby in the sky, so i took a look at it for comparison. While Betelgeuse is somewhat orangish naked eye, it looked positively white in the scope, though not white like Rigel. The difference for R Leporis is that the star pushes carbon into its atmosphere. This carbon blocks blue light from the otherwise red giant, making it look much, much redder.

R Leporis is fairly faint, at about 10th magnitude. This is not a problem at the MVO. This ten inch (254 mm) scope has a collecting area (with the secondary mirror subtracted) of 47,553 square millimeters. My eyes dilate to 6.5 mm, which works out to 33 square millimeters. So the scope brings 1,433 times more light into my eye. That works out to a light grasp improvement of about 7.9 magnitudes. I was seeing 4th magnitude stars naked eye, so 12th magnitude should have been reachable with the scope. In addition to that, even low magnification (48x) tends to dim extended objects like galaxies and the sky glow, but not point like objects like stars. So the contrast is much, much better. A tenth magnitude star is fairly bright in the scope. Unfortunately, I didn't get a chance to see if R Leporis was visible in the 9x50 finder scope. It wasn't obvious in the 10x50 binoculars, but i didn't yet know exactly where to look.

R Leporis is a variable star with a period of about 432 days (14 months). It most recently peaked near magnitude 6 in November 2012. So it should be faintest in June 2013, near magnitude 11. Here's the current AAVSO light curve. To search for the star, use R LEP on the AAVSO web site. R Leporis should be at peak brightness again in January 2014 (about 40 times brighter). And Orion, and therefore Lepus, will be well placed in the evening sky as well.

It was freezing cold, so i limited myself to a few other objects. Polaris and Rigel were alignment stars. Though i could have aligned using the finder scope's cross hairs, the main scope was used. Betelgeuse, Rigel and Sirius were used for comparison. Sirius was blinding. Jupiter was above the first quarter moon. The four Galilean moons were clearly visible in 10x50 binoculars. I didn't look at the Moon with optics, as the sky was very clear, and the moon filter was in the house.

## Thursday, February 14, 2013

### Forth for Enlightenment, part thirteen of ten, Primes too

In part nine we had an introduction to testing if a number is prime. A function to check for primality was presented. It's short and to the point. And, since it checks by dividing the number by all integers up to the square root of the number, the execution time is proportional to the square root of the number. Is there something that can be done to speed it up? Absolutely.

For one thing, every even number greater than two is divisible by two, so even trial divisors (other than two itself) never need to be checked. One can skip them by starting at three and stepping by two. That's a quick and easy change. One can apply the same logic and skip trial divisors that are themselves divisible by three. That's a little more complicated. The easiest approach appears to be to step by six and only check two of the numbers in each block of six; the other four are divisible by two or three. This code performs that sort of check. It tests two and three directly, then steps by sixes starting with five, checking x and x + 2.

```
«
  0 0 → n s a «
    IF n 2 < THEN
      1 'a' STO
    ELSE
      IF n 3 > THEN
        IF n 2 MOD 0 == THEN
          2 'a' STO
        ELSE
          IF n 3 MOD 0 == THEN
            3 'a' STO
          ELSE
            IF n 7 > THEN
              n √ FLOOR 's' STO
              5 s FOR x
                IF n x MOD 0 == THEN
                  x 'a' STO
                  s 'x' STO
                END
                IF n x 2 + MOD 0 == THEN
                  x 2 + 'a' STO
                  s 'x' STO
                END
              6 STEP
            END
          END
        END
      END
    END
    a
  »
» 'ISP2' STO
```
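For readers following along without an HP-28c, here is the same skip-by-six scheme sketched in Perl. This is my translation of the idea, not a literal port of ISP2; it returns a true/false flag rather than storing a result in a variable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trial division that handles 2 and 3 specially, then only tests
# divisors of the form 6k-1 and 6k+1 up to sqrt(n).
sub is_prime {
    my ($n) = @_;
    return 0 if $n < 2;
    return 1 if $n == 2 || $n == 3;
    return 0 if $n % 2 == 0 || $n % 3 == 0;
    for (my $x = 5; $x * $x <= $n; $x += 6) {
        return 0 if $n % $x == 0 || $n % ($x + 2) == 0;
    }
    return 1;
}

print join(" ", grep { is_prime($_) } 1 .. 30), "\n";  # 2 3 5 7 11 13 17 19 23 29
print is_prime(10000019) ? "prime\n" : "composite\n";  # prime
```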

Here are some timings to show the speed up. On the left is the number that is tested. Then, the number of seconds for ISP and ISP2 to complete the check. Note that all numbers tested are, in fact, primes. So this is sort of the worst case. There was no run of 10000019 for ISP.

| Number | ISP (seconds) | ISP2 (seconds) |
| --- | --- | --- |
| 10007 | 5 | 2.3 |
| 100003 | 15 | 5.4 |
| 1000003 | 47.3 | 15.5 |
| 10000019 | na | 47.7 |

In any case, it's pretty clear that as the number goes up, so does the time. Of note is that ISP2 performs about as well as ISP does on numbers that are ten times larger. That's because both algorithms have time that goes up with the square root of the number. The ISP2 implementation is faster because instead of checking every number, it checks two numbers out of every six. That's one third the number of divide checks. Since it does one third as much work per number, it should perform about as well on numbers 3 * 3 = 9 times larger. Nine is pretty similar to ten from a benchmark perspective.

This factor of three performance increase comes at a cost. For the HP-28c, this cost is size. While ISP only requires 82 bytes, ISP2 requires 443 bytes. Is that a big deal? Yes. For one thing, since this is a single big function, if you make a mistake and need to edit it, you get "No room to enter". You have to type the whole function in again from scratch.

Are there prime number test algorithms that do better than going up in time with the square root of the number? Yes. But they either require remembering stuff, or have more code, or both. The HP-28c simply doesn't have the memory to do a significant sieve or the code for the Miller-Rabin primality test.

## Wednesday, February 06, 2013

### Forth for Enlightenment, part twelve of ten, Four Digit Puzzle

This is another number based logic puzzle. It can be solved without a computer. It's not difficult. The mostly brute force coded solution is straightforward too. It's included because of the rewrites. Like the previous puzzle, individual digits need to be examined. An experiment for this code is to use strings to extract digits. First, the problem statement.

What is the four digit number in which the first digit is one-third the second, the third is the sum of the first and second, and the last is three times the second?

First, just a minor bit of analysis. We're looking for a four digit number. One imagines that four digit numbers have a range from 1000 to 9999. I mean, what's the difference between 0000 and 00000? Is the latter a five digit number? But 0000 is a solution, right? This program assumes 1000 to 9999, but to save time, it stops when it finds a solution. The code 9999 'X' STO sets the variable X to its final value, so that the FOR loop ends. Without this, the program takes well over an hour to execute, and finds only the one solution.

```«
1000 9999 FOR X
X →STR 1 1 SUB STR→ 'A' STO
X →STR 2 2 SUB STR→ 'B' STO
X →STR 3 3 SUB STR→ 'C' STO
X →STR 4 4 SUB STR→ 'D' STO
IF A 3 * B ==
A B + C == AND
B 3 * D == AND THEN
X
9999 'X' STO
END
NEXT
'A' PURGE 'B' PURGE 'C' PURGE 'D' PURGE
440 .1 BEEP
» 'PUZ4' STO
```

This code requires 307 bytes of memory, and takes 225 seconds to execute. It produces the right answer, 1349.

So, X →STR takes the number X, and converts it to a string. 1 1 SUB returns the substring that is the first character. Since more arithmetic needs to be done, STR→ converts it back to a number. Once each digit is isolated, it is assigned to a global variable. The number, X or ABCD, is a candidate solution. The IF statement figures out if ABCD is a solution or not.

The IF statement performs each of the tests. A 3 * B == multiplies A * 3, and compares it to B. A has to be an integer from 0 to 9. And so does B. One can multiply A by 3 or divide B by 3 and get the same result. The next bit, A B + C == AND adds A and B and compares the result to C. The AND bit combines the first two logic results. The result of the AND is true only if both of the conditions are true. Finally, B 3 * D == AND multiplies the second digit by 3 and compares that to the 4th digit. AND combines the result with the first two results.

The code 440 .1 BEEP makes the machine beep at the end. That makes it easier to time the execution with a stopwatch.
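For a cross-check of the answer, the same brute force search runs in a blink in Perl. This is not one of the HP-28c variants, just verification:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Brute force the four digit puzzle: first digit is one third the
# second, third is the sum of the first two, last is three times
# the second.
for my $x (1000 .. 9999) {
    my ($a, $b, $c, $d) = split //, $x;
    if ($a * 3 == $b && $a + $b == $c && $b * 3 == $d) {
        print "$x\n";  # 1349
        last;          # stop at the first solution, like the RPL version
    }
}
```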

The next idea explored is saving X →STR on the stack, instead of recalling X and converting it to a string four times. The code X →STR DUP DUP2 puts four copies of X converted to a string onto the stack. One copy is consumed in each of the following lines.

```«
1000 9999 FOR X
X →STR DUP DUP2
1 1 SUB STR→ 'A' STO
2 2 SUB STR→ 'B' STO
3 3 SUB STR→ 'C' STO
4 4 SUB STR→ 'D' STO
IF A 3 * B ==
A B + C == AND
B 3 * D == AND THEN
X
9999 'X' STO
END
NEXT
'A' PURGE 'B' PURGE 'C' PURGE 'D' PURGE
440 .1 BEEP
» 'PUZ4' STO
```

This version is smaller, using only 291 bytes. It's also faster, taking 190 seconds.

Using strings to extract digits was an experiment. We had a numeric digit extractor called G in our previous number puzzle. Here it is.

```«
DUP 10 / FLOOR SWAP 10 MOD
» 'G' STO
```

The G function takes a number, and produces the number divided by ten and the remainder. This peels off the least significant digit. So the rightmost digit, D, is obtained first. G is then called with the new smaller number to peel off the next digit. A fourth call isn't needed, since we know the number had four digits: the third call produces the remaining first two digits at once. The program that uses it looks like this:

```«
1000 9999 FOR X
X G 'D' STO
G 'C' STO
G 'B' STO
'A' STO
IF A 3 * B ==
A B + C == AND
B 3 * D == AND THEN
X
9999 'X' STO
END
NEXT
'A' PURGE 'B' PURGE 'C' PURGE 'D' PURGE
440 .1 BEEP
» 'PUZ4' STO
```

It's slightly bigger than our optimized string version at 306 bytes of memory. But it's faster, taking only 103 seconds. The substring idea looked like it might be quicker, but it wasn't.
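The same peel-a-digit idea looks like this in Perl (peel is a made-up name mirroring G):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Perl analog of G: split a number into everything but the last
# digit, and the last digit itself.
sub peel {
    my ($n) = @_;
    return (int($n / 10), $n % 10);
}

my $n = 1349;
(my $rest, my $d) = peel($n);   # 134, 9
($rest, my $c) = peel($rest);   # 13, 4
($rest, my $b) = peel($rest);   # 1, 3
my $a = $rest;                  # 1
print "$a$b$c$d\n";             # 1349
```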

One thing of note is the PURGE commands towards the end. Global variables are convenient, but they've got to be cleaned up at the end, otherwise, they stay around for the user to clean up. This isn't much of a big deal on the HP-28c, but on the HP-28s, there's much more memory, and more than one program might be loaded at a time. These variables would add clutter everywhere. Local variables get cleaned up automatically. And we learned how to use them in a recent post. However, we need to change the G function to produce the digits on the stack all at once. That is, it leaves the shortened number on the stack where it can be used to produce the next digit. All the digits produced stay on the stack until converted to local variables.

```«
DUP 10 MOD SWAP 10 / FLOOR
» 'G' STO
```

Then the program follows.

```«
1000 9999 FOR X
X G G G
→ D C B A «
IF A 3 * B ==
A B + C == AND
B 3 * D == AND THEN
X
9999 'X' STO
END
»
NEXT
440 .1 BEEP
» 'PUZ4' STO
```

This local variable version is shorter, using 266 bytes and faster, taking 69 seconds. I'd have thought this version would be slower, since variables are created and destroyed every loop. So, not only is this version smaller and faster, but it looks easier to read to me.

The code → D C B A « takes the four digits that are on the stack, assigns them to local variables D, C, B and A. Local variable blocks can go anywhere.

Curiously, in RPL, variables of any kind must be given an initial value. It's impossible to have a variable that isn't initialized. Using a variable that doesn't yet have a value is simply not a bug you can code in this language. Fortunately, there are lots of other kinds of bugs you can code, more than making up for this deficiency.

The next idea is to try nested IF statements. Nested IFs are logically the same as the AND statements. They should be quicker since the tests can stop as soon as one is false. The C language allows shortcut logicals where additional conditionals are not evaluated. RPL on the HP-28c does not.

```«
1000 9999 FOR X
X G G G
→ D C B A «
IF A 3 * B == THEN
IF A B + C == THEN
IF B 3 * D == THEN
X
9999 'X' STO
END
END
END
»
NEXT
440 .1 BEEP
» 'PUZ4' STO
```

This version is larger at 296 bytes, but it's also faster at 54 seconds. Is it easier to read? Experimentation with the order of the IF statements did not yield a change in speed. One might expect that putting the condition that is false most often first would be faster, as less code is executed on average.

Finally, we'll inline the G function. The HP-28c's RPL encourages using and calling functions, but it takes time to call a function and return. The resulting code is bigger since the G function is 49 bytes by itself, and this version has three copies of it. It's not necessarily easier to read, but it is quicker.

```«
1000 9999 FOR X
X
DUP 10 MOD SWAP 10 / FLOOR
DUP 10 MOD SWAP 10 / FLOOR
DUP 10 MOD SWAP 10 / FLOOR
→ D C B A «
IF A 3 * B == THEN
IF A B + C == THEN
IF B 3 * D == THEN
X
9999 'X' STO
END
END
END
»
NEXT
440 .1 BEEP
» 'PUZ4' STO
```

This version requires 334 bytes, but comes in at 47 seconds. That's about 4.8 times faster than the first version. Is that as fast as it can be done? Of course not. For one thing, the problem can be solved by hand without a program, and it doesn't take very long. This exercise is about optimization.

In this case, we didn't know the relative speeds of the different operations before we started, so a black-box, trial-and-error approach was used to discover what turns out to be fast or small. For RPL, this is what it takes. But it often takes something like it in other languages. You might think that instruction timing in assembly language would be well known. But most modern processors can execute instructions while waiting on external memory, or even execute several instructions simultaneously. And often, one needs to sign a non-disclosure agreement to get the documentation. If that's not enough, cache misses and TLB thrashing at various levels can toss your best estimates out the window. Trial and error has the advantage of trial, which is sort of the final arbiter anyway. So make your best guess, and ask the universe if you're right.

## Friday, February 01, 2013

### Forth for Enlightenment, part eleven of ten, Number Problem Nine


I came across this logic problem. It uses numbers. I was able to solve it without the aid of any kind of computer. And there are computers that are way faster and easier to use than my 1986 vintage HP-28c programmable calculator. But it looked as if a program to solve this just might fit on the beast, if barely. Also, it looked as if the run time to reach a solution might be less than hours. This machine isn't speedy. The processor clock runs at 640 KHz, while modern desktop processors are measured in GHz. There are 1,000 KHz in a MHz, and 1,000 MHz in a GHz. Call it a challenge.

The following multiplication example uses all the digits from 0 to 9 once and once only (not counting the intermediate steps). Finish the problem. One digit has been filled in to get you started.

abc * d5 = efghi

Before we get started on the RPL program, let's do some minor analysis of the problem. The least significant digit of the product, which comes from multiplying the 5 by c, must be either 0 or 5. But if it's 5, then it duplicates the given 5. So digit "i" must be 0. That in turn means that 'c' must be even, since 5 times a digit ends in 0 only when the digit is even. In fact, this program uses 'E' for Even instead of 'c'. The program doesn't have to guess either the 5 or the 0. Instead of a ten-digit problem, it's an eight-digit problem.

ABE * D5 = STUV0
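The last-digit argument is easy to check mechanically: 5 times a digit ends in 0 exactly when the digit is even, and ends in 5 otherwise. A quick Python confirmation (mine, added for illustration):

```python
for c in range(10):
    last = (c * 5) % 10                  # last digit of the c * 5 partial product
    assert last in (0, 5)                # only two possibilities
    assert (last == 0) == (c % 2 == 0)   # it's 0 exactly when c is even
```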

The program could cycle from 0 to 99999999. However, an empty loop that long takes about 8.8 days to execute. Since digits are non-repeating, and 0 isn't needed, the loop over the digits ABEDSTUV can start at 12346789 and run through 98764321. That brings it down to 7.6 days. But the number of permutations of 8 distinct digits is only 8! = 40,320. That's a loop that takes five minutes. And even there, the program really only has to generate the permutations of the digits A, B, E and D from 1 through 9 excluding 5, multiply ABE * D5, and check that STUV also avoids 0, 5 and the digits A, B, E and D. And that's how this program does it.
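As a sanity check of that strategy (and nothing the HP-28c could run), the whole search fits in a few lines of Python using itertools permutations. This is my sketch, not the author's program:

```python
from itertools import permutations

def solve():
    # Digits 0 and 5 are already placed, so permute the other eight,
    # four at a time, as A, B, E, D.
    solutions = []
    for a, b, e, d in permutations([1, 2, 3, 4, 6, 7, 8, 9], 4):
        if e % 2:                      # E must be even
            continue
        abc = 100 * a + 10 * b + e
        d5 = 10 * d + 5
        used = f"{abc}{d5}{abc * d5}"
        # 3 + 2 + 5 digits, all ten distinct: every digit 0-9 used once.
        if len(used) == 10 and len(set(used)) == 10:
            solutions.append((abc, d5, abc * d5))
    return solutions

print(solve())  # includes (396, 45, 17820)
```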

Before you enter the program, you should know that i made numerous mistakes entering it into the machine. My first attempt had one huge function. When i finished entering it, i got "NO ROOM TO ENTER". This was on an empty HP-28c. It probably would have worked on an HP-28s, with its much larger RAM. But the HP-28c only has 2 KB of RAM, and only about 1700 bytes available when empty. It has to remember the whole function while compiling it into its internal form. This is effectively a show stopper for large functions. This version has two fairly large functions. Even here, when there's a mistake, you have to delete all the little functions to be able to edit either of the larger functions. So check your work. There are 498 bytes free on the HP-28c after the run. This is one of the largest programs that can run on the HP-28c.

It would be nice to generate the permutations of ABED, where A, B and D can be 1, 2, 3, 4, 6, 7, 8 or 9, and E can be 2, 4, 6 or 8. But can it be done in a compact way? This program effectively counts through the digits, skipping the inner loops whenever it comes to a digit that doesn't work. That leaves at most 8 * 8 * 4 * 8 = 2048 passes through the loop bodies, which is well within the time constraints of the HP-28c. This program produces the correct answer, 396 * 45 = 17820, in 270 seconds (four and a half minutes), and takes 800 seconds to search the entire problem space.

When i broke up the original function, i discovered that i needed global variables to get A, B, E and D into the inner loop. I considered passing them to the inner loop function as local variables, but then the helper functions couldn't access them. It's unfortunate in this case that the FOR loops create local variables, and helper functions can't access them. That's just the way it is.

This first function is the one you run when you've got everything entered. It doesn't take any arguments.

Check out the "2 8 FOR E" bit. The matching "2 STEP" makes it go from 2 to 4 to 6 to 8. In BASIC, the loop would be "FOR E=2 TO 8 STEP 2". The RPL version puts the STEP bit in a place that's not that easy to locate.
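In Python terms, that loop shape is just a range with a step (my comparison, not part of the post):

```python
# RPL: 2 8 FOR E ... 2 STEP   |   BASIC: FOR E=2 TO 8 STEP 2
evens = list(range(2, 9, 2))   # note: Python's stop value is exclusive
print(evens)  # [2, 4, 6, 8]
```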

```«
1 9 FOR A
A 'GA' STO
IF A 5 ≠ THEN
1 9 FOR B
B 'GB' STO
IF B CA THEN
2 8 FOR E
E 'GE' STO
IF E CB THEN
1 9 FOR D
D 'GD' STO
IF D CE THEN
A 10 * B + 10 * E +
D 10 * 5 +
* 10 / LP
END
NEXT
END
2 STEP
END
NEXT
END
NEXT
» 'P9' STO
```

Then there's LP, the "loop part". It has one parameter, X. X is what you get when you multiply ABE * D5. X is then disassembled into STUV, with the 0 discarded. The G function does this disassembly. The argument is assigned to X using the right arrow (shift U on the HP-28c). This creates the local variable X.

```«
→ X «
IF X 999 > THEN
X G 'V' STO
G 'U' STO
G 'T' STO
'S' STO
IF S CD THEN
IF T CS THEN
IF U CT THEN
IF V CU THEN
GA 10 * GB + 10 * GE + →STR
"*" + GD 10 * 5 + →STR +
"=" + X →STR + "0" +
DUP 0 DISP
END
END
END
END
END
»
» 'LP' STO
```

G gets the next least significant digit from a number. It's really divmod. It takes a number, divides it by ten, and gets both the quotient and the remainder. It returns both on the stack. This little ditty takes 49 bytes.

```«
DUP 10 / FLOOR SWAP 10 MOD
» 'G' STO
```
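G is Python's divmod by ten. Here's a one-line sketch of the same stack effect, with the quotient deeper and the remainder "on top" (my illustration):

```python
def g(n):
    # DUP 10 / FLOOR SWAP 10 MOD: returns (quotient, remainder),
    # so the remainder is the last value produced.
    return n // 10, n % 10

q, r = g(1782)
print(q, r)  # 178 2
```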

Next come the helper functions that check to see if the given digit matches other digits. They return false (0) if any matches are found. Otherwise true. We could have a single function that checks all digits against all other digits. But it's better to check a single digit against what has been computed so far. That way, fewer total checks need to be done. The way this program achieves this is to cascade checks. The first function, CA, checks if the given digit matches A, the digit 0, or the digit 5. CB checks for a match of B, but then calls CA to check those. And so on, through CU, which checks against U, but then calls CT, which calls CS, and so on, calling all the functions to check all the digits. Checking E for 0 and 5 isn't needed, but it's done anyway, since that's done in CA. It's more or less harmless.

```«
DUP GA ≠ SWAP DUP 5 ≠ SWAP 0 ≠ AND AND
» 'CA' STO
```

These helper functions could have used IF/THEN, returning as soon as a test fails. That might be faster, but the code is bigger. The example below is not only much bigger (100 bytes vs 48 bytes), but it's also slower on random input.

```«
IF DUP GA ≠ THEN
IF DUP 5 ≠ THEN
0 ≠
ELSE
DROP
0
END
ELSE
DROP
0
END
» 'CA' STO
```

CB checks the argument for ≠ B, then calls CA for A, 5, & 0.

```«
DUP GB ≠ SWAP CA AND
» 'CB' STO
```

CE checks the argument for ≠ E, then does CB. E stands for even, since this digit must be even.

```«
DUP GE ≠ SWAP CB AND
» 'CE' STO
```

CD checks the argument for D, then does CE.

```«
DUP GD ≠ SWAP CE AND
» 'CD' STO
```

CS checks the argument for S, then does CD.

```«
DUP S ≠ SWAP CD AND
» 'CS' STO
```

CT checks T, then does CS.

```«
DUP T ≠ SWAP CS AND
» 'CT' STO
```

CU checks U, then calls CT, which calls everything else.

```«
DUP U ≠ SWAP CT AND
» 'CU' STO
```
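The cascade can be sketched generically in Python: each checker rejects one more digit and delegates the rest to the checker built before it. The concrete values below are hypothetical stand-ins for the GA/GB globals; this is my illustration, not a translation of the RPL:

```python
def extend(checker, value):
    """Build a new checker that also rejects `value`."""
    return lambda d: d != value and checker(d)

# Base case rejects the pre-placed digits 0 and 5, like CA does.
ca = extend(lambda d: d != 0 and d != 5, 3)  # pretend GA = 3
cb = extend(ca, 9)                           # pretend GB = 9

print(cb(1), cb(9), cb(3), cb(0))  # True False False False
```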

I liked this cascade style. It may not be the fastest (function calls are relatively expensive on the HP-28c, even though they're encouraged). It may not even be the smallest code. But it does work the way mathematicians think. That is, problems are attacked by solving small problems, then attacking larger problems using the solutions to the smaller problems already solved. Languages like RPL and Forth have this in common with Lisp and variants like Scheme. One can design code this way for nearly any computer language. But i'm still a C and assembly language programmer at heart. I still like to shave off an instruction here and there, which saves a few bytes (even if billions are available) and often a few nanoseconds or less. It should be noted that modern C compilers can do automatic function inlining. They can even note that a function can only possibly be called from one place, and not leave a callable copy. Using lots of little functions can still make such programs easier to maintain, without any penalties. In my experience, such techniques aren't limited to math and logic problems such as the above.

However, it's good to remember that superior algorithms often lead to the smallest, fastest code, and often by a large margin. For this test, one idea is to have an array of bits, one for each digit. Set the bits for 0 and 5. Then when each new digit is encountered, check if the new digit's bit is set, and if it isn't, set it. One routine clears the bits, and another does the check and set. I didn't code, test or time this idea.
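The bit-array idea might look like this in Python (my sketch of the suggestion, also uncoded and untimed on the calculator):

```python
def new_seen():
    # Bits 0 and 5 start out set: those digits are already placed.
    return (1 << 0) | (1 << 5)

def check_and_set(seen, digit):
    """Return (ok, seen): ok is False if the digit was already used;
    otherwise the digit's bit is set for later checks."""
    bit = 1 << digit
    if seen & bit:
        return False, seen
    return True, seen | bit

seen = new_seen()
ok1, seen = check_and_set(seen, 3)   # fresh digit
ok2, seen = check_and_set(seen, 3)   # a repeat is rejected
print(ok1, ok2)  # True False
```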

That's all there is to it. Leave a comment if you manage to get it to work. Leave a comment if you attempted to get it to work and couldn't. Leave a comment if you improved it. The HP-28c is an ancient machine. But from time to time, i still see evidence that there are people who have one of these beasts. Maybe you have a younger model, like an HP-28s or an HP-48s. These programs should work with any of them.