Friday, February 01, 2013

Forth for Enlightenment, part eleven of ten, Number Problem Nine

Forth for Enlightenment, part eleven of ten, Number Problem Nine

I came across this logic problem. It uses numbers. I was able to solve it without the aid of any kind of computer. And there are computers that are way faster, and easier to use than my 1986 vintage HP-28c programmable calculator. But it looked as if a program to solve this just might fit on the beast, if barely. Also, it looked as if the run time to reach a solution might be less than hours. This machine isn't speedy. The processor has a cycle time of 640 KHz. Modern desktop processors are measured in GHz. There are 1000 KHz per MHz, and a 1000 GHz per MHz. Call it a challenge.

The following multiplication example uses all the digits from 0 to 9 once and once only (not counting the intermediate steps). Finish the problem. One digit has been filled in to get you started.

abc * d5 = efghi

Before we get started on the RPL program, let's do some minor analysis of the problem. The least significant digit produced by the 5 and c must either be 0 or 5. But if it's 5, then it's a duplicated digit. So digit "i" must be 0. That means that 'c' must be even. In fact, this program uses 'E' for Even instead of 'c'. The program doesn't have to guess either the 5 or the 0. Instead of a ten digit problem, it's an 8 digit problem.

ABE * D5 = STUV0

The program could cycle from 0 to 99999999. However, an empty loop that long takes about 8.8 days to execute. Since digits are non-repeating, and 0 isn't needed, the loop can start ABEDSTUV from 12346789 and go through 98764321. That brings it down to 7.6 days. The permutations of 8 digits that can only hold 8 distinct values is 8! = 40320. That's a loop that takes five minutes. But even there, it really only has to generate the permutations of the digits A, B, E and D from 1 through 9 excluding 5, multiply ABE * D5 and check to see if STUV also avoids 0, 5 and any of the digits A, B, E and D. And that's how this program does it.

Before you enter the program, you should know that i made numerous mistakes entering it into the machine. My first attempt had one huge function. When i finished entering it, i got "NO ROOM TO ENTER". This was on an empty HP-28c. It probably would have worked on an HP-28s, with its much larger RAM. But the HP-28c only has 2 KB of RAM, and only about 1700 bytes available when empty. It has to remember the whole function while compiling it into its internal form. This is effectively a show stopper for large functions. This version has two fairly large functions. Even here, when there's a mistake, you have to delete all the little functions to be able to edit either of the larger functions. So check your work. There's 498 bytes free on the HP28c after the run. This is one of the largest programs that can run on the HP-28c.

It would be nice to generate the permutations of ABED where A, B, and D can be 1, 2, 3, 4, 6, 7, 8, 9 and E can be 2, 4, 6, 8. But can it be done in a compact way? This program effectively counts digits, and skips inner loops when it comes to a digit that doesn't work. That leaves 8 * 8 * 4 * 8 = 2048 loops. This is well within the time constraints of the HP-28c. This program produces the correct answer 396 * 45 = 17820 in 270 seconds (four and a half minutes) and 800 seconds to search the entire problem space.

When i broke up the original function, i discovered that i needed global variables to get A, B, E and D into the inner loop. I considered passing them to the inner loop function as local variables, but then the helper functions couldn't access them. It's unfortunate in this case that the FOR loops create local variables, and helper functions can't access them. That's just the way it is.

This first function is the one you run when you've got everything entered. It doesn't take any arguments.

Check out the "2 8 FOR E" bit. The matching "2 STEP" makes it go from 2 to 4 to 6 to 8. In BASIC, the loop would be "FOR E=2 TO 8 STEP 2". The RPL version puts the STEP bit in a place that's not that easy to locate.

 1 9 FOR A
  A 'GA' STO
  IF A 5 ≠ THEN
   1 9 FOR B
    B 'GB' STO
     2 8 FOR E
      E 'GE' STO
      IF E CB THEN
       1 9 FOR D
        D 'GD' STO
        IF D CE THEN
         A 10 * B + 10 * E +
         D 10 * 5 +
         * 10 / LP
     2 STEP
» 'P9' STO

Then there's LP, the "loop part". It has one parameter, X. X is what you get when you multiply ABE * D5. X is then disassembled into STUV, with the 0 discarded. The G function does this disassembly. The argument is assigned to X using the right arrow (shift U on the HP-28c). This creates the local variable X.

 → X «
  IF X 999 > THEN
   X G 'V' STO
   G 'U' STO
   G 'T' STO
   G 'S' STO
      IF V CU THEN
       GA 10 * GB + 10 * GE + -> STR
       "*" + GD 10 * 5 + ->STR +
       "=" + X 10 * ->STR + "0" +
       DUP 0 DISP
» 'LP' STO

G gets the next least significant digit from a number. It's really divmod. It takes a number, divides it by ten, and gets both the quotient and the remainder. It returns both on the stack. This little ditty takes 49 bytes.

» 'G' STO

Next come the helper functions that check to see if the given digit matches other digits. They return false (0) if any matches are found. Otherwise true. We could have a single function that checks all digits against all other digits. But it's better to check a single digit against what has been computed so far. That way, fewer total checks need to be done. The way this program achieves this is to cascade checks. The first function, CA, checks if the given digit matches A, the digit 0, or the digit 5. CB checks for a match of B, but then calls CA to check those. And so on, through CU, which checks against U, but then calls CT, which calls CS, and so on, calling all the functions to check all the digits. Checking E for 0 and 5 isn't needed, but it's done anyway, since that's done in CA. It's more or less harmless.

» 'CA' STO

These helper functions could have used IF/THEN, returning when false. This might be faster, but the code is bigger. The example below is not only much bigger (100 bytes vs 48 bytes), but it's slower on random input.

   0 ≠
» 'CA' STO

CB checks the argument for ≠ B, then calls CA for A, 5, & 0.

» 'CB' STO

CE checks the argument for ≠ E, then does CB. E stands for even, since this digit must be even.

» 'CE' STO

CD checks the argument for D, then does CE.

» 'CD' STO

CS checks the argument for S, then does CD.

» 'CS' STO

CT checks T, then does CS.

» 'CT' STO

CU checks U, then calls CT, which calls everything else.

» 'CU' STO

I liked this cascade style. It may not be the fastest (function calls are relatively expensive on the HP-28c, even though they're encouraged). It may not even be the smallest code. But it does work the way mathematicians think. That is, problems are attacked by solving small problems, then attacking a larger problems using the solutions to smaller problems that have already been solved. Languages like RPL and Forth have this in common with Lisp and variants like Scheme. One can design code this way for nearly any computer language. But i'm still a C and assembly language programmer at heart. I still like to shave off an instruction here and there which saves a few bytes (even if billions are available) and often a few nanoseconds or less. It should be noted that modern C compilers can do automatic function inlining. They can even note that a function can only possibly be called from one place, and not leave a callable copy. Using lots of little functions can still make such programs easier to maintain, without any penalties. In my experience, such techniques aren't limited to math and logic problems such as the above.

However, it's good to remember that superior algorithms often lead to the smallest, fastest code, and often by a large margin. For this test, one idea is to have an array of bits, one for each digit. Set the bits for 0 and 5. Then when each new digit is encountered, check if the new digit's bit is set, and if it isn't, set it. One routine clears the bits, and another does the check and set. I didn't code, test or time this idea.

That's all there is to it. Leave a comment if you manage to get it to work. Leave a comment if you attempted to get it to work and couldn't. Leave a comment if you improved it. The HP-28c is an ancient machine. But from time to time, i still see evidence that there are people who have one of these beasts. Maybe you have a younger model, like an HP-28s or an HP-48s. These programs should work with any of them.


Douglas Butler said...

As a hardware guy this problem sends my mind whirling in a different direction. Since you have trivially solved for 0 and 5 that leaves only 8 choices. So the choices can be stored as 3 bits instead of the usual 4 for a potential energy savings of up to 25%!. But it will take a carefully crafted ALU to take advantage of this savings. Of course using a standard 4 bit CPU or bit-slices is out. Even with a 3 bit data path Harvard architecture we will have to modify the usual multiplication microcode to deal with the missing digits.

And so the snow gets shoveled.


Stephen said...

The Saturn CPU is an odd duck. It's got a 4 bit data path, 20 bit addresses, and some 64 bit registers. I'd be that an assembly language implmentation solution could make use of all sorts of tricks. I've not done that, at least not yet. There is a system command to execute code whereever you want, and the Saturn processor has documentation.