Thursday, May 26, 2011

Forth for Enlightenment, part five of ten, Backwords

How about a game that fits on this tiny machine that's fun to play? With 2 KB of RAM, this is tough. Fortunately, i have a copy of 101 BASIC Computer Games, which was written mostly using a PDP-11/70 (i got to log into the actual system, briefly, way back when). Anyway, there's a fairly simple game called REVERSE. I call it Backwords. Yes, it's misspelled. I've used the game for a couple purposes over the years. One is as an early program to write in a variety of languages. So, I have versions in C, Perl and OPL.

The game is pretty simple to play. In the original version, the digits from one through nine are used, but as ordered symbols. They start out in non-repeated random order. At each move, the user can reverse the first few digits. That is, the first two, or the first three, or, ... and on up to the first nine digits. The goal is to sort the digits from 1 to 9. Here's an example.

342156789 - Reverse 2 to get
432156789 - Reverse 4 to get
123456789 - you win.

A number of language capabilities have already been used in this series. For whatever reason, i've decided to use the letters 'a' through 'i' for this version. That lets me consider string manipulation.

The first obviously hard problem to solve is to take a string and reverse the first n symbols in it. Imagine a function named REV that takes two arguments, a string, and a number, and returns a string. First, my solution.

« DUP2
ACNT 1 + 20 SUB +

By the way. I just got my 25 year old HP28 printer working. New batteries were installed. The darkness slider was pushed all the way to full dark. About a meter (three feet) of paper was unspooled. And, it started working. So the main issue is that 25 year old thermal printer paper is pretty marginal, and the most external bits of it didn't work at all. When a function is printed using the printer, it prints the name of the function first, rather than the way i've done it above. I'm giving you the command to save the function.

But back to this function. You feed it a string, like "abcdefghi", and a number, like 3, and it spits out "cbadefghi". A little study of the function reveals that 'ACNT' is a variable used to remember something. It is incremented in the 2nd loop. The value is used with SUB, which is a built in string function to get part of a string. This variable is deleted before the function exits. Beyond that, we really need some stack diagrams to see what's going on. Here, the function is called with a string, and is directed to reverse the first 3 symbols. The stack trace is transposed so that time goes down, and the top of the stack is to the right. That's because web pages are potentially infinitely long, but in this forum, narrow. Stacks are often quite shallow. I'd only seen stack traces horizontally, but i find that i like them vertically now. Note that the Start at the begining is just to show what's on the stack before this segment runs. The START at the end is the loop START function.


The DUP2 saves both arguments for later. The DUP is used to make a stack copy of the numeric argument that gets consumed when 'STO' is used to put it into 'ACNT'. Then, the loop needs a start value (1) and an end value, the number to reverse. So in this case, the loop will execute 3 times. The 'START' loop command does not iterate a variable that you can access. So one of the features of the language is that there is often stack setup several commands before a function is called. While you read this setup, it's not at all clear what it's for. And that's why my listing is in little chunks. And it's probably easiest to read these backwards. Which is to say, start with the function, and try to figure out what the arguments are.

So, let's consider the loop. Here, the original arguments are not modified in the loop. So, they're not considered in the stack trace.

Start "abcdefghi"    

This is just one trip through the loop. In the first loop, it starts with the full string on the stack. It gets copied in the DUP. That's probably because it will get consumed shortly. Two constants, 1, and 1 are put on the stack. Then, SUB is called, which consumes both constants and the copy of the string. SUB takes the string and a position and a length from the stack and returns a string starting at the position and as long as the length. So, in this case, it returns the first character. SWAP pushes that first character one deeper on the stack and exposes the original copy of the string. The constants 2 and 20 are put on the stack, and SUB is called again. A substring is returned, starting with the 2nd character. But the returned string isn't 20 characters long because the original string isn't that long. So, clearly, this SUB returns the string from the 2nd character to the end of the string. And that's it for one pass. Each subsequent run through the loop effectively pushes one more character of the string onto the stack. The loop runs for as many times as we want to reverse.

The next code segment pops each of these characters off the stack and builds a new string. Since they were pushed onto the stack in order "a", "b", "c", they are popped off of the stack in the order "c", "b", "a", which is the reversing that we want.


The DROP discards any remaining string on the stack from the first loop. This exposes the "a", "b", "c" list on the stack. A zero length string is then pushed. This is the start of the answer string being built. A loop from 1 to ACNT (the number to reverse) is started. In each loop, the string being built and the next single character are SWAPed on the stack. Then "+" concatenates the string being built with the single character. All the single characters are consumed by the end of the last run through the loop, leaving the built reversed string. If this isn't clear from the code, build a stack diagram to see how it works. After a while, you end up recognizing patterns like this. Or, you get to be able to visualize the stack processing in your head. Or, you get good enough at the logic to be able to predict what happens. This last is cool, because you've just demonstrated that the Halting Problem can be solved after all, even though Alan Turing proved it can't be. There may be hope for computer program correctness after all.

The last bit does some stack cleanup, gets the rest of the string from the original, and pastes the reversed part and the rest of the string together. There's no loop here, but you may have forgotten some of the stack from the start of the function. So, we'll do a stack trace.

ACNT 1 + 20 SUB +


The SWAP exposes the count to reverse, which is then DROPped. The next SWAP exposes the original string. Then ACNT is recalled and one is added to it. The constant 20 is pushed so that the SUB gets the string from position 4 to the end. The "+" concatenates the two strings, yielding the answer. The 'ACNT' PURGE simply deletes the temporary variable.

At this point, the Forth programmer is thinking, "Why was 3, which is the number of characters to reverse, remembered on the stack for all this time, just to DROP it? And indeed, code like "SWAP 1 + SWAP ROT ROT 20 SUB +" would work as a replacement for the above sequence. It's the same length and doesn't need the 'ACNT' variable. Since it is the same length, it is likely to run in about the same amount of time. Then, one could also rewrite the rest of the function to eliminate the 'ACNT' variable. And that rewrite might be slightly shorter or longer. It's highly likely that Forth programmers do this sort of thing all the time. But consider that the 'ACNT' variable is labeled, whereas none of the stack positions are labeled. It's kind of like Star Trek, where on the bridge, there are huge panels full of unlabeled buttons. Labeled variables are easy to read.

It must be admitted that it was my intent to rewrite the function to eliminate the variable. It was easiest to get the gist of the function working by using the variable. I thought of it as a crutch to get the function written. But having gotten it written, it seems to me that it's better the way it is. A down side of using a variable is that some other program might accidently use the same name. Since programs all use the same name space, this function could possibly delete a variable that is in use by some other program. Yet, i have total control over the calling program, so it's simply not going to be an issue. And, a naming convention, such has using the name of the function as a variable name suffix, would also make this a non-issue. On the HP-28C, there's only 2 KB of RAM. That's not much for code and data. On the HP-28S, there's 32 KB of RAM. But even there, only the calling program parts of this function need be considered. Only one program can run at a time.

This function is basically enough to play the game. Key in "abcdefghi", the enter button, and a number like 3. It runs pretty fast, and returns "cbadefghi". It's reasonably fast because the two loops combined have ten functions. If you reverse 9, that's 90 functions. There are another 10 odd functions, so the maximum is about 100 functions. That takes place in a hair over a second. One should see how it performs on unexpected input. Key in other numbers and make sure they return the right result. Does it work right if you try to reverse 1 character? How about something longer than the string, like 20? How about zero? How about -2? Frankly, i was satisfied with it's behavior with unexpected input. This can be controlled by calling code anyway.

One of the design notes that was skipped is interesting. The stack is used to effectively reverse characters in the string. But in a language like C or Perl, it's more likely that pairs of characters would be swapped, using a temporary variable. And that's because in those languages, one can subscript into a string and write whatever you want there. The HP-28 RPL language does not have a function to put a character into the middle of a string. So this simply isn't possible. In fact, i considered not using strings at all. I considered using an array for everything. Arrays can't hold characters or strings. So i'd have to either convert the result array to a string or come up with some other way to display it. It's very likely that using arrays on this calculator would require less code that executes faster on this machine, despite conversion.

This function isn't the whole game. It doesn't know if you've "won", and it doesn't create a randomized string for a starting problem. What is amazing is how much code is needed to turn this almost-game into a game. And that's in the next post.

No comments: