Thursday, June 09, 2011

Forth for Enlightenment, part nine of ten, primes

Prime numbers are positive integers that can only be divided evenly by themselves and the number one. So there are no factors for prime numbers. Prime numbers are great for factoring, because if there are no prime factors, then there are no factors. The first few interesting prime numbers are 2, 3, 5, 7, 11, 13, 17, 19. If you're trying to factor a number, you only have to attempt to divide the number by prime numbers up to the square root of the number. An example should make it clear why. Let's say you're trying to tell if 25 is prime. You try 25 / 2, and get a 12 with a remainder of 1. You try 25 / 3, and get 8 with a remainder of 1. You try 25 / 5, and get 5 with a remainder of 0. So 25 is divisible by 5. But notice that you divided by 5 and got 5. The number 5 is the square root of 25. If you divide 25 by 6, you get 4 with a remainder of 1. But you already checked division by 4 (it's not a prime, but it's between 3 and 5, so we have effectively checked it). We don't want to check anything twice, so we only have to check up to the square root of the number. The short list of prime numbers given is good enough to check numbers up to 528. That's because the next prime after 19 is 23, which when squared id 529. So a small list of primes allows checking of large numbers. However, memory for storing prime numbers is in short supply on the HP-28C, so we'll check all numbers up to the square root, not just the primes.

The HP-28 RPL language has pretty much everything we need. There's a function to compute square root (√). There's a division function that returns the remainder called MOD. There are loops and conditionals. Yet my first version ran into a snag. It was convenient to use a FOR loop with a local variable. Having a local variable is handy because it doesn't clutter the stack, it doesn't need to be created before you start and purged when you're done with it. And local variables can be easily recalled. However, when a remainder of zero is found, indicating a factor has been found, the next thing to do is exit the loop and report the answer. It turns out that you can't exit a FOR loop in RPL. There's no "GOTO" in the language. The loops don't have any other "break" function, for example, as the C language has. There are two solutions. One is to let the loop finish. The other is to use a WHILE loop. In my first version, I continued with the FOR loop, but changed the problem to factoring a number. It got ugly. For example, if the number was 12, it divided by 2, but failed to check to see if it could continue dividing by 2. So it didn't always result in prime factors. A second version returns the focus to primes, using WHILE.


«
DUP √ FLOOR WHILE DUP 2 ≥ REPEAT
DUP2 MOD IF 0 == THEN
DROP 1
END
1 -
END
» 'ISP' STO

This function takes a number on the stack. It returns two numbers. The original number is left on the stack. But also a 0 is left if there is a factor, or a 1 is left on if the number is prime. Let's walk through this a bit to see how it does it. The first thing we want to do is get the square root of the number. But we'll need the original number later, so the first thing is to push a duplicate onto the stack. The square root function consumes that duplicate and pushes the square root of that number. The FLOOR function turns that into an integer, discarding any decimal fraction. So if the original number is seven, the square root is 2.64575..., FLOOR turns that into just two. The WHILE loop is started. The bit between WHILE and REPEAT is the loop condition. As long as the loop condition is true, the loop will execute. We'll need this square root, so DUP is used to make a stack copy. Then 2 ≥ compares it to 2 and the loop runs as long as the number is still at least 2. At this point we have the original number and the current number to check on the stack. We'll need both later, so DUP2 is called push a copy of both on the stack. MOD gets the remainder after division. I doubt that the IF does anything, but 0 == compares the remainder to zero. THEN executes the block to the matching END if the answer is true. If the remainder was zero, what we want to do is exit the loop, returning something that means the original value isn't prime. We have two items on the stack at this point. That's the original number and the current number to divide by. We DROP this number to divide by and push the constant 1 on the stack. After the IF THEN END, we have the current number to divide by on the top of the stack. We subtract one from it. If we had just pushed one on the stack, it becomes zero. If we just divided by two, which is the last number we check, it becomes one. In either of these cases, the loop stops. But if there are more numbers to check, they get checked in subsequent loops.

There are better ways to check for prime numbers. For example, a simple optimization is to first check if the number is divisible by 2. If it isn't even, then only the odd numbers up to the square root need to be checked. For large numbers, this could double the speed. There are other ideas as well. Some require lots of memory. Some are quite complicated. So we won't consider them now.

The main point of this exercise was to show how searching can be accomplished on this machine. Searching requires breaking from a loop that's in progress. A design decision in RPL was to not provide an explicit loop break mechanism. In fact, there is no GO-TO like function in the language. While anything you could want to implement can be implemented in this language, it isn't necessarily very easy. The loop structures available that work for this are WHILE-REPEAT-END and DO-UNTIL-END. My first thought was that these loops are equivalent and have their test at the start of the loop. But it turns out that the test can be placed anywhere in the loop. That is, any arbitrary commands can be placed between the WHILE and REPEAT clauses, and nothing need be placed between REPEAT and END. It turns out that one can not use WHILE-REPEAT-REPEAT-END to get the effect of two exits in one loop.

This bit about having to change the condition of the loop to break out of it is part of a wider issue. In 1968, Edsger W. Dijkstra published an often quoted paper entitled Go To Statement Considered Harmful. The paper starts with this line: For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce. Note that he has a value judgment not on the programs written, but on the programmers who wrote them. Programs in languages like FORTran (at the time) absolutely required the use of go-to statements. And in such programs, it was easy to end up with spaghetti code that was very difficult to follow. And that is the gist of the paper. But the opening line suggests that the fewer go-to's in a program, the better the programmer. It had been proven that go-to statements were never necessary in languages that support if-then-else and loops or recursion. The logical conclusion is that programmers who never use go-to statements are as good as one can get. This conclusion isn't correct. Once the right languages appeared, the goal of zero go-to's became possible, but a new issue reared it's ugly head. The issue is breaking from a loop. There are two easy ways to do this in a language like C. First, one can jump out of the loop with a go-to. Second, one can create a state variable and set it so that when the loop test is executed, the loop exits. And what happened in the go-to's are evil era was that all such loops were exited using a state variable. There are at least two problems with this approach. First, extra instructions to set up a test must be executed. Often there are extra instructions in the test itself that are executed each loop. Second, compilers of the day had no idea how to optimize this sort of code. Third, state variables can be just as difficult to follow as go-to's are. For example, if there are four state variables, it's just as difficult to understand as four go-to's. Except that if all four go-to's are clearly loop exits, then they're very clear after all. The Internet didn't exist back in the '70s, so most people never read the original paper. They just latched onto that first statement, and used it as Gospel. These days, we have less of an excuse. It's an easy read. It's quite clear that the go-to's are always bad concept is an over-generalization. After all, Dijkstra makes an exception for machine language. Why make an exception there? Well, the flow control in machine language generally is all built on some form of go-to. There's no choice. And besides assembly language programmers are generally better than those who can't code in assembler. What i mean by this is that assembly language is not generally the first computer language that programmers are taught. By the time one is writing assembly language, they have considerably more experience. It turns out that there are other exceptions to go-to's being poor practice.

In any case, the designers of RPL decided not to include a go-to. And, the decision made it sometimes more difficult for the programmer to achieve certain goals. But nothing was made impossible. And, there may have been a performance reason for not including a go-to. Some of the Scheme language environments allow go-to, but performance for all structures suffers all the time because of it. That means that performance for the Scheme implementation suffers. It's a big deal.

There was one other observation of note during this project. I ran across a list of software errors for the HP-28 (they're at the bottom). I tried some of these on my unit, and they were all problems. So one issue with having a system strictly in hardware is that it can't be fixed. The software in something as complicated as the HP-28 series is way too large to have any hope of being bug-free. The bright side is that the language can't change out from under your program. I've been bitten by Light Year conversions getting inaccurate results.

Lots of applications written for the HP-28 series are available through this site.

Conclusion:The HP-28 is a highly capable machine. The RPL language is concise, making it a good match for limited machines. However, it can be difficult to estimate how much memory a structure or chunk of code might consume without actually trying it. The stack manipulation slows writing of functions, and can make reading functions difficult. Since tail recursion optimization is not supported, the system is not a particularly good place to experiment with recursion.

As always, any questions, leave a comment. I'll get it and respond, even if this post is ancient history by the time you read it.

2 comments:

Stephen said...

It turns out that you can exit a FOR loop on the HP-28. Set the loop variable to the last value. For example, 1 5 FOR IF x 3 == THEN 3 END 5 'x' STO NEXT does not leave 3 on the stack. The reference manual doesn't mention it. Exiting a loop early is good for searches. I wish there was a 'RETURN' or 'EXIT' command that let you get out of a function early. There's no abusable GOTO.

Unknown said...

Everything You Need to Know About Printable Practice GED Test
printable ged practice test free