Friday, May 20, 2011

Forth for Enlightenment, part one of ten, Factorial

My Scheme series introduced factorials in part one (the 2nd post). The factorial function is strictly numerical. The factorial of 5 is 5! = 5 * 4 * 3 * 2 * 1 = 120. The factorial of any number is the product of the integers from 1 up to that number. The numbers are multiplied together. A series using the HP-28 calculator ought to have this function early on. After all, the factorial function works with numbers. So do calculators. But doesn't the HP-28 have a built-in factorial function? Yes it does. And it's fast. And it can deal with fractions. I mean, the factorial of 5 is 5! = 5 * 4 * 3 * 2 * 1 = 120. But what is the factorial of 5.5? Well, there's a gamma function, and it yields 287.9 (with more decimal digits). The gamma function closely tracks the factorial function. For very large factorials, the gamma function can get you an answer that is good enough without doing all those multiplies. The HP-28 can represent numbers up to 10^500. So the FACT function is limited to input up to 253.

If you write your own factorial function, you'll find that it's not as fast as the internal function. The internal function is pretty much instant. It's not clear if they coded it in assembler, if ROM is faster than RAM (as it was with the Texas Instruments TI-59 calculator), or if they used some tricks. It might be a combination of these. One of the tricks it might do is store the answers to all factorials up to 253. Or, they could use less space and only store every tenth factorial. So, 10! is 3,628,800. If you need 11!, you start with 10! and multiply by 11. Then, factorials would be really fast.

I wasn't so much interested in speed as programming. The recursive version is first. It doesn't support the gamma function. Just the multiplication of integers is considered.


« DUP IF 1 ≠ THEN ; stop at 1. If not...
DUP ROT * ; multiply count into product
SWAP ; switch prod and count
1 - FA ; cnt = cnt - 1, call self
END
» 'FA' STO

Typography first. The 'FA' STO at the end isn't really part of the function. It's how you enter the function. 'FA' is the function's name.

The function expects two arguments. The number 1, and the factorial to be computed. So, this can be run with 1 5 FA. But it also returns two items on the stack. And the answer, 120, is buried. It returns the answer and the number 1 on the stack. So, get the answer by running DROP. This isn't really clear to an end user, so there should be a wrapper function, which i call 'FAC'.


« 1 SWAP FA DROP
» 'FAC' STO

This FAC fucntion takes one argument, and returns the answer. So 5 FAC returns 120. FA, and therefore FAC, can compute factorials for numbers up to 253 properly. That's from the limitation of numbers, not from running out of memory for stack data.

But let's get back to FA again. In it, the HP-28 function ROT is used which rotates entries on the stack. It's pretty clever. But it's not clear what it should be good for. It works for this program. ROT comes from ROTate. Let's examine how it works. If a, b, and c are on the stack before ROT, then b, c and a are on the stack afterwords. Here it is in table form.





LevelBefore ROTAfter ROT
3ab
2bc
1ca

Now, FA takes two arguments on the stack, which are 1 and the factorial number to compute. It turns out that the 1 is needed as a running product. The count is multiplied by the current product on each loop down. The current count is at the top of the stack. In the first line, the count is copied, so that there are two of them. The IF consumes one count in the compare to 1. In the line with the count, we want to multiply the count into the product, but we need to remember the count also. So the count is duplicated, and then ROT does it's magic. Finally, we have the multiply. The next line does a SWAP to restore the product and count to where they started. Here's a stack diagram for operations starting in line 2. The stack shows what's on the stack after each operator.





LevelStartDUPROT*SWAP1-
3 prodcount  prod 
2prodcountcountcountprodcountprod
1countcountprodprod * countcount1count - 1

Note that in the column with the multilply, prod * count is shown. After that, it's the new product, so just prod is shown. The stack diagram should get you through the operation of this function. Forth programmers use these often. But i wouldn't be surprised if after awhile, idioms involving ROT become second nature. I timed this function with the highest argument, 253. It took 20 seconds. The built in function comes back with the answer almost right away, even for 253.

This is an acedemic excersise, so a version that uses a loop instead of recursion was written next. I used a new name, FACTO. It takes the number to compute, and returns the answer. It's also shorter and simpler.


« 1 SWAP 1 SWAP
FOR x
x * ; multiply count into product
NEXT
» 'FACTO' STO

The first line with the SWAPs might be confusing. The FOR operator takes two arguments, the start number, and the ending number. The count is already on the stack, but needs to be in level 1. A 1 is needed on level 2 to serve as the start number. A 1 is needed on level 3 to serve as the running product. Here's a stack diagram.





Levelstart1SWAP1SWAPFOR
3   11 
2 count1count1 
1count1count1count1

The FOR loop's first iteration starts with 1 on the stack, which is the running product. The x in the FOR line is a local variable. It matches the x in the third line. This variable counts from 1 to count, stepping by one each loop. So x * multiplies the current running product by the current count number each loop. The NEXT function marks the end of the FOR loop. The local variable conveniently and automatically is removed when the loop is finished with all iterations. When each loop is finished, the current running product is the only thing on the stack. So when all loops are complete, the complete product is returned.

The start with the SWAPS can be made shorter, if perhaps more complicated. Consider this version:


« 1 1 ROT
FOR x
x * ; multiply count into product
NEXT
» 'FACTO' STO






Levelstart11ROTFOR
3  count1 
2 count11 
1count11count1

Clearly, it gets the stack to the same place and with 1 fewer command. It might be marginally faster, but it's only one command that is skipped, and only once. It is shorter, so it should take just a tiny bit less memory. While the HP-28C is memory starved, in reality one would use the built-in FACT function, which would take zero memory. It does show that the ROT function is often valuable.

This version of the factorial function is faster than the recursive version. It took 5.8 seconds. That's more than three times faster. It isn't anywhere near as fast as the built-in function, however. It also takes less memory, since it doesn't use more than three stack levels. It doesn't need a helper function.

Since recursion can always be converted to loops, one wonders why recursion is ever needed. And indeed, it isn't. But sometimes recursion is how the problem is presented. If that's the case, then often recursion is a good approach to take. But there are also algorithms that are easier with recursion. The Quick Sort algorithm comes to mind.

No comments: