Wednesday, June 29, 2011

euler

I bumped into a site that has mostly mathematical computer problems to solve. The site says that you can use any tools you want to come up with solutions. I use whatever i have at hand. That has included C, Perl, the shell, and my 25 year old HP-28C calculator. The problems are genearlly designed to require some computer programming. But some people have managed to solve some of the problems by hand(!). I have not, as yet, looked for hints elsewhere on the Internet, even though this is encouraged. The site suggests that computer programs can be written which take less than a minute to run.

At the time of this post, i've solved 36 of the over 300 puzzles. I don't attempt to limit computer time. If i get a solution, it's a solution. I expect that once i have a correct solution, it's code that will never run again, so there's not much point in optimizing the code. However, i like optimizing code...

Sometimes i make mistakes. But the system allows you to guess again if you get a wrong answer. One of the solutions i was writing had an estimated run time of a few hours. I ran it over night. But in the morning, it was still running. It turned out that there was an error, and it would never have finished. Opps. When i fixed it, i also noticed a way to speed it up, and the final version took about a minute. However, it took me a half hour to get the speed up working. I'm really just trying to get the total time spent down, except that i value my time over my computer's time.

One of the puzzles asks how many months start on Sunday in the 20th century. I could have done some calendar arithmetic (i've done it before). I could have narrowed it down with some math. But Unix (and Linux) have a calendar function, and it produces text, so how hard could it be in a shell script?

#!/bin/sh
for y in `cnt 1901 2000`; do
for m in `cnt 1 12`; do
cal $m $y | grep "^ 1"
done
done

I ran this script with the output into "wc -l". The "wc" program is the Unix word count program. The "-l" option makes it count lines. So here's how it works. The "for y" loop counts years from 1901 to 2000, inclusive. That's the 20th century. The 20th century is almost the 1900's. That's because the 1st century is the first hundred years, from year 1 through year 100, inclusive. The 2nd century is from 101 to 200. Anyway, the "for m" loop counts the 12 months. "cal" is the Unix calendar program. It can take two arguments, the month ($m) and year ($y). The "|" (vertical bar) symbol tells the shell to direct the output of "cal" into the next program. The "grep" bit is the Unix search program. It stands for "General Regular Expression and Print". The regular expression is the "^ 1" bit. This expressions tells grep to only pass lines that start with a space and the number 1. It turns out that the calendar format for a month has dates right justified. If a date is "10", then there's no space before it. The default calendars have Sunday at the far left. So this little script spits out a line like this:

1 2 3 4 5 6 7

for each month that starts with a Sunday. I run the output of the script into the word count program, having it count lines. The number of lines is the number of months that start on Sunday.

There's just one more little thing to describe. In the line "for y in `cnt 1901 2000`; do" there's a bit of syntax and a program. The syntax is the back quotes - the first just before the word cnt. The stuff in between the back quotes is a program with it's options, in this case cnt. This program is expected to print out a list, and the variable y gets set to each value on each loop. But the program cnt is something i wrote in the 1980's. It knows how to count up or down, with skips, with leading zeroes or spaces, with small functions taking the initial integer for a start, and can output in roman numerals and other formats. There must be solutions like it for other people, but this is what i've used. I've not released it to the public, for various reasons. So, if you want to use my solution, you'll have to solve this bit of the puzzle yourself.

Anyway, this solution produced an answer really quickly. It took more time to write this post than to write and run this program.

The site has a forum for each problem. Early solvers often post hints on how to solve or optimize solutions. And, often, these hints help you solve later problems. For example, there are a bunch of problems involving prime numbers. Having a good prime number test can be really handy. I peruse these and see if there's anything of interest.

This whole site is about education. A million years ago, i looked at the book 101 BASIC Games. These were games, written in the computer language BASIC. Not all of them worked. There were lots of different variants of BASIC when it was written. The programs were submitted by lots of different people using most of those variants. I studied a bunch of them to see how they worked. But it wasn't really a curriculum. The interesting thing about Project Euler is that it's more like a curriculum. Each of the problems requires a new skill or skills. The result is inductive chain learning. This can be an efficient way to learn. And, it's free.

Wednesday, June 15, 2011

Stress

It's been suggested that college students should be graded on attendance.
http://www.wired.com/wiredscience/2011/06/do-you-get-better-grades-with-better-attendance/
First, a couple anecdotes, and then a better idea.

I had an engineering class called Stress, with a low pass rate. The prof did everything he could think of to get students to pass. Tons of office hour time, etc. 1st day, he passed out a syllabus noting the 3 open book exam dates and all homework. All homework counted the same as an exam. Of the 4 grades, the lowest was dropped. I started the 1st week's homework, but it was clear there was only time for half of it. I was careful to hit all types of problems. I passed it in, but, of course, 50% is not a passing grade. It became clear that the prof taught the material in the book. But i could read faster than he could talk, so i stopped going to class. I did OK on the first exam. One Friday, i came in for what i thought was the 2nd exam, but he passed out a quiz. At a glance, it covered next week's homework. I hadn't done that yet. I rechecked the syllabus, and it said that the 2nd exam was the following Monday - i hadn't missed it. I handed the quiz back. The prof told me that i'd likely fail the course. I looked him in the eye, but didn't tell him i'd do OK if i didn't miss any exams. The last exam was held at the start of the last week. It didn't cover the last week's homework. I did half of that anyway. And, i passed the course.

I would never have skipped half the homework if it weren't impossible to finish. I would never have skipped the classes if the pressure to optimize my time weren't so severe. Exams have high time pressure, and i already knew that my performance would be awful on them. It was just the only choice available. I liked the course material and the prof, but it still pisses me off that the course was set up that way. If i'd known in advance, i'd have dropped another course from my schedule, so i could devote twice the time to it. So the degree means what? These are the students who managed to get through by cheating, optimization, or by being unbelievably quick? Are these skills one needs in industry? Not as far as i can tell.

There was a brief break. Then Stress 2 was taught by the same prof. I figured he'd pass out a syllabus on the first day, so i didn't bring my books or calculator. He passed out an exam! I ran home, grabbed my stuff and ran back in 20 minutes. Panting, i asked for an exam. He found me a seat and an exam that didn't match my neighbors. At the 50 minute mark, i was done (!). I looked around and there was panic on 109 faces. I figured i must have done something wrong. All these kids are brilliant. I checked my answers. No errors. So at the 55 minute mark, i got up and handed it in.

Next day, the prof writes something like this on the board:










97:1
60-70:4
50-60:7
40-50:19
30-40:37
20-30:23
10-20:13
0-10:6

At first i wondered how i'd lost 3 points. But then i realized that i'd done the course right. One of the four questions was on that last week's homework, and i was the only one who'd done even half of it. In this course, there'd be 4 exams, homework with the lowest dropped. I did half the homework. Then it was announced that the lowest two grades would be dropped. But i took the 4th exam even though i'd already passed the course.

Years later i auditioned for a chorus. It involved sight singing and a bunch of music things i'd never done before. That morning, i ran 4 miles, showered, and took the train. The director gave me my starting tenor note and played the accompaniment. I dived in, but stumbled. But then i started getting where the piece was going, relaxed a bit and read ahead. At the start of the last line, i turned the page over (while singing), but it was blank. I looked up and finished the page, adding dynamics. The director was stunned. So was i. I'm simply not that good. But a 4 mile warm up does wonders. Phys Ed should be held for 20 minutes every morning for all students. They don't have to be awake for it.

Monday, June 13, 2011

STEM

One of the big pushes in education is to get students to do well in the STEM areas. That's Science, Technology, Engineering, and Math. From my perspective, SE would be enough. That is, if you've got Engineering, that's how you build or support technology. To do either Science or Engineering, you need math. Math turns out to be one of those tools which allows you to solve all sorts of problems. You really can't do without it. Not all students are good at it. In fact, nearly everyone has to struggle with it. Being good at math is, at least to some extent, a measure of how hard one has struggled with it compared to everyone else.

One of the reasons for students to go into science or engineering is that a technological society needs people to do these things to stay competitive with other countries. And if there is a need for a skill, then there are jobs. So, that's good for the student's future.

But just to mention a contrary concept, it turns out that when you go to bigger companies that want engineers, they're also looking for engineers with communications skills. So, it is not a good idea to ignore your natural language skills. That includes reading and writing, but also public speaking. These are good skills to have.

Thursday, June 09, 2011

Forth for Enlightenment, part nine of ten, primes

Prime numbers are positive integers that can only be divided evenly by themselves and the number one. So there are no factors for prime numbers. Prime numbers are great for factoring, because if there are no prime factors, then there are no factors. The first few interesting prime numbers are 2, 3, 5, 7, 11, 13, 17, 19. If you're trying to factor a number, you only have to attempt to divide the number by prime numbers up to the square root of the number. An example should make it clear why. Let's say you're trying to tell if 25 is prime. You try 25 / 2, and get a 12 with a remainder of 1. You try 25 / 3, and get 8 with a remainder of 1. You try 25 / 5, and get 5 with a remainder of 0. So 25 is divisible by 5. But notice that you divided by 5 and got 5. The number 5 is the square root of 25. If you divide 25 by 6, you get 4 with a remainder of 1. But you already checked division by 4 (it's not a prime, but it's between 3 and 5, so we have effectively checked it). We don't want to check anything twice, so we only have to check up to the square root of the number. The short list of prime numbers given is good enough to check numbers up to 528. That's because the next prime after 19 is 23, which when squared id 529. So a small list of primes allows checking of large numbers. However, memory for storing prime numbers is in short supply on the HP-28C, so we'll check all numbers up to the square root, not just the primes.

The HP-28 RPL language has pretty much everything we need. There's a function to compute square root (√). There's a division function that returns the remainder called MOD. There are loops and conditionals. Yet my first version ran into a snag. It was convenient to use a FOR loop with a local variable. Having a local variable is handy because it doesn't clutter the stack, it doesn't need to be created before you start and purged when you're done with it. And local variables can be easily recalled. However, when a remainder of zero is found, indicating a factor has been found, the next thing to do is exit the loop and report the answer. It turns out that you can't exit a FOR loop in RPL. There's no "GOTO" in the language. The loops don't have any other "break" function, for example, as the C language has. There are two solutions. One is to let the loop finish. The other is to use a WHILE loop. In my first version, I continued with the FOR loop, but changed the problem to factoring a number. It got ugly. For example, if the number was 12, it divided by 2, but failed to check to see if it could continue dividing by 2. So it didn't always result in prime factors. A second version returns the focus to primes, using WHILE.


«
DUP √ FLOOR WHILE DUP 2 ≥ REPEAT
DUP2 MOD IF 0 == THEN
DROP 1
END
1 -
END
» 'ISP' STO

This function takes a number on the stack. It returns two numbers. The original number is left on the stack. But also a 0 is left if there is a factor, or a 1 is left on if the number is prime. Let's walk through this a bit to see how it does it. The first thing we want to do is get the square root of the number. But we'll need the original number later, so the first thing is to push a duplicate onto the stack. The square root function consumes that duplicate and pushes the square root of that number. The FLOOR function turns that into an integer, discarding any decimal fraction. So if the original number is seven, the square root is 2.64575..., FLOOR turns that into just two. The WHILE loop is started. The bit between WHILE and REPEAT is the loop condition. As long as the loop condition is true, the loop will execute. We'll need this square root, so DUP is used to make a stack copy. Then 2 ≥ compares it to 2 and the loop runs as long as the number is still at least 2. At this point we have the original number and the current number to check on the stack. We'll need both later, so DUP2 is called push a copy of both on the stack. MOD gets the remainder after division. I doubt that the IF does anything, but 0 == compares the remainder to zero. THEN executes the block to the matching END if the answer is true. If the remainder was zero, what we want to do is exit the loop, returning something that means the original value isn't prime. We have two items on the stack at this point. That's the original number and the current number to divide by. We DROP this number to divide by and push the constant 1 on the stack. After the IF THEN END, we have the current number to divide by on the top of the stack. We subtract one from it. If we had just pushed one on the stack, it becomes zero. If we just divided by two, which is the last number we check, it becomes one. In either of these cases, the loop stops. But if there are more numbers to check, they get checked in subsequent loops.

There are better ways to check for prime numbers. For example, a simple optimization is to first check if the number is divisible by 2. If it isn't even, then only the odd numbers up to the square root need to be checked. For large numbers, this could double the speed. There are other ideas as well. Some require lots of memory. Some are quite complicated. So we won't consider them now.

The main point of this exercise was to show how searching can be accomplished on this machine. Searching requires breaking from a loop that's in progress. A design decision in RPL was to not provide an explicit loop break mechanism. In fact, there is no GO-TO like function in the language. While anything you could want to implement can be implemented in this language, it isn't necessarily very easy. The loop structures available that work for this are WHILE-REPEAT-END and DO-UNTIL-END. My first thought was that these loops are equivalent and have their test at the start of the loop. But it turns out that the test can be placed anywhere in the loop. That is, any arbitrary commands can be placed between the WHILE and REPEAT clauses, and nothing need be placed between REPEAT and END. It turns out that one can not use WHILE-REPEAT-REPEAT-END to get the effect of two exits in one loop.

This bit about having to change the condition of the loop to break out of it is part of a wider issue. In 1968, Edsger W. Dijkstra published an often quoted paper entitled Go To Statement Considered Harmful. The paper starts with this line: For a number of years I have been familiar with the observation that the quality of programmers is a decreasing function of the density of go to statements in the programs they produce. Note that he has a value judgment not on the programs written, but on the programmers who wrote them. Programs in languages like FORTran (at the time) absolutely required the use of go-to statements. And in such programs, it was easy to end up with spaghetti code that was very difficult to follow. And that is the gist of the paper. But the opening line suggests that the fewer go-to's in a program, the better the programmer. It had been proven that go-to statements were never necessary in languages that support if-then-else and loops or recursion. The logical conclusion is that programmers who never use go-to statements are as good as one can get. This conclusion isn't correct. Once the right languages appeared, the goal of zero go-to's became possible, but a new issue reared it's ugly head. The issue is breaking from a loop. There are two easy ways to do this in a language like C. First, one can jump out of the loop with a go-to. Second, one can create a state variable and set it so that when the loop test is executed, the loop exits. And what happened in the go-to's are evil era was that all such loops were exited using a state variable. There are at least two problems with this approach. First, extra instructions to set up a test must be executed. Often there are extra instructions in the test itself that are executed each loop. Second, compilers of the day had no idea how to optimize this sort of code. Third, state variables can be just as difficult to follow as go-to's are. For example, if there are four state variables, it's just as difficult to understand as four go-to's. Except that if all four go-to's are clearly loop exits, then they're very clear after all. The Internet didn't exist back in the '70s, so most people never read the original paper. They just latched onto that first statement, and used it as Gospel. These days, we have less of an excuse. It's an easy read. It's quite clear that the go-to's are always bad concept is an over-generalization. After all, Dijkstra makes an exception for machine language. Why make an exception there? Well, the flow control in machine language generally is all built on some form of go-to. There's no choice. And besides assembly language programmers are generally better than those who can't code in assembler. What i mean by this is that assembly language is not generally the first computer language that programmers are taught. By the time one is writing assembly language, they have considerably more experience. It turns out that there are other exceptions to go-to's being poor practice.

In any case, the designers of RPL decided not to include a go-to. And, the decision made it sometimes more difficult for the programmer to achieve certain goals. But nothing was made impossible. And, there may have been a performance reason for not including a go-to. Some of the Scheme language environments allow go-to, but performance for all structures suffers all the time because of it. That means that performance for the Scheme implementation suffers. It's a big deal.

There was one other observation of note during this project. I ran across a list of software errors for the HP-28 (they're at the bottom). I tried some of these on my unit, and they were all problems. So one issue with having a system strictly in hardware is that it can't be fixed. The software in something as complicated as the HP-28 series is way too large to have any hope of being bug-free. The bright side is that the language can't change out from under your program. I've been bitten by Light Year conversions getting inaccurate results.

Lots of applications written for the HP-28 series are available through this site.

Conclusion:The HP-28 is a highly capable machine. The RPL language is concise, making it a good match for limited machines. However, it can be difficult to estimate how much memory a structure or chunk of code might consume without actually trying it. The stack manipulation slows writing of functions, and can make reading functions difficult. Since tail recursion optimization is not supported, the system is not a particularly good place to experiment with recursion.

As always, any questions, leave a comment. I'll get it and respond, even if this post is ancient history by the time you read it.

Friday, June 03, 2011

Fear of Math

Fear of Math is real. If one adds 2 + 3 and gets 6, it's not close. It's wrong. All math requires careful approach, careful execution, and careful cross checking. For example, in 2 + 3, start with two fingers, and add three more. The answer is 6. But 2 is even and 3 is odd, and an even plus an odd is always odd, and 6 isn't odd. All this careful stuff requires a huge investment, and if there is a perception of likely failure, the return on investment calculation (which for some reason is easy for everyone as it is a right brain function though apparently with output directly to the amygdala) comes up negative. It's no wonder that no one wants to do it. Just for fun, anxiety shuts down short term memory. Proof positive that God isn't very smart - though one can sort of see the evolutionary advantages. The left brain style computations aren't as fast as the right brain reactions. So right brain reactions kept our ancestors from getting eaten more often.

My approach has been mostly to make the math work more reliable. For example, when my son was adding 2 + 3, he got 4, 1/3rd of the time, 5, 1/3rd of the time and 6, 1/3rd of the time. I had to watch him do this quite a bit before i figured out that he somehow was taught counting on his fingers in an unreliable manner. I introduced him to the Japanese abacus, and in twelve weeks his addition and subtraction was reliable. I then showed him that he could use his fingers like an abacus (for two digits), and his math scores went from behind the class to ahead. Also, every time we were in the car, i drilled him mercilessly on "what is seven plus eight", "what is eight plus seven", "what is fifteen minus seven", and "what is fifteen minus eight". He eventually relented and memorized the answers. He was, however, incredibly stubborn. For the record, his memory is astonishingly good. He can normally repeat anything i've said just once in the past year verbatim with minimal prompting.

But in 7th grade, i'd spend all weekend getting him caught up with his homework, and he refused to turn it in. The next weekend, i'd spend all weekend making him do it over. Old homework that was finished started appearing out of nowhere, but i still couldn't get him to turn it in.

I've just reviewed my own K-8 report cards. While i was always at least a little above average in math, in 8th grade all my grades jumped up permanantly. I recall nothing of the sort. My son seems to have had a maturity growth spurt at the end of 7th grade. I hope so.

Thursday, June 02, 2011

Autism and the Hypocondriac

Steve Silberman from Wired magazine wrote a seminal paper in something like 1993 about autism. It's definitely worth a read. It's not the last word on the subject. He knows this, otherwise, he wouldn't be writing a book on the subject now.

You should know that i'm an Engineer, and furthermore, one who has pursued a career in computer programming. Since i've done both, i'm qualified to say that computer programming is like engineering, only much, much more difficult. My ex wife has accused me of having autism. OK, so being obsessive compulsive is something one needs to be a halfway decent computer programmer. And since the computer does exactly what you tell it to, no matter how stupid that might be, having the attention to detail to spot the most minor of flaws is something one needs. These skills, and others, can be viewed as symptoms of autism. One expects that all such people must have at least some bits of the autism spectrum. Still, i thought the idea was silly. These skills can be learned.

The article above has a link to an autism test. The test itself is not diagnostic. That is, it won't tell you if you definitely have or do not have autism. But a control group who took the test averaged 16.4 - and 80% of those diagnosed with autism or a related disorder scored 32 or higher. My score was eleven. One of the common symptoms of autism is an inability to tell lies. It's not that the autistic person avoids untruth. Telling an untruth isn't thinkable. It doesn't make any sense. So, either this is a lie or it's not. Either way suggests i'm not autistic.

Wednesday, June 01, 2011

Forth for Enlightenment, part eight of ten, Backwords 4

In part six, a full runnable program was presented. But the startup time, generating the random puzzle, takes forever. How to speed it up? One thing to try is to abandon strings, which aren't randomly writable, and use arrays, which are. Characters aren't allowed as a data type in arrays. Only numbers are allowed. All we need are 9 different numbers. But some numbers are easier to convert to characters than others. In particular, the ASCII values of characters can be directly converted. The code for lower case "a" is 97, with the alphabet following sequentially. An array is created with the highly imaginitive name "a". Here's the code:


« 97 105 FOR x x NEXT { 9 } →ARRY 'a' STO » 'MKARRR' STO

The 97 105 FOR x starts a loop from ASCII values 'a' through 'i'. The next x pushes that value onto the stack. NEXT ends the loop. At the end of the loop, 9 values are on the stack. The list { 9 } is pushed, which is used by →ARRY to create an array on the stack of length 9, using the numeric values just pushed. This array is stored in the variable a. The whole function is called MKARRR.

Given the array, we need a function to shuffle it. But such a function will probably need a helper function that swaps values at two positions. It's a bit more complex. Two positions from 1 to 9 are placed on the stack before called.


«
DUP2 'a' SWAP 1 →LIST GET
SWAP 'a' SWAP 1 →LIST GET
ROT 1 →LIST 'a' SWAP ROT PUT
'a' SWAP ROT 1 →LIST SWAP PUT
» 'SWAPA' STO

You can see that each line ends with GET or PUT. These are the functions that get or put values into an array. Both GET and PUT require a list for the array index. that's because arrays can be two dimensional, and making the coordinates a single list allows the same number of arguments for any array dimensions. A stack track should help see what's going on. The function is passed two array indexes to swap, which are called x and y below.


















































































startxy    
DUP2xyxy  
'a'xyxy'a' 
SWAPxyx'a'y 
1xyx'a'y1
→LISTxyx'a'{ y } 
GETxyxyval  
 
SWAPxyyvalx  
'a'xyyvalx'a' 
SWAPxyyval'a'x 
1xyyval'a'x1
→LISTxyyval'a'{ x } 
GETxyyvalxval  
 
ROT xyvalxvaly  
1xyvalxvaly1 
→LISTxyvalxval{ y }  
'a'xyvalxval{ y }'a' 
SWAPxyvalxval'a'{ y } 
ROTxyval'a'{ y }xval 
PUTxyval    
 
'a'xyval'a'   
SWAPx'a'yval   
ROT'a'yvalx   
1'a'yvalx1  
→LIST'a'yval{ x }   
SWAP'a'{ x }yval   
PUT      

The stack is used for temporaries. It starts with DUP2, because both GET and PUT will consume an index for both coordinates. From there on out, it's all about goofing around with the stack until the arguments are in the right places. One can test this function by running MKARRR, entering two indexes on the stack, calling SWAPA, and looking at the a array for the results.

We need a new shuffle that uses this routine to create a puzzle.


«
1 9 FOR x
x x RAND * FLOOR 1 + SWAPA
NEXT
» 'SHFLA' STO

Clearly, 1 9 FOR x starts a loop that runs 9 times.
Two x's are put on the stack. RAND returns a random number between zero and 1. It's multiplied by the current x, the integer part is extracted with FLOOR. Then one is added with 1 +. This produces a number from 1 to x. Finally, that first x on the stack and the computed random number between 1 and x are swapped with SWAPA. So each value can get swapped to any other position. This whole function takes about 6 seconds to run on the HP-28C. While there probably are ways to make this faster still, six seconds is fast enough.

We still need a function which turns the array into a puzzle string. It's quite simple. The CHR function turns an ASCII number into a single character string. The '+' operator concatenates strings.


«
a ARRY→ DROP ""
1 9 START
SWAP CHR +
NEXT
» 'A2S' STO

The entire a array is put on the stack as a single object. ARRY→ converts that into the individual values with the list { 9 } at the top of the stack, which is then DROPped. A zero length string is pushed onto the stack so that it can be appended to. The second line starts the loop. An ASCII value is SWAPped to the top of the stack, converted to a one character string, and appended to the running string answer. And that's all there is too it.

Finally, a modification of 'BACK' is needed that uses the new shuffle.


« CLLCD MKARRR SHFLA A2S
1 'BMV' STO
WHILE DUP "abcdefghi" ≠ REPEAT
BMV 1 DISP 'BMV' 1 STO+ DUP 2 DISP
DO UNTIL KEY END STR→ REV
END
2 DISP
"You win" 3 DISP
'BMV' PURGE
'a' PURGE
» 'ABACK' STO

And that's it. This version is quite a bit larger, but also quite a bit faster. Usually, in space vs. time trade offs, the code size is ignored. In this case, the code size is nearly everything. Have a bit of fun with it.