Monday, April 28, 2008

Forth Extension Language For Emacs

Stevey has a rant about Emacs. It's not the usual Emacs vs. vi, or even Vim. It's about variants of Emacs: GNU Emacs and the XEmacs fork. There are hundreds of other emacsen filling various ecological niches out there.

And it got me to post a rant in the comments. Big enough to drive away some more readers from this blog had i posted it here. But it got me thinking about TECO. Raise your hand if you remember TECO. That many? Wow.

Well, there are a couple of versions of TECO around now. I've installed one on my desktop, and also on my shirt pocket computer. I found that though i remembered enough to do simple editing (insert, delete, copy, paste, change all of this to that), i'd forgotten lots of the stuff that made it such a powerful editor. Well, a quarter of a century of disuse will do that.

You see, TECO isn't just a Text Editor and COrrector, it's a Turing complete language. It would be natural to represent the Turing tape with characters in the edit buffer. On today's gigabyte machines, the tape really is essentially infinitely long. And, for the most part, TECO was really fast. And, mind you, the machines it ran on were really, really slow by today's standards. Imagine if it took Windows 100,000 times longer to boot. TECO was fast on such machines.

But today's Emacs uses Lisp as an extension language. And it seems pretty fast, except that my benchmarks show it to be 500 times slower than C on various machines. It's only really fast compared to how it used to run on smaller, slower boxes. Why is that?

Well, for some reason, Lisp is compiled to a byte code language. There's a 3x to 5x performance penalty for byte code interpretation. And, unlike Java, the byte code is not usually written to disk. So, it's write once, compile everywhere. It could be compiled to native code. But 5x is not 500x. Where does that come in? My guess is memory management. But it's just a guess.

TECO isn't compiled to byte code. The commands are one or two bytes long. The commands themselves are interpreted. There aren't very many of them, and interpretation is very fast. And, for some reason, there is no garbage collection. At least none that you'd notice.
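To make the "commands themselves are interpreted" point concrete, here is a minimal sketch of a one-character command interpreter in Python. The command letters are a toy set loosely modeled on TECO and should not be read as real TECO syntax; the function name `run` and the pointer name `dot` are assumptions made for illustration.

```python
def run(commands, text):
    """Interpret a string of one-character commands against a text buffer.

    Toy command set, loosely modeled on TECO but not real TECO syntax:
      J  move the pointer to the start of the buffer
      C  move the pointer forward one character
      R  move the pointer back one character
      D  delete the character at the pointer
    """
    buf = list(text)
    dot = 0  # current pointer position in the buffer
    for cmd in commands:
        # No compile step: each character is dispatched as it is read.
        if cmd == "J":
            dot = 0
        elif cmd == "C":
            dot = min(dot + 1, len(buf))
        elif cmd == "R":
            dot = max(dot - 1, 0)
        elif cmd == "D":
            if dot < len(buf):
                del buf[dot]
        else:
            raise ValueError(f"unknown command: {cmd!r}")
    return "".join(buf)


print(run("JCCD", "hello"))  # prints "helo"
```

With so few commands, the dispatch loop is nearly all there is to the interpreter, which is one plausible reason such a design feels fast.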

TECO is a stack language. So it should be comparable to Forth. Where TECO has a small fixed number of variables, beyond which you can't go, Forth allows the creation of an arbitrarily large number of new objects. Neither language has garbage collection, as near as i can tell. Yet stack languages are reverse Polish notation, and Lisp (and friends) are Polish notation. The one can be converted to the other mechanically. So, it's a mystery why Lisp has garbage collection and Forth does not.
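The mechanical conversion from Polish to reverse Polish notation can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: expressions are nested tuples standing in for Lisp forms, every operator takes exactly two arguments, and the names `to_rpn` and `eval_rpn` are made up for this example.

```python
import operator

# Binary operators only: without parentheses, RPN needs a fixed arity.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def to_rpn(expr):
    """Flatten a prefix expression (nested tuples) into a postfix token list."""
    if not isinstance(expr, tuple):
        return [expr]  # a bare number is already postfix
    op, left, right = expr
    return to_rpn(left) + to_rpn(right) + [op]


def eval_rpn(tokens):
    """Evaluate postfix tokens with an explicit stack, Forth style."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(tok)
    return stack.pop()


lisp_style = ("+", 1, ("*", 2, 3))   # (+ 1 (* 2 3))
forth_style = to_rpn(lisp_style)     # 1 2 3 * +
print(forth_style, eval_rpn(forth_style))  # prints [1, 2, 3, '*', '+'] 7
```

Note that the conversion only rearranges the notation. It says nothing about how either language allocates or frees memory.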

Now, i doubt that anyone wants to go back to TECO as the extension language for Emacs. I've found Forth to have a much lower barrier to entry than Lisp. So, perhaps Forth is a reasonable choice.
