Lisp in Forth

tosh | 155 points

On a related note, here's a Logo interpreter I implemented many years ago using a dialect of Forth: https://github.com/JohnEarnest/Mako/blob/master/demos/Loko/L...

RodgerTheGreat | 3 years ago

Lisp is Forth where "(" pushes next word onto separate execution stack and ")" executes it. Hence (+ 1 2) is equal to 1 2 +. I made this discovery maybe 1975, but Forth was not yet invented, or rather Internet was not yet invented, so I did not know about Forth. So it was a property of my very own Symbolic Stack Machine on Nova 1200.

timonoko | 3 years ago

This is an awesome project! I think writing a bootstrapping Lisp is probably one of the best uses for a Forth.

I was surprised that they said, "One of the more involved parts of this interpreter is the reader, where I had to do quite a lot of stack juggling to keep everything in line", and I think I can offer some useful pointers not only for the original author but also for anyone else who decides to go off and write stuff in Forth even though it's 02021.

As it happens, I coded up just Lisp READ and PRINT in Forth the other day, and I avoided doing any stack juggling at all: http://canonical.org/~kragen/sw/dev3/readprint.fs

My READ is 14 lines of Forth, and even accounting for the more horizontal (not to say cramped) layout of my code and its lack of symbol support, I think it's still significantly simpler and more readable than the 60 or so lines of Forth used here. Contrast:

    : lisp-skip-ws ( e a -- e a )
        lisp-read-char
        begin
            dup 0<> over lisp-is-ws and
        while
            drop lisp-read-char
        repeat
        0<> if
            lisp-unread-char
        endif ;
with (slightly reformatted)

    : wsp  begin  peek bl =  while  getc drop  repeat  ;
There are four simplifications here:

1. My definition of "whitespace" is just "equal to the space character BL". Arguably this is cheating, but it's a small difference.

2. I'm handling EOF with an exception inside PEEK, rather than an extra conditional case in every function that calls PEEK; this implies you have to discard whitespace before your tokens rather than after them, but that's what both versions are doing anyway.

3. I'm using a high-level word PEEK to represent the parser-level concept of "examine the next character without consuming it" rather than the implementation-level concept "dup 0<> over". This is facilitated by putting the state of the input stream into the VALUEs READP and READEND instead of trying to keep it on the stack, which would have given me a headache and wasted a lot of my time debugging. PEEK and GETC can always be called regardless of what's on the stack, while LISP-READ-CHAR only works at the beginning of an "expression".

4. The choice of the PEEK/GETC interface instead of GETC/UNGETC is also a very slight improvement. It would be less of a difference if LISP-UNREAD-CHAR were capable of unreading an EOF, but in general to the extent that you can design your internal interfaces to avoid making temporary side effects you must undo later, you will have less bugs from forgetting to undo them.

In other parts of the code the situation is somewhat worse. Consider the mental gymnastics needed to keep track of all the stack state in this word:

    : lisp-read-token ( e a -- e a a u )
        lisp-skip-ws
        0 >r
        lisp-read-char
        begin
            dup [char] ) <> over 0<> and over lisp-is-ws 0= and
        while
            token-buffer r@ + c! r> 1+ >r lisp-read-char
        repeat
        0<> if
            lisp-unread-char
        endif
        token-buffer r> ;
I didn't have a separate tokenizer except for READ-NUM, because all my other tokens were parentheses. But contrast:

    : (read-num) 0  begin  eod? if exit then
        peek [char] - =  if  -1 to (sign) getc drop
        else  peek isdigit  if  getc digit  else  exit  then  then  again ;
    \ That took me like half an hour to debug because I was confusing char
    \ and [char].
    : read-num 1 to (sign)  (read-num)  (sign) *  int2sex ;
Mine is not beautiful code by any stretch of the imagination. But contrast PEEK ISDIGIT IF GETC DIGIT ELSE EXIT THEN — in popular infix syntax, that would be if (isdigit(peek()) then digit(getc()) else return — with TOKEN-BUFFER R@ + C! R> 1+ >R LISP-READ-CHAR! Imagine all the mental effort needed to keep track of all those stack items! Avoid making things harder for yourself that way; as Kernighan and Plauger famously said, debugging is twice as hard as writing the code in the first place, so if you write the code as cleverly as you can, how will you ever debug it? You can define words to build up a token and write your token reader in terms of them:

    create token-buffer 128 allot  token-buffer value tokp
    : token-length  tokp token-buffer - ;
    : new-token  token-buffer to tokp ;  : token-char  tokp c!  tokp 1+ to tokp ;
Or, if you prefer (untested):

    create token-buffer 128 allot  variable token-length
    : new-token  0 token-length ! ;
    : token-char  token-buffer token-length @ + c!  1 token-length +! ;
Or similar variations. Either way, with this approach, you don't have to keep track of where your token buffer pointer (or length) is; it's always in tokp (or token-length), not sometimes on the top of stack and sometimes on the top of the return stack.

In this case the code doesn't get shorter (untested):

    : lisp-read-token ( e a -- e a )
        lisp-skip-ws
        new-token
        lisp-read-char
        begin
            dup [char] ) <> over 0<> and over lisp-is-ws 0= and
        while
            token-char lisp-read-char
        repeat
        0<> if
            lisp-unread-char
        endif ;
but it does get a lot simpler. You don't have to wonder what "0 >R" at the beginning of the word is for or decipher R@ + C! R> 1+ >R in the middle. You no longer have four items on the stack at the end of the word to confuse you when you're trying to understand lisp-read-token's caller. And now you can test TOKEN-CHAR interactively, which is helpful for making sure your stack effects are right so you don't have to debug stack-effect errors later on (this is an excerpt from an interactive Forth session):

    : token-type token-buffer token-length type ;  ok
    token-type  ok
    char x token-char token-type x ok
    char y token-char token-type xy ok
    bl token-char token-type xy  ok
    .s <0>  ok
    new-token token-type  ok
    char z token-char token-type z ok
This is an illustration of a general problem that afflicted me greatly in my first years in Forth: just because you can keep all your data on the stack (a two-stack machine is obviously able to emulate a Turing machine) doesn't mean you should. The operand stack is for expressions, not for variables. Use VARIABLEs. Or VALUEs, if you prefer. Divide your code into "statements" between which the stack is empty (except for whatever the caller is keeping there). Completely abjure stack operations except DROP: no SWAP, no OVER, and definitely no ROT, NIP, or TUCK. Not even DUP. Then, once your code is working, maaaybe go back and put one variable on the operand stack, with the appropriate stack ops. But only if it makes the code more readable and debuggable instead of less. And maaaybe another variable on the return stack, although keep in mind that this impedes factoring — any word you factor out of the word that does the return-stack manipulation will be denied access to that variable.

Think of things like SWAP and OVER as the data equivalents of a GO TO statement: they can shorten your code, sometimes even simplify it, but they can also tremendously impede understandability and debuggability. They easily create spaghetti dataflow.

Failure to observe this practice is responsible for most of the difficulty I had in my first several years of Forth, and also, I think, most of the difficulty schani reports, and maybe most of the difficulty most programmers have in Forth. If you can figure out how you would have written something in a pop infix language, you can write it mechanically in Forth without any stack juggling (except DROP). For example:

    v := f(x[i], y * 3);
    if (frob(v)) then (x[j], y) := warp(x[j], 37 - y);
becomes something like this, depending on the particular types of things:

    i x @  y c@ 3 *  f  v !
    v @ frob  if  j x @  37 y @ -  warp  j x !  y c!  then
Now, maybe you can do better than the mechanical translation of the infix syntax in a particular case — in this case, maybe it would be an improvement to rewrite "v ! v @" to "dup v !", or maybe not — but there's no need to do worse.

This is not to diminish schani's achievements with forthlisp, which remains wonderful! I haven't ever managed to write a Lisp in Forth myself, despite obviously feeling the temptation, just in C and Lua. Code that has already been written is far superior to code that does not exist.

But, if they choose to pursue it further, hopefully the fruits of my suffering outlined above will nourish them on their path, and anyone else who reads this.

kragen | 3 years ago

This is pretty cool especially since Forth is gaining interest in some circles.

bifrost | 3 years ago

Not this, but I wanted to make a Scheme VM and expose it directly, forth-like. A nice benefit would be a memory safe forth with advanced data types and GC.

EDIT: to clarify, the point of this of course is to write only the minimal, performance critical piece (the VM) in native code and build a complete Scheme environment on top of this. There has been many similar efforts (Scheme48, VSCM, etc), but AFAIK none of them exposed the VM directly.

FullyFunctional | 3 years ago

Forth Illustrated -- the August 1980 Byte Magazine cover:

https://archive.org/details/byte-magazine-1980-08

That's how it looks in my head, beveled stones and threads and stars and all.

DonHopkins | 3 years ago

This is funny -- my class project in Semantics of Programming Languages in '88 at Univeristy of Illinois was essentially "Forth in Lisp".

daladd | 3 years ago

Ahhhhhh. I was daydreaming this the other day

fao_ | 3 years ago

How fast is this?

asimjalis | 3 years ago

Something like this would make an excellent way to extend Collapse OS, if only it implemented GForth structs...

https://collapseos.org/

monistowl | 3 years ago