"One of the beauties of lisp is that the prefix notation is simple, making the grammar almost trivial to parse. Bullet's reader has to work a little harder, needing one token of look-ahead to be able to handle its infix operators.."
I chose to make the tokenizer oblivious to infix syntax. The only delimiters it recognizes are quotes, unquote (comma), parens and whitespace. The infix operators[1] are implemented in a later phase, which can transform each token without needing any context around it.
Hmm, when I look at it now I realize that I could do this earlier in the pipeline. I could do this transformation before I construct the cons cells for expressions, and it'd probably be more efficient.
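To make the idea concrete, here is a minimal Python sketch of a tokenizer that knows nothing about infix syntax, followed by a per-token expansion pass. It is not Bullet's actual code. The function names, the inclusion of backquote among the quote characters, and the choice of an Arc-style `a.b` operator that expands to `(a b)` are all assumptions made for illustration. The expansion runs on the flat token stream, which is the "earlier in the pipeline" placement I was musing about.

```python
import re

# Delimiters the tokenizer recognizes: parens, quote, backquote, unquote (comma).
# Whitespace only separates tokens. Everything else is an opaque atom.
# (Treating backquote as one of the "quotes" is an assumption.)
DELIMS = set("()'`,")
TOKEN_RE = re.compile(r"[()'`,]|[^\s()'`,]+")


def tokenize(text):
    """Split text into delimiter tokens and atoms, ignoring infix syntax."""
    return TOKEN_RE.findall(text)


def is_number(token):
    """Return True if the token parses as a number, e.g. 1.5."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def expand_infix(token):
    """Expand a hypothetical `a.b` operator into the tokens of `(a b)`.

    Only the token itself is inspected, so no surrounding context is needed.
    `a.b.c` nests to the left: `((a b) c)`.
    """
    if token in DELIMS or is_number(token):
        return [token]
    parts = token.split(".")
    if len(parts) < 2 or "" in parts:
        return [token]  # no operator, or a stray leading/trailing dot
    out = [parts[0]]
    for part in parts[1:]:
        out = ["("] + out + [part, ")"]
    return out


def expand_tokens(tokens):
    """Apply expand_infix to every token and flatten the result."""
    expanded = []
    for token in tokens:
        expanded.extend(expand_infix(token))
    return expanded


if __name__ == "__main__":
    print(expand_tokens(tokenize("(f x.y 'z 1.5)")))
```

Because each token is rewritten in isolation, the pass can sit either before or after the cons cells are built. Doing it on tokens just avoids walking the tree a second time.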
I'd love to chat more about the implementation details since it seems we've dealt with a lot of the same problems. Talking to you about it has already given me two ideas for improvements, thanks.
[1] Arc calls them ssyntax; I think the name is intended to line up with s-expressions.
1. line processor: text lines are surrounded by brackets (textually!) where appropriate (i.e. 2-indents subject to nesting rules). Infix operators aren't touched. The result is stored in an array of CodeFragment, which is basically text + source code location.
2. reader: the forms are "read" and turned into B-expressions. You could call this the parsing stage, but lisp is a bit strange in that it has two levels of parsing: the reader is the first, and the second is in the...
3. interpreter/compiler: every lisp has a starting phase which takes the "low-level" S-expressions comprising the code and parses them into the true semantic expressions of the language: e.g. IfExpr, LambdaExpr, FnApplyExpr, etc.
Indeed, I think this is what makes lisp unique: all other languages have parsers which take you directly to step 3. Their AST for semantic analysis purposes is basically just the parse tree. In lisp, you have to do work to get the AST -- but that's also why you get flexibility in macro writing.
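As a rough illustration of the two levels of parsing in steps 2 and 3, here is a small Python sketch. It is not my actual implementation. Nested Python lists stand in for S-expressions, the keywords `if` and `fn` are assumed special forms, and the field names on IfExpr, LambdaExpr and FnApplyExpr are made up for the example.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class IfExpr:
    test: Any
    then: Any
    otherwise: Any


@dataclass
class LambdaExpr:
    params: List[str]
    body: Any


@dataclass
class FnApplyExpr:
    fn: Any
    args: List[Any]


def read(tokens):
    """Level 1 (reader): consume tokens and build nested lists.

    The result has no semantics yet; it is only structure.
    """
    token = tokens.pop(0)
    if token != "(":
        return token
    form = []
    while tokens[0] != ")":
        form.append(read(tokens))
    tokens.pop(0)  # drop the closing paren
    return form


def analyze(form):
    """Level 2: turn the low-level lists into semantic expression nodes."""
    if not isinstance(form, list) or not form:
        return form  # atoms and empty lists pass through unchanged
    head = form[0]
    if head == "if":  # assumed special form: (if test then otherwise)
        _, test, then, otherwise = form
        return IfExpr(analyze(test), analyze(then), analyze(otherwise))
    if head == "fn":  # assumed special form: (fn (params...) body)
        _, params, body = form
        return LambdaExpr(list(params), analyze(body))
    # Anything else is treated as a function application.
    return FnApplyExpr(analyze(head), [analyze(arg) for arg in form[1:]])


if __name__ == "__main__":
    tokens = "( if ( f x ) a b )".split()
    print(analyze(read(tokens)))
```

Macros would run between `read` and `analyze`, rewriting the plain lists before any semantic node exists, which is where the flexibility comes from.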
I'd be happy to exchange implementation details. As I mention in the article, my code resides at