To summarize this post, I'm trying to generalize macros and quasiquotation in several ways all at once, but most of the hard problems I'm encountering center on an idea I call "higher quasiquotation":
- A quasiquotation is data shaped like an s-expression with s-expression-shaped holes.
- A quasiquotation of higher degree is data shaped like a quasiquotation with quasiquotation-shaped holes.
I already want to define macros whose bodies are not s-expressions but quasiquotations, so I already want to move up one degree on this progression. To make sure this design is correct, I want it to work even if I move up an arbitrary number of degrees. If it works, I expect one macro system to work for quasiquotation macros, regular macros, and reader macros, all in the same generalized way.
No, just like you can nest parentheses without spontaneously having quasiquotation, you can nest quasiquotations like you're talking about without reaching a higher degree of quasiquotation.
Suppose we have notations like so:
( ) parentheses
` , quasiquotation
^ $ the next higher degree of quasiquotation
(I would go on, but I'll run out of punctuation.)
S-expressions are bounded on both sides by single characters, and they nest within each other like this (labeling the layers as "a" and "b" and anything outside their lexical extent as "__"):
(a (b) a) --
If we want to look at just the "b" s-expression, it's simple to write that down on its own:
Quasiquotations are bounded on both sides by s-expressions, and they nest like this:
`(a `(b ,(a ,(--) a) (b) b) a ,(--) a (a) a) --
The meaning of "--" in the inner expressions hasn't changed: Every "--" is outside the lexical boundary of the `(a ...) quasiquotation, just as every "a" is outside the lexical extent of the `(b ...) quasiquotation. Occurrences of "--" can appear when the lexical extent closes on an s-expression (")") or a quasiquotation (","), and both of these are shown in the example.
We can isolate the "b" part pretty easily again, but we need to allow for holes in our data structure:
`(b ,(--) (b) b)
Quasiquotations of the next higher degree are bounded on both sides by quasiquotations, and they nest like this:
a ,(b (b) b) a ,(b) a `(a) a (a) a)
b ,(a) b `(b) b (b) b)
a ,(--) a `(a) a (a) a)
Occurrences of "--" can appear when the lexical extent closes on an s-expression (")"), a quasiquotation (","), or a quasiquotation of the next higher degree ("$"), and all of these are shown in the example.
This time, writing down just the "b" part gets tricky using s-expression-shaped syntax because one of the holes has orphaned sections inside (holes in the hole):
$`(-- ,(b (b) b) ,(b))
b ,(--) b `(b) b (b) b)
When we have more than one orphaned section in the same hole like we do here, we may need to use labels (or positions, or extra hole structure) to tell them apart so we can insert the "b" section into the "a" section deterministically. So far I haven't figured out how to represent this data in a way that works, let alone an elegant way.
For a concrete use case, look at it this way: When do we use ( and )? When our DSLs don't last all the way to the end of the file. When do we use ` and ,? When our DSLs have holes in them (although it might be unusual to hear this, because most Lisps couple these syntaxes together with a particular nested-list-generating DSL). When do we use ^ and $? When our DSLs have holes with holes in them. And so on, where higher degrees of quasiquotation have more and more intricate holes.
Let's say I'm writing a macro that implements a DSL where Common Lisp code and Arc code can be combined in the same function. (I'm going with Common Lisp so that we can't simply compile it to inline Racket code.) I may have Common Lisp s-expressions occurring under my Arc s-expressions and Arc s-expressions occurring under my Common Lisp s-expressions, but I want Arc variables to be visible in all the Arc parts and Common Lisp variables to be visible in all the Common Lisp parts. When we take a look at the lexical scope of any one local Common Lisp (or Arc) variable in that code, it's not simply shaped like an s-expression like traditional lexical scopes are; it has holes-with-holes wherever the Arc (respectively Common Lisp) expressions occur. So ^ and $ are a natural fit for the DSL:
Maybe I could actually use something like that ^ and $ syntax.
I'll need to generalize it to an infinite number of degrees:
Since Scheme's ,',',', technique doesn't generalize to other DSLs, I'll want to have at least one way to unquote from more than one level of nested quasiquotation at once:
Cene has more than one notation for doing that. (I can write a label on an outer layer, and then I can unquote all the way to a particular label.) I'd like to support various unquoting styles as macros, but maybe that macro system can expand to a notation like this, so getting this to work first seems best.
Finally, representing data structures with orphaned parts might be easy enough once I actually try using key-value tables to hold orphans like I've planned.
I think I like this approach even better than the one I reached in the blog post. It means that I actually can process an infinite number of degrees of quasiquotation "in the reader" rather than letting the top level of macroexpansion begin with an s-expression.
But I suppose this and the blog post tackle two different parts of the problem. The blog post's approach/challenges still apply to the process of parsing this syntax into an infinite-degree quasiquotation.
data HDExpr s m
= HDExprMedia (m (HDExprNonMedia s m))
data HDExprNonMedia s m
= HDExprHole s [Map s (HDExpr s m)]
| HDExprLayer (HDExpr s m) [Map s (HDExpr s m)]
It's not very self-documenting, so to start describing it, here's a Lispier pseudocode:
<expr> ::= <s-expression where some leaves are <paren>>
<paren> ::= (close <identifier> <list of <environment of <expr>>>)
<paren> ::= (open-and-close <expr> <list of <environment of <expr>>>)
I say "s-expression" there, but we could have any monad there for our syntax. If we work in the s-expression monad, the usual quasiquote operations ` , are parens of degree 1. When the monad we're working in is Haskell's (Writer String), our syntax is effectively reader syntax, where the ( ) parens are degree 1 and ` , are degree 2.
(There are also parens of degree 0, but they're a bit weird: Once they open, they only close again by reaching the end of the syntax. So when we're in the s-expression monad, perhaps ' is a paren of degree 0. When we're in the (Writer String) monad, perhaps pressing enter at the end of a REPL command is a paren of degree 0.)
The degree of the paren is reflected in the length of the list of environments it contains. Degree-0 parens have no environments, degree-1 parens have exactly one, and so on. If we want to represent a degree-N expression, then the parens we use are limited in a specific way:
<expr> ::= <s-expression where some leaves are <paren>>
<for some (M < N), an M-element list of
<environment of <expr>>)
<for some (M <= N), an M-element list of
<environment of <expr>>)>
The (open-and-close ...) syntax begins a nested expression. The environments serve to fill the holes in the <expr>, resuming the previous level of nesting. Each element of the list fills holes of a different degree. Holes of degree higher than the number of environments in the list are not filled; they remain holes. For instance, in `(a b (c d ,e) f), the "(" before "c" has two holes in its expression (",e" and ") f"), but it only fills the hole for ") f". The hole for ",e" is a higher degree than the "(" paren, so it's not filled at such a local place. It's only filled by the "`" paren on the outside.
The (close ...) syntax begins a hole of degree equal to the number of elements of the list. The list of environments will be used when the hole is replaced with an expression. It'll fill in the (low-degree) holes of that expression.
Note that the list lengths correspond with the degrees of holes, but not with the degrees of expressions. In fact, a degree-N expression does not contain any expressions of degrees other than N. If we consider ourselves to be working with higher quasiquotations of an infinite degree N, but every (close ...) actually occurring in our data has finite degree, then we can represent our data using finite lists. We never need to select a finite value for N!
In Haskell, the strongly typed variations I wrote do require a finite value for N to be decided beforehand (and baked into the name of the type), because I'm pretty sure that vastly simplifies the type system features I would need to use to get it to work. :-p Roughly, the difficulty is that I need 0 to give me something of kind ( * ), 1 to give me something of kind ( * -> * ), 2 to give me something of kind (( * -> * ) -> ( * -> * )), and so on, so I would need kinds that depend on values. Agda or Idris might already be up for this task, but I doubt that's going to be the kind of effort I want to spend since I intended to build my macroexpander in (untyped) Racket to begin with.
There might just be a little more I want to tinker with in Haskell, because on my way to defining this data structure I started to develop some code for "higher monads," and it would be fun if this data structure turned out to be an instance of that concept. Still, at this point I'm probably ready to go back to Racket.
I'm even hopeful again that this macro system might play nicely with Racket's. Racket has some hygiene features that walk recursively over the inputs and outputs of macros, expecting them to be s-expression-shaped with no holes. I wasn't sure that Racket's technique could be reconciled with higher quasiquotation. With this data structure, the exotic nesting of higher quasiquotations is converted to the traditional kind of nesting that Racket expects.
So, this might be a usable self-contained Racket library in the end -- not even a framework but a seamless library -- which is what I want, because that would make it easy to clarify the meaning of higher quasiquotation in terms of an existing ecosystem before I use it in Cene. :)
Since this thread and my code comments contain some walls of text, I'll see if I can convert them to a blog post soon.
 In particular, Racket uses a hygiene technique where it attaches a fresh scope label to a piece of syntax before macroexpanding and then flips the presence of that scope after macroexpanding so that the scope winds up attached to only the parts of the macro result that don't come from the input (making that region more local than the rest). Racket also has a "syntax taints" feature which lets macros attach "dye packs" to their results so that the identifiers occurring in those results can't be used by a client to access private module bindings.
Quite a while ago I silently made the decision to keep my opinions on language design to myself most of the time unless I had a language to show for it. :) Cene has come further along than any language I've made before. Part of that is because I learned some skills at work, and part of that is because I didn't want to bring it here until it was in a very polished state.
Cene has currently stalled a bit as I try to generalize the macro system to support quasiquotation macros, reader macros, and better error messages. This post touches on that topic a bit, particularly the error message part. In the 16 days since this post, I've tried to work on an implementation of this new macro system. My initial attempts were full of errors, and as I keep trying to organize my thoughts, I've learned a lot about Racket macros and category theory, but I still haven't found a way to implement the macro system. In a couple of days I'm planning to write another post about where I've gotten with that and what the difficulties are.
Arc is a pretty small codebase built on Racket, so I think if there's an answer to "Can Racket be used to write an OS?" then the Arc answer follows close behind.
I can imagine two distinct ways you might be interested in OS development, so I'll answer twice. :)
Personally, when I think about writing an OS, I'm motivated mainly by the prospect to design new ways that "applications" can be obtained, upgraded, and integrated with each other.
I would not want to develop low-level memory management code and device drivers. That would open up a can of worms in terms of security issues, not to mention the expense of keeping up to date on the latest hardware and the latest machine-code-based security topics. So my "OS" would ideally sit on top of another OS kernel.
For that kind of application, Racket could be great. Racket has tools for running computations in sandboxes with limited memory, so it should be practical to run a new kind of "application" this way as well. Looks like there's a way to run any X Window System client program as a kiosk (https://raspberrypi.stackexchange.com/questions/11866/how-ca...), which can probably be used to run a Racket GUI application. So it seems like all the tools exist to make a Racket program take the place of an OS desktop and manage its own applications.
Since Racket has these sandboxing and GUI features, technically Arc can invoke them too, since Arc can run Racket code. However, I don't think I've seen anyone use these features in Arc code yet, so it might be easiest to learn them in the context of writing a program in Racket itself.
On the other hand, if your goal is to do things that involve machine code, kernel privileges, and device drivers, you'll pretty much want a language where the run time semantics is very similar to machine code, with the ability to refer to specific memory addresses and the ability to initialize and manipulate its own call stacks. In many cases where you're using functionality specific to an architecture, such as booting up and initializing a first call stack, you'll need to write assembly code so that you know precisely what machine code is being generated.
There are many "C replacement" languages out there, but the one that I consider most approachable right now is Rust. It's easy to find a number of tutorials and examples of OS programming in Rust, and Rust is sort of a codification and streamlining of many techniques that have caught on in C, so the skills may be somewhat transferable between those languages.
In short: Racket (and hence Arc) has a lot of the high-level tools for making OS-like systems, but it's probably not good for the parts of an OS that require meticulous attention to machine code. I recommend something like Rust for those.
What OS skills or projects are you thinking of pursuing anyway? :)
I don't know if I'd ever want to use this 2D syntax directly, but I'll probably "use" it as a thought experiment for non-hierarchical syntaxes, along with cellular automata and spreadsheets. :)
Can you imagine an 'unquote-splicing operation for this tabuleau syntax? Between lists, 'unquote-splicing can insert any list into any other, but the cells of these tables have externally imposed sizes and shapes, so for an 'unquote-splicing in a table (if we wanted one at all), we'd probably want to enforce that the sizes and shapes match up in a way that makes sense. What might make more sense is to do 'unquote-splicing in a full row or column at once, because at least we get one dimension of freedom that way. If the splice takes up a full row, then it can splice in any number of rows (of any height) to that location.
I'm probably a little obsessed with quasiquotation lately. I've been trying to write an implementation of a macro system where the meanings of notations like 'unquote, 'unquote-splicing, and parentheses themselves can be treated as user-defined macros, and where syntax-bound concepts like source locations and syntax highlighting are mostly managed automatically in the macroexpander rather than something every macro must deal with (except for the macros that do something unusual with them).
Since I was delving into the Arc codebase and my Lathe codebase anyway, I made a bunch of fixes to Anarki, Anarki's stable branch, arc/nu's arc/3.1 language, and Rainbow.js today. It's enough to run a simple check of Lathe's module system on Windows, which meant I fixed a bunch of Windows bugs in particular! My Lathe sanity checks also worked fine on Rainbow, Jarc, ar, and the official Arc 3.1.
I understand, I think. I think it adds regularity, which is a benefit and drawback, like it is with Lisp's parentheses. But I got used to Arc's (a:b c) at a time when I didn't need a strong reason; the fact that someone put it in a language was enough to convince me it was worth a try, and by "try" I mean using it everywhere I could. :)
This Parendown syntax tackles the a lot of the same needs as quotation sugars, right-associative infix operators, multi-branch conditionals, and imperative blocks. Those individualized pleasantries no longer need to exist as much, but some of what was pleasant about them might slip through the cracks and be neglected entirely by the more regulated approach. Lisp's parentheses didn't solve the things this does, and this doesn't solve some things that other syntaxes do (such as left-associative infix operators, perhaps).
Programming this way in Cene for a while, one thing I keep wanting to reintroduce is quotation sugar. But that's probably unrelated; Cene has a more elaborate syntax for quotation to help with programs that involve several nested quotes, and since I only actually use it in shallow cases, it's a bit hard to explain why I've made it so elaborate. :-p A sugar would brush some of that complexity under the rug until it's useful.
I developed Parendown over the last couple of days as my first foray into Racket #lang extension.
This is a syntactic feature I've had in the Era languages for a long time, and it has such a drastic effect on the language's appearance that I always feel like I need to go out of my way to explain it. It's just a simple s-expression sugar inspired by Arc's a:b ssyntax, but it's so handy.
Er, I upvoted your "sorry" just now because I did find it interesting that those archive links exist. :) I guess the Racket project doesn't have such tireless devotion to documentation that they maintain active versions of the docs for old software releases, but it's nice that a snapshot is up somewhere.