
Arc Forum

4 points by rocketnia 4499 days ago | link | parent

After Pauan's comment at http://arclanguage.org/item?id=17449, I think I've figured out a convoluted but surprisingly comprehensive approach Arcueid could take. This would support (most) existing Arc code, including a programming style that still uses Arc's unhygienic macros, while also supporting first-class namespaces and hygienic macros using Pauan's get-variable-box recommendation.

A programmer would not observe it to have a hyper-static global environment, because neither does Arc! Nevertheless, a hyper-static environment could be supported as a compiler option, in the sense of having alternative versions of 'eval, 'load, and the REPL. I think the important part of the hyper-static discussion was the use of first-class namespaces.

What dido's been talking about is a system like Common Lisp's or Clojure's. I haven't used either system firsthand, but it seems they both transform all unqualified symbols in a file by associating them with an implicit prefix, and both languages use unhygienic macros and support large community projects.

In response to this, Pauan was saying something about implementing first-class namespaces by using symbol replacement tables. What I'm about to describe is a spin on that: We use first-class namespaces with Pauan's notion of boxes, as I currently understand it, and we also use symbol replacement tables to stand in for CL-like symbol qualification.

I'm actually going to call these concepts "first-class (global) environments," "compilation boxes," and "(symbol) replacement rules," because otherwise I'd confuse myself. We essentially have two notions of first-class namespace at the same time, a compilation box won't be just any generic kind of box, and I'll actually represent symbol replacement rules as functions, not tables.

---

Okay, here's a comprehensive overview of the command loop:

1) Read a command as though by using 'read. The result can contain some symbols of the form "Moral::sin", and that's not a problem.

2) Code-walk over the command s-expression. For each symbol:

2a) If the symbol does not contain "::", walk over its non-ssyntax segments and replace them according to the current replacement rule. For instance, the symbol "sin.x" could become "ns/math/1-sin.ns/example/1-x". (The "1" is here in case we want to load the same file multiple times.)

2b) If the symbol begins with a non-ssyntax string followed by "::", look up that string using the current replacement rule and the current global environment, and use that value as the current replacement rule for the rest of the symbol. For instance, the symbol "Math::sin.pi" could become "ns/math/1-sin.ns/math/1-pi", and "Foo::Bar::abc:def" could first look up "ns/foo/1-Bar" and then become "ns/bar/1-abc:ns/bar/1-def".

2c) If the symbol contains "::" but has ssyntax before that, report an error. The precedence is unclear, and it's pretty much unimportant, since the programmer can almost always write out their code without using ssyntax at all.

3) Macroexpand the resulting s-expression using the usual Arc semantics. If programmers inspect the s-expressions they're manipulating here, they'll see things of the form "ns/bar/1-abc:ns/bar/1-def", but that's fine.

4) Compile the results of macroexpansion--possibly as the macroexpander goes along, as Arc 3.1 does internally.

4a) When compiling a (get-variable-box ...) form or global variable reference, look up that name in the current global environment. If it doesn't exist, create a new compilation box with a unique ID, and entirely replace the current global environment with another environment that has this binding. The compilation box's ID isn't necessarily related to symbols; it could be a numeric index or a JavaScript identifier, or whatever else the compiler will need.

4b) When compiling a literal compilation box or a global variable reference, use the compilation box's ID as part of the compilation result.

5) Execute the compiled code.
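
To make step 2 a little more concrete, here's a rough sketch in Python (chosen only because it's easy to run; none of this is Arcueid code). It assumes symbols are plain strings, that the ssyntax characters are ":", "~", "&", "." and "!", and that a replacement rule is a string-to-string function. The names rewrite_segments, rewrite_symbol, and lookup_rule are made up for illustration; lookup_rule stands in for "look this name up in the current global environment and get a replacement rule back."

```python
import re

SSYNTAX_CHARS = ":~&.!"  # assumption: the ssyntax operator characters


def rewrite_segments(text, rule):
    """Step 2a: apply `rule` to every non-ssyntax segment of `text`."""
    parts = re.split(r"([:~&.!])", text)
    # Even indexes are segments, odd indexes are the captured operators.
    return "".join(
        rule(p) if i % 2 == 0 and p else p
        for i, p in enumerate(parts)
    )


def rewrite_symbol(name, rule, lookup_rule):
    """Rewrite one symbol according to steps 2a, 2b and 2c."""
    if "::" not in name:
        return rewrite_segments(name, rule)               # step 2a
    qualifier, rest = name.split("::", 1)
    if any(c in SSYNTAX_CHARS for c in qualifier):
        raise ValueError("ssyntax before '::' in " + name)  # step 2c
    next_rule = lookup_rule(rule(qualifier))              # step 2b
    return rewrite_symbol(rest, next_rule, lookup_rule)


# Hypothetical rules matching the examples in the steps above.
def math_rule(s):
    return "ns/math/1-" + s


def example_rule(s):
    return "ns/math/1-sin" if s == "sin" else "ns/example/1-" + s


rules_by_name = {"ns/example/1-Math": math_rule}
lookup = rules_by_name.__getitem__

print(rewrite_symbol("sin.x", example_rule, lookup))
print(rewrite_symbol("Math::sin.pi", example_rule, lookup))
```

With these made-up rules, the two calls reproduce the "sin.x" and "Math::sin.pi" rewrites from 2a and 2b, and something like "a.b::c" hits the 2c error.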

---

As usual, the programmer gets to run code during macroexpansion and execution. At this point, the programmer needs access to several low-level builtins in order to take full advantage of the system. Here are some reasonable builtins we could provide:

- Create and manipulate first-class environments. An environment can just be (a function that takes (a symbol) and returns (a zero- or one-element list where the element is (a compilation box))). This representation offers no way to view the set of bound variables, but neither does Arc 3.1.

- Get, set, or dynamically bind the global environment.

- Create and manipulate a symbol replacement rule. A symbol replacement rule can just be (a function that takes (a non-ssyntax symbol) and returns (a non-ssyntax symbol)).

- Get, set, or dynamically bind the current symbol replacement rule.

- Dynamically evaluate an s-expression using a particular global environment and a particular symbol replacement rule, and return three things: The final global environment, the final symbol replacement rule, and the evaluated result.
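
As a sketch of the environment representation (again in Python, purely for illustration), an environment here is a function from a name to a zero- or one-element list, and "entirely replacing" the environment in step 4a just means building a new function. CompilationBox, empty_env, extend_env, and box_for are hypothetical names, and the numeric ID is only one of the possible ID schemes.

```python
import itertools


class CompilationBox:
    """A box with a unique ID; the ID is what compiled code would refer to."""
    _ids = itertools.count()

    def __init__(self):
        self.id = next(CompilationBox._ids)
        self.value = None


def empty_env(name):
    """An environment with no bindings: every lookup gives an empty list."""
    return []


def extend_env(env, name, box):
    """Return a new environment with one extra binding; `env` is untouched."""
    def extended(query):
        return [box] if query == name else env(query)
    return extended


def box_for(env, name):
    """Step 4a: find the box for `name`, creating and binding one if needed.

    Returns (box, env), where env is a replacement environment whenever a
    new box had to be created.
    """
    found = env(name)
    if found:
        return found[0], env
    box = CompilationBox()
    return box, extend_env(env, name, box)


env = empty_env
first, env = box_for(env, "foo")    # creates a box and a new environment
second, env = box_for(env, "foo")   # finds the same box again
print(first is second)
```

Since lookups only happen at compile time and the compiled code holds the box's ID, the chain of closures doesn't cost anything at run time.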

---

Sometimes (in parts 2a and 2b of the command loop), replacement symbols will be inserted in the midst of a single ssyntax symbol. When the replacements are interned symbols, we can just use string manipulation for this. But what if they're uninterned? We have a few different options for dealing with this:

1) It just won't happen, because all symbols in the language are interned. Even the gensyms returned by (uniq) are interned. This is true in Arc 3.1, Jarc, and Rainbow, but Anarki "fixes" it.

2) Uninterned symbols exist, but symbol replacement rules must always return interned symbols anyway, just so that they can take part in ssyntax.

3) Replacement symbols can be uninterned, but there's a dynamic error if these replacements would end up taking part in ssyntax.

4) Interned and uninterned symbols exist, but there's also a third category of symbols with some rigid nested structure. Given an interspersed list of symbols and ssyntax operators, we can construct a symbol that will execute that ssyntax properly, even if some of the original symbols are gensyms.

5) Interned and uninterned symbols exist, and every uninterned symbol is associated with an arbitrary ssexpansion result (usually itself). (In Racket versions of Arc, a weak table would suffice to implement this.) If we try to insert uninterned symbols into ssyntax, instead we make a fictional ssexpansion and create a new uninterned symbol that will ssexpand to that.

My favorite options are 3 and 5. It would be nifty to see 5 in other versions of Arc, even if it somehow breaks assumptions I made in my old code.
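
Here's one way option 5 might look, sketched in Python rather than Arc. Everything in it is an assumption for illustration: Gensym stands in for an uninterned symbol, interned symbols are plain strings, ssexpansions are lists, and only the ":" (compose) operator is modeled. Python's weakref.WeakKeyDictionary plays the role of the weak table.

```python
import weakref


class Gensym:
    """Stand-in for an uninterned symbol: two gensyms are never the same."""

    def __init__(self, hint="g"):
        self.hint = hint


# Weak table from an uninterned symbol to its fictional ssexpansion.
# Entries vanish once the symbol itself is garbage.
_expansions = weakref.WeakKeyDictionary()


def ssexpansion(sym):
    """Return the recorded expansion, or the symbol itself by default."""
    if isinstance(sym, Gensym):
        return _expansions.get(sym, sym)
    return sym  # interned symbols would be expanded by string manipulation


def compose_syms(a, b):
    """Model the ssyntax a:b when a or b may be uninterned."""
    combined = Gensym("composed")
    _expansions[combined] = ["compose", a, b]
    return combined


g = Gensym()
c = compose_syms("car", g)
print(ssexpansion(c) == ["compose", "car", g])
```

The point is just that the combined symbol is itself uninterned, so nothing ever has to turn a gensym into a string.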

---

I have to continue this in a separate post. This is the first time I've seen "That comment is too long." XD

4 points by rocketnia 4499 days ago | link

Now to build a high-level import system on top of this, the kind dido is looking for. I'll assume we're restricted to interned symbols.

Just before loading arc.arc, the initial symbol replacement rule is the identity function, and the initial environment contains only the builtins. Each builtin value is located in its own compilation box (no sharing!) and filed under its familiar name as an interned symbol. For instance, the + procedure is filed under the "+" symbol, with no prefixing.

Just before we go to the REPL or the main program file, we take a snapshot of the current global environment and the current symbol replacement rule.

Suppose we want (import foo) to do an unqualified import of everything in foo.arc, including the things it depends on. This will be very much like (load "foo.arc"), but foo.arc will see only the core utilities, not our additional definitions. Here are the steps:

1) Create a new global environment based on the core snapshot.

2) Generate a unique symbol prefix like "ns/foo/1-".

3) Create a new symbol replacement rule which checks whether or not the symbol exists in the core global environment. If it does, it's returned as-is. Otherwise, the prefix is attached.

4) Process each command in foo.arc as described above, using the created environment and replacement rule as context. Then come back to our original context.

5) Replace our current global environment with a function that tries the final foo.arc environment first and our preexisting environment second. (This lookup won't affect run time efficiency, since we use compilation boxes.)

6) Replace our current symbol replacement rule with a function that checks whether or not the symbol exists in the final foo.arc environment. If it does, the function defers to the final foo.arc replacement rule. Otherwise, it defers to our preexisting replacement rule. Now, if we write "bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
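
The six steps above can be sketched like this, in Python for convenience. It's a simplification with loudly stated assumptions: "processing foo.arc" is faked by a list of names the file defines, compilation boxes are bare objects, and in step 6 I check whether the *rewritten* name is bound in the final foo.arc environment (the steps don't say which form is checked). All function names are hypothetical.

```python
def bind(env, name, box):
    """Return a new environment that also maps `name` to `box`."""
    return lambda s: [box] if s == name else env(s)


def make_file_rule(core_env, prefix):
    """Step 3: core names pass through; everything else gets the prefix."""
    return lambda s: s if core_env(s) else prefix + s


def layered_env(first, second):
    """Step 5: try `first`, then fall back to `second`."""
    return lambda s: first(s) or second(s)


def layered_rule(file_env, file_rule, old_rule):
    """Step 6: names the file ended up binding use the file's rule."""
    return lambda s: file_rule(s) if file_env(file_rule(s)) else old_rule(s)


def import_unqualified(core_env, cur_env, cur_rule, prefix, defined_names):
    file_env = core_env                                      # step 1
    file_rule = make_file_rule(core_env, prefix)             # steps 2 and 3
    for name in defined_names:                               # step 4 (faked)
        file_env = bind(file_env, file_rule(name), object())
    new_env = layered_env(file_env, cur_env)                 # step 5
    new_rule = layered_rule(file_env, file_rule, cur_rule)   # step 6
    return new_env, new_rule


core = bind(lambda s: [], "+", object())   # a core with a single builtin
env, rule = import_unqualified(core, core, lambda s: s, "ns/foo/1-", ["bar"])
print(rule("bar"), rule("+"), rule("baz"))
```

Here "bar" rewrites to the prefixed name because the fake foo.arc defined it, while "+" and the unknown "baz" fall through unchanged.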

If we want to do a qualified import (import-as foo Foo) instead, then step 6 is different:

6) Replace our current global environment again so that "Foo" maps to the final foo.arc symbol replacement rule. Now, if we write "Foo::bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
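
A tiny Python sketch of the qualified case, under the same kind of assumptions as before: environments are functions returning zero- or one-element lists, the bound value lives in a hypothetical Box, and resolve_qualified handles only a single "Qualifier::name" with no ssyntax.

```python
class Box:
    """Minimal stand-in for a compilation box holding a value."""

    def __init__(self, value):
        self.value = value


def bind(env, name, box):
    """Return a new environment that also maps `name` to `box`."""
    return lambda s: [box] if s == name else env(s)


def resolve_qualified(symbol, env, rule):
    """Rewrite 'Qualifier::name' using the rule bound to the qualifier."""
    qualifier, rest = symbol.split("::", 1)
    found = env(rule(qualifier))
    if not found:
        raise LookupError("unbound qualifier: " + qualifier)
    return found[0].value(rest)


def foo_rule(s):
    # Hypothetical final replacement rule of foo.arc.
    return "ns/foo/1-" + s


env = bind(lambda s: [], "Foo", Box(foo_rule))   # the (import-as foo Foo) step
print(resolve_qualified("Foo::bar", env, lambda s: s))
```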

---

Whew! I know this is too much to expect anyone to read and understand it all at once, let alone to implement all at once. ^_^

Here are some things to consider, dido, if you generally like this synthesis of ideas but want to spin it up or simplify it:

In the (import ...) mechanisms I described, we filter the symbol replacement rule by querying the environment, so this mechanism actually requires both notions of first-class namespace. If we provide some other way to filter, such as letting each file build its own list of export symbols, then replacement rules could exist as the sole namespace system.

On the other hand, if we use first-class environments, we can simply bind "bar" in our namespace to a compilation box we get from foo.arc, and we don't need to fiddle with replacements like "ns/foo/1-bar" and the ssyntax issues. I think there's much more conceptual simplicity to this approach. Unfortunately, it demands the use of hygienic macros, which undermines Arc compatibility.

-----