After Pauan's comment at http://arclanguage.org/item?id=17449, I think I've figured out a convoluted but surprisingly comprehensive approach Arcueid could take. This would support (most) existing Arc code, including a programming style that still uses Arc's unhygienic macros, while also supporting first-class namespaces and hygienic macros using Pauan's get-variable-box recommendation.
A programmer would not observe it to have a hyper-static global environment, because Arc doesn't have one either! Nevertheless, a hyper-static environment could be supported as a compiler option, in the sense of having alternative versions of 'eval, 'load, and the REPL. I think the important part of the hyper-static discussion was the use of first-class namespaces.
What dido's been talking about is a system like Common Lisp's or Clojure's. I haven't used either system firsthand, but it seems they both transform all unqualified symbols in a file by associating them with an implicit prefix, and both languages use unhygienic macros and support large community projects.
In response to this, Pauan was saying something about implementing first-class namespaces by using symbol replacement tables. What I'm about to describe is a spin on that: We use first-class namespaces with Pauan's notion of boxes, as I currently understand it, and we also use symbol replacement tables to stand in for CL-like symbol qualification.
I'm actually going to call these concepts "first-class (global) environments," "compilation boxes," and "(symbol) replacement rules," because otherwise I'd confuse myself. We essentially have two notions of first-class namespace at the same time, a compilation box won't be just any generic kind of box, and I'll actually represent symbol replacement rules as functions, not tables.
Okay, here's a comprehensive overview of the command loop:
1) Read a command as though by using 'read. The result can contain some symbols of the form "Moral::sin", and that's not a problem.
2) Code-walk over the command s-expression. For each symbol:
2a) If the symbol does not contain "::", walk over its non-ssyntax segments and replace them according to the current replacement rule. For instance, the symbol "sin.x" could become "ns/math/1-sin.ns/example/1-x". (The "1" is here in case we want to load the same file multiple times.)
2b) If the symbol begins with a non-ssyntax string followed by "::", look up that string using the current replacement rule and the current global environment, and use that value as the current replacement rule for the rest of the symbol. For instance, the symbol "Math::sin.pi" could become "ns/math/1-sin.ns/math/1-pi", and "Foo::Bar::abc:def" could first look up "ns/foo/1-Bar" and then become "ns/bar/1-abc:ns/bar/1-def".
2c) If the symbol contains "::" but has ssyntax before that, report an error. The precedence is unclear, and it's pretty much unimportant, since the programmer can almost always write out their code without using ssyntax at all.
3) Macroexpand the resulting s-expression using the usual Arc semantics. If programmers inspect the s-expressions they're manipulating here, they'll see things of the form "ns/bar/1-abc:ns/bar/1-def", but that's fine.
4) Compile the results of macroexpansion--possibly as the macroexpander goes along, as Arc 3.1 does internally.
4b) When compiling a literal compilation box or a global variable reference, use the compilation box's ID as part of the compilation result.
5) Execute the compiled code.
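To make steps 2a through 2c concrete, here's a rough Python model of the symbol walk. Everything here is hypothetical scaffolding: `replace_symbol`, the toy rules, and the dict-based environment stand in for the real compilation machinery, and the toy replacement rule applies one uniform prefix (whereas the rule described above can send different segments to different namespaces).

```python
# Hypothetical Python model of command-loop steps 2a-2c.  A replacement
# rule is a function from symbol to symbol; the environment here is a
# plain dict from (replaced) symbols to further replacement rules.
import re

SSYNTAX = re.compile(r"([.!:~])")  # split on ssyntax chars, keep them

def replace_symbol(sym, rule, env):
    """Apply the current replacement rule to one symbol (a string)."""
    while "::" in sym:
        prefix, sym = sym.split("::", 1)
        if SSYNTAX.search(prefix):
            # Step 2c: ssyntax before "::" is an error.
            raise SyntaxError("ssyntax before '::' in " + prefix)
        # Step 2b: look up the prefix to get the rule for the rest.
        rule = env[rule(prefix)]
    # Step 2a: replace each non-ssyntax segment, leaving operators alone.
    parts = SSYNTAX.split(sym)
    return "".join(rule(p) if i % 2 == 0 and p else p
                   for i, p in enumerate(parts))

# Toy context: the current file's rule prefixes with "ns/example/1-",
# and "Math" resolves to the math file's replacement rule.
current_rule = lambda s: "ns/example/1-" + s
env = {"ns/example/1-Math": lambda s: "ns/math/1-" + s}

print(replace_symbol("sin.x", current_rule, env))
# -> ns/example/1-sin.ns/example/1-x
print(replace_symbol("Math::sin.pi", current_rule, env))
# -> ns/math/1-sin.ns/math/1-pi
```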
As usual, the programmer gets to run code during macroexpansion and execution. At this point, the programmer needs access to several low-level builtins in order to take full advantage of the system. Here are some reasonable builtins we could provide:
- Create and manipulate first-class environments. An environment can just be (a function that takes (a symbol) and returns (a zero- or one-element list where the element is (a compilation box))). This representation offers no way to view the set of bound variables, but neither does Arc 3.1.
- Get, set, or dynamically bind the global environment.
- Create and manipulate a symbol replacement rule. A symbol replacement rule can just be (a function that takes (a non-ssyntax symbol) and returns (a non-ssyntax symbol)).
- Get, set, or dynamically bind the current symbol replacement rule.
- Dynamically evaluate an s-expression using a particular global environment and a particular symbol replacement rule, and return three things: The final global environment, the final symbol replacement rule, and the evaluated result.
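As a sketch of these representations, here's a toy Python rendering, with `Box` standing in for a compilation box (all names here are hypothetical):

```python
# An environment is a function from symbol to a zero- or one-element
# list whose element is a compilation box; a replacement rule is a
# function from symbol to symbol.  Box is a one-slot mutable cell.
class Box:
    def __init__(self, value=None):
        self.value = value

def make_env(bindings):
    """Wrap a dict of symbol -> Box as an environment function."""
    return lambda sym: [bindings[sym]] if sym in bindings else []

identity_rule = lambda sym: sym       # the initial replacement rule

plus_box = Box(lambda a, b: a + b)    # each builtin gets its own box
core_env = make_env({"+": plus_box})

assert core_env("+") == [plus_box]    # bound: one-element list
assert core_env("nope") == []         # unbound: empty list, not an error
assert identity_rule("foo") == "foo"
```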
Sometimes (in parts 2a and 2b of the command loop), replacement symbols will be inserted in the midst of a single ssyntax symbol. When the replacements are interned symbols, we can just use string manipulation for this. But what if they're uninterned? We have a few different options for dealing with this:
1) It just won't happen, because all symbols in the language are interned. Even the gensyms returned by (uniq) are interned. This is true in Arc 3.1, Jarc, and Rainbow, but Anarki "fixes" it.
2) Uninterned symbols exist, but symbol replacement rules must always return interned symbols anyway, just so that they can take part in ssyntax.
3) Replacement symbols can be uninterned, but there's a dynamic error if these replacements would end up taking part in ssyntax.
4) Interned and uninterned symbols exist, but there's also a third category of symbols with some rigid nested structure. Given an interspersed list of symbols and ssyntax operators, we can construct a symbol that will execute that ssyntax properly, even if some of the original symbols are gensyms.
5) Interned and uninterned symbols exist, and every uninterned symbol is associated with an arbitrary ssexpansion result (usually itself). (In Racket versions of Arc, a weak table would suffice to implement this.) If we try to insert uninterned symbols into ssyntax, instead we make a fictional ssexpansion and create a new uninterned symbol that will ssexpand to that.
My favorite options are 3 and 5. It would be nifty to see 5 in other versions of Arc, even if it somehow breaks assumptions I made in my old code.
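Option 5 can be sketched in a few lines. This is a toy Python model under my own assumptions (the `Uninterned` class, `ssexpand`, and `splice_dot` are all illustrative names, not any implementation's API); the weak table is the same trick mentioned above for Racket-based Arcs.

```python
# Toy model of option 5: uninterned symbols are distinct objects, and a
# weak side table records each one's ssexpansion.  Splicing a gensym
# into ssyntax mints a fresh gensym whose recorded ssexpansion is the
# intended compound form.
import weakref

class Uninterned:
    def __init__(self, name):
        self.name = name

ssexpansions = weakref.WeakKeyDictionary()

def ssexpand(sym):
    # By default, an uninterned symbol ssexpands to itself.
    return ssexpansions.get(sym, sym)

def splice_dot(f, x):
    """Build the symbol that 'f.x' denotes when f is a gensym."""
    fresh = Uninterned(f.name + "." + x)
    ssexpansions[fresh] = ("call", f, x)   # a fictional ssexpansion
    return fresh

g = Uninterned("gs123-foo")
s = splice_dot(g, "bar")
assert ssexpand(g) is g
assert ssexpand(s) == ("call", g, "bar")
```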
I have to continue this in a separate post. This is the first time I've seen "That comment is too long." XD
Now to build a high-level import system on top of this, the kind dido is looking for. I'll assume we're restricted to interned symbols.
Just before loading arc.arc, the initial symbol replacement rule is the identity function, and the initial environment contains only the builtins. Each builtin value is located in its own compilation box (no sharing!) and filed under its familiar name as an interned symbol. For instance, the + procedure is filed under the "+" symbol, with no prefixing.
Just before we go to the REPL or the main program file, we take a snapshot of the current global environment and the current symbol replacement rule.
Suppose we want (import foo) to do an unqualified import of everything in foo.arc, including the things it depends on. This will be very much like (load "foo.arc"), but foo.arc will see only the core utilities, not our additional definitions. Here are the steps:
1) Create a new global environment based on the core snapshot.
2) Generate a unique symbol prefix like "ns/foo/1-".
3) Create a new symbol replacement rule which checks whether or not the symbol exists in the core global environment. If it does, it's returned as-is. Otherwise, the prefix is attached.
4) Process each command in foo.arc as described above, using the created environment and replacement rule as context. Then come back to our original context.
5) Replace our current global environment with a function that tries the final foo.arc environment first and our preexisting environment second. (This lookup won't affect run time efficiency, since we use compilation boxes.)
6) Replace our current symbol replacement rule with a function that checks whether or not the symbol exists in the final foo.arc environment. If it does, the function defers to the final foo.arc replacement rule. Otherwise, it defers to our preexisting replacement rule. Now, if we write "bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
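Steps 5 and 6 can be sketched like so. This is hypothetical Python, with environments modeled as functions returning a zero- or one-element list (per the builtins described earlier) and strings standing in for compilation boxes:

```python
# Hypothetical sketch of steps 5 and 6 of (import foo).
def make_env(d):
    return lambda sym: [d[sym]] if sym in d else []

foo_env = make_env({"ns/foo/1-bar": "box-for-bar"})  # foo.arc's result
old_env = make_env({"baz": "box-for-baz"})           # our environment
foo_rule = lambda s: "ns/foo/1-" + s                 # foo.arc's rule
old_rule = lambda s: s                               # our rule

# Step 5: layer the environments, foo.arc's bindings first.
new_env = lambda sym: foo_env(sym) or old_env(sym)

# Step 6: defer to foo.arc's rule only for symbols foo.arc defined.
def new_rule(sym):
    replaced = foo_rule(sym)
    return replaced if foo_env(replaced) else old_rule(sym)

assert new_rule("bar") == "ns/foo/1-bar"  # defined by foo.arc
assert new_rule("baz") == "baz"           # falls through to our rule
assert new_env("ns/foo/1-bar") == ["box-for-bar"]
assert new_env("baz") == ["box-for-baz"]
```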
If we want to do a qualified import (import-as foo Foo) instead, then step 6 is different:
6) Replace our current global environment again so that "Foo" maps to the final foo.arc symbol replacement rule. Now, if we write "Foo::bar" and foo.arc defines bar, it'll rewrite to "ns/foo/1-bar", which is part of our new environment.
Whew! I know this is too much to expect anyone to read and understand all at once, let alone implement all at once. ^_^
Here are some things to consider, dido, if you generally like this synthesis of ideas but want to spin it up or simplify it:
In the (import ...) mechanisms I described, we filter the symbol replacement rule by querying the environment, so this mechanism actually requires both notions of first-class namespace. If we provide some other way to filter, such as letting each file build its own list of export symbols, then replacement rules could exist as the sole namespace system.
On the other hand, if we use first-class environments, we can simply bind "bar" in our namespace to a compilation box we get from foo.arc, and we don't need to fiddle with replacements like "ns/foo/1-bar" and the ssyntax issues. I think there's much more conceptual simplicity to this approach. Unfortunately, it demands the use of hygienic macros, which undermines Arc compatibility.
"Things unique to the Arcueid implementation such as the marshalling mechanism (CIEL), the Limbo-inspired thread communication and synchronisation primitives, and so forth, must all go into their own modules to limit pollution of the global namespace with incompatible symbols as far as possible."
One of the first things I did in Arc was making a namespace system so I could pile on additional libraries (Lathe) without regret. :)
In my Lathe namespace system, I used prefix-like macros to rename variables into gensyms:
my.foo --> gs123-foo
my!foo --> 'gs123-foo
(my:foo:my:bar a b c) --> (gs123-foo (gs123-bar a b c))
Nothing but a global variable can expand as a macro in (typical) Arc, so these gensyms are the way I avoided conflict in the global namespace.
The approach currently present in Lathe has a few disadvantages as a "standard" module system for Arc:
- It's still a bit unhygienic since the prefix-like macros themselves (e.g. 'my) are kept as unmanaged global variables. I've often considered going back and standardizing on just two prefixes: my.foo and yr.foo.
- There are several features I built into the module system at the start and then never used. For instance, if you want to, you can remove a module from the module cache so that it's reloaded the next time someone requires it. If I went back to this system to start from scratch, I wouldn't bother with these features.
- I was never quite satisfied with a way to manage global tables like 'setter or uses of 'extend. In fact, I'm still pursuing a solution to these issues, but it's taking me far outside what's easy to do from within Arc (or any other language I know of).
"I was never quite satisfied with a way to manage global tables like 'setter or uses of 'extend. In fact, I'm still pursuing a solution to these issues, but it's taking me far outside what's easy to do from within Arc (or any other language I know of)."
For what it's worth, Nulan also has that problem with the "syntax-rules" object. The way I decided to solve it is to provide a macro called "w/dict!":
(var foo (obj a 1 b 2))
(w/dict! foo
  (= foo!c 5))
foo!c -> nil
Basically, any changes made to the object inside the w/dict! are not seen outside the w/dict!
So in Nulan, if a file creates some new syntax, and you want to load the file but not the syntax, you can use `w/dict! syntax-rules ...`
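If I understand w/dict! right, it behaves roughly like this snapshot-and-restore sketch. This is hypothetical Python with a dict standing in for a Nulan object, not Nulan's actual implementation:

```python
# Snapshot a dict, run a body, restore: mutations inside the block are
# not seen outside it, roughly like w/dict! as described above.
from contextlib import contextmanager

@contextmanager
def w_dict(d):
    saved = dict(d)        # shallow snapshot of the current bindings
    try:
        yield d
    finally:
        d.clear()
        d.update(saved)    # roll back to the snapshot on exit

foo = {"a": 1, "b": 2}     # (var foo (obj a 1 b 2))
with w_dict(foo):
    foo["c"] = 5           # (= foo!c 5) inside the w/dict!
    assert foo["c"] == 5   # visible inside the block
assert "c" not in foo      # foo!c -> nil outside
```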
But, knowing you, you probably meant having these kind of object dependencies automatically detected. I don't have any good ideas for that, sorry.
Well, my answer to the module problem is to just use hyper-static scope. Vastly simpler and vastly faster while providing all the benefits of a solid module system.
As far as Arc is concerned, there would need to be a couple changes:
1) A distinction between creating a new variable and assigning to an existing one. In other words, like Scheme's define vs set!
Personally, I would use (var foo 1) to mean "create a new variable foo in the current scope" and (= foo 1) to mean "assign to the already existing variable foo"
2) Mutually recursive functions would be a bit trickier, but I already solved that problem in Nulan by introducing a new macro called "defs". Conveniently, Arc already has a "defs" macro, but it's pretty much unused. It could be easily repurposed for mutually recursive functions.
Once again, though, both these changes break compatibility with Arc 3.1. But I think they would be very good changes to make. In fact, I think they're so good, I would make Arc/Nu hyper-static, if I still programmed in Arc.
If you like, I can go into more detail about the benefits/drawbacks of hyper-static scope, and also give some details about one way to implement it.
I just realized: it's possible to add in hyper-static scope to Arc while retaining full backwards compatibility. (Crazy, no?)
Here's how you do it. The definition of "=" is the same: if the variable exists, mutate it, otherwise create a new variable.
But now you add in a new primitive called "var", which always creates a new variable, even if it already exists.
Existing Arc code uses "=", so it will get the normal dynamic vars, but new code can use "var" to get hyper-static scope. And the two play nicely together, which lets you intermix dynamic/hyper-static scope as much as you want.
Oh yeah, another thing... this whole "var" thing actually allows for mutually recursive functions without using "defs", so I guess it's the ultimate compromise: all the namespace benefits of hyper-static scope with all the conveniences of dynamic scope.
The way to implement this is easy. You have a hash table (or similar) at compile-time (per scope) that maps symbols to symbols. With this Arc program...
(= foo 1)
...the hash table looks like this:
foo -> foo
Not very exciting: it simply maps the symbol "foo" to itself. But now let's use "var":
(var foo 1)
Now the hash table looks like this:
foo -> foo2
foo2 -> foo2
Which means that whenever the compiler sees the symbol "foo", it will replace it with "foo2". So that means that this Arc program...
(var bar 1)
(var bar 1)
(var bar 1)
...will get replaced at compile-time with this:
(= bar 1)
(= bar2 1)
(= bar3 1)
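That compile-time table can be sketched directly. This is a hypothetical Python model of the scheme just described; `Scope`, `resolve`, and `var` are illustrative names:

```python
# Compile-time symbol table for the hybrid scheme: "=" and plain
# references reuse the existing mapping (creating an identity entry if
# needed), while "var" always mints a fresh target symbol.
import itertools

class Scope:
    def __init__(self):
        self.table = {}                 # source symbol -> compiled symbol
        self.counter = itertools.count(2)

    def resolve(self, sym):
        """Lookup for '=' and ordinary references."""
        return self.table.setdefault(sym, sym)

    def var(self, sym):
        """Always create a new variable, even if one already exists."""
        fresh = sym if sym not in self.table else sym + str(next(self.counter))
        self.table[sym] = fresh
        return fresh

s = Scope()
assert s.var("bar") == "bar"       # (var bar 1)  compiles to (= bar 1)
assert s.var("bar") == "bar2"      # (var bar 1)  compiles to (= bar2 1)
assert s.var("bar") == "bar3"      # (var bar 1)  compiles to (= bar3 1)
assert s.resolve("bar") == "bar3"  # later references see the newest bar
```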
The only issue is handling "var" inside functions, like this:
(fn ()
  (var bar 1))
In that case, the "var" is local to the function, similar to using "let". But I'm sure you've already got all that worked out, since function arguments work.
In any case, you probably aren't yet sure why hyper-static scope is so useful. If you want, I'd be happy to go into more details about it.
"I just realized: it's possible to add in hyper-static scope to Arc while retaining full backwards compatibility. (Crazy, no?)"
I think you forgot the main reason why backwards-compatibility isn't very feasible: Macros.
; ===== util.file ====================================================
(= fun-map
  (fn (func . seqs)
    ...))
; Lets you write (map x seq (+ 1 x)) in place of
; (fun-map (fn (x) (+ 1 x)) seq).
(= map
  (mc (x seq . body)
    `(fun-map (fn (,x) ,@body) ,seq)))
; ===== this-is-fuuun.file ===========================================
(var fun-map (list " | O | X"
                   "X | X | O"
                   " | X | "))
(pr (map line fun-map (+ line "\n")))
Arc macros behave as though macroexpansion were simply about constructing some lists of symbols. But we really want each macro-inserted symbol to be looked up in that macro's lexical environment.
In Penknife, I handled macros by taking advantage of an existing assumption I was making about modules: Assume there are no side effects during the loading of the program, so that we can record the macroexpansion results to a file as a precompiled program without corrupting the program's behavior. Then any code that had a macro in scope at compile time will have a doppelganger of that macro in scope at run time. Whenever we encounter a variable during program execution, we can resolve it by looking up a macro value, accessing its lexical environment, and repeating until the original variable binding is in scope.
Penknife didn't really embrace the hyper-static global environment, but it would have been built upon the same sort of basis: Each file would have started in its own fresh environment, and some commands (namely, imports) would have worked by replacing the current environment.
"The definition of "=" is the same: if the variable exists, mutate it, otherwise create a new variable."
The behavior I'd use is that any compile-time variable access (even under a lambda) creates a new, uninitialized variable binding if a binding doesn't already exist.
; Create bindings for 'even and 'odd, then set the value of 'even.
(= even (fn (x) (case x 0 t (~odd:- x 1))))
; Set the value of 'odd.
(= odd (fn (x) (case x 1 t (~even:- x 1))))
If you wait to create the variable bindings until assignment time, then even's reference to "odd" is initially unbound, and you have to somehow associate it with the binding of 'odd created in the second line.
"I think you forgot the main reason why backwards-compatibility isn't very feasible: Macros."
I didn't forget: Nulan completely solved the macro hygiene problem after all. But that's a more extensive change so I figured I'd save it for after the basic hyper-static scope system is in place.
In fact, assuming Arcueid does implement my proposal, I would actually go in and make a new version of arc.arc that uses "var" and has hygienic macros by default. Then you could simply load up the new arc.arc to get all the shininess. But loading the old arc.arc would have full compat with existing Arc 3.1 programs.
"I think you've been resolving this by writing macros so that the procedures are inserted directly into the macro result, rather than referred to indirectly by symbols. Right?"
Nope. Macro hygiene in Nulan just uses the already existing box implementation. It's really easy, really simple, and really fast. Seriously, boxes are awesome. No need to complicate things.
The way to solve it in Arc: just provide a function called "get-variable-box" which is only available at compile-time and returns the box for the variable.
Then you change quasiquote so it uses "get-variable-box" rather than inserting the symbol directly. Bam, hygienic macros with no additional runtime cost, and extremely small compile-time cost. And they look and feel just like Arc macros, so you don't lose any power or convenience. No clunky Scheme macros, huzzah!
Once I understood that the fundamental problem was with dynamic scope, and the best way to solve it is with boxes (or similar), everything became super easy and awesome.
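My reading of the get-variable-box idea, as a toy Python model. `Box`, `macro_env`, and the rest are hypothetical stand-ins, not Nulan's actual API; the point is only that the expansion carries boxes instead of names:

```python
# Toy model of get-variable-box hygiene: at expansion time, a symbol in
# a quasiquoted template is replaced by the box it denotes in the
# macro's own environment, so later rebindings of the name can't
# capture it.
class Box:
    def __init__(self, value):
        self.value = value

macro_env = {"helper": Box(lambda x: x + 1)}

def get_variable_box(env, sym):
    return env[sym]                     # a compile-time lookup

# Quasiquote-style expansion: insert the box, not the symbol "helper".
template = [get_variable_box(macro_env, "helper"), 42]

# A user later shadows "helper" in their own environment...
user_env = {"helper": Box(lambda x: x - 1)}

# ...but evaluating the expansion still calls the macro's helper.
op, arg = template
assert op.value(arg) == 43              # not 41: hygiene preserved
```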
"The behavior I'd use is that any compile-time variable access (even under a lambda) creates a new, uninitialized variable binding if a binding doesn't already exist."
Yeah I'd do that too, if I wanted to graft dynamic variables onto a hyper-static system. But since Arc uses dynamic variables, I proposed to graft hyper-static onto it instead.
"But that's a more extensive change so I figured I'd save it for after the basic hyper-static scope system is in place."
The middle ground doesn't seem worthwhile to me. When programmers work with Arc-style unhygienic macros, at each use site, the variables in scope must (mostly) match the variables the macro author expected. So I think people who like using macros will be happiest if they systematically keep the variable names consistent across all the code in their program (even others' code), at which point namespace mechanisms just get in the way.
"Nope. Macro hygiene in Nulan just uses the already existing box implementation. It's really easy, really simple, and really fast. Seriously, boxes are awesome. No need to complicate things."
I think you caught me on a technicality. :) I see "procedures are inserted directly into the macro result" as a general approach. Mutable boxes make it possible for this approach to achieve late binding. Elsewhere in this discussion you tilt the technicality closer to my phrasing, since you recommend to let users build boxes out of getter and setter procedures.
Anyway, I'm a fan of that approach when it works, but it doesn't work so well when compilation is involved: The macroexpanded code contains unserializable values--namely, the procedures or boxes we're talking about. This is a lesson I learned with Penknife, where I at first had macros insert boxes, and then had to reengineer this so macros inserted step-by-step treasure maps for how to find a variable from the run time environment.
"Yeah I'd do that too, if I wanted to graft dynamic variables onto a hyper-static system. But since Arc uses dynamic variables, I proposed to graft hyper-static onto it instead."
How do you make the even/odd code work? Under the approach you described, the first line refers to an undefined variable (odd), and I interpret that as an error. I was recommending a fix.
"The middle ground doesn't seem worthwhile to me."
Retaining Arc compatibility in general doesn't seem worthwhile to me, but a lot of people want it, so I gave a system that retains Arc compatibility while tacking on some new shininess. Nulan doesn't have to worry about Arc compatibility, so it has pure hyper-static scope and hygienic macros by default.
"How do you make the even/odd code work? Under the approach you described, the first line refers to an undefined variable (odd), and I interpret that as an error. I was recommending a fix."
Easy: I have a macro called "defs" that handles mutual recursion:
(defs even (x)
        (if (is x 0)
          (odd (- x 1)))
      odd (x)
        (if (is x 0)
          (even (- x 1))))
The above macroexpands into this:
(= even (fn (x)
          (if (is x 0)
            (odd (- x 1)))))
(= odd (fn (x)
         (if (is x 0)
           (even (- x 1)))))
Basically, it first creates new boxes, and then it assigns the functions to the boxes. This is one of a few reasons why I prefer mutable boxes over immutable boxes. Though, you could probably have "defs" expand to a Y-combinator instead, if you really wanted immutability...
"I think you caught me on a technicality. :)"
Maybe it was just a simple miscommunication. What you were talking about sounded exactly like the technique of splicing function values using macros:
What I'm talking about happens entirely at compile-time using boxes. The effect is very similar, but the implementation is very different.
"Anyway, I'm a fan of that approach when it works, but it doesn't work so well when compilation is involved: The macroexpanded code contains unserializable values--namely, the procedures or boxes we're talking about"
Also, what's the point in serializing boxes since functions still can't be serialized? If you found a way to serialize functions, then it'd be much more useful to be able to serialize boxes.
"Easy: I have a macro called "defs" that handles mutual recursion"
While I appreciate 'defs, it's a non-answer. The even/odd example I posted and the evenp/oddp example dido posted are idiomatic Arc code. While you and I don't care much about Arc compatibility, it's something dido wants for Arcueid, so these examples should work without modification.
I'm about to disagree with myself, but first I want to reiterate and clarify what I was saying at "caught me on a technicality":
For this discussion I don't see much of a reason to distinguish between macros which insert mutable boxes and macros which insert functions. Either system can pretty much support the other as a special case: We can translate spliced boxes into spliced getter/setter functions, and we can translate spliced functions into spliced functions-in-the-box. Because of that equivalence, these systems share the disadvantage of being challenging to serialize.
If dido considers compilation to be important (do you, dido?), then this hygiene approach might be unsuitable, and thus the use of first-class namespaces might be unsuitable. (As I explained at "match the variables the macro author expected," first-class namespaces make hygiene more important.)
"What I'm talking about happens entirely at compile-time using boxes."
Ah. I think you have a point!
I seem to remember understanding this before, when you and I talked about Nulan compilation in depth, but I guess I had to retrace the steps just now.
Anyhow, get-variable-box is fantastic IMO, but first-class namespaces still might not be ideal for Arcueid due to Arc's unhygienic macros.
dido, are you comfortable with breaking existing Arc macro idioms in favor of hygiene?
I have a convoluted but surprisingly comprehensive idea of how to integrate get-variable-box into a system that's compatible with unhygienic Arc macros, but I've put it in a separate simultaneous post: http://arclanguage.org/item?id=17464
Actually, it's two separate posts, because it's otherwise too long for the forum. If this becomes a tl;dr scenario, I won't be surprised. ^_^
"While I appreciate 'defs, it's a non-answer. The even/odd example I posted and the evenp/oddp example dido posted are idiomatic Arc code. While you and I don't care much about Arc compatibility, it's something dido wants for Arcueid, so these examples should work without modification."
For this example, let's suppose there was a file "foo.arc" that contained idiomatic Arc code that implements evenp/oddp. This code works in Arc 3.1. It will work in my system as well, because undefined symbols automatically create new boxes. Basically, it'll work, but name collisions are possible, just like in Arc 3.1.
If you then write a new file "bar.arc" that uses hyper-static idioms (var, defs, etc.), it can import "foo.arc" and everything will work fine. "foo.arc" will clobber any existing evenp/oddp definitions, but "bar.arc" will not clobber "foo.arc". And of course "bar.arc" can use "w/include" and "w/exclude" to prevent "foo.arc" from clobbering things.
If you wanted to make it so that "foo.arc" behaves correctly without needing to use "w/include" and "w/exclude", you would indeed need to rewrite it to use "defs". But it's still usable even without a rewrite. So it's a perfectly graceful degradation.
My system is designed so that it can correctly use all existing Arc 3.1 code, while new code is written with the hyper-static idioms. Then, slowly, old code can be migrated to use hyper-static scope, until eventually you could make Arc purely hyper-static.
There are three issues I see with my proposal:
1) If you're writing Arc code in a hyper-static fashion, you really want "arc.arc" to be changed to be hyper-static. But old Arc code will need the non-hyper-static "arc.arc". I think the simplest solution to this is to have two versions of "arc.arc", one that uses hyper-static scope, and one that doesn't. Then you would need to make sure to load the non-hyper-static version before loading Arc 3.1 code. This could be automated a tiny bit by using a macro, something like "w/arc3".
2) "load" occurs at run-time, which is why my definition of "w/include" needed to use "eval". Nulan doesn't have this problem because file importing occurs at compile-time. Perhaps the best way to solve this is to keep "load" as-is, and add in a new "import" macro that does all its work at compile-time.
3) If you think (eventually) making Arc purely hyper-static is a bad thing, you won't like my proposal.
"Am I getting this right? This sounds very workable. :) And whaddayaknow, Nulan works. ^_-"
Yes, that's more or less correct. The one detail that's different is... Nulan doesn't have a "get-variable-box" function. The reason is because "quote" internally uses (the equivalent of) "get-variable-box". So in Nulan, rather than using "get-variable-box", you'd just use "quote". And if you want to break hygiene, you'd explicitly use the "sym" function.
I mostly followed along, but I don't understand "It will work in my system as well, because undefined symbols automatically create new boxes." You were talking about having them create new boxes at assignment time, and I was recommending compiling-a-reference-time instead so that we don't get an unbound variable error in the first definition.
How it works is, anytime the compiler sees an undefined symbol, it creates a new box for it as if it had been created with "var".
Another way to think about it is... the compiler would replace this:
(= foo (fn () ... bar ...))
(= bar (fn () ... foo ...))
...with this:
(= foo (fn () ... bar ...))
(= bar (fn () ... foo ...))
What happened is, when it encountered the undefined variable "bar", it created a new box for it. Then it encountered the undefined variable "foo", so it created a new box for it. Then it did the assignments like normal.
Given how you said "compiling-a-reference-time", I think we're talking about the same thing. Why did you mention assignment time?
Ah, sorry, huge miscommunication and misunderstanding on my part. I've actually been agreeing with you all along.
A large part of the problem is that I've been thinking about my proposal as two separate parts: one part deals with backwards compat with Arc, and the other part describes a hyper-static system for Arc.
When I was talking about "defs", I was talking about the hyper-static part. But you were talking about the backwards compat part. Hilarity (?) ensues.
That seems to describe the issues in more detail. However, I don't see how it directly solves the problems that the module system is attempting to solve. To adapt an example from the Pickaxe Book: we have implementations of the trigonometric functions like 'sin', 'cos', etc. in the system. Now, say I wanted to work on a simulation of good and evil, and defined a function called 'sin' as well, inside a file called 'moral.arc'. Then say you want to write a program to find out how many angels can dance on the head of a pin, and you need both the standard trigonometric functions and my moral.arc. If you loaded moral.arc without hyper-static scope, my definition of 'sin' would stomp on the built-in trigonometric function, and you'd be unable to use both at the same time. With hyper-static scope, code that comes after (load "moral.arc") will still see only moral.arc's definition of 'sin', in the same way. However, you could write code before loading moral.arc that used the built-in definition of 'sin', and it would be unaffected by the subsequent redefinition in moral.arc.
I do see from this, though, how a hyper-static scope could be used as the basis for the implementation of a module system (and in fact I may actually do so in Arcueid if it can really be done with full compatibility, of which I am not quite entirely convinced). It would be fairly straightforward to write my primitives as macros in an Arc with hyper-static scope.
What I'm more concerned about here is how modules are expressed, because while an underlying implementation can be easily enough changed, a poorly-designed convention for expressing and interfacing with modules might be problematic and once codebases start cropping up that use it, conversion can be hard.
"(and in fact I may actually do so in Arcueid if it can really be done with full compatibility, of which I am not quite entirely convinced)"
The system I have described is fully compatible, but it isn't "pure" hyper-static scope. It's a weird hybrid between hyper-static and dynamic, with the advantages/disadvantages of both. If you wanted to make a pure hyper-static system, like Nulan, you would have to give up Arc compatibility.
"What I'm more concerned about here is how modules are expressed, because while an underlying implementation can be easily enough changed, a poorly-designed convention for expressing and interfacing with modules might be problematic and once codebases start cropping up that use it, conversion can be hard."
I think my proposal is the best possible module system for Arc while retaining compatibility. If you gave up compatibility, it would be possible to design better systems. I am curious about what kind of conventions and interfaces you're talking about specifically, though.
I'm writing up a post giving more details on how to rename/include/exclude variables in a hyper-static system.
Okay, here's the remaining stuff. One thing we want to do is hide variables. To do this, we need a new primitive called "del". It works like this:
(var foo 1)
(def bar () foo)
(del foo)
Now, if you try to use "foo", it will throw an error saying "foo is undefined". But if you call (bar) it will correctly return 1. So, using "del" doesn't really remove the variable, it just hides it. This can be used to control which variables your library exports.
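The box mechanics behind "del" can be sketched in Python, as a stand-in for Arcueid's internals; `Box`, `var`, and `delete` are hypothetical names chosen for illustration, not an actual API:

```python
class Box:
    """A mutable cell holding one global variable's value."""
    def __init__(self, value=None):
        self.value = value

env = {}  # maps symbol -> Box; the current global environment

def var(name, value):
    env[name] = Box(value)   # a fresh box shadows any older one

def delete(name):
    del env[name]            # hides the name; existing closures keep the box

# (var foo 1)
var("foo", 1)

# (def bar () foo) -- bar captures foo's *box* at definition time
foo_box = env["foo"]
bar = lambda: foo_box.value

# (del foo)
delete("foo")

assert "foo" not in env      # later code sees "foo is undefined"
assert bar() == 1            # but bar still reads the hidden box
```

The point the sketch makes concrete: `del` only removes the name-to-box mapping, so code compiled earlier, which already holds the box itself, is unaffected.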
Another thing that would be really nice is a way to get at the actual box for a variable. To do this, we need a new function called "get-variable-box" that accepts a symbol and returns a box.
There are a few things you can do with this. One of them is to create a "w/exclude" macro which lets you hide the variables you specify.
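Here is one way "get-variable-box" and a "w/exclude" built on top of it might behave, sketched in Python. All of the helper names and the exact semantics are my assumptions about the proposal, not an existing implementation:

```python
class Box:
    def __init__(self, value=None):
        self.value = value

env = {}  # symbol -> Box

def var(name, value):
    env[name] = Box(value)

def get_variable_box(name):
    """Return the box currently bound to `name`, creating an empty one if absent."""
    if name not in env:
        env[name] = Box()
    return env[name]

def w_exclude(names, body):
    """Run `body`, then hide the listed names again, keeping their boxes alive
    so code compiled inside `body` still works."""
    saved = {n: env.get(n) for n in names}
    body()
    for n in names:
        if saved[n] is None:
            env.pop(n, None)   # the name was new: hide it from later code
        else:
            env[n] = saved[n]  # restore the pre-existing box

var("public", 1)

def library():
    var("helper", 2)
    var("exported", get_variable_box("helper").value + 1)

w_exclude(["helper"], library)

assert env["exported"].value == 3
assert "helper" not in env       # helper is hidden from later code
```

The library's internal `helper` is invisible after loading, but `exported` (which was computed using it) works fine, which is exactly the export-control behavior described above.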
We also want the ability to import only certain variables... but there are actually multiple ways to do this. You could have a primitive called "w/new-scope" that creates a new dynamic scope, similar to wrapping the expression in `(fn () ...)` except that it works dynamically rather than lexically.
Another option would be a "w/new-namespace" primitive that returns an object that maps symbols to boxes. This is more flexible, but I'm not sure how fast it would be.
I'm going to go with the "w/new-scope" route, but if you have a better idea, I'm all ears. This is one part of Nulan that isn't quite fleshed out to my satisfaction yet. Also, I'm really not fond of using "eval" here.
"I think my proposal is the best possible module system for Arc while retaining compatibility. If you gave up compatibility, it would be possible to design better systems. I am curious about what kind of conventions and interfaces you're talking about specifically, though."
Well, you did read the linked post from the Arcueid blog I used to start this discussion didn't you? I was just thinking about a simple mechanism for qualifying free variables the way most other languages such as Ruby, OCaml, and various Scheme dialects (e.g. Scheme48, Bigloo, and Guile) have. I would think that the use of hyper-static scope could provide an implementation mechanism for this simple sort of module system I envision. I think the mechanisms you have in your follow-up are rather overly complex and don't even address the simplest use case I gave in my example. How would you use the mechanism you've described to allow code that includes both math.arc and moral.arc to let code below it use both the definitions in math.arc and moral.arc at the same time? My proposal would just have:
(module Math
  (def sin (x) ...)
  ... ; many other definitions
  )

(module Moral
  (def sin (x) ...)
  ... ; many other definitions
  )
These are obviously extensions to standard Arc, but they will not interfere with most plain vanilla Arc code (unless someone just happens to use the scope-resolution :: ssyntax I've chosen in their variable names too, which I think is unlikely).
"Well, you did read the linked post from the Arcueid blog I used to start this discussion didn't you?"
Yes. It sounds vastly more complicated than hyper-static scope. With more boilerplate too. I am quite aware of that style of module system.
"I think the mechanisms you have in your follow-up are rather overly complex [...]"
Your mechanism has some immediately obvious problems, like the fact that namespace names can collide, and the fact it has extra boilerplate required in every Arc file. It's also probably slower. Hyper-static scope has none of those problems.
"I would think that the use of hyper-static scope could provide an implementation mechanism for this simple sort of module system I envision."
It seems you're not quite grasping how hyper-static scope works and what it can do. Once you see it, I think you'll stop seeing hyper-static scope as being a mere implementation strategy (for a worse module system), and just use hyper-static scope by itself.
"[...] and don't even address the simplest use case I gave in my example. How would you use the mechanism you've described to allow code that includes both math.arc and moral.arc to let code below it use both the definitions in math.arc and moral.arc at the same time?"
"You can use a plain-old 'var', or you can use something like 'w/rename'. Which parts do you see as complicated?"
Well, while your way is indeed not conceptually complicated, to my mind it creates complications in practical use. I don't think you fully realise just what the use of your proposed module system entails in an environment with lots of third-party libraries. You would need to do what amounts to a declaration of what symbols you want to use just after loading every library, placing a burden on every user of such library. You complain that my method requires boilerplate in every Arc file, but your method requires non-trivial renaming of every symbol that might conflict with another library to be loaded later after almost every library load, which I think is much worse than the two-symbol boilerplate my method requires. I don't see how that is any better, and to my mind that places a needless burden of bookkeeping onto the programmer.
Worse yet, forgetting to make such a declaration would result in a symbol getting a spurious binding in the top-level global environment, which might cause difficult to find bugs where a library defines a symbol that was used improperly in local code, so instead of getting an unbound symbol error one would get unexpected behaviour. What if, for instance, I wanted to use both moral.arc and math.arc at the same time, but didn't need the definition of sin in math.arc but needed other stuff it provided, but wanted to use the definition of sin in moral.arc. You'd say that I should just load math.arc before moral.arc but if I forgot about this and reversed the order of loading then my code might use the definition of sin in math.arc unexpectedly. And what would happen if a library happens to load another library that had naming conflicts with another library I'm using? My proposal doesn't suffer from this problem.
The purpose of a module system is to allow the programmer to control and manage name clashes, and your proposed mechanism, while I admit it can be used to accomplish the job, is much too low-level to be really useful for the kinds of use cases I envision.
True, the system I propose can have module names colliding, but it doesn't seem to be such a serious problem in actual practice for the other popular languages that make use of a similar system. And well, if I do wind up implementing hyper-static scope, that takes care of that uncommon case easily enough, and you only need to rename the module name.
Hyper-static scope is an interesting idea, and I may even actually implement your proposed hybrid version in Arcueid, but it is much too primitive on its own to provide a usable module system in my opinion.
"You would need to do what amounts to a declaration of what symbols you want to use just after loading every library, placing a burden on every user of such library."
No, you don't. It depends on what the conflict is and what you're trying to accomplish. In the easiest case, there's no extra code needed. In the hardest case, you can use something like "w/prefix", which renames all the variables in the file:
... use Math::sin and Moral::sin ...
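A sketch of what "w/prefix" could do, in Python: load a library, then rename every binding it introduced by prepending a prefix chosen by the *importer*. The load/prefix machinery here is purely illustrative, not Arcueid's or Nulan's actual API:

```python
env = {}  # symbol -> value; stands in for the global environment

def load_library(defs):
    """Stand-in for (load "math.arc"): installs the library's definitions."""
    env.update(defs)

def w_prefix(prefix, defs):
    """Load a library, then rename everything it defined with `prefix`."""
    before = set(env)
    load_library(defs)
    for name in set(env) - before:
        env[prefix + name] = env.pop(name)   # e.g. sin becomes Math::sin

w_prefix("Math::", {"sin": "trigonometric sin", "cos": "trigonometric cos"})
w_prefix("Moral::", {"sin": "a moral failing"})

# ... use Math::sin and Moral::sin ...
assert env["Math::sin"] == "trigonometric sin"
assert env["Moral::sin"] == "a moral failing"
```

Because the prefix is supplied at the import site rather than baked into the library, two libraries shipping with the same internal names can never collide in the importer's namespace.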
One problem with your proposal is that the library author decides what the prefix is. But then two different libraries can use the same prefix (imagine two libraries both using the "Math" module name).
Instead, in my system, it's the one who does the importing that decides what the prefix is. Because only the importer of the library has enough information to correctly resolve name collisions: the library author doesn't have enough information.
"What if, for instance, I wanted to use both moral.arc and math.arc at the same time, but didn't need the definition of sin in math.arc but needed other stuff it provided, but wanted to use the definition of sin in moral.arc. You'd say that I should just load math.arc before moral.arc but if I forgot about this and reversed the order of loading then my code might use the definition of sin in math.arc unexpectedly."
In that situation, your system would require you to use the module name as a prefix, increasing verbosity by quite a bit.
And no, I wouldn't say "load them in the right order". I'd say "load them in the right order, OR use w/include, OR use w/exclude, OR use w/rename, OR use w/prefix". You have many options in my system to resolve conflicts: use the one you like the best.
In that particular case, I'd probably just use w/exclude to exclude "sin" from the "math.arc" library. Much less verbosity than your system.
"The purpose of a module system is to allow the programmer to control and manage name clashes, and your proposed mechanism, while I admit it can be used to accomplish the job, is much too low-level to be really useful for the kinds of use cases I envision."
Requiring module prefixes in the case of conflict is an easier rule to follow and it has some benefits, but I wouldn't call it higher level.
"And what would happen if a library happens to load another library that had naming conflicts with another library I'm using? My proposal doesn't suffer from this problem."
Simple: you use w/exclude or w/rename or w/include or simply load them in the right order. Your proposal does have that problem because in the case of conflict, you now need to prefix the variable with the module's name, in other words, saying Math::sin rather than just sin.
With your proposal, you have boilerplate in every Arc file (the "module" form), and you have to use the module name's prefix in case of conflict.
With my system, in the very common case that there isn't any conflict, there's zero boilerplate.
And in the case where there is conflict, you can usually get by just fine by simply using w/exclude or w/include. And in the quite rare case where you need to use the same symbol from two libraries, you can either use w/rename or w/prefix.
Using w/prefix is about the same amount of boilerplate as your system, except that there's no possibility for namespace name clashes.
The benefit of hyper-static scope (aside from being simpler to understand and implement) is that you don't need to always use prefixes. There's a sliding scale, with "simply load them in the right order" being the most concise, and "w/prefix" being the most verbose.
With your system, you're stuck with full verbosity every time there's a conflict. With hyper-static scope, you have many options to resolve the conflict, with varying amounts of verbosity and control.
Your system might work out well in other languages, but I don't think it's well suited for Arc, a language that emphasizes simplicity, axioms, conciseness, and raw power.
P.S. The system I'm describing has some similarities to Factor's module system and Racket's module system, except because it's based on hyper-static scope, it's much simpler and easier to implement.
I don't know. The boilerplate that my proposal uses is also a form of documentation. So I know right away that the 'sin' function I'm using is from the Math module, or from the Moral module. If the boilerplate is too much in a given piece of code (e.g. you know that a large part of the code makes use of only the definitions in the Math module), then that's what import is for. Brevity is nice, but there is such a thing as too much brevity. Everything should be made as simple as possible, but not simpler.
I concede that your proposed system is much more general and flexible, so much so that my proposal could actually use it as a basis for its underlying implementation. Our disagreement seems to be more about actual convention and notation. You seem to feel it improper to suggest notational conventions for a module system, and you place responsibility for this squarely on the users of third-party libraries. While you indeed propose many methods for accomplishing what a module system is supposed to accomplish, if I were to study someone else's code I'd need to know which method(s) were employed there. The well-known Perl adage of "there's more than one way to do it" is a philosophy that not even Perl sticks to when it comes to organising libraries on CPAN. They have guidelines on the structure of libraries that are generally followed. Libraries that don't follow the guidelines generally see much less use, because their users need to do extra work to get them to play nice with the libraries that do follow the guidelines.
My real goal in proposing the design of a module system, as I state in the first paragraph of my original blog post, is to encourage the development of third-party libraries. I don't know that placing responsibility for the management of namespaces used by third party libraries entirely in the hands of the user of a library as you propose helps to further that goal.
With my system, if you want to you can just always use w/prefix and it will behave like your system. But you have the option to use less verbosity.
By the way, you keep mentioning how the "burden is placed on the user of the library", and that's exactly right. The library author cannot and should not be expected to predict everything that can happen. The user of the library is the only one with enough information.
You talked about a system using many third-party libraries. Let's look at how that plays out. Because there's so many libraries being used, there's a pretty good chance of conflict. In your system, that would mean that when a conflict occurs, you need to go in and change all the uses of the variable to use the prefix.
And what if you upgrade the library, or if you swap it out for another library? Now you either gotta add prefixes (in case there wasn't any), or you gotta change the prefixes.
And because doing this prefix change is a huge pain in the butt, you'd either encourage users to always prefix their variables (ala Python), or you'd need a smart IDE to do it for the user. Either way, your code ends up being a lot more verbose.
With your system, all conflict resolution happens inside the actual code itself. Which means when conflicts happen or change, you have to change your code.
With hyper-static scope, you just change the imports at the top of the file. The code itself stays the same. This is vastly less verbose and vastly more maintainable. It also opens up the possibility of something like RubyGems, with dependency information kept in a separate file.
You mentioned knowing "whether a variable is from the math module or not", but ironically, hyper-static scope handles that case wonderfully well. Because in hyper-static scope, all variables are resolved at compile-time to a unique box.
Which means that it's trivial to lookup which variables a module uses, and which module the variable was originally defined in. This could be a command-line utility, or it could be built into some IDE.
As an example of that, check out this IDE I designed for Nulan:
If you click on a variable, it will highlight it. Try entering this:
box foo = 1
box foo = 2
foo
Now try clicking on the first "foo", then the third "foo". Basically, it knows exactly which variable is which, because of boxes. And this is really really easy to do.
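The "which foo is which" trick the IDE performs can be sketched in Python; `Box`, `box`, and `env` are hypothetical stand-ins for Nulan's machinery:

```python
class Box:
    """Each (box foo = ...) creates a distinct box object."""
    def __init__(self, value):
        self.value = value

env = {}  # symbol -> Box

def box(name, value):
    env[name] = Box(value)

box("foo", 1)
first_foo = env["foo"]    # the box created by: box foo = 1

box("foo", 2)
second_foo = env["foo"]   # the box created by: box foo = 2
later_use = env["foo"]    # a later reference to foo resolves at compile time

assert first_foo is not second_foo   # two different variables named foo
assert later_use is second_foo       # so the IDE knows exactly what to highlight
```

Since every reference is resolved to a concrete box at compile time, "which definition does this name refer to?" reduces to a simple identity check, which is why the highlighting is easy to implement.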
Factor also has a wonderful integrated IDE that can do this (and more). I mention Factor because although Factor doesn't use hyper-static scope, its module system is quite similar to the system I'm proposing. The biggest difference is that in Factor, all conflicts must be explicitly resolved, whereas with hyper-static scope, some conflicts can be resolved simply by changing the load order. I don't think either style is really superior, more a matter of taste.
I do admit that requiring module prefixes is a way to add self-documentation without the use of an IDE or whatever. But that comes at the high cost of verbosity and flexibility. I personally don't think it's worth it.
If you want that kind of documentation, you're free to use w/prefix, or just use a comment at the top of the file. You might say, "but then users will be lazy and won't do it", and, well, yeah, because it's a pain in the butt. Arc doesn't strike me as the language to force users to do things they don't want to do.
And if there were a command-line utility that would tell you which variable belongs to which module, you could just autogenerate the documentation whenever you want, rather than having it hardcoded into the file.
"By the way, you keep mentioning how the 'burden is placed on the user of the library', and that's exactly right. The library author cannot and should not be expected to predict everything that can happen. The user of the library is the only one with enough information."
While I concede that this is true in general, that also does not mean that the author of a library should not be permitted to provide sensible defaults to allow someone to use the library with a minimum amount of fuss, and at the same time give the user the power to override these defaults when required.
"You talked about a system using many third-party libraries. Let's look at how that plays out. Because there's so many libraries being used, there's a pretty good chance of conflict. In your system, that would mean that when a conflict occurs, you need to go in and change all the uses of the variable to use the prefix."
In my system, every variable in a separate module in general has to have a prefix, just as Python does. That prefix is set by the library author but can be changed by the user of the library if required. You can dispense with the prefixes temporarily by using import in order to manage this verbosity.
"You mentioned knowing 'whether a variable is from the math module or not', but ironically, hyper-static scope handles that case wonderfully well. Because in hyper-static scope, all variables are resolved at compile-time to a unique box."
Doesn't help. Your program knows, but you, the programmer, can't easily know this by mere inspection of the code or a snippet of code. You may even need to compare source files from different libraries to resolve this question fully under the system you propose, or use a special-purpose IDE or tools. I think being able to tell by inspection is much more important. With my proposed system, if you see Math::sin or (import Math ... (sin x) ...) then you know where it's coming from. An import form has only local effects that end at the closing parenthesis. Use of hyper-static scope in the way you propose, on the other hand, has unpredictable global effects that can be difficult to trace. I personally don't feel the cost in verbosity and flexibility is too high a price to pay, and both can be ameliorated to a certain degree by using import.
Don't get me started on IDEs. If a language needs a special-purpose IDE in order to be usable, well, I consider that a very serious shortcoming. That's just another kind of forcing users to do things they don't want to do.
For whatever it's worth, I still technically disagree with the "all" here. I admit your approach handles the practical cases.
Your approach has developers hardcoding filenames within their source code. If a developer wants to use two files of the same name, they must find a way to segregate the files into subfolders, or they must rename a file and invade some library source code to rewrite the filename occurrences. Please let me know if I'm wrong about this. :)
The LtU post also goes over at least one use case where a library user wants to update the dependencies of the library without also updating the library itself. Fortunately, this time I think we can agree that this isn't the problem we're discussing. :) It's something I care about in a module system, but it's not directly related to name collision.
"Er, is it the other way around? It looks like at least this post of mine came after the email discussion"
Yes, but the discussion started with the Arc topic and then moved to e-mail and then moved back to the Arc topic.
"I admit your approach handles the practical cases."
Well then! I'll consider that "all", since I only care about the practical cases.
"Your approach has developers hardcoding filenames within their source code. If a developer wants to use two files of the same name, they must find a way to segregate the files into subfolders, or they must rename a file and invade some library source code to rewrite the filename occurrences. Please let me know if I'm wrong about this. :)"
Yes. It is tied to the filesystem, or website URLs, or Git commits, or whatever. But, the filesystem already enforces a "no two files with the same name in the same folder" rule, so no biggie.
If you wanted to create something that manages dependencies at a more abstract level, that's fine, and you can build it on top of my system, but I personally don't see much use for that (yet).
If your worry isn't about filenames at all, and is simply about putting filenames into the source code, I think the answer is really easy: just do something like RubyGems, where you have a standard file called "dependencies" that imports all the dependencies in the right order. And then when you want to load the library, you'd just load the "dependencies" file. This doesn't require any changes to my system, since it's purely user-convention.
This still allows for putting the dependency information straight into the source code, which is useful for quickie scripts and such. But big projects and libraries would use the "dependencies" convention. And as Ruby showed, this kind of user-convention can be applied after the language is already in use. So it doesn't need to be baked in ahead of time, though there might be some minor transition pain. But I'll worry about that once libraries and projects become big enough that a "dependencies" convention becomes useful.
P.S. Another strategy is to use boxes, which is what Nulan does. Using boxes has some extra benefits, including making macros completely hygienic, but I figured it would be easier for you to implement variable renaming.
Pauan, although I disagree with you on precisely how the hyper-static scope primitives should be used to create an actual module system, the idea seems to be general enough that it might actually be worth implementing in Arcueid in more or less the manner you describe. The only problem I have is what happens with forward references, such as arise with mutual recursion. To go with the classic example of mutual recursion:
(def evenp (x) (if (is x 0) t (oddp (- x 1))))
(def oddp (x) (if (is x 0) nil (evenp (- x 1))))
Without a special operator, it seems impossible to define something like this in pure hyper-static scope. With the sort of hybrid hyper-static scope you envision, what would the compiler do if it got this sort of definition?
The way I understand it, the compiler would create a box for oddp the way an actual binding using var would have, but this box would be empty as it were. If no subsequent definition of oddp followed, that would result in an unbound free variable error at runtime when evenp was used, exactly the way the reference Arc implementation would. If oddp were defined, however, the definition would then fill in the empty box that the reference to it in evenp created, and so the call to oddp in evenp would make use of the first definition of oddp that followed, just as a non-hyper-static system would. Subsequent definitions of oddp using var would have no effect on evenp, though, unless evenp were redefined.
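That empty-box scheme can be sketched in Python. The reading here (referencing an undefined global creates an empty box; a later definition fills that same box) is my interpretation of the hybrid described above, and the names `resolve`, `define`, and `UNBOUND` are invented for the sketch:

```python
UNBOUND = object()   # sentinel marking a box that was referenced but never filled

class Box:
    def __init__(self):
        self.value = UNBOUND

env = {}  # symbol -> Box

def resolve(name):
    """Compile-time lookup: create an empty box on first reference."""
    if name not in env:
        env[name] = Box()
    return env[name]

def define(name, value):
    resolve(name).value = value   # fill the existing box ("=" semantics)

# Compiling evenp resolves oddp's box early, while it is still empty.
oddp_box = resolve("oddp")
def evenp(x):
    if oddp_box.value is UNBOUND:
        raise NameError("oddp is undefined")   # unbound-variable error at runtime
    return True if x == 0 else oddp_box.value(x - 1)

define("evenp", evenp)
define("oddp", lambda x: False if x == 0 else env["evenp"].value(x - 1))

assert env["evenp"].value(4) is True   # the empty box was filled in later
assert env["oddp"].value(3) is True
```

If `define("oddp", ...)` were never run, calling evenp would raise the unbound-variable error at runtime, matching the behavior of the reference Arc implementation.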
One problem though is what happens when you try to redefine evenp in terms of a new oddp. Without having a way to remove the old bindings of evenp and oddp, this seems to be impossible. One of them will continue to use the old definition of the other no matter what order you define them.
Ordinary Arc wouldn't need such a macro, since the system I describe isn't purely hyper-static.
"The way I understand it, the compiler would create a box for oddp the way an actual binding using var would have, but this box would be empty as it were."
"One problem though is what happens when you try to redefine evenp in terms of a new oddp. Without having a way to remove the old bindings of evenp and oddp, this seems to be impossible. One of them will continue to use the old definition of the other no matter what order you define them."
You don't change the bindings. You mutate the box using "=". The box itself stays the same; it just has a different value at runtime.
So if you want to change oddp in such a way that evenp notices the changes, you would say this:
(= oddp ...)
And if you want to change oddp in such a way that evenp doesn't notice the changes, you would say this:
(var oddp ...)
In fact, that's how Nulan defines self-recursive and mutually-recursive functions. Using Arc syntax, this:
(def foo () ...)
Would get macroexpanded into this:
(= foo (fn () ...))
Notice that it first creates the box, and then assigns to it.
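The contrast between "=" (mutate the existing box, so earlier callers see the change) and "var" (bind a fresh box, so they don't) can be sketched in Python. The helper names are hypothetical; the semantics are the ones described above:

```python
class Box:
    def __init__(self, value=None):
        self.value = value

env = {}  # symbol -> Box

def var(name, value=None):
    env[name] = Box(value)       # a brand-new box: prior references unaffected

def assign(name, value):         # "=": mutate the current box in place
    if name not in env:
        env[name] = Box()
    env[name].value = value

# (def oddp ...) expands to (var oddp) then (= oddp (fn ...))
var("oddp")
assign("oddp", lambda x: "old")

oddp_box = env["oddp"]           # evenp captures this box at compile time
evenp = lambda x: oddp_box.value(x)

assign("oddp", lambda x: "new")  # (= oddp ...): evenp notices the change
assert evenp(1) == "new"

var("oddp", lambda x: "newer")   # (var oddp ...): evenp does NOT notice
assert evenp(1) == "new"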
If I understand this correctly, it would allow Arcueid's compiler to resolve global variables at compile-time, rather than at run-time. No need for variable renaming. There is currently a genv instruction in Arcueid's virtual machine which basically takes a symbol referring to a global variable and finds its binding in the global environment. If we do hyper-static scope, the genv instruction can be changed to take a reference to the actual global variable instead. Thus, the mappings of symbols to free variables are only required when compiling an expression.
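The genv change described here can be sketched in Python: instead of looking a symbol up in the global environment at runtime, the compiler resolves each global reference to its box once, and the emitted code just dereferences that box. The `compile_ref` helper is an illustrative stand-in, not Arcueid's actual instruction:

```python
class Box:
    def __init__(self, value=None):
        self.value = value

globals_env = {"pi": Box(3.14159)}   # symbol -> Box

def compile_ref(symbol):
    """Old genv: look up `symbol` at runtime on every access.
    New genv: resolve the symbol to its box now, at compile time."""
    box = globals_env.setdefault(symbol, Box())
    return lambda: box.value     # the symbol no longer appears in compiled code

get_pi = compile_ref("pi")
assert get_pi() == 3.14159

globals_env["pi"].value = 3.0    # mutation through the box is still visible
assert get_pi() == 3.0
```

Note that the symbol-to-box mapping is consulted only while compiling, exactly as the post says: after compilation, the emitted code holds box references and never touches the symbol table again.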