Arc Forum | Compiling Nulan to JS.

Arc Forum

Compiling Nulan to JS.

2 points by Pauan 4515 days ago | 11 comments

I'm a bit stuck. Compiling Nulan to Racket is really easy and really awesome and really fast.

The same is not true of JS. Because with JS, the compiler has to output a JS string, unlike in Racket where the compiler outputs an AST.

The most significant issue I've found with this is that in Racket, I can take a function and splice it in directly. This gives me all kinds of nifty things like avoiding a variable lookup, partial information hiding, etc.

This is not true in JS, where everything that isn't a literal has to be given a name. And inlining a literal function is much too verbose. So I gotta use names for things.

Okay, fine, I figured out a naming scheme that I think will work well. But then I hit another snag...

The fundamental problem has to do with dynamicism vs speed. I see two major ways to compile Nulan to JS:

1) Evaluate every top-level form incrementally. In other words, this program...

  (var foo 1)
  (var bar 2)
  (+ foo bar)

...would first compile and eval (var foo 1), then (var bar 2), and then (+ foo bar). This is how the Racket version of Nulan works. It's also how Arc works.

This is great because there's no real distinction between compile-time and eval-time: macros can call functions, as long as those functions are defined before the macro is called.

The downside is that this means the compiler needs to actually be available at eval-time. It also means that the Nulan source code needs to be reparsed and recompiled on the fly every time, rather than being compiled once.

This is naturally going to be a lot slower than compiling it all at once.

2) Compile all the expressions at the same time. Meaning that this:

  (var foo 1)
  (var bar 2)
  (+ foo bar)

Would compile into something like this:

  var NULAN = (function (n) {
    n.foo = 1,
    n.bar = 2;
    n.foo + n.bar;
    return n
  })(NULAN || {})

This string could be stored in a file, or evaluated all at once. If stored in a file, it would avoid the overhead of calling "eval" at runtime. This strategy can achieve speeds just as fast as hand-written JS.

The problem with this strategy is that it forces a strict distinction between compile-time and eval-time: variables defined at compile-time (including macros) cannot use any variables defined at run-time. And run-time variables cannot use anything defined at compile-time.

This also means that all of Nulan's stuff would need to be macros, basically. With strategy #1, I can actually do stuff like this:

  o["prn"] = console.log;

And now using (prn 1 2) is equivalent to using console.log(1, 2);

But since everything is now macros, it would instead be like this:

  o["prn"] = new Macro(function () {
    return ["call", [".", "console", "log"], [].map.call(arguments, macex)];
  });

As you can see, this is clunkier. So, I'm torn. On the one hand, I really want the awesome speed of #2, but the flexibility of #1 is extremely useful, and I'm really used to programming in that way in Arc, which uses #1.

Maybe what I really want is some kind of compiler that will compile to byte-code, and then have a small runtime portion that actually evaluates the byte-code... that may provide better speed while still keeping the same flexible semantics.

1 point by Pauan 4515 days ago | link

In particular, if rocketnia has any advice on bytecode, since I know you've already done projects using it...

-----

2 points by Pauan 4515 days ago | link

I just realized something... Nulan's macro system won't work in JS. The reason is because... well, first, here's a macro:

  (mac foo -> {bar qux corge})

This macro will replace (foo) with (bar qux corge). The key thing to note is that it uses the values of bar/qux/corge, not the symbols. Therefore, Nulan's macro system is completely hygienic, just like Kernel.

That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name.

I could, of course, change it so that it expands to the symbols bar, qux, and corge, and then it would work in JS... but... ew, lack of hygiene. It just wouldn't be the same language anymore.

But then I realized something else... to explain this, I'll need to explain how the Nulan compiler works. Firstly, Nulan doesn't have much of a concept of "symbol": macros expand to values (not symbols), quote is discouraged, etc.

But Nulan takes it a step further. The Nulan compiler will actually remove all symbols from the AST before evaling it in Racket.

How this works is, at compile-time, the compiler has this big dictionary which represents the environment. This environment maps symbols to mutable/immutable boxes.

If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime.

This is faaaaaaaast. There's no symbol lookup, not even for globals, thanks to Nulan's hyper-static scope.

Because of this, I'm pretty sure Nulan will end up being significantly faster than Arc. I don't think it'll be faster than Racket though, because Racket's modules are lexical, so I'm sure Racket can do the same sort of optimization. I do think it'll allow Nulan to be just as fast as Racket even though Nulan doesn't use Racket's modules.

Anyways, the point is, symbols are bound to boxes, not to values. And then these boxes are inserted into the AST. That way you can have the same symbol with different meaning:

  (var foo 1)
  (def bar -> foo + 5)

  (var foo 3)
  (bar)

The above first creates a new box and binds it to "foo". Within the function "bar", the compiler replaces the symbol "foo" with the box.

Then it creates a new box and binds it to "foo", but the compiler has already replaced the "foo" inside the function "bar" with a box, so there's no problem, and thus the call to (bar) returns 6, rather than 8.

---

Okay, that works faaaantastic in Racket, but JS doesn't have boxes, and you can't inline values into the AST anyways. So what do I do? Well, there's a little trick I'm using...

I think I'll be able to explain it easiest just by showing what the above compiles into in JS:

  var foo = 1;
  var bar = function () { return foo + 5 };

  var foo2 = 2;
  bar();

Notice that the second call to "var" created a "foo2" variable, not a "foo" variable. If you created a third "foo" variable, it'd be "foo3", etc.

Effectively, it's saving all the different "versions" of the foo variable, so that each variable is unique. And so, that's how I simulate Racket's boxes in JS.

This has another nifty side effect: on the JavaScript side, you can choose which version of the variable to call. So this gives the benefits of hyper-static scope not only to Nulan code, but also to JavaScript code that calls Nulan code.

This reminds me a lot of SSA:

http://en.wikipedia.org/wiki/Static_single_assignment_form

---

So, going back to the macro example way at the top. Consider this:

  (let bar 5 (foo))

Let's macro-expand it:

  (do (var bar 5)
      (bar qux corge))

In Arc, this would cause a name collision, oh noes, lack of hygiene! But what does Nulan compile it to?

  var bar2 = 5;
  bar(qux, corge);

Awesome, the macro remembered which version of "bar" it's using, and the variable "bar" in the "let" is replaced with "bar2". So there's no hygiene violation. With this system of dictionaries mapping symbols to symbols, I simulate boxes, and this makes Nulan-style macros hygienic, even in JS which doesn't let you insert values into the AST.

Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster.

---

I'd still like to hear rocketnia's advice on bytecode, though, for future reference.

-----

2 points by rocketnia 4512 days ago | link

"That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name."

Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't.

---

"If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime."

I did this in Penknife too, but Penknife only had mutable boxes.

Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?

---

"JS doesn't have boxes"

Sure it does! I usually use { val: _ } when I want a box.

You should probably keep using separate JavaScript variables, but here are your examples using boxes:

  // (var foo 1)
  (function () {
      env.foo = { val: 1 };
  })();
  
  // (def bar -> foo + 5)
  (function () {
      var g1 = env.foo;
      env.bar = { val: function () {
          return nulan.plus_( g1.val, 5 );
      } };
  })();
  
  // (var foo 3)
  (function () {
      env.foo = { val: 3 };
  })();
  
  // (bar)
  (function () {
      return bar();
  })();
  
  // (mac foo -> {bar qux corge})
  (function () {
      var g1 = env.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      env.foo = { val: nulan.macro_( { bar: g1 }, function () {
          return [ g1.val, g2.val, g3.val ];
      } ) };
  })();
  
  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  (function () {
      var g1 = env.foo.val.captures.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      return (function () {
          var bar = 5;
          return (0, g1.val)( g2.val, g3.val );
      })();
  })();

(The calls to nulan.macro_() and nulan.plus_() are just placeholders for whatever you would use there.)

Since I'm not using separate JavaScript variables for separate definitions, the original bar is totally out of scope in (let bar 5 (foo)), and I end up having to smuggle it in as part of foo (env.foo.val.captures.bar). I did this in Penknife, and I called the foo->bar path a "lexid." If I had to do it again, I might consider generating unique variables outside every environment, something like what you're doing.

For an easier solution all around, we can call the 'foo macro at run time, rather than relying on the compiler to expand it beforehand.

  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  nulan.eval_( env, [ "let", "bar", 5, [ "foo" ] ] );

This way, 'foo will return its own view of [ bar, qux, corge ], and we don't have to worry about bar being in scope.

As a bonus, this approach allows side effects during macroexpansion to work properly, rather than being forgotten by the compilation process.

The reason I didn't do this in Penknife is because Penknife macros suppressed parsing, the most expensive step, so this kind of compilation wouldn't have done any good.

---

"Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster."

Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now.

---

"I'd still like to hear rocketnia's advice on bytecode, though, for future reference."

I've only done a few things with bytecode, most of which involve reverse engineering GB/NES/SNES machine code and game-specific binary data. I've taken a look at bytecode generation for the JVM a few times, but I know better than to trouble myself with that. :-p

Bytecode is a term commonly used to describe binary formats for programs that run on virtual machines. It's designed to be computationally cheap to execute (e.g. JIT-compile) on multiple CPU machine code dialects.

JavaScript is the only "machine" code you're targeting, so there's not much need to build a compatibility layer. However, you might be interested in fine-tuning for particular JS engines' strengths and weaknesses. For instance, supposedly V8 compiles JavaScript a lot like C code, so it might encourage large blocks of imperative code, few variables, and heavier use of structured control flow than function calls. (Eh, I almost never write C code, so what do I know?)

-----

1 point by Pauan 4512 days ago | link

"Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't."

Ewwww. Slow. And yes I know you only need to eval once per top-level form, but the problem is that means you need to actually keep the top level forms around and the compiler/runtime needs to always be present.

Keep in mind that I want to use Nulan to write Chrome Extensions. Now imagine a popup that occurs when the user presses a button. This popup is coded in Nulan. Now, every time the user presses the button, it has to re-parse, re-compile, and re-eval all of the Nulan code that is inside the popup.

I consider that unacceptable for my desires, where the code should be about as fast as handwritten JS. So, Nulan compiles things ahead of time to a JS file, to achieve those speeds.

---

"Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?"

Well, in Racket, I just use a "let" for local scope, so no, locals aren't boxes. But I could make locals use boxes too. It would just be a lot harder to implement. At that point I might as well just write an interpreter.

And in fact, the interpreted vau version of Nulan does use boxes for all variables, global and local.

---

"Sure it does! I usually use { val: _ } when I want a box."

Yes, but then it has to do the boxing/unboxing at runtime in addition to the variable lookup. Whereas with Nulan's scheme, you get almost all the same benefits, but with just the variable lookup. The only downside is that the boxes are no longer available at runtime (but you can access the fake "boxes" at compile-time).

---

"Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now."

Yes it is a tradeoff. Yes I know it's more static than Arc or the Racket version of Nulan. But with my scheme, I can actually make things feel fairly fluid. As an example, this works:

  (var + (fn ...))

  (+ 1 2)

That is, the runtime function + shadows the macro +, so it doesn't do the macro expansion. And macros can hygienically expand to runtime values, as I already explained, thanks to the variable renaming scheme.

So, the only real downside is that you can't do this:

  (def bar -> ...)

  (mac foo ->
    (bar ...))

That is, you can't actually call a runtime function from inside the macro. But you can still expand to it:

  (mac foo ->
    {bar ...})

And you can also evaluate the "bar" function at compile-time, which would make it available:

  (&eval (def bar -> ...))

  (mac foo ->
    (bar ...))

I consider this to be a perfectly reasonable downside in exchange for increased speed.

-----

1 point by Pauan 4512 days ago | link

Your alternate code looks pretty straightforward, except for these two bits:

  env.foo = { val: nulan.macro_( { bar: g1 }, function () {
      return [ g1.val, g2.val, g3.val ];
  } ) };

  return (function () {
      var bar = 5;
      return (0, g1.val)( g2.val, g3.val );
  })();

Why does it pass { bar: g1 } as the first argument to nulan.macro_?

Why do you do (0, g1.val)? That's the same as just using g1.val.

-----

1 point by rocketnia 4512 days ago | link

"Why does it pass { bar: g1 } as the first argument to nulan.macro_?"

That object becomes the "captures" property in env.foo.val.captures.bar later. In Penknife, that's how I let code have run time access to the lexical scopes of compile-time-expanded macros.

---

"Why do you do (0, g1.val)? That's the same as just using g1.val."

I like to go out of my way not to pass a "this" parameter unless I actually want to pass it. If I were hand-writing this code, I would have probably used a fresh local variable to hold g1.val, but this this (0, g1.val)( ... ) approach is sufficient for compiler output.

-----

1 point by Pauan 4511 days ago | link

You learn something new every day:

  var o = {
    foo: function () {
      return this
    }
  }

  o.foo()       -> Object
  (0, o.foo)()  -> Window

I would have expected JS to be smart enough to return the object in both cases.

-----

2 points by rocketnia 4511 days ago | link

Well, this behavior is the one that surprised me initially:

  o.foo()       -> Object

I expected the property access and the procedure call to be two completely separate steps. Instead, there's a secret avenue that somehow exposes my local variable "o" to the callee without me explicitly passing it in!

After this encounter, I reasoned that a.b( ... ) must be its own syntax, only superficially related to a.b and b( ... ). My choice to use this syntax is explicit. This intuition worked well for a while, but then this behavior surprised me:

  (o.foo)()     -> Object
  ((o.foo))()   -> Object

So it's its own syntax, but it also supports grouping parentheses? What for?

Turns out the spec actually specifies this behavior in terms of "References." An expression doesn't just return a value; it returns a Reference that can be either resolved to a value, assigned to, deleted, passed to typeof, or called as a method call. (Maybe there are some other options too.)

In the spec's own words:

"The Reference type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators. For example, the left-hand operand of an assignment is expected to produce a reference. The behaviour of assignment could, instead, be explained entirely in terms of a case analysis on the syntactic form of the left-hand operand of an assignment operator, but for one difficulty: function calls are permitted to return references. This possibility is admitted purely for the sake of host objects. No built-in ECMAScript function defined by this specification returns a reference and there is no provision for a user-defined function to return a reference. (Another reason not to use a syntactic case analysis is that it would be lengthy and awkward, affecting many parts of the specification.)"

An ECMAScript Reference is similar to what I independently called a "fork" in Penknife. Forks were my approach to unifying Arc-style metafns, setforms, and macros into a single syntactic mechanism. A Penknife syntactic abstraction would return a fork value, and that value would have behaviors for parsing a macro call body, for setting, for getting, and for acting as a top-level command. The result of the parsing behavior would be another fork, so this branching process could handle multiple layers of brackets, metafn style. (Now I understand this as a monadic, coinductive approach. Yay big words.)

I've compared JavaScript's method invocation to metafns before: http://arclanguage.org/item?id=12094. I was somewhat supportive of the combination in that post, but I think Arc, JavaScript, and Penknife just use this technology for name punning, rather than resolving any essential conundrums of language design, so I'm not very enthusiastic about the technique in general.

P.S.: I've also casually explained this part of JavaScript's method invocation to you before: http://arclanguage.org/item?id=14665 But it was technically a different example, so no worries. ;)

-----

1 point by Pauan 4510 days ago | link

Yes, the whole "this" system in JS is pretty insane. Naturally Nulan solves this by requiring you to explicitly pass in the object. This also gives you the "call" and "apply" behavior of JS for free, in an orthogonal way.

P.S. The JS version of Nulan is coming along quite nicely. 25 working macros and 10 semi-working macros. It can handle a ton of stuff already, but it still has a long way to go!

-----

1 point by akkartik 4515 days ago | link

There might be a lesson here about why racket is an awesome design. I'd love to hear it once you figure things out.

-----

3 points by Pauan 4515 days ago | link

I've found that all of the things I really like about Racket are simply because it's a Lisp: it would be just as easy in Arc. I bet it'd also be just as easy in other Schemes or Common Lisp.

The primary benefit of Racket over JS is that: Racket represents programs as lists. These lists can have arbitrary stuff put inside them (like actual functions). Racket is highly optimized and executes code very fast. Racket also optimizes common functional-programming patterns, like the "let" macro in Arc.

But that isn't really tied specifically to Racket. As for Racket-specific stuff... I guess the fact it has nice immutable hash tables and boxes built-in? That's about the only nice Racket-specific thing I've found so far.

So I'm not really saying "Racket is awesome", I'm really saying "JS sucks, and I wish I could eval a JS AST rather than a JS string". But yes, it is cool that Racket is a Lisp, and so it makes code compilation easier than non-Lisps.

-----