New version of arc2js! (github.com)
4 points by Pauan 4803 days ago | 2 comments


3 points by Pauan 4803 days ago | link

It's been 15 days since I've pushed any changes to arc2js, so I'll describe some of the major changes.

For starters, I basically threw out the old compiler (which was single-pass) and started over by writing a multi-pass compiler. Before you start cringing and throwing tomatoes at me, I only rewrote the compiler, which means that most of my old work is still there, just with some minor tweaks.

This new compiler is much better than the old one. It's roughly comparable in number of lines, but it was much easier to implement and is much easier to reason about and understand, and it supports some really nifty features that the old compiler basically would have barfed on.

Probably the biggest change is optimizations. I've put quite a bit of work into optimizing common Arc idioms into super-efficient JS code. I figure if I can optimize basic things like "let" and "do" well enough, it'll be fast, even if I ignore optimizations in less-commonly-used things.

My old compiler optimized "let" blocks, but only in very specific situations: it had to be the only expression in the function's body, and it had to be in statement position. My new compiler optimizes "let"s almost everywhere, even in expression position and at the global scope.

Here's an example:

  (fn ()
    (foo (let a 10 a)))

Let's compile it:

  function () {
      var a = 10;
      return foo(a)
  };
  
Woah, neat, it compiled into a "var" statement. But what about this...?

  (fn ()
    (if a (let a 10 a)))
    
It will compile into this:

  function () {
      var b;
      return a && (b = 10, b)
  };
  
Huh? What's going on here? There are multiple phases in my compiler. One is the "shorten" phase. That phase's job is to take local variable names and replace them with gensyms. Later on, those gensyms can be replaced with an identifier like "a" or "b", etc.

This, combined with static free variable analysis, enables me to safely optimize things like "let" blocks into "var" statements, because the renaming keeps a hoisted variable from clashing with other names in scope (like the free "a" in the example above).
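
To make the renaming idea concrete, here is a minimal JavaScript sketch. It is not arc2js's actual code: forms are modeled as nested arrays with strings standing in for symbols, only "let" is treated as a binding form, and the names freeVars, shorten and shortName are hypothetical. It also picks the final short names directly instead of going through gensyms first.

```javascript
// Hypothetical model: a form is a nested array, symbols are strings,
// numbers are numbers. Only (let name value body...) binds a variable.

// Collect every symbol that is used without being bound by a "let".
function freeVars(form, bound = new Set(), free = new Set()) {
  if (typeof form === "string") {
    if (!bound.has(form)) free.add(form);
  } else if (Array.isArray(form)) {
    if (form[0] === "let") {
      const [, name, value, ...body] = form;
      freeVars(value, bound, free);
      const inner = new Set(bound).add(name);
      body.forEach((b) => freeVars(b, inner, free));
    } else {
      form.forEach((f) => freeVars(f, bound, free));
    }
  }
  return free;
}

// 0 -> "a", 1 -> "b", ..., 25 -> "z", 26 -> "a1", ...
function shortName(n) {
  const letter = String.fromCharCode(97 + (n % 26));
  return n < 26 ? letter : letter + Math.floor(n / 26);
}

// Give every let-bound local a fresh short name that no free variable uses.
function shorten(form) {
  const taken = freeVars(form);
  let counter = 0;
  function fresh() {
    let name;
    do {
      name = shortName(counter++);
    } while (taken.has(name));
    return name;
  }
  function walk(f, env) {
    if (typeof f === "string") return env.has(f) ? env.get(f) : f;
    if (!Array.isArray(f)) return f;
    if (f[0] === "let") {
      const [, name, value, ...body] = f;
      const newValue = walk(value, env); // value sees the outer scope
      const newName = fresh();
      const inner = new Map(env).set(name, newName);
      return ["let", newName, newValue, ...body.map((b) => walk(b, inner))];
    }
    return f.map((x) => walk(x, env));
  }
  return walk(form, new Map());
}

console.log(JSON.stringify(shorten(["if", "a", ["let", "a", 10, "a"]])));
// prints ["if","a",["let","b",10,"b"]]
```

Because the free "a" is already taken, the let-bound "a" becomes "b", which is the same renaming visible in the compiled output above.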

Another phase is the "optimize" phase, which does what it says it does. But the way it does it is pretty cool. Most of the phases deal with plain old Arc lists; they don't deal with JS code at all. What basically happens is, when the optimize phase sees something like this:

  (foo (let a 10 a))
  
It will rewrite it to this:

  (let a 10
    (foo a))
    
And now because the "let" is on the outside, it's in statement position, which means my compiler can rewrite it to a "var" statement. That's fine and dandy if the "let" is the first argument, but what about this?

  (if a (let a 10 a))
  
It'll be rewritten to this:

  (let b nil
    (if a (do (= b 10) b)))
    
We still move the "let" to the outside, but now we initialize it to nil and do the assignment inline. This works no matter how nested the "let"s are. Let's consider a more complex example. This time, we're gonna generate some DOM nodes using the dom.arc library:

  (div foo "bar"
    (div bar "qux"
      (div qux "corge"
        (div corge "foobar")))
    (div yes "no"))
    
First, let's turn off optimizations and see what the above compiles into:

  (function (a) {
      a.setAttribute("foo", "bar");
      a.appendChild((function (a) {
          a.setAttribute("bar", "qux");
          a.appendChild((function (a) {
              a.setAttribute("qux", "corge");
              a.appendChild((function (a) {
                  a.setAttribute("corge", "foobar");
                  return a
              })(document.createElement("div")));
              return a
          })(document.createElement("div")));
          return a
      })(document.createElement("div")));
      a.appendChild((function (a) {
          a.setAttribute("yes", "no");
          return a
      })(document.createElement("div")));
      return a
  })(document.createElement("div"));
  
Okay, so we can clearly see the nested structure caused by the "let" blocks (which is pretty cool), but we can do better. Let's turn on optimizations:

  (function (a) {
      a.setAttribute("foo", "bar");
      var b = document.createElement("div"),
          c,
          d;
      b.setAttribute("bar", "qux");
      c = document.createElement("div");
      c.setAttribute("qux", "corge");
      d = document.createElement("div");
      d.setAttribute("corge", "foobar");
      c.appendChild(d);
      b.appendChild(c);
      a.appendChild(b);
      var e = document.createElement("div");
      e.setAttribute("yes", "no");
      a.appendChild(e);
      return a
  })(document.createElement("div"));
  
Woah! Much better. Notice how it took our nested (and inefficient) code and optimized it into flat super-efficient code using "var" statements. This lets you use Arc idioms, and know that it'll be expanded into fast JS code.
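
The two rewrite rules described earlier (move an unconditional "let" to the outside as-is, and move a conditional one outside with a nil initializer plus an inline assignment) can be sketched in JavaScript roughly like this. This is an illustration under assumptions, not the real optimize phase: forms are nested arrays with strings as symbols, hoistLets is a hypothetical name, locals are assumed to have been uniquely renamed already, and the sketch ignores evaluation order, so a side-effecting "let" value could end up running earlier than in the original code.

```javascript
// Hypothetical sketch of let-hoisting over nested-array forms.
// Assumes every let-bound name is already unique.
function hoistLets(form) {
  const hoisted = []; // [name, initializer] pairs, outermost first

  function walk(f, conditional) {
    if (!Array.isArray(f)) return f;
    // Sketch limitation: never hoist across a function boundary.
    if (f[0] === "fn") return f;
    if (f[0] === "let") {
      const [, name, value, ...body] = f;
      const newValue = walk(value, conditional);
      // Record the binding before walking the body so that lets nested
      // in the body end up inside this one.
      hoisted.push([name, conditional ? "nil" : newValue]);
      const newBody = body.map((b) => walk(b, conditional));
      if (conditional) {
        // The value may only be computed when the branch actually runs.
        return ["do", ["=", name, newValue], ...newBody];
      }
      return newBody.length === 1 ? newBody[0] : ["do", ...newBody];
    }
    if (f[0] === "if") {
      const [, test, ...branches] = f;
      // Conservatively treat everything after the first test as conditional.
      return ["if", walk(test, conditional), ...branches.map((b) => walk(b, true))];
    }
    return f.map((x) => walk(x, conditional));
  }

  const body = walk(form, false);
  // Wrap the rewritten body in the hoisted lets, innermost last.
  return hoisted.reduceRight(
    (inner, [name, init]) => ["let", name, init, inner],
    body
  );
}

console.log(JSON.stringify(hoistLets(["foo", ["let", "a", 10, "a"]])));
// prints ["let","a",10,["foo","a"]]
console.log(JSON.stringify(hoistLets(["if", "a", ["let", "b", 10, "b"]])));
// prints ["let","b","nil",["if","a",["do",["=","b",10],"b"]]]
```

Once every "let" sits on the outside like this, a later phase can emit each one as a "var" statement, which is how the flat output above comes about.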

Now, I said my compiler optimizes "let"s at the global scope. Let me give you an example of what I'm talking about. First, the Arc code:

  (do (let a 5 a)
      (let a 10 a)
      (let a 15 a))
      
Let's compile it without optimizations:

  (function (a) {
      return a
  })(5);
  
  (function (a) {
      return a
  })(10);
  
  (function (a) {
      return a
  })(15);
  
Okay, so it creates one function per "let" block, perfectly normal. Let's turn on optimizations now:

  (function (a, b, c) {
      a;
      b = 10;
      b;
      c = 15;
      return c
  })(5);

Huh! Would you look at that. Now it only created a single function, with the 2nd and 3rd "let" blocks being ordinary variables inside of it. What's especially cool about this is that it's transforming Arc code into Arc code, like a macro.

Thus, theoretically, an Arc compiler like ar could make these same optimizations, flattening out nested "let" blocks. In practice, though, Racket probably does optimizations of its own, so that probably wouldn't help much. But it's definitely very useful for an Arc to JS compiler like arc2js.

As a final example, let's consider the "each" macro. In JS, the idiom for looping is the "for" loop:

  for (var i = 0; i < foo.length; i++) {
      ... foo[i] ...
  }
  
You end up using the humble "for" loop a lot. In Arc, however, you can just do this:

  (each x foo
    ... x ...)
    
What does the above compile into...?

  (function (a) {
      var b = 0,
          c;
      while (b < a) {
          c = foo[b];
          ...
          c;
          ...
          ++b
      }
  })(foo.length);
  
Well, dang. Look at all that boilerplate! But you don't need to worry about that: you can just use "each" and rest assured it will result in super-fast code.
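
To see that the generated shape behaves like the hand-written "for" loop, here is a small runnable comparison. The array contents and the loop body (doubling each element) are made up for illustration; only the loop skeleton follows the compiled output above.

```javascript
// Illustrative data and body; not from arc2js itself.
const foo = [1, 2, 3];

// The hand-written JS idiom.
const viaFor = [];
for (var i = 0; i < foo.length; i++) {
  viaFor.push(foo[i] * 2);
}

// Same skeleton as the compiled "each": the length is read once and
// passed in as `a`, `b` is the index and `c` is the current element.
const viaEach = [];
(function (a) {
  var b = 0,
      c;
  while (b < a) {
    c = foo[b];
    viaEach.push(c * 2);
    ++b;
  }
})(foo.length);

console.log(viaFor, viaEach); // both are [ 2, 4, 6 ]
```

One difference worth knowing: the compiled form reads foo.length a single time, so if the loop body changes the array's length, it will not behave the same as the "for" loop, which re-checks the length on every iteration.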

At this point I consider arc2js suitable for production use (i.e. I would use it for my own code).

-----

2 points by Pauan 4803 days ago | link

Oh yeah, by the way, I've split arc2js into two parts: arc2js.arc and compiler.arc

Most of compiler.arc contains stuff that isn't specific to JS, and then arc2js.arc adds in the JS-specific stuff. Thus, it should hopefully be pretty easy to take compiler.arc and then write, say, arc2python, without needing to change too much.

So, all those optimizations I talked about are in compiler.arc, which means they would work with a hypothetical arc2python as well.
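
As a rough picture of that split, here is a hypothetical JavaScript sketch. None of these names come from compiler.arc or arc2js.arc: flattenDo stands in for a target-independent pass, and jsTarget and pythonTarget stand in for the target-specific parts, which are the only pieces that know about concrete syntax.

```javascript
// Target-independent pass (stand-in for the generic compiler):
// splice nested (do ...) forms into their parent (do ...).
function flattenDo(form) {
  if (!Array.isArray(form)) return form;
  const walked = form.map((f) => flattenDo(f));
  if (walked[0] !== "do") return walked;
  return [
    "do",
    ...walked.slice(1).flatMap((f) =>
      Array.isArray(f) && f[0] === "do" ? f.slice(1) : [f]
    ),
  ];
}

// Target-specific emitters: how to write a call and a statement sequence.
const jsTarget = {
  call: (fn, args) => `${fn}(${args.join(", ")})`,
  sequence: (stmts) => stmts.join(";\n") + ";",
};
const pythonTarget = {
  call: (fn, args) => `${fn}(${args.join(", ")})`,
  sequence: (stmts) => stmts.join("\n"),
};

// Only handles (do ...) at the top level and plain calls; a sketch.
function emit(form, target) {
  if (!Array.isArray(form)) return String(form);
  const [head, ...rest] = form;
  if (head === "do") return target.sequence(rest.map((f) => emit(f, target)));
  return target.call(head, rest.map((f) => emit(f, target)));
}

// The shared pass runs first, whatever the target is.
function compile(form, target) {
  return emit(flattenDo(form), target);
}

const program = ["do", ["f", 1], ["do", ["g", 2], ["h", "x"]]];
console.log(compile(program, jsTarget));     // f(1); g(2); h(x); on separate lines
console.log(compile(program, pythonTarget)); // f(1) g(2) h(x) on separate lines
```

The point of the design is that the rewrite runs on plain lists before any target syntax exists, so a second backend gets it for free.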

-----