Arc Forum | Show Arc: Run Arc and Scheme in Emacs Lisp

Arc Forum

	Show Arc: Run Arc and Scheme in Emacs Lisp (github.com)
	5 points by shawn 2504 days ago \| 8 comments

4 points by shawn 2503 days ago | link

You may have noticed one of elisp's interesting features: closures are printed out.

  > (define (adder n)
      (lambda (x)
        (+ x n)))
  (closure (t) (n) (function (lambda (x) (+ x n))))
  > (adder 42)
  (closure ((n . 42) t) (x) (+ x n))

That means you can theoretically write and read closures to/from disk.

Anyone remember the "Unknown or expired link" errors in the old days of Hacker News? Arc generates closures on the fly, then stores them on the server keyed by a random ID. It sends this random ID down to the browser as links: https://www.laarc.io/x?fnid=ls1wNQuImEEYyvWtGKs1Sj. When the user visits the link, arc looks up the closure and calls it. This effectively allows Arc code to pause computation and then resume later. It's kind of like traditional node-style fs.readFile callbacks, but way more interesting because Arc uses macros to make it feel very natural to write server-side code in this style. You don't feel like you're writing nested callbacks. I've never caught myself wishing for async/await, for example.

Now, the drawback of this technique is that it starts to consume a lot of memory. The closures don't need to be stored forever, but they do need to be stored for a reasonable amount of time. Arc solves this by "harvesting" the closures, meaning it walks through the global closures table and casts out the old ones. Like a startup. (Hmm.)

The other drawback is that a server reboot will wipe all the closures. That's why you'd sometimes get "Unknown or expired link" in the middle of the day on HN. The memory usage got pretty extreme, and a periodic reboot was a quick fix (that no doubt made pg wince each time he ^C'd the server).

So, emacs lisp is very interesting here, because it illustrates that it ought to be possible to store Arc's dynamic closures on disk rather than in memory. That would solve the problem completely: you can generate as many functions as you want, and you don't need to worry about a thing till you blow through >10GB of disk space.

The best part is that the server should be able to reboot without losing the closures.

All the functions in news.arc operate either on objects in memory, like users, or on ephemeral objects, like requests. (The arc server creates a new `request` instance for each incoming connection, and it stores things like the user's cookies and the query arguments.) Both of these can already be serialized straight to disk. And the dynamic closures are almost always generated in the middle of building an HTML page -- i.e. you want to describe "build a form; here's what to do when the user submits it". That latter half is a function, which becomes a dynamic closure keyed by fnid. The form is sent straight down to the user's browser. When the form is submitted, the server picks up right where it left off. That means you can interweave your "view" code and your "model" code, in the MVC sense. And there's no need for a controller; the controller is the closure, which knows what to do thanks to lexical context.

HN eventually solved the fnid problem by getting rid of them. You'll rarely see any dynamic closures on the site. My hypothesis is (a) it's faster to write features using the fnid technique, and (b) the fns can be serialized to disk.

(a) seems true, and (b) is worth exploring. As https://www.laarc.io/ gains momentum, we're hoping to preserve this technique. If it works out, users shouldn't notice any "Unknown or expired link" messages – the closure will be on disk, and they'll last a long time, to put it mildly.

Tangentially, working on arcmacs flushed out a bug in emacs when reading EOFs from terminal: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=34123

-----

3 points by rocketnia 2503 days ago | link

"The best part is that the server should be able to reboot without losing the closures."

Might want to remember to wipe them out if you make certain changes to the code, so that you don't have to think about the effects of running old code and new code in the same system.

Pretty nifty that elisp has that built in. :)

-----

2 points by akkartik 2503 days ago | link

From http://www.arclanguage.org/tut.txt:

"There's one thing you can't do with functions that you can do with data types like symbols and strings: you can't print them out in a way that could be read back in. The reason is that the function could be a closure; displaying closures is a tricky problem."

I've sometimes wondered just what the connotation of 'tricky' is here. Is it hard in some theoretic sense, or just "Arc is a prototype so we don't have this yet"?

-----

3 points by rocketnia 2503 days ago | link

(Edit: Oops, I replied out of order and didn't read shawn's comment with the elisp examples before writing this.)

I suspect what pg means by it is primarily that it's tricky to do in Racket (though I'm not sure if it'd be because there are too few options or too many).

Essentially, I think it's easy to display a closure by displaying its source code and all the captured values of its source code's free variables. (Note that this might be cyclic since functions are often recursive.)

But there is something tricky about it, which is: What language is the source code in? In my opinion, a language design is at its most customizable when even its built-in syntaxes are indistinguishable from user-defined ones. So the ideal displayed format of a function would ideally involve some particular set of user-definable syntaxes. Since a language designer can't anticipate what user-defined syntaxes will exist, clearly this decision should ultimately be up to a user. But what mechanism does a user use to express this decision?

As a baseline, there's at least one straightforward choice the user can make: The one that expresses the code in terms of Arc special forms (`fn`, `assign`, `if`, `quote`, etc.) and Arc functions that aren't implemented in Arc (`car`, `apply`, etc.). In a language that isn't aiming for flawless customizability, this is enough.

Now suppose we try to generalize this so a user can choose any set of syntaxes to express things in terms of -- say, "I want the usual language, but with `=` treated as an additional built-in." If the code contains `(= x (rfn ...))`, then the macroexpander at some point needs to expand the `rfn` without expanding the `=`. That's not really viable in terms of Arc's macroexpander since we don't even know `(rfn ...)` is an expression in this context until we process `=`. So this isn't quite the right generalization; the right generalization is something trickier.

I suppose we can have every function printed in terms of its pre-macroexpansion source code along with printing all the macros and other macroexpansion-time state it happened to rely on as it macroexpanded the first time. That would narrow down the problem to how to print the built-in functions. And we could solve that problem in the language design by making it so nothing ever captures a built-in function as a first-class value, only as a late-bound reference to the global variable it's accessible from.

Or we could have the user specify their own macroexpander and make it so that whenever a function is printed, if the current macroexpander hasn't expanded that function yet, it does so now (just to determine how the function is printed, not how it behaves). This would let the user specify, for instance, that `assign` expands into `=` and `=` expands into itself, rather than the other way around.

These ideas are incomplete, and I think making them complete would be pretty tricky.

In Cene, I have a different take on this: If a function is printable (and not all are), then it's printable because it's a callable struct with a tag the printer knows about. It would be printed as a struct. The function implementation wouldn't be printed. (The user could look up the source code information based on the struct tag, but that's usually not printable.) There may be some exceptions at the REPL where information is printed that usually isn't available, because the REPL is essentially a debugging context and the debugger sees all. (Racket's struct inspectors express a similar debugger-sees-all principle, but I haven't seen the REPL take advantage of it.)

-----

2 points by shawn 2503 days ago | link

You're hitting on a problem I've been thinking about for years. There are a few reasons this is tricky, notably related to detecting whether something is a variable reference or a variable declaration.

  (%language arc
    (let in (instring "  foo")
      (%language scm
        (let-values (((a b c) (port-next-location in)))
          (%language el
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))

To handle this example, you'll need to know whether each form is a function call, a variable definition, a list of definitions (let-values), a function call, and which target the function is being called for.

For example, an arc function call needs to expand into `(ar-apply foo ...)`

And due to syntax, you can't just handle all the cases by writing some hypothetical very-smart `ar-apply` function. If your arc compiler targets elisp, it's tempting to try something like this:

  (ar-apply let (ar-apply (ar-apply a (list 1))) ...

which can collapse nicely back down to

  (let ((a 1)) ...)

in other words, it's tempting to try to defer the "syntax concern" until after you've walked the code and expanded all the macros. Then you'd collapse the resulting expressions back down to the language-specific syntax.

But it quickly becomes apparent that this is a bad idea.

Another alternative is to have a "standard language" which all the nested languages must transpile tO:

  (%let in (%call instring "  foo")
    (%let (a b c) (%call port-next-location in)
      (|with-current-buffer| (%call generate-new-buffer "bar")
        (%call insert (prin1-to-string c)
          (%call current-buffer)))))

Now, that seems much better! You can take those expressions and easily spit out code for scheme, elisp, arc, or any other target. And from there it's just a matter of adding shims on each runtime.

The tricky case to note in the above example is with-current-buffer. It's an elisp macro, meaning it has to end up in functional position like (with-current-buffer ...) rather than something like (funcall #'with-current-buffer ...)

There are two ways to deal with this case. One is to hook into elisp's macroexpand function and expand the macros as you go. Emacs calls this eager macroexpansion, and there are some cases related to autoloading (I think?) that make this not necessarily a good idea.

The other way is to punt, and have the user indicate "this is an elisp form; do not mess with it."

The idea is that if the symbol in functional position is surrounded by pipe chars, then the compiler should leave its position alone but compile the arguments. So

   (|with-current-buffer| foo
     (prn (|buffer-string|)))

That works quite nicely, untll you try this:

  (|let| ((|a| 1) (|b| 2))
    (+ |a| |b|))

Then you'll be in for a nasty surprise: not only does it look visually awful and annoying to write, but it won't work at all, because it'll compile to something like this:

  (let (ar-funcall2 (a 1) (b 2))
    (ar-funcall2 _+ a b))

I am not sure it's possible to escape the "syntax concern". Emacs itself had to deal with it for user macros. And the solution is unfortunately to specify the syntax of every form explicitly:

https://www.gnu.org/software/emacs/manual/html_node/elisp/In...

  (def-edebug-spec let
       ((&rest
         &or symbolp (gate symbolp &optional form))
        body))

Ugh, augh, grawr. You can see how bad it would be to curse the user with having to do this for every macro they write.

Yet I am not sure it's possible to escape this fate. And it seems to work well in emacs.

Hopefully some of that might be helpful on your quest. The goal is worth pursuing.

-----

3 points by rocketnia 2502 days ago | link

I can't speak to elisp, but the way macro systems work in Arc and Racket, the code inside a macro call could mean something completely different depending on the macro. Some macros could quote it, compile it in their own way, etc. So any code occurring in a macro call generally can't be transformed without changing the meaning of the program. Trying to detect and process other macro calls inside there is unreliable.

I have ideas in mind for how macro systems can express "Okay, this macro call is over; everything beyond this point in the s-expression is an expression." But that doesn't help with Arc or Racket, whose macro systems aren't designed for that.

So something like your situation, where you need to walk the code before knowing which macroexpander to subject each part of it to, can't reliably treat the code as code. It's better to treat the code as a meaningless soup of symbols and parentheses (or even as a string). You can walk through the data and find things like `(%language ...)` and treat those as escape sequences.

(What elisp is doing there looks like custom escape sequences, which I think is ultimately a more concise way of doing things if new macro definitions are rare. It gets into a middle ground between having s-expression soup and having a macro system that's designed for letting code be walked like this.)

Processing the scope of variables is a little difficult, so my escape sequences would be a bit more verbose than your example. It's not like we can't take a Racket expression and infer its free variables, but we can only do that if we're ready to call the Racket macroexpander, which isn't part of the approach I'm describing.

(I heard elisp is lexically scoped these days. Is that right?)

This is how I'd modify the escape sequences. This way it's clear what variables are passing between languages:

  (%language arc ()
    (let in (instring "  foo")
      (%language scm ((in in))
        (let-values (((a b c) (port-next-location in)))
          (%language el ((c c))
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))

Actually, instead of just (in in), I might also specify a named strategy for how to convert that value from an Arc value to a Racket value.

Anyhow, once we walk over this and process the expressions, we can wind up with generated code like this:

  ; block 1, Arc code
  (fn (__block2 __block3)
    (let in (instring "  foo")
      (__block2 __block3 in)))
  
  ; block 2, Scheme code
  (lambda (__block3 in)
    (let-values (((a b c) (port-next-location in)))
      (__block3 c)))
  
  ; block 3, elisp code
  (lambda (c)
    (with-current-buffer (generate-new-buffer "bar")
      (insert (prin1-to-string c))
      (current-buffer)))

We also collect enough metadata in the process that we can write harnesses to call these blocks at the right times with the right values.

This is a general-purpose technique that should help with any combination of languages. It doesn't matter if they run in the same address space or anything; that kind of detail only changes what options you have for value marshalling strategies.

I think there's a somewhat more convenient approach that might be possible between Arc and Racket, since their macroexpanders both run in the same process and can trade off with each other: We can have an Arc macro that expands its body as Racket code (essentially Anarki's `$`) and a Racket macro that expands its body as Arc code. But there are some difficulties designing the latter, particularly in terms of Racket's approach to hygiene and its local macros, two things the Arc macroexpander has zero concept of. When we switch from Racket to Arc and back to Racket, the hygiene information and local macro scopes will probably be obliterated.

In your arcmacs project, I guess you might also be able to have an Arc macro that expands its body as elisp code, an elisp macro that expands its body as Racket code, etc. :-p So maybe that's the approach you really want to take with `%language` and I'm off on a tangent with this "escape sequence" interpretation.

-----

4 points by shawn 2499 days ago | link

Oh yeah, I never posted an update. I finished porting HN to elisp. That project's readme should now be "This is a mostly-complete implementation of Paul Graham's Arc in Emacs Lisp."

It’s wildly hilarious to see HN running in emacs. It loads up all 500 laarc stories and user profiles in a few seconds, which is much lower overhead than I thought. Then you can run `(login-page “foo” ..)` and it spits out all the right HTML. If it weren’t so weird to do networking in elisp it could even become a real server. </hack>

If anyone's curious, here's an email I sent to a friend about this:

I finished porting HN to emacs. One advantage of the elisp runtime is that closures have printed representation. That means you can write them, read them, and evaluate after reading. Which implies serialization.

It also means you can deduplicate them. I notice that laarc has 20k fnids. When I deduped the fns* table on my local elisp version of HN, the size of the table went from 49 to 2.

From looking at the closures, I don’t think there is any reason they can’t be persisted to disk and run later. I can’t find any examples of fnids that capture lexical context which mutates before the fnid is called. The lack of mutation is a key point that should enable serialization... I think/hope.

The takeaway is that there doesn't seem to be a reason to get rid of the fnid system as laarc scales. I started doing that for a few endpoints, but it’s crazy how much work it is in comparison to spinning up an fnid. HN has to detect whether a request is a POST or GET, and respond differently. It's good to have that in general -- in fact, every web framework except arc has the ability to differentiate between POST vs GET. But it really speaks to the power of this technique that it's possible to do without!

I need to figure out the easiest thing to do to keep https://www.laarc.io memory usage down as we scale. HN's server specs are monstrous, and it’s a good reminder that cloud hosting is still limited if you absolutely cannot scale horizontally. Maybe colocation could be the way to go for us later on.

The site’s memory usage has been steadily creeping up — 76mb was the baseline in the early days, and now it’s >100 and <150. That’s a worrying sign. I can bump the droplet from 1gb to 2, and 2 to 4, but that won’t work forever.

I was going to say “This probably won’t be a problem for 6 months or so.” But that’s not true. If laarc gets picked up on techcrunch, I need to be able to handle an infinite amount of traffic. For sufficiently small values of infinity.

-----

2 points by akkartik 2499 days ago | link

Yeah, memory usage was a big reason I stopped trying to maintain servers in Arc (and went down my insane yak-shaving rabbithole of trying to reinvent the entire computing stack from scratch).

-----