Arc Forumnew | comments | leaders | submitlogin
2 points by shawn 2134 days ago | link | parent

You're hitting on a problem I've been thinking about for years. There are a few reasons this is tricky, notably related to detecting whether something is a variable reference or a variable declaration.

  (%language arc
    (let in (instring "  foo")
      (%language scm
        (let-values (((a b c) (port-next-location in)))
          (%language el
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))

To handle this example, you'll need to know whether each form is a function call, a variable definition, a list of definitions (let-values), a function call, and which target the function is being called for.

For example, an arc function call needs to expand into `(ar-apply foo ...)`

And due to syntax, you can't just handle all the cases by writing some hypothetical very-smart `ar-apply` function. If your arc compiler targets elisp, it's tempting to try something like this:

  (ar-apply let (ar-apply (ar-apply a (list 1))) ...
which can collapse nicely back down to

  (let ((a 1)) ...)
in other words, it's tempting to try to defer the "syntax concern" until after you've walked the code and expanded all the macros. Then you'd collapse the resulting expressions back down to the language-specific syntax.

But it quickly becomes apparent that this is a bad idea.

Another alternative is to have a "standard language" which all the nested languages must transpile tO:

  (%let in (%call instring "  foo")
    (%let (a b c) (%call port-next-location in)
      (|with-current-buffer| (%call generate-new-buffer "bar")
        (%call insert (prin1-to-string c)
          (%call current-buffer)))))
Now, that seems much better! You can take those expressions and easily spit out code for scheme, elisp, arc, or any other target. And from there it's just a matter of adding shims on each runtime.

The tricky case to note in the above example is with-current-buffer. It's an elisp macro, meaning it has to end up in functional position like (with-current-buffer ...) rather than something like (funcall #'with-current-buffer ...)

There are two ways to deal with this case. One is to hook into elisp's macroexpand function and expand the macros as you go. Emacs calls this eager macroexpansion, and there are some cases related to autoloading (I think?) that make this not necessarily a good idea.

The other way is to punt, and have the user indicate "this is an elisp form; do not mess with it."

The idea is that if the symbol in functional position is surrounded by pipe chars, then the compiler should leave its position alone but compile the arguments. So

   (|with-current-buffer| foo
     (prn (|buffer-string|)))
That works quite nicely, untll you try this:

  (|let| ((|a| 1) (|b| 2))
    (+ |a| |b|))
Then you'll be in for a nasty surprise: not only does it look visually awful and annoying to write, but it won't work at all, because it'll compile to something like this:

  (let (ar-funcall2 (a 1) (b 2))
    (ar-funcall2 _+ a b))

I am not sure it's possible to escape the "syntax concern". Emacs itself had to deal with it for user macros. And the solution is unfortunately to specify the syntax of every form explicitly:

https://www.gnu.org/software/emacs/manual/html_node/elisp/In...

  (def-edebug-spec let
       ((&rest
         &or symbolp (gate symbolp &optional form))
        body))
Ugh, augh, grawr. You can see how bad it would be to curse the user with having to do this for every macro they write.

Yet I am not sure it's possible to escape this fate. And it seems to work well in emacs.

Hopefully some of that might be helpful on your quest. The goal is worth pursuing.



3 points by rocketnia 2134 days ago | link

I can't speak to elisp, but the way macro systems work in Arc and Racket, the code inside a macro call could mean something completely different depending on the macro. Some macros could quote it, compile it in their own way, etc. So any code occurring in a macro call generally can't be transformed without changing the meaning of the program. Trying to detect and process other macro calls inside there is unreliable.

I have ideas in mind for how macro systems can express "Okay, this macro call is over; everything beyond this point in the s-expression is an expression." But that doesn't help with Arc or Racket, whose macro systems aren't designed for that.

So something like your situation, where you need to walk the code before knowing which macroexpander to subject each part of it to, can't reliably treat the code as code. It's better to treat the code as a meaningless soup of symbols and parentheses (or even as a string). You can walk through the data and find things like `(%language ...)` and treat those as escape sequences.

(What elisp is doing there looks like custom escape sequences, which I think is ultimately a more concise way of doing things if new macro definitions are rare. It gets into a middle ground between having s-expression soup and having a macro system that's designed for letting code be walked like this.)

Processing the scope of variables is a little difficult, so my escape sequences would be a bit more verbose than your example. It's not like we can't take a Racket expression and infer its free variables, but we can only do that if we're ready to call the Racket macroexpander, which isn't part of the approach I'm describing.

(I heard elisp is lexically scoped these days. Is that right?)

This is how I'd modify the escape sequences. This way it's clear what variables are passing between languages:

  (%language arc ()
    (let in (instring "  foo")
      (%language scm ((in in))
        (let-values (((a b c) (port-next-location in)))
          (%language el ((c c))
            (with-current-buffer (generate-new-buffer "bar")
              (insert (prin1-to-string c))
              (current-buffer)))))))
Actually, instead of just (in in), I might also specify a named strategy for how to convert that value from an Arc value to a Racket value.

Anyhow, once we walk over this and process the expressions, we can wind up with generated code like this:

  ; block 1, Arc code
  (fn (__block2 __block3)
    (let in (instring "  foo")
      (__block2 __block3 in)))
  
  ; block 2, Scheme code
  (lambda (__block3 in)
    (let-values (((a b c) (port-next-location in)))
      (__block3 c)))
  
  ; block 3, elisp code
  (lambda (c)
    (with-current-buffer (generate-new-buffer "bar")
      (insert (prin1-to-string c))
      (current-buffer)))
We also collect enough metadata in the process that we can write harnesses to call these blocks at the right times with the right values.

This is a general-purpose technique that should help with any combination of languages. It doesn't matter if they run in the same address space or anything; that kind of detail only changes what options you have for value marshalling strategies.

I think there's a somewhat more convenient approach that might be possible between Arc and Racket, since their macroexpanders both run in the same process and can trade off with each other: We can have an Arc macro that expands its body as Racket code (essentially Anarki's `$`) and a Racket macro that expands its body as Arc code. But there are some difficulties designing the latter, particularly in terms of Racket's approach to hygiene and its local macros, two things the Arc macroexpander has zero concept of. When we switch from Racket to Arc and back to Racket, the hygiene information and local macro scopes will probably be obliterated.

In your arcmacs project, I guess you might also be able to have an Arc macro that expands its body as elisp code, an elisp macro that expands its body as Racket code, etc. :-p So maybe that's the approach you really want to take with `%language` and I'm off on a tangent with this "escape sequence" interpretation.

-----