Arc Forumnew | comments | leaders | submitlogin
Amacx: A compiler for Arc and Arc inspired languages (
5 points by aw 1067 days ago | 12 comments

3 points by aw 1063 days ago | link


- I'd like to be able to use the Racket profiler.

- For the profiler to be useful, functions need to be labeled in some way so that the profiler will show which functions are which.

- To identify functions, I need to propagate source location information from an Arc source code file through to Racket, which includes propagating the source location information through Arc macros.

One option would be to create a macro system designed to propagate source location information through macros. This of course is what Racket does.

In Arc, macros are defined with list primitives (car, cdr, cons), function calls (for example, `(mac let (var val . body) ...)`), and operations that can be built out of those (such as quasiquotation).

My hypothesis is that by allowing list primitives to operate on forms labeled with source location information, we can continue to use Arc macros.

This leads to an interesting issue however...


    (apply (fn args args) xs)
this is an identity operation. For any list `xs`, this returns the same list.

In Arc 3.2, and in my implementation, an Arc function compiles into a Racket function. E.g., `(fn args args)` becomes a Racket `(lambda args args)`.

Of course we don't have to do that. If we were writing an interpreter, for example, an Arc function would compile into some function object that'd be interpreted by the host language... an Arc function wouldn't turn into something that could be called as a Racket function directly.

But, if `(fn args args)` is implemented as a Racket function `(lambda args args)`, then to call the function with some list `xs` we need to use Racket's apply. But, of course, Racket's apply takes a Racket list. So in Arc 3.2, Arc's apply calls Racket's apply after translating the Arc list into a Racket list.

Leaving out a couple of steps, what in essence we end up with in Racket is the equivalent of:

    (apply (lambda args args) (ar-nil-terminate xs))
where `ar-nil-terminate` converts an Arc list to a Racket list.

Now, for Amacx, I've invented my own representation for Arc lists. In my version, lists (that is, cons cells) can be labeled with where in a source code file they originated from. For example, if I read "(a b c)" from a file, I can inspect that list for source location information:

    > (prn x)
    (a b c)
    > (dump-srcloc x)
    foo.arc:1.0 (span 7) (a b c)
      foo.arc:1.1 (span 1) a
      foo.arc:1.3 (span 1) b
      foo.arc:1.5 (span 1) c
which shows me that the list in `x` came from a source file "foo.arc" at line 1, column 0 with a span of 7 characters; that the first element "a" was at column 1, "b" was at column 3, and so on.

This is entirely internally consistent. E.g. (cdr x) returns a value which contains both the tail of the list `(b c)` and the source location information for the sublist.


In a macro,

    (mac let (var val . body)
      `(with (,var ,val) ,@body))
I'm not seeing the source location get through the rest args.

    (mac let (var val . body)
      (dump-srcloc body)
      `(with (,var ,val) ,@body))
Zilch. Nothing. Nada. `body` is a plain list, no source location information.


Because I stripped it.

    (apply (lambda args args) (ar-nil-terminate xs))
The argument to Racket's `apply` has to be a Racket list. Not my own made-up representation for lists.

    > (apply (lambda args args) xs)
    ; apply: contract violation
    ;   expected: list?
Thus my version of `ar-nil-terminate` removes source location information and returns a plain Racket list. I did this early on, because loading Arc fails quite quickly when `apply` doesn't work. I didn't realize it would mean that macros wouldn't get source location information passed to them.

So, a macro like `let` turns into the equivalent of

    (annotate 'mac
      (lambda (var val . body)
the macro is invoked with `apply`... and there goes the source location information in `body`.

Of course, like I said, I don't have to implement an Arc rest argument with a Racket rest argument. An Arc function that took a rest argument could turn into some other kind of object where I'd pass in the rest argument myself.

But that would be slower. Probably.

I can get the profiler to work (I think), but then I'd be profiling the slower version of the code.

Though the runtime that implements the extended form of lists with source location information is slower anyway because all of Arc's builtins need to unwrap their arguments.

So maybe this is just the next step.


3 points by rocketnia 1062 days ago | link

That does seem like a conundrum.

Where do you actually need source location information in order to get Arc function names to show up in the profiler?

Would it be okay to track it just on symbols, bypassing all this list conversion and almost all of the Arc built-ins' unwrapping steps (since not many operation have to look "inside" a symbol)?

If you do need it on cons cells, do you really need it directly on the tail cons cells of a macro body? I'd expect it to be most useful on the cons cells in functional position. If you don't need it on the tails, then it's no problem when the `apply` strips it.

Oh, you know what? How about this: In `load`, use `read-syntax`, extract the line number from that syntax value, and then use `syntax->datum` and expand like usual. While compiling that expression, turn `fn` into (let ([fn-150 (lambda ...)]) fn-150) or (procedure-rename (lambda ...) 'fn-150), replacing "150" here with whatever the source line number is. Then the `object-name` for the function will be "fn-150" and I bet it'll appear in the profiling data that way, which would at least give you the line number to work with.

If you want, and if that works, you can probably have `load` do a little bit of inspection to see if the expression is of the form (mac foo ...) or (def foo ...), which could let you create a more informative function name like `foo-150`.

There's something related to this in `ac-set1`, which generates (let ([zz ...]) zz) so that at least certain things in Arc are treated as being named "zz". Next to it is the comment "name is to cause fns to have their arc names while debugging," so "zz" was probably the Arc variable name at some point.


3 points by aw 1062 days ago | link

> Would it be okay to track it just on symbols

mmm, not sure. It'd probably be easier to start with a working version (even if slow) and then remove source information from lists and see if anything breaks.

> In `load`, use `read-syntax`, extract the line number from that syntax value

erm, so all functions forms compiled during the eval of that expression would get named "fn-150"?


3 points by rocketnia 1062 days ago | link

"erm, so all functions forms compiled during the eval of that expression would get named "fn-150"?"

That's what I mean, yeah. Maybe you could name them with their source code if you need to know which one it is, if it'll print names that wide. :-p This isn't any kind of long-term aspiration, just an idea to get you the information you need.


3 points by aw 1064 days ago | link

I’m currently working to see if I can get profiling going...

At the moment, Amacx loads arc.arc unpleasantly slowly. And I don’t even have all of arc.arc included yet.

I imagine it’s probably because the compiler is slow. (What else could it be?)

And I imagine there’s a good chance the compiler is slow because as the compiler recursively works its way down into forms being compiled, I functionally extend the compilation context in a simplistic way:

    (def functional-extend (g nk nv)
      (fn (k)
        (if (is k nk) nv (g k))))
But it would be nice to be able to measure it, instead of just guessing.

I hope I can use the Racket profiler, if I can propagate function names and/or function source locations from Arc to Racket.

At the moment a typical profile trace looks like:

                                    for-loop [72]                                          5.1%
                                    ??? [1]                                               38.9%
                                    ??? [12]                                              56.0%
     [12] 1456(30.9%)  104(2.2%)  ??? /Users/andrew/w/amacx/arc/runtime.rkt:409:9
                                    ??? [12]                                              56.0%
                                    ??? [1]                                               42.2%
i.e. the profiler knows that functions are being called, but has no way to identify them because all such identifying information has been lost by the time Arc code (e.g. implementing the compiler) has been compiled to Racket.

Today I ran into a weird issue. I was running the Racket profiler in DrRacket, and it worked at first, but then started to hang. So I closed DrRacket and restarted it, deleted all my `compiled` directories, reverted my source code back to an earlier version… and it still hung.

So I don’t know why it worked at first or why it stopped working.

But it turns out the profiler works well from the command line, so I’m doing that instead.


2 points by aw 1059 days ago | link

It's interesting to be taking an axiomatic approach. That is, in this case, to add to the language axioms that expressions can be labeled with their source file locations.

It might not work: it might turn out that the feature I want (to be able to track source locations through macro expansions) can't be expressed in terms of this particular set of axioms. Or, it might be that it can, but the result is a runtime too slow for me to want to use it.

But, if it does work, it has its own internal logic. What does (cdr x) mean when x has been labeled with source locations? Well, clearly, what it ought to mean is the tail of x, labeled with the source locations of the tail of x. Theorems such as (apply (fn args args) xs) ≡ xs should continue to work.

On the other end of the spectrum from an axiomatic approach is engineering. Have a list of features you want, and design a system that implements all of them. This too might fail sometimes (perhaps the features you want turn out to be incompatible, or you design yourself into a corner that's hard to get out of)... but most of the time it's more reliable, in the sense that usually we can come up with some design that implements all (or at least most!) of the features we want... even if maybe the result isn't very pretty.

The downside of engineering is design complexity. Complexity will probably at least scale linearly with the number of features, if not more likely by some power law. If we're lucky we may see some simplifications in the design along the way that we can refactor into, some axioms of the design that become apparent that we can incorporate... but most of the time, in practice, the design gets more and more complex as we add features.

Engineering is attractive because it gets things done. "I just want X, let's implement X". There are a lot of times when what I want is just to implement X, and I engineer a design, and it works out fine.

The axiomatic approach is more uncertain. Will it work? I don't know. It's also harder. Oops, ssyntax stopped working. Why? `some` stopped working. Why? `recstring` stopped working. Why? `+` stopped working. Why? Is it because my implementation of `apply` is broken, or because I broke the compiler and its now outputting broken code, or because my runtime is broken? It could be any of these. Another day, another week of debugging.

It's also more fun. There are many macro systems. Many of them are practical. Some have features I don't care about, some are more complicated than I like, but I don't have much interest myself in engineering yet another macro system. Axioms are more interesting. Perhaps it will turn out that for these particular set of axioms, it doesn't work out for this particular feature. But then at least I know why :-)


1 point by akkartik 1059 days ago | link

You've got me curious now about how this relates to Amacx :)

I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time. You can kinda see that in if you squint a little; I don't show my various working copies there, but the case analysis I describe does faithfully show how I started out thinking about the problem, before the clean general solution suddenly fell out in a flash of insight.

(Having tests doesn't preclude more systematic thinking about the space, and proving to myself that a program is correct. But even if I've proved a program correct to myself I still want to retain the tests. My proofs have been shown up too often in the past ^_^)


2 points by aw 1058 days ago | link

> You've got me curious now about how this relates to Amacx :)

Why, everything! :-) E.g. I start with: what if top level variables were implemented by an Arc table, and top level variable references were implemented by an Arc macro? That is, what if top level variables were built out of lower level language axioms, instead of being built in?

We end up with something that's kind of like modules, but doesn't do everything that we'd typically expect modules to do (though perhaps we could implement modules on top of them if we wanted to), and also does some things that modules don't do (for example we can load code into a different environment where the language primitive act differently).

To give a name to this thing that is kind of like modules but different, I called them "containers", because they're something you load code into.

Are containers useful? Well, I'm guessing it would depend on whether we'd ever want to want load code into different environments in our program. If we only want to load code once, and all we want is a module system, I imagine it'd probably be more straightforward to just implement a module system directly.

On the other hand, suppose we have a runtime that gives us some nifty features, but is slower than plain Arc. Now it seems like containers could turn out to be a useful idea. Perhaps I have some code that I want to load in plain Arc where it'll run fast, and other code that I want to run in the enhanced runtime and I don't mind that it's slower.

> I find that having tests allows me to start out in a sort of engineering mindset, in your terms, where I just get individual cases working one by one. But at the same time they keep me from growing too attached to a single implementation and leave me loose to try to think up more axiomatic generalizations over time.


This is the classic test driven development refactoring cycle: add features with tests, then refactor e.g. to remove duplicate code, and/or otherwise refactor to make the code more axiomatic.

Since "Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp", one could, in theory, start with such a C or Fortran program and refactor towards an axiomatic approach until you had reinvented Lisp, with the program written in Lisp :-)

But in practice I think going the other way is sometimes necessary: that is, starting with some axioms, and seeing what can be implemented out of them.

I'm not sure why that is (why doesn't anyone keep refactoring a large C program and end up with Lisp?) but I suppose it might be because it's too cognitively difficult, or you end up at some kind of local maximum in your design, or something.

In any case, I find tests absolutely essential for working on Amacx... not just "nice to have" or "saves me a lot of time", but "impossible to do without them"!


3 points by akkartik 1067 days ago | link

Interesting! Could you elaborate on your notion of 'container'? Could a function in one container/runtime call one in another?


4 points by aw 1067 days ago | link

A container is an object which stores top level variables. When you use a top level variable "foo", that's a reference to "foo" in the container that you're eval'ing your code inside.

A container can be a simple Arc table, or some object like a Racket namespace which can be made to act like an Arc table. So suppose I have a container `c`, and I eval some code in c:

    (eval '(def foo () 123) c)
Now `c!foo` is the function foo:

    > (c!foo)
A function can always call a function in another container if it has the same runtime. A compiled function ends up having a reference to the container it was compiled in because the top level variables used in the function are compiled to have a reference to the container, but other than that it's just a function.

A function in one container can call a function in another container with a different runtime if the runtimes are compatible. Which they might not be. For example, a function compiled in the srcloc runtime can call a function in the mpair runtime because both runtimes compile Arc functions to Racket functions, but the mpair runtime wouldn't understand an Arc list passed to it from the srcloc runtime made up of Racket syntax objects. So you might need to do some translation between runtimes depending on how different they are.


1 point by akkartik 1067 days ago | link

Thanks, that makes sense.

I wasn't aware srcloc was a runtime. Are you referring to or something else?


2 points by aw 1067 days ago | link

A runtime is how we choose to implement a language once a program has been compiled and is now running.

For example, in Arc 3.2, `cons` creates an immutable Racket pair, `car` works with a Racket pair or a Racket symbol nil, and `scar` modifies an Arc cons with Racket's `unsafe-set-mcar!`.

These are all runtime decisions. We could create a different runtime. For example, Arc's `cons` could create a Racket mpair, and `scar` could use Racket's `set-mcar!`.

For Amacx I've written two runtimes so far. One I called "mpair" because it implements Arc lists using Racket's mpairs. The other I called "srcloc" because it allows source location information to be attached to Arc forms (lists and atoms).

Currently, in srcloc, source location information is attached to forms using Racket's syntax objects. Thus, in srcloc, Arc's `car` can be applied to a Racket syntax object which wraps a list or nil, and it will unwrap the syntax object and return the underlying value.

It's a coincidence that I chose to call my runtime "srcloc" and Racket happens to also store source location information in a struct they call "srcloc" :-)