Arc Forum
arc2c update
9 points by almkglor 5833 days ago | 77 comments
I've made several updates to arc2c this weekend, especially since today is a holiday in our country, giving me a 3-day weekend.

My family's a bit pissed at me I think though ^^.

Here's the new stuff:

1) Restructured some bits of the 'compile-file function. Adding new transformations is now easy.

2) Arc style 'if is now fully supported:

  (if
    (foo)
      1
    (bar)
      2
      'else)
3) ssyntax is now supported: (prn:foo 1), prn!hehe

4) 'load and 'require, if on the top level of the file, now insert the contents of the specified files.

5) variadic functions now work:

  (set in
    (fn (exp . rest)
      (let self nil
        (set self
          (fn (exp rest)
            (if (no rest)
                 nil
                (is exp (car rest))
                  t
                (self exp (cdr rest)))))
        (self exp rest))))

  (set list
    (fn rest
      rest))
6) 'set now works on local variables; see the example in (5).

7) Unused globals which are assigned to but never read are removed. Also, in 'do blocks, expressions that are not in the tail position and have no potential side effects - basically literals, quote forms, variable references, and 'fn forms that aren't called or stored in variables - are removed:

  (do
     1 2 3 4 5)
  =>
  (do 5)
This "optimization" is necessary so that we can mindlessly insert the library functions (the contents of arc.arc, as well as some xdef'ed functions in ac.scm) into the source code; this step then removes the unused library functions, eliminating code bloat that would probably just make debugging difficult.

This step will of course need to be removed/skipped if 'eval is at all used, but most programs won't use eval anyway.

TODO (by my expected priority):

1) Decoupling of primitives from globals, i.e. allow say, 'car and 'cdr to be redefined, instead of automatically making them primitives whenever they are encountered in function position.

2) Macros. Especially 'def. Also 'eval-when. Hmm.

3) Strings. Ooh, look ma, big unicode headaches!!

4) Threads

5) I/O



5 points by binx 5833 days ago | link

Advice:

1) We can make a simple inliner and name the primitives like #car#, #cdr#, etc. Then define car as (set car (fn (x) (#car# x))). Finally, we use the inliner to do the job. The inliner approach is better than an extra pass that eliminates primitive calls, because it can do more optimization.

2)Maybe writing a metacircular interpreter in compiled arc is the best way of implementing both macros and eval-when.

3)I don't know if the current unicode libs are good enough.

4)Implementing green threads via continuations should be a good start.

5) For standard I/O, use stdio. Anything else could be done via an FFI. Since arc2c is a static compiler, the FFI could be portable even if all we have is an ANSI C system, because we have to deal with neither the .dll/.so stuff nor the libffi lib.

-----

1 point by almkglor 5833 days ago | link

1) hmm. Interesting. Can't think of how to do inlining yet though.

As an aside, my intent was that library functions in a specially defined library file can access primitives like %car, but other code cannot - user code can use %car etc. for its own purposes without clashing with the primitives, if only for compatibility with Arc.

2) Yes, this seems correct. And there's also 'eval. Yes, eval's not often used, but still...

3) erk

4) that's what I planned: http://arclanguage.com/item?id=5794 . However stefano suggests using pthreads.

5) The problem is using green threads with blocking I/O. Obviously in a server if one thread is blocked by I/O, other threads should still continue. It's really the threads/IO interaction that's bothering me.

Edit: which reminds me - currently closure structures are untyped, meaning we can't safely get the type of a function.

-----

4 points by almkglor 5833 days ago | link

Okay, here's a first pass at inlining.

Some background first: the compiler starts by putting all top-level expressions into a single do-block. For much of the compilation run (until it reaches CPS transformation) the compiler represents the entire program in this do-block.

I intend that the libraries will simply be inserted at the front of the do-block's list of subexpressions.

The inline transformation phase then iterates over the top-level elements of the topmost do-block. If a top-level element is an assignment to a global variable, we attempt to determine if the assignment is eligible for inlining.

To determine if the assignment is eligible for inlining, we check if it's assigning a function. Since this is a top-level block, the function cannot close over any variables. Then we detect if the function's parameters are referenced 0 or 1 times (if referenced more than that, we can't safely inline it without putting it in a let-block - which creates a function anyway, so no point inlining). Note that we can actually allow the function to reference itself via the global, since we won't remove the assignment to the global.

If we determine that a global is eligible for inlining, we add the symbol and its function to a table of inlinable functions.

Now here's the hard part: we also have to ensure that the global can be safely inlined. If a global is assigned to exactly once, then it can be.

While scanning, if we encounter an assignment to a global that is already in the inlineable set, we add that global to the banned set. This means that redefining a global will prevent it from being inlined:

  (set global
    (fn () t))
  (prn:global) ; t
  (set global
    (fn () nil))
  (prn:global) ; nil
  ; cannot safely inline
If a top-level expression isn't an assignment to a global, we scan through its subexpressions for assignments to globals. For each global assignment, we add the global to the banned set. This prevents us from inlining things that aren't trivially inlineable:

  (let c nil
    (set reader
      (fn () c))
    (set writer
      (fn (v) (set c v))))
After this scan, we have a set of inlineable functions and a set of globals banned from inlining. We remove from the inlineable set those that are in the banned set. Then we perform inlining.

Inlining is then done this way: we scan the entire syntax tree and search for function applications where the function position is a reference to a global variable in our final inlineable set. If so, we replace the application with a copy of the function's contents (the function's contents are always placed in a do-block, incidentally). We scan through the copy and look for references to the function's parameters, replacing the parameters with the appropriate expression from the function application. For vararg inlining, we may use the %cons primitive directly to build the vararg parameter.

The assignment to the global is retained. However, we can then repeat the unused-global-removal step (or move that step after this step) to remove the actual non-inlined version if it's not being passed as a function.

-----

1 point by binx 5833 days ago | link

Things that have to be remembered:

1. Local functions which have enclosing environments are harder to inline. If a function's environment is different from the caller's environment, we should replace all its free variables with references to its environment. For simplicity, you can inline only the combinators (functions which have no free variables).

2. When inlining, we should rewrite the parameters only where they occur free in the function body, not where they are bound by other local functions in the body.

-----

1 point by almkglor 5833 days ago | link

1. I'm not proposing yet to inline local functions, especially those that close on environments. However, what algorithm would you propose for inlining local functions?

As an aside, closure-conversion makes the creation of environments explicit. Perhaps an inlining step can be added after closure-conversion?

2. I don't understand this part.

-----

3 points by binx 5832 days ago | link

2. Take this function as an example:

  (fn (x y) (g x y (fn (x) (h x))))

When inlined with x=1 and y=2, it should be rewritten as

  (g 1 2 (fn (x) (h x)))

and not as

  (g 1 2 (fn (x) (h 1)))

because the second x is not free in the function body.

-----

2 points by almkglor 5832 days ago | link

I see. This is actually handled implicitly in the compiler's AST structure: during the conversion from the list form to the AST form, each local variable is given a unique ID:

  (fn (x y) (g x y (fn (x) (h x))))
  =>
  (fn (x@1 y@2) (g x@1 y@2 (fn (x@3) (h x@3))))
  ; approximation of the AST structure, the AST
  ; is really a table of properties
So mindless replacement of the inlined version will simply replace x@1, not x@3.

  (g 1 2 (fn (x@3) (h x@3)))

-----

1 point by almkglor 5832 days ago | link

Hmm. Turns out this is a real issue, but for a different reason: since local variables are given unique ID's, we actually have to give subfunctions fresh local variable ID's when a function is inlined several times:

  (set glob
    (fn (x@1 y@2)
      (g x@1 y@2 (fn (x@3) (h x@3)))))
  (glob 1 2)
  (glob 3 4)
  =>
  (set glob
    (fn (x@1 y@2)
      (g x@1 y@2 (fn (x@3) (h x@3)))))
  (g 1 2
    (fn (x@4) (h x@4)))
  (g 3 4
    (fn (x@5) (h x@5)))

-----

3 points by almkglor 5831 days ago | link

Further updates:

1) Now has inlining of global functions

2) Many functions have been decoupled from their primitives. Functions that need access to primitives must be declared as library functions in lib-ac.scm.arc.

For most cases, functions will be inlined anyway, so the resulting code will be practically the same as in previous versions, but at least the current version could do something like:

  (set map1
    (fn (f l)
      (if l
          (cons (f (car l)) (map1 f (cdr l))))))
  (prn (map1 car (list (list 1 2 3) (list 4 5 6) (list 7 8 9))))
  ; ( 1 . (4 . (7 . nil)))

-----

3 points by sacado 5832 days ago | link

So that we don't step on each other's toes, I propose to work on the following myself (when I have the time and the possibility):

1) annotate and friends, so as to make macros possible (as a macro is an annotated list).

2) chars and strings (and Unicode symbols too. For now that's not a problem, as symbols can't be destructured: a UTF-8 encoded symbol just looks like a bunch of bytes. But with strings, since you can access individual chars, that won't work anymore).

3) bignums, rational and inexact numbers. Maybe complex numbers, too.

4) hash tables. Maybe I'll use Lua's approach so as to make them more array-friendly.

Well, in fact, everything about typing.

-----

4 points by sacado 5832 days ago | link

Ok, I've got annotations working, with annotate, type and rep tuned to make everything work. Displaying annotated data gives the same output as canonical Arc.

pr also displays lists the same way as canonical Arc: instead of

  (foo . (bar . (baz . nil)))
it displays

  (foo bar baz)
And

  (foo . (bar . baz))
is displayed as

  (foo bar . baz)
I'm fighting with strings now. I can display individual characters, as long as they are regular ASCII characters. A character is only a Unicode codepoint, encoded as a long. A string is internally an array of codepoints, so it occupies a lot of space, as each character uses 8 bytes. But accessing / modifying characters is fast & easy. When output, strings are converted to UTF-8 (that's where it does not work anymore :).
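
Roughly, that representation would look something like this in C (a sketch only - the names here are illustrative, not the actual arc2c ones):

  /* Sketch: a string is an array of codepoints, one long per character,
     converted to UTF-8 only when written out. */
  typedef struct {
    long type;   /* string type tag */
    long len;    /* number of codepoints */
    long *cp;    /* one Unicode codepoint per character */
  } string_t;

  /* Encode one codepoint as UTF-8 into buf; returns the byte count.
     Codepoints above U+FFFF are left out for brevity. */
  static int utf8_encode(long c, unsigned char *buf) {
    if (c < 0x80)  { buf[0] = (unsigned char)c; return 1; }
    if (c < 0x800) {
      buf[0] = (unsigned char)(0xC0 | (c >> 6));
      buf[1] = (unsigned char)(0x80 | (c & 0x3F));
      return 2;
    }
    buf[0] = (unsigned char)(0xE0 | (c >> 12));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
  }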

Edit: it now works, as long as you only use ASCII characters, which is, I admit, rather stupid, but I had to do that first. And the len primitive is now implemented and works on cons cells and strings.

I didn't play with numbers & tables yet.

-----

2 points by almkglor 5832 days ago | link

Re: symbols - perhaps it's better to leave them in UTF-8, and only convert them to UTF-32 when they are coerced to strings.

I suggest making tables work first before the really weird numbers ^^

Edit: there's also the problem of making I/O ports redirect to strings under construction. 'tostring is pretty much the idiomatic way of creating strings in Arc.

Edit2: As an aside, here's what I intend to do with ac.scm built-ins:

1) Remove ac.scm xdef'ed functions from the mac* stuff in xe.arc

2) Create a special lib-ac.scm.arc, where we can access primitives:

  (set car
    (fn (x)
      (%car x))) ;will access primitive %car, but only in lib-ac.scm.arc
Code that isn't in lib-ac.scm.arc will not be able to access the primitives %foo.

The above will of course allow code like:

  (map car (pair lst))
3) Finish up my inliner, so that user code of the form:

  (car foo)
  =>
  #hash( (type . app) (subx .
   (#hash( (type . ref) (var . #hash( (id . car) (uid . car)))))
      #hash( (type . ref) (var . #hash( (id . foo) (uid . foo))))))
Gets converted, by inlining, to:

  (%car foo)
  =>
  #hash( (type . prim) (prim . %car) (subx .
    #hash( (type . ref) (var . #hash( (id . foo) (uid . foo))))))

-----

3 points by sacado 5831 days ago | link

Yep. I left symbols encoded in UTF-8. When a string is output to anything (including when it is transformed to a symbol) it is translated to UTF-8. And I was planning to implement tables before the numeric tower. I think there's more fun in tables than in numbers :)

As for ports, I haven't thought about them yet.

-----

1 point by eds 5832 days ago | link

Is this on Anarki yet? (For that matter, I don't even see your version using Boehm GC on Anarki.)

Edit: I just found http://github.com/sacado/arc2c/tree/master. Is this the repo you are talking about?

-----

1 point by sacado 5832 days ago | link

No, it's not on the git yet as I cannot access it this week. And yes, arc2c's git is http://github.com/sacado/arc2c/tree/master

-----

3 points by almkglor 5831 days ago | link

Just a question, but I wonder if you could give eds write access to the repo? And of course, if there's anyone else out there who would like to contribute, just let us know, I think sacado would be glad to accommodate you ^^

-----

1 point by stefano 5831 days ago | link

I'd like to have access too. Currently I don't have much free time to contribute, but if I find some time i'll be glad to contribute. My github username is, surprisingly, 'stefano'.

-----

1 point by sacado 5831 days ago | link

You're on the list now :)

-----

1 point by stefano 5831 days ago | link

Thanks :)

-----

1 point by sacado 5831 days ago | link

Sure, eds, do you have a github account? What's your login?

-----

1 point by eds 5831 days ago | link

Yeah, my github username is slaguth.

-----

1 point by sacado 5831 days ago | link

done.

-----

1 point by almkglor 5832 days ago | link

Yes ^^

-----

4 points by sacado 5828 days ago | link

Now I've got first-class functions: they have a type tag and can thus be printed (it only displays #<procedure>, like canonical Arc), asked their type, etc. As soon as I did that, Boehm GC started complaining a lot (always saying "hmm, is that a pointer or not? I dunno, I'm lost...").

Not very dramatic, as these are just warning messages. But since Boehm seems almost broken on some machines and it was starting to bother me, I did it again: I implemented a new version of my own hand-made garbage collector. It was easier with real first-class closures, and it is much faster this time. On the test code I've written, the program runs twice as fast with my buggy GC.

The code still partially relies on Boehm GC, as I still have to teach it how every data type has to be collected, but it currently collects tagged objects, cons cells and closures. Will be pushed on Monday.

-----

2 points by almkglor 5828 days ago | link

We'll all be waiting ^^. How'd you implement closures? As a structure or just an array? Boehm might get confused in the array case if you're using the first entry of the array as a type tag (which isn't a pointer). Or maybe not; I haven't studied Boehm GC very well.

As for GC: what kind did you write? Copy or mark? If it's marking, I'd suggest a mark-and-don't-sweep collector. I think most incremental and thread-friendly modern GC's are copying though.

Edit: as for me doing the macro hacking stuff, well, it looks like I'm all hacked out. Hehehe^^

-----

1 point by sacado 5827 days ago | link

Hmm... I'm not very good at terminology, but I'm almost sure it's a mark-and-sweep. The implementation relies on the system malloc. Every time some memory is required, the user calls gc_malloc. This function calls malloc, stores the pointer in an array, and returns that pointer. Once the array is full (we're counting allocated objects, not consumed memory, so it can break down if you build very big objects), collection is performed: everything not reachable from the stack (or recursively from reachable objects) is freed. It has to be improved, but for now it's working.

I implemented closures as an array of long. Very easy to deal with. The first one is the type tag, the second one is the goto label, the third is the size of the array (we need it for garbage collection), and all the others are the arguments (well, they are objs, but an obj is implemented as a long).
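
In C, that layout comes out to something like this (sketch only - the function and tag names are made up, and the real thing would go through the GC's allocator):

  #include <stdlib.h>

  /* closure[0]   = type tag
     closure[1]   = goto label (which case of the big switch)
     closure[2]   = size of the array, needed by the GC
     closure[3..] = the captured arguments (objs, stored as longs) */
  long *make_closure(long tag, long label, int nargs) {
    long *c = malloc((3 + nargs) * sizeof(long));  /* or gc_malloc */
    c[0] = tag;
    c[1] = label;
    c[2] = 3 + nargs;
    return c;
  }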

-----

2 points by almkglor 5826 days ago | link

I see.

It does indeed seem to be a mark-and-sweep. Generally, though, most GC's manage the heap themselves: they allocate one big chunk of memory via malloc() and allocate from that.

"Mark" means to determine if a memory area is accessible. Usually this means setting some sort of bit or variable for each memory area. After you've marked all reachable memory, you perform a "sweep": any unmarked memory is freed.

A slightly more efficient algorithm than mark-and-sweep is mark-and-don't-sweep (obviously, because you skip the "sweep" step), but this requires us to handle the heap directly. Here's an explanation:

Each memory area in the heap has a "free/in-use" bit. This bit's sense of "free" can vary. For example, at any one time, all "free/in-use" bits may have the meaning:

  0 = FREE
  1 = IN-USE
At another time, however, the meaning might be:

  0 = IN-USE
  1 = FREE
The magic here is the way the free/in-use bit is interpreted by the memory manager.

Let's start with the following assumption:

  MEANING:
  0 = FREE
  1 = IN-USE
  +---------+--------------+---+------------+---------------+-------+
  |    0    |       1      | 1 |     0      |      1        |   1   |
  +---------+--------------+---+------------+---------------+-------+
   ^
   Alloc pointer
Now, suppose the application requests memory. The allocator moves the alloc pointer and marks the allocated memory as "in-use".

  +---+-----+--------------+---+------------+---------------+-------+
  | 1 |  0  |       1      | 1 |     0      |      1        |   1   |
  +---+-----+--------------+---+------------+---------------+-------+
   |   ^
   v   alloc pointer
  returned
Now suppose we request a bit of memory that is too large for the current free area at the alloc pointer:

     |-------| <- I need something this big
  +---+-----+--------------+---+------------+---------------+-------+
  | 1 |  0  |       1      | 1 |     0      |      1        |   1   |
  +---+-----+--------------+---+------------+---------------+-------+
       ^
       alloc pointer
Obviously, we have to skip the free memory that's too small. However, let me introduce an invariant: everything to the left of the alloc pointer must be in-use. So if ever we skip free memory that's too small, we still mark it in-use, but we don't return it (obviously, it's too small!). Instead we continue over to the next free memory and see if that is large enough, and so on.

In this case the very next portion of memory is available:

                               |-------| <- I need something this big
  +---+-----+--------------+---+-------+----+---------------+-------+
  | 1 |  1  |       1      | 1 |   1   |  0 |      1        |   1   |
  +---+-----+--------------+---+-------+----+---------------+-------+
                                |       ^alloc pointer
                                v
                                returned
And so on, until we consume the heap:

  +---+-----+--------------+---+-------+----+---------------+-------+
  | 1 |  1  |       1      | 1 |   1   |  1 |      1        |   1   |
  +---+-----+--------------+---+-------+----+---------------+-------+
...which now requires garbage collection.

Then the magic here comes in: we flip the meaning of the free/in-use bit. This frees everyone!

  MEANING:
  0 = IN-USE
  1 = FREE
  +---+-----+--------------+---+-------+----+---------------+-------+
  | 1 |  1  |       1      | 1 |   1   |  1 |      1        |   1   |
  +---+-----+--------------+---+-------+----+---------------+-------+
Then we begin the "mark" step, specifying reachable memory areas as in-use:

  +---+-----+--------------+---+-------+----+---------------+-------+
  | 0 |  1  |       0      | 1 |   0   |  1 |      1        |   1   |
  +---+-----+--------------+---+-------+----+---------------+-------+
       ^alloc pointer
...and afterwards... uhh... well.... we just allocate as normal, except the meaning of 0/1 of the free/in-use bit has flipped. "Don't sweep". Thus our sweep step is part of our allocation.
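
Here's a very small sketch of that in C, using fixed-size cells to keep it short (a real heap would have variable-sized areas, and the too-small free areas that get skipped would also be marked in-use to preserve the invariant above; the mark phase is left as a stub):

  #include <stddef.h>

  #define HEAP_CELLS 1024
  static long heap[HEAP_CELLS];       /* the cells themselves          */
  static int  used_bit[HEAP_CELLS];   /* per-cell free/in-use bit      */
  static int  in_use_sense = 1;       /* which bit value means IN-USE  */
  static int  alloc_ptr = 0;

  /* A real collector would trace the roots (the Arc stack and globals)
     and set used_bit[] to in_use_sense for every reachable cell.
     Left empty here, so this sketch frees everything at collection. */
  static void mark_phase(void) { }

  long *allocate_cell(void) {
    int collected = 0;
    for (;;) {
      if (alloc_ptr == HEAP_CELLS) {
        if (collected) return NULL;    /* heap really is full          */
        in_use_sense = !in_use_sense;  /* the flip: everything is free */
        mark_phase();                  /* re-mark what is still live   */
        alloc_ptr = 0;
        collected = 1;
      } else if (used_bit[alloc_ptr] != in_use_sense) {
        used_bit[alloc_ptr] = in_use_sense;  /* mark it in-use...      */
        return &heap[alloc_ptr++];           /* ...and hand it out     */
      } else {
        alloc_ptr++;                   /* already in use, skip it      */
      }
    }
  }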

As an aside: I've started writing an 'eval function for use with macros in arc2c. This is done by creating a new "eval" function using (make-eval) in make-eval.arc. It's not done yet though.

My plan is that for each compilation run, we (make-eval) a new version of 'eval. Why? Because we want to protect the global environment.

For example, the user code might want to use the following macro:

  (mac xe body
    `(tag (div class 'xe)
        ,@body))
Unfortunately, 'xe is a function defined and used by arc2c. If we were to simply 'eval all 'mac forms, then user code could trash arc2c.

Instead, we create a "protected" eval. This eval, when used, will not write to the real global variables; writes to globals will instead mutate a separate global-variable table.

However, it's not done yet, there are a bunch of TODO's floating around. And unfortunately, I might not be able to do this for a week. Or maybe two weeks, or maybe a month.

A friend of mine has a pretty big personal Real Life(TM) problem (it involves, like nearly every big personal RL problem, a member of the opposite sex). I'll need to help him for now. Sorry.

(the guy will, usa embassy willing, be in san francisco, california, usa a month from now. he's had to borrow quite a bit from his friends too, so we're all pretty tapped out and can't accompany him. err. just wondering if someone near there could keep an eye on him.)

The code for the 'eval interpreter is on github. Anyone who wants to try continuing it is welcome to do so. You're even welcome to completely dismantle that bit and start some other way of doing macros.

Bye for now, AmkG

^^

-----

1 point by sacado 5826 days ago | link

That looks clever, and not too complicated... I'll try to implement it when I have enough time... As a matter of fact, dealing with heap space myself would let me reduce the size of closure objects (I wouldn't need to know the # of arguments they hold anymore).

Well, good luck with your friend, and see you soon!

-----

3 points by almkglor 5832 days ago | link

Okay. Although I'd like to ask if you can implement the Anarki 'defcall and 'call* too?

Basically this means that instead of updating the pc C-var at END_JUMP(), our jump: C-label does the updating. If it's an ordinary function, extract pc from CLOSURE_REF(LOCAL(0),0). If it's not, look up its type in the call* table, and rearrange the stack:

  (obj k . ind)
  =>
  ((call* (type obj)) k obj . ind)
This also means that closures need type tags now too.

-----

3 points by sacado 5831 days ago | link

Ok, I'll work on this too. I need to type closures now anyway, as I am implementing full support for things like pr and type.

Btw, about type tags and the pointer-as-fixnum hack: I read a paper about the implementation of Lua (for tips about implementing tables). In Lua, everything, including numbers, is implemented as a structure containing first the type, then the data for the actual object (somebody already mentioned this in the forum). The reason is that the ANSI C standard does not allow the pointer-as-fixnum trick: you cannot know for sure that addresses will have a 0 low bit. In practice it works on the most common architectures, but it's not fully portable (which is a problem given Lua's target). Hmm, you were right, maybe later we could add another version of codegen that would be slower and more memory-consuming but completely portable.
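
That fully portable version would be roughly what Lua does - every value carries an explicit type field, so nothing is assumed about free low bits in pointers. A sketch (names made up):

  typedef struct obj_ obj;

  enum { T_FIXNUM, T_PAIR, T_SYM, T_FN /* ... */ };

  struct obj_ {
    long type;
    union {
      long fixnum;                      /* T_FIXNUM        */
      struct { obj *car, *cdr; } pair;  /* T_PAIR          */
      void *other;                      /* everything else */
    } v;
  };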

-----

3 points by almkglor 5831 days ago | link

True, which is why I was always a bit leery of the trick. Besides, if you're going to implement bignums anyway, you might as well start off "fixnums" as very small bignums ^^. LOL. Of course there's the obvious problem that most applications won't use bignums, and in applications that do, most numbers still aren't bignums.

But then, if you add two fixnums together and the result won't fit in a fixnum...

As an aside, in the current mzscheme implementation, it seems fixnums are type 'int and everything else is type 'num

Finally: if you need to access call*, it may be possible to determine its position in GLOBAL() and add a C-global CALL_STAR:

  ; in codegen...
  (list
     "obj * CALL_STAR;"
     ...
     "int main(){
     CALL_STAR = &GLOBAL(" (pos [is _!uid 'call*] global-vars) ");
     initialize_constants();
     execute(0);
     }")
Quite obviously the unused global elimination step will have to avoid eliminating 'call* though.

Edit: regarding typing closures: I've been reading about flexible array members of structs, so it might be possible to define the closure type as:

  typedef struct{
    long type; /* T_FN */
    obj vars[];
  } closure_type;
For non-C99 compliant compilers (practically everything except GCC), though, you need:

  typedef struct{
    long type; /* T_FN */
    obj vars[1];
  } closure_type;

-----

4 points by binx 5830 days ago | link

It seems that all of us have put too much emphasis on premature optimizations and fancy features... The first priority may be to implement the whole of Arc's core semantics and make the compiler able to compile itself.

-----

2 points by almkglor 5830 days ago | link

True. I'm trying to hack the macro stuff, to not much effect. Erk.

In fact quite a bit of arc.arc is now compileable, although you do have to transform (def ...) to (set ... (fn ...)) manually. So really what's needed now is macros. Also trying to think of how best to implement optional args and destructuring args - probably by just building them on top of rest arguments (for optional args) and lets (for destructuring).

-----

2 points by stefano 5830 days ago | link

Macros should be easier to implement once the compiler is able to compile itself, because this way the compiler and the compiled macro have the same internal representation of data structures, so passing arguments between the two shouldn't be too hard.

-----

3 points by almkglor 5829 days ago | link

> once the compiler is able to compile itself

There are several uses of macros in the compiler, unfortunately. In particular the 'def macro is too much of a convenience. So in order for the compiler to easily compile itself, it first has to implement macros. Chicken, meet egg.

Ah heck, maybe I should just use 'eval now and implement a compiled 'eval interpreter later that can interpret code and yet allow interpreted code to call compiled code and vice versa.

In fact I already have a bit of a sketch for this (which is necessary if we want to allow compiled programs to use 'eval). Basically, put interpreted '(fn ...) forms into an 'interpreted-fn annotated type together with the surrounding environment, and add an entry to the 'call* table (via defcall, say) for 'interpreted-fn pointing to, say, a $$interpreted-fn-apply function which binds the parameters into an environment table and calls the 'eval interpreter.

Of course this requires some changes in the base system: we need at the very least a %symeval primitive which when given a symbol will give its global binding, a %symset primitive which will modify a symbol's global binding, and obviously we need a link from the symbol to the GLOBAL() array (and dynamically create new containers for created symbols - if it's not in the GLOBAL() array then the compiled code would never read that global anyway, only the interpreted code ever will).

The rest of the interpreter is just a standard scheme interpreter, the only real support we need is to be able to call compiled-from-interpreted and interpreted-from-compiled, and the reading and binding of global symbols, including those that aren't in the GLOBAL table.

Ouch, my head hurts. And sacado's the one doing the Unicode strings. LOL

-----

5 points by kens 5829 days ago | link

Would it be worth implementing 'def directly? This would give a lot more functionality right away. This could be temporary until macros are implemented.

-----

1 point by almkglor 5829 days ago | link

Possibly. There's a bunch of "macro" transformations in xe.arc; possibly I'm just a bit too lazy to think. However, I don't like depending on those transforms; I want to do it "properly".

-----

1 point by sacado 5828 days ago | link

I think that's what I'm going to do until macros are implemented: make 'def a special form, automatically transformed into (set foo (fn...

-----

1 point by stefano 5828 days ago | link

For the global vars problem, a solution could be to associate top-level values directly with the symbol; this way a symbol would consist of three values: its string representation, its global value (initially a special 'unbound' value), and a property list.

-----

1 point by almkglor 5828 days ago | link

The current style has an optimization where all globals are simply referenced directly from an array in O(1). I'd rather that symbols point to entries in this array, because symbol-as-global-variable lookups are expected to be completely nonexistent if 'eval isn't involved in the program anyway (who uses 'eval in a language with 'read?). Only newly created symbols must have allocated variable values, and only for the benefit of 'eval'ed code - we can already know the global variables in the compiled code, because the compiler needs that info anyway.

Basically:

  typedef struct {
    long type; /*T_SYM*/
    char* stringform;
  #ifdef EVAL_USED
    obj* binding;
  #endif
  } symbol;

   int main(){
     /*compiler generated only if eval is used*/
     obj sym; symbol* sympt;
     sym = SYM2OBJ("globalvar0");
     sympt = (symbol*) sym;
     sympt->binding = &GLOBAL(0);
     sym = SYM2OBJ("globalvar1");
     sympt = (symbol*) sym;
     sympt->binding = &GLOBAL(1);
     ...
   }
This way the current performance is retained (global variable lookups are O(1)).

-----

2 points by stefano 5827 days ago | link

I don't know how well this solution will work once support for dynamic loading (e.g. from the REPL) has to be implemented, because you'll have to keep an index of the last global variable created across different compilation sessions. With threads it gets even more complicated (a mutex on the index?). With symbols it would be simpler to implement dynamic loading or definition of a global var from the REPL. The price paid is slightly slower access to global variables, because two memory references are necessary for every reference to a global var. Global variable lookups are still O(1) though, e.g. sym->binding for read access and sym->binding = value for write access.

-----

2 points by almkglor 5827 days ago | link

> the last global variables created across different compilation sessions

I don't understand this part. I was proposing that 'eval would be an interpreter, not a compiler. My intention was that compiled code would be statically generated (the way it's done now), so 'eval cannot possibly compile code. It would be a compiled interpreter of Arc. arc2c is a static compiler, so 'eval won't ever add compiled code; the best it can do is create an 'interpreted-fn object that contains an interpreted function's code (as a list) and the enclosing interpreted environment.

So a dynamic load would just interpret the expressions in the file being loaded:

  (set load
    (fn (f)
      (w/infile s f
        (whilet e (read s)
          (eval e)))))
'eval would be able to access the global variable table indirectly via the symbols and %symeval/%symset.

Basically, 'eval would be compiled to something like this:

  (set eval
    (fn (e (o env nil))
      (if (isa e 'symbol)
          (if env (lookup-environment env e)
                  (%symeval e))
          (...))))
Also: if the compiled code doesn't reference it, it won't be in the GLOBAL() array. The reason is simple: the compiled code won't reference it, ever. If 'globalvar isn't in GLOBAL(), then it does not exist in the compiled code. So it doesn't matter that it's not in the GLOBAL() array - the compiled code never referenced that global, so it won't ever use an index into the GLOBAL() array to refer to it. The interpreted code might, but that's why we have an indirect reference connected to the symeval.

Also, when I say O(1), I mean O(1) with the number one, as in only one layer of indirection (an index within a table). If global bindings are kept with the symbol only, then all global accesses - even precompiled ones - need (1) to find the symbol and (2) get the binding, for a total of O(2).

In other words: 'compile-file compiles, but it creates a new executable which is never connected to the process that ran 'compile-file. 'eval just interprets, and if the interpreted code mutates a global of the program, then that global gets mutated for real, in the program (what are you doing using 'eval on untrusted code anyway?). But if the interpreted code mutates a global that is never used in the program, it just creates a new global variable, one which is never referenced by the program (by definition, because the program never used it).

-----

1 point by stefano 5827 days ago | link

I thought eval compiled code, loaded it, and then executed it. I was mistaken. With the compiled code completely static, your strategy is better than assigning values to symbols.

-----

1 point by binx 5832 days ago | link

What closure representation does arc2c have now? Flat or nested? The former is quicker for variable lookup, but slower at set and closure creation. The latter is just the opposite, and it eats more memory because it allocates many useless frames.

For the flat closure representation, we should notice the following:

Every potentially set local variable should be indirected through a reference. Of course, you can reduce the number of these by analysing which are only set once or not shared by other closures.

-----

3 points by almkglor 5832 days ago | link

Flat. Similar to stefano's suggestion: http://arclanguage.com/item?id=5775

However the actual implementation is http://arclanguage.com/item?id=5792

Basically, instead of a cons cell (as suggested by stefano) I use a new structure, the "sharedvar", which is just a container for an obj. This also means that the actual closure objects are immutable after creation.

Only local variables that are ever set are put in sharedvar's, other local variables are kept in the flat closure. However I don't analyse for variables that are set only once, or which aren't shared by other closures yet.
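
Concretely, a sharedvar is little more than a one-slot box that the closures share (a sketch - whether it ends up carrying a type tag is still open, since closures themselves are untyped at the moment):

  typedef long obj;

  typedef struct {
    obj val;   /* the current value of the captured local */
  } sharedvar;

  static obj  sharedvar_read (sharedvar *s)        { return s->val; }
  static void sharedvar_write(sharedvar *s, obj v) { s->val = v; }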

-----

3 points by binx 5832 days ago | link

Yeah, this is the approach taken by chez, mlton, and many recent compilers of functional languages.

And all the optimization stuff (unboxing of "sharedvar", inlining, type inference, known function analysis, unused variable elimination, etc.) can only be made aggressive by global flow analysis, which is rather time-consuming. But I'm curious to know how far a relatively conservative compiler which doesn't do any flow analysis is able to go. If Stalin performed a little worse but compiled 10x faster, I believe many more people would use it.

-----

3 points by almkglor 5832 days ago | link

Sharedvar unboxing is a little difficult, since there are two types of local variables: those in closures, and those in parameters:

  (let kept ()
    (set keeper
      (fn (x)
         (set kept (cons x kept)))))
  ; vs.
  (set fooer
    (fn (x)
      (set x (rev x))
      (do-something x)))
So basically we need two types of local-variable-set primitives: one for closure-variable-set (the first case above) and another for parameter-variable-set (the second case).

Re: Stalin - is it that slow? Meaning an order of magnitude improvement of time is needed to make it comfortable?

Edit:

Type inference: well, I can't think of a good way of getting type inference generically, but certainly it's possible for, e.g., '+. '+ requires that its parameters are all numbers, or all strings, or all lists, and if we can determine that one parameter is of a specific type, we can insert checks that the other parameters are of that type and immediately bind the + to the type-specific version.

For example if we have %n+ for numeric addition, %s-join for string concatenation, and %l-join for list concats:

  (+ x y z)
  =>
  (+ x y z) ; can't determine type

  (+ 1 x)
  =>
  (%n+ 1 (let check x
           (if (is (type check) 'num)
               check
               (err "+: type mismatch"))))

  (+ (list 1 2 3) x)
  =>
  (%l-join (list 1 2 3)
    (let check x
      (if (is (type check) 'cons)
          check
          (err "+: type mismatch"))))
etc.

-----

3 points by binx 5832 days ago | link

Stalin might be the most optimizing but slowest functional language compiler ever written.

Sharedvar unboxing is not an important issue because it doesn't make much difference in efficiency. Most Scheme programs don't update local variables very often. The most useful optimizations are (in my opinion): special treatment of let and letrec, inlining, and known function detection.

General ML-style type inference for Scheme is impossible. What we can do is infer as many types as possible.

-----

2 points by almkglor 5832 days ago | link

> Special treatment of let and letrec

How special?

> known function detection

Err, as in...? Can you give an example?

-----

4 points by binx 5832 days ago | link

1) Transforming let and letrec to ((fn (...) ...) ...) is not efficient. First, it would allocate a closure. Second, it would perform a function call. Instead, the variables bound by let and letrec should be allocated on the stack, and no function calls are needed.

2) For example, in

  (f x)

if f is statically known, and f's environment is null or the same as the environment of the call site, then the function call should be a direct jump. This eliminates the cost of (1) fetching the global 'f, (2) extracting the address and environment information, (3) switching the environment, and (4) an indirect jump.

With known function detection, tail-recursive functions can be compiled to exactly the same code as loops in imperative languages.

-----

3 points by almkglor 5832 days ago | link

1) Given that everything is transformed to CPS, pretty much everything - including sequences I think - ends up being a function call. In fact, only jumps exist at all.

Not sure about how the closure-conversion works. This may be workable; would you be willing to work on it?

2) I get this now, although I'm not sure how to translate this into the current Arc2c output. I'll discuss the current arc2c calling convention and you tell me how workable your proposal is.

------

Currently, arc2c output works out like this:

1) Each function is simply a case in a large switch statement:

  jump: switch(pc){
   case 0:
   ...code for function 0...
   case 1:
   ...code for function 1...
  }
2) There exists a "stack" which is not the C-stack:

  obj stack[MAX_STACK];
  obj *sp;
  #define PUSH(x) (*sp++ = (x))
  #define POP() (*--sp)
3) At the start of each function (with the exception of function 0, which is the top level), the stack contains [0] the closure for the current function, [1] the continuation, and [2+] the Arc parameters. This is ensured by the calling function.

4) Functions are passed around as closure structures. The first element of the closure structure is a non-encoded number, representing the case for that function, while the rest is simply an array of closure variables.

5) Functions simply use the stack for temporary scratch space. For example this is how (+ 1 2) would compile to:

  PUSH(FIX2OBJ(1));
  PUSH(FIX2OBJ(2));
  ADD();
6) Just prior to calling a function, the calling function pushes the parameters in order: [0] the closure (the function to call), [1] the continuation, [2+] the arguments. The number of elements N for the call is computed by the compiler.

7) Then, at the function call, the calling function copies the top N elements of the stack into the bottommost N elements and ensures that sp = &stack[N]. Then it sets the C variable pc to the closure's function field and does a C "goto jump;".
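
To make that concrete, a call with N = 3 (closure, continuation, one argument) would come out roughly like this - a hand-written sketch using the macros above, with made-up indices and values, not actual arc2c output:

  /* sketch only, not real arc2c output */
  PUSH(GLOBAL(5));       /* [0] the closure being called                 */
  PUSH(LOCAL(1));        /* [1] the current continuation                 */
  PUSH(FIX2OBJ(42));     /* [2] the single Arc argument                  */
  /* slide the top 3 stack slots down to the bottom of the stack */
  stack[0] = *(sp - 3);
  stack[1] = *(sp - 2);
  stack[2] = *(sp - 1);
  sp = &stack[3];
  /* fetch the target's case number out of the closure, then jump */
  pc = CLOSURE_REF(stack[0], 0);
  goto jump;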

-----

3 points by binx 5832 days ago | link

Well, I only have experience of writing direct-style compilers, not CPS-style ones, so my advice needs to be adapted.

But from the mechanism of the current arc2c output you showed above, I see many places for improvement:

1) In a function

  (fn (x y z ...) (g A B C D ...))

if B doesn't rely on x, C doesn't rely on x and y, D doesn't rely on x, y and z, etc., the calling function could avoid copying elements to the bottom. Instead, it moves the stack pointer to the bottom first, and then pushes the arguments.

2) For functions having no environments, we don't have to push a full closure; we just have to push the pc.

3) For known functions, we just do a C goto, not to the jump label, but to the (case n), because C cases are in fact labels.

Finally, in my opinion, a CPS-style compiler is no longer a better choice nowadays. It complicates the source, the debugging information, and the (human) analysis of the program structure. Since we are already using a separate stack that is different from C's, continuations can be implemented in direct-style compilers as easily as in CPS-style ones. And codegen for direct-style compilers is just slightly more difficult, which isn't an issue. In addition, a naive direct-style compiler performs much better than a naive CPS-style one: the latter needs a source-simplification step to eliminate the unnecessary closures and function calls produced by CPS conversion.

-----

2 points by almkglor 5831 days ago | link

1) personally I think this is a rare case, but I could be wrong

2) arc2c closures are very lightweight: it's just a simple array of obj(s), with the first obj being the pc. So in effect for functions having no environment, we are pushing a pointer to the pc.

That said, closures are also used to represent functions that can be passed around. Unfortunately closures are currently untyped, so we expect the current closure style to be changed.

Also we need to support the possibility that a "function" being called isn't really a function: after all table syntax is just (tb key). And this is perfectly valid Arc:

  (let sometable (table)
    (each k lst
      (= sometable.k (generate-something k)))
    (map sometable ; yes, we're passing a table as if it were a function!
         foolst))
3) I was actually thinking of this too, although I haven't gotten around to it.

re: CPS: I wouldn't really know. Me, I'm just hacking around at the transformations before the CPS and closure conversions. Because of the somewhat modular construction of arc2c, in theory you could write a drop-in replacement for the CPS and closure conversions, as well as the code generator, and we could then offer either CPS or direct style as options, maybe.

-----

3 points by binx 5831 days ago | link

1) It's not a rare case. It's important for the speed of most useful programs. For example, map & foreach, which are used quite often, can be optimized by not copying data on the stack.

-----

1 point by almkglor 5831 days ago | link

Re: let - it seems the code generator is somehow capable of detecting 'let forms and simply storing their variables on the stack. I could be wrong though.

-----

2 points by stefano 5832 days ago | link

A non-optimizing compiler leads easily to a "fast enough" executable. Without optimizations I think the compiled code would be 7x~10x slower than C.

Edit: I've tried the Fibonacci "benchmark" on a simple compiler I'm writing: it takes 0.2 seconds to compile the program and compute the 32nd Fibonacci number. On the current Arc interpreter it takes ~5 seconds.
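
(The benchmark here is presumably the usual naive doubly recursive definition; in C it would read something like this, with fib(32) = 2178309:)

  #include <stdio.h>

  long fib(long n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
  }

  int main(void) {
    printf("%ld\n", fib(32));
    return 0;
  }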

-----

3 points by binx 5831 days ago | link

Your compiler might be much slower with true Scheme numbers, the + operator as a function (not a primitive operator), and stack overflow checking. These features are currently supported by the Arc interpreter on mzscheme.

If you can correctly eliminate function calls on +, your compiler is an optimizing one, not non-optimizing...

-----

2 points by stefano 5831 days ago | link

I've tried the same example with a function call and a test around every arithmetic operation, and execution time went from ~0.2s to ~0.26s - not a big difference, although a few optimizations will probably be necessary for something more complex than the Fibonacci example.

-----

2 points by binx 5830 days ago | link

Is the function call overhead so small? I didn't realize.^^

But there are other issues: the fib example is not a very good benchmark, because in C, general recursion is not a common paradigm. If we compare C loops to the Arc tail-recursive calls generated by a simple compiler, instead of comparing C recursion to Arc recursion, I believe the difference will be much larger, because C compiler writers have spent at least 20 years optimizing loops...

-----

2 points by stefano 5830 days ago | link

That's absolutely true. Reaching C speed with high-level languages such as Lisp is very, very difficult. CMUCL and SBCL reach roughly the speed of C, but they've been developed for a long time. As for loop speed vs. tail recursion speed, the difference shouldn't be too big.

-----

2 points by binx 5830 days ago | link

Stalin performs as well as C in numerical programs and many other benchmarks. The most exciting thing is that, unlike CL, Stalin doesn't need type declarations to guide optimizations; it infers as much type information as possible. The problem is that it compiles too slowly and it's not maintained anymore.

Naively implemented tail recursion is still not fast, because many common loop optimizations can't be directly applied unless you eliminate the function calls and treat them as true gotos. It's a rough task because global flow analysis is needed to eliminate as many calls as we can.

-----

1 point by binx 5833 days ago | link

BTW, the only country I know of which has an extra holiday this weekend is China. Could you please tell me your email address? I know this kind of topic is not proper for this forum...

-----

1 point by almkglor 5833 days ago | link

Actually I'm in the Philippines. Our holiday is actually supposed to be tomorrow, but our president has a tendency to move holidays near weekends to give long weekends.

-----

1 point by absz 5833 days ago | link

It's not just you, we do that in the US too :)

-----

1 point by almkglor 5833 days ago | link

LOL. I suppose it's because the populace is mostly dissatisfied with the president, and the president is trying to appease the populace? Those are the conditions in our country anyway ^^

-----

3 points by absz 5833 days ago | link

Well, we've been doing that since before Bush, so probably not. I think it's because it makes things easier for banks, schools, businesses, etc.

-----

1 point by sacado 5833 days ago | link

Well, I'm amazed. Thanks for your involvement in that project! As for strings & Unicode, I guess there are good existing libraries.

-----

1 point by almkglor 5833 days ago | link

You're welcome. You can prolly review bits of the code via github if you don't have access to your own computer this week.

Wonder how eds is doing on arc2c? BTW have you requested to mentor his GSoC application? If you already did I'll withdraw my request.

The bit about strings is - how do we represent them? UTF-8? UTF-32? As an array or list of characters?

Arc's underlying mzscheme divides strings into code points; each code point is representable by a single 32-bit number (I think). An individual "character" in mzscheme is thus a code point (from what I gather), although in Unicode a character could be represented by several code points (or so I hear).

Now the point is that the following is quite valid:

  arc> (= p "asdf")
  "asdf"
  arc> (= (p 0) #\c)
  #\c
  arc> p
  "csdf"
So obviously access to individual characters should be easy, and replacing individual characters should probably not cause us to mess too much with memory. This almost prevents the use of UTF-8, even though all I/O will pretty much just use UTF-8.

-----

2 points by kens 5833 days ago | link

Yes, I think UTF-8 would be a disaster with modifiable strings. mzscheme uses UCS-4 (UTF-32) internally, and that would be the simplest approach. If you are willing to ignore Unicode characters > 65536, then UCS-2 would be okay with half the memory usage. When you talk about a character represented by several code points, are you talking about Unicode surrogates for characters > 65536? (Oversimplifying, two UCS-2 surrogate characters are used to represent one Unicode code point > 65536.) I think you'd be better off with UTF-32 than UTF-16 and surrogates, as surrogates look like a nightmare that you'd only want if you need backwards compatibility, see Java's character support: http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Character....

-----

1 point by almkglor 5833 days ago | link

> When you talk about a character represented by several code points, are you talking about Unicode surrogates for characters > 65536?

Actually I'm talking about so-called "combining characters" http://en.wikipedia.org/wiki/Combining_character

Normalization... hahahaha unicode unicode headaches headaches! http://en.wikipedia.org/wiki/Unicode_normalization

-----

4 points by kens2 5833 days ago | link

Oh, Unicode combining characters and normalization. I classify that as "somebody else's problem." Specifically, if you're writing a font rendering engine, it's your problem. If you're writing an Arc compiler, it's not your problem. If you want complete Unicode library support in your language (like MzScheme's normalization functions string-normalize-nfd, etc.), then you just use an existing library such as ICU, and it's not your problem. ICU: http://www-306.ibm.com/software/globalization/icu/index.jsp

-----

1 point by eds 5832 days ago | link

> Wonder how eds is doing on arc2c?

I've been following up on the forum threads but I haven't had time to actually read the code yet. (And the last time I checked, the arc2c executable gave me a segfault.)

-----

1 point by almkglor 5832 days ago | link

LOL. In any case to reduce the possibility of things being screwy I do the following on my C output:

  //#include<gc.h>
  #define GC_MALLOC malloc
  #define GC_INIT()
The current arc2c output assumes that you have a proper Boehm GC installation, but since I can't seem to get a good install here (prolly something to do with being AMD64 again) I just disable the GC for now.

Hmm, can you try on a later version?

-----

2 points by eds 5832 days ago | link

I finally got arc2c to work (without GC as you suggested). One note though: apparently arc2c relies on rm-global.arc, but doesn't load it by default. So under the current version of the compiler, 'compile-file will error until you (load "rm-global.arc").

Now I just have to get it to work with GC...

-----

2 points by almkglor 5832 days ago | link

Oops. Must have forgotten to add it to arc2c.arc then ^^. Unfortunately I won't be able to fix this until maybe 7 hours from now T.T, haha, I'm in the office ^^

The thing about GC working: well, you need to somehow download the development version of Boehm GC, and, well, that's what's stopping me for now T.T

-----

1 point by eds 5832 days ago | link

Are your latest changes on Anarki? Even after "git pull" I still seem to have the old version.

-----

2 points by Jesin 5833 days ago | link

Wow. That's a big improvement.

-----