Arc Forum | Advice:1)We can make a simple inliner, and name the primitives like #car#, #cdr#...

Arc Forum

5 points by binx 6296 days ago | link | parent

Advice:

1)We can make a simple inliner, and name the primitives like #car#, #cdr#, etc. Then define car as (set car (fn (x) (#car# x)). In the last, we use the inliner to do the job. The inliner approach is better than an extra pass of eliminating primitive calls, because it can do more optimization.

2)Maybe writing a metacircular interpreter in compiled arc is the best way of implementing both macros and eval-when.

3)I don't know if the current unicode libs are good enough.

4)Implementing green threads via continuations should be a good start.

5)For standard I/O, use stdio. Anything else could be done by an ffi. Since arc2c is a static compiler, ffi could be portable even what we have is an ANSI-C system, because we have to deal with neither the .dll/.so stuff nor the libffi lib.

1 point by almkglor 6296 days ago | link

1) hmm. Interesting. Can't think of how to do inlining yet though.

As an aside, my intent was that library functions in a specially defined library file can access primitives %car etc., but not other code - user code can use %car etc. for their own purpose without clashing with the primitives, if only for compatibility with Arc.

2) Yes, this seems correct. And there's also 'eval. Yes, eval's not often used, but still...

3) erk

4) that's what I planned: http://arclanguage.com/item?id=5794 . However stefano suggests using pthreads.

5) The problem is using green threads with blocking I/O. Obviously in a server if one thread is blocked by I/O, other threads should still continue. It's really the threads/IO interaction that's bothering me.

Edit: which reminds me - currently closure structures are untyped, meaning we can't safely get the type of a function.

-----

4 points by almkglor 6296 days ago | link

Okay, here's a first pass at inlining.

Some background first: the compiler first puts all top-level expressions as parts of a do-block. For much of the compilation run (until it reaches CPS transformation) the compiler represents the entire program in this do-block.

I intend that the library's will simply be inserted at the front of the do-block's list of subexpressions.

The inline transformation phase then iterates over the top-level elements of the topmost do-block. If a top-level element is an assignment to a global variable, we attempt to determine if the assignment is eligible for inlining.

To determine if the assignment is eligible for inlining, we check if it's assigning a function. Since this is a top-level block, the function cannot close over any variables. Then we detect if the function's parameters are referenced 0 or 1 times (if referenced more than that, we can't safely inline it without putting it in a let-block - which creates a function anyway, so no point inlining). Note that we can actually allow the function to reference itself via the global, since we won't remove the assignment to the global.

If we determine that a global is eligible for inlining, we add the symbol and its function to a table of inlinable functions.

Now here's the hard part: we also have to ensure that the global can be safely inlined. If a global is assigned to exactly once, then it could.

While scanning, we check if the global is already in the inlineable set. If it is, we add the global in the banned set. This means that redefining a global will prevent it from being inlined:

  (set global
    (fn () t))
  (prn:global) ; t
  (set global
    (fn () nil))
  (prn:global) ; nil
  ; cannot safely inline

If a top-level expression isn't an assignment to a global, we scan through its subexpressions for an assignment to a global. For each global assignment, we add the global in the banned set. This prevents us from inlining non-trivially inlineable stuff:

  (let c nil
    (set reader
      (fn () c))
    (set writer
      (fn (v) (set c v))))

After this scan through, we have a set of inlinable functions and a set of banned-from-inlining. We remove from the inlineable set those that are in the banned set. Then we perform inlining.

Inlining is then done this way: We scan the entire syntax tree and search for function applications, where the function position is a reference to a global variable in our final inlineable set. If it is, we then replace the application with a copy of the function's contents (the function's contents are always placed in a do-block, incidentally). We scan through the copy and look for references to the function's parameters, replacing the parameters with the appropriate expression in the function application. For vararg inlining, we may use the %cons primitives directly to build the vararg parameter.

The assignment to the global is retained. However, we can then repeat the unused-global-removal step (or move that step after this step) to remove the actual non-inlined version if it's not being passed as a function.

-----

1 point by binx 6296 days ago | link

Things that have to be remembered:

1. Local functions which have enclosing environments are harder to inline. If a function's environment is different to the caller's environment, we should replace all its free variables to references to its environment. For simplicity, you can inline only the combinators(functions which have no free variables).

2. When inlining, we should rewrite the parameters only if they are free to the function body, not bound by other local functions in the body.

-----

1 point by almkglor 6296 days ago | link

1. I'm not proposing yet to inline local functions, especially those that close on environments. However, what algorithm would you propose for inlining local functions?

As an aside, closure-conversion makes the creation of environments explicit. Perhaps an inlining step can be added after closure-conversion?

2. I don't understand this part.

-----

3 points by binx 6295 days ago | link

2. Take this function as an example:

(fn (x y) (g x y (fn (x) (h x))))

When inlined with x=1 and y=2, it should be rewritten as:

(g 1 2 (fn (x) (h x))), not

(g 1 2 (fn (x) (h 1)))

Because the second x is not free in the function body.

-----

2 points by almkglor 6295 days ago | link

I see. This is actually handled implicitly in the compiler's AST structure: during the conversion from the list form to the AST form, each local variable is given a unique ID:

  (fn (x y) (g x y (fn (x) (h x))))
  =>
  (fn (x@1 y@2) (g x@1 y@2 (fn (x@3) (h x@3))))
  ; approximation of the AST structure, the AST
  ; is really a table of properties

So mindless replacement of the inlined version will simply replace x@1, not x@3.

  (g 1 2 (fn (x@3) (h x@3)))

-----

1 point by almkglor 6295 days ago | link

Hmm. Turns out this is a real issue, but for a different reason: since local variables are given unique ID's, we should actually replace local variable ID's for subfunctions when a function is inlined several times:

  (set glob
    (fn (x@1 y@2)
      (g x@1 y@2 (fn (x@3) (h x@3))))
  (glob 1 2)
  (glob 3 4)
  =>
  (set glob
    (fn (x@1 y@2)
      (g x@1 y@2 (fn (x@3) (h x@3))))
  (g 1 2
    (fn (x@4) (h x@4)))
  (g 3 4
    (fn (x@5) (h x@5)))

-----