1)We can make a simple inliner, and name the primitives like #car#, #cdr#, etc. Then define car as (set car (fn (x) (#car# x)). In the last, we use the inliner to do the job. The inliner approach is better than an extra pass of eliminating primitive calls, because it can do more optimization.
2)Maybe writing a metacircular interpreter in compiled arc is the best way of implementing both macros and eval-when.
3)I don't know if the current unicode libs are good enough.
4)Implementing green threads via continuations should be a good start.
5)For standard I/O, use stdio. Anything else could be done by an ffi. Since arc2c is a static compiler, ffi could be portable even what we have is an ANSI-C system, because we have to deal with neither the .dll/.so stuff nor the libffi lib.
1) hmm. Interesting. Can't think of how to do inlining yet though.
As an aside, my intent was that library functions in a specially defined library file can access primitives %car etc., but not other code - user code can use %car etc. for their own purpose without clashing with the primitives, if only for compatibility with Arc.
2) Yes, this seems correct. And there's also 'eval. Yes, eval's not often used, but still...
5) The problem is using green threads with blocking I/O. Obviously in a server if one thread is blocked by I/O, other threads should still continue. It's really the threads/IO interaction that's bothering me.
Edit: which reminds me - currently closure structures are untyped, meaning we can't safely get the type of a function.
Some background first: the compiler first puts all top-level expressions as parts of a do-block. For much of the compilation run (until it reaches CPS transformation) the compiler represents the entire program in this do-block.
I intend that the library's will simply be inserted at the front of the do-block's list of subexpressions.
The inline transformation phase then iterates over the top-level elements of the topmost do-block. If a top-level element is an assignment to a global variable, we attempt to determine if the assignment is eligible for inlining.
To determine if the assignment is eligible for inlining, we check if it's assigning a function. Since this is a top-level block, the function cannot close over any variables. Then we detect if the function's parameters are referenced 0 or 1 times (if referenced more than that, we can't safely inline it without putting it in a let-block - which creates a function anyway, so no point inlining). Note that we can actually allow the function to reference itself via the global, since we won't remove the assignment to the global.
If we determine that a global is eligible for inlining, we add the symbol and its function to a table of inlinable functions.
Now here's the hard part: we also have to ensure that the global can be safely inlined. If a global is assigned to exactly once, then it could.
While scanning, we check if the global is already in the inlineable set. If it is, we add the global in the banned set. This means that redefining a global will prevent it from being inlined:
(fn () t))
(prn:global) ; t
(fn () nil))
(prn:global) ; nil
; cannot safely inline
If a top-level expression isn't an assignment to a global, we scan through its subexpressions for an assignment to a global. For each global assignment, we add the global in the banned set. This prevents us from inlining non-trivially inlineable stuff:
(let c nil
(fn () c))
(fn (v) (set c v))))
After this scan through, we have a set of inlinable functions and a set of banned-from-inlining. We remove from the inlineable set those that are in the banned set. Then we perform inlining.
Inlining is then done this way: We scan the entire syntax tree and search for function applications, where the function position is a reference to a global variable in our final inlineable set. If it is, we then replace the application with a copy of the function's contents (the function's contents are always placed in a do-block, incidentally). We scan through the copy and look for references to the function's parameters, replacing the parameters with the appropriate expression in the function application. For vararg inlining, we may use the %cons primitives directly to build the vararg parameter.
The assignment to the global is retained. However, we can then repeat the unused-global-removal step (or move that step after this step) to remove the actual non-inlined version if it's not being passed as a function.
1. Local functions which have enclosing environments are harder to inline. If a function's environment is different to the caller's environment, we should replace all its free variables to references to its environment. For simplicity, you can inline only the combinators(functions which have no free variables).
2. When inlining, we should rewrite the parameters only if they are free to the function body, not bound by other local functions in the body.
Hmm. Turns out this is a real issue, but for a different reason: since local variables are given unique ID's, we should actually replace local variable ID's for subfunctions when a function is inlined several times: