Now, I've got first class functions : they have a type tag and can thus be printed (it only displays #<procedure>, as canonical arc), asked their type, etc. As soon as I did that, Boehm GC started complaining a lot (always saying "hmm, is that a pointer or not ? I dunno, I'm lost...").
Not very dramatic, these are just warning messages. But as Boehm seems almost borken on some machines and as it was starting to bother me, I did it again : I implemented a new version of my own, hand-made, garbage collector. It was easier with real first-class closures and it is much faster this time. On the test code I've written, the program runs twice as fast with my buggy-GC.
The code still partially relies on Boehm GC, as I have to make it know how every data type has to be collected, but it currently collects tagged objects, cons cells and closures. Will be pushed on Monday.
We'll all be waiting ^^. How'd you implement closures? As a structure or just an array? Boehm might get confused in the array case if you're using the first entry of the array as a type tag (which isn't a pointer). Or maybe not; I haven't studied Boehm GC very well.
As for GC: what kind did you write? Copy or mark? If it's marking, I'd suggest a mark-and-don't-sweep collector. I think most incremental and thread-friendly modern GC's are copying though.
Edit: as for me doing the macro hacking stuff, well, it looks like I'm all hacked out. Hehehe^^
Hmm... I'm not very good at terminology, but I'm almost sure it's a mark-and-sweep. The implementation relies on system malloc. Every time some memory is required, the user calls gc_malloc. This function calls malloc stores the pointer in an array and returns that pointer. Once the array is full (we're not talking about consumed memory yet, but about built references, so it can break down if you build very big objects), collection is performed : everything not reachable from the stack (or recursively from reachable objects) is freed. It has to be improved, but for now on it's working.
I implemented closures as an array of long. Very easy to deal with. The first one is the tag type, the second one is the goto label, the third is size of the array (we need it for garbage collection) and all others are the arguments (well, they are objs, but they are implemented as a long).
It does indeed seem to be a mark-and-sweep. Generally though most GC's will handle the heap: they allocate one big bunch of memory via malloc() and allocate from that.
"Mark" means to determine if a memory area is accessible. Usually this means setting some sort of bit or variable for each memory area. After you've marked all reachable memory, you perform a "sweep": any unmarked memory is freed.
A slightly-more-efficient algorithm than mark-and-sweep is mark-and-don't-sweep (obviously because you skip the "sweep" step), but this requires us to handle the heap directly. Here's an explanation:
Each memory area in the heap has a "free/in-use" bit. This bit's sense of "free" can vary. For example, at any one time, all "free/in-use" bits may have the meaning:
0 = FREE
1 = IN-USE
At another time, however, the meaning might be:
0 = IN-USE
1 = FREE
The magic here is the way the free/in-use bit is interpreted by the memory manager.
Now suppose we allocate a bit of memory that is too large for the current free memory pointed at the alloc pointer:
|-------| <- I need something this big
| 1 | 0 | 1 | 1 | 0 | 1 | 1 |
Obviously, we have to skip the free memory that's too small. However, let me introduce an invariant: everything to the left of the alloc pointer must be in-use. So if ever we skip free memory that's too small, we still mark it in-use, but we don't return it (obviously, it's too small!). Instead we continue over to the next free memory and see if that is large enough, and so on.
In this case the very next portion of memory is available:
|-------| <- I need something this big
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| ^alloc pointer
...and afterwards... uhh... well.... we just allocate as normal, except the meaning of 0/1 of the free/in-use bit has flipped. "Don't sweep". Thus our sweep step is part of our allocation.
As an aside: I've started writing an 'eval function for use with macros in arc2c. This is done by creating a new "eval" function using (make-eval) in make-eval.arc. It's not done yet though.
My plan is that for each compilation run, we (make-eval) a new version of 'eval. Why? Because we want to protect the global environment.
For example, the user code might want to use the following macro:
(mac xe body
`(tag (div class 'xe)
Unfortunately, 'xe is a function defined and used by arc2c. If we were to simply 'eval all 'mac forms, then user code could thrash arc2c.
Instead, we create a "protected" eval. This eval, when used, will prevent writes to global variables. Instead, writes to global variables will mutate a global-variable table, not the "real" global variables.
However, it's not done yet, there are a bunch of TODO's floating around. And unfortunately, I might not be able to do this for a week. Or maybe two weeks, or maybe a month.
A friend of mine has a pretty big personal Real Life(TM) problem (it involves, like nearly every big personal RL problem, a member of the opposite sex). I'll need to help him for now. Sorry.
(the guy will, usa embassy willing, be in san francisco, california, usa a month from now. he's had to borrow quite a bit from his friends too, so we're all pretty tapped out and can't accompany him. err. just wondering if someone near there could keep an eye on him.)
The code for the 'eval interpreter is on github. Anyone who wants to try continuing it is welcome to do so. You're even welcome to completely dismantle that bit and start some other way of doing macros.
That looks clever, and not too complicated... I'll try to implement it when I'll have enough time... As a matter of fact, dealing with heap space myself would let me reduce the size of closure objects (I wouldn't need to know the # of arguments they hold anymore).
Well, good luck with your friend, and see you soon !