Arc Forum | What's next?

Arc Forum

5 points by applepie 6209 days ago | 48 comments

I lurk the forum and I see people doing a lot of effort to take Arc further.

However, I have the feeling that all that work is a bit low level (dealing with concrete libraries to solve concrete problems).

That's good, but maybe it could also be interesting to think which are the higher level decisions to be taken.

Not to be rude, but I don't think the current core of Arc is better than any other Scheme. Do we still want it to be the hundred year language?

4 points by almkglor 6209 days ago | link

The neat stuff is:

  foo:bar
  foo.i
  foo!bar
  (foo i) ; has potential problems with macro bashing though

Noncanonical but still cool stuff

  given and givens
  scanners
  call* and defcall

I've also asked PG about strings-as-lists, which I think should be the way to go; he mentioned something vague about strings not wanting to be lists (I'll post it later when I have time). Somehow I have a gut feel that it's better that way, although we do need to have a separate character type.

-----

2 points by almkglor 6208 days ago | link

Strings as lists (from PG):

> They were once but it turned out I never used this fact. The things you want to strings turn out to be different.

I don't quite understand what he was saying, I replied a follow-up on it asking him to clarify but he hasn't replied to it ^^. My assumption is it means "the things you want string to be" turn out to be different.

-----

1 point by cronin1024 6206 days ago | link

If a string was a linked list, the storage and processing overhead would be considerable compared to an array-based implementation with no tangible benefit - how often do you really perform a recursive process on a string or create a new string with the tail of an old string?

-----

2 points by almkglor 6206 days ago | link

cref treeparse. http://arclanguage.com/item?id=5227

-----

3 points by almkglor 6208 days ago | link

As an aside what I'm currently working on in my severely reduced spare time (given that a friend of mine has dragged me screaming to the gym, which now takes up maybe 30% of whatever free time I have) is a concurrency-oriented VM (a la Erlang VM) for Arc.

It'll even support mutation of objects within a single process, although it will gain copy-on-write semantics when an object is shared (via a global or message passing) to other processes.

It'll either be an AST traverser or a bytecode traverser (no concrete decision yet), with the AST being approximately the intermediate AST from arc2c (except the AST is C++ structs); bytecode (if ever) will be derived from the AST (possibly even JIT).

-----

4 points by shader 6202 days ago | link

Cool. Have you made any progress? The intersection of arc/lisp and erlang is one of my interests as well. Think of macros + easy, fault tolerant concurrency, and hot code swapping! :) I'm very interested in seeing anything you've made, and maybe even helping (if I can; unlikely). Based on what you've done so far I'm sure it will be awesome. If you need any ideas on implementation, you could probably ask Joe or the Termite guys.

-----

3 points by almkglor 6202 days ago | link

Well, the memory management is just about done, with the not-very-minor problem that I haven't tested that part yet, for the very simple reason that I don't have anything to test it on. Bad style, I think; I'm pretty much going to haul a lot of stuff into workability when most of the stuff's written, instead of bit by bit. Lisplikes have spoiled me with their easy-easy-easy-test-on-the-repl T.T

The usability of the VM is also predicated on the usability of arc2c, for the very simple reason that I intend to just replace the final code generator step with a bytecode generator.

Joe == Joe Armstrong?

Edit: Digging through a bit about Termite, I see they can actually implement process migration, by the simple expedient of using call/cc. Since I fully intend to support call/cc myself, and to allow serialization of functions, my new VM will also be able to support this feature ^^.

-----

3 points by almkglor 6202 days ago | link

As an aside, I've noticed that Termite, and apparently quite a few languages built from the ground-up to have Erlang-style concurrency, all ban mutations. Which doesn't seem to be necessary for Erlang-style concurrency, actually. At least, if you're copying objects between processes, then mutation shouldn't matter. If underneath the "shared-nothing" gloss you're actually sharing (at the VM level), then yes, having immutable everything is necessary. Edit: an alternative would be to do copy-on-write for shared objects, but mixing copy-on-write for shared stuff gave me headaches, which is why I decided to drop it in favor of copy-on-message-send.

I think I'd rather keep Arc's ability to mutate everything, though; although certainly I'd avoid using mutations in any code samples if at all possible.

An interesting bit in Termite is that ports can also be passed over the network.... whoa!

-----

1 point by shader 6201 days ago | link

I suppose that choice (mutation vs. copying) depends on the relative performance / ease of coding. If you think that mutation is important, than you can sacrifice the efficiency of limited sharing. Alternatively, you can share and not mutate. Maybe you're implementation could leave that up to the user? Have directives for copy vs. share, and if sharing is permitted, than mutation is not, etc. Of course, that clutters the vocabulary and leads to extra typing.

Actually, I don't know which is worse performance wise: Code introspection to prohibit mutation, or copying. Alas, this problem gets complicated the more one moves away from simple absolutes like "no mutation" and "copy" to "immutable sometimes" and "copy mostly" :)

-----

1 point by almkglor 6200 days ago | link

> Have directives for copy vs. share, and if sharing is permitted, than mutation is not, etc. Of course, that clutters the vocabulary and leads to extra typing.

Then User A writes Process A to use sharing and User B writes Process B to use copying. Process A sends object to Process B. ^^

-----

1 point by absz 6202 days ago | link

What about implementing copy-on-write semantics? That would preserve the efficiency of sharing while allowing mutation. Since Arc only has two mutation primitives (set and lset), modifying those would be necessary and sufficient. You would have to have every object on the heap know how many threads referenced it, and if there was more than one, set/lset would copy and then mutate as opposed to simply mutating in place. (Disclaimer: I am not an expert.)

If I recall correctly, Termite's ports are essentially processes, which is what allows them to be passed around.

-----

3 points by almkglor 6202 days ago | link

scar, scdr, sref....

My problem is with this: http://arclanguage.com/item?id=7066

To summarize: Suppose Process A has an object a. It sends this object to process B. This object is a copy, a' (conceptually, in the language level). Now Process A sends object a to process B again. This object is a second copy, a'':

  (def process-A (process-B a)
    (==> process-B a)
    (==> process-B a))

  (def process-B ()
    (givens a
              (<==
                any any)
            a2
              (<==
                any any)
      (if (is a a2)
          (err "I somehow received the same object twice!"))))

So I also need copy-on-write and copy-on-message-pass-if-receiver-already-has-a-copy.

Edit: It is important for them to be separate because process B expects two separate objects, and it might mutate one without expecting the other to be mutated.

There was a solution in my old shared concept, but it involved an O(M log N) search, where M is the message size in number of objects and N is the number of shared objects the receiver has access to, which I decided was simply suboptimal compared to an O(M) copy (where M is number of bytes in the message, but still...).

Edit2: re: passing stuff around. My intent is to not integrate the actual passing-around-stuff-via-network server in the C++-side code; instead, the VM will provide access to plain sockets. Instead an Arc-side server would be defined, with its semantics standardized (but not implementation, so that you can, say, implement a "more secure" server, or use some other protocol, etc.).

Instead, if a server receives a serialized PID over its connection from another computer, it checks a table matching the PID computer and id number. If the table fails to match, it launches a "reflector process" which simply sends every message it receives to the server, for serialization, and inserts that into the table. The PID of the reflector proces is then used as the message sent by the server to the destination process. All PIDs then get this translation (obviously the table has to be bidirectional. so that a PID can be matched to the serialization of the PID.).

Obviously not only processes would require reflectors, but also ports if we want to pass them around ^^.

-----

2 points by shader 6201 days ago | link

Interesting problem. I bet this is why the erlang folks decided to use mutation-less, copy-everything, so that they didn't have to carefully think through every possible case. Maybe you could write your vm to automatically copy everything, and then have a means of coding exceptions. So, if you find that a certain specific case can share, you can add that to the list of exceptions. Otherwise, it copies, so you don't have to worry about unexpected errors.

And you should probably allow the user to require copying if they need to, so that there aren't any surprises.

As an aside, you mentioned a O(M log N) search to determine whether to share or not. Now maybe I don't understand, but you could use a Bloom filter to determine if the object is present on the other process. The only shortcoming is the false positives, which mean you would copy unnecessarily, but they happen so infrequently that it probably isn't an issue. I believe bloom filters were mentioned on Hacker News yesterday. The link on wikipedia is: http://en.wikipedia.org/wiki/Bloom_filter

Using a bloom filter, you'd be able to in constant time, low memory usage, determine if an item is in a set. In this case the items are shared objects and the set is the objects owned by the target process. Now, maybe I don't understand what you're trying to do, but you can clarify that shortly.

How much are you implementing on the c++ side? Couldn't you get a few basic necessities built first, and then just abstract towards usability in arc? I don't know exactly what you're doing, but I don't see why you have to make everything at once.

Maybe we should move this to another thread? :)

-----

1 point by almkglor 6200 days ago | link

http://arclanguage.com/item?id=7142

-----

1 point by absz 6201 days ago | link

That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Then every Arc object on the heap would have a shared flag. Then if you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C). And whenever you try to set (or lset, or scar, or scdr, or sref---good catch) an object whose shared is true (1), then you copy it first, modify it, set its shared flag to false (0), and then make the reference through which you modified it point instead to this new object.

Except this doesn't take any shared structure into account---as soon as you send something over the network, all the objects which share it are fragmented. So you need some sort of "versioning." Perhaps you instead have every reference have a version flag, and every copy between threads increments version for the new copy; modifying something checks the reference you are modifying it through, makes a new copy if necessary for that version, and then modifies it. The problem is that I can't see a good way to do versioned indexing, especially if references are really just raw pointers.

Though note that if you do come up with a working "copy-on-write-and-other-things" scheme, the Arc copy function becomes essentially

  (def copy (arg)
    (==> (current-pid) arg)
    (<== (current-pid) any))

.

-----

2 points by almkglor 6200 days ago | link

> Then every Arc object on the heap would have a shared flag....

This was indeed my original idea: http://arclanguage.com/item?id=7037

Incidentally, 'lset is very, very hard to implement if closures are not very mutable, as in the arc2c output.

> and then make the reference through which you modified it point instead to this new object.

  (def hmm (x)
    (let y (cdr x)
      (scar y 42))) ; is x mutated? the reference I used here was *just* y...

> the Arc copy function becomes essentially (...)

Yes, this is true, although I'd prefer to add some sort of 'uniq tag so that other pending messages don't accidentally get taken ('<== does pattern matchimg, obviously implemented via my p-m macro) ^^. It will even correctly handle recursive and shared structure, which I think the current 'copy does not ^^.

  (def copy (arg)
    (w/uniq tag
      (==> (mypid) (list tag arg))
      (<==
        (,tag cpy) cpy)))

Edit:

> That's a good point. But what about having copy-on-write-if-more-than-one-logical-copy semantics? Then every Arc object on the heap would have a shared flag. Then if you send an object to another thread, the object is shared, and it has its internal shared field set to true (or 1, if this is C).

The problem here is: what if the object to be sent is already shared and the receiver already has it? Say:

  (def process-A ()
    (let v (list 42)
       (==> B v)
       (==> C v)))

  (def process-B ()
    (<==
      any  (==> C any)))

  (def process-C ()
    (givens v1 (<== any any)
            v2 (<== any any)
      (if (is v1 v2)
          (err "I somehow received the same object twice!"))))

-----

2 points by shader 6201 days ago | link

I don't think you have to worry about versioning. Because sharing is, as far as I can tell, just to save time and bandwidth over copying, conceptually all messages are copies. This means that you shouldn't be programming as if you can update other processes just by changing a shared object. They would have to detect the change and copy it again anyway, if the other owning processes were remote.

Instead, you should only transfer information over messages. That way you don't need to worry about keeping shared objects up to date for all users.

Good idea about the shared flag; maybe it should be a ref counter? That way you can tell if the other process(es) have stopped using it. So, if the ref count is 1, than just straight up mutate, otherwise, copy?

Also, couldn't the send function be:

  (-> pid message)

? It would save a char ;) Of course, you probably have a reason for not using that.

-----

2 points by absz 6201 days ago | link

You need versioning in a situation like the following:

  (givens str1 "message"        ; 1
          str2 str1             ; 2
    (do                         ; 3
      (-> another-process str1) ; 4
      (= (str1 0) #\M)          ; 5
      (prn str1)                ; 6
      (prn str2))               ; 7

Let's assume you just have the shared flag. First, line 4 will set the shared flag to true. Then, on line 5, we mutate str1. We check the shared flag of the value of str1. OK, it's true, so we make a copy of the value and mutate it. Then we print out our values on lines 6 and 7:

  Message
  message

The problem is that the copy separates the string not only from its copies in other threads, but also its aliases in the thread it comes from.

Versioning would mean that str1 and str2 would both be pointers to "version 0" of some value, which had an internal max-version of 0. Line 4 sends the object over, so max-version is incremented (to 1), and the other process is sent a pointer to "version 1" of the same object (1 because that's the new max-version). However, since no modifications have been made, somehow (I don't know how) both "version 0" and "version 1" resolve to the same underlying object. Then, on line 5, sref is called on "version 0" of this object, so the object is copied: "version 0" is copied away and modified, but "version 1" is left alone. When we print the objects, their reference to "version 0" makes them look at this new, copied objects, so we see

  Message
  Message

The problems here are that (1) I don't know how we'd do versioned addressing, and (2) what we do with the copy of "version 0." It seems like you would have a structure which had a list of the available values, and some sort of mapping from versions to values, e.g.

  ; Available values
  A --> "a string or another object"
  B --> "A string or another object"
  C --> "A string."
  
  ; Version --> value mapping
  0 --> [A]
  1 --> [A]
  2 --> [B]
  3 --> [A]
  4 --> [C]

Then, when we modify 0, to, say, "Another object...", we would add D --> "Another object..." to the list of values and change 0 to 0 --> [D].

Hmm, thinking about the underlying structure of this and writing it out makes me think this makes less and less sense, and that either (1) this is actually trivial/a restatement of another scheme/both and thus won't work, or, if I'm lucky, that (2) there's some more general and less clunky scheme hiding inside it (hey, I can dream, right?). My money's on (1) right now, but I do feel like I understand the problem better.

Oh, and I like your suggestion of -> / <-. Nice and short.

EDIT: Almkglor, is any of the code for your VM available for perusal/editing? School's gotten out and I've gotten interested, so I might be interested in working on it (if I can... I am not all that experienced, after all, so it might be beyond me; regardless, I'd be interested in reading it).

-----

2 points by shader 6201 days ago | link

Maybe you could get versioning to work by making a real alias as opposed to just a copied reference. If somehow (i'm getting really far fetched here) expansion of macros, etc, could happen in any position instead of just application, you could have the alias expand to the other variable name. Then, you wouldn't have to worry about messing with your aliases ;)

So instead of merely 'evaluating' identifiers, you 'expanded' them, and variables expanded to their values, etc. you could have better aliases, and neater macros. Of course, I've probably misunderstood something horribly, but that's why you guys are here.

-----

1 point by absz 6201 days ago | link

The problem is that macro expansion happens at "compile" time, so local variables don't have their values yet. But you reminded me of something in arc2c: to quote the "NOTES" file, "...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives."

So, since I believe almkglor said he was planning to use arc2c for this VM, only changing the code generation step, then the solution seems to be to just use the shared flag on values (probably the reference-counting version, as shader suggested). It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping. Then, when you modify something, then if it's a sent copy, that copy is modified, not affecting the original (because of the shared flag); if it's a local with aliases, then it's modified through the sharedvar, and since the sharedvar is still aliased and only its internals are modified, there's no problem!

Well, maybe---that assumes that I actually understand sharedvars well enough. Almkglor? Stefano (since you suggested/invented them)?

-----

1 point by almkglor 6200 days ago | link

> It seems like since we have sharedvars, the problem solves itself: when you send an object to another thread, you don't send the sharedvar, you send the value the sharedvar is wrapping.

You never actually send sharedvars - they are not accessible at the Arc level. However, they are necessary in order to send functions across processes; the generated code expects to see sharedvars, so it has to have (a copy of) sharedvars.

Edit:

> ...if a local variable is ever mutated, a 'sharedvar' C-struct is used to represent that variable. These are constructed by %sharedvar primitives, and are accessed by %sharedvar-write and %sharedvar-read primitives.

This means this case:

  (let foo 1
    (set foo 42))

Importantly, this does not use sharedvars!

  (let foo "hello"
    (sref foo #\H 0))

-----

2 points by shader 6201 days ago | link

So, is that the only situation versioning is required? Updating local aliases? If so then there should be some work around, or we could just forget about it and consider it a "feature" ;)

Also, I agree about the whole perusal / editing bit. I'm probably even less experienced than absz, but no less interested.

And I again recommend moving to a new thread. The window's getting a bit cramped, and we aren't even talking on the original subject of the post.

Is there a news.yc / forum equivalent for Anarki?

-----

2 points by almkglor 6200 days ago | link

> Is there a news.yc / forum equivalent for Anarki?

You're using it ^^

-----

1 point by almkglor 6200 days ago | link

> Almkglor, is any of the code for your VM available for perusal/editing?

Not yet, the only code I've started writing on is the memory management code. There are two versions, one which uses a mark-and-sweep for local heaps and has shared heaps implemented via boost::shared_ptr, and my newer version which uses stop-and-copy and does (will do?) copy-on-message-pass.

I can send you (and anyone interested) some initial tarballs though ^^. WARNING: C++. With RAII. Be scared.

My solution for the multiple-version thing was to simply have the thread copy all its shared objects and drop all references to shared.

Then I began to feel uncomfortable with it and decided that just copying the darn thing might be better after all ^^

My main reason for using ==> and <== was to make them stand out a bit ^^. -> and <- just look so skinny. Although it's not carved into stone yet.

I already did a mock-up for this in plain Anarki-on-mzscheme, which is the main reason why Anarki suddenly got semaphores and other lock primitives suddenly a month back ^^

-----

2 points by stefano 6208 days ago | link

You should consider targeting the parrot VM: it compiles bytecode to machine code on the fly, gives you access to the OS including threading and networking, it has built-in support for continuations, dynamic typing and other things: http://www.parrotcode.org

-----

3 points by almkglor 6208 days ago | link

Maybe, but does it support separate heaps for each thread? I've heard quite a bit about it but not seen the overall design.

Currently my (unimplemented) plan is, memory is kept in managed heaps, which are handled by a mark-and-sweep collector (can't use semispace or variants for reasons below; at least, I don't think I can use semispace). If a message is passed to another thread, the memory of that message (and every other managed area it refers to) is moved from "managed" space to "shared" space, which is reference counted (the references counted are from threads to shared objects, so circular references are impossible). Objects in "shared" space are immutable (and can thus be accessed lock-free). If a thread attempts to mutate a shared object, instead all shared objects still referenced by that thread are copied to managed space (copy-on-write), the shared objects are released (decrement refcount by 1, free if zero), and the managed copy is mutated (we copy all of them because other shared objects might refer to the shared object being mutated).

Note that "managed" space and "shared" space are not separate areas of memory: they are actually two sets of pointers, so moving an area of memory is just moving a pointer from one set to the other (very cheap). The reason I think I can't use a copy collector is that a shared memory object might end up being located between two managed objects, and the shared object can't be moved without disturbing all the other threads that might be referring to it.

Shared objects are OK to refcount because we're refcounting from threads to shared objects. A thread might have 1000 references to a shared object, but it just counts as 1. Only when the thread GC's and determines that the shared object is not marked does it actually decrement the refcount.

Marks are normally kept in a bool with the memory area, if the memory is managed. If it's shared, then the marking information is kept by membership in a marked_and_shared set while marking (in order to preserve the invariant that shared objects are immutable). Sweeping involves traversing both the managed and the shared set; if the managed set memory area is not marked, then it is freed and removed from the set, etc. If the shared set memory area is not in the marked_and_shared set, then it is released (decrement refcount, free if zero) and removed from the set, etc.

There is also a "light" GC which just sweeps the managed set. Crucially, because shared objects are immutable, they cannot point to managed objects, so mark-and-sweep on only managed objects won't have the overhead of referring to marked_and_shared objects.

(and, err, that's pretty much most of my thinking with regards to having separate heaps being used by various threads. I actually have some of the messier details written down in a text file, anyone interested can post their email addresses ^^)

Personally, I think we need more shared-nothing VM's, not more shared-memory VM's.

-----

2 points by almkglor 6207 days ago | link

Okay, now I've been looking at things, and it seems my solution is actually quite suboptimal.

Basically the main thing I'm trying to solve is the copying of memory of messages when an object is sent across processes. Hence my use of shared immutable data, which is only copied if necessary.

However the problem is really something like this:

  ; process A
  (let v (some-value)
    (==> B v)
    (==> B v))

Conceptually, process B receives two messages that are separate from each other (i.e. two different copies of v). This will allow process B to mutate and do anything to its copy, without worrying that the next message it receives gets mutated also, just because process A happened to send the same data again.

This can actually happen more often than you think. Consider the case where a process serves as an STM-style container:

  (def container (state)
    (while t
      (<==
        ('put obj)
          (= state obj)
        ('query pid)
          (==> pid state))))

Now suppose process A sends an object to process B for storage in the container held by B. Then it queries B, expecting to receive a copy of the object it receives:

   A ==> object ==> B

   A ==> query ==> B

   A <== object <== B

This means that B might send an object which A already has (instead of a copy of the object, as we expected), if we're going to use the "shared" concept I had above.

My solution to this was to check the destination's shared set and see if the message to be sent was in the destination's shared set, and to send a copy the object if it is in the destination's shared set. Unfortunately this checking takes O(N log M) if sets are red-black trees, with N = number of objects in the message and M = number of objects in the destination's shared set, plus an optional additional O(N) to copy if the checking shows that the destination already has a copy of an object to be sent. As opposed to O(N) if we just copy it anyway.

-----

1 point by stefano 6206 days ago | link

Direct (deep-)copy seems a better solution then: it will also let you use a copying garbage collector or anything else without worrying about immutable shared copies. You could even use different GC strategies on a per-thread basis (I don't know how much this would be helpful, though...).

-----

2 points by almkglor 6206 days ago | link

This is true.

The bit that's bothering me about copy collectors is that it's somewhat difficult to handle cleanup/finalization. For example, it would be neat if the VM automatically closed any file handles you happened to have dropped, or to automatically terminate processes that are waiting for a message (without a timeout) and whose pid's have been dropped (so that you can return a pid as a "data structure" with impunity, and still have the process be garbage collected when the data structure loses scope).

I think I've hit upon a possible way of handling this though. A copy collector needs to put up to-pointers anyway; I think it may be possible to have the to-pointers as part of the object. The to-pointers default to NULL. When a heap is released (i.e. after its contents have been copied to another heap), the heap inspects its held objects. Any objects whose to-pointers are NULL are finalized/deleted; in effect, the to-pointers become really expensive mark bits.

> You could even use different GC strategies on a per-thread basis (I don't know how much this would be helpful, though...).

Hey, that's an idea... ^^! Possibly make things generational too, if the heap size grows (I think this is how the Erlang BEAM VM does it)

-----

1 point by stefano 6206 days ago | link

Do you think to-pointers are expensive in terms of space or of runtime(i.e. scanning the entire heap to look for them)? In term of space they aren't really expensive because you can reuse the space allocated by the object: as an example if you move a cons cell (2 words) you can use its first word to hold the new address of the cell. I've implemented a simple copying GC in C, if you want to have a look at it: http://github.com/stefano/tush/tree/master/gc.c

-----

2 points by almkglor 6206 days ago | link

Actually the other problem I'm thinking of is the "size" of an object in the heap under management. I'm currently using C++ for the memory management stuff, and the base type Generic requires a virtual getsize() member function which just returns sizeof(Type); most C++ implementations just allocate a single vtable for all virtuals, and of course I also have recursive marking as virtual functions of Generic. Basically I am overloading the vtable of the objects to also encode the size of individual types.

(the vtable also serves as my "type code", and is probably better thought of that way)

So I can't safely replace a Generic object on the heap with a ToPointer object, because I'd lose access to the size (i.e. I lose the type code, which I'm using to determine the size). This means I can't sweep and finalize afterwards. The alternative is to keep the size separate from the type code/vtable, but then for each object I end up storing a size_t, which on most systems is probably the same size as a Generic* - so I don't save it either.

So basically my options are:

  type code (automagically inserted by C++ as the vtable)
  to-pointer
  data

or:

  size
  type code
  data

Reusing the space of the object is possible, but to differentiate a to-pointer from a "real" object, I need to replace the type-code anyway, so I lose the size information.

Edit: Okay, I've been trying to hack through your posted GC. I don't quite get how the type code is encoded - is the heap split into regions? You seem to be masking the pointer itself to get the type.

-----

1 point by almkglor 6206 days ago | link

Okay, I've figured bits of your GC out after sleeping, it seems that the pointer is a typed pointer. A pointer with a type of extended_tag (?) includes the type with the memory area, and this type is overwritten with a broken_heart type to turn it into a to-pointer.

However I intend to use the same copy-algorithm for message passing, so I can't overwrite the tag/type code and the data just because I've copied it, since the original could be reused.

Or maybe I should actually use a different routine for message passing and GC.

-----

1 point by stefano 6205 days ago | link

> However I intend to use the same copy-algorithm for message passing

In this case, as you've already said, you can't overwrite the orginal type and you have to reserve extra space for the to-pointer (this means one extra word for every allocated object). This way another problem arises: you have to be sure that the to-pointer is clear every time you start a copy this is automatic after a GC, but not after a copy for message passing. This means that after every message passing, the whole heap must be scanned.

> Or maybe I should actually use a different routine for message passing and GC.

I think this is the best solution: passing a message and copying an object during garbage collection are enough different to make them distinct.

-----

1 point by almkglor 6205 days ago | link

Yes, you're right. Looks like I'll end up having the message-passing use a std::map<> instead (which has O(log N) for each lookup, averaging O(N log N) for all memory areas involved in the message), and leave the to-pointers for the semispace destructor to determine which objects need to be cleaned up. Overhead either way.

Or again use a mark and sweep with copying instead of sharing of objects ^^. Well, not much different.

Memory management is hard. Let's go bytecoding!

-----

2 points by stefano 6207 days ago | link

Parrot has its own GC, and I think it has just one heap, so probably it's not what you want.

-----

5 points by applepie 6209 days ago | link

The sad part in Lisp dialects is that most other programming constructs can be faked more or less decently with macros...

-----

3 points by stefano 6208 days ago | link

I would say this is the funny part...

-----

2 points by applepie 6208 days ago | link

This is fun in that one can do a lot with a small core language. But I also think that core languages may need to evolve, and Lisp expresiveness somehow makes that difficult.

E.g. I think we have a lot to learn from Prolog, Icon, Haskell, Dylan, Erlang, and many others.

-----

3 points by almkglor 6208 days ago | link

> E.g. I think we have a lot to learn from Prolog, Icon, Haskell, Dylan, Erlang, and many others.

Such as?

I know we need to have more Erlang-style VM's. In particular I think we need VM's that pretty much enforce shared-nothing by causing all messages to be copied at the VM level. Hence my current project, which will also tie in some code from the AST-based compiler of arc2c.

As for the other languages... what lessons do we need to learn? I know Haskell makes static type checking really really cool, but I'm not certain if we can bash that into a Lisp. Lisp macros can automagically give you lazy evaluation (you just need to insert special syntax to denote lazy evaluation, in much the same way that Haskell requires special syntax for forced evaluation)

-----

8 points by applepie 6208 days ago | link

I'm not necessarily thinking in the current Arc, but in the hypothetical hundred-year Arc.

- Could we make pattern matching through unification a part of the core language?

- Could we make some tasks easier (I mean terser) using Icon or Prolog-like goal-oriented programming?

- Could Haskell typeclasses be adapted to a dynamic language? (I know that is not possible in general, but maybe in the most common cases). CLOS-like oop is the way?

- Easy defstruct as in Haskell is also a win.

- Which would be the best module system? Python's first order modules are cool, but I also Haskell's module system seems better to do ADTs.

- Could we use generators using a lazy list-like interface?

- To which point could the system be reflective? In the past it was considered that "flambda" (user defined, first class special forms) and redefition of eval with "reflective towers" are not that good ideas, because of efficiency. Should we spend more cycles to allow such things?

- Should the read s-expressions also have associated meta-information?

- Could Arc live in an image?

- Could Arc be just the specification of its own compiler, making the bytecode an official part of the language (as pg once said)?

- Could Arc be blazingly fast?

- Could Arc be prepared for multiprocessor architectures like the guys (pun intended ;)) in Fortress want it to be? Or to intensive scalability like Erlang seems to have?

Of course all that questions are implementable. There probably is a library for every one of them in Cliki.

I am just asking which of them we should take and encourage to be seen as part of "Arc's core". Commonality and practice of an idiom, more than its mere possibility, is what actually defines what a language is.

Currently, Arc's core is not much more than sugar for Scheme (modulo defcall, mostly unseen and great). This doesn't want to be insultive, I am happy that people find Arc useful.

Maybe it is that I am looking for a revolution and Arc is Lisp, which is good but not revolutionary.

-----

2 points by almkglor 6207 days ago | link

> - Should the read s-expressions also have associated meta-information?

IMO yes. It might even be possible to have the meta-information attached to the nearest cons cell instead, so a cons cell might have, say, (cons a d (o line-number) (o file-name)). This makes symbols always equal other symbols without having to worry about attached meta-information for each "symbol".

Might be useful to add in my VM then ^^.

> - Could Arc be just the specification of its own compiler, making the bytecode an official part of the language (as pg once said)?

Interesting. Might be useful to define Arc as a set of macros which just expand to a bunch of bytecode.

Anyway I'm thinking my VM should be "bytecode" based. The bytecode won't have a numeric representation, but it will have a symbolic one, like ((localvar 0) (globalvar foo) (plus))

> - Could Arc be prepared for multiprocessor architectures like the guys (pun intended ;)) in Fortress want it to be? Or to intensive scalability like Erlang seems to have?

Well, this is what I want to do ^^.

> - Could we use generators using a lazy list-like interface?

Scanners? I've actually used scanners for something like this in Arki, the wiki in Anarki.

-----

1 point by rincewind 6207 days ago | link

The Problem with unification and logical variables would be that they have to be dynamic in scope and work with a call-by-name evaluation strategy, at least in Prolog.

Anyway, you could build a Prolog-Like DSL with pat-m and amb

  http://arclanguage.org/item?id=2556
  http://arclanguage.org/item?id=6669

While logic Programming in Arc can make problem-solving easier, i think it introduces problems for libraries:

- How should you call functional functions from logical functions?

- How should you call logical functions from functional functions without violating referential transparency?

- Do closures over logical variables make any sense?

- Should goals be limited in scope? Lexical or dynamic? If I call 2 logical functions from different modules I may or may not want the first to be retried when the second fails, depending on the context.

-----

1 point by tung 6208 days ago | link

How do Haskell's modules differ from Python's?

-----

2 points by applepie 6208 days ago | link

Python's modules are first class. For example:

  import module

  x = module
  x.function(...)

In Haskell, names are resolved during compile-time...

Also, in Haskell modules can define which names to export. In Python everything is visible.

-----

3 points by sacado 6208 days ago | link

higher level decisions are pg's part : http://arclanguage.org/item?id=1469

-----

2 points by applepie 6208 days ago | link

Well, this is not a game where we have to follow rules.

This is real life, so if we can make a better language, we don't have to wait pg to come. (And I've lost the hope he does).

-----

2 points by almkglor 6208 days ago | link

Anarki! Anarki! Go go Anarki!

-----

3 points by stefano 6208 days ago | link

Look at the link date...

-----