Arc Forum | stefano's comments

Arc Forum

new | comments | leaders | submit | stefano's comments

2 points by stefano 6374 days ago | link | parent | on: arc2c update

I've been thinking about implementing macros in a compiled lisp, and I've come up with this solution: when you encounter a macro compile it to a temporary file, compile it as a shared library, load it, execute it and compile the returned value. For this approach to work the compiling environment and the compiled environment should use the same internal representation of data structures, so I think that the best way to get macros working is to have a metacircular compiler first, so arc2c would need to be rewritten in the core Arc that it currently support. I hope this helps. As of real coding help I'm under exams, but I hope to be able to help by the end of June.

BTW, on how many projects are you working currently? :)

-----

1 point by almkglor 6373 days ago | link

non-work related ones? 2, arc2c and my "snap" virtual machine.

Not sure about that solution. Certainly it seems that you need access to the macros on the same "level" as the compiler's code. Ouch, my head hurts.

-----

1 point by stefano 6376 days ago | link | parent | on: Why Arc is good for video games

In anarki, you can use ffi.scm to import C functions.

-----

2 points by stefano 6379 days ago | link | parent | on: What's next?

You should consider targeting the parrot VM: it compiles bytecode to machine code on the fly, gives you access to the OS including threading and networking, it has built-in support for continuations, dynamic typing and other things: http://www.parrotcode.org

-----

3 points by almkglor 6379 days ago | link

Maybe, but does it support separate heaps for each thread? I've heard quite a bit about it but not seen the overall design.

Currently my (unimplemented) plan is, memory is kept in managed heaps, which are handled by a mark-and-sweep collector (can't use semispace or variants for reasons below; at least, I don't think I can use semispace). If a message is passed to another thread, the memory of that message (and every other managed area it refers to) is moved from "managed" space to "shared" space, which is reference counted (the references counted are from threads to shared objects, so circular references are impossible). Objects in "shared" space are immutable (and can thus be accessed lock-free). If a thread attempts to mutate a shared object, instead all shared objects still referenced by that thread are copied to managed space (copy-on-write), the shared objects are released (decrement refcount by 1, free if zero), and the managed copy is mutated (we copy all of them because other shared objects might refer to the shared object being mutated).

Note that "managed" space and "shared" space are not separate areas of memory: they are actually two sets of pointers, so moving an area of memory is just moving a pointer from one set to the other (very cheap). The reason I think I can't use a copy collector is that a shared memory object might end up being located between two managed objects, and the shared object can't be moved without disturbing all the other threads that might be referring to it.

Shared objects are OK to refcount because we're refcounting from threads to shared objects. A thread might have 1000 references to a shared object, but it just counts as 1. Only when the thread GC's and determines that the shared object is not marked does it actually decrement the refcount.

Marks are normally kept in a bool with the memory area, if the memory is managed. If it's shared, then the marking information is kept by membership in a marked_and_shared set while marking (in order to preserve the invariant that shared objects are immutable). Sweeping involves traversing both the managed and the shared set; if the managed set memory area is not marked, then it is freed and removed from the set, etc. If the shared set memory area is not in the marked_and_shared set, then it is released (decrement refcount, free if zero) and removed from the set, etc.

There is also a "light" GC which just sweeps the managed set. Crucially, because shared objects are immutable, they cannot point to managed objects, so mark-and-sweep on only managed objects won't have the overhead of referring to marked_and_shared objects.

(and, err, that's pretty much most of my thinking with regards to having separate heaps being used by various threads. I actually have some of the messier details written down in a text file, anyone interested can post their email addresses ^^)

Personally, I think we need more shared-nothing VM's, not more shared-memory VM's.

-----

2 points by almkglor 6377 days ago | link

Okay, now I've been looking at things, and it seems my solution is actually quite suboptimal.

Basically the main thing I'm trying to solve is the copying of memory of messages when an object is sent across processes. Hence my use of shared immutable data, which is only copied if necessary.

However the problem is really something like this:

  ; process A
  (let v (some-value)
    (==> B v)
    (==> B v))

Conceptually, process B receives two messages that are separate from each other (i.e. two different copies of v). This will allow process B to mutate and do anything to its copy, without worrying that the next message it receives gets mutated also, just because process A happened to send the same data again.

This can actually happen more often than you think. Consider the case where a process serves as an STM-style container:

  (def container (state)
    (while t
      (<==
        ('put obj)
          (= state obj)
        ('query pid)
          (==> pid state))))

Now suppose process A sends an object to process B for storage in the container held by B. Then it queries B, expecting to receive a copy of the object it receives:

   A ==> object ==> B

   A ==> query ==> B

   A <== object <== B

This means that B might send an object which A already has (instead of a copy of the object, as we expected), if we're going to use the "shared" concept I had above.

My solution to this was to check the destination's shared set and see if the message to be sent was in the destination's shared set, and to send a copy the object if it is in the destination's shared set. Unfortunately this checking takes O(N log M) if sets are red-black trees, with N = number of objects in the message and M = number of objects in the destination's shared set, plus an optional additional O(N) to copy if the checking shows that the destination already has a copy of an object to be sent. As opposed to O(N) if we just copy it anyway.

-----

1 point by stefano 6377 days ago | link

Direct (deep-)copy seems a better solution then: it will also let you use a copying garbage collector or anything else without worrying about immutable shared copies. You could even use different GC strategies on a per-thread basis (I don't know how much this would be helpful, though...).

-----

2 points by almkglor 6377 days ago | link

This is true.

The bit that's bothering me about copy collectors is that it's somewhat difficult to handle cleanup/finalization. For example, it would be neat if the VM automatically closed any file handles you happened to have dropped, or to automatically terminate processes that are waiting for a message (without a timeout) and whose pid's have been dropped (so that you can return a pid as a "data structure" with impunity, and still have the process be garbage collected when the data structure loses scope).

I think I've hit upon a possible way of handling this though. A copy collector needs to put up to-pointers anyway; I think it may be possible to have the to-pointers as part of the object. The to-pointers default to NULL. When a heap is released (i.e. after its contents have been copied to another heap), the heap inspects its held objects. Any objects whose to-pointers are NULL are finalized/deleted; in effect, the to-pointers become really expensive mark bits.

> You could even use different GC strategies on a per-thread basis (I don't know how much this would be helpful, though...).

Hey, that's an idea... ^^! Possibly make things generational too, if the heap size grows (I think this is how the Erlang BEAM VM does it)

-----

1 point by stefano 6377 days ago | link

Do you think to-pointers are expensive in terms of space or of runtime(i.e. scanning the entire heap to look for them)? In term of space they aren't really expensive because you can reuse the space allocated by the object: as an example if you move a cons cell (2 words) you can use its first word to hold the new address of the cell. I've implemented a simple copying GC in C, if you want to have a look at it: http://github.com/stefano/tush/tree/master/gc.c

-----

2 points by almkglor 6377 days ago | link

Actually the other problem I'm thinking of is the "size" of an object in the heap under management. I'm currently using C++ for the memory management stuff, and the base type Generic requires a virtual getsize() member function which just returns sizeof(Type); most C++ implementations just allocate a single vtable for all virtuals, and of course I also have recursive marking as virtual functions of Generic. Basically I am overloading the vtable of the objects to also encode the size of individual types.

(the vtable also serves as my "type code", and is probably better thought of that way)

So I can't safely replace a Generic object on the heap with a ToPointer object, because I'd lose access to the size (i.e. I lose the type code, which I'm using to determine the size). This means I can't sweep and finalize afterwards. The alternative is to keep the size separate from the type code/vtable, but then for each object I end up storing a size_t, which on most systems is probably the same size as a Generic* - so I don't save it either.

So basically my options are:

  type code (automagically inserted by C++ as the vtable)
  to-pointer
  data

or:

  size
  type code
  data

Reusing the space of the object is possible, but to differentiate a to-pointer from a "real" object, I need to replace the type-code anyway, so I lose the size information.

Edit: Okay, I've been trying to hack through your posted GC. I don't quite get how the type code is encoded - is the heap split into regions? You seem to be masking the pointer itself to get the type.

-----

1 point by almkglor 6376 days ago | link

Okay, I've figured bits of your GC out after sleeping, it seems that the pointer is a typed pointer. A pointer with a type of extended_tag (?) includes the type with the memory area, and this type is overwritten with a broken_heart type to turn it into a to-pointer.

However I intend to use the same copy-algorithm for message passing, so I can't overwrite the tag/type code and the data just because I've copied it, since the original could be reused.

Or maybe I should actually use a different routine for message passing and GC.

-----

1 point by stefano 6376 days ago | link

> However I intend to use the same copy-algorithm for message passing

In this case, as you've already said, you can't overwrite the orginal type and you have to reserve extra space for the to-pointer (this means one extra word for every allocated object). This way another problem arises: you have to be sure that the to-pointer is clear every time you start a copy this is automatic after a GC, but not after a copy for message passing. This means that after every message passing, the whole heap must be scanned.

> Or maybe I should actually use a different routine for message passing and GC.

I think this is the best solution: passing a message and copying an object during garbage collection are enough different to make them distinct.

-----

1 point by almkglor 6376 days ago | link

Yes, you're right. Looks like I'll end up having the message-passing use a std::map<> instead (which has O(log N) for each lookup, averaging O(N log N) for all memory areas involved in the message), and leave the to-pointers for the semispace destructor to determine which objects need to be cleaned up. Overhead either way.

Or again use a mark and sweep with copying instead of sharing of objects ^^. Well, not much different.

Memory management is hard. Let's go bytecoding!

-----

2 points by stefano 6378 days ago | link

Parrot has its own GC, and I think it has just one heap, so probably it's not what you want.

-----

3 points by stefano 6379 days ago | link | parent | on: What's next?

Look at the link date...

-----

3 points by stefano 6379 days ago | link | parent | on: What's next?

I would say this is the funny part...

-----

2 points by applepie 6378 days ago | link

This is fun in that one can do a lot with a small core language. But I also think that core languages may need to evolve, and Lisp expresiveness somehow makes that difficult.

E.g. I think we have a lot to learn from Prolog, Icon, Haskell, Dylan, Erlang, and many others.

-----

3 points by almkglor 6378 days ago | link

> E.g. I think we have a lot to learn from Prolog, Icon, Haskell, Dylan, Erlang, and many others.

Such as?

I know we need to have more Erlang-style VM's. In particular I think we need VM's that pretty much enforce shared-nothing by causing all messages to be copied at the VM level. Hence my current project, which will also tie in some code from the AST-based compiler of arc2c.

As for the other languages... what lessons do we need to learn? I know Haskell makes static type checking really really cool, but I'm not certain if we can bash that into a Lisp. Lisp macros can automagically give you lazy evaluation (you just need to insert special syntax to denote lazy evaluation, in much the same way that Haskell requires special syntax for forced evaluation)

-----

8 points by applepie 6378 days ago | link

I'm not necessarily thinking in the current Arc, but in the hypothetical hundred-year Arc.

- Could we make pattern matching through unification a part of the core language?

- Could we make some tasks easier (I mean terser) using Icon or Prolog-like goal-oriented programming?

- Could Haskell typeclasses be adapted to a dynamic language? (I know that is not possible in general, but maybe in the most common cases). CLOS-like oop is the way?

- Easy defstruct as in Haskell is also a win.

- Which would be the best module system? Python's first order modules are cool, but I also Haskell's module system seems better to do ADTs.

- Could we use generators using a lazy list-like interface?

- To which point could the system be reflective? In the past it was considered that "flambda" (user defined, first class special forms) and redefition of eval with "reflective towers" are not that good ideas, because of efficiency. Should we spend more cycles to allow such things?

- Should the read s-expressions also have associated meta-information?

- Could Arc live in an image?

- Could Arc be just the specification of its own compiler, making the bytecode an official part of the language (as pg once said)?

- Could Arc be blazingly fast?

- Could Arc be prepared for multiprocessor architectures like the guys (pun intended ;)) in Fortress want it to be? Or to intensive scalability like Erlang seems to have?

Of course all that questions are implementable. There probably is a library for every one of them in Cliki.

I am just asking which of them we should take and encourage to be seen as part of "Arc's core". Commonality and practice of an idiom, more than its mere possibility, is what actually defines what a language is.

Currently, Arc's core is not much more than sugar for Scheme (modulo defcall, mostly unseen and great). This doesn't want to be insultive, I am happy that people find Arc useful.

Maybe it is that I am looking for a revolution and Arc is Lisp, which is good but not revolutionary.

-----

2 points by almkglor 6378 days ago | link

> - Should the read s-expressions also have associated meta-information?

IMO yes. It might even be possible to have the meta-information attached to the nearest cons cell instead, so a cons cell might have, say, (cons a d (o line-number) (o file-name)). This makes symbols always equal other symbols without having to worry about attached meta-information for each "symbol".

Might be useful to add in my VM then ^^.

> - Could Arc be just the specification of its own compiler, making the bytecode an official part of the language (as pg once said)?

Interesting. Might be useful to define Arc as a set of macros which just expand to a bunch of bytecode.

Anyway I'm thinking my VM should be "bytecode" based. The bytecode won't have a numeric representation, but it will have a symbolic one, like ((localvar 0) (globalvar foo) (plus))

> - Could Arc be prepared for multiprocessor architectures like the guys (pun intended ;)) in Fortress want it to be? Or to intensive scalability like Erlang seems to have?

Well, this is what I want to do ^^.

> - Could we use generators using a lazy list-like interface?

Scanners? I've actually used scanners for something like this in Arki, the wiki in Anarki.

-----

1 point by rincewind 6378 days ago | link

The Problem with unification and logical variables would be that they have to be dynamic in scope and work with a call-by-name evaluation strategy, at least in Prolog.

Anyway, you could build a Prolog-Like DSL with pat-m and amb

  http://arclanguage.org/item?id=2556
  http://arclanguage.org/item?id=6669

While logic Programming in Arc can make problem-solving easier, i think it introduces problems for libraries:

- How should you call functional functions from logical functions?

- How should you call logical functions from functional functions without violating referential transparency?

- Do closures over logical variables make any sense?

- Should goals be limited in scope? Lexical or dynamic? If I call 2 logical functions from different modules I may or may not want the first to be retried when the second fails, depending on the context.

-----

1 point by tung 6378 days ago | link

How do Haskell's modules differ from Python's?

-----

2 points by applepie 6378 days ago | link

Python's modules are first class. For example:

  import module

  x = module
  x.function(...)

In Haskell, names are resolved during compile-time...

Also, in Haskell modules can define which names to export. In Python everything is visible.

-----

2 points by stefano 6391 days ago | link | parent | on: Don't understand how macros work

You don't really need to use eval:

  (mac assign (name expr)
    `(= ,name ,expr))

then (assign k 3) will be expanded to (= k 3).

-----

2 points by zhtw 6391 days ago | link

No. You didn't understand. I meant exactly what I wrote. I want to assign a value to variable which name is stored in the variable "name".

-----

1 point by stefano 6391 days ago | link | parent | on: Chalk one up for Arc

Could you push it on Anarki?

-----

2 points by Johnny_65701 6391 days ago | link

It's for Common Lisp, not Arc, but if you want to play with it, send me an e-mail: jjuniman at ll dot mit dot edu. I tried to post it but the formatting got all mangled up and it became unreadable. It's not a lot of code, only a dozen lines or so.

-----

2 points by absz 6391 days ago | link

To post code that looks like code, surround it by blank lines and indent each line with two spaces—that should allow the code to display in a sane manner.

  (When you do that,
   it comes out like this.)

-----

2 points by Johnny_65701 6390 days ago | link

I did that and the ends of the lines got all mutilated. Dunno why, I'm copying & pasting from emacs, I'm not doing anything exotic.

-----

2 points by absz 6390 days ago | link

Are you on Windows/editing files created there? If so, is it possible that Windows's "\r\n" line endings are being pasted in literally?

-----

3 points by Johnny_65701 6389 days ago | link

Nope, Linux. But let me try again, since I have a different macro now that's considerably simpler, so the loss of newlines isn't as bad.

The original one I wrote wasn't working quite right. Then I found something in On Lisp, a macro called with-gensyms that I renamed with-hygeine due to a symbol conflict with something in Clisp:

  (defmacro with-hygiene (syms &body body)
   `(let ,(mapcar #'(lambda (s)
                    `(,s (gensym)))
                syms)
    ,@body))

You're supposed to use it like this:

  (defmacro my-stupid-macro (x y z)
    (with-hygeine (a b c)
      `(do-something-with ,x ,y ,z ,a ,b ,c)))

This gets me most of the way there, but what I really want is this:

  (hyg my-stupid-macro (x y z) (a b c)
    `(do-something-with ,x ,y ,z ,a ,b ,c))

We can achieve that like this:

  (defmacro hyg (name args syms &body body)
    `(defmacro ,name ,args
       (with-hygiene ,syms ,@body)))

Now I can do this:

  (hyg swap (a b) (temp) 
    `(progn 
      (setf ,temp ,a) 
      (setf ,a ,b) 
      (setf ,b ,temp)))

Which expands to:

  (DEFMACRO SWAP (A B) 
    (WITH-HYGIENE (TEMP) 
     `(PROGN 
        (SETF ,TEMP ,A) 
        (SETF ,A ,B) 
        (SETF ,B ,TEMP))))

which expands to:

  (PROGN 
    (SETF #:G3089 A) 
     (SETF A B) 
     (SETF B #:G3089))

PS: I'm a Lisp noob, so apologies in advance if something about this is a little naive, or contains a major oversight.

-----

2 points by absz 6389 days ago | link

Are you sure that you're indenting the lines with two extra spaces each (even the first)? That might fix the formatting.

-----

2 points by Johnny_65701 6389 days ago | link

Ooops, you are correct, sir. I indented all the lines except the first. Indenting the first line fixed the formatting (I went back & edited it)

-----

5 points by stefano 6393 days ago | link | parent | on: Chalk one up for Arc

Cohesion is the real big problem of the Lisp community in general, and it should be addressed by Arc. In this respect I think that Anarki is what the official version of Arc should look like: one place were everyone can contribute.

-----

3 points by almkglor 6392 days ago | link

LOL. Let us therefore officially decree that Anarki is the official Arc. ^^

-----

3 points by stefano 6392 days ago | link

That's right! :)

-----

3 points by almkglor 6392 days ago | link

LOL. And therefore ArcN is the Decreed Unofficial Canonical Arc.

-----

2 points by stefano 6396 days ago | link | parent | on: rainbow update

I'd really like a Java interface!

-----

2 points by sacado 6395 days ago | link

Yep. A java interface would give access to so many libraries that it would solve a lot of problems as for arc's lack of popularity...

-----

2 points by conanite 6393 days ago | link

working on it. Are there any libraries you're thinking of using already?

-----

2 points by stefano 6393 days ago | link

Being able to access graphic libraries would be great.

-----

5 points by stefano 6398 days ago | link | parent | on: Why Arc is bad for exploratory programming

You're right: Arc is just too young to have a huge (or even a medium sized) collection of libraries, because it takes a lot of time to write libraries to fetch web pages (BTW, look at http-get on Anarki), manipulate images and so on. We should talk about Arc in terms of what it will be able to become, not in terms of what it is today. In this view, the only real problem is the lack of a module system to support the construction of libraries.

-----