> Btw, docstrings are a real performance killer: they are useless, but they are allocated on the heap and fill it up really quickly (ah, recursive functions...).
Are you saying that you alloc a docstring at every function call?
Well, for the moment, yes. Every object appearing in the program has to be allocated (it's not an optimizing compiler yet). Useless objects are not detected, so every time the compiler sees a string, it generates code to allocate it, and it is freed on the next GC cycle. Every time you call the function, that code is executed. Well, that's an easy optimisation, so I'll work on it very soon I guess.
Yes, it's not difficult. You just have to find all constant values, create a global var for each constant, assign the constant value to the global var once, and substitute each occurrence of the constant with the global var name.
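A minimal sketch of the difference in the generated C, assuming hypothetical names (obj, alloc_string, init_constants are made up for illustration, not the actual arc2c runtime):

typedef void *obj;
obj alloc_string(const char *s);   /* hypothetical runtime allocator */

/* Before: a fresh string is heap-allocated on every call. */
obj f_before(void) {
  return alloc_string("a docstring");
}

/* After: the constant is allocated once, at startup, into a global slot,
   and the function body only references that slot. */
static obj QUOTED_0;

void init_constants(void) {
  QUOTED_0 = alloc_string("a docstring");
}

obj f_after(void) {
  return QUOTED_0;
}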
Const lists (created with quote) are improper Scheme lists, i.e. they end with nil instead of the empty list (in Scheme nil != '()), whereas lists created with (list ...) are proper lists (ending with '()). This is certainly a bug.
I do have one question about loading libc. Since the system doesn't automatically find libc.so for you, you need to name the path explicitly. What worked for me was
The underlying foreign implementation is looking for the library. The problem is that /usr/lib/libc.so is not the actual shared library (it is a linker script), and dlopen() does not know how to handle this.
A better alternative with mzscheme's foreign interface is to use `#f' for the library -- that treats the mzscheme binary as the library to open (which includes the usual libc stuff).
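For what it's worth, the same trick exists at the C level: dlopen() accepts NULL to mean "the running executable plus the libraries it was linked against", which is roughly what passing #f does through mzscheme's interface. A small, non-mzscheme illustration (link with -ldl):

#include <stdio.h>
#include <dlfcn.h>

int main(void) {
  /* NULL = the main program and its already-loaded dependencies (libc included) */
  void *self = dlopen(NULL, RTLD_LAZY);
  if (!self) { fprintf(stderr, "%s\n", dlerror()); return 1; }

  void *f = dlsym(self, "malloc");   /* resolved from the libc already in the process */
  printf("malloc is at %p\n", f);
  dlclose(self);
  return 0;
}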
Also, I think I need to use wait() to wait for a child process to terminate, but I am somewhat at a loss as to how to accomplish this, since the manual says it takes an int pointer... Again, any help would be appreciated.
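In plain C, the int pointer is just an out-parameter that receives the child's exit status; a minimal sketch (waitpid() on the specific child, then decode the status with the W* macros). How you pass the pointer through mzscheme's foreign interface is a separate question, so treat this only as the C-level picture:

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
  pid_t pid = fork();
  if (pid == 0) {                     /* child */
    execlp("ls", "ls", (char *) NULL);
    _exit(127);                       /* only reached if exec failed */
  }
  int status;                         /* wait()/waitpid() fill this in */
  if (waitpid(pid, &status, 0) < 0) { perror("waitpid"); return 1; }
  if (WIFEXITED(status))
    printf("child exited with %d\n", WEXITSTATUS(status));
  return 0;
}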
Also, in your malloc() example, don't you have to explicitly free the memory afterwards? MzScheme's GC doesn't deal with memory explicitly allocated from C, does it? (I know the example itself uses only a tiny amount of memory, but still...)
Yes and no. I think mzscheme's GC can handle it without your having to call free at the right time, but you have to register a "finalizer" function to do so. That function will be called when the GC collects the object.
It does not know the size of malloc'ed objects, however, so be careful (no problem here, but if you allocate something very big, mzscheme will only see a new reference and will not necessarily trigger a GC when you actually run out of memory).
1. Hash tables give you O(1) access with smaller constant factors than red-black trees, which could need a few rotations after every insert/remove.
2. It's better to use cons cells for list construction rather than vectors, because a vector has an extra cell to store its own length. On 32-bit machines a cons cell takes 8 bytes, while a vector of 2 elements takes 12 bytes, i.e. 50% more than the cons cell (see the sketch below).
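Roughly the layouts being compared, as a sketch assuming 32-bit pointers (illustrative structs, not the actual arc2c ones):

typedef void *obj;

typedef struct {        /* cons cell: 2 words = 8 bytes on a 32-bit machine */
  obj car;
  obj cdr;
} cons_cell;

typedef struct {        /* 2-element vector: 3 words = 12 bytes */
  long length;          /* the extra cell that stores the vector's length */
  obj data[2];
} vec2;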
2) In theory, the length of the vector could be stored by the memory manager anyway (possibly as a start and end address), which after all has to know how large a block to deallocate when the vector is GC'ed. In fact, if the metadata is interspersed with the allocated memory, it will probably just be a length, since the current address is already known.
Macros should be easier to implement once the compiler is able to compile itself, because this way the compiler and the compiled macro have the same internal representation of data structures, so passing arguments between the two shouldn't be too hard.
There are several uses of macros in the compiler, unfortunately. In particular the 'def macro is too much of a convenience. So in order for the compiler to easily compile itself, it first has to implement macros. Chicken, meet egg.
Ah heck, maybe I should just use 'eval now and implement a compiled 'eval interpreter later that can interpret code and yet allow interpreted code to call compiled code and vice versa.
In fact I already have a bit of a sketch for this (which is necessary if we want to allow compiled programs to use 'eval). Basically, put interpreted '(fn ...) forms into an 'interpreted-fn annotated type together with the surrounding environment, and add an entry to the 'calls* table (via defcall, say) for 'interpreted-fn to, say, a $$interpreted-fn-apply function which binds the parameters into an environment table and calls the 'eval interpreter.
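Purely as a sketch of the shape this could take on the C side (every name below is invented for illustration; the real calls* / defcall machinery may look quite different):

typedef void *obj;

/* An interpreted function: its (fn ...) form as a list, plus the
   environment it was created in, behind its own type tag. */
typedef struct {
  long type;            /* hypothetical T_INTERPRETED_FN */
  obj code;             /* the '(fn (params ...) body ...) list */
  obj env;              /* enclosing interpreted environment */
} interpreted_fn;

/* assumed helpers provided by the interpreter, not real arc2c functions */
obj bind_params(obj code, obj args, obj env);
obj eval_body(obj code, obj env);

/* what the calls* entry for 'interpreted-fn would dispatch to */
obj interpreted_fn_apply(interpreted_fn *f, obj args) {
  obj env = bind_params(f->code, args, f->env);
  return eval_body(f->code, env);
}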
Of course this requires some changes in the base system: we need at the very least a %symeval primitive which when given a symbol will give its global binding, a %symset primitive which will modify a symbol's global binding, and obviously we need a link from the symbol to the GLOBAL() array (and dynamically create new containers for created symbols - if it's not in the GLOBAL() array then the compiled code would never read that global anyway, only the interpreted code ever will).
The rest of the interpreter is just a standard scheme interpreter, the only real support we need is to be able to call compiled-from-interpreted and interpreted-from-compiled, and the reading and binding of global symbols, including those that aren't in the GLOBAL table.
Ouch, my head hurts. And sacado's the one doing the Unicode strings. LOL
Would it be worth implementing 'def directly? This would give a lot more functionality right away. This could be temporary until macros are implemented.
Possibly. There's a bunch of "macro" transformations in xe.arc; maybe I'm just a bit too lazy to think. However, I don't like depending on those transforms; I want to do it "properly".
For the global vars problem, a solution could be to associate top level values directly with the symbol, this way a symbol would consist of three values: its string representation, its global value (initially a special 'unbound value') and a property list.
The current style has an optimization where all globals are simply referenced directly from an array in O(1). I'd rather that symbols point to entries in this array, because symbol-as-global-variable lookups are expected to be completely nonexistent if 'eval isn't involved in the program anyway (who uses 'eval in a language with 'read?). Only newly created symbols must have allocated variable values, and only for the benefit of 'eval'ed code - we can already know the global variables in the compiled code, because the compiler needs that info anyway.
Basically:
typedef struct {
  long type;            /* T_SYM */
  char *stringform;
#ifdef EVAL_USED
  obj *binding;
#endif
} symbol;

int main() {
  /* compiler-generated only if eval is used */
  obj sym;
  symbol *sympt;
  sym = SYM2OBJ("globalvar0");
  sympt = (symbol *) sym;
  sympt->binding = &GLOBAL(0);
  sym = SYM2OBJ("globalvar1");
  sympt = (symbol *) sym;
  sympt->binding = &GLOBAL(1);
  ...
}
This way the current performance is retained (global variable lookups are O(1)).
I don't know how well this solution will hold up once support for dynamic load (e.g. from the REPL) has to be implemented, because you'll have to keep an index of the last global variable created across different compilation sessions. With threads it gets even more complicated (a mutex on the index?). With symbols it would be simpler to implement dynamic load or definition of a global var from the REPL. The price paid is slightly slower access to global variables, because 2 memory references are needed for every reference to a global var. Global variable lookups are still O(1) though, e.g. sym->binding for read access and sym->binding = value for write access.
> the last global variables created across different compilation sessions
I don't understand this part. I was proposing that 'eval would be an interpreter, not a compiler. My intention was that compiled code would be statically generated (the way it's done now), so 'eval cannot possibly compile code. It would be a compiled interpreter of Arc. arc2c is a static compiler, so 'eval won't ever add compiled code; the best it can do is create an 'interpreted-fn object that contains an interpreted function's code (as a list) and the enclosing interpreted environment.
So a dynamic load would just interpret the expressions in the file being loaded:
(set load
  (fn (f)
    (w/infile s f
      (whilet e (read s)
        (eval e)))))
'eval would be able to access the global variable table indirectly via the symbols and %symeval/%symset.
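Given the symbol struct sketched above, the two primitives are essentially one extra indirection each; the names and exact representation here are only an illustration:

obj symeval(obj s) {                    /* %symeval: read the global binding */
  return *((symbol *) s)->binding;
}

void symset(obj s, obj value) {         /* %symset: write the global binding */
  *((symbol *) s)->binding = value;
}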
Basically, 'eval would be compiled to something like this:
(set eval
  (fn (e (o env nil))
    (if (isa e 'symbol)
        (if env
            (lookup-environment env e)
            (%symeval e))
        (...))))
Also: if the compiled code doesn't reference it, it won't be in the GLOBAL() array. The reason is simple: the compiled code won't reference it, ever. If 'globalvar isn't in GLOBAL(), then it does not exist in the compiled code. So it doesn't matter that it's not in the GLOBAL() array - the compiled code never referenced that global, so it won't ever use an index into the GLOBAL() array to refer to it. The interpreted code might, but that's why we have an indirect reference connected to the symeval.
Also, when I say O(1), I mean O(1) with the number one, as in only one layer of indirection (an index within a table). If global bindings are kept with the symbol only, then all global accesses - even precompiled ones - need (1) to find the symbol and (2) get the binding, for a total of O(2).
In other words: 'compile-file compiles, but it creates a new executable which is never connected to the process that ran 'compile-file. 'eval just interprets, and if the interpreted code mutates a global of the program, then that global gets mutated for real, in the program (what are you doing using 'eval on untrusted code anyway?). But if the interpreted code mutates a global that is never used in the program, it just creates a new global variable, one which is never referenced by the program (by definition, because the program never used it).
I thought eval compiled code, loaded it and then executed it. I was mistaken. With the compiled code completely static, your strategy is better than assigning values to symbols.
I'd like to have access too. Currently I don't have much free time, but if I find some I'll be glad to contribute. My github username is, surprisingly, 'stefano'.
A non-optimizing compiler leads easily to a "fast enough" executable. Without optimizations I think the compiled code would be 7x~10x slower than C.
Edit: I've tried the Fibonacci "benchmark" on a simple compiler I'm writing: it takes 0.2 seconds to compile the program and compute the 32nd Fibonacci number. On the current Arc interpreter it takes ~5 seconds.
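(For reference, the benchmark is presumably the usual doubly-recursive definition, which in C would be:)

int fib(int n) {
  if (n < 2) return n;
  return fib(n - 1) + fib(n - 2);
}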
Your compiler might be much slower if it had true Scheme numbers, the + operator as a function (not a primitive operator), and stack overflow checking. These features are currently supported by the Arc interpreter on mzscheme.
If you can correctly eliminate function calls on +, your compiler is an optimizing one, not non-optimizing...
I've tried the same example putting a function call and a test around every arithmetic operation, and execution time went from ~0.2s to ~0.26s, not a big difference, although a few optimizations will probably be necessary for something more complex than the Fibonacci example.
Is the function call overhead so small? I didn't realize.^^
But there are other issues: the fib example is not a very good benchmark suite, because in C, general recursion is not a common paradigm. If we compare C loops to Arc tail-recursive calls generated by a simple compiler, instead of comparing C recursion to Arc recursion, I believe the difference will be much larger, because C compiler writers have spent at least 20 years optimizing loops...
That's absolutely true. Reaching C speed with a high-level language such as Lisp is very, very difficult. CMUCL and SBCL reach roughly the speed of C, but they've been developed for a long time.
As for loop speed vs. tail recursion speed, the difference shouldn't be too big.
Stalin performs as well as C in numerical programs and many other benchmarks. The most exciting thing is that, unlike CL, Stalin doesn't need type declarations to guide optimizations: it infers as much type information as possible. The problem is that it compiles too slowly and it's not maintained anymore.
Naively implemented tail recursion is still not fast, because many common loop optimizations can't be directly applied unless you eliminate the function calls and treat them as true gotos. That's a rough task, because global flow analysis is needed to eliminate as many calls as we can.
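The kind of transformation meant here, sketched in C: once a self tail call is treated as a true goto, it becomes an ordinary loop that the usual loop optimizations can work on (illustrative only):

/* Tail-recursive form: every step is a real function call. */
long sum_to(long n, long acc) {
  if (n == 0) return acc;
  return sum_to(n - 1, acc + n);        /* self tail call */
}

/* Same function after turning the tail call into a goto:
   no call overhead, and it now looks like a plain loop. */
long sum_to_loop(long n, long acc) {
top:
  if (n == 0) return acc;
  acc = acc + n;                        /* compute the new argument values... */
  n = n - 1;
  goto top;                             /* ...then jump instead of call */
}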
This strategy seems viable, but slow, because there are a lot of checks at every thread switch. If we could manage to transform every Arc function into a C function (actually an array containing the C function's address and the closed-over vars), we could then rely on the pthread library for multi-threading, or on the equivalent threading library available on the host system.
The problem with transforming each Arc function to a C function is the stack space consumed at each call. Sure, gcc does tail call optimization, but not all compilers are gcc.
Chicken fixes this by keeping track of the stack and GC'ing it when it's full; I'm not sure about Bigloo, but I hear it has a pretty good Scheme function == C function equivalency, so it might be useful to see how they solved the tail call optimization problem there.
That said, the current execute() function accepts a pc argument, which specifies which Arc function it begins with. It may be possible to pass the pc to go to together with a new stack, but I don't know pthreads.
I think you could trust every decent C compiler to do tail recursion optimization, but if you want full compatibility then the mapping Arc fun -> C fun doesn't work. pthreads require you to pass a pointer to a C function, i.e. an address to jump to. We could wrap every thread within a C function and leave all the other functions as they are currently implemented. The C function would just call execute() with the right parameters. If execute() doesn't use global vars, there will be no race condition problems.
Well, I've started to move bits of execute() from globals to locals. However, I do still access a few global variables, specifically the quoted-constants array (those created by 'foo and '(a b c d), etc.); this table is initialized at startup (before the first call to execute). I would suppose this read-only array is okay to access?
As for wrapping them in C functions: the problem is that the most basic Arc threading function isn't 'thread, it's 'new-thread, which accepts a function as input. 'thread is defined as:
(mac thread body
  `(new-thread (fn () ,@body)))
In theory, new-thread could be called with any function.
Sure, it won't happen most of the time, since most people will sensibly use the simpler 'thread, but exploratory, exploratory...
It would be possible to implement if pthreads or whatnot can pass even just a single pointer to the newly-threaded function, but if it can't (why not?) then our alternative is to create a bunch of C functions, one per Arc function, each of which just calls the execute() function with the correct number.
---------
Edit: okay, I did a little eensy-weensy bit of research on pthreads, and it seems that pthreads can pass a pointer to the called C function.
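A rough sketch of how that one pointer could carry everything execute() needs; the thread_arg struct and the execute() signature below are assumptions about the runtime, not the actual arc2c code:

#include <pthread.h>
#include <stdlib.h>

/* hypothetical: whatever execute() needs to start a given Arc function */
typedef struct {
  int pc;               /* which compiled Arc function to begin with */
  void *stack;          /* a freshly allocated stack for this thread */
} thread_arg;

extern void execute(int pc, void *stack);   /* assumed signature */

static void *thread_trampoline(void *p) {
  thread_arg *a = p;
  execute(a->pc, a->stack);
  free(a->stack);
  free(a);
  return NULL;
}

/* what the runtime's new-thread primitive could boil down to */
pthread_t spawn_arc_thread(int pc, size_t stack_bytes) {
  thread_arg *a = malloc(sizeof *a);
  a->pc = pc;
  a->stack = malloc(stack_bytes);
  pthread_t t;
  pthread_create(&t, NULL, thread_trampoline, a);
  return t;
}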
This could work. As for global variables, if they are read-only, then there won't be any problem. What happens if you load two different modules (with their constants) in parallel? Maybe loading a file should be made atomic, or at least part of it, such as constant-value initialization.
Using pthreads might have some benefits, but speed is not a reason to choose them. On Linux systems that map pthreads onto kernel threads, I've seen execution two orders of magnitude slower compared to a user-space thread library. In general, I'd say that if I/O is important then using pthreads makes sense, but if speed is important then green threads are preferable.
I thought pthreads were fast, but anyway, on computers with more than one core/processor (which are very common), green threads don't distribute the threads between the cores, because that is the OS's duty, and the OS doesn't know anything about green threads.
Hmm. After a bit of research, it seems that some VM's handle this by using pthreads a few times (presumably one per core), then having each pthread spawn several green threads.
This seems like a good solution if we can arrange to spawn a new pthread also for blocking I/O, maybe by catching I/O functions and running the I/O caller in a pthread. This hybrid would make the compiler code less clean, because it would have to handle both green threads and pthreads.
1) I use a new structure, the sharedvar, which doesn't correspond to any Arc structure. See the other post which had me blinking stupidly at stefano's transformation before I realized how cool it was: http://arclanguage.com/item?id=5784
2) This new structure is untyped (i.e. it doesn't have type tags, unlike the pair and symbol structures). I intend to make shared variables "seamless" to the upper Arc though, so this should be fine...
3) My original name for this structure was closure-settable, which I shortened to sharedvar instead.
Making the new structure untyped shouldn't create problems, but it could make development difficult: if you have a bug in the transformer or in something related to it, you'll probably get a segmentation fault instead of a clean "Not a sharedvar" error. The check could then be removed when the code is released.
Hmm. So far the transformation seems correct anyway; and really, the transform is quite simple and appears mathematically correct. Also, because of the way the compile-file driver is structured, it would be possible to remove/replace each individual transform.
Which reminds me, we need to have a proper error continuation too.
The base idea is pretty good, although the 'cons cell (which is composed of a type id, a car pointer, and a cdr pointer) can be replaced with a smaller "closure-settable" cell, which would be an untyped object composed of a single pointer:
(fn (x)
  ; % means it's a primitive
  (let x (%closure-settable x)
    (list
      (fn () (%closure-settable-read x))
      (fn (v) (%closure-settable-write x v)))))
Because the closure variable abstraction is not accessible from the high-level Arc language, the closure-settable doesn't need to be type tagged.
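As a layout sketch (field and function names invented; this only restates the single-pointer idea above):

typedef void *obj;

/* The whole closure-settable / sharedvar: one untagged word. */
typedef struct {
  obj val;
} sharedvar;

/* %closure-settable-read and %closure-settable-write then compile to
   nothing more than a load and a store through that pointer. */
obj sharedvar_read(sharedvar *s)          { return s->val; }
void sharedvar_write(sharedvar *s, obj v) { s->val = v; }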