Arc Forum
Will Arc ever be as fast as CL?
8 points by pgwoden 5886 days ago | 35 comments
When the topic of Arc's execution speed comes up in this forum, it is generally in comparison to Scheme or Python, both of which are fast enough for most web apps. Since a major goal of Arc is to be a better tool for web development, should I assume that "fast enough for the web" is likely to be Arc's ultimate speed limit, or can I hope that someday it might become fast enough for hard-core number crunching, the way that CL is?


4 points by nlavine 5885 days ago | link

I would argue that speed is a property of programs and compilers rather than languages. For instance, this Arc program to compute the factorial of n:

  (def factorial (n)
    ((afn (a r)
     (if (= r 1) a (self (* a r) (- r 1))))
     1 n))
is very similar to this C program:

  long factorial(long n)
  {
    long a = 1;
    while (n != 1) {
      a *= n;
      n--;
    }
    return a;
  }
The only real difference is Arc's arbitrarily-sized integers, which you could probably turn off with a flag (or, better yet, implement in C - I think there's a GNU library for this).
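
(The GNU library in question is presumably GMP. As a rough sketch -- assuming GMP is installed and the file is linked with -lgmp, and with no claim that any Arc implementation actually works this way -- here is what the C version looks like once the integers are arbitrary-sized:)

  #include <stdio.h>
  #include <gmp.h>

  /* Arbitrary-precision factorial using GMP.  An mpz_t grows as needed,
     so there is no overflow the way there is with a C long. */
  void factorial(mpz_t result, unsigned long n)
  {
      mpz_set_ui(result, 1);              /* result = 1  */
      while (n > 1) {
          mpz_mul_ui(result, result, n);  /* result *= n */
          n--;
      }
  }

  int main(void)
  {
      mpz_t f;
      mpz_init(f);
      factorial(f, 100);
      gmp_printf("%Zd\n", f);             /* prints 100! exactly */
      mpz_clear(f);
      return 0;
  }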

It should be possible to write a fairly simple scanner for Arc that looks at an Arc program and decides whether it is trivially convertible to an equivalent C program. If it is, the Arc program could be converted. If, as I imagine, the mapping would let you write any C program as an Arc program, you could then argue that Arc was as fast as C.

The question would be, what about the features of Arc that aren't trivially convertible to C, like ccc? One response would be, "well, ccc is a pretty hard function to implement in C, but that's just a property of what ccc does - to implement the equivalent functionality, you just need this much complexity. A C programmer who wanted to use continuations would have to write the same function, so the slowness is not really because of Arc."

In other words, Arc gives you easy access to complex and hard-to-implement functions. This doesn't mean Arc is slow, just that Arc is powerful. One line of Arc code is probably much slower, on average, than one line of C code, but we don't really mind this, because the Arc code is referencing algorithms that, if they were implemented in C, would be just as slow as the Arc implementations. I think that is the true definition of speed, and the only fair one to hold Arc to. It implies, however, that speed will always be a property of individual programs and implementations, and not of languages.

-----

4 points by sramsay 5884 days ago | link

You're absolutely right: speed is a property of programs/compilers and not languages. And because of this, there's really no reason why we couldn't have an implementation of Arc that compiles to C (using, perhaps, Chicken Scheme's astonishing method), one that targets the JVM, one that's designed to be embedded, and so forth.

But this is really a socio-political decision, is it not? Do we (either PG or this burgeoning community of fans) prefer a benevolent dictatorship of the sort that governs languages like Perl and Ruby, or do we prefer the ramified computational episteme of modern Scheme?

There are good arguments on both sides, but I really think that letting a thousand flowers bloom (as the Scheme community has done) has great advantages. We get to use Scheme in lots of varied environments, with lots of different hardware, and with implementations optimized for lots of specialized tasks. This abundance is in part facilitated by Scheme's minimalistic standard, of course, and that has its own drawbacks -- code portability being the most serious one.

Personally, I'm not sure I'm down with the idea of a lisp optimized for "web applications" or "exploratory programming." It seems to me that the strength of Lisp lies precisely in its ability to become a language for "x programming." Ironically, PG himself has made some of the most eloquent arguments out there for this idea.

It seems to me that implementors are the ones who should take Arc and turn it into a "language for hacking cell phones" or whatever. Some domains will put a premium on speed, others on "smallness," still others on "embeddedness" or what have you. Nothing in the language should foreclose these options, but as you said, it's not clear that languages ever do. In the case of Lisp, we could write a compiler in which every function and macro is converted directly into highly optimized assembly. I don't think there's anything in Arc or any other language that would prevent that.

-----

3 points by pgwoden 5884 days ago | link

As I have no doubt that it is possible to create an Arc implementation that delivers fast execution, it is precisely the socio-political issue that I intended to address in opening this thread.

As long as Arc is implemented in MzScheme, the best it can do is asymptotically approach MzScheme's performance. The discussion in this thread suggests that Arc will not forever be implemented in this way, so the limitation will be lifted. Just how much interest there is in greased-lightning performance is not clear to me.

-----

7 points by sramsay 5884 days ago | link

Well, speaking only for myself . . .

I'm an English professor who does various kinds of computational analysis on large text corpora (so things like string handling, XML, and regex are really important to me). I've been known to write programs that take three weeks to run in Java, so I'm always looking for ways to make my programs fast without resorting to C. Nothing against C. It's one of my favorite languages. It's just not a lot of fun for string processing.

Basically, I always want to go high and fast with my languages, and that's one of the reasons I like Lisp. It's a super high level language, but (in my usage patterns) it outperforms languages like Ruby (which I adore) and Java (which I find increasingly annoying).

Now, my particular usage is perhaps a bit obscure, but it may generalize to other areas. I can't believe I'm the only one doing lots of text processing who wants a fast, high level language. In the end "web application programming" is really just a special case of text processing, so it may align with PG's goals at some more fundamental level.

-----

1 point by sacado 5884 days ago | link

I want a dictatorship! :)

-----

5 points by lojic 5886 days ago | link

Welcome to the Arc forum.

It's premature to say what type of performance Arc will eventually provide, but I'm not aware of anything in the language that would preclude good performance - it will depend on the implementation of the production compiler.

Is there a particular aspect of Arc that causes you concern with respect to performance? Also, do you have a particular application in mind, or is your concern a hypothetical one?

-----

2 points by pgwoden 5884 days ago | link

Thanks. It's not the language itself that I'm concerned about. Rather, I'm wondering how much interest there will ever be in developing a truly fast Arc.

Consider Python, for example. It's been around for a while and is widely used. Yet it's still pretty slow. Fast enough for building websites, sure, but it's nothing like C or Fortran for serious numerical work.

Python's supporters will tell you that its slowness doesn't matter, because you can re-write the time-consuming bits of your code in C. I can imagine that that works well for web programming where latency accounts for much of the execution time anyway and one might only occasionally have a really chunky calculation to do. But in my experience doing Monte Carlo simulations, solving large systems of non-linear equations and that type of thing, a large part of the code is fairly critical for execution speed, so re-writing in C would mean doing a large part of the project in C. If I wanted to do that, I'd just write in C to begin with.

-----

1 point by lojic 5884 days ago | link

What programming language do you currently use for your numerical work?

-----

1 point by pgwoden 5883 days ago | link

I've experimented with Ruby and Python lately, with the hard-core numerics to be written in C, as I alluded to above. I've not actually done huge amounts of programming recently, although in the past I did so in a proprietary language resembling C and in Fortran. Recently I had a look at CL (which accounts for my interest in Arc) and at OCaml, though I've not yet used either extensively.

-----

6 points by jawhite 5885 days ago | link

I'm writing from a position of ignorance here, not having had the time to learn and use Arc yet (apologies), but if Arc is to be the 100 year language I'll put in my vote for a language spec that permits a fast implementation.

Without refuting pg's (and others') prediction that apps will increasingly be delivered over the web, I would guess that there will always be hard problems to solve that require a language that is both expressive and fast, and for which customers would be unwilling to have the solution delivered over an open network, e.g. when the input data involves sensitive, valuable IP.

pg has written before regarding CL's dual nature (writing apps fast vs. writing fast apps, destructive vs. non-destructive operators, etc.). Does Arc have a similar nature at present? Are there other ways to provide the same facility without CL's brute force approach of providing two different implementations of many operators?

-----

4 points by sacado 5885 days ago | link

I totally agree with you. I do a lot of system scripts (log analysis, job management on a cluster, etc.) and I usually use Perl or Python for these. I'd love to switch to Arc for that, but for now speed is really an issue. Especially startup time: a script is something you possibly run very frequently (sometimes many thousands of instances at a time), and even two small seconds on startup is a killer. That's why I spent some time trying to fix this in arc-exe, btw.

I will soon have to try classification algorithms (you know, Bayesian nets, neural nets, these kinds of things). These will be prototypes, so full speed is not required yet, but even there, something a little faster would be good.

Well, pg promised a profiler and proposed (in a recent poll) to work on an FFI sooner or later. I guess that, with these two, we will have what we need in the future...

-----

4 points by jawhite 5882 days ago | link

The FFI is certainly important and I wouldn't use Arc if it didn't have a well-defined standard FFI in its eventual form (to my mind that was one of the biggest mistakes of CL).

I don't think that an FFI is a good substitute for a fast Arc implementation though. It's true that I can re-code any time-critical parts of my application in C, but why should I have to? What you're saying amounts to an unpalatable compromise, and to my mind it's not a compromise that a hundred year language can afford to make.

It will be good to have a profiler for Arc.

I don't agree with your argument that a profiler and FFI will give us what we need for the future. Python has both a profiler and an FFI. I've used both, in combination, for numerical simulation. It's a highly inelegant and ultimately unsatisfying solution. The Python part of the app was never fast enough, and whilst profiling helped, it also led to some very ugly code. The C-coded parts were... well... C code! I think I would be preaching to the converted if I started in on C's deficiencies in this forum :-)

CL for all its faults shows us that a lisp can be fast. I can't see any reason why Arc implementations won't be fast eventually too. If someone more knowledgeable can see one I'd be very interested to learn about it, but this thread doesn't seem to have come up with anything concrete.

[A word of explanation: the Python/C combination wasn't my decision, and in hindsight I wouldn't recommend it to anyone starting a new project involving numerical simulation.]

-----

5 points by sacado 5882 days ago | link

You make a good point there. However, look at very efficient CL code: it looks like C code. Written in CL, sure, but it often deals with calculations on vectors of fixnums where the type of everything is predeclared and type checking is reduced to nothing. No dynamic dispatch of generic functions on heterogeneous lists here.

But an FFI does not necessarily mean you have to write C code every time you need an efficient calculation. Have a look at the following code I've just tested:

  (def fib (n)
     (if (<= n 2)
        n
        (+ (fib (- n 1)) (fib (- n 2)))))

  (time:fib 30)
  -> time: 47498 msec.

   << Hidden code here >>

  (def ffib (n)
     (if (<= n 2)
        n
        (+ (ffib (- n 1)) (ffib (- n 2)))))

  (time:ffib 30)
  -> time: 4635 msec.
  
More than 10 times faster, and they are both pure Arc code. What's the magic part? Well, it just imports new declarations of +, - and <= that I previously wrote in C (with the FFI I worked on recently). These definitions only deal with fixnums; that's why they are fast. Here is the full code:

  (w/inline "
   char inf (long a, long b){
      return a <= b;
   }

   long minus (long a, long b){
      return a - b;
   }

   long plus (long a, long b){
      return a + b;
   }"
   (cdef _<= "inf" cbyte (clong clong))
   (cdef - "minus" clong (clong clong))
   (cdef + "plus" clong (clong clong))
   (def <= (a b) (is 1 (_<= a b))))
Well, that declaration could be encapsulated into something like: (declare-numeric (def ffib ...

And you would end up with something as fast as CL (and looking like optimized CL code). Well, not really, but this is still an alpha version.

-----

5 points by eds 5881 days ago | link

Actually, if you are using Anarki, a lot of the slowness comes from infix.arc redefining math operators. Just removing infix.arc from libs.arc will increase the speed of math operations by about an order of magnitude. In fact, removing infix.arc actually produced more of a speedup for me than using your ffi.

With infix.arc:

  arc> (def fib (n)  (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
  #<procedure: fib>
  arc> (time (fib 30))
  time: 34672 msec.
  1346269
Without infix.arc:

  arc> (def fib (n)  (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
  #<procedure: fib>
  arc> (time (fib 30))
  time: 4219 msec.
  1346269
With ffi:

  arc> (w/ffi "gs1771.so"
  (cdef _< "inf" cbyte (clong clong))
  (cdef - "minus" clong (clong clong))
  (cdef + "plus" clong (clong clong)))
  #<primitive:ffi:plus>
  arc> (def < (a b) (is 1 (_< a b)))
  #<procedure: <>
  arc> (def fib (n)  (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
  #<procedure: fib>
  arc> (time (fib 30))
  time: 6907 msec.
  1346269

-----

3 points by almkglor 5881 days ago | link

Dang.

Personally I don't even use infix. If the slowdown comes from that...

-----

2 points by eds 5881 days ago | link

Yeah, sorry about that. I wasn't thinking about performance when I originally put infix.arc up on Anarki, and since I was doing a lot of testing at the time I found it convenient to add it to libs.arc. Perhaps it would be best to leave it out by default though.

-----

2 points by eds 5881 days ago | link

Done. You shouldn't get any more performance hits from infix math unless you explicitly load infix.arc.

-----

1 point by sacado 5881 days ago | link

Oh, that's right... On my machine, the FFI version is still a little faster (about 20%), but not by that much. It now deals more efficiently with boolean values, however, which might explain it... Anyway, I don't know if it's worth using the FFI this way now... Not until we can compile Arc code to efficient C code, at least :)

-----

2 points by jawhite 5881 days ago | link

Thanks for the counterexample; you've made a very elegant point. It's quite encouraging in fact. It seems to point to the strength of Arc. I think this would require a lot more effort to achieve in Python.

Of course in this case one would hope that the fundamental arithmetic and relational operators would already be implemented efficiently in an Arc native binary compiler, but the principle you've illustrated still holds regardless.

-----

1 point by cchooper 5885 days ago | link

Is Scheme slower than CL?

-----

6 points by raymyers 5885 days ago | link

Since Scheme and CL are both specifications, it would be more meaningful to ask "Is MzScheme slower than SBCL?", for instance.

-----

3 points by KirinDave 5885 days ago | link

And to expand on that, the answer is generally yes. MzScheme is a great version of Scheme for its completeness, not its speed. However, several Schemes are directly competitive with SBCL and CMUCL. Arc could be ported to one of these without too much effort.

-----

3 points by sacado 5885 days ago | link

As for speed, Stalin Scheme is amazing, for example, though it's not very complete or well documented. However, CL is full of "efficiency hacks". Scheme is full of purity, which is not always easy to implement efficiently. Optimizing CL is easier than optimizing Scheme, IMHO.

-----

2 points by sramsay 5884 days ago | link

Why do you think CL is easier to optimize than Scheme? This isn't a hostile question; I'm just curious. Intuitively, I feel like "purer" languages should be easier to optimize.

-----

2 points by sacado 5884 days ago | link

First, CL has in the standard many possibilities for making declarations regarding optimization: for the functions you want, you can compile code, declare types (e.g. this var only holds fixnums), declare that you want to optimize for speed and ignore type safety, etc. This way, you end up writing code the way you would write it in C. There is no such thing in the Scheme standard. Individual implementations could, of course, but as far as I know none does.

Another example is call/cc. This is a very interesting beast, existing only in Scheme. But it is hard to implement efficiently.

The last example I can think of is 'nil. Using nil as both false and the empty list is very interesting in this regard: you can implement nil as the NULL pointer, which is also 0, the false boolean. Fewer manipulations to do on the bare metal. Distinguishing between #f and '(), on the contrary, implies making more tests at the lower levels.
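
(A rough C illustration of that point -- the object representation below is invented purely for the example; it is not how any actual Arc or Scheme runtime is laid out:)

  #include <stdio.h>

  /* Invented object representation, for illustration only. */
  typedef struct obj { int tag; } obj;

  /* Lisp-style: nil is the NULL pointer, doubling as false and as the
     empty list, so "is it false?" and "end of the list?" are the same
     single test against zero. */
  #define TRUEP(x)   ((x) != NULL)

  /* Scheme-style: #f and '() are distinct objects, so the runtime
     carries two sentinels and the two questions become two different
     comparisons. */
  static obj false_obj, empty_list;
  #define FALSEP(x)  ((x) == &false_obj)
  #define NULLP(x)   ((x) == &empty_list)

  int main(void)
  {
      obj *nil = NULL;
      /* prints "0 1 1": nil is false; #f and '() each need their own test */
      printf("%d %d %d\n", TRUEP(nil), FALSEP(&false_obj), NULLP(&empty_list));
      return 0;
  }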

There are other points I guess...

-----

3 points by almkglor 5884 days ago | link

re: call/cc - I think a bit of the Lambda the Ultimate series of papers eventually boils down to the realization that a machine language jump-to-subroutine is equivalent to a call/cc, and the target of the call/cc just has to access the return address on the stack as a function address.

-----

3 points by kens1 5884 days ago | link

I don't get it. There's the whole stack copying for call/cc, so call/cc is much more expensive.

(I read the "Lambda the ultimate GOTO" paper you referenced earlier; it's about goto vs structured programming, not call/cc. As an aside, it's very interesting to reflect on just how controversial structured programming was.)

-----

4 points by soegaard 5884 days ago | link

Implementing call/cc efficiently has been well-researched in the Scheme community. For a very well-written account of a non-stack-copying implementation see

R. Kent Dybvig. "Three Implementation Models for Scheme". PhD. Thesis. http://www.cs.indiana.edu/~dyb/papers/3imp.pdf

Then continue at ReadScheme at "Compiler Technology/Implementation Techniques and Optimization" to see further developments (Look especially for Clinger's papers).

http://library.readscheme.org/page8.html

-----

2 points by almkglor 5884 days ago | link

Who said anything about copying the stack? For that matter, who said local variables should be kept on the stack anyway?

-----

1 point by kens1 5884 days ago | link

In MIT Scheme, the stack gets copied; at least that's what I was told last week. Whether or not you use a stack, the state will need to be copied.

-----

1 point by almkglor 5884 days ago | link

In a function call, the state (the current computation being done) is saved anyway, and therefore "copied" if that is your preferred term. So ideally, call/cc should have the same overhead as an ordinary function call; the only difference is that in call/cc the continuation state is the value given to the function, while in a function call it's just one of the values given to the function.

Note however that much of the theoretical analysis of call/cc makes a basic assumption of a "spaghetti stack", which would mean that partially unwound stacks would be saved implicitly as long as any continuation exists which refers to that stack, and all stacks themselves are subject to garbage collection. Most machines don't actually have a spaghetti stack and can't make a spaghetti stack anyway ^^. That said, a spaghetti stack could be implemented as a simple list, with push == cons and pop == cdr.

Alternatively store the local variables on a garbage-collected heap, and include a reference to the local variables with the continuation/return address (you'll probably need to save the pointer-to-local-variables anyway, since the target function is likely to use that pointer for its own locals). Again, no additional overhead over plain function calls, except perhaps to restructure the return address and the pointer-to-local-variables.

Don't know about MIT Scheme, but if I were to implement call/cc on stock hardware and compiling down to machine language that's what I'd do ^^
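
(Something like this, very roughly sketched in C -- a hypothetical frame layout for illustration, not how MIT Scheme or MzScheme actually represents frames:)

  /* Hypothetical runtime layout, for illustration only: activation
     frames live on a garbage-collected heap and chain to their caller
     instead of being pushed on the machine stack. */
  typedef struct frame {
      struct frame *caller;                  /* dynamic link            */
      void (*resume)(struct frame *, long);  /* the return point        */
      long locals[4];                        /* this call's locals      */
  } frame;

  /* A continuation is nothing but a saved frame pointer...             */
  typedef struct { frame *k; } cont;

  /* ...so capturing it (call/cc) costs one pointer copy, the same as
     saving a return address in an ordinary call...                     */
  cont capture(frame *current) { cont c = { current }; return c; }

  /* ...and invoking it later just resumes that frame with a value.     */
  void throw_to(cont c, long value) { c.k->resume(c.k, value); }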

-----

3 points by kens1 5883 days ago | link

I'm still totally not understanding your claim that call/cc should have the same overhead as an ordinary function call.

I read the Clinger "Implementation Strategies for Continuations" paper and they found call/cc about 10 times slower than function calls on the tak/ctak tests. I tried those tests on PLT Scheme and the overhead I saw is even worse: .7 seconds with function calls vs 51.8 seconds with continuations on (tak 24 16 8).

Clinger says about the stack strategy: "When a continuation is captured, however, a copy of the entire stack is made and stored in the heap. ... Variations on the stack strategy are used by most implementations of Scheme and Smalltalk-80."

-----

3 points by almkglor 5883 days ago | link

Components of function call: (1) put arguments somewhere (2) put return address somewhere (3) jump to function.

Components of call/cc: (1) put return address as argument somewhere (2) put return address somewhere (3) jump to function.

That said, continuations generally assume that "put X somewhere" means somewhere != stack. Yes, even return addresses are not intended to be on the stack when using continuations. The main problem is when compiling down to C, which always assumes that somewhere == stack. If you're compiling down to machine language where you have access to everything, then you can just ignore the stack, but not in C, which inherently expects the stack.

-----

3 points by sramsay 5884 days ago | link

Ah, I get it. I suppose typing is one of the big issues affecting speed. If the language standard insists on dynamic typing, there might be no way to get certain kinds of optimizations.

And yeah, call/cc is probably always going to be a bear. But man is it cool. :)

I suppose this goes against what a few of us (including me) were saying above -- that the language and the speed are really separate issues. Or maybe it's more coherent to say that language standards (as opposed to "languages" generally understood) can have a profound effect on speed. If they don't give implementors a lot of choice, they can box people into certain corners.

It's interesting that Scheme actually mandates optimization in at least one case (tail recursion). I don't know how many language standards make those kinds of demands, but I suspect there aren't many.

-----

2 points by sacado 5884 days ago | link

Tail recursion is interesting as it is not especially an optimization for speed but a way to make programmers rely primarily on functional programming: if you don't have it, functional programming rapidly becomes a dead end, as you can make the stack explode really fast. As a bonus, it is faster :)
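
(For what it's worth, C shows the same trade-off in miniature: a compiler may turn a tail call into a jump, but nothing in the language requires it, which is exactly why Scheme mandating it matters. A small sketch -- gcc and clang usually do this at -O2, but it is not guaranteed:)

  #include <stdio.h>

  /* Tail-recursive sum: the recursive call is the last thing the
     function does, so a compiler that performs tail-call optimization
     turns it into a loop and it runs in constant stack space. */
  long sum_to(long n, long acc)
  {
      if (n == 0)
          return acc;
      return sum_to(n - 1, acc + n);   /* tail call: may become a jump */
  }

  int main(void)
  {
      /* With TCO this is fine; without it (e.g. at -O0), ten million
         nested calls can blow the stack -- the dead end described above. */
      printf("%ld\n", sum_to(10000000, 0));
      return 0;
  }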

-----