Let me preface my comments by saying that I mean them constructively, not negatively.
One big problem is this project has little relevance to the world outside arclanguage.org.
On the technical side, I'd suggest looking to see if anyone has made a Scheme to Common Lisp compiler, since that's an isomorphic problem. If one exists, that will give you a big start. If one doesn't exist, hmmm....
Offhand, I don't see how you're going to handle continuations or tail recursion. Note that many Scheme implementations punt on continuations (e.g. Kawa, Scheme 48, QScheme). Nil handling is also likely to be a pain.
If you're looking for a way to compile Arc, I think it would be much easier to use an existing Scheme compiler as the base, rather than using a Common Lisp compiler. Using a Common Lisp compiler seems to be just adding difficulty.
On the other hand, you have a lot of time budgeted to get the web server running. It seems to me that if you get Arc running, then srv.arc should just work and give you the web server for free.
I think interpreting Arc (as opposed to compiling) in Common Lisp using Peter Norvig's Scheme interpreter is doable, and I'm actually currently looking into that.
Thanks. I appreciate the constructive criticism :)
> This project has little relevance to the world outside arclanguage.org.
I don't see how much of anything I do here affects the outside world.
> I'd suggest looking to see if anyone has made a Scheme to Common Lisp compiler.
That seems like a rather roundabout method of compiling Arc to CL. Should we always use existing cross language compilers whenever we want to port Arc to a new language?
Say we have a language X, which only has a CL to X compiler. Compiling is then a conversion from Arc to Scheme, then to CL and finally to X. Is it really a good thing to have all the middlemen? And say we have a language Y which only has a X to Y compiler....
> I don't see how you're going to handle continuations or tail recursion.
Continuations are a difficulty, but I think one of CL's libraries may do the trick. They might not be truly first-class, but if they work I don't intend on worrying about it until I've got everything else working.
I don't see how tail recursion would be a problem. As far as I know most CL implementations provide tail recursion. (pg's ANSI Common Lisp talks a lot about using tail recursion in CL, which is one reason why I am not too worried about it.)
> It seems to me that if you get Arc running, then srv.arc should just work and give you the web server for free.
I hope it works out that way. (I will be more than happy to move on to other things if that step goes quickly.)
In my previous posting, I didn't mean to use the Scheme layer as a middleman; I meant to take an existing Scheme compiler and do s/lambda/fn/g on the code (as well as other necessary changes :-) to end up with an Arc compiler.
If you're implementing a Scheme-like system, I highly recommend Dybvig's "Three Implementation Models for Scheme" (http://www.cs.indiana.edu/~dyb/pubs/3imp.pdf). The three models are a heap model, a stack model, and implementation on a crazy string-based research processor; you'd probably want the stack model. Despite being a PhD thesis, it is very practical; it describes how the author implemented Chez Scheme and gives a pretty much full Scheme implementation running on a simple virtual machine. It handwaves about converting the virtual machine code to assembly, and gives some sample VAX code in an appendix.
I'll reiterate that you should look at how to make your project have a larger impact and relevance; what could you do (perhaps using your compiler as a base) that 1000 people would benefit from?
So basically you propose writing a meta-circular Arc compiler by porting a Scheme compiler to Arc, and modifying the compiler to compile Arc instead? That actually sounds like a really cool idea, if I understand you correctly. But what Scheme compiler would you use as a base?
I presume the paper you posted is meant for implementing Scheme from scratch? (So this would be an alternative to the previous option?) I'll look over it if I get the time.
Now perhaps I'll see if I can make a proposal out of this. (I might as well, assuming I have the time, since I am allowed more than one.) Although there is one issue: in order for such a proposal to get funded, I need someone from the Arc community to step up and mentor it.
I ended up creating a public copy of my own git repo not to mess with Anarki...
As for your concerns, I think you shouldn't be too worried. Just check it out, there is ample room for improvement, especially in speed, which I understand is your primary objective. No network yet, no threads, continuations are implemented as chains of closures (I honestly don't know better than that)... it just loads plain arc.arc, and I haven't checked it thoroughly.
For me, this was a very nice opportunity to learn about the CPS transformation. Arc is perfect in this respect (only 4 special forms). I'm pretty sure that, as an exercise, it's full of bugs and misconceptions, because I love to learn by doing (and make lots of silly mistakes along the way)...
I'll be happy enough if it helps just a tiny bit...
I appreciate everyone's comments, both the support and the criticism. And I'd like people's help with something.
At the moment I have one major problem: LispNYC does not deem CL-Arc to a be a full summer's worth of work. So unless I can show them that CL-Arc really is a summer's worth of work, I'm left with either coming up with additional ideas which I can work on after CL-Arc, or writing an entirely new proposal on another topic.
I'm wondering if anyone has ideas for what I could work on after completing the basic CL-Arc. If the axioms and core language take a couple of weeks, and the libraries take a couple of weeks, and FFI (probably just a port of the current FFI on Anarki) takes two weeks, that leaves a lot of time in the summer. What can I do with this time?
One idea is I could port Arco to CL-Arc. Unfortunately Arco isn't complete and is undergoing rapid changes, and might be completely different by the time I get around to CL-Arc. This makes it even more difficult to write a definite proposal.
> How long do you think defcall and settable-fn would take?
About a week, maybe less ^^. Defcall is reasonably trivial although it requires some rethinking. settable-fn is implementable using defcall (see nex-3's settable-fn2) but I'm personally dubious of such a style, and prefer my own.
> (But I'm a little dubious about working on something pg may have already done.)
We don't have evidence either way - pg hasn't commented on this (none that I've seen, anyway). Up to you to decide whether to act as if pg made it, or act as if pg didn't make it.
Besides access to all the CL libraries, I think it would be worth doing for speed alone. Not everyone has (yet) used Arc in a situation where the speed became an issue, but it does happen pretty easily (just ask almkglor).
In SBCL, I am consistently able to achieve results within 30% of C programs. Without type annotations the speed is still quite respectable.
If we ever want Arc to be usable in the wild for a diverse range of applications, we will need a compiler that can generate fast code. This is one way of getting there.
Well, there are still ways to break the speed barrier. For instance the main reason I made cached-table was in order to cache the results of parsing the wikitext in Arki (but not keeping them for too long, since that would simply be a waste of memory). This is even arguably the "correct" thing to do and I might not have done it yet if Arc hadn't been so slow ^^.
"When I was a noob, I talked like a noob, I thought like a noob, I reasoned like a noob. When I became a hacker, I put noobish things behind me. Now we see but a poor implementation as in an alpha; then we shall see code to code. Now I code in part; then I shall code in full, even as I am full of code. Now these three remain: hubris, impatience, laziness. But the greatest of these is laziness."
Yes, you can always optimize your Arc code to mitigate the problem. My point still stands though. If we want to be able to use Arc in production situations, we will eventually crave an industrial-strength Arc compiler.
This becomes even more obvious if you take seriously the notion of "A Hundred-Year Language".
We ought to focus on what gap Arc is really meant to fill: quick prototyping, and easy deployment of applications. It is not designed to crunch numbers at the speed of light, or cure cancer. Web applications simply don't need the kind of speed that C et al. do. If they did, we wouldn't be using Python or Ruby either.
We should focus on improving what Arc does, rather than trying to morph it into the behemoth Swiss Army knife that is Common Lisp.
Not to discourage you from your own pursuits regarding this, but I think as a whole, the Arc community would be better off putting its focus elsewhere.
I can't speak for the whole of "the Arc community" any more than you can, but...
I'm not trying to turn Arc into Common Lisp, I'm advocating making Arc fast. There's a difference.
CL was a behemoth because it was a union by committee of all the cruft in every popular Lisp dialect circa 1980. One of Arc's goals is to design from a clean slate. There is not a pressure to conform to bad or inconsistent decisions of the past. None of that has anything to do with performance.
The question becomes: Do we want to make a real general purpose language, or do we want Arc to become "yet another scripting language"?
This is a long term question. We may not need a compiler tomorrow, but I'm not ready to say for the long haul: "Arc is for exploratory programming and rapid prototyping. It will never scale. You shouldn't write ray tracers in it, or image processing, or games. It's fine for web apps -- unless they get allot of traffic."
The funny thing is that, even in some of my webapps, I always needed some more speed, at one moment or another. The last example I have in mind : I wanted to generate a chart showing the use of a big system. As the chart's look depends on a few criteria given by the user, it has to be generated on demand. Well I had to dig into a DB containing millions of items, then mix these items together (that couldn't be done in SQL) and finally generate the chart.
Glad I had Psyco there. If I hadn't had it, or at least Pyrex, I would have probably dropped Python because writing C extensions for it is quite painful. And that's also the reason why I never really used Ruby, despite its cool features.
And I don't want to write prototypes and say : "Hmm... My code is working now, let's write it in a serious language for the production version".
Look at Scheme anyway. We really can't say it's a language focused on speed or designed to crunch numbers. Well, look at Ikarus. Oh, yes, for number crunching, you might prefer Stalin. Even C looks slow when compared to Stalin.
Of course, this shouldn't be the main focus of the community, and I don't even think the language should be designed with speed in mind (well, a little of course, or else we would have dead slow numbers implementd with lists of symbols) but that it should be seriously taken into consideration.
Yes, exactly. I think a major cause of all this railing against optimizations is all the newbies who have just learned to write programs running around shouting about efficiency. I was one of those just a couple of years ago. The problem with these newbies is that they're naive. They decide that something is fast or slow based on how efficient its most naive implementation sounds. It seems that they grasp the vague idea that optimized code tends to be longer than code that is not optimized, but rather than responding to that by not trying to optimize until they know what parts of the code are slow, they respond by assuming that approaches that take more lines of code are more efficient and therefore better.
A misplaced focus on speed is bad, and you should get it working before you make it fast, but that doesn't mean speed is a non-issue. If a program is slow enough to cause annoyance, that is a problem, and it should be fixed. Languages have to pay extra attention to speed issues. If programs written in a language are slow because of the language, and not because the programs themselves are badly written, there's something wrong with the language.
Another thing that newbies don't get is that well-built languages are usually optimized so that the more obvious and more commonly-used approaches are actually faster than that tangled mass of "optimized" code you just wrote. Profiling profiling profiling. Don't just guess.
So, the points are: performance should be a secondary concern, but secondary is still pretty high up on the list, and optimization should be based on information gathered with a profiler, not what sounds efficient or inefficient. Sorry for rambling, I hope this post contributes something to something. I just have a tendency to spew everything I have to say about a topic all in one place every now and then, even if only some of it is relevant. I guess you could boil this post down to a "me too", but only if you boiled it a lot.
Yes, but if we want quick prototyping, we also want transformation of the prototype to something we'll use, which might need to be faster than our mockup. Take for example Arki: the wikitext parser was written easily using a quick prototype on raymyers' treeparse, but in actual use, the wikitext parser was too slow. Unfortunately, there's no easy way to transform the parser structure to faster techniques (such as state machines), without doing it by hand, which rather undermines the purpose of the quick prototype. By optimizing the underlying parser, however, the prototype was still useable, and by exposing a few more particular combinations (such as nil-returning sequences, for when we only care that a syntax exists or doesn't exist, not the individual charactes of the syntax), we can refine the prototype into something faster.
I tried that. Not very easy as Ikarus' hash-tables don't work like mzscheme's (there is no "equal" hash-tables). You would have to rewrite the reader too. And I'm not sure we would get all the stuff with sockets and networking. It doesn't have an FFI yet, either.
Well, actually, I'm not sure if implementing a still-designed language in the beta-release of a compiler is a long-term solution... :) Maybe in a few months / years this could be done ? But as for now, I think porting it to CL would be easier.
I also suggest that. In fact, looking at Chicken's implementation - stack == heap - is rather inspiring, because it shows exactly how a garbage-collected memory manager should be done: just decrement a pointer, in this case the stack pointer. Brilliant IMO. Wish I'd thought of that.
Why not? Every higher-level language is just a macro on the assembly language of a computer. Model your native code as a list like (__asm (mov 42 ax) (ret)). Consing is just a call to a predefined cons: (__asm (push d) (push a) (call cons) (ret)) - or by applying the lessons of Lambda The Ultimate X (__asm (push d) (push a) (jmp cons)). ^^
Okay, okay, I was actually planning to do something like that a long time ago, haha. But I haven't (yet) found any holes in the basic concept of modelling every function call (f x) as a macroexpansion on (__funcall f x), which expands to an (__asm) form with the function call. Then for functions themselves (fn (x) (f1 x) (f x)), transform the last function call to a tail call: (__asm_let x (esp 4) (__funcall f1 x) (__tail_funcall f x)). Stuff like that ^^.
If you want to start writing an Arc native code compiler (in Arc), it just brings up a lot of difficult issues that I'm not sure how to deal with. A reader, GC, continuations, tail recursion, etc. all have to be implemented in Arc. You stop getting those for free once you remove the Scheme runtime from the picture.
So quite frankly, I have a lot to learn before I'll be able to take on a project like that.
Encapsulation my friend, encapsulation. Just make a general sketch and leave the details a bit. Then write the details. Besides, you've already done it! Just s/CL/Arc/ your posted article, then s/compile to Arc/compile to assembly/
Sure GC is nontrivial, but Boehm's GC is not bad at all. And if you really need continuations/tail recursion than make everything continuation passing style (you'll probably need to anyway). And I'm sure raymyers' treeparse can help in the reader department.
(IMO the difficulty here in itself is probably the assembly code you'll emit for a given piece of code, not reader/GC/conts/tailrec)
Remember, CL is by itself not so similar to Scheme that you can directly use its reader, as well as its execution model, in your final product. You'll write your own Arc reader in CL anyway (in CL 'arc == 'ARC, in arc 'arc != 'ARC), so you might as well (tada!) write it in Arc and compile it down to assembly. You'll need conts and no, you can't trust CL enough to handle tailrecs.
(Hmm. Maybe I shouldn't be advising you, maybe I should be doing this myself to steal your thunder ^^)
It's in the standard (CLTL2). You can fix everything, because you can tell the reader to use your own functions on particulars characters, as an example this is a piece of code that lets you use Arc [... _ ...] syntax in Common Lisp:
So would you use the existing (C/C++) implementation of Boehm GC? If so then doesn't that make this not completely implemented in Arc? If not then that's one more piece to write (although I guess it isn't too difficult to translate code that's already been written in another language).
Yes, the assembly part of it looks difficult to me. When I look at Arc or Lisp code I don't see any way to translate that to native code. Obviously has been done, I'm just not educated on such matters.
You have a good point about the reader, I should probably add that to my proposal. And writing it in Arc would be an interesting exercise.
And can't I trust CL about tail recursion? Most decent implementations do tail recursion, right? And can't I tell people to stay away from those that don't?
But suppose I can't trust CL to do tail recursion. What am I supposed to do about it?
> So would you use the existing (C/C++) implementation of Boehm GC? If so then doesn't that make this not completely implemented in Arc?
And neither is Linux completely implemented in C, and C compilers written in C are not completely implemented in C, because bits and pieces of the libraries they link their code to are written in assembly.
> Yes, the assembly part of it looks difficult to me. When I look at Arc or Lisp code I don't see any way to translate that to native code. Obviously has been done, I'm just not educated on such matters.
The Lambda the Lutimate papers are a good place to start if you're interested - they include some hand-written assembly code equivalents to Scheme/Lisp code, largely function calls and prefix/suffix. Given that the most basic axioms of Arc include (fn ...) and a function call syntax, this would be quite of interest.
> But suppose I can't trust CL to do tail recursion. What am I supposed to do about it?
"Yes, the assembly part of it looks difficult to me. When I look at Arc or Lisp code I don't see any way to translate that to native code. Obviously has been done, I'm just not educated on such matters."
I found a good link / tutorial about how to compile a subset of Scheme to C language. The compiler is about 800 lines of Gambit Scheme (blank lines included) and even deals with tail-recursion and continuations ! (well, that's based on the lambda papers...)
No GC, but you can use Boehm and have one for free.
The only time you need garbage collection is when you're allocating new memory. This is of course abstracted away into the 'cons procedure you end up calling at each (cons a d) - basically 'cons triggers gc if necessary. You may then very well just use the someone-Boehm garbage collector for C, which will (I think!) helpfully look at registers and stack for you.
The someone-Boehm GC (reportedly) works well with C - I'm reasonably sure that it will work well with assembly.
Anyway, maybe the right way to do so is by destructuring a Scheme implementation ? Starting from a given implementation, you write your compiler from scratch, but use the facilities of the chosen implementation for the reader and the GC. Then, once it's working, you gradually remove the scaffolding by implementing these things by hand...