It's because (fn args ...) currently compiles to (lambda args ...), so args becomes a Scheme list instead of an Arc list.
It's hard to see the difference because the Arc runtime mostly treats '() as a synonym for 'nil. But keys to tables aren't converted, so it is possible to get an observable result out of the bug this way:
arc> (let a (table)
       (= (a ((fn args args) 1 2)) 'hi)
       (a '(1 2)))
nil
There are at least two options here which don't interfere with existing syntax; I'll demonstrate using optional arguments rather than argument conversion for simplicity:
(def foo (x y ('o z)) ...)
This doesn't interfere because binding 'quote is more or less illegitimate in arc. (The reason for this lies in the underlying scheme - if 'quote is lexically bound as an identifier, it doesn't function as a special form.)
(def foo (x y (:o z)) ...)
This doesn't interfere because binding to ':o in arc is more-or-less useless, because ssyntax expands away ':o to 'o. The same is true of '+o.
I personally like the :o, maybe just because it reminds me of CL. Anyway, it shouldn't be too hard to add a special case for ': in ssyntax if it is the first character.
After all, wouldn't that be an error anyway? Therefore redefining it would not interfere with normal usage of ':. In fact, it makes sense to have different versions for each ssyntax if it is at the beginning, end or middle.
I meant an unquote that wasn't balanced by an enclosing quasiquote. e.g., `(,,x) has one too many unquotes. This is also true of unquote-splicing, like `(,,@x). clisp handles this in the reader:
$ clisp
[1]> `(,,x)
*** - READ: more commas out than backquotes in, is illegal
The following restarts are available:
ABORT :R1 Abort main loop
Break 1 [2]>
The Arc implementation can handle some cases, because the top-level 'unquote and 'unquote-splicing (outside of quasiquotes) will gripe, but it happens at macroexpansion:
$ rlwrap mzscheme -m -f as.scm
Use (quit) to quit, (tl) to return here after an interrupt.
arc> (load "qq.arc")
nil
arc> `(,,x)
Error: "reference to undefined identifier: _x"
arc> (= x 1)
1
arc> `(,,x)
Error: "unquote not allowed outside of a quasiquote: 1"
...and it now occurs to me that I'm not sure of any test cases that fail to signal an error. The error message isn't very nice, but as long as it does signal one, it's livable.
Extra input on this would be great. I'll try to hunt for ways to break it. Even if errors are signaled properly, it's better to have them at read-time, I think.
Hmm, you'd be rather ahead of much of Arc then. As you noticed with unquote, there are parts of Arc which currently don't catch errors at all, much less as early as possible.
I wonder whether read-time checks should be something you can turn off. I might want to define my own unusual quasiquote macro that does something with (quasiquote ((unquote (unquote x)))), though I can't think of what that might be.
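For what it's worth, a balance check like this doesn't have to live in the reader. Here is a minimal sketch that walks an already-read expression and reports whether every unquote is covered by an enclosing quasiquote. The name qq-balanced is made up for illustration; it is not part of qq.arc or ac.scm.

```arc
; Hypothetical helper, not part of qq.arc or ac.scm.
; depth counts the quasiquotes currently open around expr.
; Returns t if no unquote / unquote-splicing appears at depth 0.
(def qq-balanced (expr (o depth 0))
  (if (atom expr)
       t
      (is (car expr) 'quasiquote)
       (qq-balanced (cadr expr) (+ depth 1))
      (in (car expr) 'unquote 'unquote-splicing)
       (and (> depth 0)
            (qq-balanced (cadr expr) (- depth 1)))
      (and (qq-balanced (car expr) depth)
           (qq-balanced (cdr expr) depth))))

(qq-balanced '(quasiquote ((unquote x))))            ; the form `(,x)
(qq-balanced '(quasiquote ((unquote (unquote x)))))  ; the form `(,,x)
```

The first call should come back t and the second nil. Since it only looks at list structure, it would also flag quoted data that happens to contain an unmatched unquote, so it behaves more like a reader-level check than a macroexpansion-time one.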
What quasiquote expression expands into a dotted list so that you had to make your own append instead of using Arc's join? Is (join '(a) 'b) the kind of expression that would need to work? Do you see any disadvantage to fixing join so that it could do that?
What quasiquote expression expands into a dotted list so that you had to make your own append instead of using Arc's join?
It's predominantly because that's how the clisp version was implemented -- it's basically a straight port without much design consideration. But 'append does have some sort of impact (though probably rare in practice, and with debatable semantics):
$ rlwrap mzscheme -m -f as.scm
Use (quit) to quit, (tl) to return here after an interrupt.
arc> (load "qq.arc")
nil
arc> (toggle-optimize)
nil
arc> (defs foo () 'x bar () 'y)
#<procedure: bar>
arc> (macex1 '`(,(foo) ,@(bar)))
(append (list (foo)) (bar))
arc> `(,(foo) ,@(bar))
(x . y)
arc> (toggle-optimize)
t
arc> (macex1 '`(,(foo) ,@(bar)))
(cons (foo) (bar))
arc> `(,(foo) ,@(bar))
(x . y)
Optimizations or no, the expression should still mean the same thing, but as you can see, without optimizations it would break if 'join was used instead.
Incidentally, these are the same results you get from clisp. I don't have the interaction here because I just loaded the file and waded through repeated "continue"s until it finally yielded the damned result; if anyone's interested, I can reproduce it. Also, it's the result you get after fixing 'toggle-optimize, which I added but didn't use, so didn't have the chance to see it was broken. It should be:
Is (join '(a) 'b) the kind of expression that would need to work? Do you see any disadvantage to fixing join so that it could do that?
Well, arc seems to pitch on dotted lists a lot, since they aren't used too often. But it seems an odd case to miss, if only for the sake of completeness. Even functions like 'len are broken on dotted lists.
Have you ever written a program where you used append with a non-list in the last argument to create a dotted list? Or called length on a dotted list? Aside from unit testing?
The reason why I ask is that there's lots of places where Arc isn't "complete" (notice there's a caar, a cadr, and a cddr, but no cdar function in arc.arc, for example), but the question is what would actually get used in a program? And what will help programs be written more succinctly?
Perhaps 'len is a bad example, since some languages (e.g., Common Lisp) like to consider it an error to ask for the length of an improper list. There's a difference between incidental errors ("oh, right, didn't bother with dotted lists because I don't really use them") and intentional errors ("dotted lists are flat-out wrong to use here"). But the point is orthogonal to the one example, I think.
Inasmuch as Arc isn't "complete", you wind up needing to re-invent the wheel, losing succinctness in the process. Your example of car/cdr compositions is such an example, albeit an uncompelling case since (= cdar cdr:car) is easy to write. The reimplementation of 'len for ppr.arc is another, more compelling instance.
That's not to say that a language should be designed to cater to every possibility that might crop up in code. That leads to bloat. I'm just saying that things like handling dotted lists seem more like incidental "didn't happen to get around to it" bugs than "this should be an error". Indeed, the most striking example of this distinction I can think of is the entire reason that I wound up porting clisp's backquote: "never really used ,,@(...) before" vs. "you shouldn't use ,,@(...)". It's not so much a matter of commonality as it is about what makes sense.
Does that mean that 'append is right to use here? I don't know. I was just copying Common Lisp. If there's no logical reason to include it, going with 'join would be cleaner. On the other hand, if there's no logical reason to exclude it, which do we go for? I'd opt for the one that gives the most freedom: since 'append can be used just like 'join except in an edge case that might be useful, 'append offers you more utility -- rather than giving you an error.
(This sentiment could be phrased much more strongly if I actually had a useful case for choosing 'append over 'join. Note that I didn't try to provide a "real" example, just a demonstrative one.)
Here's what I suggest: publish a fixed 'join that is able to correctly create dotted lists. Now anyone who wants that doesn't have to reinvent the wheel.
Next, publish a version of your qq port that uses 'join. Now qq is simpler because it doesn't have to implement its own 'append; and you can point out that if we're using the old join then some qq expressions won't produce dotted lists, but if we use the new join they will.
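As a rough sketch of what such a "fixed" join could look like: the version below differs from a plain recursive join only in that the last argument is returned as-is, whatever it is, the way CL's append treats its last argument. The name join-dotted is made up so that it doesn't clobber the 'join in arc.arc, and this is an illustration rather than a tested replacement.

```arc
; Hypothetical name; a sketch, not a drop-in replacement for arc.arc's 'join.
(def join-dotted args
  (if (no args)       nil
      (no (cdr args)) (car args)   ; last argument becomes the tail unchanged
      (let a (car args)
        (if (no a)
            (apply join-dotted (cdr args))
            (cons (car a)
                  (apply join-dotted (cdr a) (cdr args)))))))

(join-dotted '(a) 'b)        ; => (a . b)
(join-dotted '(a b) '(c))    ; => (a b c)
(join-dotted nil 'b)         ; => b
```

Like CL's append, this shares the last argument with the result instead of copying it, and it gives no explicit error message when a non-list shows up in an argument other than the last.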
I'm certainly open to using a fixed 'join, though my aim was to keep qq working as closely to vanilla Arc as possible, instead of modifying arc.arc functionality. Using Arc's current 'join gives way to some incorrect optimizations, as ported from the Common Lisp code -- e.g., (join nil form2) is not necessarily equivalent to just form2. So, I'll have to inspect that code, if there's the possibility of using arc.arc 'join (though I wonder if giving both options would be any "cleaner" than just keeping 'append around until/if 'join is "fixed").
On another note, a more appropriate place for the code and especially an altered 'join would probably be the Anarki repository (if only for keeping it a little more centralized). But I'm not sure I understand the protocol there, since I don't see any 'lib' directory in the arc3 branches. Unless this counts as a fix, I guess. Anyone care to give an executive summary / crash course? (The last I read about it was http://arclanguage.org/item?id=9777) Does it matter to anyone if this gets pushed to the git repo?
Using Arc's current 'join gives way to some incorrect optimizations
For any expression that doesn't have a dotted list as its final result?
my aim was to keep qq working as closely to vanilla Arc as possible, instead of modifying arc.arc functionality
The goal of Arc is to make it easy to write succinct programs. So, if:
A) someone finds that being able to use `(1 ,@2) helps them write a succinct program; or,
B) you have some natural optimizations that are easy to write if intermediate forms can use dotted lists, but would need more difficult or cumbersome code to work around a join / append that didn't work with (join ... atom), or even be conceptually more difficult to think about its correctness,
then I'd say let's fix Arc so that 'join is able to produce dotted lists.
On the other hand, if the only reason that you're implementing your own 'append is so that the code will pedantically pass Common Lisp unit tests for expressions that no one actually uses in their programs, then I'd question why we need that functionality :-)
What does 'list* do?
;; arc doesn't like macro definitions inside of 'do blocks (bug?), so resort to
;; defining them here.
What didn't work? In arc3:
arc> (do (mac a () 33))
#3(tagged mac #<procedure: a>)
arc> (a)
33
For any expression that doesn't have a dotted list as its final result?
Yes.
The way the optimizations work is to merely streamline the things that quasiquote itself introduces. It doesn't require or even expect user code to use 'append. e.g.,
Note that the top-level 'append wasn't touched -- it gets turned into (list (quote append)). But also note that the naive macroexpansion without optimizations produces (append (list a) (list b)), which is obviously excessive. So,
arc> (toggle-optimize)
t
arc> (macex1 '`(append ,a ,b))
(list (quote append) a b)
We still keep the top-level 'append, but a and b are no longer nested in a bunch of conses; the optimizer only changed the 'appends that quasiquote introduced haphazardly.
Then, if the optimizer tries to get rid of the extra 'appends, it must know that the final result will be the same. e.g., (append nil form2) == form2. Meaning that
The (append nil (f1)) that quasiquote introduced in the first case is turned into just (f1) in the second. Note again that the top-level 'append isn't touched; it doesn't matter if the programmer uses 'append.
Now, if quasiquote was changed to use 'join, so that any 'join it introduced might be optimized, there are fewer invariants because, for instance, (join nil form2) != form2 if form2 is not a list.
This wouldn't matter in optimized code, because the meaning of (join nil form2) could be taken to mean the same as (append nil form2). That is, we could use any ol' sentinel in lieu of 'append and treat its rules the same, like
thus leading to the same optimized code. However, when optimizations are off, if quasiquote produces (join nil form2) it might lead to an error where (append nil form2) would not. That is, if the above OPTIMIZE-ME were 'join, evaluating it as such dies when (f1) is not a list. Moreover,
(iso (join (list 'append) (join nil (f1)))
(cons 'append (f1)))
wouldn't necessarily be true.
So! If 'join were used instead of 'append, optimized code wouldn't care. But the invariant is supposed to be that optimized code results == non-optimized code results. (join nil form2) != form2, so the optimizer shouldn't touch 'join in this case, like it does with 'append currently.
All this means is that, if we s/append/join/g, the optimizer would need to use a different set of rules. This isn't so bad; it's not an insurmountable task by any stretch. But in so doing, it would change quasiquote semantics vs. using an append-style join (as discussed previously). So, giving both options would need to cover these altered semantics. Again, doable, just a note of concern when it doesn't seem a bad thing that 'append can be used (or 'join could be modified to work like CL 'append -- not against that, except when working with vanilla Arc).
In a similar vein, 'list* is introduced by the optimizer in attempts to cons less. Quoth CLTL:
list* is like list except that the last cons of the constructed list is
"dotted." The last argument to list* is used as the cdr of the last cons
constructed; this need not be an atom. If it is not an atom, then the effect
is to add several new elements to the front of a list. For example:
(list* 'a 'b 'c 'd) => (a b c . d)
This is like
(cons 'a (cons 'b (cons 'c 'd)))
Also:
(list* 'a 'b 'c '(d e f)) => (a b c d e f)
(list* x) == x
The most glaring issue with this is that it relies on the efficiency of the implementation, of course. So, I'm not even sure that mine conses any less than using just 'cons. This is fixed by a not-dumb implementation, of course!
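For reference, the CLTL behaviour quoted above is small enough to write directly in Arc. This is a minimal sketch under a made-up name (my-list*, so it doesn't shadow the list* in qq.arc), and it makes no claim about consing less than nested 'cons calls, since collecting the rest arguments allocates a list on every call.

```arc
; Hypothetical name; follows the CLTL description, not qq.arc's definition.
(def my-list* args
  (if (no args)       nil
      (no (cdr args)) (car args)   ; (list* x) == x
      (cons (car args)
            (apply my-list* (cdr args)))))

(my-list* 'a 'b 'c 'd)         ; => (a b c . d)
(my-list* 'a 'b 'c '(d e f))   ; => (a b c d e f)
```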
What didn't work?
arc> (let b nil (mac b () 33) (b))
Error: "Function call on inappropriate object #3(tagged mac #<procedure: b>) ()"
arc> (do (mac b () 33) (b))
Error: "Function call on inappropriate object #3(tagged mac #<procedure: b>) ()"
(By the way, is this right-margin crowded enough for you? Yay, nesting!)
thus leading to the same optimized code. However, when optimizations are off, if quasiquote produces (join nil form2) it might lead to an error where (append nil form2) would not. That is, if the above OPTIMIZE-ME were 'join, evaluating it as such dies when (f1) is not a list
Yes, but your example was ",@(f1)", so naturally that wouldn't work without 'append when (f1) isn't a list.
I can think of three cases:
Case A: we want to use ",@X" when X is not a list. This isn't supported by Bawden's implementation, so it's a new feature. Clearly we need to use 'append then.
Case B: we require X to be a list when we use ",@X", but the qq expander (optimizing or not) relies on intermediate forms that call 'append with non-list arguments, even though neither the input nor the eventual final output has dotted pairs. Then we'd either need to use 'append, or rewrite the expander not to produce those intermediate forms.
Case C: When X is a list in any use of ",@X", the expander calls append and produces code that calls append with only lists. Then we could use Arc's current 'join instead of 'append.
Maybe I'm just up too late, but I'm not seeing anything in your examples that would indicate that B is the case?
If I always only use ",@" with lists, does the current code (with optimization turned on or off) ever produce the wrong result if 'join is used instead of 'append? (It's OK if the answer is you wouldn't know without looking into the question further).
(do (mac b () 33) (b))
Oh, were you hoping that the macro would be statically scoped?
I seem to have misunderstood your question. Apologies. The cases will make it easier for me to not talk past you:
Case A is what I've been defending (and is currently how the port works). I don't have any direct examples of its usefulness, but think that it's semantically clearer (a point which I don't think I've made clear):
$ rlwrap mzscheme -m -f as.scm
Use (quit) to quit, (tl) to return here after an interrupt.
arc> (load "qq.arc")
nil
arc> (= tail '(b c d))
(b c d)
arc> `(a ,@tail)
(a b c d)
arc> (cons 'a tail)
(a b c d)
arc> (= tail 'b)
b
arc> `(a ,@tail)
(a . b)
arc> (cons 'a tail)
(a . b)
I also consider it an added bonus: dotted lists exist, so it seems presumptuous to think that macros might not stand to gain from using them, splicing into them, whatever. (Just as it seems presumptuous to think that ,,@(...) should be an error.)
Case B is incorrect, so far as I can tell. Indeed, the clisp code was very careful about this -- making sure not to pass the cdr of a dotted list as the first argument to 'append, for instance.
Case C is correct, I think, but would run against case A, since using 'join when the splice is not a list will result in an error, as in the examples I've belabored.
As an example, the above definition implies that
`((,a b) ,c ,@d)
will be interpreted as if it were
(append (list (append (list a) (list 'b) 'nil)) (list c) d 'nil)
but it could also be legitimately interpreted to mean any of the following.
(append (list (append (list a) (list 'b))) (list c) d)
(append (list (append (list a) '(b))) (list c) d)
(append (list (cons a '(b))) (list c) d)
(list* (cons a '(b)) c d)
(list* (cons a (list 'b)) c d)
(list* (cons a '(b)) c (copy-list d))
(There is no good reason why copy-list should be performed, but it is not prohibited.)
I just happen to like the idea, is all. And, admittedly, for wholly impractical reasons!
If I always only use ",@" with lists, does the current code (with optimization turned on or off) ever produce the wrong result if 'join is used instead of 'append? (It's OK if the answer is you wouldn't know without looking into the question further).
Sorry if I come off like that. I don't mean to seem like I'm trying to answer without knowing anything ("going from the gut" or whatever).
For what it's worth, I strongly think that, if unquote-splicing is used strictly with lists, 'join will do the same job as 'append.
Of course, I wouldn't know without looking into the question further. ;)
So, have I finally managed to address what you asked?
Oh, were you hoping that the macro would be statically scoped?
Not even that. I just wanted it to be done in a single sexp on the last unit test (since it's 1 sexp per test), as I note in the source:
"local" macros don't work in arc:
arc> (let a 12
       (let b nil
         (mac b ()
           (let c 19
             ``(,a ,@',@(list c))))
         (b)))
Error: "Function call on inappropriate object #3(tagged mac #<procedure: b>) ()"
Even using a 'do block doesn't work (for some reason):
arc> (do
       (mac m ()
         (let c 19
           ``(,a ,@',@(list c))))
       (let a 12
         (m)))
Error: "Function call on inappropriate object #3(tagged mac #<procedure: m>) ()"
arc> (do
       (mac m ()
         (let c 19
           ``(,a ,@',@(list c))))
       (let a 12
         (m)))
*** redefining m
(12 . 19)
Unsure whether this is an Arc bug, the problem was mitigated by using a (mac ...) outside of the 'do block.
So, have I finally managed to address what you asked?
Yup!
OK, here's what I think.
One principle of Arc is to find the minimum combination of code that implements the language. So we shouldn't end up with both a 'join and a 'append; either we decide we want 'join to be able to produce a dotted list, or else we should just use the old 'join.
One way of getting succinct code is not to implement features that no one uses. So if the only reason for using 'append (or extending 'join to be able to produce a dotted list) was to pedantically pass some Common Lisp unit tests, then I'd be inclined to go with case (C) and just use Arc's current 'join.
However, another principle of Arc is not to forbid things to programmers unless there's no way to provide them with logical consistency. I can't think of any reason to tell you not to use
arc> `(a ,@'b)
(a . b)
if you want to... so why should I advocate for restricting splicing to lists?
It's shorter than your append2/append, at the cost of not providing an explicit error message if a non-list is passed in some argument not the last.
I just wanted it to be done in a single sexp... Unsure whether this is an Arc bug
In Arc, the entire expression is macro expanded and compiled, and then evaluated. So in
arc> (do (mac a () 33)
         (a))
Error: "Function call on inappropriate object #3(tagged mac #<procedure: a>) ()"
what's happening is that the 'mac form, when evaluated, will create a new global macro 'a. However, when the 'do form is being macro expanded / compiled, the 'a macro hasn't been defined yet. So "(a)" compiles into a function call on 'a. Then, when the 'do form is evaluated, 'a is set to be a macro, and then 'a is function called. Which produces an error, since a macro object isn't callable.
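Two ways around this, as a sketch. The first is what the transcripts above already do implicitly; the second is only my assumption about what would satisfy a one-sexp-per-test constraint, relying on 'eval compiling its argument at run time.

```arc
; Option 1: separate top-level forms. The (mac ...) form has already been
; evaluated by the time (a) is macro expanded, so the call expands to 33.
(mac a () 33)
(a)

; Option 2: a single sexp. The quoted call is not compiled together with the
; enclosing 'do; 'eval compiles it at run time, after 'b has become a macro.
(do (mac b () 33)
    (eval '(b)))
```

Neither option makes the macro local: in both cases 'mac still defines a global macro.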
I've called length on a dotted list in ppr.arc, and to do that I needed to redefine 'len to support it. I can't really think of an application outside of messing with argument lists.
(We could also look around to see if there's other working implementations that we could use... plt scheme's R6RS implementation of quasiquote or Common Lisp's implementation might be worth checking out).
To run your example, I'll load my port of Bawden's implementation:
$ mzscheme -m -f as.scm
Use (quit) to quit, (tl) to return here after an interrupt.
arc> (load "bawden-qq0.arc")
nil
Note how I'm bootstrapping here: I'm using Arc's default implementation of quasiquote based on MzScheme to load Arc and Bawden's code, and then I can switch over to using Bawden's code to implement quasiquote.
Next I can use the patch to disable Arc's expansion of quasiquote:
arc> (declare 'quasiquotation nil)
#<void>
...though the existing 'declare assertions use a true value to change Arc's behavior, so maybe this should be named 'disable-quasiquote-expansion or something like that instead.
Now that I'm not being preempted by the Arc compiler, I can implement quasiquote with a macro, such as with Bawden's implementation:
arc> (mac quasiquote (x)
       (qq-expand x))
#3(tagged mac #<procedure: quasiquote>)
and your example runs like this:
arc> (= xs '(x1 x2) ys '(y1 y2) x1 'a x2 'b y1 'c y2 'd)
d
arc> (eval ``(list ,,@(map (fn (x y) ``(,,x ,,y)) xs ys)))
(list (a c) (b d))
If the user has defined a 'quasiquote macro, the patch lets that macro take precedence over Arc's internal qq implementation, much as any macro in arc.arc can be redefined by the user.
The example now looks like:
$ mzscheme -m -f as.scm
Use (quit) to quit, (tl) to return here after an interrupt.
arc> (load "bawden-qq0.arc")
nil
arc> (mac quasiquote (x)
       (qq-expand x))
#3(tagged mac #<procedure: quasiquote>)
arc> (= xs '(x1 x2) ys '(y1 y2) x1 'a x2 'b y1 'c y2 'd)
d
arc> (eval ``(list ,,@(map (fn (x y) ``(,,x ,,y)) xs ys)))
(list (a c) (b d))
here's a nicer version of the patch that doesn't need the 'declare to be used
Cool, thanks.
We could also look around to see if there's other working implementations that we could use... plt scheme's R6RS implementation of quasiquote or Common Lisp's implementation might be worth checking out
I also tried looking at PLT's r6rs quasiquote (specifically collects/r6rs/private/qq-gen.ss), but I found it a bit daunting because I'm unfamiliar with syntax-case macros. It's probably better for quasiquote to be written in Scheme, though, so perhaps the code could easily be transplanted into ac.scm. I'll just leave it to someone more Scheme-savvy.
I was thinking in terms of bootstrapping. The quasiquote library I wrote uses things defined in arc.arc, which in turn are written for quasiquote, but using the one on the Scheme side.
It's entirely possible to rewrite either to fit it all together. For instance, someone could rewrite the quasiquote library in terms of basic operators -- 'assign, 'cons, 'fn (instead of 'let), and the like. Probably more bearable would be rewriting arc.arc to use 'list and 'cons and so forth to build expressions instead of using quasiquote, up until we have enough to define quasiquotation (which doesn't require much); then load qq.arc and continue defining arc.arc with the newfound utility.
Neither of these options sounds pleasant if, as an alternative, we could just have an easily-added Scheme-side quasiquote -- e.g., if mzscheme already implemented it correctly, we'd be done. Granted, there are advantages to having a user-defined quasiquote, not just in the interest of keeping Arc axiomatic but also in giving programmers the option to modify it.
Efficiency was another concern of mine, in that doing quasiquotation in Scheme might be more efficient than cranking 'ac across all of the Arc functions back-and-forth. But this is likely unfounded and more a symptom of my worrying than of actual testing -- the difference could be negligible or even nonexistent. Arc quasiquotes (as I implemented) still compile down to simple Scheme code (to the results of the macroexpansions), so it's probably silly to worry. Just a fleeting concern.
Well, on efficiency, the quasiquote expansion happens at read / macro expansion / compile time, am I right? So writing the qq expander in Arc vs. Scheme isn't going to affect the speed of the running program, yes?
As for bootstrapping, MzScheme's qq implementation works as long as nested quasiquotation isn't used, and today arc.arc doesn't use nested quasiquotation. So I don't think we need to do anything beyond what we're already doing: load arc.arc, then load a qq implementation that handles nested qq, and then we're free to write macro utilities that use nested qq.
Arc vs. Scheme isn't going to affect the speed of the running program, yes?
Yes, so far as I can see. I was just being paranoid.
As for bootstrapping, it's still cleaner to have a single implementation, rather than juggling them. Given we're currently only juggling 2 implementations, the problem's not intractable, just "improper".
Interestingly, I'm finding that I don't think we need a version control system for distributing and sharing hacks.
Which doesn't mean "don't use a version control system". Instead, I think it means "use whatever version control system works best for you". Because if we can share hacks without a version control system, then you can freely choose which one you want to use, if one would be useful for your work. Your using darcs doesn't make it harder for me to use git.
How is this possible? So far this is largely speculation on my part. I've been thinking about how to share hacks and have some ideas, but not much implemented yet.
The first step for me was my "minimum distance from Arc" approach to hacks.
I started using this approach because I had hacks that I found useful and important (and I thought other people might find them useful also), but that I thought didn't need to (and in some cases I thought shouldn't) be incorporated into Arc. Some were even single-project hacks: a hack that I wanted to use in a specific program but not one that I wanted in my other programs.
So being able to apply each hack independently became important to me. As a side effect, I found that it made merges really easy.
I find that getting a hack into a "minimum distance from Arc" configuration takes a little work. I don't program day to day with the original, pristine arc3; I have an arc3 with my usual ten or twenty hacks applied. Then I have something that I've written as part of my application that I think might be useful to other people, so I extract it as a patch or a library. Then I need to take the original arc3, apply the hacks that I think are the minimum prerequisites, create my patch or library file, and then test everything to make sure that I haven't forgotten anything. The testing process is too manual and tedious for me right now, so I'm working on an "example runner" to make that part go faster.
Once I've done the work to get a hack into its "minimum distance from Arc" configuration, I've found that "rebasing" a hack onto a new version of Arc is easy. For a patch, I apply the old patch to the new Arc, fix problems if any, and create a new patch against the new Arc. Which I can easily do with just "patch" and "diff", no version control system needed.
That this is possible at all is due to Arc's design finding the minimum set of code to implement given functionality. Because of this, my hacks to Arc often end up being only one or two changes to Arc, which in turn means that hacks can be applied independently. That they can be applied independently means that I don't need to be merging a bunch of pg's changes with a bunch of my changes.
As a next step, going beyond the "minimum distance from Arc" approach, I'm now looking at ways to implement my hacks using minimal changes to Arc. I now look at one of my hacks, and I think, "OK, if Arc were more hackable, then I wouldn't need to be patching Arc to do this. I could be redefining some function or calling some hook to implement my change. So what can I do to make Arc more hackable, which I could then use to implement my hack as a library?" So now I have some smaller change to Arc which enables my hack.
As an example of the "minimal change to Arc" approach, if I were implementing a documentation system for Arc, I personally wouldn't put the doc strings inside the source code file. If, for example, the implementation of a function in arc.arc changed, then I might or might not need to also change the documentation (depending on whether the change made a difference in what the function did). If the documentation doesn't need to change, then I don't need to be merging my documentation patches with pg's changes in the same file.
Finding the minimal change to Arc has the additional advantage of enabling greater flexibility. If the documentation were in a separate file, then someone who wanted to translate the documentation into a different language (I mean people languages like Russian, French, Mandarin etc.) could do so easily, and I could choose which language I wanted by pointing the documentation system at that file. And, if a new arc.arc came out, I could use the new Arc while still reading the old documentation, while I was waiting for the nice translator to translate the new documentation into my language.
I still use version control for my own application since I'm spending most of my time writing functionality, not working on making the code as minimal as possible. And, if I were working on a library together with other people, I'd want a version control system to help with that. However, once the library is ready to be published, I'm imagining that it won't need to be published using a version control system, instead it can be source code and patch files, with some meta-information about dependencies.
If I turn out to be right, the upshot of all of this is that we don't all need to choose the same version control system. You can use git, or darcs, or whichever one works best for you. There should be enough meta-information published with hacks so that if you want to use a hack written by someone else, a simple script will be able to import it into darcs or git making the release history appear as commits. Then you can use the tools of your favorite version control system to track changes and, if you're hacking the hack, merge or rebase your changes on top of a newly released version of the hack.
The things you describe - using patch and diff to update hacks for new releases, applying and merging patches, etc, are essentially what a version control system like git does under the hood. So instead of letting a VCS handle this for you, you're doing it manually. And with all due respect, I suspect this works well only with small, few, and simply dependent patches.
Not that making patches small isn't a good thing for other reasons, but I don't like having arbitrary limits on the complexity of my hacks to arc. The hygiene branch on anarki, for example, introduces massive changes to the underlying arc, yet it's a worthwhile hack and I'm thinking of porting it upstream to arc3. The coerce patch, while not nearly as large, involves many deletes and hence changes many line numbers in arc: if I merely represented hacks as diffs, then combining that hack with others would be a pain in the ass due to this changing of line numbers - a simple diff/patch scheme doesn't maintain the history necessary to do this automatically, so I'm stuck fixing the patches on my own. But this is exactly one of the tasks VCSes are built to handle.
Similarly, I'm trying to port many of anarki's changes to arc2 upstream to arc3, and I've already generated 14 distinct and often dependent hacks. I can only see more being generated as the process goes on. Updating all of these hacks for the new feature additions manually would have taken me an hour, maybe several. With git it took me about half an hour (well, more thanks to a stupid mistake I made). If I write the script I have in mind for handling this (which merely drives git) I should be able to do it in ten minutes. That's about the time it should take, IMO.
Keeping changes minimal is good, but it's no panacea. Issues of scale will come up, as they have in other projects; VCSes were created for a reason. If someone were to write, as you suggest, a set of scripts for manipulating hacks - applying, merging, rebasing - with metadata about dependencies, they'd essentially have built a simple VCS!
(Darcs, in particular, is built around the notion of commuting patches - of extracting a given patch or set of changes from those surrounding it in a development history - which allows for precisely the kind of "independent" application of hacks that you desire.)
If someone were to write, as you suggest, a set of scripts for manipulating hacks - applying, merging, rebasing - with metadata about dependencies, they'd essentially have built a simple VCS!
Maybe. git wasn't up to handling the dependency part.
I suspect this works for you only because you have small, few, and simply dependent patches.
Yup. That's true. I'm imagining, with sufficient effort, large patches can be made small by making Arc more hackable. But unless and until that work is done for a particular patch, if it is possible at all, then a version control system will be necessary for the reasons you describe.
I did write a script to rebase my patches on top of the succession of arc3 releases ^_^ Despite all the various features of git, I needed to write the script to keep the rebase work from being unbearably tedious. What I found interesting was that once I had the script, I didn't need git for anything.
I'm imagining that as I publish my hacks, and if there are some that you'd like to use, that I'll be able to have a script automatically push them to darcs or git (whatever VCS you want to use), showing the succession of releases of a hack as commits in a branch, and then you'll be able to use the normal VCS mechanisms to merge and keep track of changes. If this turns out not to be true, then let me know and I'll see what I can figure out...
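A rough sketch of what such an import script might look like for git, written in Python and driving the ordinary git command line. It assumes each release of a hack is available as a directory of files; the function names, the commit message format, and the placeholder committer identity are all made up for illustration, and this is not the script mentioned above.

```python
import shutil
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        # Placeholder identity so commits work without any global git config.
        ["git", "-c", "user.name=hack-import",
         "-c", "user.email=hack-import@example.invalid", *args],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def import_releases(repo: Path, releases: list[tuple[str, Path]]) -> None:
    """Commit each (label, directory) release, oldest first, on the current branch."""
    repo.mkdir(parents=True, exist_ok=True)
    if not (repo / ".git").exists():
        git(repo, "init", "--quiet")
    for label, source in releases:
        # Empty the work tree (keeping .git) so files dropped in a release vanish.
        for entry in repo.iterdir():
            if entry.name == ".git":
                continue
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
        shutil.copytree(source, repo, dirs_exist_ok=True)
        git(repo, "add", "--all")
        git(repo, "commit", "--quiet", "--allow-empty", "-m", f"release {label}")
```

Once the releases are commits, the usual git tools (log, diff, merge, rebase) apply to them like any other history. A darcs version would need a different driver.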
I think that both CatDancer and rntz have good points, the problem is that they are working on different things and thus have different perspectives.
CatDancer is mostly working on libraries - independent pieces of code that, while they may redefine some things or make minor changes to the base language, mostly leave it alone. These can be written easily in the "shortest distance from arc" method and still work well together because they don't actually change the base of arc. This also means that while VC is useful in development, it is hardly needed in publishing.
rntz has been working on porting changes from arc2 to arc3 and on some other hacks, many of which require major changes to the arc code base itself, such as his coerce hack. These need to be VC'd so that they can be understood by other users of Anarki, and so that they can work more easily with each other. Since many of them make major changes to the codebase, it can sometimes be a challenge to get them to work together and, as he says, plain diffs would be a nightmare.
I may be summarizing a bit much, and I'm sure there are more subtleties to it than that, but it's how I understand the situation.
As far as I can tell, they can be handled separately. The publishing of libraries and miscellaneous code can be done without publicly visible version control, but the Anarki base should probably be versioned.
Git is still a good choice for what we're doing, as far as I can tell, because for one thing it is just a set of scripts built for managing its base. That is the essence of the so-called porcelain. I'm sure it won't hurt anyone to add a little bit more in the way of domain-specific tools. They might be useful to others as well.
So it sounds like we have two things going on here:
1) We need a meta-data and library publishing system for people to share code that they've written to use with arc.
2) We need a slightly better method of handling changes to the arc base so that people can coordinate massive changes thereto. I think that individual commits work OK for minor changes, but it rapidly gets complicated to handle if they start stepping on each other. That's when we should switch to "branches", which can be interpreted to mean alternate conceptions of arc. Case in point being the hygienic version of arc in the hygiene branch.
At some point we just admit that the lisp community is hopelessly built on making their own incompatible versions of lisp, and everyone makes their own fork of arc. Magic merges and rebasing can only handle so much incompatibility in the history of the system.
I wouldn't dismiss the possibility of using darcs for your project. git was designed to merge tens of thousands of lines of code in a fraction of a second; that doesn't mean it isn't useful for other tasks, but it also doesn't mean that darcs might not be better for the kinds of tasks that you find yourselves needing to do.
Git was also designed to have a fast, simple back end, with a script based front end. Each command that you're used to using with git is just a shell or perl script that calls a few basic git functions to mess with blobs. Therefore, it is easy to add new tools and scripts. A good example given the darcs context would be: http://raphael.slinckx.net/blog/2007-11-03/git-commit-darcs-...
My point is that the architecture of git allows us to write scripts that work at the same level as the other git scripts; we can keep or leave the others as desired.
I find keeping a copy of each of the arc3 releases useful because then I can diff a new release against the previous release and easily see what's changed. So I imagine this would be a useful service to the community to make these older releases available so other people could do the same thing.
But this isn't work that Paul needs to do. It's easy to write a script that periodically downloads the arc3.tar and, if it has changed, increments a release number and publishes the new release somewhere people can get to it.
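A minimal sketch of such a script in Python, under a few assumptions: the tarball URL is passed in by the caller, "changed" means the SHA-256 of the download differs from the last one recorded, and "publishing" is reduced to writing a numbered copy into a local directory. The file names (state.json, arc3-rN.tar) are made up for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional
from urllib.request import urlopen


def record_release(tarball: bytes, archive_dir: Path) -> Optional[int]:
    """Save `tarball` as a new numbered release if its contents changed.

    Returns the new release number, or None when nothing changed.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    state_file = archive_dir / "state.json"  # hypothetical bookkeeping file
    state = {"release": 0, "sha256": None}
    if state_file.exists():
        state = json.loads(state_file.read_text())

    digest = hashlib.sha256(tarball).hexdigest()
    if digest == state["sha256"]:
        return None  # same bytes as the last recorded release

    release = state["release"] + 1
    (archive_dir / f"arc3-r{release}.tar").write_bytes(tarball)
    state_file.write_text(json.dumps({"release": release, "sha256": digest}))
    return release


def check_for_release(url: str, archive_dir: Path) -> Optional[int]:
    """Download the tarball at `url` and record it if it is new."""
    with urlopen(url) as response:
        return record_release(response.read(), archive_dir)
```

Running check_for_release from cron (or any scheduler) would give the "periodically" part. Diffing two releases is then a matter of unpacking arc3-rN.tar and arc3-rN+1.tar side by side.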