My solution to this (while breaking compatibility with Arc) is to just not have mutable cons cells. I only see two legitimate uses for mutable cons cells:
1) Speed. But honestly, if you're micro-optimizing your Arc code by mutating conses...
2) Intentionally infinite lists, like akkartik's "nils":
(= nils (list nil))
(= cdr.nils nils)
I'm not concerned about the first case, especially because a sufficiently smart compiler should be able to handle immutable cons cells reasonably (I believe Racket does).
As for the second case... I think that's solved by adding in a more general system for dynamically generating lazy lists. Then akkartik's "nils" can be written like this:
Basically, lazy-cons takes two thunks: the first is called to extract the car, and the second to extract the cdr. This lets you dynamically create all kinds of nice things, like streams, and it isn't limited to constant values like cyclic lists are.
Thus, you get all the benefits of immutable conses and a strictly more powerful system for lazy list generation. The only possible downside is some performance, which I don't think would be a big deal.
Of course, I already know you're going for full Arc compatibility, so this post is probably useless, but oh well.
Removing mutating cons cells would only solve part of the problem. One would also need to essentially eliminate hash tables, and practically all mutating state. And no, I never understood how monads worked. XD
And yes, the ultimate goal I have for Arcueid is to make a fully compatible Arc implementation, so breaking this sort of compatibility is out of the question as far as this project is concerned, but it is interesting to consider.
It would, for one, make real-time garbage collection a lot simpler...
"It's a good question whether you ever need cycles outside of these two scenarios."
I suppose there might be some algorithms that work better with mutable conses, but honestly, I think conses work best when they're functional, since they have a recursive definition. Making them mutable muddles a lot of different things with no real gain because you rarely mutate conses in Arc.
1. Using a hash table for cycle detection can get prohibitive for large cycles. Fortunately there's Floyd's algorithm which detects cycles without the space overhead: https://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_h.... Here's how you might print an infinite list, one item to a line:
-- z.wart
def (print x hare)
if ~list?.x
prn x
(addr.x = addr.hare) # like arc's is
prn '...
(do (prn car.x)
(print cdr.x (cdr+cdr (or hare x)))) # + is compose
# create a cycle
(<- x '(1 2)
lastcdr.x x)
prn (firstn 5 x)
print x
To run it:
$ git clone http://github.com/akkartik/wart
$ cd wart
$ git checkout f171a9d878 # Unnecessary except for visitors in the far future
$ ./wart
ready! type in an expression, then hit enter twice. ctrl-d exits.
load "z.wart"
(1 2 1 2 1)
1
2
...
2. You mentioned a couple of possible notations for cycles in lists. Here's my suggestion:
(1 2 ...) # means (1 2 1 2 ..)
This works especially well because wart uses ... for dotted lists:
(1 ... 2) # read as '1 consed with 2'
(1 2 ...) # read as '1 2 consed with itself'
Finally, you could represent more complex cycles as well:
Both of the things you're talking about look specialized for cycles in flat lists. I think dido's examples could have just as easily used 'scar instead of 'scdr.
arc> (= x '(1 2))
arc> (scar x x)
((...) . 2)
#0=(#0# . 2)
I do like the (1 2 ... (3 4 ...)) notation for cyclic lists though. :)
The Kernel R-1RK has a good approach to treating flat cyclic lists as though they're a usual case. IMO, it means everything that deals with lists is more complicated to explain and arguably doesn't even deal with "lists."
Well, now that I consider it the real reason why I missed these cycle detection algorithms is that I've never thought of conses as being just lists. Sure, they're general enough to represent lists, and binary trees, but with the presence of scar and scdr, they are also capable of representing directed graphs with vertices of at most degree 2. In general, any Arc variable can be considered as a reference to a general directed graph (hash tables in particular can be considered vertices of arbitrary degree). Thus, pretty-printing an Arc variable boils down to traversing the graph it represents, and the traversal order that seems like it makes best sense here is depth-first search.
To do the #0=(#0# . 2) notation though seems like the traversal needs to be done twice, first to get the numeric assignments, and next to actually print them. Not sure if it can be done in one pass without making a tree of prettyprint fragments. A marshalling algorithm will need to do it that way though, I should think.
You got me rereading R-1RK, and I was reminded again of how much I like the way Shutt connects up his design decisions to design goals and constraints. It's very in the spirit of Christopher Alexander (http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperback...).
In spite of the level of detail, kernel's design goals diverge so quickly from my own that I can't reuse any of the design work. It's really too bad.
Today's frustrating example: I really couldn't give a rat's ass that Kernel permits cyclical argument lists in operatives but not applicatives. Then in A.4 he talks about what's missing before R0RK, and the support for cycles feels pedantic next to the problem of a fast implementation. If he wasn't so concerned about handling cycles during eval perhaps the code wouldn't be slow. How the heck can a language be slower than wart?!
I'd assumed I could apply Floyd's hare to handle cars as well, but you're right, now I'm not so sure. The key difficulty is that there's two dimensions, so does the number of hares double each iteration?
Here's a flat-cycle-detecting iso I've been playing with:
def (iso a b)
((afn ((a harea|through) (b hareb|through))
(if ~list?.a
(a = b)
(addr.a = addr.harea)
(addr.b = addr.hareb)
:else
(and (iso car.a car.b) # Can't detect cycles through car.
(self `(,cdr.a :through ,cdr+cdr.harea)
`(,cdr.b :through ,cdr+cdr.hareb)))))
(list a :through cdr.a)
(list b :through cdr.b))
"The key difficulty is that there's two dimensions, so does the number of hares double each iteration?"
Hmm. As the tortoise goes down branches, its location becomes indeterminate. As the hare goes down branches, its location becomes indeterminate relative to the tortoise, so we might need multiple hare markers per branch if we're doing a tortoise-directed search. If we try using the hare to direct the search instead, then we may still need to keep a growing history of hare locations on each branch so the tortoise can follow.
Either way, keeping a hash table sounds more straightforward to me.
Are you using this algorithm properly? I haven't seen any of your example code do a second pass to find out where the cycle begins and what its period is. But maybe you don't need to...?
I'm just taking a constant-factor hit :) A few extra iterations doesn't change the result for either print or iso. And for cycles that loop back all the way to the start it turns out there's zero overhead.
x <- '(1 2 3)
lastcdr.x <- x # 1 2 3 1 2 3 1 ...
print x
1
2
3
... # exactly one iteration
Here's a slightly different bug I found before your update. It's another case where I don't know what behavior you expect, but it probably isn't this. :-p
(do (a <- '(1 2 3)) (lastcdr.a <- cdr.a) nil)
=> nil
(do (b <- '(1 2 3 2 3)) (lastcdr.b <- cdr.b) nil)
=> nil
(not+not+iso a b)
=> nil
(not+not+iso b a)
=> 1
(I'm using not+not+iso so it returns a predictable value rather than an address.)
Interesting. I'd been looking around for cycle detection algorithms and knew I might have forgotten a simpler algorithm. I think that would do very nicely for the pretty printer and most likely the marshalling algorithm as well. We'll see if I can find a way to adapt it to work for iso as well, though on the face of it, it doesn't look that straightforward.
As it happens I provided an implementation of iso at http://arclanguage.org/item?id=17365 :) Let me know if my toy language isn't clear. The basic idea is that you need two hare pointers, one for each arg.