1. Using a hash table for cycle detection can get prohibitive for large cycles. Fortunately there's Floyd's algorithm which detects cycles without the space overhead: https://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_h.... Here's how you might print an infinite list, one item to a line:

-- z.wart
def (print x hare)
if ~list?.x
prn x
(addr.x = addr.hare) # like arc's is
prn '...
(do (prn car.x)
(print cdr.x (cdr+cdr (or hare x)))) # + is compose
# create a cycle
(<- x '(1 2)
lastcdr.x x)
prn (firstn 5 x)
print x

To run it:

$ git clone http://github.com/akkartik/wart
$ cd wart
$ git checkout f171a9d878 # Unnecessary except for visitors in the far future
$ ./wart
ready! type in an expression, then hit enter twice. ctrl-d exits.
load "z.wart"
(1 2 1 2 1)
1
2
...

2. You mentioned a couple of possible notations for cycles in lists. Here's my suggestion:

(1 2 ...) # means (1 2 1 2 ..)

This works especially well because wart uses ... for dotted lists:

(1 ... 2) # read as '1 consed with 2'
(1 2 ...) # read as '1 2 consed with itself'

Finally, you could represent more complex cycles as well:

Both of the things you're talking about look specialized for cycles in flat lists. I think dido's examples could have just as easily used 'scar instead of 'scdr.

arc> (= x '(1 2))
arc> (scar x x)
((...) . 2)
#0=(#0# . 2)

I do like the (1 2 ... (3 4 ...)) notation for cyclic lists though. :)

The Kernel R-1RK has a good approach to treating flat cyclic lists as though they're a usual case. IMO, it means everything that deals with lists is more complicated to explain and arguably doesn't even deal with "lists."

Well, now that I consider it the real reason why I missed these cycle detection algorithms is that I've never thought of conses as being just lists. Sure, they're general enough to represent lists, and binary trees, but with the presence of scar and scdr, they are also capable of representing directed graphs with vertices of at most degree 2. In general, any Arc variable can be considered as a reference to a general directed graph (hash tables in particular can be considered vertices of arbitrary degree). Thus, pretty-printing an Arc variable boils down to traversing the graph it represents, and the traversal order that seems like it makes best sense here is depth-first search.

To do the #0=(#0# . 2) notation though seems like the traversal needs to be done twice, first to get the numeric assignments, and next to actually print them. Not sure if it can be done in one pass without making a tree of prettyprint fragments. A marshalling algorithm will need to do it that way though, I should think.

You got me rereading R-1RK, and I was reminded again of how much I like the way Shutt connects up his design decisions to design goals and constraints. It's very in the spirit of Christopher Alexander (http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperback...).

In spite of the level of detail, kernel's design goals diverge so quickly from my own that I can't reuse any of the design work. It's really too bad.

Today's frustrating example: I really couldn't give a rat's ass that Kernel permits cyclical argument lists in operatives but not applicatives. Then in A.4 he talks about what's missing before R0RK, and the support for cycles feels pedantic next to the problem of a fast implementation. If he wasn't so concerned about handling cycles during eval perhaps the code wouldn't be slow. How the heck can a language be slower than wart?!

I'd assumed I could apply Floyd's hare to handle cars as well, but you're right, now I'm not so sure. The key difficulty is that there's two dimensions, so does the number of hares double each iteration?

Here's a flat-cycle-detecting iso I've been playing with:

def (iso a b)
((afn ((a harea|through) (b hareb|through))
(if ~list?.a
(a = b)
(addr.a = addr.harea)
(addr.b = addr.hareb)
:else
(and (iso car.a car.b) # Can't detect cycles through car.
(self `(,cdr.a :through ,cdr+cdr.harea)
`(,cdr.b :through ,cdr+cdr.hareb)))))
(list a :through cdr.a)
(list b :through cdr.b))

"The key difficulty is that there's two dimensions, so does the number of hares double each iteration?"

Hmm. As the tortoise goes down branches, its location becomes indeterminate. As the hare goes down branches, its location becomes indeterminate relative to the tortoise, so we might need multiple hare markers per branch if we're doing a tortoise-directed search. If we try using the hare to direct the search instead, then we may still need to keep a growing history of hare locations on each branch so the tortoise can follow.

Either way, keeping a hash table sounds more straightforward to me.

Are you using this algorithm properly? I haven't seen any of your example code do a second pass to find out where the cycle begins and what its period is. But maybe you don't need to...?

I'm just taking a constant-factor hit :) A few extra iterations doesn't change the result for either print or iso. And for cycles that loop back all the way to the start it turns out there's zero overhead.

x <- '(1 2 3)
lastcdr.x <- x # 1 2 3 1 2 3 1 ...
print x
1
2
3
... # exactly one iteration

Here's a slightly different bug I found before your update. It's another case where I don't know what behavior you expect, but it probably isn't this. :-p

(do (a <- '(1 2 3)) (lastcdr.a <- cdr.a) nil)
=> nil
(do (b <- '(1 2 3 2 3)) (lastcdr.b <- cdr.b) nil)
=> nil
(not+not+iso a b)
=> nil
(not+not+iso b a)
=> 1

(I'm using not+not+iso so it returns a predictable value rather than an address.)

Interesting. I'd been looking around for cycle detection algorithms and knew I might have forgotten a simpler algorithm. I think that would do very nicely for the pretty printer and most likely the marshalling algorithm as well. We'll see if I can find a way to adapt it to work for iso as well, though on the face of it, it doesn't look that straightforward.

As it happens I provided an implementation of iso at http://arclanguage.org/item?id=17365 :) Let me know if my toy language isn't clear. The basic idea is that you need two hare pointers, one for each arg.