Arc Forum | Hi dido, I have 2 comments:1. Using a hash table for cycle detection can get pro...

Arc Forum

2 points by akkartik 4516 days ago | link | parent

Hi dido, I have 2 comments:

1. Using a hash table for cycle detection can get prohibitive for large cycles. Fortunately there's Floyd's algorithm which detects cycles without the space overhead: https://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_h.... Here's how you might print an infinite list, one item to a line:

  -- z.wart
  def (print x hare)
    if ~list?.x
         prn x
       (addr.x = addr.hare)  # like arc's is
         prn '...
       (do (prn car.x)
           (print cdr.x (cdr+cdr (or hare x))))  # + is compose

  # create a cycle
  (<- x  '(1 2)
      lastcdr.x  x)

  prn (firstn 5 x)
  print x

To run it:

  $ git clone http://github.com/akkartik/wart
  $ cd wart
  $ git checkout f171a9d878    # Unnecessary except for visitors in the far future
  $ ./wart
  ready! type in an expression, then hit enter twice. ctrl-d exits.
  load "z.wart"
  (1 2 1 2 1)
  1
  2
  ...

2. You mentioned a couple of possible notations for cycles in lists. Here's my suggestion:

  (1 2 ...)    # means (1 2 1 2 ..)

This works especially well because wart uses ... for dotted lists:

  (1 ... 2)   # read as '1 consed with 2'
  (1 2 ...)   # read as '1 2 consed with itself'

Finally, you could represent more complex cycles as well:

  (1 2 ... (3 4 ...))    # (1 2 3 4 3 4 3 ..)

2 points by rocketnia 4515 days ago | link

Both of the things you're talking about look specialized for cycles in flat lists. I think dido's examples could have just as easily used 'scar instead of 'scdr.

  arc> (= x '(1 2))
  arc> (scar x x)
  ((...) . 2)
  #0=(#0# . 2)

I do like the (1 2 ... (3 4 ...)) notation for cyclic lists though. :)

The Kernel R-1RK has a good approach to treating flat cyclic lists as though they're a usual case. IMO, it means everything that deals with lists is more complicated to explain and arguably doesn't even deal with "lists."

-----

2 points by dido 4513 days ago | link

Well, now that I consider it the real reason why I missed these cycle detection algorithms is that I've never thought of conses as being just lists. Sure, they're general enough to represent lists, and binary trees, but with the presence of scar and scdr, they are also capable of representing directed graphs with vertices of at most degree 2. In general, any Arc variable can be considered as a reference to a general directed graph (hash tables in particular can be considered vertices of arbitrary degree). Thus, pretty-printing an Arc variable boils down to traversing the graph it represents, and the traversal order that seems like it makes best sense here is depth-first search.

To do the #0=(#0# . 2) notation though seems like the traversal needs to be done twice, first to get the numeric assignments, and next to actually print them. Not sure if it can be done in one pass without making a tree of prettyprint fragments. A marshalling algorithm will need to do it that way though, I should think.

-----

2 points by akkartik 4515 days ago | link

You got me rereading R-1RK, and I was reminded again of how much I like the way Shutt connects up his design decisions to design goals and constraints. It's very in the spirit of Christopher Alexander (http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperback...).

In spite of the level of detail, kernel's design goals diverge so quickly from my own that I can't reuse any of the design work. It's really too bad.

Today's frustrating example: I really couldn't give a rat's ass that Kernel permits cyclical argument lists in operatives but not applicatives. Then in A.4 he talks about what's missing before R0RK, and the support for cycles feels pedantic next to the problem of a fast implementation. If he wasn't so concerned about handling cycles during eval perhaps the code wouldn't be slow. How the heck can a language be slower than wart?!

-----

1 point by akkartik 4515 days ago | link

Yes, the print example was just an illustration. It was also a good test for wart; I caught several bugs :) https://github.com/akkartik/wart/compare/686828edba...0e2388.... The tests are pretty self-contained.

I'd assumed I could apply Floyd's hare to handle cars as well, but you're right, now I'm not so sure. The key difficulty is that there's two dimensions, so does the number of hares double each iteration?

Here's a flat-cycle-detecting iso I've been playing with:

  def (iso a b)
    ((afn ((a harea|through) (b hareb|through))
      (if ~list?.a
            (a = b)
          (addr.a = addr.harea)
            (addr.b = addr.hareb)
          :else
            (and (iso car.a car.b)   # Can't detect cycles through car.
                 (self `(,cdr.a :through ,cdr+cdr.harea)
                       `(,cdr.b :through ,cdr+cdr.hareb)))))
      (list a :through cdr.a)
      (list b :through cdr.b))

-----

2 points by rocketnia 4512 days ago | link

"The key difficulty is that there's two dimensions, so does the number of hares double each iteration?"

Hmm. As the tortoise goes down branches, its location becomes indeterminate. As the hare goes down branches, its location becomes indeterminate relative to the tortoise, so we might need multiple hare markers per branch if we're doing a tortoise-directed search. If we try using the hare to direct the search instead, then we may still need to keep a growing history of hare locations on each branch so the tortoise can follow.

Either way, keeping a hash table sounds more straightforward to me.

Are you using this algorithm properly? I haven't seen any of your example code do a second pass to find out where the cycle begins and what its period is. But maybe you don't need to...?

-----

1 point by akkartik 4512 days ago | link

I'm just taking a constant-factor hit :) A few extra iterations doesn't change the result for either print or iso. And for cycles that loop back all the way to the start it turns out there's zero overhead.

  x <- '(1 2 3)
  lastcdr.x <- x  # 1 2 3 1 2 3 1 ...
  print x
  1
  2
  3
  ...  # exactly one iteration

But:

  x <- '(1 2 3)
  lastcdr.x <- cdr.x   # 1 2 3 2 3 2 3 ...
  print x
  1
  2
  ...

-----

2 points by rocketnia 4512 days ago | link

  You posted:
  
  x <- '(1 2 3)
  lastcdr.x <- cdr.x   # 1 2 3 2 3 2 3 ...
  print x
  1
  2
  ...

Is that the behavior you expect? I would expect "1 2 3 ...", but it looks like the rabbit and the hare meet at the (3 ...) cons and stop.

  1 2 3 2 3 2 3
  *              Print "1".
    * *          Print "2".
      *   *      Print "...".

-----

1 point by akkartik 4512 days ago | link

Yeah I tried it out before posting. I had to take pains in iso to make sure we do one full traversal. With print I didn't care as much.

Update: Ack, I found a bug in iso (http://arclanguage.org/item?id=17365):

  x <- '(1 2 3)
  y <- '(1 2 4)
  (do1 nil (<- lastcdr.x x lastcdr.y y))  # Avoid printing the cycles
  (iso x y)   # not nil!

-----

2 points by rocketnia 4512 days ago | link

Here's a slightly different bug I found before your update. It's another case where I don't know what behavior you expect, but it probably isn't this. :-p

  (do (a <- '(1 2 3)) (lastcdr.a <- cdr.a) nil)
  => nil
  (do (b <- '(1 2 3 2 3)) (lastcdr.b <- cdr.b) nil)
  => nil
  (not+not+iso a b)
  => nil
  (not+not+iso b a)
  => 1

(I'm using not+not+iso so it returns a predictable value rather than an address.)

-----

1 point by rocketnia 4512 days ago | link

"the rabbit and the hare"

Whoops. I didn't mean that. XD

-----

2 points by dido 4513 days ago | link

Interesting. I'd been looking around for cycle detection algorithms and knew I might have forgotten a simpler algorithm. I think that would do very nicely for the pretty printer and most likely the marshalling algorithm as well. We'll see if I can find a way to adapt it to work for iso as well, though on the face of it, it doesn't look that straightforward.

-----

1 point by Pauan 4513 days ago | link

Wikipedia has more algorithms for cycle detection:

https://en.wikipedia.org/wiki/Cycle_detection#Brent.27s_algo...

-----

1 point by akkartik 4513 days ago | link

As it happens I provided an implementation of iso at http://arclanguage.org/item?id=17365 :) Let me know if my toy language isn't clear. The basic idea is that you need two hare pointers, one for each arg.

-----