Arc Forum | Cyclical data structures in arcueid

Arc Forum

	Cyclical data structures in arcueid (arcueid-arc.org)
	2 points by akkartik 4836 days ago \| 18 comments

2 points by Pauan 4835 days ago | link

My solution to this (while breaking compatibility with Arc) is to just not have mutable cons cells. I only see two legitimate uses for mutable cons cells:

1) Speed. But honestly, if you're micro-optimizing your Arc code by mutating conses...

2) Intentionally infinite lists, like akkartik's "nils":

  (= nils (list nil))
  (= cdr.nils nils)

I'm not concerned about the first case, especially because a sufficiently smart compiler should be able to handle immutable cons cells reasonably (I believe Racket does).

As for the second case... I think that's solved by adding in a more general system for dynamically generating lazy lists. Then akkartik's "nils" can be written like this:

  (= nils
    ((afn ()
       (lazy-cons (fn () nil)
                  (fn () (self))))))

Basically, lazy-cons takes two thunks: the first is called to extract the car, and the second to extract the cdr. This lets you dynamically create all kinds of nice things, like streams, and it isn't limited to constant values like cyclic lists are.

Thus, you get all the benefits of immutable conses and a strictly more powerful system for lazy list generation. The only possible downside is some performance, which I don't think would be a big deal.

Of course, I already know you're going for full Arc compatibility, so this post is probably useless, but oh well.

-----

2 points by dido 4833 days ago | link

Removing mutating cons cells would only solve part of the problem. One would also need to essentially eliminate hash tables, and practically all mutating state. And no, I never understood how monads worked. XD

And yes, the ultimate goal I have for Arcueid is to make a fully compatible Arc implementation, so breaking this sort of compatibility is out of the question as far as this project is concerned, but it is interesting to consider.

It would, for one, make real-time garbage collection a lot simpler...

-----

1 point by akkartik 4835 days ago | link

I'm flattered that you remember wart's nils :) And why would you imagine I care about compatibility?! Oh, you mean dido.

It's a good question whether you ever need cycles outside of these two scenarios. Hmm..

-----

2 points by Pauan 4834 days ago | link

"Oh, you mean dido."

Yes, my post was directed at dido.

---

"It's a good question whether you ever need cycles outside of these two scenarios."

I suppose there might be some algorithms that work better with mutable conses, but honestly, I think conses work best when they're functional, since they have a recursive definition. Making them mutable muddles a lot of different things with no real gain because you rarely mutate conses in Arc.

-----

2 points by akkartik 4836 days ago | link

Hi dido, I have 2 comments:

1. Using a hash table for cycle detection can get prohibitive for large cycles. Fortunately there's Floyd's algorithm which detects cycles without the space overhead: https://en.wikipedia.org/wiki/Cycle_detection#Tortoise_and_h.... Here's how you might print an infinite list, one item to a line:

  -- z.wart
  def (print x hare)
    if ~list?.x
         prn x
       (addr.x = addr.hare)  # like arc's is
         prn '...
       (do (prn car.x)
           (print cdr.x (cdr+cdr (or hare x))))  # + is compose

  # create a cycle
  (<- x  '(1 2)
      lastcdr.x  x)

  prn (firstn 5 x)
  print x

To run it:

  $ git clone http://github.com/akkartik/wart
  $ cd wart
  $ git checkout f171a9d878    # Unnecessary except for visitors in the far future
  $ ./wart
  ready! type in an expression, then hit enter twice. ctrl-d exits.
  load "z.wart"
  (1 2 1 2 1)
  1
  2
  ...

2. You mentioned a couple of possible notations for cycles in lists. Here's my suggestion:

  (1 2 ...)    # means (1 2 1 2 ..)

This works especially well because wart uses ... for dotted lists:

  (1 ... 2)   # read as '1 consed with 2'
  (1 2 ...)   # read as '1 2 consed with itself'

Finally, you could represent more complex cycles as well:

  (1 2 ... (3 4 ...))    # (1 2 3 4 3 4 3 ..)

-----

2 points by rocketnia 4836 days ago | link

Both of the things you're talking about look specialized for cycles in flat lists. I think dido's examples could have just as easily used 'scar instead of 'scdr.

  arc> (= x '(1 2))
  arc> (scar x x)
  ((...) . 2)
  #0=(#0# . 2)

I do like the (1 2 ... (3 4 ...)) notation for cyclic lists though. :)

The Kernel R-1RK has a good approach to treating flat cyclic lists as though they're a usual case. IMO, it means everything that deals with lists is more complicated to explain and arguably doesn't even deal with "lists."

-----

2 points by dido 4833 days ago | link

Well, now that I consider it the real reason why I missed these cycle detection algorithms is that I've never thought of conses as being just lists. Sure, they're general enough to represent lists, and binary trees, but with the presence of scar and scdr, they are also capable of representing directed graphs with vertices of at most degree 2. In general, any Arc variable can be considered as a reference to a general directed graph (hash tables in particular can be considered vertices of arbitrary degree). Thus, pretty-printing an Arc variable boils down to traversing the graph it represents, and the traversal order that seems like it makes best sense here is depth-first search.

To do the #0=(#0# . 2) notation though seems like the traversal needs to be done twice, first to get the numeric assignments, and next to actually print them. Not sure if it can be done in one pass without making a tree of prettyprint fragments. A marshalling algorithm will need to do it that way though, I should think.

-----

2 points by akkartik 4835 days ago | link

You got me rereading R-1RK, and I was reminded again of how much I like the way Shutt connects up his design decisions to design goals and constraints. It's very in the spirit of Christopher Alexander (http://www.amazon.com/Notes-Synthesis-Form-Harvard-Paperback...).

In spite of the level of detail, kernel's design goals diverge so quickly from my own that I can't reuse any of the design work. It's really too bad.

Today's frustrating example: I really couldn't give a rat's ass that Kernel permits cyclical argument lists in operatives but not applicatives. Then in A.4 he talks about what's missing before R0RK, and the support for cycles feels pedantic next to the problem of a fast implementation. If he wasn't so concerned about handling cycles during eval perhaps the code wouldn't be slow. How the heck can a language be slower than wart?!

-----

1 point by akkartik 4835 days ago | link

Yes, the print example was just an illustration. It was also a good test for wart; I caught several bugs :) https://github.com/akkartik/wart/compare/686828edba...0e2388.... The tests are pretty self-contained.

I'd assumed I could apply Floyd's hare to handle cars as well, but you're right, now I'm not so sure. The key difficulty is that there's two dimensions, so does the number of hares double each iteration?

Here's a flat-cycle-detecting iso I've been playing with:

  def (iso a b)
    ((afn ((a harea|through) (b hareb|through))
      (if ~list?.a
            (a = b)
          (addr.a = addr.harea)
            (addr.b = addr.hareb)
          :else
            (and (iso car.a car.b)   # Can't detect cycles through car.
                 (self `(,cdr.a :through ,cdr+cdr.harea)
                       `(,cdr.b :through ,cdr+cdr.hareb)))))
      (list a :through cdr.a)
      (list b :through cdr.b))

-----

2 points by rocketnia 4832 days ago | link

"The key difficulty is that there's two dimensions, so does the number of hares double each iteration?"

Hmm. As the tortoise goes down branches, its location becomes indeterminate. As the hare goes down branches, its location becomes indeterminate relative to the tortoise, so we might need multiple hare markers per branch if we're doing a tortoise-directed search. If we try using the hare to direct the search instead, then we may still need to keep a growing history of hare locations on each branch so the tortoise can follow.

Either way, keeping a hash table sounds more straightforward to me.

Are you using this algorithm properly? I haven't seen any of your example code do a second pass to find out where the cycle begins and what its period is. But maybe you don't need to...?

-----

1 point by akkartik 4832 days ago | link

I'm just taking a constant-factor hit :) A few extra iterations doesn't change the result for either print or iso. And for cycles that loop back all the way to the start it turns out there's zero overhead.

  x <- '(1 2 3)
  lastcdr.x <- x  # 1 2 3 1 2 3 1 ...
  print x
  1
  2
  3
  ...  # exactly one iteration

But:

  x <- '(1 2 3)
  lastcdr.x <- cdr.x   # 1 2 3 2 3 2 3 ...
  print x
  1
  2
  ...

-----

2 points by rocketnia 4832 days ago | link

  You posted:
  
  x <- '(1 2 3)
  lastcdr.x <- cdr.x   # 1 2 3 2 3 2 3 ...
  print x
  1
  2
  ...

Is that the behavior you expect? I would expect "1 2 3 ...", but it looks like the rabbit and the hare meet at the (3 ...) cons and stop.

  1 2 3 2 3 2 3
  *              Print "1".
    * *          Print "2".
      *   *      Print "...".

-----

1 point by akkartik 4832 days ago | link

Yeah I tried it out before posting. I had to take pains in iso to make sure we do one full traversal. With print I didn't care as much.

Update: Ack, I found a bug in iso (http://arclanguage.org/item?id=17365):

  x <- '(1 2 3)
  y <- '(1 2 4)
  (do1 nil (<- lastcdr.x x lastcdr.y y))  # Avoid printing the cycles
  (iso x y)   # not nil!

-----

2 points by rocketnia 4832 days ago | link

Here's a slightly different bug I found before your update. It's another case where I don't know what behavior you expect, but it probably isn't this. :-p

  (do (a <- '(1 2 3)) (lastcdr.a <- cdr.a) nil)
  => nil
  (do (b <- '(1 2 3 2 3)) (lastcdr.b <- cdr.b) nil)
  => nil
  (not+not+iso a b)
  => nil
  (not+not+iso b a)
  => 1

(I'm using not+not+iso so it returns a predictable value rather than an address.)

-----

1 point by rocketnia 4832 days ago | link

"the rabbit and the hare"

Whoops. I didn't mean that. XD

-----

2 points by dido 4833 days ago | link

Interesting. I'd been looking around for cycle detection algorithms and knew I might have forgotten a simpler algorithm. I think that would do very nicely for the pretty printer and most likely the marshalling algorithm as well. We'll see if I can find a way to adapt it to work for iso as well, though on the face of it, it doesn't look that straightforward.

-----

1 point by Pauan 4833 days ago | link

Wikipedia has more algorithms for cycle detection:

https://en.wikipedia.org/wiki/Cycle_detection#Brent.27s_algo...

-----

1 point by akkartik 4833 days ago | link

As it happens I provided an implementation of iso at http://arclanguage.org/item?id=17365 :) Let me know if my toy language isn't clear. The basic idea is that you need two hare pointers, one for each arg.

-----