Arc Forum | This is a cool idea, but I'm not sure that it's a good idea in general to add co...

Arc Forum

2 points by nex3 6299 days ago | link | parent

This is a cool idea, but I'm not sure that it's a good idea in general to add complexity to cons cells. Even if the interface is the same, the performance characteristics are different (that's the whole point, I suppose), and that makes reasoning about them more complicated.

Also, from a more wishy-washy perspective, it just feels right to me that the core data structure of Lisp has such an conceptually simple implementation. Two pointers: a car and a cdr. That's it

It seems to me that it's worth just making arrays available (which we'd have to do anyway, really) to keep lists as simple as possible.

Maybe I'm being naïve, though ;-).

3 points by almkglor 6299 days ago | link

Come now. It's always possible to "seamlessly" funge the division between lists and arrays. For the most part, they have virtually the same purpose: to keep a sequence of objects. One can imagine an implementation which keeps lists as arrays underneath, but if something really difficult to do in array - say insertion, or scdr - to switch over to singly-linked lists.

Really, though, I think what's missing in Arc is the layers of abstraction. We don't need two sequences - singly-linked lists and arrays. What we should have is the concept of a sequence. Iterating over a sequence, regardless of its implementation, should be handled by 'each, 'map, etc. I think as pg envisioned it, a profiler would eventually be integrated into Arc, which would measure the performance difference in using either of the implementations. If insertion is common in one place, and then indexed access in another, the compiler+profiler should be able to figure out that it should use linked lists in one place, then copy it over to the other place as an array to be used as indexed access.

Basically, what I'd like is a layer of abstraction - "this thing is a sequence, I'll choose array or linked-list later". Then, after profiling, I'll just add some tags and magically, this part of the code building the array uses linked-lists and that part doing weird indexed stuff (e.g. heap access) uses arrays.

When you look at it, most other dynamic languages don't have a separate linked-list type. They're all just "array", and not all of them use a sequential-cells-of-memory implementation of arrays. Say what you will about lesser languages, but first think - if everyone's doing it, there must be a reason. They could just be copying each other, of course, but one must always consider whether having two types of sequence, whose only true difference is their difference in access, is such an important thing.

-----

4 points by nex3 6299 days ago | link

I don't see lists in Lisp as just sequences. Rather, I don't see cons cells as just sequences. They're so much more versatile than that, which is part of what gives Lisp its power (cue "hundred operations on one data structure" quote). They can be used as maps, trees, etc. I think it would be a mistake to say, "these are mostly the same as arrays, let's implement them as arrays most of the time." Cons cells aren't the same as arrays.

I guess you're right in that arrays have more-or-less a subset of the functionality that cons cells do. Maybe it would be a good idea to have lists as the default and switch to arrays under some circumstances (lots of indexing or index-assignment?). But I'm skeptical about this as well.

Also, I foresee some unexpected behavior if the transition between conc cells and arrays is entirely behind-the-scenes. For example:

  ; Suppose foo is an array acting like a cons cell
  (= bar (cons 'baz (cdr foo)))
  (scdr (cdr bar) 'baz)

Now we'd need to somehow update the foo variable to point to a cons cell rather than array. You could imagine this getting even more tricky, even incurring large unexpected cost, with many variables pointing at different parts of an array-list and one of them suddenly scdr-ing.

-----

2 points by almkglor 6298 days ago | link

All of that solved by the unrolled-list mockup. It works, and works with scdr, no matter how many pointers point in the middle of otherwise-proper lists, and even works with lists split by scdr, handling both the parts before and after the scdr-split seamlessly. Costs are also not very large - mostly the added cost is in the search through cdr-table for the next highest key. That said the added cost is O(n) where n is the number of individual cons cells scdr has been used on.

So yes, the above code you show will add cost. But how often is it used in actual coding anyway? The most common use of cons cells is straight sequences, and quite a bit of arcn can't handle improper lists (quasiquote comes to mind) - yet arc is useful enough to write news.yc.

Edit: Come to think of it, for the common case where scdr completely trashes the old cdr of a cons cell (i.e. no references to it are kept), the linear lookup through cdr-table will still end up being O(1), since keys are sorted, and the earlier cdr-table entry gets found first.

-----