Arc Forum | Why so proper, alist?

4 points by evanrmurphy 5278 days ago | 41 comments

arc3.1's association lists are presumed to have the following contour:

  '((key1 val1) (key2 val2))

But why not represent each association using one pair instead of two?

  '((key1 . val1) (key2 . val2))

It's more efficient and more isomorphic to a hash table. Is there any downside?

---

alref doesn't like it:

  arc> (alref '((a . 1) (b . 2)) 'a)
  Error: "Can't take car of 1"

Though assoc doesn't mind!

  arc> (assoc 'a '((a . 1) (b . 2)))
  (a . 1)

Of course, it's easy to redefine alref for this purpose:

  - (def alref (al key) (cadr (assoc key al)))
  + (def alref (al key) (cdr (assoc key al)))

  arc> (alref '((a . 1) (b . 2)) 'a)
  1
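
For readers who want to poke at the two shapes outside Arc, here is a minimal sketch that models cons cells as Python 2-tuples, with None standing in for nil. The helper names (lst, alref_proper, alref_dotted) are made up for this illustration and are not Arc's; only the two one-line alref definitions mirror the diff above.

```python
NIL = None  # stand-in for Arc's nil


def cons(a, d):
    """A cons cell is modeled as a 2-tuple (car, cdr)."""
    return (a, d)


def car(x):
    if x is NIL:
        return NIL
    if not isinstance(x, tuple):
        raise TypeError(f"Can't take car of {x!r}")
    return x[0]


def cdr(x):
    if x is NIL:
        return NIL
    if not isinstance(x, tuple):
        raise TypeError(f"Can't take cdr of {x!r}")
    return x[1]


def lst(*items):
    """Build a proper (nil-terminated) list from the given items."""
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out


def assoc(key, al):
    """Return the first entry whose car equals key, or nil."""
    while al is not NIL:
        entry = car(al)
        if isinstance(entry, tuple) and car(entry) == key:
            return entry
        al = cdr(al)
    return NIL


def alref_proper(al, key):
    """Models (cadr (assoc key al)): expects entries shaped (key val)."""
    return car(cdr(assoc(key, al)))


def alref_dotted(al, key):
    """Models (cdr (assoc key al)): expects entries shaped (key . val)."""
    return cdr(assoc(key, al))


proper = lst(lst("a", 1), lst("b", 2))    # '((a 1) (b 2))
dotted = lst(cons("a", 1), cons("b", 2))  # '((a . 1) (b . 2))
```

In this model alref_proper(proper, "a") and alref_dotted(dotted, "a") both give 1, while alref_proper(dotted, "a") raises the same kind of "Can't take car of 1" error shown above, because the cdr of a dotted entry is the value itself rather than another cell.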

4 points by shader 5277 days ago | link

Here's a rather radical question: Why do lists need to be nil terminated at all? Why couldn't we have the normal (a b) notation mean (a . b), and (a b c) mean (a . (b . c)), etc. Then redefine 'car and 'cdr so that they handle atoms: car of atom is itself, and cdr of atom is nil. That way car and cdr never fail on atoms, and map etc. still terminate properly on lists.

The only bad side effect would be lack of explicit rest args; all functions would by default have a rest arg. However, since lists are no longer specially terminated, the last arg wouldn't necessarily be wrapped in a list if the function was called with the proper number of args.

I know I'm questioning half a century of lisp convention; what painfully obvious flaw am I missing here?
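
To make the idea concrete, here is a rough sketch of it in Python rather than Arc, with cons cells as 2-tuples and None as nil. The names chain and walk are invented for the illustration, and the sketch only covers flat lists of atoms.

```python
NIL = None  # stand-in for nil


def car(x):
    """car of a cons is its first slot; car of an atom is the atom itself."""
    return x[0] if isinstance(x, tuple) else x


def cdr(x):
    """cdr of a cons is its second slot; cdr of an atom is nil."""
    return x[1] if isinstance(x, tuple) else NIL


def chain(*items):
    """Build (a b c) as (a . (b . c)): the last item sits in the final cdr.

    Assumes at least one item.
    """
    out = items[-1]
    for item in reversed(items[:-1]):
        out = (item, out)
    return out


def walk(x):
    """Collect elements by repeated car/cdr until the remainder is nil."""
    out = []
    while x is not NIL:
        out.append(car(x))
        x = cdr(x)
    return out


abc = chain("a", "b", "c")  # ("a", ("b", "c")), no trailing nil
```

Here walk(abc) visits a, b and c and then stops, because the cdr of the final atom c is nil. That is the sense in which map-style traversal would still terminate.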

-----

2 points by rocketnia 5277 days ago | link

I've actually heard that suggestion before, somewhere. *searches for it*

Oh, it's a misinterpretation I had of something on the newLISP page:

http://www.newlisp.org/index.cgi?page=Differences_to_Other_L...

  ;; Common Lisp and Scheme
  (cons 'a 'b) => (a . b)   ; a dotted pair
  
  [a | b]
  
  ;; newLISP
  (cons 'a 'b) => (a b)     ; a list
  
  [ ]
   \ 
   [a] -> [b]

Now that I read it again, it's totally not referring to the notation. What it means is most clearly stated at http://www.newlisp.org/downloads/newlisp_manual.html#nil_and...:

> The cons of two atoms in newLISP does not yield a dotted pair, but rather a two-element list. [...] There is no dotted pair in newLISP because the cdr (tail) part of a Lisp cell always points to another Lisp cell and never to a basic data type, such as a number or a symbol.

So it's just a matter of not having programmer-managed cons cells. Nothing to see here....

---

Back to your idea, what happens if you have the syntax (a b c (d e f))? Isn't that indistinguishable from (a b c d e f)? Would alists have to be explicitly written as ((a 1) (b 2) (c 3) nil)?
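
A small sketch of the collision, again modeling cons cells as Python 2-tuples with None as nil. The chain helper is hypothetical and simply puts the last item in the final cdr, which is how I read the proposal.

```python
def chain(*items):
    """Build (a b c) as (a . (b . c)); assumes at least one item."""
    out = items[-1]
    for item in reversed(items[:-1]):
        out = (item, out)
    return out


flat = chain("a", "b", "c", "d", "e", "f")          # (a b c d e f)
nested = chain("a", "b", "c", chain("d", "e", "f"))  # (a b c (d e f))

# With an explicit nil at the end, the inner list lands in a car instead.
terminated = chain("a", "b", "c", chain("d", "e", "f"), None)

print(flat == nested)      # True: the two structures are identical
print(flat == terminated)  # False: the trailing nil keeps them apart
```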

-----

1 point by shader 5277 days ago | link

Good question.

I suppose that a list at the end of a list would have to be paired with nil, so (a b c (d e f)) would actually be the list (a b c (d e f) nil), but that should be possible to handle at the reader/'list level, and wouldn't make much of a difference in the use of the code. If that simple change is made, then your alist example shouldn't need to be explicitly terminated.

-----

1 point by akkartik 5277 days ago | link

If we're relying on reader magic, what's the benefit of making atoms rather than lists nil-terminated?

It seems kinda futzy for the last element to be nil sometimes but not others.

-----

1 point by shader 5277 days ago | link

Yes, it does seem kind of fuzzy. And I doubt it would actually be the reader doing that; more likely the 'list function would be the source of the extra nils.

However, you need some way to distinguish between a list in the car, and the next element in a chain. The only way to do that is to have it be in the car of a cons cell, and the only way to do that is to have something else in the cdr. If you don't have anything after the list, you have to fill it with nil.

-----

5 points by rocketnia 5277 days ago | link

> The only way to do that is to have it be in the car...

Actually, you could have a special symbol '! in the car in order to identify that the cdr is actually supposed to be the last element. Then (a b c (d e f)) is represented as (a . (b . (c . (! . (d . (e . f)))))), i.e. (a b c ! d e f). Banged lists. ;)

In fact, having the symbol be a colon might even make it a nice optional style:

  (accum acc
    (each x args
      (acc (* 2 x))))
  
  (accum acc :
     each x args :
       acc : * 2 x)    ; reminiscent of Arc's (acc:* 2 x)

It's just that you never get to observe the ': from within code. There might be gotchas when generating code, too: The expression (quote : ) would evaluate to nil, huh? XD

-----

3 points by evanrmurphy 5277 days ago | link

Dear rocketnia,

Some of your ideas are very bizarre. I mean this as a compliment. Please keep 'em coming. :)

Regards, evanrmurphy

-----

3 points by evanrmurphy 5277 days ago | link

Very interesting and radical indeed! ^_^

This is reminiscent of a comment at the bottom of arc.arc:

  ; solution to the "problem" of improper lists: allow any atom as a list
  ;  terminator, not just nil.  means list recursion should terminate on 
  ;  atom rather than nil, (def empty (x) (or (atom x) (is x "")))

It also reminds me of a feature in PicoLisp: symbols evaluate to nil by default instead of raising an undefined variable error.

-----

3 points by akkartik 5277 days ago | link

Scheme flags an error on car/cdr of nil. Common Lisp doesn't. Now you're suggesting doing this for all atoms. I like this because it loosens another constraint I hadn't even noticed. (http://arclanguage.org/item?id=13414)

-----

2 points by shader 5272 days ago | link

I think in the end just extending car/cdr to support atoms would still be useful, but changing lists to being nil terminated only some of the time would likely be too inconsistent and/or buggy to risk. It would save some space, but on the whole doesn't really improve much.

-----

2 points by evanrmurphy 5278 days ago | link

I suspect that this change to association lists, together with functional position lookups [1], destructuring-bind and a judicious use of conses, could potentially eliminate the need for tables in Arc's core and:

- solve the optional arg problem [2]

- permit apply to be subsumed by the dot notation [3]

I still have a lot of details to work out before I can make a compelling case for this, though.

---

[1] http://arclanguage.org/item?id=10786

[2] http://arclanguage.org/item?id=13029

[3] http://arclanguage.org/item?id=13280

-----

1 point by rocketnia 5278 days ago | link

I don't see the point. o.o I'd much prefer to write '((a 1) (b 2)) rather than '((a . 1) (b . 2)) and destructure using (let (k v) ...) rather than (let (k . v) ...). Actually, I'd write '((a 1) (b 2)) as (objal a 1 b 2), but the destructuring issue is something I'd just deal with and fume over. :-p

> Is there any downside?

Those are the downsides. :)

> It's more efficient and more isomorphic to a hash table.

I think you save about 1/3 the conses when creating and adding to alists, so there is that.
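
The arithmetic behind that estimate, as a quick sketch: each entry costs one spine cons either way, plus two conses for (key val) or one for (key . val). The Python below models cons cells as 2-tuples with None as nil; lst and count_conses are names invented for the illustration.

```python
def lst(*items):
    """Build a proper (nil-terminated) list of 2-tuple cons cells."""
    out = None
    for item in reversed(items):
        out = (item, out)
    return out


def count_conses(x):
    """Count every cons cell (2-tuple) reachable from x."""
    if not isinstance(x, tuple):
        return 0
    return 1 + count_conses(x[0]) + count_conses(x[1])


proper = lst(lst("a", 1), lst("b", 2), lst("c", 3))  # ((a 1) (b 2) (c 3))
dotted = lst(("a", 1), ("b", 2), ("c", 3))           # ((a . 1) (b . 2) (c . 3))

print(count_conses(proper))  # 9: three conses per entry
print(count_conses(dotted))  # 6: two conses per entry
```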

But isomorphic to a hash table? The most official way we can compare them is with 'tablist and 'listtab, which use the list-of-two-element kind of alist.

Also, IIRC, Rainbow displays tables as #hash((a 1) (b 2)), and I couldn't be happier. There's so much " . nil" cruft when viewing big tables in official Arc.

> could potentially eliminate the need for tables in Arc's core

Arc doesn't have enough table support. XP Keys are compared via Racket 'equal? (or via weirder methods in Rainbow and Jarc), and I haven't gone to the trouble to make tables that somehow dispatch via an extensible 'iso.

I want efficient lookup in big tables for the sake of Lathe's namespace system and Penknife's environments, and I'll get that by dropping to the underlying platform if I need to--I already do for weak tables--but I'd rather not. If official Arc ever removes table support, I hope it also adds 'defcall so I can put tables back in.

> - solve the optional arg problem

> - permit apply to be subsumed by the dot notation

How are those related? The only point of connection I see is that they're other things that could use dotted lists, but even that's not especially true for optional args. Did you mean to say that you suspect some change regarding dotted lists (or just the way we look at them) will help with both alists and these other cases?

-----

2 points by evanrmurphy 5278 days ago | link

> But isomorphic to a hash table?

I'm talking about the core notion of a hash table. It's composed of key-value pairs, not key-value "lists of two". :P This is a restatement of bogomipz's point made elsewhere in this thread.

> Arc doesn't have enough table support.

If alists were better supported, you could use them in place of tables in every case except where the utmost efficiency is required.

But I'm not sure it's even correct to frame this as an axioms vs. efficiency debate. Something I've learned from PicoLisp is that heterogeneous data structures slow down the general case by complicating memory allocation and garbage collection. PicoLisp manages to be a fast interpreter (say what?), in part because it uses the cons cell for everything [1].

> I'd much prefer to write '((a 1) (b 2)) rather than '((a . 1) (b . 2)) and destructure using (let (k v) ...) rather than (let (k . v) ...).

I think this is a cosmetic issue that has to do only with our visual representation of cons pairs and Arc's incumbent ssyntax.

For example, if you changed the ssyntax so that a.b expanded to (a . b) instead of (a b), then these snippets would be more pleasant to write: '(a.1 b.2) and (let k.v ...) . I'm not actually proposing this particular solution, but it should illustrate my point that the issue is only syntactic/cosmetic.

> How are those related? [...] Did you mean to say that you suspect some change regarding dotted lists (or just the way we look at them) will help with both alists and these other cases?

Well I did say I still have some details to work out. ;)

I think your paraphrase is accurate. A "change regarding dotted lists (or just the way we look at them)" is what I was trying to express with "judicious use of conses" in the grandparent.

---

[1] http://software-lab.de/doc/ref.html#cell

-----

1 point by rocketnia 5278 days ago | link

IMO, a pair is a list of two. What would you propose using for a triple or a singleton?

-----

1 point by evanrmurphy 5278 days ago | link

A pair is a list of two in English because English lists aren't nil-terminated. But Arc lists are, so we're talking about the difference between (key . val) and (key . (val . nil)).

I don't have a great answer to your triple/singleton question yet except to ask that you consider the following:

- The fundamental data structure of lisp is the cons pair, so perhaps pairs warrant some special treatment over singletons, triples, etc.

- The demand for associative arrays in general-purpose programming is far greater than that for any kind of triple-based data structure, which is why tables have their own type in Arc to begin with

Update: Cons pairs are so powerful that we've used them as the base for almost our entire language. And yet the associative array structure (which screams "pair"!) that we've made from them (i.e. alists) is so inadequate that we all outsource that functionality to tables instead. Around tables we've then developed the conveniences for syntax, etc.... Doesn't this seem a bit kludgy for The Hundred-year Language?

-----

2 points by rocketnia 5278 days ago | link

The main advantage of cons pairs, in my mind, is that they're all the same size, so it's easier to reason about them and memory-manage them on a low level. They're also just as powerful as they need to be to support an expressive language. But that doesn't make them ideal abstractions for exploratory programming, especially when an equivalent abstraction in the same language takes fewer characters to type out and is even better supported thanks to 'map, 'any, etc.

-----

1 point by evanrmurphy 5278 days ago | link

Yes, that makes sense. I may have gone somewhat overboard / overly dramatic in this subthread. :) I think I mostly just want alists to be more convenient. Need to think about this more...

-----

1 point by rocketnia 5277 days ago | link

I've been overly dramatic here too. I mostly wanted to help you make sure you were on a path that held water while giving you some hooks to convince me by... but I brought some external pet peeves into the mix and got worked up. XP Please do continue with your train of thought. ^^ Here's hoping the train mixes underwater hooks, or something.

-----

1 point by evanrmurphy 5277 days ago | link

Awesome, thank you. :)

-----

2 points by evanrmurphy 5277 days ago | link

It's something about Arc's built-in types that bothers me. They seem so ad hoc. You have this beautiful axiomatic thing going on in the core with conses, and then suddenly tables enter the mix. From that point forward, odd utilities get defined with an if branch that checks for the table type.

In this thread, I've been worried about tables cluttering the core language and you about them not being well-supported enough. In truth, I think both of our concerns are legitimate (yours is for sure, because tables really are better than alists for some applications). The problem is that the present implementation doesn't do either of them justice.

I'd like to know what you think of this proposal: keep the core language definitions to symbols and conses. Then support each additional type in a dedicated file (e.g. numbers.arc, tables.arc, queues.arc). These types can either reach down into Racket to borrow one of its types (likely for numbers or tables) or be annotated constructs built from existing types (likely for queues, trees or alists), and then use the extend idiom to give them support in the various utilities and the reader.

-----

2 points by akkartik 5277 days ago | link

"support each additional type in a dedicated file.. either reach down into Racket to borrow one of its types or be annotated constructs built from existing types, and then use the extend idiom to give them support in the various utilities.."

or defgeneric? 8-) I was moved by the same concerns you describe: I never want to see an (if (isa x 'table) ..) in arc code.

-----

2 points by rocketnia 5277 days ago | link

Agreed with both of you, but I'd go further: I don't want to see (isa x 'cons) or (isa x 'sym) either, if possible. I'd rather every type be treated as equally non-fundamental. Of course, s-expression syntax special-cases those types sorta intrinsically, but I'm sure 'defgeneric could be used along with one of aw's "access the Arc compiler from Arc" patches. ^_^

It might be difficult and/or impossible though, considering that 'defgeneric needs to be defined in terms of something. So does calling, since the most obvious way to specify a custom calling behavior is to give that behavior as a function to call! XD

Like so many other opinions of mine, this is something that's going into Penknife if at all possible, even if the core currently needs a bunch of rewriting to get it to work.

To fix the "'defgeneric needs to be defined in terms of something" issue, I'm currently considering having most things be built-in rulebooks, with rulebook being a built-in type if necessary.

For the calling issue, I'm going to have the built-in call behavior try certain hardwired things first, and only move on to the customizable calling rulebook if those don't work. I intend for it to be possible to replace the interaction environment with one that uses a different call behavior, so even that hardwired-ness should be kinda seamless with the language.

For now, these things are all hand-wavy, and I'm open to better ideas. ^^

-----

2 points by evanrmurphy 5277 days ago | link

> but I'd go further: I don't want to see (isa x 'cons) or (isa x 'sym) either, if possible. I'd rather every type be treated as equally non-fundamental. Of course, s-expression syntax special-cases those types sorta intrinsically

Wow, I'm really interested in whether there's a way to have s-expressions that don't special-case conses and symbols. shader's just-in suggestion [1] makes me think there might be a way to merge conses and symbols into a single type, though. Could it be possible?

---

[1] http://arclanguage.org/item?id=13438

-----

2 points by rocketnia 5277 days ago | link

Essentially, all you need to do is extend 'ac, since compiling is almost all that happens to Arc expressions. In the short term, there's no need to worry about whether a custom type represents a function call, a literal, etc. As long as it compiles, you can start returning it from macros or reader syntaxes.

In the long term, there may be other things that would be useful to extend, like 'ac-macex, 'ac-expand-ssyntax, and 'expand=. Also, it may be easier for a custom syntax type to support 'expand= if there's a separate utility it can extend in order to have all the functionality of a function call. That way it can be 'sref'ed.

-----

2 points by evanrmurphy 5277 days ago | link

Thanks for this guide. It should come in handy for me. :)

If I start messing around with Arc's internals too hard though, I may not be able to resist trying to turn it into an interpreter [1]. I'm too attracted to the notion of first-class environments, eval and fexprs lately. (In this case, I'd be extending eval rather than ac, correct?)

Or maybe I should just stop being such a damn purist. Have to take things one step at a time anyway. ac is a logical place to start.

---

[1] http://arclanguage.org/item?id=13323

-----

1 point by rocketnia 5276 days ago | link

> Thanks for this guide. It should come in handy for me. :)

Well, I hope it actually works. :-p

> If I start messing around with Arc's internals too hard though, I may not be able to resist trying to turn it into an interpreter.

Yeah. I would have just turned it into Penknife. >.>

> I'm too attracted to the notion of first-class environments, eval and fexprs lately. (In this case, I'd be extending eval rather than ac, correct?)

Sure, but there's no interpreting 'eval to build on in Arc (unless you repurpose the macroexpander XD ). I'd find it easiest to approach by building it from scratch--hence kernelish.arc.

> Or maybe I should just stop being such a damn purist. Have to take things one step at a time anyway. ac is a logical place to start.

It's all up to whatever you can figure out how to build on, I think.

Also, I'd tell you not to write off purism so quickly, but unfortunately I only like purism in an irrational way. ^_^;

-----

3 points by akkartik 5276 days ago | link

Go for the interpreter :)

BTW, remember eight? http://arclanguage.org/item?id=10719

-----

1 point by evanrmurphy 5276 days ago | link

It's crossed my radar before [1]. I've read some of the thread you linked to and some of what's on his github [2]. I like the general idea of giving ' and , more power to control evaluation, but I'm afraid I don't grok the language very well yet. :-/

Update: To clarify my confusion, the documentation talks a lot about closures (e.g. that ' does some kind of closure-wrapping), but I thought the language was supposed to be fexpr-based. I don't understand yet what fexprs have to do with closure-wrapping, but I really should study the language more closely.

---

[1] rocketnia referenced it in http://arclanguage.org/item?id=11882, alongside kernel

[2] https://github.com/diiq/eight

-----

3 points by diiq 5276 days ago | link

Eight's documentation is in a terrible state (in part because there are still many things about which I've yet to make up my mind), so blame me for any confusion.

Here's the gist: Fexprs, like macros, take expressions as arguments (duh). Those expressions are made up of symbols (duh). Because a fexpr is evaluated at runtime, those symbols may already be bound to values when the fexpr is called. Eight keeps track of which symbol is bound to which value at the place the expression originated (where the programmer wrote it) --- even if you cons expressions together, or chop them into pieces. This eliminates the need for (uniq), but still allows for anaphoric fexprs when symbol-leaking is desired.

When I wrote the docs on github, I called an expression plus any accompanying bindings a 'closure' (even though it wasn't a function). I also didn't know the word 'fexpr'. I've read a few dozen more old lisp papers since then, and hopefully on the next go-round my vocabulary will be much improved.

-----

1 point by evanrmurphy 5276 days ago | link

Some of your documentation is excellent, actually. This page, for example: https://github.com/diiq/eight/wiki/Better-Questions

-----

2 points by shader 5277 days ago | link

"there might be a way to merge conses and symbols into a single type"

Interesting idea. This might help a lot with implementing lisp in strongly typed languages. I suppose atoms could just be cons cells with nil in their cdr slot. The only problem is then how do you get the actual value out of an atom, and what is it?

-----

2 points by rocketnia 5277 days ago | link

> This might help a lot with implementing lisp in strongly typed languages.

Don't most of them have option types or polymorphism of some kind? If you've got a really rigid one, at least you can represent every value in the lisp as a structure with one element being the internal dynamic type (represented as an integer if necessary) and at least two child elements of the same structure type and one element of every built-in type you'll ever need to manipulate from the lisp (like numbers and sockets). Then you just do manual checks on the dynamic type to see what to do with the rest. :-p

> The only problem is then how do you get the actual value out of an atom, and what is it?

I say the programmer never gets the actual value out of the atom. :-p It's just handled automatically by all the built-in functions. However, this does mean the cons cell representation is completely irrelevant to a high-level programmer.

-----

1 point by evanrmurphy 5277 days ago | link

> I suppose atoms could just be cons cells with nil in their cdr slot.

Could they be annotated conses with symbol in the car and value in the cdr (initialized to nil)? nil itself could then be a cons with the nil symbol in the car and nil in the cdr. This should achieve the cons-symbol duality for nil that's usually desired. (Follow-up question: annotate is an axiom, right?)

Warning: May include sloppy thinking.

-----

1 point by akkartik 5277 days ago | link

> I don't want to see (isa x 'cons) or (isa x 'sym) either

Totally with you there.

I don't want to get too hung up on 'purity'. It's ok to use tables in the core if you need them for defgeneric or something. It's ok to have a few isas early on. iso is defined as a non-generic bootstrap version in anarki before eventually being overridden, so stuff like that seems fine to me. I just want to move past the bootstrap process as quickly as possible.

-----

1 point by rocketnia 5277 days ago | link

> iso is defined as a non-generic bootstrap version in anarki before eventually being overridden

Sure, that's an okay way to go about it. ^_^ Since I'm doing the Penknife core library stuff from the top down right now, I'm just writing things the way I want to write them, trying to determine what axioms I need before the core library is loaded. If the high-level axioms are defined in another lower-level library, that's just fine, but I don't know why I'd bother with that when I can just put them in the Arc part of the core. :-p

-----

1 point by akkartik 5277 days ago | link

Yeah, makes sense. defgeneric comes far earlier in wart as well. iso pretty much doesn't get bootstrapped (it's available but never used)

-----

1 point by rocketnia 5277 days ago | link

That was a pretty fast reply. I just edited in a bunch of stuff you might have missed. :)

EDIT: Oh, and you edited yours too. XD

-----

1 point by evanrmurphy 5277 days ago | link

Ahh defgeneric! I hadn't made the connection, thanks for pointing it out. :) Is this the writeup you'd recommend?

http://arclanguage.org/item?id=11779

I think I had trouble digesting it a few months ago because it depended on so many utilities I was unfamiliar with: vtables, defmethod, pickles (and it compared with extend, which I didn't understand back then :-o ). Giving it another try...

-----

2 points by akkartik 5277 days ago | link

It's in anarki so perhaps it'd be easier to just play with what the different defgenerics (iso, len, ..) there expand to.

https://github.com/nex3/arc/blob/master/arc.arc#L1734

vtables and pickles aren't utilities, just implementation details for defgeneric.

Basically vtables contains a hashtable for each generic function mapping a type to an implementation. "If len gets a string, do this. If it gets a table, do that." The body given to defgeneric sets up vtable entries for a few default types (cons, mainly :), and defmethod lets you add to vtables later.

If the generic function doesn't find an entry in vtables it falls back on searching the pickles table for a procedure to convert that type to cons, before retrying.

Let me know if this makes sense.

(names: I believe vtables comes from C++, and pickle is Python's module for serialization)
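
The lookup order described above can be sketched roughly like this. It is a Python model of the idea, not the anarki code: Python lists stand in for conses, the types are named by Python's own type names, and the signatures of defgeneric and defmethod here are simplified assumptions.

```python
vtables = {}  # generic name -> {type name -> implementation}
pickles = {}  # type name -> converter that returns a Python list


def defgeneric(name, list_impl):
    """Create a generic function whose only initial entry handles lists."""
    vtables[name] = {"list": list_impl}

    def generic(x):
        type_name = type(x).__name__
        impl = vtables[name].get(type_name)
        if impl is not None:
            return impl(x)
        # No direct entry: look for a pickle that converts x to a list.
        to_list = pickles.get(type_name)
        if to_list is None:
            raise TypeError(f"{name}: no method or pickle for {type_name}")
        return generic(to_list(x))  # retry on the converted value

    return generic


def defmethod(name, type_name, impl):
    """Add or replace the vtable entry of an existing generic for one type."""
    vtables[name][type_name] = impl


glen = defgeneric("len", lambda xs: sum(1 for _ in xs))
defmethod("len", "str", lambda s: len(s))    # "if len gets a string, do this"
pickles["dict"] = lambda d: list(d.items())  # dicts fall back to a list of pairs
```

With these definitions glen("abcd") uses the string entry, glen({"a": 1, "b": 2}) has no entry of its own and goes through the pickle before retrying as a list, and glen(3.5) fails because there is neither an entry nor a pickle for floats.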

-----

1 point by akkartik 5278 days ago | link

Intriguing. Looking forward to it.

-----

2 points by bogomipz 5278 days ago | link

Agreed. I always thought of associations as pairs, not lists of length 2.

-----