Arc Forum | Ask AF: Advantages of alists?

	Ask AF: Advantages of alists?
	2 points by kinnard 2618 days ago | 25 comments
	Are there unique advantages of using alists over other data structures? If so what are some examples of how this shows up in practice?

2 points by i4cu 2618 days ago | link

If you have a small number of key/value pairs and care about order then alists are both efficient and practical, but otherwise I would not give them too much weight.

So an example I can give is found in html.arc. The start-tag function has an alist used for generating tag attributes:

  (def start-tag (spec)
    (if (atom spec)
        `(pr ,(string "<" spec ">"))
        (let opts (tag-options (car spec) (pair (cdr spec)))
          (if (all [isa _ 'string] opts)
              `(pr ,(string "<" (car spec) (apply string opts) ">"))
              `(do (pr ,(string "<" (car spec)))
                   ,@(map (fn (opt)
                            (if (isa opt 'string)
                                `(pr ,opt)
                                opt))
                          opts)
                   (pr ">"))))))

Notice (pair (cdr spec)) is an alist. Now if you wanted to extend start-tag with conditional operations on the tag spec, you could bind (pair (cdr spec)) to, say, a variable 'attrs' and use Arc's built-in functions to inspect and modify it as you see fit. As you can see, a table is probably not needed when there are only half a dozen items (at most), plus you would lose the order. A flat list would prevent you from pairing up keys and values to do anything meaningful like an inspection or conditional modification.
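
As a rough sketch of that idea (the tag spec and the name `attrs` are made up for illustration and are not taken from html.arc), pairing up the attribute list gives an alist that `alref` and `map` can work on while the original order is kept:

```arc
; Hypothetical tag spec, in the same shape start-tag receives.
(= spec '(a href "foo.html" class "nav"))

(let attrs (pair (cdr spec))      ; ((href "foo.html") (class "nav"))
  (list (alref attrs 'class)      ; look up one attribute by name
        (map car attrs)))         ; attribute names, still in source order
```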

It's worth noting that a serious downfall for alists is having no real means to detect a data type, because really it doesn't have one (unlike a table). You could inspect the first item attempting to detect the type that way, but the logic's soundness/efficiency breaks down pretty quickly in all but the simplest cases.

So, again, alists are good when you have a small number of pairs, are confident in the shape of the data, and have total control of the data usage (i.e. where and how it gets passed around); otherwise you would need a pretty good or nuanced reason to use them.

Some of pg's other ideas [1] are: "having several that share the same tail, and preserve old values".

For example, you could have some function that accumulates pairs where some or many share the same key but have different values. This is where you don't want the obvious behaviour a table provides (where the last one added wins), and/or you need previous entries. Note that you can save an alist to a file and reload it easily while still having access to the history of values for a given key, so you can reconstruct your operations from historical data. You just can't do that with a table.
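
A minimal sketch of both ideas, assuming alists in Arc's usual list-of-two-element-lists form. The names `settings-log`, `base`, `v1` and `v2` are hypothetical:

```arc
; Hypothetical log of settings, newest entry first.
(= settings-log nil)
(push '(color "red")  settings-log)
(push '(size 10)      settings-log)
(push '(color "blue") settings-log)

(alref settings-log 'color)                         ; "blue": the newest entry shadows the old one
(map cadr (keep [is (car _) 'color] settings-log))  ; ("blue" "red"): the full history for color

; Two alists sharing the same tail; neither cons copies or changes base.
(= base '((a 1) (b 2)))
(= v1   (cons '(a 10) base))
(= v2   (cons '(c 3)  base))
(alref v1 'a)   ; 10
(alref v2 'a)   ; 1
```

Because lookups stop at the first match, adding to the front is enough to "update" a key without losing the earlier value.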

1. http://arclanguage.org/item?id=20865

-----

2 points by kinnard 2610 days ago | link

I completely overlooked the "care about order" part.

I realized belatedly that ordering matters for my application. . .

Even looked at Associative Containers[1] before I thought, "alist!"

[1] https://en.wikipedia.org/wiki/Associative_containers

-----

3 points by i4cu 2610 days ago | link

Yeah, in clojure we have the standard hash-maps, but we also have array-maps (which maintain the insertion order) and sorted-maps (which are sorted by keys).

In the last ten years I've only needed ordered maps once or twice. In one case it was for a custom query language I wrote that generated queries from data alone.

eg. (query db :users (array-map :gender "female" :name "xena"))

so in this example adding the :gender clause first would restrict the query and improve performance.

I also remember, in my early clojure days, I made the mistake of relying on standard hash-maps:

eg. (query db :users {:gender "female" :name "xena"})

Until I discovered a query where the performance died and found out that hash-maps are actually array-maps (under the hood) until they reach 9 entries. They then auto-convert to real hash-maps and lose their order (Clojure does this because maintaining order on bigger data sets is costly from an efficiency/memory perspective).

In arc I would have to use alists for this.

-----

2 points by kinnard 2610 days ago | link

This sucks because notation and lookup are so hairy with alists, and they get even worse with nesting.

-----

3 points by rocketnia 2610 days ago | link

Can you give an example of where the notation is better with tables than with alists? Maybe there's something you can do about it, e.g. writing a macro, using `defcall` or `defset`, or extending `sref`.

-----

2 points by kinnard 2610 days ago | link

I'm running into trouble with both actually.

  (= tpipe {todo 
              '({id 1 cont "get eggs" done '(nil)}
                {id 23 cont "fix toilet" done '(nil)})
            week '(nil)
            today '({id 23 cont "Build something that works in arc" done '(nil)})
            done '({id 23 cont "Research Ordered Associative Arrays" done '(2019 1 21)})})

  (= apipe '((todo ({id 1 cont "get eggs" done '(nil)} 
                    {id 23 cont "fix toilet" done '(nil)})) 
             (week (nil)) 
             (today ({id 23 cont "Build something that works in arc" done '(nil)}))
             (done ({id 23 cont "Research Ordered Associative Arrays" done (2019 1 21)}))))

  arc> ((alref apipe 'todo) 0)
  '(%braces id 1 cont "get eggs" done '(nil))

  arc> tpipe!todo.1
  '(%braces id 23 cont "fix toilet" done '(nil))

But neither works like so

  arc> tpipe!todo.1!id

  arc> (((alref apipe 'todo) 0) 'id)

One must employ (list) or quasiquotation

    (= npipe {todo 
                (list {id 1 cont "get eggs" done '(nil)}
                      {id 23 cont "fix toilet" done '(nil)})
              week '(nil)
              today (list {id 23 cont "Build something that works in arc" done '(nil)})
              done (list {id 23 cont "Research Ordered Associative Arrays" done '(2019 1 21)})})

    (= unqpipe {todo 
                `(,{id 1 cont "get eggs" done '(nil)}
                  ,{id 23 cont "fix toilet" done '(nil)})
                week '(nil)
                today `(,{id 23 cont "Build something that works in arc" done '(nil)})
                done `(,{id 23 cont "Research Ordered Associative Arrays" done '(2019 1 21)})})

This is less of a problem with tpipe since in practice items would get pushed into a "pipe".

Ideal world with key and index access:

  arc> pipe!todo.1!id
  23

  arc> pipe.0.1!id
  23

EDIT: I think tables might be better overall and that the order-necessitating functionality may not be worth the trouble of alists.

-----

2 points by rocketnia 2610 days ago | link

I recommend not expecting `quote` or `quasiquote` to be very useful outside the context of metaprogramming.

Quotation is helpful for metaprogramming in a tangible way, because we can easily refactor code in ways that move parts of it from being quoted to being unquoted or vice versa.

And quotation is limited to metaprogramming in a tangible way, because we can only quote data that's reasonably maintainable in the same format we're maintaining our other code in. For instance, an Arc `quote` or `quasiquote` operation is itself written inside an Arc source code file, which is plain text, so it isn't very useful for quoting graphics or audio data.

We can of course use other functions or macros to construct those kinds of data. That's essentially Arc's relationship with tables. When we've constructed tables, we've just used (obj ...) and (listtab ...) and such.
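
For reference, a small sketch of those constructors (the names `t1` and `t2` are placeholders):

```arc
(= t1 (obj a 1 b 2))               ; obj takes unevaluated keys
(= t2 (listtab '((a 1) (b 2))))    ; build a table from an alist
(t1 'a)          ; 1
(t2 'b)          ; 2
(tablist t2)     ; back to an alist; entry order is not guaranteed
```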

Adding tables to the language syntax is doable, but it could have some quirks.

  ; Should this cause an error, or should it result in the same thing as
  ; '(let i 0 `{,++.i "foo"}) or '(let i 0 `{,++.i "foo"})? Each option
  ; is a little surprising, since any slight edit to the code, like
  ; replacing one ++.i with (++ i 1), would give us valid code to
  ; construct a two-entry table, and this code very well might have
  ; arisen from a slight edit in the opposite direction.
  '(let i 0
    `{,++.i "foo" ,++.i "bar"})
  
  ; Should this always result in 2, or should it result in 1 if "foo"
  ; comes last in the table's iteration order?
  (let x 0
    `{"foo" ,(= x 1) "bar" ,(= x 2)}
    x)

Personally, I tend to go the other way: I prefer to have as few kinds of data as possible in a language's quotable syntax.

A macroexpander needs to extract two things from the code at any given time: The name of the next macro to expand, and the region of syntax the macro should operate on. Symbols help encode macro names. Lists and symbols together help encode regions of plain text. I think it's for these reasons that symbols and lists are so essential to Arc's syntax.

Tables would fit particularly well into a language's quotable syntax if they somehow helped encode regions of syntax. For instance, if a macro body consisted of all the files in a directory, then a table could be an appropriate representation of that file collection.

-----

2 points by akkartik 2609 days ago | link

I'm having a lot of trouble parsing this comment.

> I recommend not expecting `quote` or `quasiquote` to be very useful outside the context of metaprogramming.

My immediate reaction is to disagree. A lot of the reason Lisp is so great is that quasiquotation is orthogonal to macros/metaprogramming.

    > ; Should this cause an error, or should it result in the same thing as
    > ; '(let i 0 `{,++.i "foo"}) or '(let i 0 `{,++.i "foo"})?

Those two fragments are the same?

In general it feels unnecessarily confusing to include long doc comments in code fragments here. We're already using prose to describe the code before and after.

Code comments make sense when sharing a utility that you expect readers to copy/paste directly into a file to keep around on their disks. But I don't think that's what you intend here?

Finally, both your examples seem to be more about side effects in literals? That is a bad idea whether it's a table literal or not, and whether it uses quasiquoting or not. Do you have a different example to show the issue without relying on side-effects?

-----

3 points by rocketnia 2608 days ago | link

I've replied separately about why I would say quasiquotation is only useful for code generation. In this reply I'll focus on the topic of the quirks we might have to deal with if we have Arc tables as quasiquotable syntax.

I think they're mostly unrelated topics, but I was using the quirks of tables in `quasiquote` to motivate keeping the number of quasiquotable syntaxes small and focused. Since I believe quotation is essentially only good for code generation (as I explain in more detail in the other reply), my preference is generally to focus the quasiquotable syntaxes on that purpose alone.

---

"In general it feels unnecessarily confusing to include long doc comments in code fragments here. We're already using prose to describe the code before and after."

Sorry, and thanks for the feedback on this.

There's a deeper problem here where my posts can get a bit long, with a lot of asides. :) I thought of those code examples as an aside or a subsection. If you were going to skim over the code, I wanted it to be syntactically easy to skim over the related prose at the same time.

This was something I felt was particularly worth skipping over. Ultimately, the quirks of using tables as syntax are mostly just as easy to put up with as the quirks of using tables for anything else. (I've gone to the trouble to make what I think of as non-quirky tables for Cene, but it's a very elaborate design, and I wouldn't actually expect to see non-quirky tables in Arc.)

Since I was only using these quirks to motivate why `quasiquote` would tend to be focused on code generation, I probably didn't invest enough space to fully explain what the quirks were. I'll try to explain them now....

---

"Those two fragments are the same?"

Whoops, those two fragments were supposed to be '(let i 0 `{,++.i "foo"}) and '(let i 0 `{,++.i "bar"}).

---

"Finally, both your examples seem to be more about side effects in literals? That is a bad idea whether it's a table literal or not, and whether it uses quasiquoting or not. Do you have a different example to show the issue without relying on side-effects?"

I don't know if I'd say the unquoted-key example depends on side effects, but the unquoted-value example very much does. Here it is again:

  (let x 0
    `{"foo" ,(= x 1) "bar" ,(= x 2)}
    x)

The quirk here is that the usual left-to-right evaluation order of Arc can't necessarily be guaranteed for table-based syntax, and if the evaluation order matters for any reason, it must be because of some kind of side effect.

Removing side effects from the language is a great remedy for this, but typically that kind of effort can only go so far. In an untyped language, we usually have to deal with the side effects of run time type errors and nontermination, even if we eliminate everything else:

  `{key1 ,(accidentally-cause-a-run-time-error) key2 ,(loop-forever)}

Even if we commit to programming without any run time errors or nontermination (perhaps enforcing termination with the help of a type system like that of Coq or Agda), we still have some cases like this where the order matters:

  `{key1 ,(compute-with-64TB-of-space) key2 ,(compute-for-800-years)}

A programmer in Arc or Racket might expect this program to reach a space limit relatively soon on machines with less than 64TB of space available, since Arc and Racket guarantee left-to-right evaluation order.

If the programmer actively intends for this program to fail fast, you and I will probably agree they would be better off sequencing the operations a little more explicitly, maybe like this:

  (let val1 (compute-with-64TB-of-space)
    `{key1 ,val1 key2 ,(compute-for-800-years)})

But suppose the programmer doesn't initially realize the program will fail at all. It only crosses their mind when they come back to diagnose bugs in their code, at which point they expect these expressions to evaluate from left to right because that's what Arc and Racket normally guarantee.

That's when they have to realize that the tables in their syntax have gotten in the way of this guarantee.

Simple solution: We clearly document this so people don't expect left-to-right evaluation order in this situation.

Alternative simple solution: We make tables order-preserving so they can be evaluated as expected.

That covers the unquoted-value example.

Now let's consider the unquoted-key example:

  '(let i 0
     `{,++.i "foo" ,++.i "bar"})

In this one, the quirk is that the two occurrences of ,++.i are expressed with the same syntax, so at read time the table would have two identical keys, even though the programmer may expect them to express different behavior.

While it looks like this example depends on side effects (in this case mutation), I'm not so sure it does. Here's an alternative example which shows the same issue without necessarily using side effects:

  '`{,(current-location) "foo" ,(current-location) "bar"}

This involves a hypothetical macro (current-location) which would expand to a string literal describing the filename, line, and column where it was expanded.

Is it a side effect? Maybe not; a file of code that used (current-location) would usually be semantically equivalent to a file that spelled out the same string literal by hand. In a language with separately compiled modules, both files might compile to the same result, which would make that semantic equivalence precise. In such a language, we typically wouldn't have any reason to mind if a module used (current-location) in its source code, even if we preferred to avoid it for some reason in our own code. This makes it into some kind of "safe" side effect, if it's even a side effect at all.

Nevertheless, within a single file, the expression (current-location) could look the same in two places but give different results.

That's where using `unquote` in table keys becomes quirky: The source code of two table keys may look identical (and hence cause a duplicate key conflict at the source code level) even if the programmer thinks of them as being different because they eventually generate different results.

Because of this quirk, the programmer may have to use some kind of workaround, like putting slightly different useless code into each key:

  '`{,(do 1 (current-location)) "foo" ,(do 2 (current-location)) "bar"}

Simple solution: We clearly document this so programmers can use that workaround with confidence. To help make sure programmers are aware of this documentation, we report descriptive errors at read time or at "quasiquotation construction time" if a table would be made with duplicate keys.

Alternative simple solution: We decide never to allow table keys to be unquoted. If a table key appears to be unquoted, the table key actually consists of a list of the form (unquote ...). We still report errors at construction time or read time so programmers don't mistakenly believe `{same-key ,(foo) same-key ,(bar)} will evaluate both expressions (foo) and (bar).

-----

1 point by akkartik 2608 days ago | link

Relying on the order arguments are evaluated in is always going to result in grief. Regardless of programming language. It's one of those noob mistakes that we've all made and learned from. I think we shouldn't be trying to protect people from such mistakes. I'd rather think about how we can get people to make such mistakes faster, so they can more rapidly build up the requisite scar tissue :)

So yes, we should document this, but not just in this particular case of tables. It feels more like something to bring up in the tutorial.

Edit: to be clear, I'm not (yet) supporting Kinnard's original proposal. I haven't fully digested it yet. I'm just responding to your comment in isolation ^_^

-----

2 points by rocketnia 2608 days ago | link

"My immediate reaction is to disagree. A lot of the reason Lisp is so great is that quasiquotation is orthogonal to macros/metaprogramming."

Do you have particular reasons in mind? It sounds like you're reserving those until you understand what I'm saying with my quasiquoted table examples, but I think those examples are mostly incidental to the point I'm making. (I'll clarify them in a separate reply.)

Maybe I can express this again in a different way.

I bet we can at least agree, on a definitional level, that quotation is good for constructing data out of data that's written directly in the code.

I contend quotation is only very useful when it comes to code generation.

If there were ever some kind of data we could quote that we couldn't use as program syntax, then we could just remove the quotation boundary and we'd have a fresh new design for a program syntax, which would bring us back up to parity between quotation and code generation.

In a Lispy language like Arc, usually it's possible to write a macro that acts as a synonym of `quote` itself. That means the set of things that can be passed to macros must be a superset of the things that can be passed to `quote`. Conversely, since all code should be quotable, the set of things that can be passed to `quote` must be a superset of the things passed to macros, so they're precisely the same set.

This time I've made it sound like some abstract property of macro system design, but it doesn't just come up in the design of an axiomatic language core; it comes up in the day-to-day use of the language, too. Quoted lists that don't begin with prefix operators are indented oddly compared to practically all the other lists in a Lispy program. I expect similar issues arise with syntax highlighting. In general, the habits and tooling we use with the language syntax don't treat quasiquoted non-code as a seamless part of the language. So, reserving quasiquotation for actual code generation purposes tends to let it help out in the places it really helps while keeping it out of the places where it causes awkward and distracting editor interactions.

-----

2 points by akkartik 2608 days ago | link

> I bet we can at least agree, on a definitional level, that quotation is good for constructing data out of data that's written directly in the code.

No, I think I disagree there, assuming I'm understanding you correctly.

One common case where I used to use quasiquote was in data migrations, and there was never a macro in sight. I don't precisely remember a real use case involving RSS feeds and user data back in the day, but here's a made-up example.

Say you're running an MMORPG that started out in 2D, but you're now adding a third dimension, starting all players off at an elevation of 0m above sea level. Initially your user data is 2-tuples that look like this:

    (lat long)

Now you want it to look like this:

    (x y z)

..where x is the old latitude and z is the old longitude.

Here are two ways to perform this transform. Using quasiquote:

    (whiler (other-user-data ... (lat long) ...)  (read)  eof
      (prn `(,other-user-data ... (,lat 0.0 ,long) ...)))

And without quasiquote:

    (whiler (other-user-data ... (lat long) ...)  (read)  eof
      (prn (list other-user-data ... (list lat 0.0 long) ...)))

Hopefully that conveys the idea. Maybe the difference doesn't seem large, but imagine the schema gets more complex and more deeply nested. Having lots of `list` and `cons` tokens around is a drag.

I've always thought there's a deep duality between quasiquote and destructuring. Totally independent of macros.

-----

2 points by rocketnia 2608 days ago | link

"No, I think I disagree there, assuming I'm understanding you correctly."

That's interesting.... How would you describe what quotation does, then, if you wouldn't say it lets you write certain data directly in the code?

---

In your data migration example, I notice you're reading and writing the data. You're even putting newlines in it, which suggests you might sometimes view the contents of that written data directly. If you're viewing it directly, it makes sense to want the code that generates it to look similar to what it looks like in that representation.

It's not always feasible for code to resemble data, but since that file is plain text with s-expressions, and since the code that generates it is plain text with s-expressions, it is very possible: First you can pretend they're the exact same language, and then you can use `quasiquote` for code generation.

You might not have thought of it in that order, but I think the cases where `quasiquote` fails to be useful are exactly the cases where it's hard to pretend the generated data is in the same language as the code generating it.

---

"I've always thought there's a deep duality between quasiquote and destructuring."

I've always thought it would be more flexible if the first element of the list were a prefix operation, letting us destructure other things like tables and tagged values.

I built the patmac.arc library to do this:

Current link: https://github.com/rocketnia/lathe/blob/master/arc/patmac.ar...

Posterity link: https://github.com/rocketnia/lathe/blob/7127cec31a9e97d27512...

One of the few things I implemented in patmac.arc was a `quasiquote` pattern that resembles Arc destructuring just like you're talking about.

Racket doesn't need a library like patmac.arc because it already comes with a pattern-matching DSL with user-definable match expanders. One of Racket's built-in match syntaxes is `quasiquote`.

-----

3 points by i4cu 2610 days ago | link

> EDIT: I think tables might be better overall and that the order-necessitating functionality may not be worth the trouble of alists.

I would agree. At least your examples seem to point to that.

I would suggest you flatten your data and add indexes:

  (= data (obj 1 (obj cont "get eggs")
               2 (obj cont "fix toilet")
               3 (obj cont "Build something that works in arc")  
               4 (obj cont "Research Ordered Associative Arrays")))

Then:

  (= todo  '(1 2))
  (= today '(3))
  (= done  '(4))

And use the indexes to lookup the records. You'll notice by doing it this way you're able to control the order and don't need alists to do so.
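
A short sketch of that lookup, using a trimmed-down copy of the table above (two records, chosen only for illustration):

```arc
; Same shape as the data above, trimmed to two records.
(= data (obj 1 (obj cont "get eggs")
             2 (obj cont "fix toilet")))
(= todo '(2 1))                  ; the list decides the order, not the table

(map [(data _) 'cont] todo)      ; ("fix toilet" "get eggs")
```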

Trying to go down the road of deeply nested tables or alists will only lead you to pain and suffering (at least in arc).

Edit: wow, I think arc needs 'sets' too :)

-----

1 point by kinnard 2610 days ago | link

The main issue with alists is that the special syntax doesn't work and the notation is so verbose . . . I don't know if the efficiency issues would even come to bear for me.

EDIT: nesting is not behaving as I expect but that may be a product of my own misunderstanding.

-----

2 points by i4cu 2610 days ago | link

I would be careful not to structure data/logic to accommodate a special syntax.

i.e. while using:

  pipe!todo.1!id

is certainly fancy, writing a function is just as effective and most likely more performant since it doesn't require read de-structuring:

  (fetch pipe todo first id)

So I'm suggesting you shape your syntax usage around your data, not your data around your syntax. You can always write a macro to obtain your desired level of brevity.
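
One way such a function might look, as a sketch only: `fetch` is a hypothetical helper, not something built into Arc, and this version takes quoted keys and numeric indexes rather than the bare `todo first id` shown above (that form would need a macro).

```arc
; Hypothetical helper: follow a path of keys and indexes through
; nested tables and lists, skipping the remaining steps once a
; step yields nil.
(def fetch (x . path)
  (each k path
    (if x (= x (x k))))
  x)

(= pipe (obj todo (list (obj id 1  cont "get eggs")
                        (obj id 23 cont "fix toilet"))))

(fetch pipe 'todo 1 'id)      ; 23
(fetch pipe 'missing 0 'id)   ; nil
```

It relies on tables and lists both being callable in Arc, so one loop covers key and index access. An out-of-range list index is not handled here.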

-----

2 points by kinnard 2610 days ago | link

Shaping syntax around data rather than data around syntax sounds like the move, I'm probably just not used to having that option.

-----

3 points by shawn 2610 days ago | link

Thanks for pointing this out. I’ve pushed a fix. Can you confirm?

-----

2 points by kinnard 2610 days ago | link

Nice. I'm getting this error:

  $ arc
  ac.rkt:347:43: tablist: unbound identifier
  in: tablist
  location...:
   ac.rkt:347:43
  context...:
   raise-unbound-syntax-error
   for-loop
   [repeats 2 more times]
   finish-bodys
   for-loop
   finish-bodys
   lambda-clause-expander
   loop
   [repeats 66 more times]
   module-begin-k
   expand-module16
   expand-capturing-lifts
   expand-single
   temp74_0
   compile16
   temp68_2
   ...

-----

3 points by shawn 2610 days ago | link

Hmm. I know why. My mistake.

Un momento; fix incoming.

The general idea behind the fix is that quoted literals need to be treated as data. Arc now has two new functions for this purpose: quoted and unquoted.

The fact that (quote {a 1}) now becomes a hash table is a little strange. I’m not entirely sure that’s correct behavior. It depends whether (car '({a 1})) should yield a hash table. It seems like it should, which is reified in the code now.

EDIT: Ok, I've force-pushed the fixed commit. (Sorry for the force-push.)

If you `git reset --hard HEAD~1 && git pull` it should work now.

-----

3 points by kinnard 2610 days ago | link

Works great! The only further step that comes to mind is:

  arc> '(pipe "water")
  '(pipe "water")

  arc> "she"
  "she"
  
  arc> 23
  23

  arc> {pipe "water"}
  {pipe "water"}

rather than

  arc> {pipe "water"}
  '#hash((pipe . "water"))

-----

3 points by shawn 2610 days ago | link

I tried improving anarki's repl experience at https://github.com/arclanguage/anarki/pull/145

It sort of went well, but mostly not. :)

Personally, I found I prefer racket's pretty-printing with the horrible hash tables compared to something like {pipe "water" a 1 b 2 c ...} because if you try to evaluate `items` or `profs` you won't have a clue what the data is without pretty-printing.

And it turns out I suck at writing pretty-printers. Someone else do it!

-----

3 points by shawn 2610 days ago | link

Was already working on that. It's clear it's time.

Few minutes. Maybe an hour.

-----

3 points by shawn 2610 days ago | link

Close: https://gist.github.com/shawwn/03b936d37e4cd83ca6652bb03c527...

not bad for precisely 59 minutes.

Brb, transferring to starbucks.

-----

3 points by akkartik 2618 days ago | link

Just to recap my opinion that I gave you over chat: the advantage alists have is that they're simple to support. Along pretty much any other axis, particularly if they grew too long, you'd be better off using some other data structure. Which one would depend on the situation.

-----