Arc Forum | "I can't believe this never came up working with json"I think order-preserving t...

Arc Forum

3 points by rocketnia 2352 days ago | link | parent

"I can't believe this never came up working with json"

I think order-preserving tables are pretty useful for JSON -- particularly to help ensure implementation details aren't leaking even in a distributed system where JSON is being passed through multiple parsers and serializers -- but it's worth noting that even JSON objects are specified to be unordered and are commonly treated as such. (RFC 8259: "An object is an unordered collection ...")

Even JavaScript's standard JSON.parse(...) doesn't preserve order the way you might like it to. Here's what I get in Firefox right now:

  > JSON.stringify(JSON.parse("{\"a\": 10, \"1\": 20}"))
  "{\"1\":20,\"a\":10}"

This is because JavaScript objects historically didn't preserve any particular iteration order, and although they have more consistent cross-browser behavior these days, they seem to have settled on some rules like iterating through numeric properties like "1" before other properties like "a". (From what I've found, I see people saying this behavior is a standard as of ES2015, but I don't see it in the ES2018 spec.)

JavaScript Maps are a different story; they're a newer addition to the language, and they're strictly specified to preserve insertion order. Other languages use dictionaries which preserve order, including Groovy (where [a: 1, b: 2] is a LinkedHashMap literal), Ruby, and... you've already found details on Python.

Even in Groovy, JavaScript (Maps), and Ruby, the only way to observe the order of their collections is to iterate from the beginning, and the only way to influence it is to insert entries at the end. This means any substantial interactions with the order of these collections will be just about as inefficient and inconvenient as if we had converted the whole collection to an association list, processed it that way, and converted it back. Essentially, it seems the order is preserved only to prevent programs from having language-implementation-dependent bugs, and maybe to help with recognizing values in debug output, not because it's a useful for programming.

Every one of these languages makes it easy to communicate the intent of an ordered collection by using lists or arrays of some sort. I'm pretty sure I've seen this lead to association lists even in JSON, particularly for things like database query results:

  [
    {"id":..., "name": ..., "occupation": ...},
    {"id":..., "name": ..., "occupation": ...}
  ]

When the order matters, I think this is the kind of approach to prefer, even if it takes a few more abstractions to make key lookup convenient.

1 point by kinnard 2352 days ago | link

What do you think of this: http://arclanguage.org/item?id=21066

-----

2 points by rocketnia 2351 days ago | link

It could be me, but I have trouble imagining any situation where I would need this syntax.

- If I'm not doing both kinds of lookup several times in the same part of the code, the lookup styles don't both need to be concise at the same time. Different parts of the code can convert to different representations that are concise to use in that context.

- If it's the root layer of a data structure, then I can simply use two different local variables to refer to that data, one of which always treats it as a list and one of which always treats it as a table.

- Unless I'm processing some specific indices, I'll usually want to iterate through the whole data structure, so I usually wouldn't be using any indexing operations in the first place.

- If I know what specific indexes I want at development time, then I would usually use an unordered table (for easy indexing) or a fixed-length list (for easy destrucuring).

- If I'm performing any two lookups for the same reason, they can usually use the same expression in the code.

So I would have to be writing some code where I'm performing several different lookups of specific ordered indexes and specific chosen keyed indexes, all of which are for distinct reasons I know at development time but few of which are under indexes I know at development time. Moreover, each lookup must be two or more layers deep into a nested data structure.

This seems to me like a very specific situation. I suppose maybe it might come up more often in data-driven applications that have many scripted extensions which are specialized to certain configurations of data, but it seems to me even those would rarely care about specific ordered indexes.

Even what you've described about your application makes it sound like you only need to look things up by ordered indexes when the user wants to move an item from one stack to the next (or to the first). If that one part of your code is verbose, I recommend not worrying about it. If the rest of your code is verbose, how about converting your alists to tables whenever you enter that code?

-----

3 points by rocketnia 2351 days ago | link

I'm sorry, you have a very clear idea in mind of the data structure you want, and it's very reasonable to want to know how to build it in a way that gives you convenient syntax for it.

I think I'm just trying to encourage you to get to know other techniques because I think that's the easiest way to start. Building a custom data structure in Arc is not something many people go to the trouble to do, so it's a bit hard to come up with a good example.

I don't think you need any new ssyntaxes for this new data structure.

Say you have an ordered dictionary value `db` and you want convenient syntaxes to look up items by index (where 0 should give you the first item) and by key (where 0 should give you the item with key 0). One thing you might be able to do is this:

  db!by-key.0
  db!by-pos.0

To get these lookups working, you only need to use Anarki's `defcall`. Here's a stub that might get you started:

  (defcall ordered-dict (self val)
    (case val
      by-key (fn (key) ...)
      by-pos (fn (pos) ...)
      (err "Expected by-key or by-pos")))

Unfortunately, this doesn't allow you to assign:

  (= db!by-key.0 "val")
  (= db!by-pos.0 "val")

To get these to work, you must know a little bit about how Arc implements assigment. The above two calls are essentially turn into this:

  (sref db!by-key "val" 0)
  (sref db!by-pos "val" 0)

This means you need to extend `sref` using Anarki's `defextend` to make these work.

But we've defined db!by-key and db!by-pos to be procedures that only know how to get the value, not how to set it. We need them to return something that knows how to do both.

So let's define another type, 'getset, which contains a getter function and a setter function.

  (def make-getset (get set)
    (annotate 'getset (list get set)))
  
  (def getset-get (gs . args)
    (apply rep.gs.0 args))
  
  (def getset-set (gs val . args)
    (apply rep.gs.1 val args))

We need it to be callable and settable:

  (defcall getset (self . args)
    (apply getset-get self args))

  (defextend sref (self val . args) (isa self 'getset)
    (apply getset-set self val args)))

Now we can refactor our (defcall ordered-dict ...) definition to put in implementations for setting:

  (defcall ordered-dict (self val)
    (case val
      
      by-key
      (make-getset
        (fn (key) ...)
        (fn (key val) ...))
      
      by-pos
      (make-getset
        (fn (pos) ...)
        (fn (pos val) ...))
      
      (err "Expected by-key or by-pos")))

You should pick some representation to use (annotate 'ordered-dict ...) with (or whatever other name you like for this type) so you can implement all four of these methods.

Note that we don't need to do (defextend sref ...) for ordered-dict values here because the `sref` calls don't ever get passed the table directly. You would only need that if you wanted (= db!by-key "val") or (= db.0 "val") to do something, but those look like buggy code to me.

So far, so good, but you probably want to do more with ordered dictionaries than getting and setting.

There's at least one more utility you might like to `defextend` in Anarki for use with ordered dictionaries: `walk`. A few Anarki utilities like `each` internally call `walk` to iterate over collections, so if you extend `walk`, you can get several Anarki utilities to work with ordered dictionaries all at once.

You may also want to make various other utilities for coercing between ordered dictionatires and other Anarki values, such as tables and alists. That way you can easily use Anarki's existing table and alist utilities when they work for your situation.

Note that this approach doesn't make it possible to use the {...} curly brace syntax you and shawn have been talking about. Since there are potentially many kinds of tables, I'm not sure whether giving them all read syntaxes really makes sense; we might start getting into some obscure punctuation. :-p

I hope that's a little bit more helpful advice for your situation. I tried to write up something like this the other day and got carried away implementing it all myself, but then I realized it might not be what you really wanted to use here. It didn't even take an approach as convenient to use as this one. Maybe this can give you enough of a starting point to reach a point you're happy with.

-----

3 points by i4cu 2351 days ago | link

What about using '#' for by-pos and '?' for by-key via the reader?

  db#0
  db?0

-----