Arc Forumnew | comments | leaders | submitlogin
Tagged unions on Anarki
5 points by absz 6002 days ago | 18 comments
While working through Essentials of Programming Languages (a wonderful textbook on programming languages and implementing interpreters by Daniel Friedman, Mitchell Wand, and Christopher Haynes), I saw tagged unions/discriminated unions/variant records/algebraic data types[1] for the first time and decided that they were excellent. (See below for an explanation of what they are.) I've since dabbled with Haskell and seen them yet more there (they're Haskell's data Type = Var1 A B | Var2 C). At any rate, I then decided to implement them in Arc. The implementation was pushed to Anarki as lib/tagged-union.arc; it requires Anarki's Arc-side ssyntax extension.

A tagged union, for those of you not familiar with them, is a data type which has different "variants," each of which is constructed by a different function. For instance, here's a type you use every day: the (proper) list, with constructor functions cons and (fn () nil). The parameters of the constructor functions are typed (or, to be precise, validated by given predicates). Using my library, this would be written (using different names)

  (tunion list
    lcons (car object)
          (cdr list?)
    lnil)
so that (lcons 'a (lnil)) is the analog of (cons 'a nil) or '(a). Note the syntax here:

  (tunion TYPE-NAME
    VARIANT-1 (ELEMENT-1-1 PREDICATE-1-1)
              (ELEMENT-1-2 PREDICATE-1-2)
              ...
              (ELEMENT-1-M PREDICATE-1-M)
    ...
    VARIANT-N (ELEMENT-2-1 PREDICATE-1-1)
              (ELEMENT-2-2 PREDICATE-1-2)
              ...
              (ELEMENT-2-P PREDICATE-1-P))
TYPE-NAME is the type you are declaring. You can declare any number of variants, and they can have any number (including 0) of fields/members/arguments. When constructing a variant, each element must satisfy the given predicate; thus, (lcons 'a 'b) would return nil, since 'b doesn't satisfy the predicate list?[2] For some more examples, see the comment at the top of lib/tagged-union.arc.

When tunion is called, it populates the global namespace with the functions VARIANT-1...N, and TYPE-NAME-VARIANT-1...N, which are the constructors; it redefines string so that the objects can be stringified (in a readable but ugly manner); and it redefines sref so that you can write things like (= (my-list 'car) 3). defcall is used so that you can also write things like (= x (my-list 'car)). Also note that all tagged union objects satisfy (isa tu 'TYPE-NAME) and (isa (rep tu) 'VARIANT-NAME).

The other macro provided by lib/tagged-union.arc is tcase, which does matching based on tagged union types. An example:

  (tcase my-list list
    lcons (do (prn "car: " (string car)) (prn "cdr: " (string cdr)))
    lnil  (prn "nil")
          (w/stdout (stderr) (prn "Not a list!")))
If my-list is not list (or a variant of list that was not covered, though that doesn't apply here), then the last statement will run. Otherwise, if my-list was created by lnil, then the corresponding statement will run, and if it was created by lcons, that statement will run; the members of that variant are bound to their names in that statement, so (prn ... car) won't print "#<procedure:.../arc-wiki/ac.scm:700:11>", but will instead print what you want. The other form of syntax for tcase is

  (tcase var
    (list
      lnil (prn "nil")
           (w/stdout (stderr) (prn "Non-empty list.")))
    (maybe ; Another tagged union.
      just    (prn value)
      nothing (prn "Nothing at all."))
    else
      (w/stdout (stderr) (prn "Unrecognized type!")))
Here, if var is a list, then things run as though you had written

  (tcase var list
    lnil (prn "nil")
         (w/stdout (stderr) (prn "Non-empty list.")))
, and similarly if var is a maybe. If var is neither of these things, then the statement after the else will be run.

Sorry that this was so long... at any rate, I hope that someone finds this useful!

[1]: As far as I can tell, these are all different names for the same thing; if this isn't right, please correct me!

[2]: Predicates of the form type? are from the Arc-side ssyntax package, and mean [isa _ 'type]. You can also write that directly, or you can write 'type (which tunion will convert into [isa _ 'type]).



2 points by rkts 6002 days ago | link

I definitely think this is a good idea. We've discussed this before, of course:

http://arclanguage.org/item?id=3471

The major difference between my version and yours is that you require each constructor to belong to one "union" type, whereas in my version each constructor forms its own type. The latter approach seems more Arc-like to me. It's simpler, and in a dynamic language I don't think you should be forced to stuff constructors into fixed categories. (I'm not saying that such categories are useless, just that variant tags shouldn't be inextricably tied to them.)

You may want to look at OCaml's polymorphic variants, which are closer to what I had in mind for Arc:

http://caml.inria.fr/pub/docs/manual-ocaml/manual006.html#ht...

The auto-binding of names to fields seems cool, but I'd need to practice it to see if it was a good strategy overall. You've already managed to shadow car and cdr, which could cause a lot of code to fail mysteriously (think macroexpansions, e.g. each). Also, if the user binds the names explicitly, you get destructuring for free.

Sorry if this is incoherent; I've been up all night.

-----

1 point by absz 6002 days ago | link

I remember that discussion; in fact, I had a response, http://arclanguage.org/item?id=3518, in which I proposed this (and I might have been working on this at the time).

The clearest advantage of distinct types is in things like binary trees:

  (tunion binary-tree
    branch (left  binary-tree?)
           (right binary-tree?)
    leaf   (value object))
If you just had

  (vtype branch left right)
  (vtype leaf   value)
you can't quickly tell what's a binary tree and what isn't in the same way, and you lose the invariant that a branch's left and right are both binary trees. At the same time, something more like vtype would remove

  (tunion my-type
    my-type (foo pred)
            (bar int?))
. Actually, if you had something like

  (mac vtype (type . arguments)
    `(tunion ,type ,type ,@arguments))
you could do this:

  (vtype variant1 (x pred))
  (vtype variant2 (a int?) (b object))
  
  (tcase var
    (variant1
      variant1 (do-something var))
    (variant2
      variant2 (do-something-else var))
    else
      (fail))
And you should be able to write a macro that will convert something nicer to that form, e.g.

  (vcase var
    variant1 (do-something var)
    variant2 (do-something-else var)
             (fail))
. I'll work on that and push it to Anarki.

I'm not convinced on the auto-binding; I implemented it that way because the resulting syntax was nice. It might make sense to have an option to work without auto-binding of names, but maybe it won't actually be an issue in practice. car and cdr are a bit atypical because (I hope) you won't be redefining a list type :)

-----

1 point by absz 6001 days ago | link

See http://arclanguage.org/item?id=7395: is this like what you were talking about?

-----

1 point by rkts 5999 days ago | link

Yep. I wouldn't call it elegant, but it works.

I feel like there ought to be a better way to do this--something that's less complex but handles more cases. In particular I'd like to be able to alternate OO and pattern-matching styles on the same datatypes. However, I haven't been able to come up with a solution that really satisfies me.

One other thing: wouldn't it be better to raise an error when an invariant is violated, instead of returning nil?

-----

1 point by absz 5999 days ago | link

You can use OO structure-access semantics with tagged unions: (tu 'slot-name) and (= (tu 'slot-name) new-value). That's not the interesting part of OO, of course :)

I thought about returning an error, but Arc seems to be a little calmer about such things: (is (car nil) (cdr nil) nil), for instance. Of course, (car 'sym) errors. ::shrug:: I'm not sure about this, and it might change.

-----

3 points by rkts 5999 days ago | link

The purpose of a type system is to help find errors by failing early instead of letting bugs propagate through a system and show up in mysterious ways. Right now, you're actually making debugging harder because you're just wrapping bugs inside other bugs. Instead of figuring out why his binary tree turned into a cheeseburger, a programmer now has to determine (a) why his binary tree is nil (or why mutating it seems to have no effect), (b) which invariant he violated, and (c) what error caused the invariant to be violated.

Regarding (car nil) and (cdr nil), these return nil because it's sometimes convenient. For instance, here's a function that collects every other item in a list:

  (def foo (xs)
    (if xs
      (cons car.xs (foo cddr.xs))))
If (cdr nil) raised an error, we would have to check the cdr before we could recur:

  (def foo (xs)
    (if xs
      (cons car.xs (if cdr.xs (foo cddr.xs)))))
Not a huge difference, obviously, but a difference nevertheless. I can't see any comparable reason to return nil in your case.

-----

1 point by absz 5999 days ago | link

Fair enough, you've convinced me.

EDIT: Creation, reading (defcall), and writing (sref) all give errors if invariants are violated/nonexistent keys are accessed. Is there anything I missed?

-----

1 point by absz 6001 days ago | link

Update: I (think I)[1] added rkts's suggestion of "solitary" tagged unions/variants, which act very similarly to plain structures except with the availability of a vcase macro (very much like tcase). They are defined with

  (vtype name (arg1 pred1) arg2 ... (argN predN))
, where an argument without a predicate is implicitly (arg object) (you can't do this in tcase without extra parentheses, though). vcase works similarly to tcase:

  (vcase var
    name1 (expr1)
    name2 (expr2)
    ...
    nameN (exprN)
          (optional-error-condition))
Note that unlike in tcase, the names aren't connected, but are merely defined by whichever vcase (or a tcase with a variant named the same thing as the union, which is what a vcase type is).

This required fixing a bug where you couldn't have a variant with the same name as its tagged union (because you then had (annotate NAME (annotate NAME ...)), and that is equivalent to (annotate NAME ...)). This means that you can do that now, but that *(~isa (rep tu) 'variant-type)&.

[1]: That is, I added something which I think is what rkts suggested; rkts will probably disabuse me of this notion rather quickly if I got something wrong :)

-----

1 point by almkglor 6001 days ago | link

IIRC (annotate 'foo (annotate 'foo obj)) is different from (annotate 'foo obj); each annotate adds another layer of (rep ...) to peel off.

http://arclanguage.com/item?id=3692

-----

2 points by absz 6001 days ago | link

Yes, but. That only holds if in (annotate t1 (annotate t2 data)), we have (isnt t1 t2) (there was a thread about this somewhere, but I can't find it). Try it!

  arc> (annotate 'foo (annotate 'bar obj))
  #3(tagged foo #3(tagged bar #3(tagged mac #<procedure>)))
  arc> (annotate 'foo obj)
  #3(tagged foo #3(tagged mac #<procedure>))
  arc> (annotate 'foo (annotate 'foo obj))
  #3(tagged foo #3(tagged mac #<procedure>))
I was surprised too; that's why this was a bug I had to fix. I suppose this behaviour makes sense, but it does break the second of the two identities

  (is (type (annotate t r)) t)
  (is (rep  (annotate t r)) r)
for certain values of r. Ah well.

-----

1 point by almkglor 6001 days ago | link

Huh. Have you tried it on PG's ArcN? It might have been me hacking this onto Anarki (I have multiple personalities. No, don't believe what I just said, that wasn't me. LOL). Don't have access to an arc right now, sorry ^_^

-----

2 points by almkglor 6001 days ago | link

Ah, it seems you're right ^^ From canonical arc:

  (define (ar-tag type rep)
    (cond ((eqv? (ar-type rep) type) rep)
          (#t (vector 'tagged type rep))))

  (xdef 'annotate ar-tag)

-----

1 point by almkglor 6002 days ago | link

Looks good, although probably needs more examples, maybe some actual use cases.

On the implementation side... I notice you use (apply list 'do ...). I suspect you can probably make it a very big `(do ...) form, although you may have felt that it was getting bit complicated I guess ^^;.

-----

1 point by absz 6002 days ago | link

For use cases: see Haskell code? :P I wrote this because I really liked them in other languages and in EOPL, and thought they might come in handy. If I come across somewhere in my Arc code where I use them (I haven't written too much new Arc recently) I'll report out.

I also wrote most of this a while ago (I think I started in March), and my coding has improved quite a bit since then, so the implementation could probably get cleaned up... :)

-----

2 points by almkglor 6001 days ago | link

> For use cases: see Haskell code? :P

LOL. The 'maybe type looks awfully familiar... ^^

> If I come across somewhere in my Arc code where I use them (I haven't written too much new Arc recently) I'll report out.

Please do ^^. As an aside, it may be possible to transform the AST data structures in arc2c to tagged unions; we could transform the (if (alam ast) ... (alit ast) ...) to tcase forms. Care to try? It strikes me as a possible demonstration of your code, and may be useful to shake out bugs.

-----

1 point by absz 6001 days ago | link

I like maybe :) Also, it's simple and good for testing (e.g. it has a zero-ary type).

The arc2c thing sounds sensible---I have to fix a bug in tagged-union.arc first (you can't currently have a constructor with the same name as the datatype), but then I'll try to work on it.

-----

2 points by almkglor 6001 days ago | link

Well, in this case the data type would be ast, and the constructors would be lam, lit, quote, etc.

-----

2 points by absz 6001 days ago | link

Right---there was something similar in EOPL, and I have something similar in a toy Lisp interpreter (in Haskell) I'm working on.

-----