"My pet peeve: Lisp's multiple flavors of equality are still with us 50 years later, leading later languages like javascript astray. The superior approach IMO is to use structural equality by default, and offload the semantics of eq to an operator that returns a stable address for a value. In my toy language I call this operator addr. So I'd replace (eq a b) in Common Lisp with something like (equal (addr a) (addr b))."
Nulan, of course, uses an "egal" I wrote for Racket:
; Generic equality predicate, based on egal
; http://home.pipeline.com/~hbaker1/ObjectIdentity.html
(define (is? x y)
(if (number? x)
(if (number? y)
(= x y)
#f)
(if (immutable? x)
; TODO: do I need to check if y is immutable too?
(equal? x y)
; need to use eqv? for characters
(eqv? x y))))
In Nulan it's called "is".
The reason multiple equalities came about is because cons cells were mutable, but were treated like as if they were immutable: thus, most of the time you want to use equal on them, but occasionally you need to use eq.
If you instead have immutable conses and mutable conses, "egal" can correctly distinguish between them. Thus the problem is that we have a datastructure (cons) that attempts to both be mutable and immutable at the same time.
The solution is to properly separate mutability from immutability.
---
"Most programmers don't actually use eq that often. Python and ruby and perl still don't have it. No vaguely insulting muttering has resulted. (Is it really a big enough inconvenience to cause even muttered insult?) Most of the time eq is just an optimization, usually a premature optimization, and even otherwise one that can mostly be delegated to the implementation."
Actually, that's incorrect. Most languages have only eq and no equal: the "==" operator in Ruby/Python compares by object reference.
The reason they don't complain about it is because they never use "equal" because in those languages, every object is mutable.
But in Lisps, we treat cons cells as immutable, hence the issues with eq vs equal. So the reason they don't complain in Ruby/Python is simply because their objects are always mutable, thus eq is always the correct operator.
irb(main):001:0> class A
irb(main):002:1> end
=> nil
irb(main):003:0> a = A.new
=> #<A:0x8f6dd3c>
irb(main):004:0> a == a
=> true
irb(main):005:0> a == a.clone
=> false
irb(main):006:0> x = "abc"
=> "abc"
irb(main):007:0> x == x
=> true
irb(main):008:0> x == x.clone
=> true
irb(main):009:0> (1..10) == 10
=> false
irb(main):010:0> (1..10) === 10
=> true
- Perl (near as I can tell) doesn't have anything in the way of user-definable comparison, but still has separate operators. Granted, I don't think the separation is related to the reference-vs-value thing---I don't do Perl much, so the semantics confuse me:
It'll be hard to appeal to the authority/popularity of most languages, because equality is a tricky concept with many meanings---whether or not it's specifically the address-vs-value (eq-vs-equal) thing.
I like egal fine if we decide to tell the type system about immutability, but that seems like a major design decision. I'm not sure this one feature is sufficient reason.
I don't understand the paper very well (thanks for the pointer). I'm not sure why he cares about identity so much if most of the time we're treating lists as immutable. And if a list is mutable equality is not forever, all it returns is whether these two lists are equal right now.
The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time.
I'm going to go reread it, but so far I think it suffers from being written 10 moore's law generations ago, and complects performance considerations with API design.
"Most languages have only eq and no equal: the == operator in Ruby/Python compares by object reference."
$ python
>>> [0, 1, 2] == [0, 1, 2]
True
I'm pretty sure those two literals turn into objects with different 'identities'.
Most of the time I don't care what 'identity' an object has. If I'm doing weird stuff with assignment and mutable structures I'm happy for the language to push back and make me go back to the straight and narrow asap.
"The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time."
Just as a point of interest, Racket handles this with its built-in types:
I wasn't able to post this earlier, due to the Arc Forum being all wonky, so I'll post it now.
---
"I'm pretty sure those two literals turn into objects with different 'identities'."
Weird, I distinctly remember Python returning false for that... I guess I was wrong about == in Python/Ruby.
Oh, I see, I was thinking about the "is" operator, which does return False, but is rarely used. So you were right: Python uses equal by default for arrays/dictionaries. However, that only applies to the built-in types:
>>> class Foo:
... pass
>>> Foo() == Foo()
False
User-created types can add an __eq__ method to implement "equal" testing, but it's not the default.
JavaScript, however, uses object reference for both "==" and "===":
[1, 2, 3] == [1, 2, 3]
[1, 2, 3] === [1, 2, 3]
Which, as far as I'm concerned, is the correct semantic, because arrays are mutable.
"I think I don't understand why object reference is worth a special operator (besides performance reasons)."
The point of "egal" is that you don't have a special operator: you use the same operator for both mutable and immutable objects. It's wart (and Arc/Common Lisp/Scheme) that has a special operator for reference purposes. Just because you call it "addr" rather than "eq" doesn't make it any less of a special operator. And I doubt you put it into wart for performance reasons.
The "egal" article actually defines equality as "operational equivalence":
Two objects are "operationally equivalent" if and only if there is no way
that they can be distinguished, using ... primitives other than [equality
primitives]. It is guaranteed that objects maintain their operational
identity despite being named by variables or fetched from or stored into
data structures. [Rees86,6.2]
In other words, it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object. But this is not possible with immutable objects.
So my stance is simple: immutable objects should be compared by recursing on their components, like how "iso" works in Arc. Mutable objects should compare by reference equality, because the whole point of mutation is that you can change the object without affecting other objects which have the same subcomponents. That has to do with the conceptual model of how the objects are used and has nothing to do with performance.
The reason the article talks about performance at all is because immutability is much preferred in parellel computing, and most Lisps treat cons cells as immutable, but they're not actually immutable, so the parallel system has to have extra infrastructure to support mutable conses, even though they're almost never mutated. In any case, the point the article is making is valid even if you ignore performance:
There are a several problems with EQUAL. First, it may diverge in the
presence of directed cycles (loops) in one of its arguments, although some
(e.g., [Pacini78]) have suggested more sophisticated predicates capable of
detecting such cycles. Secondly, it is not referentially transparent; two
calls to EQUAL on the "same" (i.e., EQ) arguments can produce different
results at different times. Neither of these possibilities is to be desired
in a programming language primitive equality test because we would like such
a test to always return and we would like object identity to be preserved.
Yet EQUAL is an extremely valuable operation, because the vast majority of
Lisp lists are side-effect free--i.e., "pure". Without side-effects, loops
cannot be constructed and sublists cannot be modified, and within these
restrictions EQUAL becomes a well-defined and useful operation.[8]
This tension between the simplicity of EQ and the usefulness of EQUAL has
led to a great deal of confusion. This confusion has now lasted for 30
years, with additional confusion having been recently added by Common Lisp.
Since neither EQ nor EQUAL has the desired semantics for the multiplicity of
Common Lisp datatypes, Common Lisp added six more--EQL, EQUALP, =, CHAR=,
STRING=, and STRING-EQUAL, yet these also lack the correct semantics.[9]
Scheme continues this confusion by offering the three predicates (eq?, eqv?
and equal?) which are roughly equivalent to Common Lisp's EQ, EQL and
EQUALP.
In other words, if you have mutable cons cells that are almost always treated as immutable, you need both eq and equal. But if you create two datatypes: mutable cons and immutable cons, then you only need "egal". With immutable conses, you don't want to use "eq" on them because they're functional: letting you grab the reference of an immutable cons would be an abstraction leak.
But with mutable conses, you do need "eq", because two cons cells that are equal may not be eq. So, if your language distinguishes between mutable and immutable conses, you only need "egal". This not only prevents abstraction leaks, but it also preserves operational equivalence, which means that if "egal" says that two objects are equal, then it's impossible to distinguish them in any way. This is not true with eq/equal.
As a practical example of why you would want that... look at Racket. It has three kinds of hash tables: hash-eq, hash-equal, and hash-eqv. And it has all kinds of issues if you use a mutable object as a key in an "equal" hash:
Caveat concerning mutable keys: If a key in an equal?-based hash table is
mutated (e.g., a key string is modified with string-set!), then the hash
table’s behavior for insertion and lookup operations becomes unpredictable.
So you, the programmer, have to decide every time you create a hash table which kind of equality it should use. And if you choose wrong, you end up with issues. And it also means you can't mix mutable and immutable keys in a hash table. This is ridiculous. With "egal" this isn't a problem.
Notice that none of this has to do with performance. It has to do with the conceptual model of how objects behave.
Thanks for all that detail! Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy.
I think the crux is this: "..it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object."
This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality.
However I have a lot more respect after your comment for how egal can simplify lots of thorny concurrency issues.
This is a matter where Pauan and I agree on the ideal semantics completely. I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal").
That said, I don't care if this functionality is provided as a two-argument function or as (addr ...).
---
"Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy."
I built Cairntaker on the idealistic notion that every single first-class value is a mutable or immutable weak table. That includes the table keys. Cairntaker does make compromises for performance and FFI, but none that betray this ideal.
Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me.
"This is a matter where Pauan and I agree on the ideal semantics completely."
Quite the rare thing indeed.
---
"I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal")."
I personally haven't. I only recently started to actually care about egal because of Nulan using immutable objects.
I do remember us having discussions before about making "iso" the default equality in Arc, which is what wart does, but I don't recall us talking about cons mutability.
However, I can't find any more than that. Maybe I just took it for granted at that point.
It is the policy I applied in Cairntaker. Cairntaker interns all immutable tables so they can be compared for deep equality even after some of their entries have been garbage-collected. Equality support is essential in Cairntaker since any value can be used as a table key.
"Even when I'm programming normally in Arc or JavaScript, I use tables with immutable list keys all the time. It doesn't seem unusual to me."
You silly. Neither JavaScript nor Arc have immutable lists. Well, I guess you could use Object.freeze on an array in JS, but... even if you did that, it would just serialize the array to a string.
Lol, yep. The way I "implement" immutable lists in practical programming is by not modifying them. I rationalize that their static type would enforce this if only there were a type system. :-p
More specifically...
In Arc I can't really rely on using lists as keys, since (IIRC) Jarc converts table keys to strings using the same behavior as 'write. However, in practice I have used lists of alphanumeric symbols, since those have a predictable 'write representation on Jarc, a predictable Racket 'equal? behavior on pg-Arc, etc.
Similarly, in JavaScript I tend to use Arrays of strings, storing them as their ECMAScript 5 JSON representation.
Yeah I take that back about immutable list keys :)
This is all most interesting. I'm going to keep an eye out for nice programs that exploit structural equality on mutable structures. Y'all should do the same!
"This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality."
Then make the objects immutable. Then equal makes sense. Then the compiler/interpreter knows what your intent is. Then you avoid all these issues. Then you don't even need a distinction between eq/equal: you can just use egal.
This seems to me to be the obvious choice, but it seems we disagree.
Yeah, I didn't fully understand egal when I said I agreed with it if the language distinguishes immutable objects. I want my equality to be more lenient in comparing mutable objects. Even with mutable objects it seems a lot more common that I just care about structural equality rather than identity.
Basically. You could drop down and use Racket's "fixnum?" and "integer?" functions, but Nulan doesn't provide them. If you find any way to distinguish them, I'd like to hear it.
I prefer to view integers as being a subset of floating point, so I prefer to see 1 and 1.0 as identical. Is there a reason not to?
Does Nulan give access to 'write? That would distinguish them.
"I prefer to view integers as being a subset of floating point[...]"
I don't know about that. What happens with very large, but exact, integers?
---
"[...]so I prefer to see 1 and 1.0 as identical. Is there a reason not to?"
Actually I think there is a reason. We're talking about exact and inexact numbers here. If you want to know that a number is exactly 1, then 1.0 doesn't give you enough information. :-p
Personally, I think floating point numbers mostly get in the way. I'd rather treat them as a separate concept, even if it means using a special +. operator like OCaml.
I don't see much need for complex numbers, rational numbers, or negative numbers either, but at least their addition and multiplication are still associative.
"Does Nulan give access to 'write? That would distinguish them."
Not if I change printing to always use floating point, or print exact floating points as integers. :P I actually did something like that with Arc/Nu: I was tired of typing in (/ 1 5) and getting back 1/5, so I changed Arc/Nu to print exact numbers as if they were inexact, so it would print 0.2. This only affects the printing, not the actual math stuff.
---
"I don't know about that. What happens with very large, but exact, integers?"
I dunno, what happens in Racket?
---
"Similarly, in JavaScript I tend to use Arrays of strings, storing them as their ECMAScript 5 JSON representation."
Actually, this is going a bit off topic, since it doesn't especially matter to me whether exact integers are a subset of inexact numbers. I thought I even deleted this part of my reply, but oh well. XD
Looks like the racket docs say[1] Racket's inexact numbers are IEEE floating-point numbers of single or double precision, with double being the default. So they don't preserve integers beyond 53 bits:
Odd, this suggests Racket double-precision floating-point numbers have only 51 bits of mantissa rather than 52, so maybe they actually only preserve integers up to 52 bits...?
I "solved" this issue by simply making all numbers 64-bit floating point. The change was really easy to make. I didn't do it because of this issue, though. I did it because I want to eventually compile to JS, and JS only has 64-bit floating point.
Though, it also makes it easier conceptually to only have to worry about floating point, as compared to floating point + integer + rational + complex + whatever else Racket has. Even if you divide numbers into exact/inexact like Racket does, that's still more complicated than only having a single numeric type.
If you don't identify as a programmer, if that isn't your core strength, if you just program now and then because it's expedient, then treating libraries as services may make more sense. If a major issue pops up you'll need to find more expert help, but you knew that already.
I did. :)
Even so, I think it's also worth considering that libraries also represent learning opportunities for some (like myself). For whatever reason, perhaps because it isn't my core strength, the only approach to programming that I have found to work for me is to continually try to do something. I seem only to be able to truly absorb the finer points when they are encountered in the context of a whole.
For a long time this kept me from programming altogether.
Yeah, sorry the names have been so useless in penetrating my meaning. I'm planning to write a followup without ever using the words 'abstraction', 'library' or 'service'. Libraries don't have to suck, if you know about how they work. My definition of abstraction was that you had to know something about how it was implemented. It's just that in the real world we don't know how 99% of our libraries work, and our entire eco-system encourages that, and library writers don't care to make their implementations easy to understand because, "Who does that, go read about how it works? Just a couple of guys who'd figure it out anyway, even if I gave them no documentation."
To answer your question, the first example that pops to mind is file uploads in arc. Let's consider a parallel universe where arc came with multi-part file upload support, and it worked for you out of the box, and you never came here with questions, and I never delved into how it worked. In that scenario the file-upload features in arc would be a service to me, not an abstraction. But in this universe all those things happened[1], and now file-upload is an abstraction for me. I know I can get into its guts if I find something wrong. I've become a lion when it comes to file upload (http://paulgraham.com/boss.html).
Notice that this would be true even if the code I wrote for it were identical to the code the authors of arc would have written (ha!). This isn't a technical argument but a social one: Programmers should learn about their dependencies. Or put another way: Ask not if something is an abstraction. Ask if it is an abstraction to you.