Nulan, of course, uses an "egal" I wrote for Racket:
; Generic equality predicate, based on egal
(define (is? x y)
(if (number? x)
(if (number? y)
(= x y)
(if (immutable? x)
; TODO: do I need to check if y is immutable too?
(equal? x y)
; need to use eqv? for characters
(eqv? x y))))
In Nulan it's called "is".
The reason multiple equalities came about is because cons cells were mutable, but were treated like as if they were immutable: thus, most of the time you want to use equal on them, but occasionally you need to use eq.
If you instead have immutable conses and mutable conses, "egal" can correctly distinguish between them. Thus the problem is that we have a datastructure (cons) that attempts to both be mutable and immutable at the same time.
The solution is to properly separate mutability from immutability.
"Most programmers don't actually use eq that often. Python and ruby and perl still don't have it. No vaguely insulting muttering has resulted. (Is it really a big enough inconvenience to cause even muttered insult?) Most of the time eq is just an optimization, usually a premature optimization, and even otherwise one that can mostly be delegated to the implementation."
Actually, that's incorrect. Most languages have only eq and no equal: the "==" operator in Ruby/Python compares by object reference.
The reason they don't complain about it is because they never use "equal" because in those languages, every object is mutable.
But in Lisps, we treat cons cells as immutable, hence the issues with eq vs equal. So the reason they don't complain in Ruby/Python is simply because their objects are always mutable, thus eq is always the correct operator.
irb(main):001:0> class A
irb(main):003:0> a = A.new
irb(main):004:0> a == a
irb(main):005:0> a == a.clone
irb(main):006:0> x = "abc"
irb(main):007:0> x == x
irb(main):008:0> x == x.clone
irb(main):009:0> (1..10) == 10
irb(main):010:0> (1..10) === 10
- Perl (near as I can tell) doesn't have anything in the way of user-definable comparison, but still has separate operators. Granted, I don't think the separation is related to the reference-vs-value thing---I don't do Perl much, so the semantics confuse me:
It'll be hard to appeal to the authority/popularity of most languages, because equality is a tricky concept with many meanings---whether or not it's specifically the address-vs-value (eq-vs-equal) thing.
I like egal fine if we decide to tell the type system about immutability, but that seems like a major design decision. I'm not sure this one feature is sufficient reason.
I don't understand the paper very well (thanks for the pointer). I'm not sure why he cares about identity so much if most of the time we're treating lists as immutable. And if a list is mutable equality is not forever, all it returns is whether these two lists are equal right now.
The only new drawback I learned about was that equal can't handle cycles. I hadn't thought of that. It's not hard to fix, but the obvious fix costs performance every single time.
I'm going to go reread it, but so far I think it suffers from being written 10 moore's law generations ago, and complects performance considerations with API design.
"Most languages have only eq and no equal: the == operator in Ruby/Python compares by object reference."
>>> [0, 1, 2] == [0, 1, 2]
I'm pretty sure those two literals turn into objects with different 'identities'.
Most of the time I don't care what 'identity' an object has. If I'm doing weird stuff with assignment and mutable structures I'm happy for the language to push back and make me go back to the straight and narrow asap.
I wasn't able to post this earlier, due to the Arc Forum being all wonky, so I'll post it now.
"I'm pretty sure those two literals turn into objects with different 'identities'."
Weird, I distinctly remember Python returning false for that... I guess I was wrong about == in Python/Ruby.
Oh, I see, I was thinking about the "is" operator, which does return False, but is rarely used. So you were right: Python uses equal by default for arrays/dictionaries. However, that only applies to the built-in types:
>>> class Foo:
>>> Foo() == Foo()
User-created types can add an __eq__ method to implement "equal" testing, but it's not the default.
[1, 2, 3] == [1, 2, 3]
[1, 2, 3] === [1, 2, 3]
Which, as far as I'm concerned, is the correct semantic, because arrays are mutable.
"I think I don't understand why object reference is worth a special operator (besides performance reasons)."
The point of "egal" is that you don't have a special operator: you use the same operator for both mutable and immutable objects. It's wart (and Arc/Common Lisp/Scheme) that has a special operator for reference purposes. Just because you call it "addr" rather than "eq" doesn't make it any less of a special operator. And I doubt you put it into wart for performance reasons.
The "egal" article actually defines equality as "operational equivalence":
Two objects are "operationally equivalent" if and only if there is no way
that they can be distinguished, using ... primitives other than [equality
primitives]. It is guaranteed that objects maintain their operational
identity despite being named by variables or fetched from or stored into
data structures. [Rees86,6.2]
In other words, it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object. But this is not possible with immutable objects.
So my stance is simple: immutable objects should be compared by recursing on their components, like how "iso" works in Arc. Mutable objects should compare by reference equality, because the whole point of mutation is that you can change the object without affecting other objects which have the same subcomponents. That has to do with the conceptual model of how the objects are used and has nothing to do with performance.
The reason the article talks about performance at all is because immutability is much preferred in parellel computing, and most Lisps treat cons cells as immutable, but they're not actually immutable, so the parallel system has to have extra infrastructure to support mutable conses, even though they're almost never mutated. In any case, the point the article is making is valid even if you ignore performance:
There are a several problems with EQUAL. First, it may diverge in the
presence of directed cycles (loops) in one of its arguments, although some
(e.g., [Pacini78]) have suggested more sophisticated predicates capable of
detecting such cycles. Secondly, it is not referentially transparent; two
calls to EQUAL on the "same" (i.e., EQ) arguments can produce different
results at different times. Neither of these possibilities is to be desired
in a programming language primitive equality test because we would like such
a test to always return and we would like object identity to be preserved.
Yet EQUAL is an extremely valuable operation, because the vast majority of
Lisp lists are side-effect free--i.e., "pure". Without side-effects, loops
cannot be constructed and sublists cannot be modified, and within these
restrictions EQUAL becomes a well-defined and useful operation.
This tension between the simplicity of EQ and the usefulness of EQUAL has
led to a great deal of confusion. This confusion has now lasted for 30
years, with additional confusion having been recently added by Common Lisp.
Since neither EQ nor EQUAL has the desired semantics for the multiplicity of
Common Lisp datatypes, Common Lisp added six more--EQL, EQUALP, =, CHAR=,
STRING=, and STRING-EQUAL, yet these also lack the correct semantics.
Scheme continues this confusion by offering the three predicates (eq?, eqv?
and equal?) which are roughly equivalent to Common Lisp's EQ, EQL and
In other words, if you have mutable cons cells that are almost always treated as immutable, you need both eq and equal. But if you create two datatypes: mutable cons and immutable cons, then you only need "egal". With immutable conses, you don't want to use "eq" on them because they're functional: letting you grab the reference of an immutable cons would be an abstraction leak.
But with mutable conses, you do need "eq", because two cons cells that are equal may not be eq. So, if your language distinguishes between mutable and immutable conses, you only need "egal". This not only prevents abstraction leaks, but it also preserves operational equivalence, which means that if "egal" says that two objects are equal, then it's impossible to distinguish them in any way. This is not true with eq/equal.
As a practical example of why you would want that... look at Racket. It has three kinds of hash tables: hash-eq, hash-equal, and hash-eqv. And it has all kinds of issues if you use a mutable object as a key in an "equal" hash:
Caveat concerning mutable keys: If a key in an equal?-based hash table is
mutated (e.g., a key string is modified with string-set!), then the hash
table’s behavior for insertion and lookup operations becomes unpredictable.
So you, the programmer, have to decide every time you create a hash table which kind of equality it should use. And if you choose wrong, you end up with issues. And it also means you can't mix mutable and immutable keys in a hash table. This is ridiculous. With "egal" this isn't a problem.
Notice that none of this has to do with performance. It has to do with the conceptual model of how objects behave.
Thanks for all that detail! Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy.
I think the crux is this: "..it is possible to distinguish two mutable objects by mutating one of them and checking to see if the same change occurs in the other object."
This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality.
However I have a lot more respect after your comment for how egal can simplify lots of thorny concurrency issues.
This is a matter where Pauan and I agree on the ideal semantics completely. I seem to remember us advocating egal a few other times (though I don't think I've ever called it "egal").
That said, I don't care if this functionality is provided as a two-argument function or as (addr ...).
"Yeah, using lists and hashes as hash keys is kinda crazy. And mutating such hash keys is utterly crazy."
I built Cairntaker on the idealistic notion that every single first-class value is a mutable or immutable weak table. That includes the table keys. Cairntaker does make compromises for performance and FFI, but none that betray this ideal.
However, I can't find any more than that. Maybe I just took it for granted at that point.
It is the policy I applied in Cairntaker. Cairntaker interns all immutable tables so they can be compared for deep equality even after some of their entries have been garbage-collected. Equality support is essential in Cairntaker since any value can be used as a table key.
Lol, yep. The way I "implement" immutable lists in practical programming is by not modifying them. I rationalize that their static type would enforce this if only there were a type system. :-p
In Arc I can't really rely on using lists as keys, since (IIRC) Jarc converts table keys to strings using the same behavior as 'write. However, in practice I have used lists of alphanumeric symbols, since those have a predictable 'write representation on Jarc, a predictable Racket 'equal? behavior on pg-Arc, etc.
"This still feels like a crazy abstraction. If two mutable objects look the same I want equality to say they are the same by default, not go into contortions to cover its ass because I might choose to mutate them and I might then care about reference equality."
Then make the objects immutable. Then equal makes sense. Then the compiler/interpreter knows what your intent is. Then you avoid all these issues. Then you don't even need a distinction between eq/equal: you can just use egal.
This seems to me to be the obvious choice, but it seems we disagree.
Yeah, I didn't fully understand egal when I said I agreed with it if the language distinguishes immutable objects. I want my equality to be more lenient in comparing mutable objects. Even with mutable objects it seems a lot more common that I just care about structural equality rather than identity.
"Does Nulan give access to 'write? That would distinguish them."
Not if I change printing to always use floating point, or print exact floating points as integers. :P I actually did something like that with Arc/Nu: I was tired of typing in (/ 1 5) and getting back 1/5, so I changed Arc/Nu to print exact numbers as if they were inexact, so it would print 0.2. This only affects the printing, not the actual math stuff.
"I don't know about that. What happens with very large, but exact, integers?"
I dunno, what happens in Racket?
Actually, this is going a bit off topic, since it doesn't especially matter to me whether exact integers are a subset of inexact numbers. I thought I even deleted this part of my reply, but oh well. XD
Looks like the racket docs say Racket's inexact numbers are IEEE floating-point numbers of single or double precision, with double being the default. So they don't preserve integers beyond 53 bits:
I "solved" this issue by simply making all numbers 64-bit floating point. The change was really easy to make. I didn't do it because of this issue, though. I did it because I want to eventually compile to JS, and JS only has 64-bit floating point.
Though, it also makes it easier conceptually to only have to worry about floating point, as compared to floating point + integer + rational + complex + whatever else Racket has. Even if you divide numbers into exact/inexact like Racket does, that's still more complicated than only having a single numeric type.