Tales from the Evil Empire

Bertrand Le Roy's blog

When (not) to override Equals?

In .NET, you can overload operators as well as override the default implementation of the Equals method. While this looks like a nice feature (and it is, if you know what you're doing), you should be very careful because it can have unexpected repercussions. First, read this. Then this.
One unexpected effect of overriding Equals is that if you do it, you should also override GetHashCode, if only because the Hashtable implementation relies on both being in sync for objects used as keys.
Your implementation should respect three rules:
  1. Two objects for which Equals returns true should have the same hash code.
  2. The hash codes for instances of a class should be randomly distributed, that is, spread as evenly as possible over the range of int.
  3. If you get a hash code for an object and then modify the object's properties, the hash code should remain the same (just like the song).
While the first requirement ensures consistency if your class instances are used as keys in a hashtable, the second ensures good performance of the hashtable.
The third requirement has an annoying consequence: the properties that you use to compute the hash must be immutable (i.e., they must be set from a constructor only and be impossible to set at any time after that).
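As a minimal sketch of a type that satisfies all three rules, consider the following. The Point class and the way the two fields are combined into a hash are assumptions made for illustration, not a prescribed implementation.

```csharp
// Hypothetical immutable type: x and y are readonly and can only be set
// from the constructor, so the hash code can never change (rule 3).
public sealed class Point
{
    private readonly int x;
    private readonly int y;

    public Point(int x, int y)
    {
        this.x = x;
        this.y = y;
    }

    public int X { get { return x; } }
    public int Y { get { return y; } }

    public override bool Equals(object obj)
    {
        Point other = obj as Point;
        return other != null && x == other.x && y == other.y;
    }

    public override int GetHashCode()
    {
        // Uses exactly the fields that Equals compares, so equal points
        // always get the same hash code (rule 1). Mixing both fields helps
        // spread the values (rule 2); the multiplier is an arbitrary choice.
        unchecked
        {
            return (x * 397) ^ y;
        }
    }
}
```

Two separately constructed instances such as new Point(1, 2) compare as equal and return the same hash code, so either one can be used to look up an entry stored under the other in a Hashtable.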
So what should you do if your Equals implementation involves mutable properties? Well, you could exclude these from the computation of the hash and only take into account the immutable ones, but in doing so, you're compromising requirement number 2.
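To see why a hash based on a mutable property is a problem in the first place, here is a small sketch. MutableKey and LostKeyDemo are hypothetical names used only for this illustration.

```csharp
using System.Collections;

// Hypothetical mutable type whose hash code depends on a settable property.
public class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && Id == other.Id;
    }

    public override int GetHashCode()
    {
        return Id;
    }
}

public static class LostKeyDemo
{
    // Returns true if the key can still be found after it has been mutated.
    public static bool FoundAfterMutation()
    {
        Hashtable table = new Hashtable();
        MutableKey key = new MutableKey();
        key.Id = 1;
        table[key] = "value";

        // The hash code changes while the key is stored in the table.
        key.Id = 2;

        // The lookup uses the new hash code, which no longer matches the
        // one recorded when the entry was added.
        return table.ContainsKey(key);
    }
}
```

The entry is still in the table (its Count remains 1), but the lookup with the very same object instance fails, which is the kind of unpredictable behavior the third rule is meant to prevent.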
The answer is that you should actually never override Equals on a mutable type. You should instead create a ContentsEquals (or whatever name you may choose) method to compare the instances and let Equals do its default reference comparison. Don't touch GetHashCode in this case.
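A rough sketch of that approach could look like this. The Customer class and its properties are made up for the example; only the ContentsEquals name comes from the suggestion above.

```csharp
// Hypothetical mutable type: Equals and GetHashCode are deliberately
// left alone, so they keep their default reference-based behavior.
public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }

    // Compares the current state of two instances.
    public bool ContentsEquals(Customer other)
    {
        return other != null
            && string.Equals(Name, other.Name)
            && string.Equals(City, other.City);
    }
}
```

Two distinct Customer instances with the same Name and City are then ContentsEquals but not Equals, and changing a property never changes the hash code of an instance.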
 
Update: It may seem reasonable to say that it's ok to override Equals and GetHashCode on mutable objects if you document clearly that once the object has been used as the key in a hashtable, it should not be changed and that if it is, unpredictable things can happen. The problem with that, though, is that it's not easily discoverable (documentation only). Thus, it is better to avoid overriding them altogether on mutable objects.

Comments

Sigurdur G. Gunnarsson said:

Hi,

This is something that has been troubling me for a while. While most of this sounds right to me, there are some things that I don't like.
I use quite a lot of custom reference types, and some are mutable. I use them in collections and want to be able to sort them. I do that by implementing IComparable and the collection classes sort them via that interface.
Well, the rules state that if you implement IComparable you must implement Equals and the operators ==, !=, < and >.
Doing all that I end up with reference types that don't match up with the rules you state.

Am I missing something?
# December 16, 2004 5:41 AM

Bertrand Le Roy said:

Sigurdur, that's a very good question. For value types, well, they should really be immutable anyway, but for reference types that must be IComparable, in short, for the moment, I don't know exactly. I've asked the question to some CLR gurus and I'm waiting for an answer. Stay tuned.
What I can recommend for the moment is that you base your implementations of IComparable, Equals and GetHashCode on immutable properties if you can (even if the rest of the type is not immutable).
As a last resort solution, if you don't need to use these types as keys in a hash table (which is the main reason to use the hash), you could throw from GetHashCode.
# December 16, 2004 2:15 PM

Bertrand Le Roy said:

OK, so the gurus agree that IComparable should really only be applied to immutable objects. So if you implement it on mutable objects, you must be breaking something. What you're breaking depends on what requirement you sacrificed in your implementation of Equals and GetHashCode. The documentation may be updated in the future to reflect that.
One other thing you could do is provide a method that enables the locking of the objects before you use them as keys in a collection object where the hash and Equals must be in sync (for example Hashtable), but that is a half measure as the users of your class must know about this problem and agree to lock the objects before using them as keys. Throwing from GetHashCode if the object is not locked could help prevent the problem.
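To make the locking idea more concrete, here is a rough sketch. The LockableKey class, its Lock method and the choice to also make the setter throw once locked are all assumptions for illustration, not an established pattern.

```csharp
using System;

// Hypothetical key type that must be locked before it can be hashed.
public class LockableKey
{
    private int id;
    private bool locked;

    public LockableKey(int id)
    {
        this.id = id;
    }

    public bool IsLocked { get { return locked; } }

    public int Id
    {
        get { return id; }
        set
        {
            // Assumption: once locked, the object refuses further changes.
            if (locked)
                throw new InvalidOperationException("The key is locked and cannot be modified.");
            id = value;
        }
    }

    // Call this before using the object as a key.
    public void Lock()
    {
        locked = true;
    }

    public override bool Equals(object obj)
    {
        LockableKey other = obj as LockableKey;
        return other != null && id == other.id;
    }

    public override int GetHashCode()
    {
        // Throwing here catches callers that forgot to lock the key.
        if (!locked)
            throw new InvalidOperationException("Lock the key before using it in a hash-based collection.");
        return id;
    }
}
```

With this sketch, adding an unlocked key to a Hashtable fails immediately with an exception instead of silently corrupting lookups later.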
Now, doing that is not ideal as it kind of breaks the pattern whatever you do: GetHashCode is implemented in object and it's not supposed to throw.
So the advice is "don't" and expect unexpected consequences if you do.
# December 18, 2004 6:19 PM

Sigurdur G. Gunnarsson said:

Thank you, that clarifies things quite a bit. And yes it would help if the documentation stated that "IComparable should really only be applied to immutable objects".
# December 21, 2004 6:13 AM

Ryan O'Connor said:

This is one of the most useful pages I've found on overriding Equals and GetHashCode. It clears up so much, and applies equally to Java too! Thanks!
# June 14, 2006 6:08 AM

Alexis Kennedy said:

I second Ryan's comment - you've cleared a gathering headache, thanks.
# July 29, 2006 8:46 AM

ajoka@nextra.sk said:

OK, I've got two problems: 1 is the MSDN documentation, 2 is the result of a test.

1) FROM MSDN:

A hash function must have the following properties:

a) If two objects of the same type represent the same value, the hash function must return the same constant value for either object.

b) For the best performance, a hash function must generate a random distribution for all input.

c) The hash function must return exactly the same value regardless of any changes that are made to the object.

Please check this, because I find it contradictory:

a) means that if all of my member variables (even mutable ones) are the same, the hash codes must be equal.

c) says that the hash code should remain the same for the whole lifetime of the object.

SO WHICH IS IT?

2) I ran some tests: I created a simple class of my own with one value-type member and implemented GetHashCode and Equals using that member. I created two instances of the class (with ids 1 and 2), added them to an ArrayList, then changed the id of the first instance. The result: both objects remained in the list, even though the hash code of the first object had changed.

SO AGAIN, there must be something that I'm missing.

Another thing: in a Java article I read that the hash code should remain the same for the lifetime of the object (similar to point 1.c). But when I created a byte[] array and set values in it, each value modification resulted in a new hash code...

Can anyone describe in a short and clear way what to do with Equals and GetHashCode, and also describe the risks for each special case?

thanx.

# July 28, 2007 3:17 PM

Bertrand Le Roy said:

Ajoka: those rules really apply to value types and similar entities. All your examples are pure reference types for which equality and hash codes are associated to the reference, not to the state, and should not be overridden under normal circumstances.

# July 29, 2007 7:44 PM

K. Scott Allen said:

Given this simple Employee class: public class Employee { public int ID...

# March 25, 2008 10:24 PM
