Wednesday, February 08, 2012

Implementing equals and hash

All about equals() and hashCode()

Java does not provide direct support for associative arrays -- arrays that can take any object as an index. In Java we have Object class has two methods for making inferences about an object's identity: equals() and hashCode()
HashMap helps to lookup based on object as index, other hash-based data structures such as HashSet, LinkedHashSet, HashMap, Hashtable, WeakHashMap does the same thing.

There are two approaches to defining equality and hash value: identity-based, which is the default provided by Object, and state-based, which requires overriding both equals() and hashCode(). If an object's hash value can change when its state changes, be sure you don't allow its state to change while it is being used as a hash key.

There are some restrictions placed on the behavior of equals() and hashCode(), as explained in JavaDoc of Object class.
In particular, the equals() method must exhibit the following properties:
Symmetry: For two references, a and b, a.equals(b) if and only if b.equals(a)
Reflexivity: For all non-null references, a.equals(a)
Transitivity: If a.equals(b) and b.equals(c), then a.equals(c)
Consistency with hashCode(): Two equal objects must have the same hashCode() value

Here are some rules to follow:
    - if a class overrides equals, it must override hashCode
    - equals and hashCode must use the same set of fields
    - if two objects are equal, then their hashCode values must be equal as well
    - Consistency with the equals() contract is a fundamental requirement to   every implementation of equals() . Not only the hash-based collections rely on reflexivity, symmetry, and transitivity, but everybody who calls equals()  will expect exactly this behavior. Failure to comply to the equals()  contract leads to subtle bugs that are difficult to track down, because they are conceptual problems.
    - The value that's likely to change or unique should be first to compare (like Id etc...)
    - Make sure hashCode() of the key objects that you put into the collection never changes while the object is in the collection or make it immutable
    - if the object is immutable, then hashCode is a candidate for caching and  lazy initialization
Caching of hashCode value is useless with modren JVMs.The last one in the above proved to be wrong after my experiments, JVM seems to be intelligent enough to make it fast.

Morale of the story : If you think you can optimize some obvious stuff,don't do that & JVM might be already doing that.

Generally its assumed that hashCode provides a unique identifier for an object. But actually it does not.
According to The java.lang.Object documentation it should be perfectly ok to always return 0 for the hashCode(). The positive effect of implementing hashCode() to return unique numbers with the use prime number for unique objects, is that it might increase performance. The downside is that the behavior of hashCode() must be consistent with equals(). For object a and b, if a.equals(b) is true, than a.hashCode() == b.hashCode() must be true. But if a.equals(b) returns false, a.hashCode() == b.hashCode() may still be true. Implementing hashCode() as 'return 0' meets these criteria, but it will be extremely inefficient in Hash based collection such as a HashSet or HashMap.

Recently we had one problem with the old code, where in the equals was utilizing the hashCode() method in the equals(), but since some of the string values were getting the same hashCode(), the hashmap was giving unexpected results.
assert 2627 == "RU".hashCode()
assert 2627 == "S6".hashCode()
So its always better to be careful about this as it can manifest itself after a long time

I was looking into code generated by 3 dominant IDEs in java, I thought its interesting to share some of the information. Its safest way to use IDEs to generate code from code review & consistency purpose. I don't see anything wrong with generated code here.


For me IntelliJ is doing better here (anyway nothing wrong with others).
- provides a option to mark certain fields non-null and hence null check can be avoided which I think is a good thing to do.
- makes use of instanceof others uses getClass(), All classes in the hierarchy either allow slice comparison and use instanceof or they disallow it and use getClass() . The use of 'instanceof'    if need be, it can match any supertype, and not just one class &  it renders an explict check for "that == null" redundant, since it does the check for null already - "null instanceof [type]" always returns false. (Effective Java)
- code looks compact compared to others 

There are excellent helper classes EqualsBuilder and HashCodeBuilder from the Apache Commons Langlibrary. can make life simpler, but I think code generated from IDEs are good enough


Reference:
Effective Java - Joshua Bloch chapter about equals() and hashCode()
Java theory and practice: Hashing it out
Bookmark and Share