Hash Map

The HashMap class provides a map implementation that is based on a hashtable data structure. This implementation supports all Map operations, and permits null keys and null values. It makes no guarantees on the order in which entries are stored.

A hashtable maps keys to integer values with the help of a hash function. Java provides this function in the form of Object's hashCode() method, which classes override to provide appropriate hash codes.

A hash code identifies one of the hashtable's array elements, which is known as a bucket or slot. For some hashtables, the bucket may store the value that is associated with the key. Figure 8-2 illustrates this kind of hashtable.

keys keys

buckets

>0

ACCTS

-r1

SALES

jf 2

SALES

n-3

n-2

n-1

hash function Figure 8-2. A simple hashtable of buckets

The hash function hashes Bob Doe to 0, which identifies the first bucket. This bucket contains ACCTS, which is Bob Doe's employee type. The hash function also hashes John Doe and Sally Doe to 1 and 2 (respectively) whose buckets contain SALES.

A perfect hash function hashes each key to a unique integer value. However, this ideal is very difficult to meet. In practice, some keys will hash to the same integer value. This nonunique mapping is referred to as a collision.

To address collisions, most hashtables associate a linked list of entries with a bucket. Instead of containing a value, the bucket contains the address of the first node in the linked list, and each node contains one of the colliding entries. See Figure 8-3.

keys buckets entries

Bob Doe

John Doe

Sally Doe

John Doe

n-3 n-2 n-1

—>

Bob Doe

ACCTS

X

X

—►

John Doe

SALES

\

y

Sally Doe

SALES

X

£

Figure 8-3. A complex hashtable of buckets and linked lists (X indicates a null reference)

When storing a value in a hashtable, the hashtable uses the hash function to hash the key to its hash code, and then searches the appropriate linked list to see if an entry with a matching key exists. If there is an entry, its value is updated with the new value.

Otherwise, a new node is created, populated with the key and value, and appended to the list.

When retrieving a value from a hashtable, the hashtable uses the hash function to hash the key to its hash code, and then searches the appropriate linked list to see if an entry with a matching key exists. If there is an entry, its value is returned. Otherwise, the hashtable may return a special value to indicate that there is no entry, or it might throw an exception.

The number of buckets is known as the hashtable's capacity. The ratio of the number of stored entries divided by the number of buckets is known as the hashtable's load factor. Choosing the right load factor is important for balancing performance with memory use:

■ As the load factor approaches 1, the probability of collisions and the cost of handling them (by searching lengthy linked lists) increase.

■ As the load factor approaches 0, the hashtable's size in terms of number of buckets increases with little improvement in search cost.

■ For many hashtables, a load factor of 0.75 is close to optimal. This value is the default for HashMap's hashtable implementation.

HashMap supplies four constructors:

■ HashMap() creates a new, empty hashmap with an initial capacity of 16 and a load factor of 0.75.

■ HashMap(int initialCapacity) creates a new, empty hashmap with a capacity specified by initialCapacity and a load factor of 0.75. This constructor throws IllegalArgumentException when initialCapacity's value is less than 0.

■ HashMap(int initialCapacity, float loadFactor) creates a new, empty hashmap with a capacity specified by initialCapacity and a load factor specified by loadFactor. This constructor throws IllegalArgumentException when initialCapacity is less than 0 or when loadFactor is less than or equal to 0.

■ HashMap(Map<? extends K, ? extends V> map) creates a new hashmap containing map's entries. This constructor throws NullPointerException when map contains the null reference.

Listing 8-24 demonstrates a hashmap.

Listing 8-24. Using a hashmap to count command-line arguments import java.util.HashMap; import java.util.Map;

public class HashMapDemo {

public static void main(String[] {

Map<String, Integer> argMap = for (String arg: args)

args)

new HashMap<String, Integer>();

Integer count = argMap.get(arg); argMap.put(arg, (count == null) ? 1 : count+1);

System.out.println(argMap);

System.out.println("Number of distinct arguments = " + argMap.size());

HashMapDemo creates a hashmap of String keys and Integer values. Each key is one of the command-line arguments passed to this application, and its value is the number of occurrences of that argument on the command line.

For example, java HashMapDemo how much wood could a woodchuck chuck if a woodchuck could chuck wood generates the following output:

{wood=2, could=2, how=1, if=l, chuck=2, a=2, woodchuck=2, much=l} Number of distinct arguments = 8

Because the String class overrides equals() and hashCode(), Listing 8-24 can use String objects as keys in a hashmap. When you create a class whose instances are to be used as keys, you must ensure that you override both methods.

Listing 8-10 showed you that a class's overriding hashCode() method can call a reference field's hashCode() method and return its value, provided that the class declares a single reference field (and no primitive type fields).

More commonly, classes declare multiple fields, and a better implementation of the hashCode() method is required. The implementation should try to generate hash codes that minimize collisions.

There is no rule on how to best implement hashCode(), and various algorithms (recipes for accomplishing tasks) have been created. My favorite algorithm appears in Effective Java, Second Edition, by Joshua Bloch (Addison-Wesley, 2008; ISBN: 0321356683).

The following algorithm, which assumes the existence of an arbitrary class that is referred to as X, closely follows Bloch's algorithm, but is not identical:

1. Initialize int variable hashCode (the name is arbitrary) to an arbitrary nonzero integer value, such as 19. This variable is initialized to a nonzero value to ensure that it takes into account any initial fields whose hash codes are zeros. If you initialize hashCode to 0, the final hash code will be unaffected by such fields and you run the risk of increased collisions.

2. For each field f that is also used in X's equals() method, calculate f's hash code and assign it to int variable hc as follows:

b. If f is of byte integer, character, integer, or short integer type, calculate hc = (int) f. The integer value is the hash code.

c. If f is of long integer type, calculate hc = (int) (fA(f>>>32)). This expression exclusive ORs the long integer's least significant 32 bits with its most significant 32 bits.

d. If f is of type floating-point, calculate hc = Float.floatToIntBits(f). This method takes +infinity, -infinity, and NaN into account.

e. If f is of type double precision floating-point, calculate long l = Double.doubleToLongBits(f); hc = (int) (p(l>>>32)).

f. If f is a reference field with a null reference, calculate hc = 0.

g. If f is a reference field with a nonnull reference, and if X's equals() method compares the field by recursively calling equals() (as in Listing 8-17's Employee class), calculate hc = f.hashCode(). However, if equals() employs a more complex comparison, create a canonical (simplest possible) representation of the field and call hashCode() on this representation.

h. If f is an array, treat each element as a separate field by applying this algorithm recursively and combining the hc values as shown in the next step.

3. Combine hc with hashCode as follows: hashCode = hashCode*31+hc. Multiplying hashCode by 31 makes the resulting hash value dependent on the order in which fields appear in the class, which improves the hash value when a class contains multiple fields that are similar (several ints, for example). I chose 31 to be consistent with the String class's hashCode() method.

4. Return hashCode from hashCode().

TIP: Instead of using this or another algorithm to create a hash code, you might find it easier to work with the HashCodeBuilder class (see http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/builder/HashCodeBuilder.html for an explanation of this class). This class, which follows Bloch's rules, is part of the Apache Commons Lang component, which you can download from http://commons.apache.org/lang/.

In Chapter 3, Listing 3-9's Point class overrides equals() but does not override hashCode(). Listing 3-11 presents a small code fragment that must be appended to Point's main() method to demonstrate the problem of not overriding hashCode(). I restate this problem here:

Although objects p1 and Point(10, 20) are logically equivalent, these objects have different hash codes, resulting in each object referring to a different entry in the hashmap. If an object is not stored (via put()) in that entry, get() returns null.

Listing 8-25 modifies Listing 3-9's Point class by declaring a hashCode() method. This method uses the aforementioned algorithm to ensure that logically equivalent Point objects hash to the same entry.

Listing 8-25. Overriding hashCode() to return proper hash codes for Point objects import java.util.HashMap; import java.util.Map;

class Point {

private int x, y;

return x;

return y;

^Override public boolean equals(Object o) {

return false; Point p = (Point) o; return p.x == x && p.y == y;

^Override public int hashCode() {

hashCode = hashCode*31+hc;

hashCode = hashCode*31+hc; return hc;

public static void main(String[] args) {

Point pi = new Point(l0, 20); Point p2 = new Point(20, 30); Point p3 = new Point(i0, 20); // Test reflexivity

System.out.println(pi.equals(pi)); // Output: true // Test symmetry

System.out.println(pi.equals(p2)); // Output: false System.out.println(p2.equals(pi)); // Output: false // Test transitivity

System.out.println(p2.equals(p3)); // Output: false System.out.println(pi.equals(p3)); // Output: true // Test nullability

System.out.println(p1.equals(null)); // Output: false // Extra test to further prove the instanceof operator's usefulness. System.out.println(p1.equals("abc")); // Output: false Map<Point, String> map = new HashMap<Point, String>(); map.put(p1, "first point");

System.out.println(map.get(pl)); // Output: first point System.out.println(map.get(new Point(l0, 20))); // Output: null

The hashCode() method is a little verbose in that it assigns each of x and y to local variable hc, rather than directly using these fields in the hash code calculation. However, I decided to follow this approach to more closely mirror the hash code algorithm.

When you run this application, its last two lines of output are of the most interest. Instead of presenting first point followed by null on two separate lines, the application now correctly presents first point followed by first point on these lines.

NOTE: LinkedHashMap is a subclass of HashMap that uses a linked list to store its entries. As a result, LinkedHashMap's iterator returns entries in the order in which they were inserted. For example, if Listing 8-24 had specified Map<String, Integer> argMap = new LinkedHashMap<String, Integer>();, the application's output for java HashMapDemo how much wood could a woodchuck chuck if a woodchuck could chuck wood would have been {how=l, much=l, wood=2, could=2, a=2, woodchuck=2, chuck=2, if=l} followed by Number of distinct arguments = 8.

Was this article helpful?

0 0

Post a comment