Implementing a Hash Map in Java


Let’s begin implementing a Hash Map in Java. Our Hash Map implementation will be quite different from any of our other Data Structure implementations so far.

First declare our MyHashMap class:

public class MyHashMap<T, U> {
	
}

We’re declaring the class to have a type of T and U. This means that a new MyHashMap class is instantiated to hold keys of type T and values of type U.

What we need to do before going any further is to declare an inner class to hold our KeyValuePairs. We will call this class HashPair:

private class HashPair<R, S> {

}

The HashPair class similarly needs to be declared to have a type of R and S so it can hold Keys and Values with whatever data type is in R and S.

The class needs variables to hold both the key and the value:

	public R key;
	public S value;

The class also needs a constructor to initialize the pair. This constructor simply sets the values of the key and value variables.

public HashPair(R key, S value) {
	this.key = key;
	this.value = value;
}

The finished HashPair class looks as follows:

private class HashPair<R, S> {
	public R key;
	public S value;
	public HashPair(R key, S value) {
		this.key = key;
		this.value = value;
	}
}

Next we will declare our class level variables, but first it would be good to explain how our Hash Map will be stored internally.

It’s good to think of our hash map as storing the HashPairs in a bunch of different buckets. Each bucket can hold zero or more HashPairs. For each key a hash function can be applied to the key to generate an integer that corresponds to the bucket it is stored in. Developing a good hash function can be a difficult task (and is a topic for another day), but luckily for us we can use Java’s built in hashCode() function that is implemented for each generic object. Once we have found which bucket corresponds to a particular key we can search that bucket to find the value that it corresponds to.

Now that the internal structure of a hash map has been explained let’s begin declaring some variables.

private int maxCapacity = 1 << 30;

The maxCapacity integer refers to the maximum amount of buckets that we can have in our Hash Map. ‘1 << 30’ takes 1 and bit shifts it to the left 30 times which results in a value equal to 2 to the 30th power. So it is a fairly large value.

private int capacity = 2;

The capacity integer refers to the current amount of buckets that we have in our Hash Map. We will initially start off with 2 buckets.

private float loadFactor = 0.75f;

The loadFactor variable refers to the maximum loadFactor of the Hash Map. The loadFactor is the ratio between buckets and KeyValuePairs in a hash map. When the current loadFactor of our Hash Map becomes greater then our loadFactor variable we will increase the amount of buckets in our Hash Map. Keeping a low loadFactor reduces the amount of collisions in a Hash Map which makes Hash Map performance better. A collision in a Hash Map is when two or more keys are hashed to the same bucket.

private int size = 0;

The size integer refers to the current amount of KeyValuePairs in our Hash Map.

private List<List<HashPair<T, U>>> map;

The map variable is the variable in which we will store all of our HashPairs. It is internally stored as a List of Lists of HashPairs. Think of the first list as the buckets. Each bucket needs to store it’s own list of HashPairs so that we can handle collisions (when two or more Keys are hashed to the same bucket). In a perfect world where there are no collisions it would be possible to store it internally as a List of HashPairs.

Add a function to initialize our HashMap:

private void initHashMap() {
	map = new ArrayList<List<HashPair<T, U>>>();
	for(int i = 0; i < capacity; i++)
		map.add(null);
}

This function simply initializes our internal map variable and sets each bucket to null. This signifies that there are no KeyValuePairs in each of those buckets.
Add a constructor for our MyHashMap class:

public MyHashMap() {
	initHashMap();
	size = 0;
}

The constructor simply calls the initHashMap() function to initialize the map variable and sets the size to zero.

Add a function to return the size of the Hash Map:

public int getSize() {
	return size;
}

This simply returns the size variable.

Add a function to get which bucket a particular key is in:

private int getHashIndex(T key) {
	return key.hashCode() % capacity;
}

This function is a lot simpler then one would think. First we call key.hashCode() to get the hash value for this particular key. We then modulus (%) the result by the capacity (number of current buckets) in the Hash Map and return the result. For those unfamiliar what the modulus operator does is return the remainder from an integer division of the given numbers. The hashCode() function is used to get an integer that corresponds to the Key. We modulus the hash result by the current capacity in order to convert the result to a number from: 0 to (capacity - 1). The result has to be a number in that range to correspond to a valid bucket. A number not in that range will not have a valid bucket.

Add a function to clear the current Hash Map:

public void clear() {
	if(getSize() > 0) {
		for(int i = 0; i < capacity; i++)
			map.set(0, null);
		size = 0;
	}
}

This function simply sets each bucket in the map to null (meaning there is nothing in that bucket) and then sets the size to 0.

This makes a great stopping point, the implementation of MyHashMap will continue in the following post: Implementing a Hash Map in Java - Part Two.

Data Structures