Why Should the Length of Your Hash Table Be a Prime Number?

Every thorough data structures and algorithms course will cover the hash table data structure and, by extension, hash functions. In reviewing data structures recently, I came across the notion of reducing collisions by making the length of your hash table a prime number. Due to the limited scope of the course, the author did not go into much detail as to why this works and encouraged some self-research if so inclined. It turns out I am so inclined and I wanted to get to the bottom of this seemingly magic fix. To provide a little context, we will first briefly go over hash tables, hash functions, what qualities make a good hash function, and finally how a hash table of prime number length reduces collisions.

Hash Tables

Hash tables are an extremely useful and common data structure, and most languages feature a built-in version. They are known as dictionaries in Python, objects in JavaScript, Maps in Java, Go, and Scala, and hashes in Ruby. Hash tables are primarily used to store data in key-value pairs. With the ability to quickly locate data using its associated key, hash tables are an excellent option for data access, insertion, and removal. This is a marked improvement over arrays, which provide quick access by index but can be costly when adding and removing elements.

In most cases, using a language’s built-in hash table is probably the best option; however, one can be modeled from scratch using an array. In this case, we provide the key corresponding to the data we wish to access. The key must be transformed into the index where the key-value pair is stored, and the desired data is then returned using that index. This is where hash functions come into play.
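To make the idea of modeling a hash table with an array concrete, here is a minimal sketch. The class name `SimpleHashTable`, its methods, the character-code hash, and the default length of 53 are all assumptions chosen for illustration; built-in implementations are far more sophisticated.

```javascript
// Hypothetical minimal hash table backed by a plain array (illustration only).
class SimpleHashTable {
  constructor(length = 53) {
    // Each slot holds a "bucket": a list of [key, value] pairs.
    this.buckets = Array.from({ length }, () => []);
  }

  // Assumed toy hash: sum the character codes, then wrap to a valid index.
  _index(key) {
    let total = 0;
    for (const char of String(key)) {
      total += char.charCodeAt(0);
    }
    return total % this.buckets.length;
  }

  // Store a value, overwriting any existing entry with the same key.
  set(key, value) {
    const bucket = this.buckets[this._index(key)];
    const entry = bucket.find(([k]) => k === key);
    if (entry) entry[1] = value;
    else bucket.push([key, value]);
  }

  // Return the stored value, or undefined if the key is absent.
  get(key) {
    const bucket = this.buckets[this._index(key)];
    const entry = bucket.find(([k]) => k === key);
    return entry ? entry[1] : undefined;
  }
}

const table = new SimpleHashTable();
table.set("color", "blue");
table.set("color", "green");
console.log(table.get("color")); // green
console.log(table.get("missing")); // undefined
```

Keys that land on the same index simply share a bucket here, which is one common way to tolerate collisions.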

Hash Functions

In general, hash functions take an input of any size and return an output of a fixed size; it could be a short string or an integer. Many hash functions, cryptographic ones in particular, are designed to be ‘one-way’, meaning we cannot practically reconstruct the original input by working backward from the output. As a result, hash functions are often used in cryptography.
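As a rough illustration of "any size in, fixed size out", the sketch below folds a string of any length into an unsigned 32-bit integer. The function name `toyHash32` and the multiply-by-33 scheme are assumptions picked for brevity; this is not a cryptographic hash and makes no attempt to be one-way.

```javascript
// Hypothetical toy hash (not cryptographic): folds a string of any length
// into an unsigned 32-bit integer.
function toyHash32(text) {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    // Multiply by 33, add the next character code, and keep only 32 bits.
    hash = (Math.imul(hash, 33) + text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// A one-character input and a 10,000-character input both produce
// a number in the same fixed range: 0 to 2^32 - 1.
console.log(toyHash32("a"));
console.log(toyHash32("a".repeat(10000)));
```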

To illustrate, let’s say we are using a hash table to store data about a collection of books, with the books’ ISBNs as keys and the books’ titles as values. Our hash function would take the ISBN as an argument and return the index at which the data related to that ISBN can be found. Using this index, we can look up and return the book title.
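The ISBN lookup could be sketched roughly as follows. Everything here is assumed for illustration: the function name `hashIsbn`, the table length of 53, the made-up ISBN, and the placeholder title. It also assumes 13-digit ISBNs made up of digits and hyphens only.

```javascript
// Hypothetical table length; 53 is an arbitrary choice for this sketch.
const TABLE_LENGTH = 53;

// Assumed toy hash function: treat the ISBN's digits as one number
// and take the remainder after dividing by the table length.
function hashIsbn(isbn, tableLength = TABLE_LENGTH) {
  const digits = isbn.replace(/\D/g, ""); // drop hyphens and spaces
  return Number(digits) % tableLength; // 13 digits fit in a safe integer
}

// Made-up ISBN and placeholder title, for illustration only.
const isbn = "978-1-23-456789-7";
const slots = new Array(TABLE_LENGTH);

// Store the title at the index the hash function returns.
const index = hashIsbn(isbn);
slots[index] = "Placeholder Book Title";

// Looking up the same ISBN yields the same index, and so the same title.
console.log(slots[hashIsbn(isbn)]); // Placeholder Book Title
```

Because the remainder is always between 0 and the table length minus one, the result is always a valid array index. This sketch ignores collisions entirely: two ISBNs with the same remainder would overwrite each other.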
