Things to be discussed
- Introduction
- Hash Table
- Hash Function
- Strengths
- Weaknesses
- Collision Problem (Next
Article)
- Dynamic Array Resizing
(Next Article)
- Applications (Next Article)
- Implementation (Next
Article)
- Practice Problems (Next
Article)
Introduction
If you have programming experience and don't know
what a Hash Table is, chances are you use them regularly without realizing it.
They are so useful and practical that many programming languages have a hash
table construct built into the syntax.
They are referred to as associative arrays (PHP) or dictionaries (Python) or hashes (PERL). By creating your hash table, you can tailor it to your specific needs with optimal efficiency.
In this article, I'll
try to provide a basic understanding of hash tables. In the end, I'll provide a
full implementation of a hash table in C++.
“If I were stranded on a desert
island and could only take one data structure with me, it would be a hash
table.” - Peter Van Der Linden, Author of Deep C Secrets
Hash table
So, what is a hash table? Think of it as arrays on
steroids. It’s a versatile data structure that can solve many kinds of
problems, and it can also store a huge variety of data types and quantities
super efficiently.
A hash table is made up of two different parts: An
array (which you're already familiar with) and a hash function (which
is probably a new concept!). So why these parts? The array is what holds our
data and the hash function decides where exactly our data goes in the
array - and how we'll get it out when the time comes.
Hashing is a technique that is used to uniquely identify
a specific object from a group of similar objects. Some examples of how hashing
is used in our lives include:
- In universities, each student is assigned a unique roll number that can be used to retrieve
information about them.
- In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to etc.
In both these examples, the students and books were
hashed to a unique number.
More formally, the Hash table is a data structure
that represents data in the form of key-value pairs. Each key is mapped
to a value in the hash table. The keys are used for indexing the values/data.
Hash function
Any function that can be used to map a dataset of
arbitrary size to a dataset of a fixed size that falls into the hash table is
called a hash function. The values returned by a hash function are called hash
values, hash codes, hash sums, or simply hashes.
To achieve a good hashing mechanism, it is
important to have a good hash function with the following basic requirements:
- Easy to compute: It should
be easy to compute and must not become an algorithm.
- Uniform distribution: It
should provide a uniform distribution across the hash table and should not
result in clustering.
- Fewer collisions: Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided.
Note: Irrespective of
how good a hash function is, collisions are bound to occur. Therefore, to
maintain the performance of a hash table, it is important to manage collisions
through various collision resolution techniques.
Operations |
Average Case |
Worst Case |
space |
O(n) |
O(n) |
insert |
O(1) |
O(n) |
lookup |
O(1) |
O(n) |
delete |
O(1) |
O(n) |
Strengths
- Fast lookups. Lookups take O(1)
time on average.
- Flexible keys. Most data types can be
used for keys if they're hashable.
Weaknesses
- Slow worst-case lookups. Lookups take O(n)
time in the worst case.
- Unordered. Keys aren't stored in a
special order. If you're looking for the smallest key, the largest key, or all the keys in a range, you'll need to look through every key to find it.
- Single-directional lookups. While you can look up the value
for a given key in O(1) time, looking up the keys
for a given value requires looping through the whole dataset - O(n)
time.
- Not cache-friendly. Many hash table
implementations use linked lists, which don't put data next to each other
in memory.
Conclusion
In this article we have learnt about hash tables in general, what is a hash function, time complexity of a hash table if implemented, what are their advantages and disadvantages, etc.
Hash tables are
very useful when dealing with large amounts of data but something called as a collision
will happen when multiple values have the same key. We'll discuss collisions and how to resolve them in our next article.