Introduction to Hash Table

Things to be discussed

  • Introduction
  • Hash Table
    • Hash Function
    • Strengths
    • Weaknesses
    • Collision Problem (Next Article)
    • Dynamic Array Resizing (Next Article)
    • Applications (Next Article)
  • Implementation (Next Article)
  • Practice Problems (Next Article)

Introduction

If you have programming experience and don't know what a Hash Table is, chances are you use them regularly without realizing it. They are so useful and practical that many programming languages have a hash table construct built into the syntax.

They are referred to as associative arrays (PHP) or dictionaries (Python) or hashes (PERL). By creating your hash table, you can tailor it to your specific needs with optimal efficiency. 

In this article, I'll try to provide a basic understanding of hash tables. In the end, I'll provide a full implementation of a hash table in C++.

“If I were stranded on a desert island and could only take one data structure with me, it would be a hash table.” - Peter Van Der Linden, Author of Deep C Secrets

Hash table

So, what is a hash table? Think of it as arrays on steroids. It’s a versatile data structure that can solve many kinds of problems, and it can also store a huge variety of data types and quantities super efficiently.


A hash table is made up of two different parts: An array (which you're already familiar with) and a hash function (which is probably a new concept!). So why these parts? The array is what holds our data and the hash function decides where exactly our data goes in the array - and how we'll get it out when the time comes.


Hashing is a technique that is used to uniquely identify a specific object from a group of similar objects. Some examples of how hashing is used in our lives include:

  • In universities, each student is assigned a unique roll number that can be used to retrieve information about them.
  • In libraries, each book is assigned a unique number that can be used to determine information about the book, such as its exact position in the library or the users it has been issued to etc.

In both these examples, the students and books were hashed to a unique number.

More formally, the Hash table is a data structure that represents data in the form of key-value pairs. Each key is mapped to a value in the hash table. The keys are used for indexing the values/data.


Hash function

Any function that can be used to map a dataset of arbitrary size to a dataset of a fixed size that falls into the hash table is called a hash function. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

To achieve a good hashing mechanism, it is important to have a good hash function with the following basic requirements:

  1. Easy to compute: It should be easy to compute and must not become an algorithm.
  2. Uniform distribution: It should provide a uniform distribution across the hash table and should not result in clustering.
  3. Fewer collisions: Collisions occur when pairs of elements are mapped to the same hash value. These should be avoided.

Note: Irrespective of how good a hash function is, collisions are bound to occur. Therefore, to maintain the performance of a hash table, it is important to manage collisions through various collision resolution techniques.

Operations

Average Case

Worst Case

space

O(n)

O(n)

insert

O(1)

O(n)

lookup

O(1)

O(n)

delete

O(1)

O(n)

Strengths

  • Fast lookups. Lookups take O(1) time on average.
  • Flexible keys. Most data types can be used for keys if they're hashable.

Weaknesses

  • Slow worst-case lookups. Lookups take O(n) time in the worst case.
  • Unordered. Keys aren't stored in a special order. If you're looking for the smallest key, the largest key, or all the keys in a range, you'll need to look through every key to find it.
  • Single-directional lookups. While you can look up the value for a given key in O(1) time, looking up the keys for a given value requires looping through the whole dataset - O(n) time.
  • Not cache-friendly. Many hash table implementations use linked lists, which don't put data next to each other in memory.

Conclusion

In this article we have learnt about hash tables in general, what is a hash function, time complexity of a hash table if implemented, what are their advantages and disadvantages, etc. 

Hash tables are very useful when dealing with large amounts of data but something called as a collision will happen when multiple values have the same key. We'll discuss collisions and how to resolve them in our next article.


The Related Articles which you should read after this


If there is any error or you have any doubt in the article then do ask us through the comment, we will be happy to help you.


If you think that this article has helped you to learn something then do share this with your friends.

Thank You for Reading.