Hash Table Implementation and Problems

Things to be discussed

  • Introduction (Previous Article)

  • Hash Table (Previous Article)

    • Hash Function (Previous Article)

    • Strengths (Previous Article)

    • Weaknesses (Previous Article)

    • Collision Problem

    • Dynamic Array Resizing

    • Applications

  • Implementation

  • Practice Problems

In the previous article, we introduced the hash table, hash function and we saw how effectively we can manage the data using hash tables. In this article, we'll see what is collision problem (which was introduced in the previous article) and how to effectively tackle if such a problem arises.

Collision problem

Let’s just take a quick pause to think about hash functions for a second - since we’re converting data of any length x into data of fixed size y, dataset x will be way larger than dataset y. There are infinite entries in dataset x, but limited ones in dataset y because all the numbers must be a certain size or length. In other words, there could be two or more pieces of data in dataset x that will hash to the same integer in dataset y.

This is called the collision problem, and here are some ways to deal with it:

  1. Separate Chaining

  2. Linear probing

  3. Quadratic Probing

  4. Double Hashing

  • Separate Chaining


When two or more data collide and hash to the same index, we could simply link the data up into a linked list at that index of the array, as you can see above.

Of course, this is the kind of situation we’re going to want to avoid because instead of using constant time O(1) to find a key and its corresponding information in the array, we end up needing O(n) time to walk through the linked lists to find our actual values.

  • Linear Probing


Another way to deal with collisions is to walk down the array from the collided index one by one until you get to the next free slot.

Quadratic Probing

Quadratic probing is like linear probing and the only difference is instead of going one by one we follow an arbitrary polynomial to get to the next unoccupied index.

Double hashing

Double hashing is like linear probing and the only difference is the interval between successive jumps. Here, the interval between jumps is computed by using two hash functions.

Dynamic array resizing

Suppose we keep adding more items to our hash map. As the number of keys and values in our hash map exceeds the number of indices in the underlying array, hash collisions become inevitable.

To mitigate this, we could expand our underlying array whenever things start to get crowded. That requires allocating a larger array and rehashing all our existing keys to figure out their new position in O(n) time.

Applications

Hash tables are implemented where

  • Constant time lookup and insertion is required.

  • Cryptographic applications are used.

  • Indexing data is required.

For example,

  • Associative arrays: Hash tables are commonly used to implement many types of in-memory tables. They are used to implement associative arrays (arrays whose indices are arbitrary strings or other complicated objects).

  • Database indexing: Hash tables may also be used as disk-based data structures and database indices (such as in DBM).

  • Caches: Hash tables can be used to implement caches i.e. auxiliary data tables that are used to speed up access to data, which is primarily stored in slower media.

  • Object representation: Several dynamic languages, such as Perl, Python, JavaScript, and Ruby use hash tables to implement objects.

  • Hash Functions are used in various algorithms to make their computing faster.


Implementation in C++

The implementation of Hash Table using separate chaining to resolve collisions is given here,


    #include <iostream>
    #include <math.h> // floor()
#include <string>
using namespace std;

struct Node {
int key; // number
string value; // value
Node *next; // pointer to remember memory address of next node

Node() : key(0), value(""), next(0){};
Node(int Key, string Value) : key(Key), value(Value), next(0){};
Node(Node const &data)
: key(data.key), value(data.value), next(data.next){};
};

class HashTable {
private:
int size, // size: size of table, count: number of data
count; // count/size = load factor
Node **table; // pointer to pointer, hash table

int hashFunction(int key); // Multiplication method
void tableDoubling();
void tableShrinking();
void reHashing(int size_orig);

public:
HashTable(){};
HashTable(int m) : size(m), count(0) {
table = new Node *[size]; // allocate the first demension of table
for (int i = 0; i < size; i++)
table[i] = 0; // ensure every slot points to NULL
}
~HashTable();

void Insert(Node data); // consider tableDoubling()
void Delete(int key); // consider tableShrinking()
string Search(int key);
void displayTable();
};

void HashTable::Insert(Node data) {
count++;
if (count > size) { // consider load factor
tableDoubling(); // if n/m > 1, then double the size of table
}

int index = hashFunction(data.key); // get index of slot
Node *newNode = new Node(data); // create new node to store data

// push_front()
if (table[index] == NULL) { // eg: list: (empty), add4
table[index] = newNode; // eg: list: 4->NULL
} else { // eg: list: 5->9->NULL , add 4
Node *next = table[index]->next; // list: 5->4->9->NULL
table[index]->next = newNode;
newNode->next = next;
}
}

void HashTable::Delete(int key) {
int index = hashFunction(key); // get index of slot
Node *current = table[index], // use two pointer for traversal in list
*previous = NULL;

while (current != NULL && current->key != key) {
previous = current; // traversal in list, 3 cases:
current = current->next; // 1. data not found
} // 2. data found at first node in list
// 3. data found at other position in list

if (current == NULL) { // eg: list:5->2->9->NULL, want to delete 3
cout << "Data not found!\n\n";
return;
} else {
if (previous == NULL) { // eg: list:5->2->9->NULL, want to delete 5
table[index] =
current->next; // after deleting 5, list:2->9->NULL
} // current points to 5

else { // eg: list:5->2->9->NULL, want to delete 2
previous->next =
current->next; // after deleting 2, list:5->9->NULL
} // current points to 2
delete current;
current = 0;
}

count--;
if (count < size / 4) { // consider load factor
tableShrinking(); // if n/m < 4, then shrink the table
}
}

string HashTable::Search(int key) {
int index = hashFunction(key); // get index of slot
Node *current =
table[index]; // current points to the first node in list

while (current != NULL) { // traversal in list
if (current->key == key)
return current->value;
current = current->next;
}
return "\nNo such data!";
}

int HashTable::hashFunction(int key) {
// Multiplication method
double A = 0.6180339887, frac = key * A - floor(key * A);
return floor(this->size * frac);
}

void HashTable::tableDoubling() {
int size_orig = size; // size_orig represents the original size of table
size *= 2; // double the size of table
reHashing(size_orig); // create new table with new larger size
}

void HashTable::tableShrinking() {
int size_orig = size; // size_orig represents the original size of table
size /= 2; // shrink the size of table
reHashing(size_orig); // create new table with new smaller size
}

void HashTable::reHashing(int size_orig) {
Node **newtable = new Node *[size]; // allocate memory for new table
for (int i = 0; i < size; i++) { // initializetion
newtable[i] = 0; // ensure every node in slot points to NULL
}

for (int i = 0; i < size_orig;
i++) { // visit every node in the original table

Node *curr_orig =
table[i], // curr_orig: current node in original table
*prev_orig = NULL; // prev_orig: following curr_orig

while (curr_orig !=
NULL) { // traversal in list of each slot in original table
prev_orig = curr_orig->next; // curr_orig will be directly move
// to new table need prev_orig to
// keep pointer in original table

int index = hashFunction(
curr_orig->key); // get index of slot in new table

// push_front(), do not allocate new memory space for data
// directly move node in original table to new table
if (newtable[index] == NULL) { // means newtable[index] is empty
newtable[index] = curr_orig;
newtable[index]->next =
0; // equivalent to curr_orig->next = 0;
}
// if there is no initialization for newtable, segmentation
// faults might happen because newtable[index] might not point
// to NULL but newtable[index] is empty
else { // if newtable[index] is not empty
Node *next = newtable[index]->next; // push_front()
newtable[index]->next = curr_orig;
curr_orig->next = next;
}
curr_orig =
prev_orig; // visit the next node in list in original table
}
}
delete[] table; // release memory of original table
this->table = newtable; // point table of object to new table
}

HashTable::~HashTable() {
for (int i = 0; i < size; i++) { // visit every node in table and
// release the memory of each node
Node *current = table[i]; // point *current to first node in list
while (current != NULL) { // traversal in list
Node *previous = current;
current = current->next;
delete previous;
previous = 0;
}
}
delete[] table;
}

void HashTable::displayTable() {
for (int i = 0; i < size; i++) { // visit every node in table
cout << "Slot " << i << ": ";
Node *current = table[i];
while (current != NULL) {
cout << "(" << current->key << "," << current->value << ") ";
current = current->next;
}
cout << endl;
}
cout << endl;
}

int main() {
HashTable hash(2);

hash.Insert(Node(12, "A"));
hash.Insert(Node(592, "B"));
cout << "After inserting key(12),key(592):\n";
hash.displayTable();
hash.Insert(Node(6594, "C")); // evoke tableDoubling()
cout << "After inserting key(6594), evoke tableDoubling():\n";
hash.displayTable();
hash.Insert(Node(7, "D"));
cout << "After inserting key(7):\n";
hash.displayTable();
hash.Insert(Node(123596, "E")); // evoke tableDoubling()
cout << "After inserting key(123596), evoke tableDoubling():\n";
hash.displayTable();
hash.Insert(Node(93, "F"));
hash.Insert(Node(2288, "G"));
hash.Insert(Node(793, "H"));
cout << "After inserting key(93),key(2288),key(793):\n";
hash.displayTable();
hash.Insert(Node(8491, "I")); // evoke tableDoubling()
cout << "After inserting key(8491), evoke tableDoubling():\n";
hash.displayTable();
hash.Insert(Node(323359, "J"));
cout << "After inserting key(323359):\n";
hash.displayTable();

cout << "Searching: value(8491) is " << hash.Search(8491) << ".\n\n";
cout << "Searching: value(7) is " << hash.Search(7) << ".\n\n";

hash.Delete(7);
cout << "After deleting key(7):\n";
cout << "Searching: value(7) is " << hash.Search(7) << ".\n\n";

hash.Delete(592);
cout << "After deleting key(592):\n";
hash.displayTable();

cout << "Want to delete key(592) again:\n";
hash.Delete(592);

hash.Delete(123596);
hash.Delete(323359);
hash.Delete(793);
hash.Delete(93);
cout << "After deleting key(123596),key(323359),key(793),key(93):\n";
hash.displayTable();

hash.Delete(6594); // evoke tableShrinking()
cout << "After deleting key(6594), evoke tableShrinking():\n";
hash.displayTable();

return 0;
}


Practice Problems

  1. Bear and Three Musketeers (574B)

  2. Gargari and Bishops (463C)

  3. King's Path (242C)

  4. Train and Peter (8A)

  5. Correcting Mistakes (533E)


Conclusion

In this article, we have learnt about collision problem and how effectively we can tackle it using various hashing techniques like quadratic hashing, etc. Then we saw the places where hash tables are used and then we implemented separate chaining using C++ and then some practice problems were given for your practice.

If there is any error or you have any doubt in the article then do ask us through the comment, we will be happy to help you.


If you think that this article has helped you to learn something then do share this with your friends.

Thank You for Reading