Introduction

In the previous tutorial, we talked about constants.  By using a constant, you are letting everyone else know that a given value will not change. Constants can be applied to variables, function parameters and functions themselves.

 

In this tutorial, we are going to talk about hash tables, specifically the STL unordered_map.  I like to think of hash tables as a more general form of an array.  Arrays are indexed by small integers, whereas hash tables can be indexed by anything.

Code

Let’s start out with a quick example:

 

#include <unordered_map>

#include <string>

 

using namespace std;


 

int main()

{

unordered_map<string, string> hashtable;

hashtable.emplace("www.element14.com", "184.51.49.225");

 

cout << "IP Address: " << hashtable["www.element14.com"] << endl;

 

return 0;

}


In this example, we create a hash table, which is called an unordered_map in c++.  The first type within the angled brackets is the key – which is analogous to the index of an array.  The second type within the angled brackets is the value – which is analogous to the value stored in an array.  In this example, both the key and the value are strings.  In the hash table, we are going to store the url of a website and the IP address that it resolves to.  So, the next line adds "www.element14.com" to the hash table.  The word “emplace” means “to put into place or position”.  Now comes the exciting part, to access the information in the hash table, we give it the key ("www.element14.com") and it gives us the value ("184.51.49.225").  This is similar to how an array works, where we give it an index (i.e. 0) and it gives us back a value (i.e.  "184.51.49.225").

 

The unordered_map also has an insert function, but it requires slightly more code to use:

 

 

hashtable.insert(make_pair("www.newark.com", "184.51.50.121"));

 

This is because the insert function is expecting a key/value pair that has already been created (using make_pair), whereas the emplace function creates the key/value pair for you.

 

To display all of the data in the hash table, do something like this:

 

 

for (auto itr = hashtable.begin(); itr != hashtable.end(); itr++)

{

cout << (*itr).first << ": " << (*itr).second << endl;

}

 

This should look familiar to the way that the STL list/vectors worked.  There is also a condensed version:

 

for (auto &itr : hashtable)

{

cout << itr.first << ": " << itr.second << endl;

}

 

Which saves some typing.

Under the Covers

Hash tables are implemented by converting the key object into a number.  This number then points to a bucket where the value is stored.  Once the correct bucket is found, every item in the bucket is compared with the key, to see if it is the matching item.

 

Therefore the performance of a hash table is very dependent on the function that transforms the key into a number (referred to as the hash function).  A good hash function will randomly and uniformly distribute the given keys into buckets.  So, that the comparison between a key and other objects within the same bucket is minimized.

 

This means that hash tables are extremely fast at inserting, removing and retrieving entries.  In the average case, the time is constant, which means that they will perform very well, even with a large number of entries.

Custom Classes

The incredibly powerful part of hash tables is that anything can be a key, even custom classes. Here’s a quick example:

 

class Make

{

public:

string Name;


 

Make(string name)

{

Name = name;

}


 

bool operator==(const Make &make) const

{

return Name == make.Name;

}

};


 

class Model

{

public:

string Name;

int Year;


 

Model(string name, int year)

{

Name = name;

Year = year;

}


 

bool operator==(const Model &model) const

{

return (Name == model.Name && Year == model.Year);

}

};


 

class ModelHash

{

public:

size_t operator()(const Model &model) const

{

return hash<string>()(model.Name) ^ hash<int>()(model.Year);

}

};

 

int main()

{

unordered_map<Model, Make, ModelHash> model2make;


 

Model camry2005("Camry", 2005);

Model tercel1993("Tercel", 1993);

      

Make toyota("Toyota");

 

model2make.emplace(camry2005, toyota);

model2make.emplace(tercel1993, toyota);


 

for (auto &itr : model2make)

{

cout << itr.first.Name << " " << itr.first.Year << ": " << itr.second.Name << endl;

}

 

return 0;

}


 

First, we declare two classes: one to store the make of a car and one to store the model of a car. The ModelHash function converts the Model class into a number.  This allows the hash table to decide which bucket to use to store the object. The Model class has two pieces of information: the Name of the model and the Year that it was produced.  When we generate the number that will represent the hash table in the class, we want to use all of the variables that uniquely define the class. So, in this case we will use both the Name and the Year variables.  If you look closely, the two variables are combined using the carrot operator (^), which performs an eXclusive OR.  This is a very common trick for combining two variables in hash function.

 

The rest of the code look fairly similar to the first example.  The only weird part, is you might be wondering why we went from model to make and not the other way around.  Well, the keys for hash tables need to be unique.  (This is similar to an array.)  If we used the make of the car, it would not be unique.  To get around this, we will store a vector of Models for each Make:

 

class MakeHash

{

public:

size_t operator()(const Make &make) const

{

return hash<string>()(make.Name);

}

};

 

int main()

{

unordered_map<Make, vector<Model>, MakeHash> make2model;


 

vector<Model> toyotaModels;

toyotaModels.push_back(camry2005);


 

make2model.emplace(toyota, toyotaModels);


 

make2model[toyota].push_back(tercel1993);


 

for (auto &make : make2model)

{

cout << make.first.Name << endl;

 

vector<Model> models = make.second;


for(auto &model : models)

{

cout << " " << model.Name << " " << model.Year << endl;

}

}


 

return 0;

}

 

Now the data structure should seem more familiar where the manufacturer of the car maps to the model of cars that it produces.

Summary

In this tutorial, we learned about hash tables, which can be thought of as a more general version of an array.

 

This is our last tutorial. Next week there will be a recap of what we went over and a quiz to test your new skills and I even heard a rumor that there will be some prizes involved…

 

If you have any questions or comments about what was covered here, post them to the comments.  I watch them closely and will respond and try to help you out.

Tutorial Index

01 - Hello Raspberry Pi

02 - Reading User Input

03 - If Statement

04 - For Loops

05 - While Loops

06 - Functions

07 - Data Types

08 - Arrays

09 - Structures

10 - Classes

11 - Pointers

12 - Dynamic Arrays

13 - Linked List

14 - Linked List Operations

15 - STL List/Vector

16 - Templates

17 - Inheritance

18 - File I/O

19 - Strings

20 - Constants

21 - Hash Tables