[\<- 04/15](04-15.md) --- ## Review What We Have Learned So Far |Basic Operation|Unsorted Array|Sorted Array| |---------------|--------------|------------| |**Search Find**|O(n) |O(log(n)) | |**Insert** |O(1) |O(n) | |**Delete** |O(1) |O(n) | |**Min/Max** |O(n) |O(1) | - Search Find - Unsorted would use sequential search - Sorted would use binary search - Insert - Unsorted: you can just insert at the end - Sorted: need to determine where to insert so that the array stays ordered - Delete - Unsorted: can just swap the last entry in for the deleted entry and make the array length minus 1 - Sorted: need to shift all indices after deleted index down - Min/Max - Unsorted: Need to traverse entire array - Sorted: Min and max are the first and last indices ## Making it Faster - Let's work on making searching in an array even faster: O(1) ## Hashing - A key-to-address mapping process - Ex. - We have keys - x - y - We have address - 000 - 001 - 002 - ... - 100 - We have a function `h()` - `h(x)` is the **address** for **key** x, `h(y)` is the **address** for **key** y - to search: `a[h(x)]`... O(1) ## Outline - Q: I want to know if someone recieved a given score x on the exam. Each score should range from 0 to 100. How could I do this quickly? - A: Have an array in which each score has its own slot - What is the hashing method `h(x)`? ``` h(x) = x; ``` - It's called **direct hashing**! ## Example 2 - Let's try something else. We're reading words from a book, so let's ask if all the letters were used - Q: What if our inputs were letters? How to construct `h()`? - We have an array from index 0 to 51. The first half is for lower case letters. The second half for upper case letters - 'a' should go to index 0 - 'b' should go to index 1... etc - We will take advantage of the ASCII table ``` if('a' <= x <= 'z') h(x) = x-'a'; if('A' <= x <= 'Z') h(x) = x-'A'+26; //offset to the second half of the array ``` - The Big-O runtime of this is O(1) - This is so-called **subtraction method** ## Example 3 - Ex. What if I want to know whether someone in the university earns a certain salary? And I know that the lowest paid person is #30k and highest paid is $200k, and salaries are in increments of $1k... so... - Q: How would you map them? - There should be an array where index 0 is 30k, 1 is 31k, 2 is 32k, etc. The final index is 200k. ``` h(x) = (x - 30k)/1k ``` ## Concepts - What we're doing in these examples is trying to find a function, `h(x)`, that takes as inputs some arbitrary data and maps them onto 0 to m-1, which we can then use ar array indices - This technique is called **hashing** - The array indexed by such a function is called a **hash table** - The function itself is called the **hash function** - It turns out that if we know the keys in advance then we can always derive a **perfect hash function** (each key hashes to its own unique location). **One-to-one mapping** ## Example 4 - What if I wanted to use you ID numbers, but map them to no more than 100 slots? - Using the first digit... is terrible (will not work) - Using the last digit... a little better, but would only work with at most 10 students - Using the last two digits... this might work, but there is **hashing collision** if two students have the same last two digits - Multiply by 17, divide by 97 and take the remainder... maybe better, but no guaruntee that there won't be collision In practice, we don't know the keys in advance, it is very hard to derive a perfect hash function ## A New Hashing Method - Modulo Arithmetic - The simplest way to make sure a hash function always maps within 0..m-1 is to take the result and take the remainder fo dividing by m - Since our hash function needs to yield values between "0" and "m-1" (m is max), we usually do: `h(x) % m` - We like to pick m as a **prime number**, since it will generate less collisions (you don't have to understand why) ## Modulo Arithmetic - Let's have a simple example - m = 7 - k = 8, 14, 12, 2, 4 - `h(k) = k % m` - h(8) -> go to index 1 - h(14) -> go to index 2 - h(12) -> go to index 5 - h(2) -> go to index 2 - h(4) -> go to index 4 - If we insert 11... - h(11) -> go to index 4 - There is multiple keys mapped to the same index now -> collision :( ## Collision Resolutions - Even if we choose a good hash function that does a good job distributing the keys across the entire hash table, we will still have collisions in practice and need to deal with them - The next step will be to cover **Collision Resolutions**...