diff options
Diffstat (limited to '04-20.md')
-rw-r--r-- | 04-20.md | 57 |
1 files changed, 57 insertions, 0 deletions
@@ -133,3 +133,60 @@ In practice, we don't know the keys in advance, it is very hard to derive a perf - Even if we choose a good hash function that does a good job distributing the keys across the entire hash table, we will still have collisions in practice and need to deal with them - The next step will be to cover **Collision Resolutions**... + +## Probing + +- A simple way of resolving collisions: **probing** +- If a key 'y' collids with a key 'x' that is already in the table, 'x' is **not** displaced. Rather 'y' must be placed somewhere else +- In probing, the colliding key tries another slot in the table (**probes**), if that slot is empty, then is is placed there, otherwise it keeps trying (**probing**) different slots +- Probing methods covered in this class: + - Linear Probing + - Quadratic Probing + - Double Hashing + +## Linear Probing + +- We can think of our hash function as having an extra parameter `h(k, i)`, where 'i' is the **probe number** that starts at zero for each key (i.e., it's the number of the try) +- Simplest type of probing: + - `h(k, i) = (h(k) + i) % m` + - This is called **linear probing** + +## Linear Probing (Insertion) + +- If i=0, then `h(k, i) = h(k)`, i.e., the "home" location of the key. If that is occupied, we use the next location to the right, wrapping around to the front of the array +- Example, for m=11, let's have number between 0 and 100 and map them to the array + - k=20 -> go to index 9 + - k=40 -> go to index 4 + - k=15 -> go to index 4 (but that location is already taken by k=40) + - i++ -> go to index 5 + +- Another example: + - m = 9 + - k = 0, 9, 18, 27, 36, 45, 54 + - k=0 -> go to index 0 + - k=9 -> go to index 0 (but it's taken) + - i++ -> go to index 1 + - k=18 -> go to index 0 (but it's taken) + - i++ -> go to index 1 (but it's taken) + - i++ -> go to index 2 + - k=27 -> go to index 0 (but it's taken) + - i++ -> go to index 1 (but it's taken) + - i++ -> go to index 2 (but it's taken) + - i++ -> go to index 3 + - If all keys map to the same location -> BAD! + - What is the big-O runtime for inserting a new key (assume there are already 'n' keys in the table) + - O(n) (pretty bad) + +- In the **worst case**, if all keys hash to the same location, insertion is **O(n)** +- General case **linear probing**: + - `h(k, i) = (h(k) + c*i) % m` + - e.g. c=2, m=7, `h(k, i) = (h(k) + 2*i) % m`, k=0,2,7 + - k=0 -> go to index 0 + - k=2 -> go to index 2 + - k=7 -> go to index 0 (but it's taken) + - i++ -> go to index 2 (but it's taken) + - i++ -> go to index 4 + +--- + +[04/22 ->](04-22.md) |