summaryrefslogtreecommitdiff
path: root/04-20.md
diff options
context:
space:
mode:
authorlshprung <lshprung@yahoo.com>2020-04-22 10:41:03 -0700
committerlshprung <lshprung@yahoo.com>2020-04-22 10:41:03 -0700
commitb80e8d3b7de4d5bda0d20f36612279fc62d431ef (patch)
treec194fd1c0eda364e7c726ad78fec2ef068a27bf7 /04-20.md
parent183f93c9c02e10001275229536d789c5c304a1fd (diff)
Post-class 04/22
Diffstat (limited to '04-20.md')
-rw-r--r--04-20.md57
1 files changed, 57 insertions, 0 deletions
diff --git a/04-20.md b/04-20.md
index 7514236..88e0142 100644
--- a/04-20.md
+++ b/04-20.md
@@ -133,3 +133,60 @@ In practice, we don't know the keys in advance, it is very hard to derive a perf
- Even if we choose a good hash function that does a good job distributing the keys across the entire hash table, we will still have collisions in practice and need to deal with them
- The next step will be to cover **Collision Resolutions**...
+
+## Probing
+
+- A simple way of resolving collisions: **probing**
+- If a key 'y' collids with a key 'x' that is already in the table, 'x' is **not** displaced. Rather 'y' must be placed somewhere else
+- In probing, the colliding key tries another slot in the table (**probes**), if that slot is empty, then is is placed there, otherwise it keeps trying (**probing**) different slots
+- Probing methods covered in this class:
+ - Linear Probing
+ - Quadratic Probing
+ - Double Hashing
+
+## Linear Probing
+
+- We can think of our hash function as having an extra parameter `h(k, i)`, where 'i' is the **probe number** that starts at zero for each key (i.e., it's the number of the try)
+- Simplest type of probing:
+ - `h(k, i) = (h(k) + i) % m`
+ - This is called **linear probing**
+
+## Linear Probing (Insertion)
+
+- If i=0, then `h(k, i) = h(k)`, i.e., the "home" location of the key. If that is occupied, we use the next location to the right, wrapping around to the front of the array
+- Example, for m=11, let's have number between 0 and 100 and map them to the array
+ - k=20 -> go to index 9
+ - k=40 -> go to index 4
+ - k=15 -> go to index 4 (but that location is already taken by k=40)
+ - i++ -> go to index 5
+
+- Another example:
+ - m = 9
+ - k = 0, 9, 18, 27, 36, 45, 54
+ - k=0 -> go to index 0
+ - k=9 -> go to index 0 (but it's taken)
+ - i++ -> go to index 1
+ - k=18 -> go to index 0 (but it's taken)
+ - i++ -> go to index 1 (but it's taken)
+ - i++ -> go to index 2
+ - k=27 -> go to index 0 (but it's taken)
+ - i++ -> go to index 1 (but it's taken)
+ - i++ -> go to index 2 (but it's taken)
+ - i++ -> go to index 3
+ - If all keys map to the same location -> BAD!
+ - What is the big-O runtime for inserting a new key (assume there are already 'n' keys in the table)
+ - O(n) (pretty bad)
+
+- In the **worst case**, if all keys hash to the same location, insertion is **O(n)**
+- General case **linear probing**:
+ - `h(k, i) = (h(k) + c*i) % m`
+ - e.g. c=2, m=7, `h(k, i) = (h(k) + 2*i) % m`, k=0,2,7
+ - k=0 -> go to index 0
+ - k=2 -> go to index 2
+ - k=7 -> go to index 0 (but it's taken)
+ - i++ -> go to index 2 (but it's taken)
+ - i++ -> go to index 4
+
+---
+
+[04/22 ->](04-22.md)