summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--04-22.md4
-rw-r--r--04-24.md119
2 files changed, 123 insertions, 0 deletions
diff --git a/04-22.md b/04-22.md
index 58e0510..8b401f7 100644
--- a/04-22.md
+++ b/04-22.md
@@ -191,3 +191,7 @@ Visual Representation of the Hash Table:
- Why use double hashing?
- It avoids secondary clustering
- Again, there is no guarantee we will find an empty slot, unless we pick our constants very carefully
+
+---
+
+[04/24 ->](04-24.md)
diff --git a/04-24.md b/04-24.md
new file mode 100644
index 0000000..91dc979
--- /dev/null
+++ b/04-24.md
@@ -0,0 +1,119 @@
+[\<- 04/22](04-22.md)
+
+---
+
+## Hashing - Min/Max
+
+- How do you find the smallest value in sorted, unsorted, or hash table?
+
+Unsorted Array: O(n)
+
+| | | | | | | |
+|-|-|-|-|-|-|-|
+|3|10|6| | | | |
+
+Sorted Array: O(1)
+
+| | | | | | | |
+|-|-|-|-|-|-|-|
+|3|6|10| | | | |
+
+Hash table: `h(k, i) = (h(k) + i) % m`: O(m) ('m' being the size of the hash table)
+
+| | | | | | | |
+|-|-|-|-|-|-|-|
+| | | |3|10| |6|
+
+## Hashing - DELETION
+
+- Let's consider deleting a value from a table:
+ - let m = 7
+ - let `h(k, i) = (k+i)%m`
+ - insert 4 -> go to index 4
+ - insert 7 -> go to index 0
+ - **delete** 7
+ - like with insertion, find where 7 should go (index 0)
+ - 7 is there, so delete it
+ - -> index 0 cleared
+ - insert 11 -> go to index 4 (collision
+ - i=1 -> go to index 5
+ - **find** 11
+ - find where 11 should be inserted (index 4)
+ - 11 is not there -> i++
+ - 11 should go to index 5
+ - -> 11 is there!
+ - **find** 18
+ - find where 18 should be inserted (index 4)
+ - 18 is not there -> i++
+ - 18 should go to index 5 -> not there -> i++
+ - we stop searching the first time we see an empty slot (in this case, 18 is not in the table)
+ - **delete** 4
+ - find where 4 should be inserted (index 4)
+ - 4 is there -> clear index 4
+ - **find** 11
+ - find where 11 should be inserted (index 4)
+ - index 4 is empty, stop searching
+ - This is bad, because we know that 11 is still in the array (index 5)
+
+- What went wrong? We were fooled by the empty spot
+- You may be tempted to move things up...
+ - It's not that simple
+ - How do you know whether a key is at its home location or is redirected by probing?
+ - What we do is a lot simpler
+- We need to **leave a marker** when a **key is deleted**, so that when we **search for a key**, we know that we **should not stop probing**.
+- Draw two parallel arrays: data & flags
+ - flags: E = empty, F = filled, D = deleted
+
+### Hashing (cont'd)
+
+- Searching using flags
+ - E -> we stop! key is **not** there
+ - F -> we compare
+ - if True -> stop! found it!
+ - if False -> keep going
+ - D -> keep going! (don't compare)
+- So deletions can slow you down. If you are doing a lot of deletions, you're better off using a different style of hash table, which we won't present in this class
+- Where to insert an unfound element?
+ - The first slot marked as empty?
+ - The first slot marked as deleted?
+
+|0|1|2|3|4|5|6|
+|-|-|-|-|-|-|-|
+| | | | |4| | |
+|E|E|E|D|F|E|E|
+
+- Find 10 -> go to index 3 (marked 'D')
+ - i++ -> go to index 4 (marked 'F', but not equal)
+ - i++ -> go to index 5 (marked 'E', so stop)
+- Insert 10 -> to to index 3 (marked 'D')
+ - insert 10 at index 3, mark it as 'F'
+
+- Why?
+ 1. the deleted slot is closer to the home location of that key. Reducing excessive probing cost for searching and insertion
+ 2. We get to reclaim a deleted slot
+
+## Hashing - Insertion & Deletion (Summary)
+
+- When inserting, we still probe as usual, but we remember the first deleted slot we saw
+- If a deleted slot was found, use it over the empty slot
+- Don't forget to **change the flag**
+
+|SET|Unsorted Array|Sorted Array|Hash (Linear Probing)|
+|---|--------------|------------|---------------------|
+|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
+|**Add**|O(n) (because a set is unique)|O(n)|O(n) (worst case); O(1) (ideal case)|
+|**Remove**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
+
+---
+
+## Another ADT: BAG
+
+A bag is an **unordered** collection of elements, which are **not necessarily distinct**
+
+|BAG|Unsorted Array|Sorted Array|Hash (Linear Probing)|
+|---|--------------|------------|---------------------|
+|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
+|**Add**|O(1)|O(n)|O(n) (worst case); O(1) (ideal case)|
+|**Remove One**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
+|**Remove All**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
+