[\<- 04/22](04-22.md) --- ## Hashing - Min/Max - How do you find the smallest value in sorted, unsorted, or hash table? Unsorted Array: O(n) | | | | | | | | |-|-|-|-|-|-|-| |3|10|6| | | | | Sorted Array: O(1) | | | | | | | | |-|-|-|-|-|-|-| |3|6|10| | | | | Hash table: `h(k, i) = (h(k) + i) % m`: O(m) ('m' being the size of the hash table) | | | | | | | | |-|-|-|-|-|-|-| | | | |3|10| |6| ## Hashing - DELETION - Let's consider deleting a value from a table: - let m = 7 - let `h(k, i) = (k+i)%m` - insert 4 -> go to index 4 - insert 7 -> go to index 0 - **delete** 7 - like with insertion, find where 7 should go (index 0) - 7 is there, so delete it - -> index 0 cleared - insert 11 -> go to index 4 (collision - i=1 -> go to index 5 - **find** 11 - find where 11 should be inserted (index 4) - 11 is not there -> i++ - 11 should go to index 5 - -> 11 is there! - **find** 18 - find where 18 should be inserted (index 4) - 18 is not there -> i++ - 18 should go to index 5 -> not there -> i++ - we stop searching the first time we see an empty slot (in this case, 18 is not in the table) - **delete** 4 - find where 4 should be inserted (index 4) - 4 is there -> clear index 4 - **find** 11 - find where 11 should be inserted (index 4) - index 4 is empty, stop searching - This is bad, because we know that 11 is still in the array (index 5) - What went wrong? We were fooled by the empty spot - You may be tempted to move things up... - It's not that simple - How do you know whether a key is at its home location or is redirected by probing? - What we do is a lot simpler - We need to **leave a marker** when a **key is deleted**, so that when we **search for a key**, we know that we **should not stop probing**. - Draw two parallel arrays: data & flags - flags: E = empty, F = filled, D = deleted ### Hashing (cont'd) - Searching using flags - E -> we stop! key is **not** there - F -> we compare - if True -> stop! found it! - if False -> keep going - D -> keep going! (don't compare) - So deletions can slow you down. If you are doing a lot of deletions, you're better off using a different style of hash table, which we won't present in this class - Where to insert an unfound element? - The first slot marked as empty? - The first slot marked as deleted? |0|1|2|3|4|5|6| |-|-|-|-|-|-|-| | | | | |4| | | |E|E|E|D|F|E|E| - Find 10 -> go to index 3 (marked 'D') - i++ -> go to index 4 (marked 'F', but not equal) - i++ -> go to index 5 (marked 'E', so stop) - Insert 10 -> to to index 3 (marked 'D') - insert 10 at index 3, mark it as 'F' - Why? 1. the deleted slot is closer to the home location of that key. Reducing excessive probing cost for searching and insertion 2. We get to reclaim a deleted slot ## Hashing - Insertion & Deletion (Summary) - When inserting, we still probe as usual, but we remember the first deleted slot we saw - If a deleted slot was found, use it over the empty slot - Don't forget to **change the flag** |SET|Unsorted Array|Sorted Array|Hash (Linear Probing)| |---|--------------|------------|---------------------| |**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)| |**Add**|O(n) (because a set is unique)|O(n)|O(n) (worst case); O(1) (ideal case)| |**Remove**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)| --- ## Another ADT: BAG A bag is an **unordered** collection of elements, which are **not necessarily distinct** |BAG|Unsorted Array|Sorted Array|Hash (Linear Probing)| |---|--------------|------------|---------------------| |**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)| |**Add**|O(1)|O(n)|O(n) (worst case); O(1) (ideal case)| |**Remove One**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)| |**Remove All**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|