From 611db5fed241fe3afbb4c36eb5d36ab13538cc58 Mon Sep 17 00:00:00 2001 From: lshprung Date: Fri, 24 Apr 2020 11:10:26 -0700 Subject: Post-class 04/24 --- 04-22.md | 4 +++ 04-24.md | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 123 insertions(+) create mode 100644 04-24.md diff --git a/04-22.md b/04-22.md index 58e0510..8b401f7 100644 --- a/04-22.md +++ b/04-22.md @@ -191,3 +191,7 @@ Visual Representation of the Hash Table: - Why use double hashing? - It avoids secondary clustering - Again, there is no guarantee we will find an empty slot, unless we pick our constants very carefully + +--- + +[04/24 ->](04-24.md) diff --git a/04-24.md b/04-24.md new file mode 100644 index 0000000..91dc979 --- /dev/null +++ b/04-24.md @@ -0,0 +1,119 @@ +[\<- 04/22](04-22.md) + +--- + +## Hashing - Min/Max + +- How do you find the smallest value in sorted, unsorted, or hash table? + +Unsorted Array: O(n) + +| | | | | | | | +|-|-|-|-|-|-|-| +|3|10|6| | | | | + +Sorted Array: O(1) + +| | | | | | | | +|-|-|-|-|-|-|-| +|3|6|10| | | | | + +Hash table: `h(k, i) = (h(k) + i) % m`: O(m) ('m' being the size of the hash table) + +| | | | | | | | +|-|-|-|-|-|-|-| +| | | |3|10| |6| + +## Hashing - DELETION + +- Let's consider deleting a value from a table: + - let m = 7 + - let `h(k, i) = (k+i)%m` + - insert 4 -> go to index 4 + - insert 7 -> go to index 0 + - **delete** 7 + - like with insertion, find where 7 should go (index 0) + - 7 is there, so delete it + - -> index 0 cleared + - insert 11 -> go to index 4 (collision + - i=1 -> go to index 5 + - **find** 11 + - find where 11 should be inserted (index 4) + - 11 is not there -> i++ + - 11 should go to index 5 + - -> 11 is there! + - **find** 18 + - find where 18 should be inserted (index 4) + - 18 is not there -> i++ + - 18 should go to index 5 -> not there -> i++ + - we stop searching the first time we see an empty slot (in this case, 18 is not in the table) + - **delete** 4 + - find where 4 should be inserted (index 4) + - 4 is there -> clear index 4 + - **find** 11 + - find where 11 should be inserted (index 4) + - index 4 is empty, stop searching + - This is bad, because we know that 11 is still in the array (index 5) + +- What went wrong? We were fooled by the empty spot +- You may be tempted to move things up... + - It's not that simple + - How do you know whether a key is at its home location or is redirected by probing? + - What we do is a lot simpler +- We need to **leave a marker** when a **key is deleted**, so that when we **search for a key**, we know that we **should not stop probing**. +- Draw two parallel arrays: data & flags + - flags: E = empty, F = filled, D = deleted + +### Hashing (cont'd) + +- Searching using flags + - E -> we stop! key is **not** there + - F -> we compare + - if True -> stop! found it! + - if False -> keep going + - D -> keep going! (don't compare) +- So deletions can slow you down. If you are doing a lot of deletions, you're better off using a different style of hash table, which we won't present in this class +- Where to insert an unfound element? + - The first slot marked as empty? + - The first slot marked as deleted? + +|0|1|2|3|4|5|6| +|-|-|-|-|-|-|-| +| | | | |4| | | +|E|E|E|D|F|E|E| + +- Find 10 -> go to index 3 (marked 'D') + - i++ -> go to index 4 (marked 'F', but not equal) + - i++ -> go to index 5 (marked 'E', so stop) +- Insert 10 -> to to index 3 (marked 'D') + - insert 10 at index 3, mark it as 'F' + +- Why? + 1. the deleted slot is closer to the home location of that key. Reducing excessive probing cost for searching and insertion + 2. We get to reclaim a deleted slot + +## Hashing - Insertion & Deletion (Summary) + +- When inserting, we still probe as usual, but we remember the first deleted slot we saw +- If a deleted slot was found, use it over the empty slot +- Don't forget to **change the flag** + +|SET|Unsorted Array|Sorted Array|Hash (Linear Probing)| +|---|--------------|------------|---------------------| +|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)| +|**Add**|O(n) (because a set is unique)|O(n)|O(n) (worst case); O(1) (ideal case)| +|**Remove**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)| + +--- + +## Another ADT: BAG + +A bag is an **unordered** collection of elements, which are **not necessarily distinct** + +|BAG|Unsorted Array|Sorted Array|Hash (Linear Probing)| +|---|--------------|------------|---------------------| +|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)| +|**Add**|O(1)|O(n)|O(n) (worst case); O(1) (ideal case)| +|**Remove One**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)| +|**Remove All**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)| + -- cgit