1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
|
[\<- 04/22](04-22.md)
---
## Hashing - Min/Max
- How do you find the smallest value in sorted, unsorted, or hash table?
Unsorted Array: O(n)
| | | | | | | |
|-|-|-|-|-|-|-|
|3|10|6| | | | |
Sorted Array: O(1)
| | | | | | | |
|-|-|-|-|-|-|-|
|3|6|10| | | | |
Hash table: `h(k, i) = (h(k) + i) % m`: O(m) ('m' being the size of the hash table)
| | | | | | | |
|-|-|-|-|-|-|-|
| | | |3|10| |6|
## Hashing - DELETION
- Let's consider deleting a value from a table:
- let m = 7
- let `h(k, i) = (k+i)%m`
- insert 4 -> go to index 4
- insert 7 -> go to index 0
- **delete** 7
- like with insertion, find where 7 should go (index 0)
- 7 is there, so delete it
- -> index 0 cleared
- insert 11 -> go to index 4 (collision
- i=1 -> go to index 5
- **find** 11
- find where 11 should be inserted (index 4)
- 11 is not there -> i++
- 11 should go to index 5
- -> 11 is there!
- **find** 18
- find where 18 should be inserted (index 4)
- 18 is not there -> i++
- 18 should go to index 5 -> not there -> i++
- we stop searching the first time we see an empty slot (in this case, 18 is not in the table)
- **delete** 4
- find where 4 should be inserted (index 4)
- 4 is there -> clear index 4
- **find** 11
- find where 11 should be inserted (index 4)
- index 4 is empty, stop searching
- This is bad, because we know that 11 is still in the array (index 5)
- What went wrong? We were fooled by the empty spot
- You may be tempted to move things up...
- It's not that simple
- How do you know whether a key is at its home location or is redirected by probing?
- What we do is a lot simpler
- We need to **leave a marker** when a **key is deleted**, so that when we **search for a key**, we know that we **should not stop probing**.
- Draw two parallel arrays: data & flags
- flags: E = empty, F = filled, D = deleted
### Hashing (cont'd)
- Searching using flags
- E -> we stop! key is **not** there
- F -> we compare
- if True -> stop! found it!
- if False -> keep going
- D -> keep going! (don't compare)
- So deletions can slow you down. If you are doing a lot of deletions, you're better off using a different style of hash table, which we won't present in this class
- Where to insert an unfound element?
- The first slot marked as empty?
- The first slot marked as deleted?
|0|1|2|3|4|5|6|
|-|-|-|-|-|-|-|
| | | | |4| | |
|E|E|E|D|F|E|E|
- Find 10 -> go to index 3 (marked 'D')
- i++ -> go to index 4 (marked 'F', but not equal)
- i++ -> go to index 5 (marked 'E', so stop)
- Insert 10 -> to to index 3 (marked 'D')
- insert 10 at index 3, mark it as 'F'
- Why?
1. the deleted slot is closer to the home location of that key. Reducing excessive probing cost for searching and insertion
2. We get to reclaim a deleted slot
## Hashing - Insertion & Deletion (Summary)
- When inserting, we still probe as usual, but we remember the first deleted slot we saw
- If a deleted slot was found, use it over the empty slot
- Don't forget to **change the flag**
|SET|Unsorted Array|Sorted Array|Hash (Linear Probing)|
|---|--------------|------------|---------------------|
|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
|**Add**|O(n) (because a set is unique)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
---
## Another ADT: BAG
A bag is an **unordered** collection of elements, which are **not necessarily distinct**
|BAG|Unsorted Array|Sorted Array|Hash (Linear Probing)|
|---|--------------|------------|---------------------|
|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
|**Add**|O(1)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove One**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove All**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
---
[04/27 ->](04-27.md)
|