summaryrefslogtreecommitdiff
path: root/04-24.md
blob: 2886e2e564cdceb8dcd0c703ed8936c55c16d69a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
[\<- 04/22](04-22.md)

---

## Hashing - Min/Max

- How do you find the smallest value in sorted, unsorted, or hash table?

Unsorted Array: O(n)

| | | | | | | |
|-|-|-|-|-|-|-|
|3|10|6| | | | |

Sorted Array: O(1)

| | | | | | | |
|-|-|-|-|-|-|-|
|3|6|10| | | | |

Hash table: `h(k, i) = (h(k) + i) % m`: O(m) ('m' being the size of the hash table)

| | | | | | | |
|-|-|-|-|-|-|-|
| | | |3|10| |6|

## Hashing - DELETION

- Let's consider deleting a value from a table:
	- let m = 7
	- let `h(k, i) = (k+i)%m`
	- insert 4 -> go to index 4
	- insert 7 -> go to index 0
	- **delete** 7
		- like with insertion, find where 7 should go (index 0)
		- 7 is there, so delete it
		- -> index 0 cleared
	- insert 11 -> go to index 4 (collision
		- i=1 -> go to index 5
	- **find** 11
		- find where 11 should be inserted (index 4)
		- 11 is not there -> i++
		- 11 should go to index 5
		- -> 11 is there!
	- **find** 18
		- find where 18 should be inserted (index 4)
		- 18 is not there -> i++
		- 18 should go to index 5 -> not there -> i++
		- we stop searching the first time we see an empty slot (in this case, 18 is not in the table)
	- **delete** 4
		- find where 4 should be inserted (index 4)
		- 4 is there -> clear index 4
	- **find** 11
		- find where 11 should be inserted (index 4)
		- index 4 is empty, stop searching
		- This is bad, because we know that 11 is still in the array (index 5)

- What went wrong? We were fooled by the empty spot
- You may be tempted to move things up...
	- It's not that simple
		- How do you know whether a key is at its home location or is redirected by probing?
	- What we do is a lot simpler
- We need to **leave a marker** when a **key is deleted**, so that when we **search for a key**, we know that we **should not stop probing**.
- Draw two parallel arrays: data & flags
	- flags: E = empty, F = filled, D = deleted

### Hashing (cont'd)

- Searching using flags
	- E -> we stop! key is **not** there
	- F -> we compare
		- if True -> stop! found it!
		- if False -> keep going
	- D -> keep going! (don't compare)
- So deletions can slow you down. If you are doing a lot of deletions, you're better off using a different style of hash table, which we won't present in this class
- Where to insert an unfound element?
	- The first slot marked as empty?
	- The first slot marked as deleted?

|0|1|2|3|4|5|6|
|-|-|-|-|-|-|-|
| | | | |4| | |
|E|E|E|D|F|E|E|

- Find 10 -> go to index 3 (marked 'D')
	- i++ -> go to index 4 (marked 'F', but not equal)
	- i++ -> go to index 5 (marked 'E', so stop)
- Insert 10 -> to to index 3 (marked 'D')
	- insert 10 at index 3, mark it as 'F'

- Why?
	1. the deleted slot is closer to the home location of that key. Reducing excessive probing cost for searching and insertion
	2. We get to reclaim a deleted slot

## Hashing - Insertion & Deletion (Summary)

- When inserting, we still probe as usual, but we remember the first deleted slot we saw
- If a deleted slot was found, use it over the empty slot
- Don't forget to **change the flag**

|SET|Unsorted Array|Sorted Array|Hash (Linear Probing)|
|---|--------------|------------|---------------------|
|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
|**Add**|O(n) (because a set is unique)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|

---

## Another ADT: BAG

A bag is an **unordered** collection of elements, which are **not necessarily distinct**

|BAG|Unsorted Array|Sorted Array|Hash (Linear Probing)|
|---|--------------|------------|---------------------|
|**Search/Find**|O(n)|O(log(n))|O(n) (worst case); O(1) (ideal case)|
|**Add**|O(1)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove One**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|
|**Remove All**|O(n)|O(n)|O(n) (worst case); O(1) (ideal case)|

---

[04/27 ->](04-27.md)