summaryrefslogtreecommitdiff
path: root/04-13.md
blob: b4ae755876792b76de265f6391df7f9adb6d0055 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
[\<- 04/10](04-10.md)

---

## So far ...

We have discussed contents in Chapter 1
1. Concept of data Structure
2. Algorithm Efficiency - big O notation

# Search

## First Data Structure

- You've seen it before: an array!
- How is an array laid out?

|[] |[] |[] |[] |[] |
|-|-|-|-|-|
|0|1|...|...|m-1|

- a series of data laid in **consecutive** memory locations
- picture of slots... starting at "zero"
- m slots, so the end is at "m-1"

## Arrays - Precondition

- Pre-condition - n existing elements
	- If we want to store n (0 <= n <= m) elements in the array, we place them in slots "0" through "n-1"

|x|x|x|x|x|[]|[]|[]|
|-|-|-|-|-|--|--|--|
|0|1|...|...|n-1|...|...|m-1|

## Arrays - Insert

- Your first operation: insert
	- It's easiest to add a new element at the end of the array

|x|x|x|x|x|[]|[]|[]|[]|
|-|-|-|-|-|--|--|--|--|
|0|1|...|...|n-1|n|...|...|m-1|

- To add the element x, we could do

```
a[n] = x;
n++;
```

Or, alternatively
```
a[n++] = x;
```

- What is the big-O runtime of this operation?
	- O(1) (no repeating operations)

## Arrays - Access/Retrieve

- Second operation: Access/Retrieve
	- **Indexing** - Retrieve an element through indexing: O(1)
	- What about **searching a given value**?

|x|x|x|x|x|[]|[]|[]|
|-|-|-|-|-|--|--|--|
|0|1|...|...|n-1|...|...|m-1|

- How to retrieve an element through an index
```
x = a[i];
```

- How to search for a given value?
	- We'll cover this soon

## Let's Play a Game

Let's look for Matt in this array

|?|?|?|?|?|
|-|-|-|-|-|
|0|1|2|3|4|

Logically, we would check each index for Matt

- Check the first index (i=0)

|Chris|?|?|?|?|
|-|-|-|-|-|
|0|1|2|3|4|

- Comapare "Chris" to "Matt"
	- They do not match, so now we move on to the next value

|Chris|Rachael|?|?|?|
|-|-|-|-|-|
|0|1|2|3|4|

- Keep Checking

|Chris|Rachael|Ryan|?|?|
|-|-|-|-|-|
|0|1|2|3|4|

- Keep Checking

|Chris|Rachael|Ryan|Mike|?|
|-|-|-|-|-|
|0|1|2|3|4|

- Keep Checking

|Chris|Rachael|Ryan|Mike|Brian|
|-|-|-|-|-|
|0|1|2|3|4|

- Unsuccessful search - but What's the runtime?
	- it takes n comparisons -> O(n)
	- if we were looking for Chris, it would only take one comparison
		- this is called the best case O(1)
	- If we were looking for Brian, it would take five comparisons (or n comparisons)
		- this is called the worst case O(n)
	- How many comparisons will a successful search require on average? (Keep this question in mind)

## Sequential Search

- Searching an array: start with the first element and proceed to the last element
- Code:

```
int found = 0;
//assume the array 'a' is 'n' elements long
//assume 'x' is the search term

for(i = 0; i < n; i++){
	if(a[i] == x){
		found = 1;
		break;
	}
}
```

- The break statement is important so that the big-O is O(1) instead of O(n) for best case
	- More efficient
- Reminder: to use booleans (found = false) include stdbool.h

## Sequential Search: how to stop

- We can stop after finding first match
- Two ways to do that
	- Add a "break" inside the loop when found (see the previous example)
		- break means exit the innermost loop
	- make the for loop look like this: `for(i=0; i<n && !found; i++)`
- You **should NOT** combine them. You **lose points** in clarity if you do that in the assignment. Pick which approach you like, or use a while, but do not use both techniques

---

The Big O notation?
- It depends on the sequence to search (i.e. input)
- The best case? O(1)
	- search is located at the beginning of the array
- The worst case? O(n)
	- search is at the end or not in the array

On average, how many tries to find a given item?
- On average it takes you n/2 tries

Why?
- In general, if you want an average, you add up everything and divide by the number of something... so search in 1 step, 2 steps,... n steps gives us (1+2+3...+n)/n -> n/2 is a nice shortcut

- So what's the Big-O notation
	- O(n/2) -> O(n)

## Arrays - Sequential Search - Summary

- This searching algorithm is called "linear search" or "sequential search"
- It's big-O runtime is O(n)
	- best case O(1)
	- average case O(n)
	- worst case O(n)
- The military wouldn't be happy if you told them that *on average* the missile defense system stops the missiles
	- The worst case big-O is the one we care most!
- Can we improve this?

## Arrays Probability Search

- A game again!!

|Chris|Rachael|Ryan|Mike|Brian|
|-|-|-|-|-|
|0|1|2|3|4|

- How to improve the search algorithm?
	- Let's say we want to find Mike
	- We can add a swap to move Mike closer to the beginning in case he is searched for again

|Chris|Rachael|Mike|Ryan|Brian|
|-|-|-|-|-|
|0|1|2|3|4|

- If we look for Mike again, it will take one less comparison to find him

|Chris|Mike|Rachael|Ryan|Brian|
|-|-|-|-|-|
|0|1|2|3|4|

- If we look for Mike a third time, it will take one more less comparison to find him

|Mike|Chris|Rachael|Ryan|Brian|
|-|-|-|-|-|
|0|1|2|3|4|

- Now that Mike is at the front, he is the best case

---

- Note
	- Be careful not to swap the first person with the non-existent preceding person
- This is called a "move to front heuristic"
	- Heuristics are things that probably help, but not certain they would
	- This is in the book called "probability search"
- Example: having an application in your cache and therefore it runs faster
- Will this improve unsuccessful searches?
	- Worst case is still n. The average time probably improves, but is still O(n)

- You will do sequential search in Project 3

## Arrays - Another Search Type

Let's play a game:
- Pick a number from 1-100
- Tell a guesser whether he us too high or too low
- I claim I can guess in 7 or less

1. Guess 50
	- "50 is too high" -> now we know the number is 1-49
	- In one guess, the range is cut in half

2. Guess 25
	- "25 is too low" -> now we know the number is 26-49
	- In two guesses, the range is cut down to 1/4 of the original range

## Binary Search

- Idea: we halve the space over which we search
- At each step, we eliminate 1/2 of the remaining possibilities
- Maximum number of steps required is O(log(n)) if the original size of the space was n
	- Theoretically faster than sequential search
- This algorithm is called **binary search**
- Pre-condition:
	- It has to be **sorted**, otherwise we couldn't get rid of an entire sub-range

## Binary Search - Code

- side note: be clear on counter/variable names

```
//return true if 'x' is in 'a[]', false if not; 'n' is the length of 'a[]'
bool bsearch(int a[], int n, int x){

	int lo, hi, mid;
	lo = 0;
	hi = n-1;

	while(lo <= hi){
		mid = (lo + hi)/2; //updates every iteration
		if(a[mid] == x) return true;
		if(a[mid] > x) hi = mid-1;
		else lo = mid+1;
	}

	return false;

}
```

## Binary Search - Big-O

- If you are halving each time, how long does the loop run?
	- O(log(n))
	- O(log(n)) beats the pants off linear O(n)
	- log2(1000000) = 20
	
- What is the big-O runtime for binary search?
	- Best case? O(1)
	- Worst case? O(log(n))
	- Of course, we care the most about the worst case
- If you were working with strings... you'd just use strcmp() instead of comparisons

## Binary Search? Sequential Search?

Is binary search always better than sequential search?
Answer: It depends
- How about if the sequence is not sorted?

---

[04/15 ->](04-15.md)