Post-class 04/13

author: lshprung <lshprung@yahoo.com> 2020-04-13 10:53:41 -0700
committer: lshprung <lshprung@yahoo.com> 2020-04-13 10:53:41 -0700
commit: a573986a27f8fbd8f87a19e7475ea4cc56b632b4 (patch)
tree: c7fa4654d13ee784c386599bc2afac2a034dceb6 /04-13.md
parent: 5b667ae6912559a326ef580a681cc805f556fcaf (diff)
1 files changed, 295 insertions, 0 deletions
diff --git a/04-13.md b/04-13.md
new file mode 100644
index 0000000..937fe4a
--- /dev/null
+++ b/04-13.md
@@ -0,0 +1,295 @@
+[\<- 04/10](04-10.md)
+
+---
+
+## So far ...
+
+We have discussed contents in Chapter 1
+1. Concept of data Structure
+2. Algorithm Efficiency - big O notation
+
+# Search
+
+## First Data Structure
+
+- You've seen it before: an array!
+- How is an array laid out?
+
+|[] |[] |[] |[] |[] |
+|-|-|-|-|-|
+|0|1|...|...|m-1|
+
+- a series of data laid in **consecutive** memory locations
+- picture of slots... starting at "zero"
+- m slots, so the end is at "m-1"
+
+## Arrays - Precondition
+
+- Pre-condition - n existing elements
+	- If we want to store n (0 <= n <= m) elements in the array, we place them in slots "0" through "n-1"
+
+|x|x|x|x|x|[]|[]|[]|
+|-|-|-|-|-|--|--|--|
+|0|1|...|...|n-1|...|...|m-1|
+
+## Arrays - Insert
+
+- Your first operation: insert
+	- It's easiest to add a new element at the end of the array
+
+|x|x|x|x|x|[]|[]|[]|[]|
+|-|-|-|-|-|--|--|--|--|
+|0|1|...|...|n-1|n|...|...|m-1|
+
+- To add the element x, we could do
+
+```
+a[n] = x;
+n++;
+```
+
+Or, alternatively
+```
+a[n++] = x;
+```
+
+- What is the big-O runtime of this operation?
+	- O(1) (no repeating operations)
+
+## Arrays - Access/Retrieve
+
+- Second operation: Access/Retrieve
+	- **Indexing** - Retrieve an element through indexing: O(1)
+	- What about **searching a given value**?
+
+|x|x|x|x|x|[]|[]|[]|
+|-|-|-|-|-|--|--|--|
+|0|1|...|...|n-1|...|...|m-1|
+
+- How to retrieve an element through an index
+```
+x = a[i];
+```
+
+- How to search for a given value?
+	- We'll cover this soon
+
+## Let's Play a Game
+
+Let's look for Matt in this array
+
+|?|?|?|?|?|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+Logically, we would check each index for Matt
+
+- Check the first index (i=0)
+
+|Chris|?|?|?|?|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Comapare "Chris" to "Matt"
+	- They do not match, so now we move on to the next value
+
+|Chris|Rachael|?|?|?|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Keep Checking
+
+|Chris|Rachael|Ryan|?|?|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Keep Checking
+
+|Chris|Rachael|Ryan|Mike|?|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Keep Checking
+
+|Chris|Rachael|Ryan|Mike|Brian|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Unsuccessful search - but What's the runtime?
+	- it takes n comparisons -> O(n)
+	- if we were looking for Chris, it would only take one comparison
+		- this is called the best case O(1)
+	- If we were looking for Brian, it would take five comparisons (or n comparisons)
+		- this is called the worst case O(n)
+	- How many comparisons will a successful search require on average? (Keep this question in mind)
+
+## Sequential Search
+
+- Searching an array: start with the first element and proceed to the last element
+- Code:
+
+```
+int found = 0;
+//assume the array 'a' is 'n' elements long
+//assume 'x' is the search term
+
+for(i = 0; i < n; i++){
+	if(a[i] == x){
+		found = 1;
+		break;
+	}
+}
+```
+
+- The break statement is important so that the big-O is O(1) instead of O(n) for best case
+	- More efficient
+- Reminder: to use booleans (found = false) include stdbool.h
+
+## Sequential Search: how to stop
+
+- We can stop after finding first match
+- Two ways to do that
+	- Add a "break" inside the loop when found (see the previous example)
+		- break means exit the innermost loop
+	- make the for loop look like this: `for(i=0; i<n && !found; i++)`
+- You **should NOT** combine them. You **lose points** in clarity if you do that in the assignment. Pick which approach you like, or use a while, but do not use both techniques
+
+---
+
+The Big O notation?
+- It depends on the sequence to search (i.e. input)
+- The best case? O(1)
+	- search is located at the beginning of the array
+- The worst case? O(n)
+	- search is at the end or not in the array
+
+On average, how many tries to find a given item?
+- On average it takes you n/2 tries
+
+Why?
+- In general, if you want an average, you add up everything and divide by the number of something... so search in 1 step, 2 steps,... n steps gives us (1+2+3...+n)/n -> n/2 is a nice shortcut
+
+- So what's the Big-O notation
+	- O(n/2) -> O(n)
+
+## Arrays - Sequential Search - Summary
+
+- This searching algorithm is called "linear search" or "sequential search"
+- It's big-O runtime is O(n)
+	- best case O(1)
+	- average case O(n)
+	- worst case O(n)
+- The military wouldn't be happy if you told them that *on average* the missile defense system stops the missiles
+	- The worst case big-O is the one we care most!
+- Can we improve this?
+
+## Arrays Probability Search
+
+- A game again!!
+
+|Chris|Rachael|Ryan|Mike|Brian|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- How to improve the search algorithm?
+	- Let's say we want to find Mike
+	- We can add a swap to move Mike closer to the beginning in case he is searched for again
+
+|Chris|Rachael|Mike|Ryan|Brian|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- If we look for Mike again, it will take one less comparison to find him
+
+|Chris|Mike|Rachael|Ryan|Brian|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- If we look for Mike a third time, it will take one more less comparison to find him
+
+|Mike|Chris|Rachael|Ryan|Brian|
+|-|-|-|-|-|
+|0|1|2|3|4|
+
+- Now that Mike is at the front, he is the best case
+
+---
+
+- Note
+	- Be careful not to swap the first person with the non-existent preceding person
+- This is called a "move to front heuristic"
+	- Heuristics are things that probably help, but not certain they would
+	- This is in the book called "probability search"
+- Example: having an application in your cache and therefore it runs faster
+- Will this improve unsuccessful searches?
+	- Worst case is still n. The average time probably improves, but is still O(n)
+
+- You will do sequential search in Project 3
+
+## Arrays - Another Search Type
+
+Let's play a game:
+- Pick a number from 1-100
+- Tell a guesser whether he us too high or too low
+- I claim I can guess in 7 or less
+
+1. Guess 50
+	- "50 is too high" -> now we know the number is 1-49
+	- In one guess, the range is cut in half
+
+2. Guess 25
+	- "25 is too low" -> now we know the number is 26-49
+	- In two guesses, the range is cut down to 1/4 of the original range
+
+## Binary Search
+
+- Idea: we halve the space over which we search
+- At each step, we eliminate 1/2 of the remaining possibilities
+- Maximum number of steps required is O(log(n)) if the original size of the space was n
+	- Theoretically faster than sequential search
+- This algorithm is called **binary search**
+- Pre-condition:
+	- It has to be **sorted**, otherwise we couldn't get rid of an entire sub-range
+
+## Binary Search - Code
+
+- side note: be clear on counter/variable names
+
+```
+//return true if 'x' is in 'a[]', false if not; 'n' is the length of 'a[]'
+bool bsearch(int a[], int n, int x){
+
+	int lo, hi, mid;
+	lo = 0;
+	hi = n-1;
+
+	while(lo <= hi){
+		mid = (lo + hi)/2; //updates every iteration
+		if(a[mid] == x) return true;
+		if(a[mid] > x) hi = mid-1;
+		else lo = mid+1;
+	}
+
+	return false;
+
+}
+```
+
+## Binary Search - Big-O
+
+- If you are halving each time, how long does the loop run?
+	- O(log(n))
+	- O(log(n)) beats the pants off linear O(n)
+	- log2(1000000) = 20
+	
+- What is the big-O runtime for binary search?
+	- Best case? O(1)
+	- Worst case? O(log(n))
+	- Of course, we care the most about the worst case
+- If you were working with strings... you'd just use strcmp() instead of comparisons
+
+## Binary Search? Sequential Search?
+
+Is binary search always better than sequential search?
+Answer: It depends
+- How about if the sequence is not sorted?
author	lshprung <lshprung@yahoo.com>	2020-04-13 10:53:41 -0700
committer	lshprung <lshprung@yahoo.com>	2020-04-13 10:53:41 -0700
commit	a573986a27f8fbd8f87a19e7475ea4cc56b632b4 (patch)
tree	c7fa4654d13ee784c386599bc2afac2a034dceb6 /04-13.md
parent	5b667ae6912559a326ef580a681cc805f556fcaf (diff)