From a573986a27f8fbd8f87a19e7475ea4cc56b632b4 Mon Sep 17 00:00:00 2001 From: lshprung Date: Mon, 13 Apr 2020 10:53:41 -0700 Subject: Post-class 04/13 --- 04-13.md | 295 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 295 insertions(+) create mode 100644 04-13.md (limited to '04-13.md') diff --git a/04-13.md b/04-13.md new file mode 100644 index 0000000..937fe4a --- /dev/null +++ b/04-13.md @@ -0,0 +1,295 @@ +[\<- 04/10](04-10.md) + +--- + +## So far ... + +We have discussed contents in Chapter 1 +1. Concept of data Structure +2. Algorithm Efficiency - big O notation + +# Search + +## First Data Structure + +- You've seen it before: an array! +- How is an array laid out? + +|[] |[] |[] |[] |[] | +|-|-|-|-|-| +|0|1|...|...|m-1| + +- a series of data laid in **consecutive** memory locations +- picture of slots... starting at "zero" +- m slots, so the end is at "m-1" + +## Arrays - Precondition + +- Pre-condition - n existing elements + - If we want to store n (0 <= n <= m) elements in the array, we place them in slots "0" through "n-1" + +|x|x|x|x|x|[]|[]|[]| +|-|-|-|-|-|--|--|--| +|0|1|...|...|n-1|...|...|m-1| + +## Arrays - Insert + +- Your first operation: insert + - It's easiest to add a new element at the end of the array + +|x|x|x|x|x|[]|[]|[]|[]| +|-|-|-|-|-|--|--|--|--| +|0|1|...|...|n-1|n|...|...|m-1| + +- To add the element x, we could do + +``` +a[n] = x; +n++; +``` + +Or, alternatively +``` +a[n++] = x; +``` + +- What is the big-O runtime of this operation? + - O(1) (no repeating operations) + +## Arrays - Access/Retrieve + +- Second operation: Access/Retrieve + - **Indexing** - Retrieve an element through indexing: O(1) + - What about **searching a given value**? + +|x|x|x|x|x|[]|[]|[]| +|-|-|-|-|-|--|--|--| +|0|1|...|...|n-1|...|...|m-1| + +- How to retrieve an element through an index +``` +x = a[i]; +``` + +- How to search for a given value? + - We'll cover this soon + +## Let's Play a Game + +Let's look for Matt in this array + +|?|?|?|?|?| +|-|-|-|-|-| +|0|1|2|3|4| + +Logically, we would check each index for Matt + +- Check the first index (i=0) + +|Chris|?|?|?|?| +|-|-|-|-|-| +|0|1|2|3|4| + +- Comapare "Chris" to "Matt" + - They do not match, so now we move on to the next value + +|Chris|Rachael|?|?|?| +|-|-|-|-|-| +|0|1|2|3|4| + +- Keep Checking + +|Chris|Rachael|Ryan|?|?| +|-|-|-|-|-| +|0|1|2|3|4| + +- Keep Checking + +|Chris|Rachael|Ryan|Mike|?| +|-|-|-|-|-| +|0|1|2|3|4| + +- Keep Checking + +|Chris|Rachael|Ryan|Mike|Brian| +|-|-|-|-|-| +|0|1|2|3|4| + +- Unsuccessful search - but What's the runtime? + - it takes n comparisons -> O(n) + - if we were looking for Chris, it would only take one comparison + - this is called the best case O(1) + - If we were looking for Brian, it would take five comparisons (or n comparisons) + - this is called the worst case O(n) + - How many comparisons will a successful search require on average? (Keep this question in mind) + +## Sequential Search + +- Searching an array: start with the first element and proceed to the last element +- Code: + +``` +int found = 0; +//assume the array 'a' is 'n' elements long +//assume 'x' is the search term + +for(i = 0; i < n; i++){ + if(a[i] == x){ + found = 1; + break; + } +} +``` + +- The break statement is important so that the big-O is O(1) instead of O(n) for best case + - More efficient +- Reminder: to use booleans (found = false) include stdbool.h + +## Sequential Search: how to stop + +- We can stop after finding first match +- Two ways to do that + - Add a "break" inside the loop when found (see the previous example) + - break means exit the innermost loop + - make the for loop look like this: `for(i=0; i n/2 is a nice shortcut + +- So what's the Big-O notation + - O(n/2) -> O(n) + +## Arrays - Sequential Search - Summary + +- This searching algorithm is called "linear search" or "sequential search" +- It's big-O runtime is O(n) + - best case O(1) + - average case O(n) + - worst case O(n) +- The military wouldn't be happy if you told them that *on average* the missile defense system stops the missiles + - The worst case big-O is the one we care most! +- Can we improve this? + +## Arrays Probability Search + +- A game again!! + +|Chris|Rachael|Ryan|Mike|Brian| +|-|-|-|-|-| +|0|1|2|3|4| + +- How to improve the search algorithm? + - Let's say we want to find Mike + - We can add a swap to move Mike closer to the beginning in case he is searched for again + +|Chris|Rachael|Mike|Ryan|Brian| +|-|-|-|-|-| +|0|1|2|3|4| + +- If we look for Mike again, it will take one less comparison to find him + +|Chris|Mike|Rachael|Ryan|Brian| +|-|-|-|-|-| +|0|1|2|3|4| + +- If we look for Mike a third time, it will take one more less comparison to find him + +|Mike|Chris|Rachael|Ryan|Brian| +|-|-|-|-|-| +|0|1|2|3|4| + +- Now that Mike is at the front, he is the best case + +--- + +- Note + - Be careful not to swap the first person with the non-existent preceding person +- This is called a "move to front heuristic" + - Heuristics are things that probably help, but not certain they would + - This is in the book called "probability search" +- Example: having an application in your cache and therefore it runs faster +- Will this improve unsuccessful searches? + - Worst case is still n. The average time probably improves, but is still O(n) + +- You will do sequential search in Project 3 + +## Arrays - Another Search Type + +Let's play a game: +- Pick a number from 1-100 +- Tell a guesser whether he us too high or too low +- I claim I can guess in 7 or less + +1. Guess 50 + - "50 is too high" -> now we know the number is 1-49 + - In one guess, the range is cut in half + +2. Guess 25 + - "25 is too low" -> now we know the number is 26-49 + - In two guesses, the range is cut down to 1/4 of the original range + +## Binary Search + +- Idea: we halve the space over which we search +- At each step, we eliminate 1/2 of the remaining possibilities +- Maximum number of steps required is O(log(n)) if the original size of the space was n + - Theoretically faster than sequential search +- This algorithm is called **binary search** +- Pre-condition: + - It has to be **sorted**, otherwise we couldn't get rid of an entire sub-range + +## Binary Search - Code + +- side note: be clear on counter/variable names + +``` +//return true if 'x' is in 'a[]', false if not; 'n' is the length of 'a[]' +bool bsearch(int a[], int n, int x){ + + int lo, hi, mid; + lo = 0; + hi = n-1; + + while(lo <= hi){ + mid = (lo + hi)/2; //updates every iteration + if(a[mid] == x) return true; + if(a[mid] > x) hi = mid-1; + else lo = mid+1; + } + + return false; + +} +``` + +## Binary Search - Big-O + +- If you are halving each time, how long does the loop run? + - O(log(n)) + - O(log(n)) beats the pants off linear O(n) + - log2(1000000) = 20 + +- What is the big-O runtime for binary search? + - Best case? O(1) + - Worst case? O(log(n)) + - Of course, we care the most about the worst case +- If you were working with strings... you'd just use strcmp() instead of comparisons + +## Binary Search? Sequential Search? + +Is binary search always better than sequential search? +Answer: It depends +- How about if the sequence is not sorted? -- cgit