[\<- 04/10](04-10.md) --- ## So far ... We have discussed contents in Chapter 1 1. Concept of data Structure 2. Algorithm Efficiency - big O notation # Search ## First Data Structure - You've seen it before: an array! - How is an array laid out? |[] |[] |[] |[] |[] | |-|-|-|-|-| |0|1|...|...|m-1| - a series of data laid in **consecutive** memory locations - picture of slots... starting at "zero" - m slots, so the end is at "m-1" ## Arrays - Precondition - Pre-condition - n existing elements - If we want to store n (0 <= n <= m) elements in the array, we place them in slots "0" through "n-1" |x|x|x|x|x|[]|[]|[]| |-|-|-|-|-|--|--|--| |0|1|...|...|n-1|...|...|m-1| ## Arrays - Insert - Your first operation: insert - It's easiest to add a new element at the end of the array |x|x|x|x|x|[]|[]|[]|[]| |-|-|-|-|-|--|--|--|--| |0|1|...|...|n-1|n|...|...|m-1| - To add the element x, we could do ``` a[n] = x; n++; ``` Or, alternatively ``` a[n++] = x; ``` - What is the big-O runtime of this operation? - O(1) (no repeating operations) ## Arrays - Access/Retrieve - Second operation: Access/Retrieve - **Indexing** - Retrieve an element through indexing: O(1) - What about **searching a given value**? |x|x|x|x|x|[]|[]|[]| |-|-|-|-|-|--|--|--| |0|1|...|...|n-1|...|...|m-1| - How to retrieve an element through an index ``` x = a[i]; ``` - How to search for a given value? - We'll cover this soon ## Let's Play a Game Let's look for Matt in this array |?|?|?|?|?| |-|-|-|-|-| |0|1|2|3|4| Logically, we would check each index for Matt - Check the first index (i=0) |Chris|?|?|?|?| |-|-|-|-|-| |0|1|2|3|4| - Comapare "Chris" to "Matt" - They do not match, so now we move on to the next value |Chris|Rachael|?|?|?| |-|-|-|-|-| |0|1|2|3|4| - Keep Checking |Chris|Rachael|Ryan|?|?| |-|-|-|-|-| |0|1|2|3|4| - Keep Checking |Chris|Rachael|Ryan|Mike|?| |-|-|-|-|-| |0|1|2|3|4| - Keep Checking |Chris|Rachael|Ryan|Mike|Brian| |-|-|-|-|-| |0|1|2|3|4| - Unsuccessful search - but What's the runtime? - it takes n comparisons -> O(n) - if we were looking for Chris, it would only take one comparison - this is called the best case O(1) - If we were looking for Brian, it would take five comparisons (or n comparisons) - this is called the worst case O(n) - How many comparisons will a successful search require on average? (Keep this question in mind) ## Sequential Search - Searching an array: start with the first element and proceed to the last element - Code: ``` int found = 0; //assume the array 'a' is 'n' elements long //assume 'x' is the search term for(i = 0; i < n; i++){ if(a[i] == x){ found = 1; break; } } ``` - The break statement is important so that the big-O is O(1) instead of O(n) for best case - More efficient - Reminder: to use booleans (found = false) include stdbool.h ## Sequential Search: how to stop - We can stop after finding first match - Two ways to do that - Add a "break" inside the loop when found (see the previous example) - break means exit the innermost loop - make the for loop look like this: `for(i=0; i n/2 is a nice shortcut - So what's the Big-O notation - O(n/2) -> O(n) ## Arrays - Sequential Search - Summary - This searching algorithm is called "linear search" or "sequential search" - It's big-O runtime is O(n) - best case O(1) - average case O(n) - worst case O(n) - The military wouldn't be happy if you told them that *on average* the missile defense system stops the missiles - The worst case big-O is the one we care most! - Can we improve this? ## Arrays Probability Search - A game again!! |Chris|Rachael|Ryan|Mike|Brian| |-|-|-|-|-| |0|1|2|3|4| - How to improve the search algorithm? - Let's say we want to find Mike - We can add a swap to move Mike closer to the beginning in case he is searched for again |Chris|Rachael|Mike|Ryan|Brian| |-|-|-|-|-| |0|1|2|3|4| - If we look for Mike again, it will take one less comparison to find him |Chris|Mike|Rachael|Ryan|Brian| |-|-|-|-|-| |0|1|2|3|4| - If we look for Mike a third time, it will take one more less comparison to find him |Mike|Chris|Rachael|Ryan|Brian| |-|-|-|-|-| |0|1|2|3|4| - Now that Mike is at the front, he is the best case --- - Note - Be careful not to swap the first person with the non-existent preceding person - This is called a "move to front heuristic" - Heuristics are things that probably help, but not certain they would - This is in the book called "probability search" - Example: having an application in your cache and therefore it runs faster - Will this improve unsuccessful searches? - Worst case is still n. The average time probably improves, but is still O(n) - You will do sequential search in Project 3 ## Arrays - Another Search Type Let's play a game: - Pick a number from 1-100 - Tell a guesser whether he us too high or too low - I claim I can guess in 7 or less 1. Guess 50 - "50 is too high" -> now we know the number is 1-49 - In one guess, the range is cut in half 2. Guess 25 - "25 is too low" -> now we know the number is 26-49 - In two guesses, the range is cut down to 1/4 of the original range ## Binary Search - Idea: we halve the space over which we search - At each step, we eliminate 1/2 of the remaining possibilities - Maximum number of steps required is O(log(n)) if the original size of the space was n - Theoretically faster than sequential search - This algorithm is called **binary search** - Pre-condition: - It has to be **sorted**, otherwise we couldn't get rid of an entire sub-range ## Binary Search - Code - side note: be clear on counter/variable names ``` //return true if 'x' is in 'a[]', false if not; 'n' is the length of 'a[]' bool bsearch(int a[], int n, int x){ int lo, hi, mid; lo = 0; hi = n-1; while(lo <= hi){ mid = (lo + hi)/2; //updates every iteration if(a[mid] == x) return true; if(a[mid] > x) hi = mid-1; else lo = mid+1; } return false; } ``` ## Binary Search - Big-O - If you are halving each time, how long does the loop run? - O(log(n)) - O(log(n)) beats the pants off linear O(n) - log2(1000000) = 20 - What is the big-O runtime for binary search? - Best case? O(1) - Worst case? O(log(n)) - Of course, we care the most about the worst case - If you were working with strings... you'd just use strcmp() instead of comparisons ## Binary Search? Sequential Search? Is binary search always better than sequential search? Answer: It depends - How about if the sequence is not sorted? --- [04/15 ->](04-15.md)