From 0d3cf87155206bcf4ac44d42d1ca5a5ef12c0642 Mon Sep 17 00:00:00 2001
From: lshprung
Date: Fri, 5 Jun 2020 10:52:22 -0700
Subject: Post-class 06/05

---
 06-05.md | 369 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 369 insertions(+)
 create mode 100644 06-05.md

diff --git a/06-05.md b/06-05.md
new file mode 100644
index 0000000..2c12d86
--- /dev/null
+++ b/06-05.md
@@ -0,0 +1,369 @@

[\<- 06/03](06-03.md)

---

# Quick Sort and Merge Sort

- Our last two sorting algorithms: quick sort and merge sort
- Both are examples of a divide and conquer algorithm (more properly called divide, conquer, and unite)
    1. Divide: Split the problem into subproblems
    2. Conquer: Recursively solve each subproblem
    3. Unite: Combine the answers from the subproblems

- Typically, either the divide or the unite step is trivial
    - If you're doing a lot of work to divide the problem and also a lot of work to combine the results, you've chosen the wrong approach!

## Comparison

- Quick sort is an efficient exchange-based sort
    - The unite step is trivial: it's actually missing!
    - The hard work is done in the divide step

- Merge sort falls into the "other" category
    - The divide step is trivial: it's just one line of code
    - The hard work is done in the unite step

- Both are **recursive** sorting algorithms

## Recursive Array Algorithms

- A typical array algorithm is passed the array and its size
    - `void insertionSort(int a[], int n);`

- Recursive algorithms on arrays operate on **subarrays**
- They have to be passed the lower bound and upper bound of the subarray
    - `bool bsearch(int a[], int lo, int hi, int x);`

- The lower and upper bounds are **inclusive**
    - `bsearch(a, 0, n-1, x);`

## Review: Binary Search

```
bool bsearch(int a[], int lo, int hi, int x){
    int mid;

    if(lo > hi) return false;

    mid = (lo + hi)/2;

    if(x == a[mid]) return true;
    else if(x < a[mid]) return bsearch(a, lo, mid-1, x);
    else return bsearch(a, mid+1, hi, x);
}
```

# Quick Sort

- How it works
    - Select an element, known as the **pivot**
    - Reorder the array so that all elements with values **less** than the pivot **come before** the pivot, while all elements with values **greater** than the pivot **come after it** (equal values can go either way). After this partitioning, **the pivot is in its final position**. This is called the partition operation
    - **Recursively** apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values

## Code

```
void quicksort(int a[], int lo, int hi){
    if(hi > lo){
        int p = partition(a, lo, hi); //divide
        quicksort(a, lo, p-1);        //conquer left half
        quicksort(a, p+1, hi);        //conquer right half
    }
}
```

## Demonstration

![visualization of pivot](06-05_img1.png)

## Selection of Pivot

- It's important - it affects the Big O
- How?
    - Select the leftmost or the rightmost element
    - Find the median:

|78|21|14|97|87|62|74|85|76|45|84|22|
|--|--|--|--|--|--|--|--|--|--|--|--|

- In this array, the leftmost and rightmost elements are pretty skewed (far from the median)...
    - Solution: do 3 comparisons
- Compare the leftmost value and the middle value. If they are out of order, swap them

|62|21|14|97|87|78|74|85|76|45|84|22|
|--|--|--|--|--|--|--|--|--|--|--|--|

- Compare the leftmost value and the rightmost value. If they are out of order, swap them

|22|21|14|97|87|78|74|85|76|45|84|62|
|--|--|--|--|--|--|--|--|--|--|--|--|

- Compare the middle value and the rightmost value. If they are out of order, swap them

|22|21|14|97|87|62|74|85|76|45|84|78|
|--|--|--|--|--|--|--|--|--|--|--|--|

- Now we have a smaller value in the leftmost position, a middling value in the middle, and a larger value in the rightmost position (a code sketch of this median-of-three selection follows below)
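As a rough illustration (this helper is not from the lecture, and the name `medianOfThree` is my own), the three compare-and-swap steps above might look like this in C, using the same inclusive `lo`/`hi` convention as the rest of the notes:

```
void medianOfThree(int a[], int lo, int hi){
    int mid = (lo + hi)/2;
    int tmp;

    //leftmost vs. middle
    if(a[lo] > a[mid]){ tmp = a[lo]; a[lo] = a[mid]; a[mid] = tmp; }
    //leftmost vs. rightmost
    if(a[lo] > a[hi]) { tmp = a[lo]; a[lo] = a[hi]; a[hi] = tmp; }
    //middle vs. rightmost
    if(a[mid] > a[hi]){ tmp = a[mid]; a[mid] = a[hi]; a[hi] = tmp; }

    //now a[lo] <= a[mid] <= a[hi]: the median of the three sits in the middle
}
```

A partition routine that pivots on the last element, like Lomuto's below, would typically also swap `a[mid]` into `a[hi]` before partitioning; the notes do not show that step, so treat it as an assumption.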
## Lomuto's Partition Algorithm

![visualization](06-05_img2.png)

- Two magic walls
- Move the second wall **from left to right**, looking for elements less than the pivot
- When one is found, swap it with the value at the first wall, and advance the first wall
- When all done, place the pivot in the middle

### Code

```
int partition(int a[], int lo, int hi){
    int p = a[hi]; //last value is the pivot
    int sep = lo;  //first wall: everything before sep is less than the pivot
    int i, tmp;

    for(i = lo; i < hi; i++){ //second wall moves from left to right
        if(a[i] < p){
            tmp = a[i]; a[i] = a[sep]; a[sep] = tmp; //swap with the value at the first wall
            sep++;                                   //advance the first wall
        }
    }

    tmp = a[sep]; a[sep] = a[hi]; a[hi] = tmp; //place the pivot in the middle
    return sep;
}
```

- In bubble sort and insertion sort, a typical exchange involves elements that are **adjacent** => more exchanges are required
- In quick sort, a typical exchange involves elements that are **far apart** => fewer exchanges are required to correctly position an element

## Analysis: Best Case

- Assume we always find the median as the pivot
- An array of size n always splits into two arrays of size n/2
    - At level 1, we do n steps in partition
    - At level 2, we do 2 * (n/2) steps (2 partitions of size n/2)
    - At level 3, we do 4 * (n/4) steps (4 partitions of size n/4)
    - ...

- How many levels are there? O(log(n))
- Total amount of work is O(nlog(n))

## Analysis: Worst Case

- Assume we always find the largest value as the pivot
- An array of size n always splits into an array of size n-1 and another array of size 0
    - At level 1, we do n steps in partition
    - At level 2, we do n-1 steps (1 partition of size n-1)
    - At level 3, we do n-2 steps (1 partition of size n-2)
    - ...

- Total amount of work is n + (n-1) + (n-2) + ... + 1 = O(n^2)

- Big-O analysis
    - Best case? O(nlog(n))
    - Average case? O(nlog(n))
    - Worst case? O(n^2)

- Stability?
    - Consider the sequence 5 6 **6** 3 2
    - Not stable - partition swaps elements across long distances, so equal keys can change relative order

### Space Overhead

- Space overhead? O(1)? O(log(n))? O(n)?
    - Each recursive call has O(1) space overhead

- What is the average depth of recursion? O(log(n))
- What is the worst-case depth of recursion? O(n)
    - But if we are clever, we can reduce that
    - Recurse on the smaller sublist, then loop back and do the larger list

- So, the space overhead in the worst case is O(log(n))

# Merge Sort

- How it works
    - Split the array in half. Unlike quick sort, we always split the array exactly in half
    - **Recursively** sort each half. An array of size 0 or 1 is already sorted
    - **Merge** the two sorted halves in linear time

## Merging

- Consider two sorted arrays, a and b
- Keep an index i into a and an index j into b
- Compare a[i] and b[j]
- Place the smaller in the output and advance its index
- If you reach the end of either array, copy the other array's remaining contents into the output
- Runtime is O(length of a + length of b)

### Merging Example

- Merge 1, 3, 5, 6 with 2, 4, 5, 8
    - 1 2 3 4 5 5 6 8
    - Merging is O(n) (a sketch of a merge routine follows below)
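The mergesort code in the next section calls a `merge` routine that the notes describe only in the bullets above, so here is one possible sketch. It is not from the lecture: the five-parameter signature is chosen to match the call in the code below (two adjacent sorted runs inside the same array), and the temporary `malloc`'d buffer is my own choice.

```
#include <stdlib.h>

//merge the adjacent sorted runs a[lo1..hi1] and a[lo2..hi2] (where lo2 == hi1+1)
void merge(int a[], int lo1, int hi1, int lo2, int hi2){
    int n = hi2 - lo1 + 1;
    int *out = malloc(n * sizeof(int)); //cannot merge in place - need a separate array
    int i = lo1, j = lo2, k = 0;

    while(i <= hi1 && j <= hi2){
        if(a[i] <= a[j]) out[k++] = a[i++]; //<= breaks ties toward the first run, keeping the sort stable
        else             out[k++] = a[j++];
    }
    while(i <= hi1) out[k++] = a[i++]; //copy the rest of whichever run remains
    while(j <= hi2) out[k++] = a[j++];

    for(k = 0; k < n; k++) a[lo1 + k] = out[k]; //copy the merged result back into a
    free(out);
}
```

The separate output array is what gives merge sort the O(n) space overhead listed in the summary table at the end.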
## Code

```
void mergesort(int a[], int lo, int hi){
    if(hi > lo){
        int mid = (lo + hi)/2;        //divide
        mergesort(a, lo, mid);        //conquer left half
        mergesort(a, mid+1, hi);      //conquer right half
        merge(a, lo, mid, mid+1, hi); //unite
    }
}
```

## Example

- 6 2 4 1 5 7 8 3
    - Cut it in half
    - 6 2 4 1
        - Cut it in half
        - 6 2
        - 4 1
    - 5 7 8 3
        - Cut it in half
        - 5 7
        - 8 3

- First half
    - sort 6 2 -> 2 6
    - sort 4 1 -> 1 4
    - merge them
        - 1 2 4 6

- Second half
    - sort 5 7 -> 5 7
    - sort 8 3 -> 3 8
    - merge them
        - 3 5 7 8

- Now all that's left is to merge the two halves
    - 1 comes before 3
    - 2 comes before 3
    - 3 comes before 4
    - 4 comes before 5
    - 5 comes before 6
    - 6 comes before 7
    - add 7 and 8 to the end
    - **1 2 3 4 5 6 7 8**

## Analysis

- Big-O analysis
    - Best case? O(nlog(n))
    - Average case? O(nlog(n))
    - Worst case? O(nlog(n))

- Stability?
    - Yes! As long as we break a tie by always choosing from the first list

- Space overhead? O(n)!
    - Cannot merge in place! Need to use a separate array

# Summary

|                  |Best Case |Average Case |Worst Case   |Stability |Space Overhead|
|------------------|----------|-------------|-------------|----------|--------------|
|**Selection Sort**|O(n^2)    |O(n^2)       |O(n^2)       |Not stable|O(1)          |
|**Heap Sort**     |O(nlog(n))|O(nlog(n))   |O(nlog(n))   |Not stable|O(1)          |
|**Insertion Sort**|O(n)      |O(n^2)       |O(n^2)       |Stable    |O(1)          |
|**Shell Sort**    |O(nlog(n))|>= O(n^(4/3))|>= O(n^(3/2))|Not stable|O(1)          |
|**Bubble Sort**   |O(n)      |O(n^2)       |O(n^2)       |Stable    |O(1)          |
|**Quick Sort**    |O(nlog(n))|O(nlog(n))   |O(n^2)       |Not stable|O(log(n))     |
|**Merge Sort**    |O(nlog(n))|O(nlog(n))   |O(nlog(n))   |Stable    |O(n)          |
|**Tree Sort**     |O(nlog(n))|O(nlog(n))   |O(nlog(n))   |Stable    |O(n)          |
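To close, here is a small driver that is not part of the original notes. It assumes the `quicksort`/`partition` and `mergesort`/`merge` routines sketched earlier are compiled into the same program, and it reuses the example array from the merge sort walkthrough.

```
#include <stdio.h>

//prototypes for the routines shown earlier in these notes (assumed to be in the same program)
void quicksort(int a[], int lo, int hi);
void mergesort(int a[], int lo, int hi);

int main(void){
    int a[] = {6, 2, 4, 1, 5, 7, 8, 3}; //example array from the merge sort walkthrough
    int b[] = {6, 2, 4, 1, 5, 7, 8, 3};
    int n = 8, i;

    quicksort(a, 0, n-1); //bounds are inclusive, as everywhere in these notes
    mergesort(b, 0, n-1);

    for(i = 0; i < n; i++) printf("%d ", a[i]); //expect 1 2 3 4 5 6 7 8
    printf("\n");
    for(i = 0; i < n; i++) printf("%d ", b[i]);
    printf("\n");
    return 0;
}
```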