summaryrefslogtreecommitdiff
path: root/04-20.md
diff options
context:
space:
mode:
authorlshprung <lshprung@yahoo.com>2020-04-20 10:02:52 -0700
committerlshprung <lshprung@yahoo.com>2020-04-20 10:02:52 -0700
commit183f93c9c02e10001275229536d789c5c304a1fd (patch)
treed5811448c0117bebfa6679d6a04db090f0c2185a /04-20.md
parentd822758f67f87f62cf4f50ac09439d6c051c7e95 (diff)
Post-class 04/20
Diffstat (limited to '04-20.md')
-rw-r--r--04-20.md135
1 files changed, 135 insertions, 0 deletions
diff --git a/04-20.md b/04-20.md
new file mode 100644
index 0000000..7514236
--- /dev/null
+++ b/04-20.md
@@ -0,0 +1,135 @@
+[\<- 04/15](04-15.md)
+
+---
+
+## Review What We Have Learned So Far
+
+|Basic Operation|Unsorted Array|Sorted Array|
+|---------------|--------------|------------|
+|**Search Find**|O(n) |O(log(n)) |
+|**Insert** |O(1) |O(n) |
+|**Delete** |O(1) |O(n) |
+|**Min/Max** |O(n) |O(1) |
+
+- Search Find
+ - Unsorted would use sequential search
+ - Sorted would use binary search
+
+- Insert
+ - Unsorted: you can just insert at the end
+ - Sorted: need to determine where to insert so that the array stays ordered
+
+- Delete
+ - Unsorted: can just swap the last entry in for the deleted entry and make the array length minus 1
+ - Sorted: need to shift all indices after deleted index down
+
+- Min/Max
+ - Unsorted: Need to traverse entire array
+ - Sorted: Min and max are the first and last indices
+
+## Making it Faster
+
+- Let's work on making searching in an array even faster: O(1)
+
+## Hashing
+
+- A key-to-address mapping process
+
+- Ex.
+ - We have keys
+ - x
+ - y
+ - We have address
+ - 000
+ - 001
+ - 002
+ - ...
+ - 100
+ - We have a function `h()`
+ - `h(x)` is the **address** for **key** x, `h(y)` is the **address** for **key** y
+ - to search: `a[h(x)]`... O(1)
+
+## Outline
+
+- Q: I want to know if someone recieved a given score x on the exam. Each score should range from 0 to 100. How could I do this quickly?
+- A: Have an array in which each score has its own slot
+- What is the hashing method `h(x)`?
+
+```
+h(x) = x;
+```
+
+- It's called **direct hashing**!
+
+## Example 2
+
+- Let's try something else. We're reading words from a book, so let's ask if all the letters were used
+- Q: What if our inputs were letters? How to construct `h()`?
+ - We have an array from index 0 to 51. The first half is for lower case letters. The second half for upper case letters
+ - 'a' should go to index 0
+ - 'b' should go to index 1... etc
+ - We will take advantage of the ASCII table
+
+```
+if('a' <= x <= 'z') h(x) = x-'a';
+if('A' <= x <= 'Z') h(x) = x-'A'+26; //offset to the second half of the array
+```
+
+- The Big-O runtime of this is O(1)
+- This is so-called **subtraction method**
+
+## Example 3
+
+- Ex. What if I want to know whether someone in the university earns a certain salary? And I know that the lowest paid person is #30k and highest paid is $200k, and salaries are in increments of $1k... so...
+- Q: How would you map them?
+ - There should be an array where index 0 is 30k, 1 is 31k, 2 is 32k, etc. The final index is 200k.
+
+```
+h(x) = (x - 30k)/1k
+```
+
+## Concepts
+
+- What we're doing in these examples is trying to find a function, `h(x)`, that takes as inputs some arbitrary data and maps them onto 0 to m-1, which we can then use ar array indices
+- This technique is called **hashing**
+- The array indexed by such a function is called a **hash table**
+- The function itself is called the **hash function**
+- It turns out that if we know the keys in advance then we can always derive a **perfect hash function** (each key hashes to its own unique location). **One-to-one mapping**
+
+## Example 4
+
+- What if I wanted to use you ID numbers, but map them to no more than 100 slots?
+ - Using the first digit... is terrible (will not work)
+ - Using the last digit... a little better, but would only work with at most 10 students
+ - Using the last two digits... this might work, but there is **hashing collision** if two students have the same last two digits
+ - Multiply by 17, divide by 97 and take the remainder... maybe better, but no guaruntee that there won't be collision
+
+In practice, we don't know the keys in advance, it is very hard to derive a perfect hash function
+
+## A New Hashing Method - Modulo Arithmetic
+
+- The simplest way to make sure a hash function always maps within 0..m-1 is to take the result and take the remainder fo dividing by m
+- Since our hash function needs to yield values between "0" and "m-1" (m is max), we usually do: `h(x) % m`
+- We like to pick m as a **prime number**, since it will generate less collisions (you don't have to understand why)
+
+## Modulo Arithmetic
+
+- Let's have a simple example
+ - m = 7
+ - k = 8, 14, 12, 2, 4
+ - `h(k) = k % m`
+
+- h(8) -> go to index 1
+- h(14) -> go to index 2
+- h(12) -> go to index 5
+- h(2) -> go to index 2
+- h(4) -> go to index 4
+
+- If we insert 11...
+ - h(11) -> go to index 4
+ - There is multiple keys mapped to the same index now -> collision :(
+
+## Collision Resolutions
+
+- Even if we choose a good hash function that does a good job distributing the keys across the entire hash table, we will still have collisions in practice and need to deal with them
+- The next step will be to cover **Collision Resolutions**...