Post-class 04/20

author: lshprung <lshprung@yahoo.com> 2020-04-20 10:02:52 -0700
committer: lshprung <lshprung@yahoo.com> 2020-04-20 10:02:52 -0700
commit: 183f93c9c02e10001275229536d789c5c304a1fd (patch)
tree: d5811448c0117bebfa6679d6a04db090f0c2185a /04-20.md
parent: d822758f67f87f62cf4f50ac09439d6c051c7e95 (diff)
1 files changed, 135 insertions, 0 deletions
diff --git a/04-20.md b/04-20.md
new file mode 100644
index 0000000..7514236
--- /dev/null
+++ b/04-20.md
@@ -0,0 +1,135 @@
+[\<- 04/15](04-15.md)
+
+---
+
+## Review What We Have Learned So Far
+
+|Basic Operation|Unsorted Array|Sorted Array|
+|---------------|--------------|------------|
+|**Search Find**|O(n)          |O(log(n))   |
+|**Insert**     |O(1)          |O(n)        |
+|**Delete**     |O(1)          |O(n)        |
+|**Min/Max**    |O(n)          |O(1)        |
+
+- Search Find
+	- Unsorted would use sequential search
+	- Sorted would use binary search
+
+- Insert
+	- Unsorted: you can just insert at the end
+	- Sorted: need to determine where to insert so that the array stays ordered
+
+- Delete
+	- Unsorted: can just swap the last entry in for the deleted entry and make the array length minus 1
+	- Sorted: need to shift all indices after deleted index down
+
+- Min/Max
+	- Unsorted: Need to traverse entire array
+	- Sorted: Min and max are the first and last indices
+
+## Making it Faster
+
+- Let's work on making searching in an array even faster: O(1)
+
+## Hashing
+
+- A key-to-address mapping process
+
+- Ex.
+	- We have keys
+		- x
+		- y
+	- We have address
+		- 000
+		- 001
+		- 002
+		- ...
+		- 100
+	- We have a function `h()`
+	- `h(x)` is the **address** for **key** x, `h(y)` is the **address** for **key** y
+	- to search: `a[h(x)]`... O(1)
+
+## Outline
+
+- Q: I want to know if someone recieved a given score x on the exam. Each score should range from 0 to 100. How could I do this quickly?
+- A: Have an array in which each score has its own slot
+- What is the hashing method `h(x)`?
+
+```
+h(x) = x;
+```
+
+- It's called **direct hashing**!
+
+## Example 2
+
+- Let's try something else. We're reading words from a book, so let's ask if all the letters were used
+- Q: What if our inputs were letters? How to construct `h()`?
+	- We have an array from index 0 to 51. The first half is for lower case letters. The second half for upper case letters
+		- 'a' should go to index 0
+		- 'b' should go to index 1... etc
+		- We will take advantage of the ASCII table
+
+```
+if('a' <= x <= 'z') h(x) = x-'a';
+if('A' <= x <= 'Z') h(x) = x-'A'+26; //offset to the second half of the array
+```
+
+- The Big-O runtime of this is O(1)
+- This is so-called **subtraction method**
+
+## Example 3
+
+- Ex. What if I want to know whether someone in the university earns a certain salary? And I know that the lowest paid person is #30k and highest paid is $200k, and salaries are in increments of $1k... so...
+- Q: How would you map them?
+	- There should be an array where index 0 is 30k, 1 is 31k, 2 is 32k, etc. The final index is 200k.
+
+```
+h(x) = (x - 30k)/1k
+```
+
+## Concepts
+
+- What we're doing in these examples is trying to find a function, `h(x)`, that takes as inputs some arbitrary data and maps them onto 0 to m-1, which we can then use ar array indices
+- This technique is called **hashing**
+- The array indexed by such a function is called a **hash table**
+- The function itself is called the **hash function**
+- It turns out that if we know the keys in advance then we can always derive a **perfect hash function** (each key hashes to its own unique location). **One-to-one mapping**
+
+## Example 4
+
+- What if I wanted to use you ID numbers, but map them to no more than 100 slots?
+	- Using the first digit... is terrible (will not work)
+	- Using the last digit... a little better, but would only work with at most 10 students
+	- Using the last two digits... this might work, but there is **hashing collision** if two students have the same last two digits
+	- Multiply by 17, divide by 97 and take the remainder... maybe better, but no guaruntee that there won't be collision
+
+In practice, we don't know the keys in advance, it is very hard to derive a perfect hash function
+
+## A New Hashing Method - Modulo Arithmetic
+
+- The simplest way to make sure a hash function always maps within 0..m-1 is to take the result and take the remainder fo dividing by m
+- Since our hash function needs to yield values between "0" and "m-1" (m is max), we usually do: `h(x) % m`
+- We like to pick m as a **prime number**, since it will generate less collisions (you don't have to understand why)
+
+## Modulo Arithmetic
+
+- Let's have a simple example
+	- m = 7
+	- k = 8, 14, 12, 2, 4
+	- `h(k) = k % m`
+
+- h(8) -> go to index 1
+- h(14) -> go to index 2
+- h(12) -> go to index 5
+- h(2) -> go to index 2
+- h(4) -> go to index 4
+
+- If we insert 11...
+	- h(11) -> go to index 4
+		- There is multiple keys mapped to the same index now -> collision :(
+
+## Collision Resolutions
+
+- Even if we choose a good hash function that does a good job distributing the keys across the entire hash table, we will still have collisions in practice and need to deal with them
+- The next step will be to cover **Collision Resolutions**...
author	lshprung <lshprung@yahoo.com>	2020-04-20 10:02:52 -0700
committer	lshprung <lshprung@yahoo.com>	2020-04-20 10:02:52 -0700
commit	183f93c9c02e10001275229536d789c5c304a1fd (patch)
tree	d5811448c0117bebfa6679d6a04db090f0c2185a /04-20.md
parent	d822758f67f87f62cf4f50ac09439d6c051c7e95 (diff)