Post-class 05/18

author: lshprung <lshprung@yahoo.com> 2020-05-18 11:28:37 -0700
committer: lshprung <lshprung@yahoo.com> 2020-05-18 11:28:37 -0700
commit: cf705f02b242ae198dcb6a76c404990937feb52e (patch)
tree: df5a372ff3a94b4d93ea580d7be22012340d6bd4
parent: ccf28a5ef026032cc878b9fd199b22c9b1787b6e (diff)
4 files changed, 143 insertions, 0 deletions
diff --git a/05-15.md b/05-15.md
index 9a34c34..61dace5 100644
--- a/05-15.md
+++ b/05-15.md
@@ -97,3 +97,7 @@ binary heaps work because of **no holes**, and are a great example of an implici
 - Generally speaking, a heap can be used whenever you need **quick access** to the largest (or smallest) item, because that item will always be the first element in the array or at the root of the tree
 - It's not good for searching, since the remainder of the array is kept partially unsorted
 - Some examples of heap: priority queue, Huffman coding, heap sort, etc.
+
+---
+
+[05/18 ->](05-18.md)
diff --git a/05-18.md b/05-18.md
new file mode 100644
index 0000000..d183631
--- /dev/null
+++ b/05-18.md
@@ -0,0 +1,139 @@
+[\<- 05/15](05-15.md)
+
+---
+
+# An Application of Binary Trees: Huffman Coding
+
+## Encoding
+
+- Examples
+	- Text
+	- Video
+	- Image
+	- Audio
+
+- Computer only understands low layer bits (0s and 1s)
+
+- Encoding is turning those examples into the low layer bits
+
+### An Application - Encoding
+
+- An example - using only 1 and 0s to represent following cases
+	- true or false - only need 1 bit
+		- 1 is true
+		- 0 is false
+	- 4 directions: south, north, east, and west - need 2 bits
+		- 00 is south
+		- 01 is north
+		- 10 is east
+		- 11 is west
+
+- ASCII is a fixed width encoding. Every character requires 8 bits to encode
+
+- Motiviation: Can we save some bits in encoding?
+- Thoughts:
+	- some characters (like e and n) occur frequently. whereas characters (like q and z) occur very infrequently
+	- Do you still remember probability search (move to front heuristic)?
+
+### One Example
+
+- Ex. "the fat cat sat on the mat"
+	- If we use fixed-length binary code to represent the above sentence, what will that be? 26\*8
+	- Can we save some bits by using fewer bits for more frequent letters?
+		- Frequency of letters:
+			- a:4, c:1, e:2, f:1, h:2, m:1, n:1, o:1, s:1, t:6, " ":6
+		- Bits of letters
+			- t & " ", frequently occur - 1 bit
+			- a, h, e, less frequently - 2bits
+			- etc.
+
+## Huffman Coding
+
+- Huffman Coding: **variable width encoding**
+	- We are likely to **save on the total number of bits** required to encode a file/document if we represented frequently occurring character with fewer bits and infrequently occurring characters with more than 8 bits
+
+- Question: How do we know, given a variable length encoding, where characters stop and start?
+	- ex: if e=11, z=1111... so what's 111111?
+
+- Answer: We use **unique prefixes**. Each character will have its own unique prefix -> how to implement?
+	- We build a binary tree. By having only leaf nodes to represent characters, we are able to guarantee unique prefixes
+
+## Huffman Tree
+
+- Use binary trees (i.e. Huffman tree). e.g. 'ABCA'
+
+![example tree](05-18_img1.png)
+
+- B: 00
+- C: 01
+- A: 1
+
+- Question: How do we determine the optimal encoding given the frequencies of the characters?
+- Answer: Use the leaf nodes that are farther away from the root to represent characters appear less frequently
+
+- so now you have the "Huffman tree"
+- label the left and right sub-branches of every node as 0 and 1 respectively
+
+## Huffman Coding
+
+- Ex. "the fat cat sat on the mat"
+	- a:4, c:1, e:2, f:1, h:2, m:1, n:1, o:1, s:1, t:6, " ":6
+	- create a node for each letter and sort according to their values
+	- remove two minimum nodes, combine with a new node, and insert this node with a new weight - the combined value of its two child nodes
+	- Is it better than the fixed length encoding?
+
+![example tree](05-18_img2.png)
+
+- This is our Huffman Tree
+- Our letters translate to:
+	- a   -> 011
+	- c   -> 00000
+	- e   -> 0100
+	- f   -> 00001
+	- h   -> 0101
+	- m   -> 00010
+	- n   -> 00011
+	- o   -> 0010
+	- s   -> 0011
+	- t   -> 10
+	- " " -> 11
+
+- This encoding only requires 80 bits (which significantly saves the number of bits needed for encoding)
+
+---
+
+### Code Structure
+
+- A struct node includes:
+	- `struct node *parent`
+	- `int count`
+
+1. Open file
+	- count frequency of each character
+	- need a while loop, need to use `getc()`
+	- put characters in an array, assigning them an index reflecting their ascii code
+	- the last slot on the array will be the EOF
+	- the length of the array will be 257 (2^8 + 1)
+
+2. `NODE *mknode(int data, NODE *left, NODE *right);`
+	- We can use this function to generate the leaf nodes as well
+	- also need node comparison
+
+```
+NODE *mknode(int data, NODE *left, NODE *right){
+	NODE *np = malloc(...);
+	assert(...);
+	np->parent = NULL;
+	np->count = data;
+	//connect
+}
+```
+
+3. `pq = createqueue()`
+	- call addEntry: `addEntry(pq, nodes[i]);`
+	- `l = removeEntry(pq);`
+	- `r = removeEntry(pq);`
+	- `new = mknode(l->count + r->conut, l, r)`
+	- `addEntry(pq, np);`
+
+4. print 
diff --git a/05-18_img1.png b/05-18_img1.png
new file mode 100644
index 0000000..69deb3f
--- /dev/null
+++ b/05-18_img1.png
diff --git a/05-18_img2.png b/05-18_img2.png
new file mode 100644
index 0000000..bfc05fb
--- /dev/null
+++ b/05-18_img2.png
author	lshprung <lshprung@yahoo.com>	2020-05-18 11:28:37 -0700
committer	lshprung <lshprung@yahoo.com>	2020-05-18 11:28:37 -0700
commit	cf705f02b242ae198dcb6a76c404990937feb52e (patch)
tree	df5a372ff3a94b4d93ea580d7be22012340d6bd4
parent	ccf28a5ef026032cc878b9fd199b22c9b1787b6e (diff)