summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorlshprung <lshprung@yahoo.com>2020-05-18 11:28:37 -0700
committerlshprung <lshprung@yahoo.com>2020-05-18 11:28:37 -0700
commitcf705f02b242ae198dcb6a76c404990937feb52e (patch)
treedf5a372ff3a94b4d93ea580d7be22012340d6bd4
parentccf28a5ef026032cc878b9fd199b22c9b1787b6e (diff)
Post-class 05/18
-rw-r--r--05-15.md4
-rw-r--r--05-18.md139
-rw-r--r--05-18_img1.pngbin0 -> 27224 bytes
-rw-r--r--05-18_img2.pngbin0 -> 139860 bytes
4 files changed, 143 insertions, 0 deletions
diff --git a/05-15.md b/05-15.md
index 9a34c34..61dace5 100644
--- a/05-15.md
+++ b/05-15.md
@@ -97,3 +97,7 @@ binary heaps work because of **no holes**, and are a great example of an implici
- Generally speaking, a heap can be used whenever you need **quick access** to the largest (or smallest) item, because that item will always be the first element in the array or at the root of the tree
- It's not good for searching, since the remainder of the array is kept partially unsorted
- Some examples of heap: priority queue, Huffman coding, heap sort, etc.
+
+---
+
+[05/18 ->](05-18.md)
diff --git a/05-18.md b/05-18.md
new file mode 100644
index 0000000..d183631
--- /dev/null
+++ b/05-18.md
@@ -0,0 +1,139 @@
+[\<- 05/15](05-15.md)
+
+---
+
+# An Application of Binary Trees: Huffman Coding
+
+## Encoding
+
+- Examples
+ - Text
+ - Video
+ - Image
+ - Audio
+
+- Computer only understands low layer bits (0s and 1s)
+
+- Encoding is turning those examples into the low layer bits
+
+### An Application - Encoding
+
+- An example - using only 1 and 0s to represent following cases
+ - true or false - only need 1 bit
+ - 1 is true
+ - 0 is false
+ - 4 directions: south, north, east, and west - need 2 bits
+ - 00 is south
+ - 01 is north
+ - 10 is east
+ - 11 is west
+
+- ASCII is a fixed width encoding. Every character requires 8 bits to encode
+
+- Motiviation: Can we save some bits in encoding?
+- Thoughts:
+ - some characters (like e and n) occur frequently. whereas characters (like q and z) occur very infrequently
+ - Do you still remember probability search (move to front heuristic)?
+
+### One Example
+
+- Ex. "the fat cat sat on the mat"
+ - If we use fixed-length binary code to represent the above sentence, what will that be? 26\*8
+ - Can we save some bits by using fewer bits for more frequent letters?
+ - Frequency of letters:
+ - a:4, c:1, e:2, f:1, h:2, m:1, n:1, o:1, s:1, t:6, " ":6
+ - Bits of letters
+ - t & " ", frequently occur - 1 bit
+ - a, h, e, less frequently - 2bits
+ - etc.
+
+## Huffman Coding
+
+- Huffman Coding: **variable width encoding**
+ - We are likely to **save on the total number of bits** required to encode a file/document if we represented frequently occurring character with fewer bits and infrequently occurring characters with more than 8 bits
+
+- Question: How do we know, given a variable length encoding, where characters stop and start?
+ - ex: if e=11, z=1111... so what's 111111?
+
+- Answer: We use **unique prefixes**. Each character will have its own unique prefix -> how to implement?
+ - We build a binary tree. By having only leaf nodes to represent characters, we are able to guarantee unique prefixes
+
+## Huffman Tree
+
+- Use binary trees (i.e. Huffman tree). e.g. 'ABCA'
+
+![example tree](05-18_img1.png)
+
+- B: 00
+- C: 01
+- A: 1
+
+- Question: How do we determine the optimal encoding given the frequencies of the characters?
+- Answer: Use the leaf nodes that are farther away from the root to represent characters appear less frequently
+
+- so now you have the "Huffman tree"
+- label the left and right sub-branches of every node as 0 and 1 respectively
+
+## Huffman Coding
+
+- Ex. "the fat cat sat on the mat"
+ - a:4, c:1, e:2, f:1, h:2, m:1, n:1, o:1, s:1, t:6, " ":6
+ - create a node for each letter and sort according to their values
+ - remove two minimum nodes, combine with a new node, and insert this node with a new weight - the combined value of its two child nodes
+ - Is it better than the fixed length encoding?
+
+![example tree](05-18_img2.png)
+
+- This is our Huffman Tree
+- Our letters translate to:
+ - a -> 011
+ - c -> 00000
+ - e -> 0100
+ - f -> 00001
+ - h -> 0101
+ - m -> 00010
+ - n -> 00011
+ - o -> 0010
+ - s -> 0011
+ - t -> 10
+ - " " -> 11
+
+- This encoding only requires 80 bits (which significantly saves the number of bits needed for encoding)
+
+---
+
+### Code Structure
+
+- A struct node includes:
+ - `struct node *parent`
+ - `int count`
+
+1. Open file
+ - count frequency of each character
+ - need a while loop, need to use `getc()`
+ - put characters in an array, assigning them an index reflecting their ascii code
+ - the last slot on the array will be the EOF
+ - the length of the array will be 257 (2^8 + 1)
+
+2. `NODE *mknode(int data, NODE *left, NODE *right);`
+ - We can use this function to generate the leaf nodes as well
+ - also need node comparison
+
+```
+NODE *mknode(int data, NODE *left, NODE *right){
+ NODE *np = malloc(...);
+ assert(...);
+ np->parent = NULL;
+ np->count = data;
+ //connect
+}
+```
+
+3. `pq = createqueue()`
+ - call addEntry: `addEntry(pq, nodes[i]);`
+ - `l = removeEntry(pq);`
+ - `r = removeEntry(pq);`
+ - `new = mknode(l->count + r->conut, l, r)`
+ - `addEntry(pq, np);`
+
+4. print
diff --git a/05-18_img1.png b/05-18_img1.png
new file mode 100644
index 0000000..69deb3f
--- /dev/null
+++ b/05-18_img1.png
Binary files differ
diff --git a/05-18_img2.png b/05-18_img2.png
new file mode 100644
index 0000000..bfc05fb
--- /dev/null
+++ b/05-18_img2.png
Binary files differ