CSCE 3110 Data Structures & Algorithm Analysis
Rada Mihalcea
http://www.cs.unt.edu/~rada/CSCE3110
Trees Applications
Trees: A Review (again?)
• General trees
  • one parent, N children
• Binary tree
  • ISA General tree
  • + max 2 children
• Binary search tree
  • ISA Binary tree
  • + left subtree < parent < right subtree
• AVL tree
  • ISA Binary search tree
  • + | height left subtree - height right subtree | ≤ 1
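The binary search tree and AVL conditions above can be checked mechanically. The following is a minimal sketch, not course code: the `Node` class and the helper names `is_bst`, `height`, and `is_avl` are assumptions made for illustration, and the height of an empty tree is taken to be -1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node used only for this illustration."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_bst(node, lo=float("-inf"), hi=float("inf")):
    """True if every key lies strictly between the bounds set by its ancestors."""
    if node is None:
        return True
    if not (lo < node.key < hi):
        return False
    return is_bst(node.left, lo, node.key) and is_bst(node.right, node.key, hi)


def height(node):
    """Height in edges; an empty tree has height -1 (an assumed convention)."""
    return -1 if node is None else 1 + max(height(node.left), height(node.right))


def is_balanced(node):
    """True if |height(left) - height(right)| <= 1 at every node."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_balanced(node.left) and is_balanced(node.right)


def is_avl(node):
    """An AVL tree ISA binary search tree plus the balance condition."""
    return is_bst(node) and is_balanced(node)


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # a BST, but not balanced
```

Here `chain` satisfies the binary search tree ordering but fails the AVL balance condition at its root. Recomputing `height` at every node keeps the sketch short at the cost of extra work on large trees.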
Trees: A Review (cont’d)
• Multi-way search tree
  • ISA General tree
  • + Each node has K keys and K+1 children
  • + All keys in child K < key K < all keys in child K+1
• 2-4 Tree
  • ISA Multi-way search tree
  • + All nodes have at most 3 keys / 4 children
  • + All leaves are at the same level
• B-Tree
  • ISA Multi-way search tree
  • + All nodes have at least T keys, at most 2T(+1) keys
  • + All leaves are at the same level
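The key ordering in a multi-way search tree is what makes lookup work: at each node, the position of the search key among the node's keys selects the child to descend into. A minimal sketch, assuming a hypothetical `MNode` class with a sorted key list and either no children (a leaf) or K+1 children:

```python
from bisect import bisect_left


class MNode:
    """Hypothetical multi-way node: K sorted keys, and K+1 children unless it is a leaf."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, key):
    """Return True if key is stored anywhere in the tree rooted at node."""
    while node is not None:
        # Index of the first key >= the search key.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False  # reached a leaf without finding the key
        # All keys in child i are smaller than keys[i] (if it exists)
        # and larger than keys[i - 1] (if it exists).
        node = node.children[i]
    return False


# A small tree that also satisfies the 2-4 tree rules: at most 3 keys per
# node, and all leaves on the same level.
root = MNode([10, 20], [MNode([3, 7]), MNode([15]), MNode([25, 30, 35])])
```

With this tree, `search(root, 15)` descends into the middle child, while `search(root, 12)` reaches the same leaf and stops without a match.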
Tree Applications
• Data Compression
  • Huffman tree
• Automatic Learning
  • Decision trees
Huffman code
• Very often used for text compression
  • Do you know how gzip or WinZip works?
• Compression methods
  • ASCII uses codes of equal length for all letters. How many codes?
  • Today’s alternative to ASCII?
• Idea behind Huffman code: use shorter codes for letters that are more frequent
Huffman Code
• Build a list of letters and frequencies
  “have a great day today”
• Build a Huffman tree bottom up, by grouping the letters with the smallest occurrence frequencies
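The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the construction used in class: the function name `huffman_codes` is hypothetical, a binary heap picks the two smallest frequencies, and ties are broken by insertion order, so the individual codes may differ from a tree drawn by hand even though the total encoded length is the same.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return a dict mapping each character of text to its Huffman code."""
    freq = Counter(text)
    if not freq:
        return {}
    # Heap entries are (weight, tie_breaker, tree). A tree is either a single
    # character (leaf) or a (left, right) tuple (internal node).
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}  # a single distinct letter still needs one bit
    next_id = len(heap)
    while len(heap) > 1:
        # Group the two trees with the smallest frequencies.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, (t1, t2)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes


text = "have a great day today"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
```

For this 22-character string, which has 11 distinct characters, `encoded_bits` comes to 72, compared with 88 bits for a fixed 4-bit code per character.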
Huffman Codes
• Write the Huffman codes for the strings
  • “abracadabra”
  • “Veni Vidi Vici”
Huffman Code
• Running time?
  • Suppose N letters in input string, with L unique letters
• What is the most important factor for obtaining highest compression?
• Compare: [assume a text with a total of 1000 characters]
  • I. Three different characters, each occurring the same number of times
  • II. 20 different characters, 19 of them occurring only once, and the 20th occurring the rest of the time
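One way to explore the comparison is to compute the encoded length directly from the frequencies. The sketch below rests on several assumptions: the helper names `huffman_bits` and `fixed_bits` are hypothetical, case I is taken as 334/333/333 because 1000 is not divisible by 3, and the baseline is a fixed-length code with just enough bits to distinguish the L letters. With a binary heap, counting the letters takes O(N) time and building the tree takes O(L log L).

```python
import heapq
from math import ceil, log2


def huffman_bits(freqs):
    """Total encoded length in bits: the sum of the weights of all merged nodes."""
    heap = list(freqs)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # every letter below this node gains one more bit
        heapq.heappush(heap, merged)
    return total


def fixed_bits(freqs):
    """Length in bits with a fixed-length code (assumes at least 2 distinct letters)."""
    return sum(freqs) * ceil(log2(len(freqs)))


case_1 = [334, 333, 333]    # I: three characters, (almost) equally frequent
case_2 = [1] * 19 + [981]   # II: 19 characters once each, one character 981 times

ratio_1 = huffman_bits(case_1) / fixed_bits(case_1)
ratio_2 = huffman_bits(case_2) / fixed_bits(case_2)
```

Under these assumptions, case I needs 1666 bits against 2000 for the fixed-length code, while case II needs 1082 bits against 5000, which suggests that a skewed frequency distribution matters more than the number of distinct characters.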
One More Application
• Heuristic Search
• Decision Trees
  • Given a set of examples, each with an associated decision (e.g. good/bad, +/-, pass/fail, caseI/caseII/caseIII, etc.)
  • Attempt to make a decision automatically when a new example is presented
  • Predict the behavior in new cases!
Data Records
     Name          A  B  C  D  E  F  G   Class
 1.  Jeffrey B.    1  0  1  0  1  0  1     -
 2.  Paul S.       0  1  1  0  0  0  1     -
 3.  Daniel C.     0  0  1  0  0  0  0     -
 4.  Gregory P.    1  0  1  0  1  0  0     -
 5.  Michael N.    0  0  1  1  0  0  0     -
 6.  Corinne N.    1  1  1  0  1  0  1     +
 7.  Mariyam M.    0  1  0  1  0  0  1     +
 8.  Stephany D.   1  1  1  1  1  1  1     +
 9.  Mary D.       1  1  1  1  1  1  1     +
10.  Jamie F.      1  1  1  0  0  1  1     +
Fields in the Record
A: First name ends in a vowel?
B: Neat handwriting?
C: Middle name listed?
D: Senior?
E: Got extra-extra credit?
F: Google brings up home page?
G: Google brings up reference?
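Build a Classification Tree
Internal nodes: features
Leaves: classification
[Figure: classification tree with root F (branches 0 and 1), internal nodes A, D, A, and leaves holding records 2,3,7; 1,4,5,6; 10; and 8,9. Error: 30%]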
Different Search Problem
Given a set of data records with their classifications, pick a decision tree: search problem!
Challenges:
• Scoring function?
• Large space of trees.
What’s a good tree?
• Low error on given set of records
• Small
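As one possible scoring function, the error on the given records can be counted directly. The sketch below scores the smallest trees there are: a single internal node that tests one feature, with each branch predicting its majority class. The data are the ten records listed earlier; the names `RECORDS` and `stump_errors` are hypothetical, and this is only an illustration of scoring, not a method for searching the whole space of trees.

```python
from collections import Counter

FEATURES = "ABCDEFG"

# (feature bits A..G, class) for the ten data records.
RECORDS = [
    ("1010101", "-"),  # 1. Jeffrey B.
    ("0110001", "-"),  # 2. Paul S.
    ("0010000", "-"),  # 3. Daniel C.
    ("1010100", "-"),  # 4. Gregory P.
    ("0011000", "-"),  # 5. Michael N.
    ("1110101", "+"),  # 6. Corinne N.
    ("0101001", "+"),  # 7. Mariyam M.
    ("1111111", "+"),  # 8. Stephany D.
    ("1111111", "+"),  # 9. Mary D.
    ("1110011", "+"),  # 10. Jamie F.
]


def stump_errors(feature_index, records):
    """Records misclassified by a one-node tree that tests a single feature."""
    errors = 0
    for value in "01":
        labels = Counter(label for bits, label in records
                         if bits[feature_index] == value)
        if labels:
            # The branch predicts its majority class; the rest are errors.
            errors += sum(labels.values()) - max(labels.values())
    return errors


scores = {name: stump_errors(i, RECORDS) for i, name in enumerate(FEATURES)}
best = min(scores, key=scores.get)
```

On these records the single-feature tree with the lowest error tests B (neat handwriting), which misclassifies only record 2, for an error of 10%.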
“Perfect” Decision Tree
[Figure: decision tree with root C (middle name?); below it E (EEC?); below that F (Google?) and B (Neat?). Every internal node has branches 0 and 1.]
Training set error: 0% (can always do this?)
Search For a Classification
• Classify new records
      Name       A  B  C  D  E  F  G   Class
New1. Mike M.    1  0  1  1  0  0  1     ?
New2. Jerry K.   0  1  0  1  0  0  0     ?
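To classify the new records, a tree has to be fixed first. The sketch below uses one reading of the “Perfect” decision tree that is consistent with the ten data records; the exact branch layout is an assumption, since only the node labels are given: C = 0 predicts +; otherwise test E; for E = 0 test F, for E = 1 test B, and in both cases a 1 predicts + and a 0 predicts -. The names `TRAINING`, `NEW`, and `classify` are hypothetical.

```python
FEATURES = "ABCDEFG"

# (name, feature bits A..G, class) for the ten data records.
TRAINING = [
    ("Jeffrey B.", "1010101", "-"),
    ("Paul S.", "0110001", "-"),
    ("Daniel C.", "0010000", "-"),
    ("Gregory P.", "1010100", "-"),
    ("Michael N.", "0011000", "-"),
    ("Corinne N.", "1110101", "+"),
    ("Mariyam M.", "0101001", "+"),
    ("Stephany D.", "1111111", "+"),
    ("Mary D.", "1111111", "+"),
    ("Jamie F.", "1110011", "+"),
]

NEW = [("Mike M.", "1011001"), ("Jerry K.", "0101000")]


def classify(bits):
    """Walk the assumed tree: C at the root, then E, then F or B."""
    r = dict(zip(FEATURES, bits))
    if r["C"] == "0":
        return "+"
    if r["E"] == "0":
        return "+" if r["F"] == "1" else "-"
    return "+" if r["B"] == "1" else "-"


training_errors = sum(classify(bits) != label for _, bits, label in TRAINING)
predictions = {name: classify(bits) for name, bits in NEW}
```

Under this assumed tree the training error is 0, Mike M. is classified as - and Jerry K. as +. The prediction for Jerry K. rests on the C = 0 branch, which is supported by a single training record (Mariyam M.), so it should be treated with caution.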