Classification & ID3 Dr. Riggs Spring 2004
Classification Problem
• Given some number of observed features
• Predict an unobserved feature (the ‘class’)
• Example:
  • Given features of a borrower
  • Predict whether he will default
• An interesting problem is learning rules from examples
Example Data
      ?id  ?size   ?color  ?shape  ?class
(item 1    medium  blue    brick   yes)
(item 2    small   red     sphere  yes)
(item 3    large   green   pillar  yes)
(item 4    large   green   sphere  yes)
(item 5    small   red     wedge   no)
(item 6    large   red     wedge   no)
(item 7    large   red     pillar  no)
Distinguish ALL By Size
{1,2,3,4,5,6,7} = All
  size     subset       class
  medium   {1}          Y
  small    {2 5}        Y N
  large    {3 4 6 7}    Y Y N N
Rule: (feature ?id size medium) => (class ?id yes)
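The split by size can be reproduced with a few lines of Python. This is only a sketch: the names `EXAMPLES`, `FEATURES`, `partition`, and `classes` are illustrative and do not come from the slides. Only the seven items themselves are taken from the Example Data slide.

```python
from collections import defaultdict

# Learning set from the "Example Data" slide: id -> (size, color, shape, class).
EXAMPLES = {
    1: ("medium", "blue", "brick", "yes"),
    2: ("small", "red", "sphere", "yes"),
    3: ("large", "green", "pillar", "yes"),
    4: ("large", "green", "sphere", "yes"),
    5: ("small", "red", "wedge", "no"),
    6: ("large", "red", "wedge", "no"),
    7: ("large", "red", "pillar", "no"),
}
# Column index of each feature inside an example tuple (illustrative layout).
FEATURES = {"size": 0, "color": 1, "shape": 2}


def partition(ids, feature):
    """Group the given item ids by their value for `feature`."""
    col = FEATURES[feature]
    groups = defaultdict(list)
    for i in sorted(ids):
        groups[EXAMPLES[i][col]].append(i)
    return dict(groups)


def classes(ids):
    """Return the class labels of the given items, in id order."""
    return [EXAMPLES[i][3] for i in sorted(ids)]


by_size = partition(EXAMPLES, "size")
for value, ids in by_size.items():
    print(value, ids, classes(ids))
```

Only the `medium` subset contains a single class, which is why the split yields just one rule at this step. The `small` and `large` subsets still mix `yes` and `no` and have to be split again.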
Distinguish {2 5} By Shape
{2 5} = size: small
  shape    subset   class
  sphere   {2}      Y
  wedge    {5}      N
Rules:
(feature ?id size small) (feature ?id shape sphere) => (class ?id yes)
(feature ?id size small) (feature ?id shape wedge) => (class ?id no)
Distinguish {3 4 6 7} By Color
{3 4 6 7} = size: large
  color    subset   class
  green    {3 4}    Y Y
  red      {6 7}    N N
Rules:
(feature ?id size large) (feature ?id color green) => (class ?id yes)
(feature ?id size large) (feature ?id color red) => (class ?id no)
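Taken together, the five rules derived so far behave like a small decision procedure. The Python sketch below is one possible reading of them. The function name `classify` and the choice to return `None` when no rule fires are assumptions made for illustration, not part of the slides.

```python
# The seven training items: (size, color, shape, class).
EXAMPLES = [
    ("medium", "blue", "brick", "yes"),
    ("small", "red", "sphere", "yes"),
    ("large", "green", "pillar", "yes"),
    ("large", "green", "sphere", "yes"),
    ("small", "red", "wedge", "no"),
    ("large", "red", "wedge", "no"),
    ("large", "red", "pillar", "no"),
]


def classify(size, color, shape):
    """Apply the five learned rules; return None if no rule fires (assumption)."""
    if size == "medium":
        return "yes"
    if size == "small":
        # Small items were told apart by shape.
        return {"sphere": "yes", "wedge": "no"}.get(shape)
    if size == "large":
        # Large items were told apart by color.
        return {"green": "yes", "red": "no"}.get(color)
    return None


# Every training example should be classified correctly by its own rules.
predictions = [classify(size, color, shape) for size, color, shape, _ in EXAMPLES]
print(predictions)

# An unseen combination that no rule covers.
print(classify("small", "blue", "brick"))
```

The second call returns `None`: the rules only cover feature combinations that appeared in the learning set, so an unseen combination such as a small brick gets no answer.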
Considerations
• Are the examples enough?
  • The examples must be enough to tell the classes apart
  • In general, this is an unsolvable question
• Are the rules the most efficient?
  • We could have made other choices
  • What should we use to compare choices?
Entropy
• Measures ‘disorder’
• Def: $H(m_1, \dots, m_n) = -\sum_{i=1}^{n} \Pr(m_i) \, \lg \Pr(m_i)$, where lg = log2
• Example (entropy of learning set):
  • Messages (m1…m7): Y Y Y Y N N N
  • Pr(Y) = 4/7, Pr(N) = 3/7
  • H = -[ 4/7 * lg(4/7) + 3/7 * lg(3/7) ]
      = -[ .571 * (-.807) + .429 * (-1.222) ] = .985
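As a check on the arithmetic, the definition translates almost directly into Python. This is a minimal sketch. The helper name `entropy` is illustrative, and probabilities are estimated as relative frequencies of the labels, as in the example above.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a sequence of class labels (lg = log2)."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# The learning set: four Y and three N.
h_all = entropy(list("YYYYNNN"))
print(round(h_all, 3))  # 0.985
```

A subset containing a single class has entropy 0, and an even two-class split has entropy 1, which are the two extremes seen in the worked examples.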
Gain
• If a set is partitioned by a feature into subsets, the gain (the reduction in entropy) is:
  original entropy - the weighted sum of the subset entropies
• E.g.: partition ALL = {1,2,3,4,5,6,7} by COLOR
  • partition: {blue: 1} {red: 2 5 6 7} {green: 3 4}
  • map to class: {blue: Y} {red: Y N N N} {green: Y Y}
  • H(blue) = 0, H(red) = .811, H(green) = 0
• $GAIN(color) = H(all) - \sum_{ss} \frac{|ss|}{|all|} \, H(ss)$, with ss ranging over red, green, blue
  = .985 - ( 1/7 * 0 + 4/7 * .811 + 2/7 * 0 ) = .522
Distinguish All By Color
{1,2,3,4,5,6,7} = All
  color    subset       class      H
  blue     {1}          Y          0
  red      {2 5 6 7}    Y N N N    -1/4 lg(1/4) - 3/4 lg(3/4) = .5 + .3113 = .8113
  green    {3 4}        Y Y        0
wH = 1/7 * 0 + 4/7 * .8113 + 2/7 * 0 = .464
Gain = .985 - .464 = .521 (.522 if the intermediate values are not rounded)
Distinguish All By Shape
{1,2,3,4,5,6,7} = All
  shape    subset   class   H
  brick    {1}      Y       0
  sphere   {2 4}    Y Y     0
  wedge    {5 6}    N N     0
  pillar   {3 7}    Y N     1
wH = 1/7 * 0 + 2/7 * 0 + 2/7 * 0 + 2/7 * 1 = .286
Gain = .985 - .286 = .699
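The gain computations for color and shape can be checked, and the same calculation run for size, with the following Python sketch. The names `EXAMPLES`, `entropy`, and `gain` are illustrative and not from the slides. Results agree with the hand calculations only up to rounding, since the slides round intermediate values to three decimals.

```python
from collections import Counter, defaultdict
from math import log2

# The seven training items: (size, color, shape, class).
EXAMPLES = [
    ("medium", "blue", "brick", "yes"),
    ("small", "red", "sphere", "yes"),
    ("large", "green", "pillar", "yes"),
    ("large", "green", "sphere", "yes"),
    ("small", "red", "wedge", "no"),
    ("large", "red", "wedge", "no"),
    ("large", "red", "pillar", "no"),
]


def entropy(labels):
    """Entropy in bits of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain(rows, col):
    """Original entropy minus the weighted sum of the subset entropies."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[col]].append(row[-1])  # collect class labels per feature value
    weighted = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy([row[-1] for row in rows]) - weighted


gains = {name: gain(EXAMPLES, col) for col, name in enumerate(["size", "color", "shape"])}
for name, g in gains.items():
    print(f"{name}: {g:.2f}")
```

The gains come out at roughly 0.13 for size, 0.52 for color, and 0.70 for shape. Shape is therefore the most informative first split, and size, the feature used in the earlier hand-built rules, is the least.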
ID3
• Given: a learning set (LS)
  • examples w/ features & outcome (class)
• Use each (feature, value) to partition the LS
  • Calculate H for each partition $P_{f,v}$
• Calculate the gain for each feature
  • $H(LS) - \sum_{v} \frac{|P_{f,v}|}{|LS|} \, H(P_{f,v})$
• Partition by the feature with highest gain
• Apply ID3 to any subsets $P_{f,v}$ with H > 0
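The steps above can be sketched as a short recursive Python function. This is an illustration under stated assumptions, not a definitive implementation. The names `id3`, `split`, `gain`, and `entropy`, and the nested-tuple tree representation, are all made up for the example. The slide does not say what to do when no features remain, so the sketch falls back to the most common class in that case, which is also an assumption.

```python
from collections import Counter, defaultdict
from math import log2

FEATURES = ("size", "color", "shape")
# Learning set: (feature dict, class) pairs for the seven example items.
EXAMPLES = [
    ({"size": "medium", "color": "blue", "shape": "brick"}, "yes"),
    ({"size": "small", "color": "red", "shape": "sphere"}, "yes"),
    ({"size": "large", "color": "green", "shape": "pillar"}, "yes"),
    ({"size": "large", "color": "green", "shape": "sphere"}, "yes"),
    ({"size": "small", "color": "red", "shape": "wedge"}, "no"),
    ({"size": "large", "color": "red", "shape": "wedge"}, "no"),
    ({"size": "large", "color": "red", "shape": "pillar"}, "no"),
]


def entropy(labels):
    """Entropy in bits of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def split(rows, feature):
    """Partition rows into subsets P_{f,v}, one per value v of the feature."""
    parts = defaultdict(list)
    for feats, label in rows:
        parts[feats[feature]].append((feats, label))
    return parts


def gain(rows, feature):
    """H(LS) minus the weighted sum of H(P_{f,v}) over the feature's values."""
    weighted = sum(
        len(part) / len(rows) * entropy([label for _, label in part])
        for part in split(rows, feature).values()
    )
    return entropy([label for _, label in rows]) - weighted


def id3(rows, features):
    """Return a class label (leaf) or a (feature, {value: subtree}) node."""
    labels = [label for _, label in rows]
    # Stop when the subset is pure (H == 0); the fallback for an empty
    # feature list is an assumption not stated on the slide.
    if entropy(labels) == 0 or not features:
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, f))  # highest gain
    rest = [f for f in features if f != best]
    return (best, {v: id3(part, rest) for v, part in split(rows, best).items()})


tree = id3(EXAMPLES, FEATURES)
print(tree)
```

On the example data the sketch splits on shape first and then needs color only to separate the two pillars. That gives four leaf rules plus one two-way split, compared with the size-first rules built by hand earlier.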