Induction: Discussion
Sources:
• Chapter 3, Lenz et al. Book: Case-based Reasoning Technology
• www.aic.nrl.navy.mil/~aha/research/applications.html
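
Information Gain Formula
[Figure: split on Patrons?]
• full: x4(+), x12(+), x2(-), x5(-), x9(-), x10(-)
• none: x7(-), x11(-)
• some: x1(+), x3(+), x6(+), x8(+)
The standard expected value formula:
$$\mathrm{Gain}(A) = I\!\left(\frac{p}{p+n}, \frac{n}{p+n}\right) - \mathrm{Remainder}(A)$$
$$\mathrm{Remainder}(A) = p(A,1)\, I\!\left(\frac{p_1}{p_1+n_1}, \frac{n_1}{p_1+n_1}\right) + p(A,2)\, I\!\left(\frac{p_2}{p_2+n_2}, \frac{n_2}{p_2+n_2}\right) + p(A,3)\, I\!\left(\frac{p_3}{p_3+n_3}, \frac{n_3}{p_3+n_3}\right)$$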
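
The two formulas above can be sketched in a few lines of Python. This is a minimal sketch, assuming that I is Shannon entropy in bits (base-2 logarithm, with zero-probability terms contributing 0) and that p(A,i) is the fraction of examples falling into branch i. The names `entropy` and `gain` are illustrative and do not come from the slides.

```python
from math import log2


def entropy(*probs):
    """I(q1, ..., qk): Shannon entropy in bits; terms with q = 0 contribute 0."""
    return sum(-q * log2(q) for q in probs if q > 0)


def gain(branches):
    """Information gain of an attribute.

    branches: list of (p_i, n_i) pairs, the positive and negative
    example counts in each branch of the attribute.
    """
    p = sum(pi for pi, _ in branches)
    n = sum(ni for _, ni in branches)
    total = p + n
    # Remainder(A): entropy of each branch, weighted by the branch's share
    # of the examples. Empty branches carry no weight and are skipped.
    remainder = sum(
        (pi + ni) / total * entropy(pi / (pi + ni), ni / (pi + ni))
        for pi, ni in branches
        if pi + ni > 0
    )
    return entropy(p / total, n / total) - remainder
```

A split that separates the classes perfectly, such as `gain([(2, 0), (0, 2)])`, removes all uncertainty, while a split that leaves every branch evenly mixed, such as `gain([(1, 1), (1, 1)])`, removes none.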
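
The IDT Example
Patrons?
• full: x4(+), x12(+), x2(-), x5(-), x9(-), x10(-)
• none: x7(-), x11(-)
• some: x1(+), x3(+), x6(+), x8(+)
$$\mathrm{Gain}(\mathit{Patrons}) = 1 - \left(\tfrac{2}{12}\, I(0,1) + \tfrac{4}{12}\, I(1,0) + \tfrac{6}{12}\, I\!\left(\tfrac{2}{6}, \tfrac{4}{6}\right)\right) = 0.541$$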
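
The IDT Example (II)
Type?
• burger: x3(+), x12(+), x7(-), x9(-)
• italian: x6(+), x10(-)
• french: x1(+), x5(-)
• thai: x4(+), x8(+), x2(-), x11(-)
$$\mathrm{Gain}(\mathit{Type}) = 1 - \left(\tfrac{2}{12}\, I\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{2}{12}\, I\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{4}{12}\, I\!\left(\tfrac{2}{4}, \tfrac{2}{4}\right) + \tfrac{4}{12}\, I\!\left(\tfrac{2}{4}, \tfrac{2}{4}\right)\right) = 0$$
Thus Patrons is a better choice than Type.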
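
The comparison between the two attributes can be reproduced with a short script. This is a sketch under stated assumptions: the twelve examples are identified by their index, the positive set is read off the Patrons split, and x8 is taken as the second positive Thai example, since x12 already belongs to the burger branch. The helper names are illustrative.

```python
from math import log2

# Indices of the positive examples, as labelled in the Patrons split.
POSITIVE = {1, 3, 4, 6, 8, 12}

# Example indices per branch, transcribed from the two slides.
# Assumption: the Thai branch holds x8 as its second positive example.
PATRONS = {"none": [7, 11], "some": [1, 3, 6, 8], "full": [4, 12, 2, 5, 9, 10]}
TYPE = {
    "french": [1, 5],
    "italian": [6, 10],
    "thai": [4, 8, 2, 11],
    "burger": [3, 12, 7, 9],
}


def entropy(*probs):
    """Shannon entropy in bits; zero-probability terms contribute 0."""
    return sum(-q * log2(q) for q in probs if q > 0)


def gain(split):
    """Information gain of a split given as {branch name: [example indices]}."""
    total = sum(len(ids) for ids in split.values())
    pos_total = sum(1 for ids in split.values() for i in ids if i in POSITIVE)
    remainder = 0.0
    for ids in split.values():
        pos = sum(1 for i in ids if i in POSITIVE)
        neg = len(ids) - pos
        # Weight each branch's entropy by its share of the examples.
        remainder += len(ids) / total * entropy(pos / len(ids), neg / len(ids))
    return entropy(pos_total / total, 1 - pos_total / total) - remainder


gain_patrons = gain(PATRONS)
gain_type = gain(TYPE)
print(f"Gain(Patrons) = {gain_patrons:.3f}")
print(f"Gain(Type)    = {gain_type:.3f}")
```

Every Type branch is evenly split between positive and negative examples, so the split leaves the uncertainty unchanged, whereas two of the three Patrons branches are pure.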

Induction: Fielded Applications
• Westinghouse: Transforming uranium gas
• Hartford Steam Boiler: Transformer diagnosis
• Steel Works Jesenice: Oil/lubricant properties
• American Express UK: Credit card applications
• Siemens (BMT): Equipment configuration
• USAF school: Thallium diagnosis
• Boeing (Gold-digger): Manufacturing flaws
• R.R. Donnelley and Sons (APOS): Banding
• Enichem (Enigma): Troubleshooting motor pumps
• Palomar Observatory (SKICAT): Astronomical cataloging
• Continuum (Shopping): WWW shopping
• …

Classifying Credit Card Applications (from (Aha, 1996))
[Figure: flowchart. Borderline? (no / yes: 10% of 104), Induced Rule System, Accept?]
Credit card application
• American Express UK
• Problem: Expert accuracy was below average (48%)
• Evaluation: System was iteratively refined with experts
• 18 attributes (age, years of residence, etc.)
• Improved accuracy: 75%+

Reduce Process Delays of Rotogravure Printers
• Problem: Banding often appears on chrome cylinders, causing a shutdown or costly replacement of cylinders
• Cause unknown
• Use of an inductive process to predict settings of control parameters (e.g., ink viscosity)
• Rules were posted on the shop floor
• Gain: less downtime and lower replacement costs

Development Cycle of IDT Applications (adapted from (Langley, 1995))
[Figure: cycle] Problem formulation → Data collection → Induction of decision trees/rules → Evaluation of DT/rules → Fielding and acceptance → Maintenance

When to Consider Decision Trees
• Examples are describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis might be required
• Possible noise in data
Some functions are not amenable to representation with decision trees, e.g., the parity function (returns true if the input has an even number of 1's).
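
One way to see why parity is awkward for a gain-driven learner: over the complete truth table, splitting on any single bit leaves each branch with equal numbers of true and false outputs, so no attribute looks better than any other. The sketch below checks this for 3-bit inputs. The names and the choice of 3 bits are illustrative assumptions.

```python
from itertools import product
from math import log2


def entropy(pos, neg):
    """Entropy in bits of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)


def parity(bits):
    """True if the input has an even number of 1s."""
    return sum(bits) % 2 == 0


def gain(examples, attr):
    """Information gain of splitting (bits, label) examples on bit `attr`."""
    pos = sum(1 for _, label in examples if label)
    base = entropy(pos, len(examples) - pos)
    remainder = 0.0
    for value in (0, 1):
        branch = [label for bits, label in examples if bits[attr] == value]
        if branch:
            p = sum(branch)  # True counts as 1
            remainder += len(branch) / len(examples) * entropy(p, len(branch) - p)
    return base - remainder


N_BITS = 3  # illustrative size; the full truth table has 2**N_BITS rows
examples = [(bits, parity(bits)) for bits in product((0, 1), repeat=N_BITS)]
gains = [gain(examples, attr) for attr in range(N_BITS)]
print(gains)
```

A tree can still encode parity, but every path has to test every bit, so the tree grows to 2^n leaves for n inputs.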

Induction: Advantages
• Building a decision tree is a straightforward process
• The information gain measure is built on a sound basis
• During consultation, only a few tests are necessary before a classification is obtained
• For industrial applications, the consultation system can be delivered in a runtime system

Induction: Limitations
• DTs are not incremental: they cannot be modified at runtime
• Consultation system is static
• Handling of unknown values for attributes is problematic
• The inductive approach cannot distinguish between various classes of users (e.g., experts vs. non-experts)