Knowledge-Based Discovery: Using Semantics in Machine Learning Bruce Buchanan Joe Phillips University of Pittsburgh buchanan@cs.pitt.edu josephp@cs.pitt.edu
Intelligent Systems Laboratory • Faculty: Bruce Buchanan, P.I., John Aronis • Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS) • Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman • Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma • M.S. Students: Karl Gossett
GOALS • (A) Learn understandable & interesting rules from data • (B) Construct an understandable & coherent model from rules • METHOD: To use background knowledge to search for: • simple rules with familiar predicates • interesting and novel rules • coherent models
Rules or Models: Understandable | Interesting • Criteria: familiar syntax (conditional rules), syntactically simple, semantically simple, familiar predicates, accurate predictions, meaningful rules, relevant to question, novel, cost-effective, coherent model
The RL Program • [block diagram: Training Examples, Explicit Bias, and a Partial Domain Model are inputs to RL and HAMB, which produce RULES and a MODEL; a Performance Program applies these to New Cases to generate Predictions]
(A) Individual Rules • J. Phillips • Rehabilitation Medicine Data
Simple single rules • Syntactic Simplicity • Fewer terms on the LHS • Explicitly stated constraints (rules with no more than N terms) • Tagged attributes (e.g. must have at least one control attribute to be interesting)
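A minimal sketch of how such constraints might be checked when filtering candidate rules (the rule encoding, the attribute names, and the MAX_LHS_TERMS constant are illustrative assumptions, not RL's actual data structures):

```python
# Hypothetical sketch of syntactic-simplicity filtering; the rule encoding and
# the tag set are illustrative assumptions, not RL's actual data structures.

MAX_LHS_TERMS = 3                                   # "no more than N terms" constraint
CONTROL_ATTRIBUTES = {"admit", "therapy_hours"}     # assumed "control" tags

def is_syntactically_simple(rule):
    """Keep rules with a short left-hand side."""
    return len(rule["lhs"]) <= MAX_LHS_TERMS

def has_control_attribute(rule):
    """Keep rules that mention at least one controllable attribute."""
    return any(attr in CONTROL_ATTRIBUTES for attr, _op, _val in rule["lhs"])

rules = [
    {"lhs": [("therapy_hours", ">", 10), ("age", "<", 70)], "rhs": "good_improvement"},
    {"lhs": [("age", ">", 80), ("race", "=", "X"), ("sex", "=", "F"), ("ward", "=", 3)],
     "rhs": "poor_improvement"},
]

kept = [r for r in rules if is_syntactically_simple(r) and has_control_attribute(r)]
print(len(kept))   # 1 -- only the first rule satisfies both constraints
```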
Simple sets of rules • Syntactic simplicity • Fewer rules • Independent rules, e.g. in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x) • HAMB removes highly similar terms from the feature set • Less independence when there's feedback (e.g. medicine)
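A sketch of the kind of redundancy pruning described here, using correlation as the similarity measure (the 0.95 threshold and the pandas-based approach are assumptions, not HAMB's actual procedure):

```python
# Illustrative pruning of highly similar features from a numeric feature set.
# Similarity is approximated by absolute pairwise correlation; the threshold
# and the use of pandas are assumptions, not HAMB's implementation.
import pandas as pd

def prune_similar_features(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        # keep a feature only if it is not nearly identical to one already kept
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return df[kept]
```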
Interestingness: • Given, controlled and observed • explicitly state observed attributes as interesting target • Temporal • future (or distant past) predictions are interesting • Influence diagram (e.g. Bayes net) • strong but more indirect influences are interesting
Using typed attribute background knowledge • Organize terms into "given", "controlled" and "observed" • E.g. in the medical domain: "demographics", "intervention" and "outcome" • Benefits: • Categorization of rules by whether they use givens (default), controls (controllable) or both (conditionally controllable), as sketched below
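A sketch of how attribute typing could drive that categorization (the type assignments loosely follow the rehabilitation-medicine example; nothing here is RL's actual code):

```python
# Sketch of categorizing rules by the types of attributes they use.
# The type table below is illustrative ("given" demographics, "controlled"
# interventions, "observed" outcomes), not the actual RL tag set.

ATTRIBUTE_TYPE = {
    "age": "given", "race": "given", "sex": "given",
    "admit": "controlled",
    "general_condition": "observed", "specific_condition": "observed",
}

def categorize_rule(lhs_attributes):
    types = {ATTRIBUTE_TYPE[a] for a in lhs_attributes}
    if types <= {"given"}:
        return "default (uses only givens)"
    if types <= {"controlled"}:
        return "controllable (uses only controls)"
    return "conditionally controllable (uses both)"

print(categorize_rule(["age", "race"]))    # default (uses only givens)
print(categorize_rule(["admit"]))          # controllable (uses only controls)
print(categorize_rule(["age", "admit"]))   # conditionally controllable (uses both)
```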
Typed attribute example • Rehab. (RL; Phillips, Buchanan, Penrod) • > 2000 records • [attribute-type diagram: attributes typed as observed, given, controlled, or temporal, covering medical attributes (general_condition, specific_condition, admit), demographic attributes (age, race, sex), and temporal attributes (time, rate, absolute, normalize)]
Example interestingness: • Group rules by whether they predict from medical attributes, demographic attributes, or both: • by medical: • Left_Body_Stroke => poor improvement (interesting, expected) • by demographic: • High_age => poor improvement (interesting, expected) • (Race=X) => poor improvement (interesting, NOT expected)
Using temporal background knowledge • Organize data by time • Utility may or may not extend to other metric spaces (e.g. space, mass) • Benefits: • Predictions parameterized by time: f(t) • Future or distant past may be interesting • Cyclical patterns
Temporal example • Geophysics (Scienceomatic; Phillips 2000) • Subduction zone discoveries of the form: d(q_after) = d(q_main) + m·[t(q_after) − t(q_main)] + b • NOTE: This is not an accurate prediction! • interesting, since quakes generally can't be predicted
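A sketch of how a relation of that form yields time-parameterized predictions (the slope m and intercept b below are placeholders, not values from the Scienceomatic study):

```python
# Sketch of using the discovered linear relation
#   d(q_after) = d(q_main) + m * (t(q_after) - t(q_main)) + b
# to produce time-parameterized predictions.  The slope m and intercept b
# are placeholders, not fitted values from the geophysics data.

def predicted_depth(d_main_km, t_main, t_after, m=-0.5, b=2.0):
    """Predict the depth of a later quake from the main quake's depth and the time gap."""
    return d_main_km + m * (t_after - t_main) + b

print(predicted_depth(d_main_km=30.0, t_main=0.0, t_after=4.0))  # 30 - 2 + 2 = 30.0
```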
Using influence diagram background knowledge • This is future work! • Organize terms to follow a pre-existing influence diagram • E.g. Bayesian nets, but conditional probabilities are not needed • Benefits: • Suggest hidden variables, new influences • f(x) => f'(x,y)
Interestingness summary • How different types of background knowledge help us achieve interestingness: • Explicitly stated: “observed” attributes • Implicitly stated: parameterized equations with “interesting” parameters • Learned: “new” influence factors
(B) Coherent Models • B. Buchanan • Protein Data
EXAMPLE: Predicting Ca++ Binding Sites (G. Livingston) • Given: 3-D descriptions of 16 sites in proteins that bind calcium ions and 100 other sites that do not • Find: a model that allows predicting whether a proposed new site will bind Ca++ [in terms of a subset of 63 attributes]
Ca++ binding sites in proteins • SOME ATTRIBUTES: ATOM-NAME-IS-C, ATOM-NAME-IS-O, CHARGE, CHARGE-WITH-HIS, HYDROPHOBICITY, MOBILITY, RESIDUE-CLASS1-IS-CHARGED, RESIDUE-CLASS1-IS-HYDROPHOBIC, RESIDUE-CLASS2-IS-ACIDIC, RESIDUE-CLASS2-IS-NONPOLAR, RESIDUE-CLASS2-IS-UNKNOWN, RESIDUE-NAME-IS-ASP, RESIDUE-NAME-IS-GLU, RESIDUE-NAME-IS-HOH, RESIDUE-NAME-IS-LEU, RESIDUE-NAME-IS-VAL, RING-SYSTEM, SECONDARY-STRUCTURE1-IS-4-HELIX, SECONDARY-STRUCTURE1-IS-BEND, SECONDARY-STRUCTURE1-IS-HET, SECONDARY-STRUCTURE1-IS-TURN, SECONDARY-STRUCTURE2-IS-BETA, SECONDARY-STRUCTURE2-IS-HET, VDW-VOLUME
Predicting Ca++ Binding Sites • Semantic types of attributes, e.g., Physical, Chemical, Structural — covering solvent accessibility, charge, VDW volume, heteroatom, oxygen, carbonyl, ASN, helix, beta-turn, ring-system, mobility
Coherent Model = subset of locally acceptable rules that • explains as much of the data as possible • uses entrenched predicates [Goodman] • uses predicates of the same semantic type • uses predicates of the same grain size • uses classes AND their complements • avoids rules that are "too similar": identical; subsuming; semantically close
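A sketch of what pruning "too similar" rules might look like when assembling such a model (the rule encoding, the semantic-type table, and the crude closeness test are all illustrative assumptions, not the actual algorithm):

```python
# Sketch of pruning identical, subsumed, and semantically close rules.
# Rules are encoded as frozensets of (attribute, operator, threshold)
# conditions plus a predicted class; the semantic-type lookup is made up.

SEMANTIC_TYPE = {"charge": "chemical", "num_oxygen": "chemical",
                 "helix": "structural", "vdw_volume": "physical"}

def subsumes(r1, r2):
    """r1 subsumes r2 if r1's conditions are a subset of r2's and both predict the same class."""
    return r1["rhs"] == r2["rhs"] and r1["lhs"] <= r2["lhs"]

def semantically_close(r1, r2):
    """Crude closeness test: same prediction and same set of semantic types on the LHS."""
    types = lambda r: {SEMANTIC_TYPE[a] for a, _op, _val in r["lhs"]}
    return r1["rhs"] == r2["rhs"] and types(r1) == types(r2)

def build_coherent_model(rules):
    model = []
    for r in rules:
        if any(m["lhs"] == r["lhs"] and m["rhs"] == r["rhs"] for m in model):
            continue                      # identical to a kept rule
        if any(subsumes(m, r) for m in model):
            continue                      # already covered by a more general rule
        if any(semantically_close(m, r) for m in model):
            continue                      # semantically close duplicate
        model.append(r)
    return model
```

The same skeleton would accept any other closeness test, e.g. overlap of the examples the two rules cover, in place of the semantic-type comparison used here.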
EXAMPLE: predict Ca++ binding sites in proteins • 158 rules found independently. E.g., • R1: IF a site (a) has charge > 18.5 AND (b) no. of C=O > 18.75 THEN it binds calcium • R2: IF a site (a) has charge > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium
Predicting Ca++ Binding Sites • [semantic network of attributes: Heteroatoms branch into Sulfur, Oxygen, Nitrogen, ...; below these sit functional groups ("Hydroxyl", Carbonyl, Amide, Amine, SH, OH) and residues (CYS, SER, THR, TYR, ASP, GLU, ASN, GLN, ..., PRO)]
Ca++ binding sites in proteins • 58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP) • 42 rules predict SITE, 16 rules predict NON-SITE • Average accuracy for five 5-fold cross-validations = 100% for the redundant model with 58 rules
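One plausible reading of that threshold, in code (the counts are invented and the exact TP/FP percentage definitions are an assumption, not necessarily the ones used in the study):

```python
# Sketch of the rule-acceptance threshold: keep a rule only if it covers at
# least 80% of the positive examples and misfires on at most 20% of the
# negatives.  The statistics fields and counts are illustrative.

def above_threshold(rule_stats, min_tp_rate=0.80, max_fp_rate=0.20):
    tp_rate = rule_stats["true_positives"] / rule_stats["total_positives"]
    fp_rate = rule_stats["false_positives"] / rule_stats["total_negatives"]
    return tp_rate >= min_tp_rate and fp_rate <= max_fp_rate

kept = above_threshold({"true_positives": 14, "total_positives": 16,
                        "false_positives": 5,  "total_negatives": 100})
print(kept)   # True: 14/16 = 0.875 >= 0.80 and 5/100 = 0.05 <= 0.20
```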
Predicting Ca++ Binding Sites • Prefer complementary rules -- e.g., • R59: IF, within 5 Å of a site, # oxygens > 6.5 THEN it binds calcium • R101: IF, within 5 Å of a site, # oxygens <= 6.5 THEN it does NOT bind calcium
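A sketch of how complementary rule pairs like R59/R101 could be detected automatically (the rule encoding and attribute name are illustrative):

```python
# Sketch of detecting complementary rule pairs: single-condition rules on the
# same attribute whose thresholds meet ("> t" vs "<= t") and whose predictions
# are opposite classes.  The rule representation is made up for illustration.

def complementary(r1, r2):
    (a1, op1, t1), = r1["lhs"]          # assume single-condition rules
    (a2, op2, t2), = r2["lhs"]
    return (a1 == a2 and t1 == t2
            and {op1, op2} == {">", "<="}
            and r1["rhs"] != r2["rhs"])

r59  = {"lhs": [("num_oxygen_5A", ">",  6.5)], "rhs": "SITE"}
r101 = {"lhs": [("num_oxygen_5A", "<=", 6.5)], "rhs": "NON-SITE"}
print(complementary(r59, r101))   # True
```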
5 Å Radius Model • Five perfect rules* • R1. #Oxygen LE 6.5 --> NON-SITE • R2. Hydrophobicity GT -8.429 --> NON-SITE • R3. #Oxygen GT 6.5 --> SITE • R4. Hydrophobicity LE -8.429 --> SITE • R5. #Carbonyl GT 4.5 & #Peptide LE 10.5 --> SITE • *(100% of TPs and 0 FPs)
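A sketch of applying the five-rule 5 Å model to a single site description (the feature values are invented, and the first-matching-rule policy is an assumption about how the rules are combined, not a documented part of the model):

```python
# Sketch of applying the five-rule 5 Å model to one site description.
# Feature values are invented; "first matching rule wins" is an assumption.

FIVE_A_RULES = [
    (lambda s: s["num_oxygen"] <= 6.5,                               "NON-SITE"),  # R1
    (lambda s: s["hydrophobicity"] > -8.429,                         "NON-SITE"),  # R2
    (lambda s: s["num_oxygen"] > 6.5,                                "SITE"),      # R3
    (lambda s: s["hydrophobicity"] <= -8.429,                        "SITE"),      # R4
    (lambda s: s["num_carbonyl"] > 4.5 and s["num_peptide"] <= 10.5, "SITE"),      # R5
]

def predict(site):
    for condition, label in FIVE_A_RULES:
        if condition(site):
            return label
    return "UNKNOWN"

site = {"num_oxygen": 8, "hydrophobicity": -9.1, "num_carbonyl": 5, "num_peptide": 9}
print(predict(site))   # SITE -- R3 fires first; R1 and R2 do not match this site
```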
Final Result: Ca++ binding sites in proteins • Model with 5 rules: • same accuracy • no unique predicates • no subsumed or very similar rules • more general rules for SITES (prior prob. < 0.01) • more specific rules for NON-SITES (prior prob. > 0.99)
Predicting Ca++ Binding Sites • Attribute Hierarchies • RESIDUE CLASS 1: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED (ARG, ASP, GLU, LYS); HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
Attribute Hierarchies • RESIDUE CLASS 2: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED: ACIDIC (ARG, ASP, GLU), BASIC (LYS); NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL); HIS; TRP
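A sketch of using the RESIDUE CLASS 1 grouping to generalize a residue-level attribute to its class-level attribute (the attribute names follow the RESIDUE-NAME-IS-* / RESIDUE-CLASS1-IS-* convention from the attribute list; the code itself is illustrative):

```python
# Sketch of generalizing a residue-level attribute up the RESIDUE CLASS 1
# hierarchy.  The groupings mirror the slide; the traversal code is made up.

RESIDUE_CLASS_1 = {
    "POLAR":       {"ASN", "CYS", "GLN", "HIS", "SER", "THR", "TYR", "TRP", "GLY"},
    "CHARGED":     {"ARG", "ASP", "GLU", "LYS"},
    "HYDROPHOBIC": {"ALA", "ILE", "LEU", "MET", "PHE", "PRO", "VAL"},
}

def generalize(attribute):
    """Map RESIDUE-NAME-IS-<X> to RESIDUE-CLASS1-IS-<class containing X>."""
    if not attribute.startswith("RESIDUE-NAME-IS-"):
        return attribute
    residue = attribute.rsplit("-", 1)[-1]
    for cls, members in RESIDUE_CLASS_1.items():
        if residue in members:
            return f"RESIDUE-CLASS1-IS-{cls}"
    return attribute

print(generalize("RESIDUE-NAME-IS-ASP"))   # RESIDUE-CLASS1-IS-CHARGED
print(generalize("RESIDUE-NAME-IS-LEU"))   # RESIDUE-CLASS1-IS-HYDROPHOBIC
```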
CONCLUSION Induction systems can be augmented with semantic criteria to provide (A) interesting & understandable rules • syntactically simple • meaningful (B) coherent models • equally predictive • closer to a theory
CONCLUSION • We have shown • how specific types of background knowledge might be incorporated in the rule discovery process • possible benefits of incorporating those types of knowledge • more coherent models • more understandable models • more accurate models