Knowledge-Based Discovery: Using Semantics in Machine Learning Bruce Buchanan Joe Phillips University of Pittsburgh buchanan@cs.pitt.edu josephp@cs.pitt.edu
Intelligent Systems Laboratory • Faculty: Bruce Buchanan, P.I., John Aronis • Collaborators: John Rosenberg (Biol. Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab. Med.), Rich Simpson (Rehab. Sci.), Russ Altman (Stanford MIS) • Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman • Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma • M.S. Students: Karl Gossett
GOALS • (A) Learn understandable & interesting rules from data • (B) Construct an understandable & coherent model from rules • METHOD: To use background knowledge to search for: • simple rules with familiar predicates • interesting and novel rules • coherent models
Rules or Models: Understandable | Interesting • Criteria: familiar syntax (conditional rules), syntactically simple, semantically simple, familiar predicates, accurate predictions, meaningful rules, relevant to question, novel, cost-effective, coherent model
The RL Program • [block diagram: Training Examples, Explicit Bias, and a Partial Domain Model are inputs to RL and HAMB, which produce RULES and a MODEL; a Performance Program applies these to New Cases to generate Predictions]
(A) Individual Rules • J. Phillips • Rehabilitation Medicine Data
Simple single rules • Syntactic Simplicity • Fewer terms on the LHS • Explicitly stated constraints (rules with no more than N terms) • Tagged attributes (e.g. must have at least one control attribute to be interesting)
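A minimal sketch of how such constraints might be checked when filtering candidate rules (the rule encoding, the attribute names, and the MAX_LHS_TERMS constant are illustrative assumptions, not RL's actual data structures):

```python
# Hypothetical sketch of syntactic-simplicity filtering; the rule encoding and
# the tag set are illustrative assumptions, not RL's actual data structures.

MAX_LHS_TERMS = 3                                   # "no more than N terms" constraint
CONTROL_ATTRIBUTES = {"admit", "therapy_hours"}     # assumed "control" tags

def is_syntactically_simple(rule):
    """Keep rules with a short left-hand side."""
    return len(rule["lhs"]) <= MAX_LHS_TERMS

def has_control_attribute(rule):
    """Keep rules that mention at least one controllable attribute."""
    return any(attr in CONTROL_ATTRIBUTES for attr, _op, _val in rule["lhs"])

rules = [
    {"lhs": [("therapy_hours", ">", 10), ("age", "<", 70)], "rhs": "good_improvement"},
    {"lhs": [("age", ">", 80), ("race", "=", "X"), ("sex", "=", "F"), ("ward", "=", 3)],
     "rhs": "poor_improvement"},
]

kept = [r for r in rules if is_syntactically_simple(r) and has_control_attribute(r)]
print(len(kept))   # 1 -- only the first rule satisfies both constraints
```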
Simple sets of rules • Syntactic simplicity • Fewer rules • Independent rules, e.g. in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x) • HAMB removes highly similar terms from the feature set • Less independence when there's feedback (e.g. medicine)
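A sketch of the kind of redundancy pruning described here, using correlation as the similarity measure (the 0.95 threshold and the pandas-based approach are assumptions, not HAMB's actual procedure):

```python
# Illustrative pruning of highly similar features from a numeric feature set.
# Similarity is approximated by absolute pairwise correlation; the threshold
# and the use of pandas are assumptions, not HAMB's implementation.
import pandas as pd

def prune_similar_features(df: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    corr = df.corr().abs()
    kept = []
    for col in df.columns:
        # keep a feature only if it is not nearly identical to one already kept
        if all(corr.loc[col, k] < threshold for k in kept):
            kept.append(col)
    return df[kept]
```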
Interestingness: • Given, controlled and observed • explicitly state observed attributes as interesting target • Temporal • future (or distant past) predictions are interesting • Influence diagram (e.g. Bayes net) • strong but more indirect influences are interesting
Using typed attribute background knowledge • Organize terms into "given", "controlled" and "observed" • E.g. in the medical domain: "demographics", "intervention" and "outcome" • Benefits: • Categorization of rules by whether they use givens (default), controls (controllable) or both (conditionally controllable), as sketched below
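A sketch of how attribute typing could drive that categorization (the type assignments loosely follow the rehabilitation-medicine example; nothing here is RL's actual code):

```python
# Sketch of categorizing rules by the types of attributes they use.
# The type table below is illustrative ("given" demographics, "controlled"
# interventions, "observed" outcomes), not the actual RL tag set.

ATTRIBUTE_TYPE = {
    "age": "given", "race": "given", "sex": "given",
    "admit": "controlled",
    "general_condition": "observed", "specific_condition": "observed",
}

def categorize_rule(lhs_attributes):
    types = {ATTRIBUTE_TYPE[a] for a in lhs_attributes}
    if types <= {"given"}:
        return "default (uses only givens)"
    if types <= {"controlled"}:
        return "controllable (uses only controls)"
    return "conditionally controllable (uses both)"

print(categorize_rule(["age", "race"]))    # default (uses only givens)
print(categorize_rule(["admit"]))          # controllable (uses only controls)
print(categorize_rule(["age", "admit"]))   # conditionally controllable (uses both)
```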
Typed attribute example • Rehab. (RL; Phillips, Buchanan, Penrod) • > 2000 records • [attribute-type diagram: attributes typed as observed, given, controlled, or temporal, covering medical attributes (general_condition, specific_condition, admit), demographic attributes (age, race, sex), and temporal attributes (time, rate, absolute, normalize)]
Example interestingness: • Group rules by whether they predict from medical attributes, demographic attributes, or both: • by medical: • Left_Body_Stroke => poor improvement (interesting, expected) • by demographic: • High_age => poor improvement (interesting, expected) • (Race=X) => poor improvement (interesting, NOT expected)
Using temporal background knowledge • Organize data by time • Utility may or may not extend to other metric spaces (e.g. space, mass) • Benefits: • Predictions parameterized by time: f(t) • Future or distant past may be interesting • Cyclical patterns
Temporal example • Geophysics (Scienceomatic; Phillips 2000) • Subduction zone discoveries of the form: d(q_after) = d(q_main) + m·[t(q_after) − t(q_main)] + b • NOTE: This is not an accurate prediction! • interesting, since quakes generally can't be predicted
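A sketch of how a relation of that form yields time-parameterized predictions (the slope m and intercept b below are placeholders, not values from the Scienceomatic study):

```python
# Sketch of using the discovered linear relation
#   d(q_after) = d(q_main) + m * (t(q_after) - t(q_main)) + b
# to produce time-parameterized predictions.  The slope m and intercept b
# are placeholders, not fitted values from the geophysics data.

def predicted_depth(d_main_km, t_main, t_after, m=-0.5, b=2.0):
    """Predict the depth of a later quake from the main quake's depth and the time gap."""
    return d_main_km + m * (t_after - t_main) + b

print(predicted_depth(d_main_km=30.0, t_main=0.0, t_after=4.0))  # 30 - 2 + 2 = 30.0
```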
Using influence diagram background knowledge • This is future work! • Organize terms to follow a pre-existing influence diagram • E.g. Bayesian nets, but conditional probabilities are not needed • Benefits: • Suggest hidden variables, new influences • f(x) => f'(x,y)
Interestingness summary • How different types of background knowledge help us achieve interestingness: • Explicitly stated: “observed” attributes • Implicitly stated: parameterized equations with “interesting” parameters • Learned: “new” influence factors
(B) Coherent Models • B. Buchanan • Protein Data
EXAMPLE: Predicting Ca++ Binding Sites (G. Livingston) • Given: 3-D descriptions of 16 sites in proteins that bind calcium ions and 100 other sites that do not • Find: a model that allows predicting whether a proposed new site will bind Ca++ [in terms of a subset of 63 attributes]
Ca++ binding sites in proteins • SOME ATTRIBUTES: ATOM-NAME-IS-C, ATOM-NAME-IS-O, CHARGE, CHARGE-WITH-HIS, HYDROPHOBICITY, MOBILITY, RESIDUE-CLASS1-IS-CHARGED, RESIDUE-CLASS1-IS-HYDROPHOBIC, RESIDUE-CLASS2-IS-ACIDIC, RESIDUE-CLASS2-IS-NONPOLAR, RESIDUE-CLASS2-IS-UNKNOWN, RESIDUE-NAME-IS-ASP, RESIDUE-NAME-IS-GLU, RESIDUE-NAME-IS-HOH, RESIDUE-NAME-IS-LEU, RESIDUE-NAME-IS-VAL, RING-SYSTEM, SECONDARY-STRUCTURE1-IS-4-HELIX, SECONDARY-STRUCTURE1-IS-BEND, SECONDARY-STRUCTURE1-IS-HET, SECONDARY-STRUCTURE1-IS-TURN, SECONDARY-STRUCTURE2-IS-BETA, SECONDARY-STRUCTURE2-IS-HET, VDW-VOLUME
Predicting Ca++ Binding Sites • Semantic types of attributes, e.g., Physical, Chemical, Structural — covering solvent accessibility, charge, VDW volume, heteroatom, oxygen, carbonyl, ASN, helix, beta-turn, ring-system, mobility
Coherent Model = subset of locally acceptable rules that • explains as much of the data as possible • uses entrenched predicates [Goodman] • uses predicates of the same semantic type • uses predicates of the same grain size • uses classes AND their complements • avoids rules that are "too similar": identical; subsuming; semantically close
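A sketch of what pruning "too similar" rules might look like when assembling such a model (the rule encoding, the semantic-type table, and the crude closeness test are all illustrative assumptions, not the actual algorithm):

```python
# Sketch of pruning identical, subsumed, and semantically close rules.
# Rules are encoded as frozensets of (attribute, operator, threshold)
# conditions plus a predicted class; the semantic-type lookup is made up.

SEMANTIC_TYPE = {"charge": "chemical", "num_oxygen": "chemical",
                 "helix": "structural", "vdw_volume": "physical"}

def subsumes(r1, r2):
    """r1 subsumes r2 if r1's conditions are a subset of r2's and both predict the same class."""
    return r1["rhs"] == r2["rhs"] and r1["lhs"] <= r2["lhs"]

def semantically_close(r1, r2):
    """Crude closeness test: same prediction and same set of semantic types on the LHS."""
    types = lambda r: {SEMANTIC_TYPE[a] for a, _op, _val in r["lhs"]}
    return r1["rhs"] == r2["rhs"] and types(r1) == types(r2)

def build_coherent_model(rules):
    model = []
    for r in rules:
        if any(m["lhs"] == r["lhs"] and m["rhs"] == r["rhs"] for m in model):
            continue                      # identical to a kept rule
        if any(subsumes(m, r) for m in model):
            continue                      # already covered by a more general rule
        if any(semantically_close(m, r) for m in model):
            continue                      # semantically close duplicate
        model.append(r)
    return model
```

The same skeleton would accept any other closeness test, e.g. overlap of the examples the two rules cover, in place of the semantic-type comparison used here.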
EXAMPLE: predict Ca++ binding sites in proteins • 158 rules found independently. E.g., • R1: IF a site (a) has charge > 18.5 AND (b) no. of C=O > 18.75 THEN it binds calcium • R2: IF a site (a) has charge > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium
Predicting Ca++ Binding Sites • [semantic network of attributes: Heteroatoms branch into Sulfur, Oxygen, Nitrogen, ...; below these sit functional groups ("Hydroxyl", Carbonyl, Amide, Amine, SH, OH) and residues (CYS, SER, THR, TYR, ASP, GLU, ASN, GLN, ..., PRO)]
Ca++ binding sites in proteins • 58 rules above threshold (threshold: at least 80% TP AND no more than 20% FP) • 42 rules predict SITE, 16 rules predict NON-SITE • Average accuracy for five 5-fold cross-validations = 100% for the redundant model with 58 rules
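One plausible reading of that threshold, in code (the counts are invented and the exact TP/FP percentage definitions are an assumption, not necessarily the ones used in the study):

```python
# Sketch of the rule-acceptance threshold: keep a rule only if it covers at
# least 80% of the positive examples and misfires on at most 20% of the
# negatives.  The statistics fields and counts are illustrative.

def above_threshold(rule_stats, min_tp_rate=0.80, max_fp_rate=0.20):
    tp_rate = rule_stats["true_positives"] / rule_stats["total_positives"]
    fp_rate = rule_stats["false_positives"] / rule_stats["total_negatives"]
    return tp_rate >= min_tp_rate and fp_rate <= max_fp_rate

kept = above_threshold({"true_positives": 14, "total_positives": 16,
                        "false_positives": 5,  "total_negatives": 100})
print(kept)   # True: 14/16 = 0.875 >= 0.80 and 5/100 = 0.05 <= 0.20
```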
Predicting Ca++ Binding Sites • Prefer complementary rules -- e.g., • R59: IF, within 5 Å of a site, # oxygens > 6.5 THEN it binds calcium • R101: IF, within 5 Å of a site, # oxygens <= 6.5 THEN it does NOT bind calcium
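A sketch of how complementary rule pairs like R59/R101 could be detected automatically (the rule encoding and attribute name are illustrative):

```python
# Sketch of detecting complementary rule pairs: single-condition rules on the
# same attribute whose thresholds meet ("> t" vs "<= t") and whose predictions
# are opposite classes.  The rule representation is made up for illustration.

def complementary(r1, r2):
    (a1, op1, t1), = r1["lhs"]          # assume single-condition rules
    (a2, op2, t2), = r2["lhs"]
    return (a1 == a2 and t1 == t2
            and {op1, op2} == {">", "<="}
            and r1["rhs"] != r2["rhs"])

r59  = {"lhs": [("num_oxygen_5A", ">",  6.5)], "rhs": "SITE"}
r101 = {"lhs": [("num_oxygen_5A", "<=", 6.5)], "rhs": "NON-SITE"}
print(complementary(r59, r101))   # True
```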
5 Å Radius Model • Five perfect rules* • R1. #Oxygen LE 6.5 --> NON-SITE • R2. Hydrophobicity GT -8.429 --> NON-SITE • R3. #Oxygen GT 6.5 --> SITE • R4. Hydrophobicity LE -8.429 --> SITE • R5. #Carbonyl GT 4.5 & #Peptide LE 10.5 --> SITE • *(100% of TPs and 0 FPs)
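A sketch of applying the five-rule 5 Å model to a single site description (the feature values are invented, and the first-matching-rule policy is an assumption about how the rules are combined, not a documented part of the model):

```python
# Sketch of applying the five-rule 5 Å model to one site description.
# Feature values are invented; "first matching rule wins" is an assumption.

FIVE_A_RULES = [
    (lambda s: s["num_oxygen"] <= 6.5,                               "NON-SITE"),  # R1
    (lambda s: s["hydrophobicity"] > -8.429,                         "NON-SITE"),  # R2
    (lambda s: s["num_oxygen"] > 6.5,                                "SITE"),      # R3
    (lambda s: s["hydrophobicity"] <= -8.429,                        "SITE"),      # R4
    (lambda s: s["num_carbonyl"] > 4.5 and s["num_peptide"] <= 10.5, "SITE"),      # R5
]

def predict(site):
    for condition, label in FIVE_A_RULES:
        if condition(site):
            return label
    return "UNKNOWN"

site = {"num_oxygen": 8, "hydrophobicity": -9.1, "num_carbonyl": 5, "num_peptide": 9}
print(predict(site))   # SITE -- R3 fires first; R1 and R2 do not match this site
```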
Final Result: Ca++ binding sites in proteins • Model with 5 rules: • same accuracy • no unique predicates • no subsumed or very similar rules • more general rules for SITES (prior prob. < 0.01) • more specific rules for NON-SITES (prior prob. > 0.99)
Predicting Ca++ Binding Sites • Attribute Hierarchies • RESIDUE CLASS 1: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED (ARG, ASP, GLU, LYS); HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)
Attribute Hierarchies • RESIDUE CLASS 2: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED: ACIDIC (ARG, ASP, GLU), BASIC (LYS); NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL); HIS; TRP
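A sketch of using the RESIDUE CLASS 1 grouping to generalize a residue-level attribute to its class-level attribute (the attribute names follow the RESIDUE-NAME-IS-* / RESIDUE-CLASS1-IS-* convention from the attribute list; the code itself is illustrative):

```python
# Sketch of generalizing a residue-level attribute up the RESIDUE CLASS 1
# hierarchy.  The groupings mirror the slide; the traversal code is made up.

RESIDUE_CLASS_1 = {
    "POLAR":       {"ASN", "CYS", "GLN", "HIS", "SER", "THR", "TYR", "TRP", "GLY"},
    "CHARGED":     {"ARG", "ASP", "GLU", "LYS"},
    "HYDROPHOBIC": {"ALA", "ILE", "LEU", "MET", "PHE", "PRO", "VAL"},
}

def generalize(attribute):
    """Map RESIDUE-NAME-IS-<X> to RESIDUE-CLASS1-IS-<class containing X>."""
    if not attribute.startswith("RESIDUE-NAME-IS-"):
        return attribute
    residue = attribute.rsplit("-", 1)[-1]
    for cls, members in RESIDUE_CLASS_1.items():
        if residue in members:
            return f"RESIDUE-CLASS1-IS-{cls}"
    return attribute

print(generalize("RESIDUE-NAME-IS-ASP"))   # RESIDUE-CLASS1-IS-CHARGED
print(generalize("RESIDUE-NAME-IS-LEU"))   # RESIDUE-CLASS1-IS-HYDROPHOBIC
```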
CONCLUSION Induction systems can be augmented with semantic criteria to provide (A) interesting & understandable rules • syntactically simple • meaningful (B) coherent models • equally predictive • closer to a theory
CONCLUSION • We have shown • how specific types of background knowledge might be incorporated in the rule discovery process • possible benefits of incorporating those types of knowledge • more coherent models • more understandable models • more accurate models