
Knowledge-Based Discovery: Using Semantics in Machine Learning






Presentation Transcript


  1. Knowledge-Based Discovery: Using Semantics in Machine Learning • Bruce Buchanan, Joe Phillips • University of Pittsburgh • buchanan@cs.pitt.edu, josephp@cs.pitt.edu

  2. Intelligent Systems Laboratory • Faculty: Bruce Buchanan, P.I., John Aronis • Collaborators: John Rosenberg (Biol.Sci.), Greg Cooper (Medicine), Bob Ferrell (Genetics), Janyce Wiebe (CS), Lou Penrod (Rehab.Med.), Rich Simpson (Rehab.Sci.), Russ Altman (Stanford MIS) • Research Associates: Joe Phillips, Paul Hodor, Vanathi Gopalakrishnan, Wendy Chapman • Ph.D. Students: Gary Livingston, Dan Hennessy, Venkat Kolluri, Will Bridewell, Lili Ma • M.S. Students: Karl Gossett

  3. GOALS • (A) Learn understandable & interesting rules from data • (B) Construct an understandable & coherent model from rules • METHOD: use background knowledge to search for simple rules with familiar predicates, interesting and novel rules, and coherent models

  4. Rules or Models: Understandable | Interesting • Understandable: familiar syntax (conditional rules), syntactically simple, semantically simple, familiar predicates • Interesting: accurate predictions, meaningful rules, relevant to question, novel, cost-effective, coherent model

  5. The RL Program • [Block diagram: Explicit Bias, a Partial Domain Model, and Training Examples feed RL (and HAMB), which outputs RULES and a MODEL; a Performance Program applies the model to New Cases to make Predictions]

  6. (A) Individual Rules • J. Phillips • Rehabilitation Medicine Data

  7. Simple single rules • Syntactic simplicity • Fewer terms on the LHS • Explicitly stated constraints (rules with no more than N terms) • Tagged attributes (e.g., a rule must have at least one control attribute to be interesting); both checks are sketched below
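A minimal Python sketch of the two constraints above, assuming a hypothetical Rule record whose LHS is a list of (attribute, operator, value) triples; this is illustrative, not RL's actual bias mechanism:

    from dataclasses import dataclass

    @dataclass
    class Rule:
        lhs: list   # LHS conditions as (attribute, operator, value) triples
        rhs: str    # predicted class

    def is_simple(rule, attr_tags, max_terms=3, required_tag="control"):
        """Accept a rule only if its LHS has at most max_terms terms and
        mentions at least one attribute carrying the required tag."""
        if len(rule.lhs) > max_terms:
            return False
        return any(attr_tags.get(a) == required_tag for a, _, _ in rule.lhs)

    # Example: two terms, one tagged "control" -> accepted.
    r = Rule(lhs=[("admit", ">", 3), ("age", ">", 65)], rhs="poor_improvement")
    print(is_simple(r, {"admit": "control", "age": "given"}))  # True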

  8. Simple sets of rules • Syntactic simplicity • Fewer rules: independent rules • E.g., in physics: U(x) = U_gravity(x) + U_electronic(x) + U_magnetic(x) • HAMB removes highly similar terms from the feature set (one plausible reading is sketched below) • Less independence when there is feedback, e.g., in medicine
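The slides do not say how HAMB measures similarity between terms; the sketch below uses absolute Pearson correlation as a stand-in criterion (an assumption, not HAMB's actual test):

    import numpy as np

    def drop_similar_features(X, names, threshold=0.95):
        """Greedily keep a feature only if it is not highly correlated
        with any feature already kept (a proxy for removing 'highly
        similar terms' from the feature set)."""
        corr = np.abs(np.corrcoef(X, rowvar=False))
        keep = []
        for j in range(X.shape[1]):
            if all(corr[j, k] < threshold for k in keep):
                keep.append(j)
        return X[:, keep], [names[j] for j in keep]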

  9. Interestingness • Given, controlled, and observed: explicitly state observed attributes as the interesting targets • Temporal: predictions about the future (or distant past) are interesting • Influence diagram (e.g., Bayes net): strong but more indirect influences are interesting

  10. Using typed-attribute background knowledge • Organize terms into "given", "controlled", and "observed" • E.g., in the medical domain: "demographics", "intervention", and "outcome" • Benefit: rules can be categorized by whether they use givens (default), controls (controllable), or both (conditionally controllable), as in the sketch below
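A small sketch of that categorization, reusing the hypothetical Rule record from above with an attribute-to-type map (illustrative only):

    def categorize(rule, attr_type):
        """Label a rule by the types of attributes on its LHS."""
        types = {attr_type[a] for a, _, _ in rule.lhs}
        if types == {"given"}:
            return "default"                     # givens only
        if types == {"controlled"}:
            return "controllable"                # controls only
        if types <= {"given", "controlled"}:
            return "conditionally controllable"  # both
        return "uses observed attributes"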

  11. Typed attribute example • Rehab. (RL; Phillips, Buchanan, Penrod) • > 2000 records • Attribute types (table reconstructed from the slide): given / demographic: age, race, sex; controlled / medical: admit; observed / medical: general_condition, specific_condition; temporal / time: rate, absolute, normalize

  12. Example interestingness • Group rules by whether they predict by medical, demographic, or both • By medical: Left_Body_Stroke => poor improvement (interesting, expected) • By demographic: High_age => poor improvement (interesting, expected); (Race=X) => poor improvement (interesting, NOT expected)

  13. Using temporal background knowledge • Organize data by time • Utility may or may not extend to other metric spaces (e.g. space, mass) • Benefits: • Predictions parameterized by time: f(t) • Future or distant past may be interesting • Cyclical patterns

  14. Temporal example • Geophysics (Scienceomatic; Phillips 2000) • Subduction-zone discoveries of the form: d(q_after) = d(q_main) + m·[t(q_after) − t(q_main)] + b • NOTE: this is not an accurate prediction! It is interesting because, in general, quakes can't be predicted; a worked fit of m and b follows
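To make the discovered form concrete, here is a least-squares fit of the slope m and intercept b; the numbers are hypothetical, not from the Scienceomatic study:

    import numpy as np

    # Hypothetical data: time offsets t(q_after) - t(q_main) in days and
    # aftershock distances d(q_after) in km, with the main-shock distance fixed.
    dt = np.array([0.5, 1.0, 2.0, 3.5, 5.0])
    d_after = np.array([12.0, 14.1, 18.2, 24.0, 30.3])
    d_main = 10.0

    # Fit d(q_after) - d(q_main) = m*dt + b by least squares.
    m, b = np.polyfit(dt, d_after - d_main, 1)
    print(f"m = {m:.2f} km/day, b = {b:.2f} km")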

  15. Using influence-diagram background knowledge • This is future work! • Organize terms to follow a pre-existing influence diagram • E.g., Bayesian nets, but conditional probabilities are not needed • Benefits: suggest hidden variables and new influences: f(x) => f'(x, y)

  16. Interestingness summary • How different types of background knowledge help us achieve interestingness: • Explicitly stated: “observed” attributes • Implicitly stated: parameterized equations with “interesting” parameters • Learned: “new” influence factors

  17. (B) Coherent Models • B. Buchanan • Protein Data

  18. EXAMPLE: Predicting Ca++ Binding Sites (G. Livingston) • Given: 3-D descriptions of 16 sites in proteins that bind calcium ions & 100 other sites that do not • Find: a model that allows predicting whether a proposed new site will bind Ca++ [in terms of a subset of 63 attributes]

  19. Ca++ binding sites in proteins • SOME ATTRIBUTES: ATOM-NAME-IS-C, ATOM-NAME-IS-O, CHARGE, CHARGE-WITH-HIS, HYDROPHOBICITY, MOBILITY, RESIDUE-CLASS1-IS-CHARGED, RESIDUE-CLASS1-IS-HYDROPHOBIC, RESIDUE-CLASS2-IS-ACIDIC, RESIDUE-CLASS2-IS-NONPOLAR, RESIDUE-CLASS2-IS-UNKNOWN, RESIDUE-NAME-IS-ASP, RESIDUE-NAME-IS-GLU, RESIDUE-NAME-IS-HOH, RESIDUE-NAME-IS-LEU, RESIDUE-NAME-IS-VAL, RING-SYSTEM, SECONDARY-STRUCTURE1-IS-4-HELIX, SECONDARY-STRUCTURE1-IS-BEND, SECONDARY-STRUCTURE1-IS-HET, SECONDARY-STRUCTURE1-IS-TURN, SECONDARY-STRUCTURE2-IS-BETA, SECONDARY-STRUCTURE2-IS-HET, VDW-VOLUME

  20. Predicting Ca++ Binding Sites • Semantic types of attributes (Physical, Chemical, Structural), covering e.g.: solvent accessibility, charge, VDW volume, heteroatom, oxygen, carbonyl, ASN, helix, beta-turn, ring-system, mobility

  21. Coherent Model = subset of locally acceptable rules that • explains as much of the data as possible • uses entrenched predicates [Goodman] • uses predicates of the same semantic type • uses predicates of the same grain size • uses classes AND their complements • avoids rules that are "too similar": identical, subsuming, or semantically close (pruning sketched below)
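A rough sketch of that last criterion, again over the hypothetical Rule records: greedily drop any rule that duplicates, is subsumed by, or is semantically close to one already kept (sem_close stands in for whatever semantic-distance test the real system uses):

    def prune_similar(rules, sem_close):
        """Keep rules greedily, skipping any rule whose LHS contains a
        kept rule's LHS (same conclusion) or that sem_close flags."""
        kept = []
        for r in rules:
            redundant = any(
                (r.rhs == k.rhs and set(k.lhs) <= set(r.lhs)) or sem_close(r, k)
                for k in kept
            )
            if not redundant:
                kept.append(r)
        return kept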

  22. EXAMPLE: predict Ca++ binding sites in proteins • 158 rules found independently, e.g.: • R1: IF a site has (a) charge > 18.5 AND (b) no. of C=O > 18.75 THEN it binds calcium • R2: IF a site has (a) charge > 18.5 AND (b) no. of ASN > 15 THEN it binds calcium

  23. Predicting Ca++ Binding Sites • Semantic network of attributes (tree reconstructed from the slide): Heteroatoms → Sulfur, Oxygen, Nitrogen, ...; Sulfur → SH → CYS; Oxygen → "Hydroxyl" (OH → SER, THR, TYR, ...) and Carbonyl; Nitrogen → Amide, Amine; Carbonyl/Amide → ASP, GLU, ASN, GLN ... PRO

  24. Ca++ binding sites in proteins • 58 rules above threshold (threshold = at least 80% TP AND no more than 20% FP; a sketch of this filter follows) • 42 rules predict SITE, 16 rules predict NON-SITE • Average accuracy over five 5-fold cross-validations = 100% for the redundant model with 58 rules
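A minimal sketch of that threshold filter over the hypothetical Rule records, with examples as attribute dicts; the actual RL/HAMB coverage test is not shown in the slides:

    import operator

    OPS = {">": operator.gt, "<=": operator.le}

    def matches(rule, example):
        """True if every LHS condition holds for the example (a dict)."""
        return all(OPS[op](example[a], v) for a, op, v in rule.lhs)

    def passes_threshold(rule, examples, labels, min_tp=0.80, max_fp=0.20):
        """Keep a rule only if it covers at least 80% of its class's
        examples and no more than 20% of the other class's examples."""
        pos = [e for e, y in zip(examples, labels) if y == rule.rhs]
        neg = [e for e, y in zip(examples, labels) if y != rule.rhs]
        tp = sum(matches(rule, e) for e in pos) / len(pos)
        fp = sum(matches(rule, e) for e in neg) / len(neg)
        return tp >= min_tp and fp <= max_fp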

  25. Predicting Ca++ Binding Sites • Prefer complementary rules, e.g.: • R59: IF, within 5 Å of a site, # oxygens > 6.5 THEN it binds calcium • R101: IF, within 5 Å of a site, # oxygens <= 6.5 THEN it does NOT bind calcium • (a complementarity check is sketched below)
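One simple way to detect such pairs mechanically, matching the R59/R101 pattern (an assumption about what "complementary" means here): same attribute, same threshold, opposite comparison, opposite conclusion:

    def are_complementary(r1, r2):
        """True for single-condition rules that split one attribute at
        one threshold with opposite tests and opposite conclusions."""
        if len(r1.lhs) != 1 or len(r2.lhs) != 1 or r1.rhs == r2.rhs:
            return False
        (a1, op1, v1), (a2, op2, v2) = r1.lhs[0], r2.lhs[0]
        return a1 == a2 and v1 == v2 and {op1, op2} == {">", "<="}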

  26. 5 Å Radius Model • Five perfect rules* • R1. #Oxygen LE 6.5 --> NON-SITE • R2. Hydrophobicity GT -8.429 --> NON-SITE • R3. #Oxygen GT 6.5 --> SITE • R4. Hydrophobicity LE -8.429 --> SITE • R5. #Carbonyl GT 4.5 & #Peptide LE 10.5 --> SITE • *(100% of TPs and 0 FPs)

  27. Final Result: Ca++ binding sites in proteins • Model with 5 rules: same accuracy • no unique predicates • no subsumed or very similar rules • more general rules for SITES (prior prob. < 0.01) • more specific rules for NON-SITES (prior prob. > 0.99)

  28. Predicting Ca++ Binding Sites • Attribute Hierarchies • RESIDUE CLASS 1: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED (ARG, ASP, GLU, LYS); HYDROPHOBIC (ALA, ILE, LEU, MET, PHE, PRO, VAL)

  29. Attribute Hierarchies • RESIDUE CLASS 2: POLAR (ASN, CYS, GLN, HIS, SER, THR, TYR, TRP, GLY); CHARGED: ACIDIC (ASP, GLU), BASIC (ARG, LYS); NONPOLAR (ALA, ILE, LEU, MET, PHE, PRO, VAL); HIS, TRP (also shown separately on the slide) • a dict-based sketch of these hierarchies follows
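A minimal sketch of such a hierarchy as a parent map, so a rule term over a residue name can be climbed to its class; the residue names come from the slides, but the data structure itself is an assumption:

    # Parent map for RESIDUE CLASS 2 (subset shown).
    CLASS2_PARENT = {
        "ASP": "ACIDIC", "GLU": "ACIDIC",
        "ARG": "BASIC", "LYS": "BASIC",
        "ACIDIC": "CHARGED", "BASIC": "CHARGED",
    }

    def generalize(term, parent=CLASS2_PARENT):
        """Climb one level up the attribute hierarchy, if any."""
        return parent.get(term, term)

    print(generalize("ASP"))     # ACIDIC
    print(generalize("ACIDIC"))  # CHARGED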

  30. CONCLUSION • Induction systems can be augmented with semantic criteria to provide: • (A) interesting & understandable rules: syntactically simple, meaningful • (B) coherent models: equally predictive, closer to a theory

  31. CONCLUSION • We have shown • how specific types of background knowledge might be incorporated in the rule discovery process • possible benefits of incorporating those types of knowledge • more coherent models • more understandable models • more accurate models
