The Reading to Learn Project
Peter Clark, Phil Harrison, Tom Jenkins, John Thompson, Rick Wojcik (Boeing Phantom Works)
David Israel (SRI)
SRI-Boeing’s Reading to Learn Seedling
• Goal:
  • Study issues in machine reading by working with a reduced version of the problem: controlled language (CL) rather than unrestricted natural language (NL).
  • The NLP task is factored into two steps: full NL → CL, and CL → logic (this project).
• Rationale:
  • By sidestepping some of the grammatical issues of full NLP, we can focus on issues of understanding and reasoning.
  • Methods for full NL → CL can be studied separately.
SRI-Boeing’s Reading to Learn Seedling
• Approach:
  • Rewrite 5 pages of chemistry text into our controlled language, CPL
  • Extend and use our CPL interpreter to generate logic
  • Integrate this new knowledge with an existing chemistry knowledge base (from the Halo Pilot)
  • Report on the problems encountered and solutions developed
CPL: Computer-Processable Language
• Simplified version of English. Basic sentence: subj + verb + complements + adjuncts
• Follows a set of guidelines (present tense, no adverbs, no subordinate clauses, etc.)
• Translates into first-order logic axioms
• Used in several projects at Boeing
Two Paths from Language to Logic…
[Diagram labels: Real Text; Real(istic) Controlled Language Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]
Two Paths from Language to Logic: Approach I: CPL as simplified English
[Diagram labels: Real Text; Real(istic) Controlled Language Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]
Approach I: Reformulation of the 5 pages…
• Note: the original text contains introductory material, flowery language, fluff, complex sentences, and parentheticals.
CPL Reformulation: Acids have a sour taste. Acids cause some dyes to change color. Bases have a bitter taste. Bases have a slippery feel.
CPL Reformulation: Hydrogen chloride is an Arrhenius acid. Hydrogen chloride gas is soluble in water. Hydrogen chloride gas in water reacts with the water. The reaction produces H-plus ions and Cl-minus ions. HCl is hydrogen chloride. Hydrochloric acid is an aqueous solution of HCl. 37 percent of the mass of concentrated hydrochloric acid is HCl. The concentration of HCl in concentrated hydrochloric acid is 12 M.
Hydrogen chloride gas in water reacts with the water.
FORALL ?Water0, ?HydrogenChloride0, ?Gas0:
    isa(?Water0, H2O)
    isa(?Gas0, Gas-Substance)
    isa(?HydrogenChloride0, HCl)
    has-basic-unit(?Gas0, ?HydrogenChloride0)
    is-inside(?Gas0, ?Water0)
===>
EXISTS ?React0:
    isa(?React0, Reaction)
    raw-material(?React0, ?Gas0)
    raw-material(?React0, ?Water0)
Do We Get Inference-Capable Knowledge Out?
Encoded and processed ~250 CPL sentences. Limited inference ability. Several key problems:
• Complex Notions/Idioms/Special Phrases: "The reaction favors transfer of protons to the weaker acid"
• Examples: "An HCl molecule in water donates a proton to an H2O molecule."
• Diagrams and Tables
Do We Get Inference-Capable Knowledge Out?
• Generics: "Acids contain hydrogen." "Acids cause some dyes to change color."
• Procedural/Problem-solving knowledge: "A conjugate base is formed by removing a proton from the acid." (Base = Acid – Proton)
• Representational challenges: "An acid and a base differing only in a proton are called a conjugate acid-base-pair."
• Algebra: "NaOH dissociates into Na+ and OH- ions."
• Metonymy: "Nitrous acid reacts with water in Equation 16.7."
Two Paths from Language to Logic: Approach II: CPL as a rule language
[Diagram labels: Real Text; Real(istic) CPL Text; Literal/messy logic representation; “The Knowledge Gap”; Declarative CPL rules; Inference-supporting Representation]
Rewriting a Sentence into CPL
Textbook:
“In every acid-base reaction the position of the equilibrium favors transfer of the proton to the stronger base.”
More Precise Encoding 1:
IF there is a reaction
AND one base in the reaction is stronger than the other base in the reaction
THEN the direction of the reaction is away from the stronger base.
[“favors transfer to” → “direction is away from”]
More Precise Encoding 2:
IF there is a reaction
AND there is a base on the left side of the reaction
AND there is a base on the right side of the reaction
AND the first base is stronger than the second base
THEN the direction of the reaction is to the right.
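Encoding 2 above is close to an executable rule. The following is a minimal Python sketch of it, not the CPL interpreter's actual output. The names `Reaction`, `reaction_direction`, and `base_rank` are hypothetical, and the ranks are illustrative ordinal values (higher means stronger base), not measured strengths. The "left" branch is the mirror case implied by Encoding 1 ("away from the stronger base").

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Reaction:
    """Hypothetical record: the base on each side of an acid-base reaction."""
    left_base: str
    right_base: str


def reaction_direction(reaction: Reaction,
                       base_rank: Mapping[str, int]) -> Optional[str]:
    """Apply Encoding 2: the reaction runs away from the stronger base.

    base_rank maps a base to an illustrative ordinal rank (assumption).
    Returns "right", "left", or None when the rule does not apply.
    """
    left = base_rank[reaction.left_base]
    right = base_rank[reaction.right_base]
    if left > right:
        return "right"   # stronger base on the left side (Encoding 2)
    if right > left:
        return "left"    # mirror case, following Encoding 1
    return None          # equal ranks: the rule says nothing


# HCl + H2O -> H3O+ + Cl-: H2O is the stronger base, so the rule fires.
ranks = {"H2O": 2, "Cl-": 1}
direction = reaction_direction(Reaction("H2O", "Cl-"), ranks)
```

Here `direction` is "right", matching the THEN clause of Encoding 2.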
Can we now integrate this with the KB?
• Goal:
  • Manually remove the target knowledge from the KB
  • Add the new knowledge in
• However:
  • Hard to remove and add knowledge
  • KB is complex and intertwined
• 2 case studies:
  • Conjugate acid calculations
  • Relative acid strengths
Text Book: Conjugate Acid
“Every base has associated with it a conjugate acid, formed by adding a proton to the base.”
• Computing the conjugate acid:
  • i.e. Acid = Base + proton
  • e.g. H3O+ = H2O + H+
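The textbook's arithmetic (Acid = Base + proton, and its inverse Base = Acid – proton) can be sketched in a few lines of Python. This is an illustration only, under the assumption that a species is represented as an atom-count dictionary plus a net charge. The function names are hypothetical and are not taken from the KB.

```python
from typing import Dict, Tuple

Species = Tuple[Dict[str, int], int]  # (atom counts, net charge); an assumed representation


def conjugate_acid(atoms: Dict[str, int], charge: int) -> Species:
    """Acid = Base + proton: add one H atom and one unit of positive charge."""
    result = dict(atoms)
    result["H"] = result.get("H", 0) + 1
    return result, charge + 1


def conjugate_base(atoms: Dict[str, int], charge: int) -> Species:
    """Base = Acid - proton: remove one H atom and one unit of positive charge."""
    if atoms.get("H", 0) < 1:
        raise ValueError("species has no proton to remove")
    result = dict(atoms)
    result["H"] -= 1
    if result["H"] == 0:
        del result["H"]
    return result, charge - 1


# H2O + H+ gives H3O+
water = ({"H": 2, "O": 1}, 0)
hydronium = conjugate_acid(*water)
```

Here `hydronium` is `({"H": 3, "O": 1}, 1)`, and applying `conjugate_base` to it returns the original water entry.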
KB: Conjugate Acid
Compute-Conjugate-Acid:
    input: Chemical
    parent?formula: Chemical (input) → Molecule → FormulaObject → Formula
    target-unit: LISP: Formula (parent?formula) → Formula (conjugate)
    output: Formula (target-unit) → FormulaObject → Molecule → ClassifiedMolecule → Chemical → ClassifiedChemical
Example values shown on the slide, in order: “H2O”, 2H+O, 2H+O, 3H+O, 3H+O, “H3O” (result)
Text Book: Relative Strengths of Acids
• The textbook presents a rank-order of relative strengths.
• This is encoded as a large nested structure in the KB.
KB: Relative Strengths of Acids
(every Acid-Role has
  (intensity (
    (a Intensity-Value with
      (value (
        (:pair
          ;; Case statement for Acids.
          (if ((the played-by of Self) isa Ionic-Compound-Substance)
           then (if (((the played-by of Self) isa HCl-Substance) or
                     ((the played-by of Self) isa HBr-Substance) or
                     ((the played-by of Self) isa HI-Substance) or
                     ((the played-by of Self) isa HClO3-Substance) or
                     ((the played-by of Self) isa HClO4-Substance) or
                     ((the played-by of Self) isa H2SO4-Substance) or
                     ((the played-by of Self) isa HNO3-Substance))
                 then *strong
                 else (if (((the played-by of Self) isa H3PO4-Substance) or
                           ((the played-by of Self) isa HF-Substance) or
                           ((the played-by of Self) isa HC2H3O2-Substance) or
                           ((the played-by of Self) isa H2CO3-Substance) or
Requirements for an Extensible KB
• Simple structures
• Declarative and procedural knowledge separated
  • include a constraint reasoner
• Elaboration-tolerant organization
• Error-tolerant reasoner
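As one illustration of "simple structures" with declarative and procedural knowledge separated, the strong-acid case of the nested KB rule could be held as a plain table with a generic lookup. This is a hedged Python sketch, not the project's design. The names `ACID_INTENSITY` and `acid_intensity` are hypothetical, the Ionic-Compound-Substance guard is omitted, and only the seven acids that the KB excerpt labels *strong are included, because the remaining cases are cut off in the excerpt.

```python
from typing import Dict, Optional

# Declarative part: one fact per acid (the seven acids labeled *strong in the KB excerpt).
ACID_INTENSITY: Dict[str, str] = {
    "HCl": "strong",
    "HBr": "strong",
    "HI": "strong",
    "HClO3": "strong",
    "HClO4": "strong",
    "H2SO4": "strong",
    "HNO3": "strong",
}


def acid_intensity(formula: str,
                   table: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Procedural part: a generic lookup that knows nothing about specific acids.

    Returns None when the table has no entry, so missing knowledge is
    reported rather than guessed.
    """
    if table is None:
        table = ACID_INTENSITY
    return table.get(formula)


# Adding knowledge is a one-line change to the data, not an edit to nested code.
extended = dict(ACID_INTENSITY)
extended["HX"] = "strong"  # "HX" is a made-up placeholder acid for illustration
```

With this layout, removing or replacing a single fact does not require touching the lookup logic, which is the property the elaboration-tolerance requirement asks for.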
What Have We Learned?
• Controlled language only partially alleviates the problem
  • many interpretation problems remain
• Text often doesn't clearly state knowledge
  • need to use multiple texts
• AP Chemistry is particularly hard
  • chemical/molecule/formula distinction; algebra
• Knowledge bases need to be built for extensibility
  • syntactically simple, declarative, elaboration tolerant
• Bridging the gap:
  • "Smart" language interpretation; use of knowledge
  • Where might this knowledge come from? Bootstrapping
Another View of the Path from Language to Logic…
[Diagram labels: Real Text; Fragments of new knowledge; Hypotheses; Examples; Prior knowledge; Context; Expectations; Confirmations; Refinements; Inference-supporting Representation]