Towards a Knowledgeable Machine that can Pass an Elementary Science Test. Peter Clark Vulcan Inc August 2013. Outline. Halo: The Goal and Road Travelled… AURA, Inquire, and reflections Exploiting Semi-Formal Representations and Textual Inference
Towards a Knowledgeable Machine that can Pass an Elementary Science Test Peter Clark Vulcan Inc August 2013
Outline • Halo: The Goal and Road Travelled… • AURA, Inquire, and reflections • Exploiting Semi-Formal Representations and Textual Inference • A New Challenge: Fourth-Grade Science Tests
Overall Goals • Long-Term Goal: The Digital Aristotle • Have large volumes of knowledge encoded in a computable form, such that the computer can answer questions, explain its answers, and ultimately dialog with users about the subject matter “Explainable Reasoning” • History • Halo Pilot: Assess representation & reasoning technologies • Formal reasoning works, but acquisition and language are problems • Halo: Develop high-performance acquisition tool (AURA) • HaloBook (2010-12): Aim to encode much of a textbook • Inquire: An iPad app – the knowledgeable book • Halo 2.0: Reorient towards semi-automated acquisition • focus on taking K-12 science exams
The Knowledge Encoding Process Text: “…Eukaryotic cells similarly have a plasma membrane, but also contain a cell nucleus that houses the eukaryotic cell's DNA…” Concept Map (User View) Logic (Internal View): ∀x isa(x, Eukaryotic-cell) → ∃p,n,d isa(p, Plasma-membrane) ∧ isa(n, Nucleus) ∧ isa(d, DNA) ∧ has-part(x, p) ∧ has-part(x, n) ∧ has-part(x, d) ∧ is-inside(d, n)
Reasoning: Deductive elaboration of the graph using other graphs and commonsense rules • Parts: • Plasma membrane • Nucleus • DNA EukaryoticCell • Parts: • Plasma membrane • Cell wall • Chloroplast PlantCell • Parts: • Plasma membrane • Cell wall • Chloroplast • Nucleus • DNA Plant Cell (more)
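The elaboration on this slide, where Plant Cell ends up with its own parts plus those of Eukaryotic Cell, can be sketched as inheritance over an isa hierarchy. This is a minimal illustration, not AURA's implementation: the `ISA` and `PARTS` tables and the `all_parts` function are hypothetical names, and the link from PlantCell to EukaryoticCell is assumed from the slide.

```python
# Hypothetical toy knowledge base; names are illustrative, not AURA's API.
ISA = {"PlantCell": ["EukaryoticCell"]}  # concept -> direct superclasses (assumed)
PARTS = {
    "EukaryoticCell": ["Plasma membrane", "Nucleus", "DNA"],
    "PlantCell": ["Plasma membrane", "Cell wall", "Chloroplast"],
}

def all_parts(concept):
    """Return the concept's own parts followed by inherited parts, without duplicates."""
    result = []
    queue = [concept]
    seen = set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        for part in PARTS.get(current, []):
            if part not in result:
                result.append(part)
        # Walk up the hierarchy so superclass parts are added after local ones.
        queue.extend(ISA.get(current, []))
    return result

print(all_parts("PlantCell"))
```

The printed list matches the "Plant Cell (more)" box: the three plant-specific parts, then Nucleus and DNA inherited from the eukaryotic cell.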
Question Answering Typical examples of questions the system can answer: During mitosis, when does the cell plate begin to form? What happens during DNA replication? What is the relationship between photosynthesis and cellular respiration? What do ribosomes do? During synapsis, when are chromatids exchanged? What are the differences between eukaryotic cells and prokaryotic cells? How many chromosomes are in a human cell? In which phase of mitosis does the cell divide? What is the structure of a plasma membrane?
Outcomes • The good… • Experiments suggested Inquire is educationally useful • Some question classes answered well • “Suggested question” mechanism helped a lot • The bad… • Only covered ~25% of the book after 2 years • Deductive question-answering somewhat hit-and-miss
The Dilemma of Knowledge Engineering: Manual methods are expensive, automatic methods are shallow • It’s not that manually constructed rule bases are “bad”, but: • Expensive (of course, costs may be brought down) • Brittle (unless the task is very tightly constrained) • Never seem to be finished (permanently incomplete)… • Textual Inference / Semi-Formal Representations: • Create language-based representations from (lots of) text • include words/phrases – deferred ontological commitment • Imprecise, shallower reasoning • an evidential process, using multiple sources of evidence
Outline • Halo: The Goal and Road Travelled… • AURA, Inquire, and reflections • Exploiting Semi-Formal Representations and Textual Inference • A New Challenge: Fourth-Grade Science Tests
Levels of Formality: Text (textual entailment) | Semi-Formal (?) | Logic (logical entailment). Query: ?- has-part(ribosome,?x).
1. Representation
Sentence: "Channel proteins facilitate the passage of molecules across the membrane."
Parse (bracketed; node scores omitted): (S (NP CHANNEL PROTEINS) (VP (V FACILITATE) (NP (NP (NP THE PASSAGE) (PP OF (NP MOLECULES))) (PP ACROSS (NP THE MEMBRANE)))))
Logical Form (graph over “channel protein”, “facilitate”, “passage”, “molecule”, “membrane” with edges subj, obj, of, across):
subject(facilitate-1, channel-protein-1). object(facilitate-1, passage-1). of(passage-1, molecule-1). across(passage-1, membrane-1).
2. Textual Inference • Reasoning with semi-formal structures • Find sequence of transformations from text to question • Requires general lexical and world knowledge Which proteins help move molecules through the membrane? IF X facilitates Y THEN X helps Y “passage”(n) → “move”(v) “through” ↔ “across” Knowledge resources Channel proteins facilitate the passage of molecules across the membrane. A. Channel proteins
2. Textual Inference Which proteins help move molecules through the membrane? 1. (simple) question decomposition What ?x help move molecules through the membrane? Is ?x a protein? Is an evidence-gathering process 2a. textual entailment Channel proteins facilitate the passage of molecules across the membrane. IF X “facilitates” Y THEN X “helps” Y Channel proteins help the passage of molecules across the membrane. “passage”(n) → “move”(v), “through” ↔ “across” Channel proteins help move molecules through the membrane. What ?x help move molecules through the membrane?
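The idea of finding a sequence of transformations from the text to the question can be sketched as a search over string rewrites. This is a deliberately simplified illustration: the real system works over semi-formal structures and large rule resources, whereas the `RULES` list and the `entails` function below are hypothetical and operate on plain lowercase strings.

```python
from collections import deque

# Hypothetical rewrite rules mirroring the slide's three knowledge items.
RULES = [
    ("facilitate", "help"),      # IF X facilitates Y THEN X helps Y
    ("the passage of", "move"),  # "passage"(n) -> "move"(v)
    ("across", "through"),       # "through" <-> "across"
    ("through", "across"),
]

def entails(text, hypothesis, max_steps=5):
    """Breadth-first search for a rewrite sequence turning text into hypothesis.

    Returns the list of rules applied, or None if no sequence is found.
    """
    queue = deque([(text, [])])
    seen = {text}
    while queue:
        current, path = queue.popleft()
        if current == hypothesis:
            return path
        if len(path) >= max_steps:
            continue
        for lhs, rhs in RULES:
            if lhs in current:
                rewritten = current.replace(lhs, rhs, 1)
                if rewritten not in seen:
                    seen.add(rewritten)
                    queue.append((rewritten, path + [(lhs, rhs)]))
    return None

TEXT = "channel proteins facilitate the passage of molecules across the membrane"
GOAL = "channel proteins help move molecules through the membrane"
print(entails(TEXT, GOAL))
```

Once the rewritten text matches the decomposed question "What ?x help move molecules through the membrane?", binding ?x to "channel proteins" gives the answer. A real system would score each rule application as evidence rather than treat it as a hard rewrite.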
2. Textual Inference Channel proteins facilitate the passage of molecules across the membrane. Channel proteins help the passage of molecules across the membrane. What evidence can I find that “X facilitates Y” → “X helps Y”? Knowledge resources: PPDB (Johns Hopkins), DIRT paraphrases, BioKB-101 ontology, WordNet (rule counts shown on the slide: 30k, 146k, 12M, 4M)
Domain-Biased Paraphrases (Johns Hopkins) • Paraphrases learned via bilingual pivoting, and rescored using distributional similarity.
Some examples from PPDB (paraphrase pairs with scores)
travel: fly 0.893, roll over 0.882, relax 0.87, freeze 0.861, breathe 0.861, swim 0.858, move 0.855, die 0.848, swell 0.845, switch 0.842, consumers 0.838, bend 0.835, walk 0.835, paint 0.828, work 0.828, move over 0.825, feed 0.825, evolve 0.825, survive 0.821, …
amplify: elevate 0.993, explore 0.992, enhance 0.984, speed up 0.984, strengthen 0.982, improve 0.982, magnify 0.98, extend 0.978, accept 0.97, follow 0.965, carry out 0.965, broaden 0.962, go into 0.962, promote 0.959, explain 0.955, implement 0.951, leave 0.944, adopt 0.944, acquire 0.942, expand 0.942, …
??? ???
Performance • Currently, 3 databases of semi-formal representations • Current F1 ≈ 30% (e.g., 50% on 10% of questions) • Answer = weighted sum of evidence • Learn the weights (via simulated annealing)
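The "weighted sum of evidence" with weights learned by simulated annealing can be sketched as below. This is an illustrative toy, not the system's code: the function names, the Gaussian proposal step, the cooling schedule, and the use of plain accuracy (rather than F1) as the objective are all assumptions.

```python
import math
import random

def score(weights, evidence):
    """Weighted sum of one candidate answer's evidence values."""
    return sum(w * e for w, e in zip(weights, evidence))

def accuracy(weights, examples):
    """examples: list of (candidates, gold_index); each candidate is an evidence vector."""
    correct = 0
    for candidates, gold in examples:
        scores = [score(weights, ev) for ev in candidates]
        if scores.index(max(scores)) == gold:
            correct += 1
    return correct / len(examples)

def anneal(examples, n_weights, steps=2000, temp=1.0, cooling=0.995, seed=0):
    """Search for weights by simulated annealing (assumed schedule and step size)."""
    rng = random.Random(seed)
    current = [rng.random() for _ in range(n_weights)]
    current_acc = accuracy(current, examples)
    best, best_acc = list(current), current_acc
    for _ in range(steps):
        proposal = [w + rng.gauss(0, 0.1) for w in current]
        acc = accuracy(proposal, examples)
        # Accept improvements and ties; accept worse moves with a probability
        # that shrinks as the temperature cools.
        if acc >= current_acc or rng.random() < math.exp((acc - current_acc) / temp):
            current, current_acc = proposal, acc
            if acc > best_acc:
                best, best_acc = list(proposal), acc
        temp *= cooling
    return best, best_acc

# Toy data (made up): evidence source 0 is reliable, source 1 is misleading.
EXAMPLES = [
    ([[0.9, 0.1], [0.2, 0.8]], 0),
    ([[0.3, 0.9], [0.7, 0.2]], 1),
]
weights, acc = anneal(EXAMPLES, n_weights=2)
```

Annealing suits this setting because the objective is piecewise constant in the weights, so gradient-based methods have nothing to follow.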
Levels of Formality: Text | Semi-Formal (? What should go in here?) | Logic. Query: ?- has-part(ribosome,?x).
Outline • Halo: The Goal and Road Travelled… • AURA, Inquire, and reflections • Exploiting Semi-Formal Representations and Textual Inference • A New Challenge: Fourth-Grade Science Tests
K-12 Grade Science Tests • Provide a (task-oriented) focus • Simpler (question) language • Involve more common sense • Wide variety of question types and difficulties • Caveats • Multiple-choice questions are common • Diagrams are common
The 4th Grade NY Regents’ Science Exam • What types of questions are there? • What would it take to answer them?
The 4th Grade NY Regents’ Science Exam • What types of questions are there? • What would it take to answer them? “Retrieval”
1. Taxonomic • Question interpretation: • Decompose question into “isa” queries • Several good sources of simple “isa” knowledge • WordNet, Cyc, Wikipedia • Within text itself • “isa” knowledge is fundamental to other reasoning types
2. Definitions Dictionary Resources erosion: The process of being eroded by wind, water, or other natural agents. erosion: The wearing away of rocks and other deposits on the earth's surface … erosion: The gradual wearing away of land surface materials, especially rocks, … Entailment-Style Reasoning the movement of soil by wind or water The gradual wearing away of land surface materials, especially rocks, sediments, and soils, by the action of water, wind, or a glacier.
3. Basic Facts “Semantic Databases” • Some basic facts can be pre-extracted and cleaned • parts, functions, steps in a process, etc. • + existing resources have some of this knowledge
Building Semantic Databases…
Known parts (LOD, WordNet, AURA) → good “parts” relations (training data), e.g., has-part(Leaf,Stomata)
Text → sentences expressing those relations, e.g., “Stomata in a leaf's surface lead to a maze of internal air spaces”
MultiR (Univ Washington) → Classifier
Candidate pair, e.g., “plant cell” has-part “chloroplast”? → Classifier → Decision (yes/no + confidence) → Final parts database
Iterate, + human/machine validation
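The step that turns known "parts" relations into training sentences can be sketched as a simple matching pass over text. This is a rough illustration under stated assumptions: `KNOWN_PARTS`, `SENTENCES`, and `label_sentences` are hypothetical names, matching is naive lowercase substring search, and the classifier training step (MultiR in the slide) is not shown.

```python
# Hypothetical seed facts and corpus; a real system would use far larger resources.
KNOWN_PARTS = [("leaf", "stomata"), ("plant cell", "chloroplast")]
SENTENCES = [
    "Stomata in a leaf's surface lead to a maze of internal air spaces",
    "Animals also need water to survive",
]

def label_sentences(sentences, known_parts):
    """Pair each sentence with every known (whole, part) relation it mentions."""
    labelled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for whole, part in known_parts:
            # Naive test: both terms occur somewhere in the sentence.
            if whole in lowered and part in lowered:
                labelled.append((sentence, whole, part))
    return labelled

TRAINING = label_sentences(SENTENCES, KNOWN_PARTS)
print(TRAINING)
```

Sentences labelled this way are noisy, since co-occurrence does not guarantee the sentence actually expresses has-part, which is one reason the slide adds iteration and human/machine validation.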
The 4th Grade NY Regents’ Science Exam • What types of questions are there? • What would it take to answer them? “Inference”
4. “Rules” (simple inference) • Many questions require simple, one-step entailments • X eats → X gets nutrients • X breathes oxygen –enables→ X make energy • X made of metal → X conducts electricity • Large number of such facts and rules needed • Manually enter them? • Induce them? • Just read them? Via: Judicious forms of text Good NLP Manual validation
4. Knowledge (Rule) Extraction from Text Animals take in air by breathing. They need oxygen, which is in the air. Oxygen allows the animal to make and use energy, which it needs to survive. Animals also need water to survive. Water is used to break down and move materials throughout the body. Animals cannot make their own food so they must eat to get nutrients. Nutrients are necessary for growth and energy. • Assertions • air contains oxygen • animals need oxygen • animals need energy • animals need water • Implications • animal breathes → animal takes in air • animal breathes oxygen -enables→ animal make energy • animal eat -enables→ animal get nutrients • animal get nutrients -enables→ animal grow • animal has water -enables→ animal breakdown materials
4. Knowledge (Rule) Extraction from Text • Rule acquisition: • specific patterns in text, e.g., “X Ys by Z” → IF X Zs THEN X Ys • “Animals take in air by breathing.” → IF an animal breathes THEN an animal takes in air • Rule application: using textual entailment-style inference • If rule condition entailed, then infer conclusion • Current status: Pretty noisy rules!
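The "X Ys by Z" pattern can be sketched with a regular expression. This is a minimal illustration, assuming a single-word subject and a gerund ending in "ing"; the `PATTERN` and `extract_rule` names are hypothetical, and the sketch keeps the gerund as written instead of attempting the morphology ("breathing" → "breathes") that a real extractor would need.

```python
import re

# Assumed surface pattern: "<X> <Y phrase> by <Z>ing."
PATTERN = re.compile(r"^(?P<x>\w+) (?P<y>.+?) by (?P<z>\w+ing)\.?$")

def extract_rule(sentence):
    """Return (condition, conclusion) for 'X Ys by Z-ing', else None."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    x, y, z = match.group("x"), match.group("y"), match.group("z")
    condition = (x, z)   # IF X Zs
    conclusion = (x, y)  # THEN X Ys
    return condition, conclusion

print(extract_rule("Animals take in air by breathing."))
```

Surface patterns like this over-generate (any sentence ending in "by ...ing" matches), which fits the slide's remark that the current rules are noisy.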
The 4th Grade NY Regents’ Science Exam • What types of questions are there? • What would it take to answer them? “Models”
5. Domain Models • Sometimes you do need some “computational clockwork” • Qualitative models • qualitative influences (X goes up → Y goes down) • what happens to Z if X goes up? • Process models • partially ordered network of events • how does X contribute to Y? • Acquisition Task ≠ “read the text” = extract/build model instances from the text
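A qualitative model of the "X goes up → Y goes down" kind can be sketched as a signed influence graph in which signs multiply along a path. This is an illustrative toy: the `INFLUENCES` table and `effect` function are hypothetical, the Y-to-Z edge is invented to make the "what happens to Z" question answerable, and conflicting paths are not handled.

```python
# Hypothetical signed influences: +1 = changes in the same direction, -1 = opposite.
INFLUENCES = {
    ("X", "Y"): -1,  # X goes up -> Y goes down (from the slide)
    ("Y", "Z"): +1,  # assumed for illustration
}

def effect(source, target, change=+1, visited=None):
    """Propagate a qualitative change from source to target.

    Returns +1 (goes up), -1 (goes down), or None if target is unreachable.
    Only the first path found is used; ambiguity between paths is ignored.
    """
    if source == target:
        return change
    visited = (visited or set()) | {source}
    for (cause, affected), sign in INFLUENCES.items():
        if cause == source and affected not in visited:
            result = effect(affected, target, change * sign, visited)
            if result is not None:
                return result
    return None

print(effect("X", "Z"))
```

With these edges, raising X lowers Y and therefore lowers Z, so the call returns -1.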
5. Example: Process Models Process reasoner: Given a process, can answer questions, e.g. What is the role of Entity in Process? What Entity performs Role in Event? During X, what happens after Y? KA Task = extract a process instance from text: 1. Identify where a process is being described 2. Extract it, e.g., with a set of trained classifiers Text: “When the cell is stimulated, gated channels open that facilitate Na+ diffusion. Sodium ions then diffuse down their electrochemical gradient….” Extracted: “stimulate” [theme: “cell”]; “open” [theme: “gated channels”]; “diffuse” [theme: “sodium ions”, “Na+”; direction: “down ec gradient”]
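A process instance like the one extracted here can be sketched as a set of event frames plus ordering edges, which is enough to answer the slide's question templates. This is a minimal sketch: the `EVENTS` and `NEXT` structures and both functions are hypothetical, and the ordering stimulate → open → diffuse is an assumption read off the example text.

```python
# Event frames taken from the slide's example; the data layout is hypothetical.
EVENTS = {
    "stimulate": {"theme": "cell"},
    "open": {"theme": "gated channels"},
    "diffuse": {"theme": "sodium ions", "direction": "down ec gradient"},
}
# Assumed ordering edges (event -> events that directly follow it).
NEXT = {"stimulate": ["open"], "open": ["diffuse"]}

def happens_after(event):
    """Answer 'what happens after Y?': all events reachable via ordering edges."""
    result = []
    frontier = list(NEXT.get(event, []))
    while frontier:
        current = frontier.pop(0)
        if current not in result:
            result.append(current)
            frontier.extend(NEXT.get(current, []))
    return result

def events_with_role(role, entity):
    """Answer 'in which events does Entity fill Role?'."""
    return [name for name, roles in EVENTS.items() if roles.get(role) == entity]

print(happens_after("stimulate"))
print(events_with_role("theme", "gated channels"))
```

Because `NEXT` is a graph and not a list, the same code handles a partially ordered network of events, where some events have no defined order relative to each other.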
Extracting Process Models Text: “H+ ions flowing down their gradient enter a half channel in a stator, which is anchored in the membrane. H+ ions enter binding sites within a rotor, changing the shape of each subunit so that the rotor spins within the membrane... Spinning of the rotor causes an internal rod to spin as well. This rod extends like a stalk into the knob below it, which is held stationary by part of the stator. Turning of the rod activates catalytic sites in the knob that can produce ATP from ADP and Pi.” [Figure: extracted process graph; node and edge labels include: flow down (H+ ions, gradient); enter (H+ ions, rotor binding site); shape change; causes; spin (rotor, rod); activate (catalytic site); produce (ATP; ADP, Pi)]
Another Example: Energy Conversion • Modeling technique: Energy conversion • extract event sequence (process model) • layer energy types on top • → initial form of energy? final? form that produced X? etc. Example: baby shakes rattle (movement: mechanical energy) → rattle makes noise (sound: sound energy)
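Layering energy types on top of an event sequence can be sketched with an ordered list of (event, energy form) pairs. This is an illustrative toy built from the rattle example; the `PROCESS` layout and the three function names are hypothetical.

```python
# Ordered (event, energy form) pairs for the rattle example; layout is hypothetical.
PROCESS = [
    ("baby shakes rattle", "mechanical energy"),
    ("rattle makes noise", "sound energy"),
]

def initial_energy(process):
    """Energy form attached to the first event."""
    return process[0][1]

def final_energy(process):
    """Energy form attached to the last event."""
    return process[-1][1]

def energy_that_produced(process, form):
    """Energy form of the step immediately before the one carrying `form`."""
    for (_, before), (_, after) in zip(process, process[1:]):
        if after == form:
            return before
    return None

print(initial_energy(PROCESS), "->", final_energy(PROCESS))
```

The three functions correspond to the slide's question types: initial form of energy, final form, and the form that produced X.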
The 4th Grade NY Regents’ Science Exam • What types of questions are there? • What would it take to answer them? “Diagrams”