OntoSoar: Feeding a Growing Ontology
CS 652 Information Extraction and Integration
Fall 2012
Peter Lindes
Project Goals
• Use linguistic technologies to:
  • Find more facts
  • Learn new categories and relations
• Technologies to be used:
  • OntoES
  • Link Grammar Parser
  • LG-Soar
  • Soar
  • Discourse Representation Theory
What is OntoES?
• A system for building OSM models
• Capable of representing extraction ontologies
• Processes text to extract facts
What is Soar?
• A cognitive architecture
• A system that implements that architecture
• Major elements:
  • Short- and long-term memories
  • Decision procedure
  • Perception and action modules
  • Various kinds of learning
• Example applications:
  • TacAir-Soar
  • BOLT Project
OntoSoar
Raw OCR’d Text Example
Segmented Text
243314. Charles Christopher Lathrop, N. Y. City, b. 1817, d. 1865, son of Mary Ely and Gerard Lathrop ; m. 1856, Mary Augusta Andruss, 992 Broad St., Newark, N. J., who was b. 1825, dau. of Judge Caleb Halstead Andruss and Emma Sutherland Goble.
Mrs. Lathrop died at her home, 992 Broad St., Newark, N. J., Friday morning, Nov. 4, 1898. The funeral services were held at her residence on Monday, Nov. 7, 1898, at half- past two o'clock P. M.
Their children:
1. Charles Halstead, b. 1857, d. 1861.
2. William Gerard, b. 1858, d. 1861.
3. Theodore Andruss, b. i860.
4. Emma Goble, b. 1862.
Parsing and Semantics
1. Charles Halstead, b. 1857, d. 1861.
Charles Halstead was born in 1857 and died in 1861.

    +-------------------------------------Xp-------------------------------------+
    |                   +---------------Ss---------------+                       |
    +---------Wd--------+       +----------VJlsi---------+------MVp------+       |
    |         +----G----+       +---Pa--+-MVp-+-IN-+     +--VJrsi-+      +-IN-+  |
    |         |         |       |       |     |    |     |        |      |    |  |
LEFT-WALL Charles.b Halstead was.v-d born.a in.r 1857 and.j-v died.v-d in.r 1861 .

Extracted facts:
Person(P1)
Person_Name(P1, "Charles Halstead")
Person_BirthDate(P1, "1857")
Person_DeathDate(P1, "1861")

Predicates:
person(P1)
named(P1, "Charles Halstead")
born(P1, "1857")
died(P1, "1861")
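The step from the abbreviated fragment to the full sentence on this slide can be sketched in Python. This is only an illustration under stated assumptions: the `ABBREVIATIONS` table and the `expand_fragment` function are hypothetical names, and the slides do not say how OntoSoar actually performs this expansion.

```python
import re

# Hypothetical abbreviation table, not taken from OntoSoar itself.
ABBREVIATIONS = {"b.": "was born in", "d.": "died in"}


def expand_fragment(fragment: str) -> str:
    """Expand a fragment such as '1. Charles Halstead, b. 1857, d. 1861.'"""
    # Drop a leading list number such as "1. ".
    text = re.sub(r"^\s*\d+\.\s*", "", fragment.strip())
    # Drop the final period; it is added back at the end.
    text = text.rstrip(".").strip()
    # The first comma-separated part is the name, the rest are clauses.
    name, *clauses = [part.strip() for part in text.split(",")]
    expanded = []
    for clause in clauses:
        abbrev, _, value = clause.partition(" ")
        if abbrev in ABBREVIATIONS:
            expanded.append(f"{ABBREVIATIONS[abbrev]} {value}")
    if not expanded:
        return name + "."
    return f"{name} {' and '.join(expanded)}."


print(expand_fragment("1. Charles Halstead, b. 1857, d. 1861."))
```

A full sentence gives the Link Grammar Parser a verb to attach the dates to, which the bare fragment lacks. Clauses whose abbreviation is not in the table are silently skipped in this sketch.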
More Complex Parsing
Charles Christopher Lathrop, N. Y. City, was born in 1817 and died in 1865 and was the son of Mary Ely and Gerard Lathrop ;

                           +-----------------------Ss-------------------
                           +------MXs-----+
                           |    +----Xd---+       +----------VJlsi------
    +-----G-----+-----G----+    |  +-G+-G-+Xc+    +---Pv--+-MVp-+-IN-+
    |           |          |    |  |  |   |  |    |       |     |    |
Charles.b Christopher.b Lathrop , N. Y. City , was.v-d born.v in.r 1817

---+
   +-----------VJrsi----------+
---+        +------VJlsi------+       +----Ost---+    +--------Ju-------+----
   |        +--MVp-+-IN-+     +-VJrsi-+     +-Ds-+-Mp-+    +--G--+-SJls-+
   |        |      |    |     |       |     |    |    |    |     |      |
and.j-v died.v-d in.r 1865 and.j-v was.v-d the son.n of Mary.b Ely.m and.j-n

--SJrs------+
    +---G---+
    |       |
Gerard.m Lathrop [;]
More Complex Semantics

Extracted facts:
Person(P2)
Person(P3)
Person(P4)
Person_Name(P2, "Charles Christopher Lathrop")
Person_Name(P3, "Mary Ely")
Person_Name(P4, "Gerard Lathrop")
Person_BirthDate(P2, "1817")
Person_DeathDate(P2, "1865")
Parent_has_Child(P3, P2)
Parent_has_Child(P4, P2)
Male(P2)
Parent_with_Parent(P3, P4)
GeoEntity(GE1)
GeoEntity_Name(GE1, "N. Y. City")
Person_livedIn_GeoEntity(P2, GE1)

Predicates:
person(P2)
named(P2, "Charles Christopher Lathrop")
place(GE1)
named(GE1, "N. Y. City")
livedIn(P2, GE1)
born(P2, "1817")
died(P2, "1865")
person(P3)
named(P3, "Mary Ely")
son(P2, P3)
person(P4)
named(P4, "Gerard Lathrop")
son(P2, P4)
couple(P3, P4)
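A minimal Python sketch of how the lowercase predicates on this slide could be mapped to the capitalized facts. The tuple representation, the `TYPE_PREDICATES` table, and the `predicates_to_facts` function are assumptions made for illustration. The mapping rules are read off the slide and may not reflect how OntoSoar implements this step.

```python
# Assumed mapping from type predicates to object sets in the ontology.
TYPE_PREDICATES = {"person": "Person", "place": "GeoEntity"}


def predicates_to_facts(predicates):
    """Convert predicate tuples such as ("son", "P2", "P3") into fact tuples."""
    # First pass: record each entity's type so that "named" can become
    # Person_Name or GeoEntity_Name as appropriate.
    types = {
        p[1]: TYPE_PREDICATES[p[0]] for p in predicates if p[0] in TYPE_PREDICATES
    }
    facts = []

    def add(fact):
        # Keep insertion order but avoid duplicates such as a second Male(P2).
        if fact not in facts:
            facts.append(fact)

    for name, *args in predicates:
        if name in TYPE_PREDICATES:
            add((TYPE_PREDICATES[name], args[0]))
        elif name == "named":
            add((f"{types[args[0]]}_Name", *args))
        elif name == "born":
            add(("Person_BirthDate", *args))
        elif name == "died":
            add(("Person_DeathDate", *args))
        elif name == "livedIn":
            add(("Person_livedIn_GeoEntity", *args))
        elif name == "son":
            # son(child, parent) yields a parent-child fact plus the gender.
            child, parent = args
            add(("Parent_has_Child", parent, child))
            add(("Male", child))
        elif name == "couple":
            add(("Parent_with_Parent", *args))
    return facts


facts = predicates_to_facts([
    ("person", "P2"), ("person", "P3"), ("son", "P2", "P3"),
])
print(facts)
```

Note that a single predicate can yield more than one fact: `son(P2, P3)` contributes both `Parent_has_Child(P3, P2)` and `Male(P2)`. The sketch raises a `KeyError` for a `named` predicate whose entity has no type predicate.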
More Learning
He graduated B. A. from Rensselaer Polytechnic College, Troy, N. Y.

                                                      +--------MXs--------+
         +-----Os----+   +-------------Js-------------+---MXs---+   +--Xd-+
 +---Ss--+        +-G+-Mp+       +-----G----+----G----+    +-Xd-+Xca+  +-G+
 |       |        |  |   |       |          |         |    |    |   |  |  |
he graduated.v-d B. A. from Rensselaer Polytechnic College , Troy.b , N. Y.

Extracted facts:
Person(P5)
Person_Name(P5, "Gardner Bullard")
Male(P5)
GeoEntity(GE2)
GeoEntity_Name(GE2, "Troy, N. Y.")
Institution(I1)
Institution_Name(I1, "Rensselaer Polytechnic College")
Person_graduatedFrom_Institution(P5, I1)
Institution_locatedIn_GeoEntity(I1, GE2)

Predicates:
pro3SingMasc(X1)
institution(I1)
named(I1, "Rensselaer Polytechnic College")
graduatedFrom(X1, I1)
person(X1)
place(GE2)
named(GE2, "Troy, N. Y.")
locatedIn(I1, GE2)
person(P5)
named(P5, "Gardner Bullard")
sameAs(X1, P5)
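Once `sameAs(X1, P5)` is known, the pronoun referent `X1` can be replaced by the named person `P5` everywhere it occurs. The Python sketch below shows only this substitution step. It does not implement Discourse Representation Theory or the choice of antecedent, and the tuple representation and the `resolve_same_as` function are hypothetical.

```python
def resolve_same_as(predicates):
    """Replace pronoun referents using sameAs(referent, entity) predicates."""
    # Map each referent (e.g. "X1") to the entity it was identified with.
    alias = {p[1]: p[2] for p in predicates if p[0] == "sameAs"}
    resolved = []
    for name, *args in predicates:
        if name == "sameAs":
            continue  # The link itself is consumed by the substitution.
        new = (name, *[alias.get(arg, arg) for arg in args])
        # Substitution can create duplicates, e.g. person(X1) and person(P5).
        if new not in resolved:
            resolved.append(new)
    return resolved


resolved = resolve_same_as([
    ("graduatedFrom", "X1", "I1"),
    ("person", "X1"),
    ("person", "P5"),
    ("sameAs", "X1", "P5"),
])
print(resolved)
```

After substitution, `graduatedFrom(X1, I1)` reads `graduatedFrom(P5, I1)`, which is the form needed to produce `Person_graduatedFrom_Institution(P5, I1)`.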
Processing Steps
• Gather section of text
• Segment into sentence fragments
• Parse with the LG-Parser
• Build predicates with LG-Soar
• Resolve pronouns using DRT
• Convert predicates to facts
• Match extracted facts against conceptual model
• Record facts that match
• Learn from partial matches
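The last three steps can be illustrated with a small Python sketch. The slides do not define what counts as a partial match, so the rule used here is an assumption: a relationship fact is "partial" when its relationship set is unknown but at least one object set named in it (splitting on `_`) is already in the model. The `classify_fact` function and the set-based model are hypothetical.

```python
def classify_fact(fact, object_sets, relationship_sets):
    """Return "match", "partial", or "unmatched" for one extracted fact."""
    name, *args = fact
    if len(args) == 1:
        # Unary facts such as Person(P5) name an object set directly.
        return "match" if name in object_sets else "unmatched"
    if name in relationship_sets:
        return "match"
    # Assumed naming convention: Subject_verb_Object, e.g.
    # Person_graduatedFrom_Institution.
    known = [part for part in name.split("_") if part in object_sets]
    return "partial" if known else "unmatched"


# A toy conceptual model that knows people and places but not institutions.
object_sets = {"Person", "GeoEntity"}
relationship_sets = {"Person_Name", "Person_BirthDate", "Person_DeathDate"}

for fact in [
    ("Person", "P5"),
    ("Person_graduatedFrom_Institution", "P5", "I1"),
    ("Institution", "I1"),
]:
    print(fact[0], classify_fact(fact, object_sets, relationship_sets))
```

Under this assumption, facts classified as "match" would be recorded, while "partial" facts are candidates for learning a new category or relation such as `Institution`.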