A New Data Structure for Processing Natural Language Database Queries
Richard Frost (Professor Emeritus) and Shane Peelar (Doctoral student)
Funded by NSERC of Canada
School of Computer Science, University of Windsor, Ontario, Canada
WEBIST Conference, September 2019, Technical University of Vienna
The Web & the Semantic Web: advantages
Billions of facts are held in databases and triplestores, in various formats.
Some formats facilitate the addition and processing of facts, e.g. relational databases and Semantic Web triplestores.
However, SQL and SPARQL require training to use.
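Example database relations
[Tables: an "orbits" relation with columns moon and planet; a second relation with columns moon, discoverer and date. The rows were not preserved.]
QUESTIONS
Who discovered two moons that orbit a planet?
Who discovered every moon that orbits a planet?
Who discovered every moon that orbits mars?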
An alternative data structure: FDBRs, functions defined in terms of binary relations
[Tables: one FDBR with columns discoverer and moon; one FDBR with columns planet and "orbited by moon". The rows were not preserved.]
Note that for a relation with n columns, there are n² - n functions. For the columns moon, discoverer and date (n = 3) that gives six:
discoverer -> moon, moon -> discoverer
discoverer -> date, date -> discoverer
moon -> date, date -> moon
and for the orbits relation (n = 2) two more:
moon -> planet, planet -> moon
QUESTIONS
Who discovered nix?
Who discovered two moons that orbit a planet?
Who discovered every moon that orbits a planet?
Who discovered every moon that orbits mars?
NATURAL LANGUAGE QUERY INTERFACES
Two approaches:
• Translate the query to SQL or SPARQL
• Interpret the query directly using a compositional semantics (the approach that we adopt)
Montague Semantics (MS): direct evaluation of NL queries w.r.t. a datastore
Abstract (Richard Montague, 1970): "I reject the contention that an important theoretical difference exists between formal and natural languages."
Example query: "Who (stole (a car)[(in(1918 or 1920), in(a (borough (of New_York)))])?"
[Figure: each word of the query is mapped to a function λ…, giving an expression of the shape λ…(λ…(λ… λ…)) [(λ…(λ… λ…), (λ…), λ…(λ…(λ…(λ…)))]); arrows connect some of these functions to the DATASTORE.]
The λ… are functions, which are the denotations of English words. Some of the functions are defined in terms of datastore retrieval operations.
LAMBDA CALCULUS
• Universal model of computation. Take an expression such as x + y.
• λx . λy . (x + y) is the name and definition of a function.
• add = λx . λy . (x + y) -- this is an example of lambda abstraction
• add 3 4 = (λx . λy . (x + y)) 3 4
• => (λy . (3 + y)) 4 -- this is an example of lambda reduction
• => 3 + 4 -- then another reduction
• add_three = add 3 => λy . (3 + y) -- partial application of add returns a function
• add_three 6 => 9, just as add 3 6 => 9
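The abstraction, reduction and partial-application steps on this slide can be tried directly. The following is a minimal sketch in Python (chosen here for illustration; the authors' implementation is in Haskell), using the slide's own names add and add_three.

```python
# Curried addition: each lambda takes one argument and returns the next function,
# mirroring add = λx . λy . (x + y).
add = lambda x: lambda y: x + y

# Partial application: supplying only the first argument returns a function,
# mirroring add_three = add 3 => λy . (3 + y).
add_three = add(3)

full_application = add(3)(4)       # two successive reductions
partial_then_apply = add_three(6)  # same result as add(3)(6)
```

Applying add to one argument at a time is what lets the denotation of a phrase be built word by word in the later slides.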
MS Ontology
Entities: e. Truth values: b.
Functions, e.g.:
moon :: e -> b (moon is the characteristic function of a set)
spins :: e -> b
io :: (e -> b) -> b
every :: (e -> b) -> (e -> b) -> b
||moon|| = λe . True, if e = "io" or "miranda" or "deimos" or "phobos", etc.
||spins|| = λe . True, if e = "earth" or "mars" or "jupiter" or "venus" or "io" or "miranda" or "deimos" or "phobos", etc.
||every|| = λs . λt . ∀x (s x -> t x)
||every moon spins|| => ||every|| ||moon|| ||spins||
=> (λs . λt . ∀x (s x -> t x)) ||moon|| ||spins||
=> ∀x (moon x -> spins x)
=> True or False
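To make the types concrete, here is a small sketch of the characteristic-function reading, assuming a toy finite universe. UNIVERSE and the entity "hall" (included as something that does not spin) are illustrative assumptions; the moon and spins facts follow the slide.

```python
# Hypothetical finite universe of entities (type e); "hall" is added only so
# that something in the universe fails the spins test.
UNIVERSE = {"earth", "mars", "jupiter", "venus", "io", "miranda", "deimos", "phobos", "hall"}

def moon(e):
    """moon :: e -> b, the characteristic function of the set of moons."""
    return e in {"io", "miranda", "deimos", "phobos"}

def spins(e):
    """spins :: e -> b."""
    return e in {"earth", "mars", "jupiter", "venus", "io", "miranda", "deimos", "phobos"}

def io(p):
    """io :: (e -> b) -> b, a proper noun applies a predicate to its entity."""
    return p("io")

def every(s):
    """every :: (e -> b) -> (e -> b) -> b, i.e. ∀x (s x -> t x) over UNIVERSE."""
    return lambda t: all(t(x) for x in UNIVERSE if s(x))

every_moon_spins = every(moon)(spins)
io_spins = io(spins)
```

Note that every has to test each entity in the universe, which is the cost the later set-based slides aim to reduce.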
Example queries
- mars spins
- which moons orbit mars
- who discovered phobos and deimos
- who discovered every moon that orbits a planet
- who discovered every moon that orbits mars
- hall discovered a moon with a telescope in 1877
- where did hall discover two moons
- how did hall discover a moon in 1877
- how many moons orbit jupiter or mercury
- hall discovered phobos with a telescope that was used to discover deimos
Montague Semantics
||spin|| = spin_pred, a characteristic function
||phobos|| = λp . (p e_phobos)
"phobos spins" => ||phobos|| ||spins||
=> (λp . (p e_phobos)) spin_pred
=> spin_pred e_phobos
=> True if phobos spins
||every|| = λs . λt . (∀e (s e -> t e))
||every moon spins|| = ||every moon|| ||spins|| (note: tidal sync.)
=> (λt . (∀e (moon_pred e -> t e))) ||spins|| -- the first term is the meaning of "every moon"
=> (∀e (moon_pred e -> spin_pred e)) -- costly
Note: type of ||phobos|| = type of ||every moon||
Advantages of compositional semantics & MS:
• In MS, all words/phrases of the same syntactic category have denotations of the same semantic type. Highly modular.
• Extensibility: new constructs of a syntactic category c must have denotations of the same type as existing constructs of category c.
• A small set of rules defines an infinite language.
• Facilitates construction of NLIs (especially as an EAG) and proofs.
Montague's treatment of transitive verbs: syntactic manipulation
Transitive verbs (tvs) are left uninterpreted until the rest of the phrase has been interpreted and reduced as far as possible. The whole expression is then rewritten to another lambda expression using a syntactic sigma (σ) rule. See page 216 of Dowty, Wall and Peters 1981.
- Complicated and difficult to implement
- No explicit denotation for tvs
- Therefore no denotation, for example, for "discovered phobos"
OUR APPROACH
• Modify MS to use sets rather than characteristic functions of sets, e.g. moon = {e_phobos, e_deimos, e_io, e_charon, etc.} (see next few pages)
• Add events to MS in data storage (see next 3 pages)
• Create Functions Defined in terms of Binary Relations (FDBRs) to give an explicit denotation for transitive verbs
Our approach I: modify MS for efficiency (set based)
||spin|| = spin_set -- the set of entities that spin
||phobos|| = λs . (e_phobos ∈ s)
"phobos spins" => ||phobos|| ||spins||
=> (λs . (e_phobos ∈ s)) spin_set
=> e_phobos ∈ spin_set
=> True if phobos spins
||every|| = λs . λt . (s ⊆ t)
||every moon spins|| = ||every moon|| ||spins||
=> (λt . (moon_set ⊆ t)) spin_set -- the first term is the meaning of "every moon"
=> (moon_set ⊆ spin_set) -- less costly to calculate
Note: type of ||phobos|| = type of ||every moon||
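The set-based denotations above can be sketched in a few lines. This is an illustrative Python rendering rather than the authors' Haskell code; proper_noun is an assumed helper name, and the set contents are toy data taken from the earlier slides.

```python
# Toy denotations of a common noun and an intransitive verb as plain sets.
moon_set = {"phobos", "deimos", "io", "miranda"}
spin_set = moon_set | {"earth", "mars", "jupiter", "venus"}

def proper_noun(entity):
    """||phobos|| = λs . (e_phobos ∈ s): test membership of one entity in a set."""
    return lambda s: entity in s

def every(s):
    """||every|| = λs . λt . (s ⊆ t): a single subset test between two sets."""
    return lambda t: s <= t

phobos = proper_noun("phobos")
phobos_spins = phobos(spin_set)
every_moon_spins = every(moon_set)(spin_set)
```

Both phobos and every(moon_set) take a set and return a truth value, which is the shared type the slide's final note refers to.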
Blackburn and Bos approach to tvs
||discover|| = λz . z (λx . λy . discover_pred(y, x))
Works for 2-place transitive verbs but not n-ary ones (see Frost 2006, ACM Computing Surveys).
Our approach: add EVENTS to MS. Each event can have several properties.
Event        Property     Entity
event1045    subject      hall
event1045    object       phobos
event1045    type         discover_ev
event1045    year         1877
event1045    location     us_naval_observatory
event1045    implement    refractor_telescope_1
(The year, location and implement rows are additional facts.)
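As a rough illustration of the event table, the triples can be held as plain tuples and queried by event or by property. The helper names properties_of and events_with are assumptions made for this sketch; they are not part of the authors' system, which stores the triples in a triplestore.

```python
# Event-based triples from the slide, as (event, property, entity) tuples.
TRIPLES = [
    ("event1045", "subject", "hall"),
    ("event1045", "object", "phobos"),
    ("event1045", "type", "discover_ev"),
    ("event1045", "year", "1877"),
    ("event1045", "location", "us_naval_observatory"),
    ("event1045", "implement", "refractor_telescope_1"),
]

def properties_of(event, triples=TRIPLES):
    """Collect every property of one event into a dictionary."""
    return {prop: entity for ev, prop, entity in triples if ev == event}

def events_with(prop, entity, triples=TRIPLES):
    """Find the events that relate the given property to the given entity."""
    return {ev for ev, p, ent in triples if p == prop and ent == entity}

discovery = properties_of("event1045")
events_in_1877 = events_with("year", "1877")
```

Because every fact hangs off an event identifier, further properties can be added to an event without changing the shape of the data.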
Our approach II: Use of FDBRs
What is the mathematical name? Set of images? Album? (Taylor) Set of umbra?
• discover_rel subj->obj = {(e_hall, e_deimos), (e_hall, e_phobos), (e_kuiper, e_miranda), (e_kuiper, e_nereid), etc.}
This is a relation.
• fdbr discover_rel subj->obj = {(e_hall, {e_deimos, e_phobos}), (e_kuiper, {e_miranda, e_nereid}), etc.}
This is a function.
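The step from the relation to the function can be sketched as grouping each left-hand value with the set of its images. The function name fdbr follows the slide; representing the result as a Python dictionary is an assumption of this sketch.

```python
from collections import defaultdict

# The binary relation from the slide, as a set of (subject, object) pairs.
discover_rel = {
    ("hall", "deimos"),
    ("hall", "phobos"),
    ("kuiper", "miranda"),
    ("kuiper", "nereid"),
}

def fdbr(relation):
    """Map each x to the set of all y such that (x, y) is in the relation."""
    images = defaultdict(set)
    for x, y in relation:
        images[x].add(y)
    return dict(images)

discover_subj_obj = fdbr(discover_rel)
```

Each key appears exactly once, so the result is a function even though the underlying relation is many-to-many.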
Our approach III: Explicit denotation for transitive verbs, e.g. "discover phobos"
• fdbr discover_rel subj->obj = {(e_hall, {e_deimos, e_phobos}), (e_kuiper, {e_miranda, e_nereid}), etc.}
• Then ||tv|| ||tmp||, where tmp is a termphrase, applies ||tmp|| to the y of every pair (x, y) ∈ FDBR subj->obj and, if the result is True, adds x to the answer.
• Example: "discover phobos" => {x | (x, y) ∈ FDBR subj->obj & ||phobos|| y}
• The final answer is the set of all x returned (with events).
• Another example: "discovered every moon that orbits mars" => {e_hall}
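A minimal sketch of this denotation, assuming FDBRs are dictionaries from an entity to its set of images. The helper names tv and proper_noun and the toy orbit data are assumptions for illustration; event identifiers are omitted here.

```python
# Toy FDBRs: discoverer -> moons discovered, and moon -> planet orbited.
DISCOVER_SUBJ_OBJ = {"hall": {"deimos", "phobos"}, "kuiper": {"miranda", "nereid"}}
ORBIT_SUBJ_OBJ = {"phobos": {"mars"}, "deimos": {"mars"},
                  "miranda": {"uranus"}, "nereid": {"neptune"}}
moon_set = {"phobos", "deimos", "miranda", "nereid"}

def proper_noun(entity):
    """Termphrase for a proper noun: is the entity in the given set?"""
    return lambda s: entity in s

def every(s):
    """Termphrase for 'every s': is s a subset of the given set?"""
    return lambda t: s <= t

def tv(fdbr):
    """||tv|| ||tmp||: keep each x whose image set y satisfies the termphrase."""
    return lambda termphrase: {x for x, y in fdbr.items() if termphrase(y)}

discover = tv(DISCOVER_SUBJ_OBJ)
orbit = tv(ORBIT_SUBJ_OBJ)

discovered_phobos = discover(proper_noun("phobos"))
# "moon that orbits mars": the moons among the things that orbit mars.
moons_orbiting_mars = moon_set & orbit(proper_noun("mars"))
discovered_every_mars_moon = discover(every(moons_orbiting_mars))
```

The phrase "discover phobos" now has a value of its own, a set of entities, which can be combined with the denotations of the surrounding words.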
N-ary events and n-ary relations define n² - n FDBR functions
E.g. a discovery event (a row in an n-ary relation):
<event1030> <subject> <hall> .
<event1030> <object> <phobos> .
<event1030> <date> <1877> .
<event1030> <implement> <refractor_telescope_1> .
<event1030> <location> <us_naval_observatory> .
The equivalent 5-ary relation has 5 columns (subject, object, year, implement, location), with 20 binary relations and 20 FDBRs between columns (excluding a column to itself).
These FDBRs can be used to process "wh.." queries.
These FDBRs can also be used to define denotations of tvs with prepositional phrases and no subject or object,
e.g. FDBR discover_rel implement->object = {(e_refractor_telescope_1, {e_phobos, e_deimos}), etc.}
This can be used to answer queries such as "who discovered phobos and deimos with a telescope".
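The count of 20 can be checked by generating one FDBR per ordered pair of distinct properties. In this sketch the function name all_fdbrs and the second event, event1031 for deimos, are assumptions added so that the implement->object FDBR has two images, as in the slide's example.

```python
from collections import defaultdict
from itertools import permutations

# Events as property dictionaries. event1030 is from the slide; event1031 is a
# hypothetical identifier introduced here for the discovery of deimos.
EVENTS = {
    "event1030": {"subject": "hall", "object": "phobos", "date": "1877",
                  "implement": "refractor_telescope_1",
                  "location": "us_naval_observatory"},
    "event1031": {"subject": "hall", "object": "deimos", "date": "1877",
                  "implement": "refractor_telescope_1",
                  "location": "us_naval_observatory"},
}

def all_fdbrs(events):
    """Build an FDBR for every ordered pair (source, target) of distinct properties."""
    fdbrs = defaultdict(lambda: defaultdict(set))
    for props in events.values():
        for source, target in permutations(props, 2):
            fdbrs[(source, target)][props[source]].add(props[target])
    return fdbrs

FDBRS = all_fdbrs(EVENTS)
implement_to_object = FDBRS[("implement", "object")]
```

With 5 properties there are 5 × 4 = 20 ordered pairs, matching n² - n for n = 5.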
DEMONSTRATION
The Web Page
The list of event-based triples
The natural language interface
Example queries with nouns, proper nouns, intransitive verbs, adjectives, conjunction and disjunction
Example queries with quantifiers
Example queries with transitive verbs
Example queries with prepositions
Passive tvs, and queries with simple PPs
• "discovered by hall"
• Create the appropriate FDBR, e.g. for passive voice use
fdbr discover_rel obj->subj = {(e_deimos, {e_hall}), (e_phobos, {e_hall}), (e_miranda, {e_kuiper}), (e_nereid, {e_kuiper}), etc.}
• then process as with active voice
• "was discovered with a telescope"
• use fdbr discover_rel obj->implement = {(e_deimos, {e_refractor_telescope_1}), (e_phobos, {e_refractor_telescope_1}), etc.}
• then process as with active voice
• The Web Page
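One way to picture the passive case: the obj->subj FDBR contains the same pairs as the subj->obj FDBR with the direction reversed, after which the active-voice machinery is reused unchanged. The helper invert is an assumption of this sketch; the slide only says that the appropriate FDBR is created.

```python
DISCOVER_SUBJ_OBJ = {"hall": {"deimos", "phobos"}, "kuiper": {"miranda", "nereid"}}

def invert(fdbr):
    """Turn an x -> {y} FDBR into the corresponding y -> {x} FDBR."""
    inverse = {}
    for x, image in fdbr.items():
        for y in image:
            inverse.setdefault(y, set()).add(x)
    return inverse

def proper_noun(entity):
    return lambda s: entity in s

def tv(fdbr):
    """Same denotation as in the active voice, applied to whichever FDBR is supplied."""
    return lambda termphrase: {x for x, y in fdbr.items() if termphrase(y)}

DISCOVER_OBJ_SUBJ = invert(DISCOVER_SUBJ_OBJ)
discovered_by = tv(DISCOVER_OBJ_SUBJ)
discovered_by_hall = discovered_by(proper_noun("hall"))
```

Only the choice of FDBR changes between active and passive voice; the tv function itself is untouched.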
Chained prepositional phrases
• Change FDBR x->y to include the event_ids which relate the x values to the y values.
• ||tv|| is changed to take a possibly empty list of PPs after the possibly empty termphrase.
• Each PP filters the previous FDBR to select only those pairs that satisfy the PP.
• E.g. "in 1877" checks for events which are related by "….date" to "…1877", and the tv passes the filtered FDBR on to be processed by the next PP.
• Details are in the paper.
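The filtering described above might look roughly like the following. This is a sketch under stated assumptions, not the algorithm from the paper: the event identifiers, the EVENT_PROPS table and the helper names pp and tv are all hypothetical.

```python
# Hypothetical event properties, keyed by assumed event identifiers.
EVENT_PROPS = {
    "event1030": {"date": "1877", "implement": "refractor_telescope_1"},
    "event1031": {"date": "1877", "implement": "refractor_telescope_1"},
    "event1050": {"date": "1948"},
}

# FDBR subj->obj extended with event ids: x -> {(y, event_id)}.
DISCOVER = {
    "hall": {("phobos", "event1030"), ("deimos", "event1031")},
    "kuiper": {("miranda", "event1050")},
}

def pp(prop, value):
    """A PP as a filter: keep only pairs whose event relates prop to value."""
    def keep(fdbr):
        filtered = {}
        for x, pairs in fdbr.items():
            kept = {(y, ev) for y, ev in pairs if EVENT_PROPS[ev].get(prop) == value}
            if kept:
                filtered[x] = kept
        return filtered
    return keep

def proper_noun(entity):
    return lambda s: entity in s

def tv(fdbr):
    """tv with an optional termphrase followed by a possibly empty list of PPs."""
    def apply(termphrase=None, pps=()):
        current = fdbr
        for pp_filter in pps:  # each PP filters the FDBR left by the previous one
            current = pp_filter(current)
        return {x for x, pairs in current.items()
                if termphrase is None or termphrase({y for y, _ in pairs})}
    return apply

discover = tv(DISCOVER)
phobos_in_1877 = discover(proper_noun("phobos"), [pp("date", "1877")])
anything_in_1948 = discover(None, [pp("date", "1948")])
```

Keeping the event id beside each y value is what allows several PPs to constrain the same event rather than different events.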
Scoping
a planet is orbited by two moons that were discovered by kuiper - False
kuiper discovered two moons that orbit a planet - True
a planet is orbited by two moons that were discovered by hall - True
hall discovered two moons that orbit a planet - True
Prepositional scoping:
every moon that orbits neptune was discovered with a telescope or voyager_2 - True
a telescope or voyager_2 was used to discover every moon that orbits neptune - False
Superlatives
• "who discovered the largest moon"
• Several meanings, some depending on the comparison set
• Absolute superlative: "who discovered ganymede"
• Comparative superlative: "who, of the discoverers in question, discovered the largest moon"
• fdbr discover_rel subj->obj = {(e_hall, {e_deimos, e_phobos}), (e_kuiper, {e_miranda, e_nereid}), (e_galileo, {e_io, e_ganymede, etc.}), etc.}
Wh-Wh queries
• fdbr discover_rel subj->obj = {(e_hall, {e_deimos, e_phobos}), (e_kuiper, {e_miranda, e_nereid}), (e_galileo, {e_io, e_ganymede, etc.}), etc.}
Complex nominals: answer from the FDBR, e.g. "beautiful dancer": use events to encode the type of membership
Polysemy, e.g. "depart": use event data and PPs
Implementation of the NLI
Virtuoso software is used to store the triples in a triplestore with a SPARQL endpoint.
Only the basic SPARQL retrieval command is used by the NLI.
The NLI is constructed as an executable attribute grammar (EAG) in Haskell.
XSAIGA parser combinators from Hackage are used to construct the EAG.
The Hsparql package is used to communicate with the SPARQL endpoint from the Haskell program.
The Haskell program also runs on a wireless router with acceptable speed.
Shane Peelar is building special-purpose hardware for semantic computation.
Future work
Derive events from DBpedia
Create new hardware for semantic processing
Implement triplestores and the NLI on the Internet of Things, e.g. coffee makers, toys
Our previous work
SEMANTICS:
• Frost, R. A., St. Amour, B. and Fortier, R. (2013) An event based denotational semantics for natural language queries to data represented in triple stores. IEEE ICSC 2013, 142-145.
• Frost, R. A. (2006) Realization of natural language interfaces using lazy functional programming. ACM Comp. Surv. 38 (4), Article 11.
• Peelar, S. (2016) Accommodating prepositional phrases in a highly modular natural language query interface to semantic web triplestores using a novel event-based denotational semantics for English and a set of functional parser combinators. Master's Thesis. Electronic Theses and Dissertations 5911.
• Frost, R. A., Agboola, W., Matthews, E. and Donais, J. A. (2014) An Event-Driven Approach for Querying Graph-Structured Data Using Natural Language. EDBT/ICDT Workshops 2014, 192-199.
PARSING:
• Frost, R. and Launchbury, J. (1989) Constructing Natural Language Interpreters in a Lazy Functional Language. Comput. J. 32 (2), 108-121.
• Hafiz, R. and Frost, R. (2010) Lazy combinators for executable specifications of general attribute grammars. Proc. of the 12th International Symposium on Practical Aspects of Declarative Languages (PADL), LNCS 5937, 167-182.
• Frost, R., Hafiz, R. and Callaghan, P. (2007) Modular and efficient top-down parsing for ambiguous left-recursive grammars. In 10th ACL, IWPT, 109-120.
Related work: NLIs to DBs or the Semantic Web
• Main and Benson (1983) Denotational semantics for natural-language question-answering systems. American Journal of Computational Linguistics, Volume 9, Number 1.
• Cimiano, P., Haase, P. and Heizmann, J. (2007) Porting natural language interfaces between domains: an experimental user study with the ORAKEL system. In: IUI 2007: Proceedings of the 12th International Conference on Intelligent User Interfaces, pp. 180-189. ACM, New York.
• Van Eijck, J. and Unger, C. (2010) Computational semantics with functional programming. Cambridge University Press.
• Ferré, S. (2012) SQUALL: a controlled natural language for querying and updating RDF graphs. In T. Kuhn and N. E. Fuchs, editors, Controlled Natural Languages, LNCS 7427, pages 11-25. Springer.
Related work: Compositional Semantics
• Richard Montague (1970) English as a Formal Language. In Bruno Visentini (ed.), Linguaggi nella societa e nella tecnica. Edizioni di Communita, pp. 188-221. Abstract: "I reject the contention that an important theoretical difference exists between formal and natural languages."
• Wlodek Zadrozny (1992) On compositional semantics. Actes de COLING-92, Aug. 23-28, 1992. IBM Research, T. J. Watson Research Center, Yorktown Heights, NY 10598.
• W.-P. de Roever, F. de Boer, U. Hannemann, J. Hooman, Y. Lakhnech, M. Poel and J. Zwiers (2001) Concurrency Verification: Introduction to Compositional and Non-Compositional Methods. Cambridge University Press.
• L. Champollion (2015) The interaction of compositional semantics and event semantics. Linguistics and Philosophy, Springer.
• Francoise Gayral, Daniel Kayser and François Levy (<=2001) Challenging the Principle of Compositionality in Interpreting Natural Language Texts. HAL Archives. University Paris Nord, Av. J. B. Clement, 93430 Villetaneuse, France.
Future Work
Convert DBpedia to event format
Accommodate negation, superlatives and polysemy
Speech frontend
Intelligent things: coffee makers, children's toys