Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction
Danica Damljanović, Milan Agatonović, Hamish Cunningham
Contact: danica@dcs.shef.ac.uk
Web of Data • Large datasets such as Linked Open Data available • How can we use these data? • Modigliani test: “tell me the locations of all the original paintings of Modigliani” (Richard MacManus, ReadWriteWeb) ESWC 2010
Passing Modigliani test
Source: http://blog.larkc.eu/: “LDSR Passes the Modigliani Test for Semantic Web”, more than 1h to generate a SPARQL query

PREFIX fb: <http://rdf.freebase.com/ns/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX dbp-prop: <http://dbpedia.org/property/>
PREFIX dbp-ont: <http://dbpedia.org/ontology/>
PREFIX umbel-sc: <http://umbel.org/umbel/sc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX ot: <http://www.ontotext.com/>
SELECT DISTINCT ?painting_l ?owner_l ?city_fb_con ?city_db_loc ?city_db_cit
WHERE {
  ?p fb:visual_art.artwork.artist dbpedia:Amedeo_Modigliani ;
     fb:visual_art.artwork.owners [ fb:visual_art.artwork_owner_relationship.owner ?ow ] ;
     ot:preferredLabel ?painting_l .
  ?ow ot:preferredLabel ?owner_l .
  OPTIONAL { ?ow fb:location.location.containedby [ ot:preferredLabel ?city_fb_con ] } .
  OPTIONAL { ?ow dbp-prop:location ?loc . ?loc rdf:type umbel-sc:City ; ot:preferredLabel ?city_db_loc }
  OPTIONAL { ?ow dbp-ont:city [ ot:preferredLabel ?city_db_cit ] }
}
Passing Modigliani Test: future “tell me the locations of all the original paintings of Modigliani”
But, others have already done it?
[Chart labels: complex questions; simple factual questions; small datasets (narrow domain); large datasets (several domains). (Damljanović and Bontcheva, 2009)]
FREyA (Feedback, Refinement, Extended Vocabulary Aggregator)
• Increase recall by generating a dialog whenever an “unknown” term appears in the question
• Increase precision by generating a dialog whenever one term refers to more than one concept in the ontology
• The dialog is generated by combining the language of the user and the ontology
• Learn from the dialog
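The two dialog triggers on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not FREyA's implementation: the function `plan_dialogs`, the `LEXICON` dictionary and the term "biggest" are hypothetical, and the real system derives its candidates from syntactic analysis and ontology lookup rather than from a hand-written dictionary.

```python
# Hypothetical lexicon: lower-cased question term -> ontology concepts it may refer to.
# The concept names are taken from the slides; the mapping itself is an assumption.
LEXICON = {
    "new york": ["geo:City", "geo:State"],
    "area": ["geo:stateArea"],
}


def plan_dialogs(terms, lexicon):
    """Split question terms into resolved mappings and dialogs to show the user.

    A term with exactly one candidate is resolved silently.
    A term with no candidate triggers an "unknown" dialog (aimed at recall).
    A term with several candidates triggers an "ambiguous" dialog (aimed at precision).
    """
    resolved = {}
    dialogs = []
    for term in terms:
        candidates = lexicon.get(term.lower(), [])
        if len(candidates) == 1:
            resolved[term] = candidates[0]
        elif not candidates:
            dialogs.append((term, "unknown", []))
        else:
            dialogs.append((term, "ambiguous", list(candidates)))
    return resolved, dialogs


if __name__ == "__main__":
    resolved, dialogs = plan_dialogs(["new york", "area", "biggest"], LEXICON)
    print(resolved)
    print(dialogs)
```

In this sketch the "unknown" dialog carries an empty suggestion list; how suggestions for unknown terms are produced is left open here.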
FREyA Workflow
[Diagram labels: Potential Ontology Concept (POC); Ontology Concept (OC); learn]
Finding POCs
Finding OCs
Mapping POC to OCs
[Diagram labels: POC “population”; POC “new york”; geo:cityPopulation; geo:State; geo:City]
New York is a city
New York is a state
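Since “New York” can be read as either a city or a state, the user's choice in the dialog is what resolves it, and remembering that choice is one way a system could "learn from the dialog". The sketch below is a hypothetical illustration only: the class `SuggestionRanker` and its vote-counting scheme are assumptions, not the learning mechanism FREyA actually uses.

```python
class SuggestionRanker:
    """Re-rank candidate ontology concepts by how often the user picked them."""

    def __init__(self):
        # (lower-cased term, concept) -> number of times the user selected it
        self._votes = {}

    def record_choice(self, term, concept):
        """Remember that the user selected `concept` for `term` in a dialog."""
        key = (term.lower(), concept)
        self._votes[key] = self._votes.get(key, 0) + 1

    def rank(self, term, candidates):
        """Return candidates with the most frequently chosen ones first.

        sorted() is stable, so candidates with equal votes keep their input order.
        """
        key = term.lower()
        return sorted(candidates, key=lambda c: -self._votes.get((key, c), 0))


if __name__ == "__main__":
    ranker = SuggestionRanker()
    print(ranker.rank("New York", ["geo:City", "geo:State"]))
    ranker.record_choice("new york", "geo:State")
    print(ranker.rank("New York", ["geo:City", "geo:State"]))
```

A simple counter like this ignores the rest of the question; a context-sensitive key would be needed if the same term should map to different concepts in different questions.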
The User Controls the Output
[Diagram labels: POC; min; point; max; state; area; geo:loElevation; geo:isLowestPointOf; geo:LoPoint; geo:stateArea; geo:State]
What is the lowest point of the state with the largest area?
TRIPLES:
?firstJoker – geo:isLowestPointOf – geo:State
geo:State – (max) geo:stateArea – ?lastJoker
SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?p2 ?lastJoker
where {
  { { ?c1 ?p0 ?firstJoker } UNION { ?firstJoker ?p0 ?c1 } .
    filter (?p0=<http://www.mooney.net/geo#isLowestPointOf>) . }
  ?c1 rdf:type <http://www.mooney.net/geo#State> .
  ?c1 ?p2 ?lastJoker .
  filter (?p2=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY DESC(xsd:double(?lastJoker))
however...
What is the lowest point of the state with the largest area?
the answer for both is Death Valley
TRIPLES:
?firstJoker – (min) geo:loElevation – geo:LoPoint
geo:LoPoint – ?joker3 – geo:State
geo:State – (max) geo:stateArea – ?lastJoker
SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?joker3 ?c2 ?p3 ?lastJoker
where {
  ?c1 ?p0 ?firstJoker .
  filter (?p0=<http://www.mooney.net/geo#loElevation>) .
  ?c1 rdf:type <http://www.mooney.net/geo#LoPoint> .
  { { ?c2 ?joker3 ?c1 } UNION { ?c1 ?joker3 ?c2 } }
  ?c2 rdf:type <http://www.mooney.net/geo#State> .
  ?c2 ?p3 ?lastJoker .
  filter (?p3=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY ASC(xsd:double(?firstJoker)) DESC(xsd:double(?lastJoker))
FREyA: a Natural Language Interface to Ontologies • http://gate.ac.uk/freya
evaluation: correctness, ranked suggestions, learning
evaluation: correctness • Mooney GeoQuery dataset: 250 questions
evaluation: suggestions ranking
• Mooney GeoQuery dataset: 250 questions
• Manually labelled correct rankings
• Mean Reciprocal Rank (MRR): 0.81
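For readers unfamiliar with the metric: Mean Reciprocal Rank averages 1/rank over all evaluated items, where rank is the position of the first correct suggestion in the list shown to the user, so 1.0 means the correct suggestion was always ranked first. A minimal sketch of the standard definition follows; the function name and the example ranks are illustrative and are not data from this evaluation.

```python
def mean_reciprocal_rank(ranks):
    """Compute MRR from 1-based ranks of the first correct suggestion.

    `ranks` holds one entry per evaluated dialog; use None when no correct
    suggestion appeared at all (it then contributes 0 to the mean).
    """
    if not ranks:
        raise ValueError("at least one rank is required")
    total = 0.0
    for rank in ranks:
        if rank is None:
            continue  # no correct suggestion: reciprocal rank counts as 0
        if rank < 1:
            raise ValueError("ranks are 1-based")
        total += 1.0 / rank
    return total / len(ranks)


if __name__ == "__main__":
    # Made-up ranks for four dialogs: first, second, missing, fourth.
    print(mean_reciprocal_rank([1, 2, None, 4]))
```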
evaluation: learning • 103 questions correctly answered by engaging the user in one dialog • MRR 0.72
evaluation: learning • MRR improved from 0.72 to 0.78
Next Steps
• Passing Modigliani test
• Exploring unknown data structures with FREyA, especially if they are large
• LDSR: DBpedia, Freebase, Geonames, UMBEL, WordNet, CIA World Factbook, Lingvoj, MusicBrainz
• http://ontotext.com/ldsr
• User-centric evaluation
thank you for your attention! questions?
Thanks to Abraham Bernstein and Esther Kaufmann from the University of Zurich for sharing the Mooney dataset in OWL format with us, and to J. Mooney from the University of Texas for making this dataset publicly available.
Contact: danica@dcs.shef.ac.uk
References
• Damljanovic, D., Bontcheva, K.: Towards Enhanced Usability of Natural Language Interfaces to Knowledge Bases. In Devedzic V. and Gasevic D. (Eds.), Special issue on Semantic Web and Web 2.0, Annals of Information Systems, Springer-Verlag, 2009.