Welcome
The New York Semantic Web Meetup Group
Dr. Christian F. Hempelmann, Chief Scientific Officer
April 17, 2008
Human Language • Information comes as Natural Language (NL) • No Search Relevance without Understanding NL • Underlying production and comprehension rules • Users with low error tolerance • Observable output in principle irregular • No Understanding NL without Semantics • Logic form conversion is not understanding • Surface co-occurrence statistics are not understanding • Automatic semantic tagging presupposes understanding
Semantics Done Somehow • Don’t acquire any semantic resources • Try to guess meaning from meaning epiphenomena • syntax • co-occurrence • Emphasize the formality and formalisms of precise, quantitative methods • Achieve and accept <80% accuracy • Get excited about 0.028% improvements • Hardly ever implement real-life systems • Replace them with artificial, self-serving evaluation criteria
Semantics Done Somehow • Characteristically, this lack of interest in linguistic theory expresses itself in the proposals to limit the term ‘theory’ to ‘summary of data’ [...] (Chomsky 1965: 194) • Not all that is measurable is meaning! (Lyons 1963: 5)
Semantics Done Semantically • Acquire massive human-like knowledge resources (Nirenburg and Raskin 2004) • Aspire to >95% accuracy • Implement systems based on these resources as the ultimate evaluation criterion • Share the resources (licensing)
Semantics Done Semantically: hakia OntoSem • A language-independent ontology of under 10k concepts • Ontology-based lexica, including a 50k-entry English lexicon with 80k senses • Onomastica: dictionaries of proper names, products, etc. • Text meaning representation (TMR) language, an ontology-based knowledge representation language • OntoParser, transforming NL text into TMRs • Fact repository, containing processed TMRs
Ontology Top Level
• ALL
  • Objects
  • Events
  • Properties
Ontology Event Top Level
• Events
  • Mental events
  • Social events
  • Physical events
Examples
• Ontological concept: go
  • is-a: motion-event
  • agent: animal
  • instrument: body-part, vehicle
  • source: location
  • destination: location
  • start-time: temporal-unit
  • end-time: temporal-unit
• Lexical entry: drive-V1 [all but semantic information omitted]
  • sem-struc: go
  • agent: human (adult)
  • instrument: car
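A minimal sketch of how such a concept and an ontology-linked lexical sense could be held as data; the Python below, including the names ONTOLOGY and LEXICON, is an illustrative assumption, not hakia's actual representation:

```python
# Illustrative sketch only -- not hakia's actual data structures.
# An ontological concept lists its parent (is-a) and slot constraints;
# a lexical sense points at a concept and tightens those constraints.

ONTOLOGY = {
    "go": {
        "is-a": "motion-event",
        "agent": "animal",
        "instrument": ["body-part", "vehicle"],
        "source": "location",
        "destination": "location",
        "start-time": "temporal-unit",
        "end-time": "temporal-unit",
    },
}

LEXICON = {
    "drive-V1": {
        "sem-struc": "go",          # maps the verb onto the concept GO
        "agent": "human (adult)",   # narrows animal -> adult human
        "instrument": "car",        # narrows vehicle -> car
    },
}
```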
Simplified TMR
• Mary drove from Boston to New York on Wednesday
• go
  • agent: Mary
  • instrument: car
  • source: Boston
  • destination: New York
  • start-time: Wednesday
  • end-time: Wednesday
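As a toy illustration (hypothetical Python, not OntoParser output), the TMR above can be rendered as a nested structure whose slots are directly queryable:

```python
# Hypothetical rendering of the TMR for
# "Mary drove from Boston to New York on Wednesday".
tmr = {
    "event": "go",
    "agent": "Mary",
    "instrument": "car",
    "source": "Boston",
    "destination": "New York",
    "start-time": "Wednesday",
    "end-time": "Wednesday",
}

# Slot-level questions become lookups:
print(tmr["destination"])  # New York   ("Where did Mary drive to?")
print(tmr["start-time"])   # Wednesday  ("When did she leave?")
```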
hakia OntoSem Overview • OntoParse the crawled pages • Understand their meaning • Anticipate queries about this meaning • Where were the drugs smuggled? • What was shipped to the United States?
“Information Underload” in Keyword Search • Keyword: cancer • Unreported pages: • “. . . malignancy . . .” • “. . . tumor . . .” • “. . . growth . . .” • “. . . positive biopsy . . .” • “. . . bad biopsy results . . .”
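A minimal sketch of why concept-level matching avoids the underload; the term-to-concept table and function names below are assumptions for illustration, not hakia's lexicon or API:

```python
# Illustrative only: map surface terms to ontological concepts, then
# match queries to pages at the concept level, not the keyword level.
TERM_TO_CONCEPT = {
    "cancer": "CANCER",
    "malignancy": "CANCER",
    "tumor": "CANCER",
    "positive biopsy": "CANCER",
}

def concepts(text):
    """Collect the concepts of all known terms occurring in the text."""
    lowered = text.lower()
    return {c for term, c in TERM_TO_CONCEPT.items() if term in lowered}

def relevant(query, page):
    # Relevant if query and page share a concept, even with zero
    # keyword overlap -- exactly the pages keyword search fails to report.
    return bool(concepts(query) & concepts(page))

print(relevant("cancer", "The biopsy revealed a malignancy."))  # True
print("cancer" in "The biopsy revealed a malignancy.")          # False (keyword miss)
```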
Parallelizing and Generalizing • query: “drug for migraine” • also desired: specific results under the general “drug” • page text: ibuprofen • page text: aspirin • query: “does aspirin work for headaches” • also desired: results parallel to “aspirin” • page text: ibuprofen • page text: tylenol
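A rough sketch of both moves over a toy is-a hierarchy; the hierarchy and helper names are assumptions, not hakia's ontology:

```python
# Illustrative only: generalize a term by climbing is-a links,
# parallelize it by collecting siblings under the same parent.
IS_A = {
    "aspirin": "painkiller",
    "ibuprofen": "painkiller",
    "tylenol": "painkiller",
    "painkiller": "drug",
}

def generalize(term):
    """Climb the is-a chain: aspirin -> painkiller -> drug."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain

def parallels(term):
    """Siblings under the same parent: aspirin -> ibuprofen, tylenol."""
    parent = IS_A.get(term)
    return sorted(t for t, p in IS_A.items() if p == parent and t != term)

# "drug for migraine" can admit specific painkillers; "does aspirin
# work for headaches" can admit aspirin's parallel siblings.
print(generalize("aspirin"))  # ['painkiller', 'drug']
print(parallels("aspirin"))   # ['ibuprofen', 'tylenol']
```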
Parallelizing and Generalizing: Resource Optimization • Distribution of information about a sense between lexicon and ontology: • put more constraints into the sem-struc of the sense, or • create a daughter of the general concept • Acquire the sense of “aspirin” that is, roughly, “painkiller used specifically to treat headaches” • The closest concept in the ontology is PAINKILLER • Either put the constraints (made-of (sem acetylsalicylic-acid)) (instrument-of (sem heal (theme (sem headache)))) into the sem-struc of the lexicon sense • Or acquire a new daughter concept of PAINKILLER called ASPIRIN
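The two acquisition options can be contrasted side by side; this sketch uses hypothetical Python notation, not OntoSem's actual formalism:

```python
# Illustrative only -- not OntoSem's actual notation.

# Option 1: keep the general concept PAINKILLER and push the
# aspirin-specific constraints into the lexicon sense's sem-struc.
lexicon_sense = {
    "aspirin-N1": {
        "sem-struc": "painkiller",
        "made-of": {"sem": "acetylsalicylic-acid"},
        "instrument-of": {"sem": {"heal": {"theme": {"sem": "headache"}}}},
    },
}

# Option 2: grow the ontology instead -- a new daughter of PAINKILLER
# carrying the same constraints.
ontology_concept = {
    "aspirin": {
        "is-a": "painkiller",
        "made-of": {"sem": "acetylsalicylic-acid"},
        "instrument-of": {"sem": {"heal": {"theme": {"sem": "headache"}}}},
    },
}
```

Roughly, the lexicon-side option is cheaper to acquire, while a daughter concept lives in the language-independent ontology and is reusable by every ontology-based lexicon.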
Meaning QDexing
• Where was JFK killed?
• KILL
  • beneficiary: PRESIDENT “John F. Kennedy”
  • location: *?*
• “President Kennedy was assassinated in Dallas, Texas, ... .” [wikipedia]
• KILL
  • beneficiary: PRESIDENT “John F. Kennedy”
  • location: CITY “Dallas”
• ...
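A toy sketch of the slot-filling match; the one-frame fact repository and every name below are assumptions for illustration, not hakia's QDex implementation:

```python
# Illustrative only: answer a query frame by matching its filled slots
# against stored facts and returning the fillers of its open slots.
UNKNOWN = "*?*"

FACTS = [
    {"event": "KILL",
     "beneficiary": ("PRESIDENT", "John F. Kennedy"),
     "location": ("CITY", "Dallas")},
]

def answer(query):
    fillers = []
    for fact in FACTS:
        filled = {k: v for k, v in query.items() if v != UNKNOWN}
        if all(fact.get(k) == v for k, v in filled.items()):
            fillers += [fact[k] for k in query if query[k] == UNKNOWN]
    return fillers

# "Where was JFK killed?" -> a KILL frame with an open location slot.
query = {"event": "KILL",
         "beneficiary": ("PRESIDENT", "John F. Kennedy"),
         "location": UNKNOWN}
print(answer(query))  # [('CITY', 'Dallas')]
```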
Inferences
• “A device made of small training grenades stuffed with black powder is thrown at a Manhattan building at 3:55 a.m.”
• THROW
  • theme: BOMB
  • agent: ANIMAL
• “There was small-scale property damage, but no injuries.”
• DESTROY
  • theme: ASSET
  • instrument: BOMB
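A minimal sketch of one such inference rule; the rule and frame shapes are illustrative assumptions, not hakia's inference engine:

```python
# Illustrative only: a hand-written rule that chains two extracted
# events -- a bomb thrown at a building licenses a DESTROY reading.
def infer(events):
    inferred = []
    for e in events:
        if e.get("event") == "THROW" and e.get("theme") == "BOMB":
            # The thrown BOMB becomes the instrument of the destruction.
            inferred.append({"event": "DESTROY",
                             "theme": "ASSET",
                             "instrument": "BOMB"})
    return inferred

throw = {"event": "THROW", "theme": "BOMB", "agent": "ANIMAL"}
print(infer([throw]))
# [{'event': 'DESTROY', 'theme': 'ASSET', 'instrument': 'BOMB'}]
```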
Effects of Implementation on hakia OntoSem • A sense for non-critical gaps: stop words and adverbs are not in the first wave • A sense for the direction of scalability: further fast-acquired lexicons; increased importance of existing semantic constraints • Assessment of robustness: With how many missed senses do we still deliver a search result? When do we decide not to QDex a parser result? • Ultimately, access to real performance data: people will use hakia
Effects of hakia OntoSem on Internet Search • Relates a text to a much larger number of texts through semantic, meaningful connections and associations, like a human • Pursues inferences, entailments, presuppositions, etc. • Catches relevant web pages even if the actual words in a query do not appear there • Rejects irrelevant pages even if some actual words in a query appear in them • Improves the quality of search beyond anything attainable by bag-of-words methods • Introduces a new era of human-caliber search