
Dr. Christian F. Hempelmann Chief Scientific Officer April 17, 2008


Presentation Transcript


  1. Dr. Christian F. Hempelmann, Chief Scientific Officer, April 17, 2008

  2. Welcome The New York Semantic Web Meetup Group

  3. Human Language • Information comes as Natural Language (NL) • No Search Relevance without Understanding NL • Underlying production and comprehension rules • Users with low error tolerance • Observable output in principle irregular • No Understanding NL without Semantics • Logic form conversion is not understanding • Surface co-occurrence statistics is not understanding • Automatic semantic tagging presupposes understanding

  4. Semantics Done Somehow • Don’t acquire any semantic resources • Try to guess meaning from meaning epiphenomena • syntax • co-occurrence • Emphasize the formality and formalisms of precise, quantitative methods • Achieve and accept <80% accuracy • Get excited about 0.028% improvements • Hardly ever implement real-life systems • Replace them with artificial self-serving criteria of evaluation

  5. Semantics Done Somehow • Characteristically, this lack of interest in linguistic theory expresses itself in the proposals to limit the term ‘theory’ to ‘summary of data’ [...] (Chomsky 1965: 194) • Not all that is measurable is meaning! (Lyons 1963: 5)

  6. Semantics Done Semantically • Acquire massive human-like knowledge resources (Nirenburg and Raskin 2004) • Aspire to > 95% accuracy • Implement systems based on these resources as ultimate evaluation criterion • Share the resources (licensing)

  7. Semantics Done Semantically: hakia OntoSem • Under 10k-concept language-independent ontology • Ontology-based lexica, including a 50k-entry English lexicon with 80k senses • Onomastica, dictionaries of proper names, products, etc. • Text meaning representation (TMR) language, an ontology-based knowledge representation language • OntoParser transforming NL text into TMRs • Fact repository, containing processed TMRs

  8. Ontology Top Level • ALL • Objects • Events • Properties

  9. Ontology Event Top Level • Events • Mental events • Social events • Physical events

  10. Examples • Ontological Concept: go • is-a motion-event • agent animal • instrument body-part vehicle • source location • destination location • start-time temporal-unit • end-time temporal-unit • Lexical Entry: drive-V1 [all but semantic information omitted] • sem-struc go • agent human (adult) • instrument car
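One way to picture the relation between the ontological concept and the lexical entry on this slide is as a frame whose slots the lexical sense narrows. The following is a minimal sketch, not hakia's or OntoSem's actual representation: the class names `Concept` and `LexicalSense`, the method `sem_struc`, and the reading of "instrument body-part vehicle" as two alternative fillers are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical frame for an ontological concept: a name, a parent, and typed slots."""
    name: str
    is_a: str
    slots: dict[str, tuple[str, ...]] = field(default_factory=dict)


@dataclass
class LexicalSense:
    """Hypothetical lexical sense: points at a concept and narrows some of its slots."""
    name: str
    concept: Concept
    constraints: dict[str, str]

    def sem_struc(self) -> dict[str, tuple[str, ...]]:
        # Start from the concept's slot fillers, then apply the sense's narrower ones.
        merged = dict(self.concept.slots)
        for slot, filler in self.constraints.items():
            if slot not in self.concept.slots:
                raise KeyError(f"{slot!r} is not a slot of {self.concept.name!r}")
            merged[slot] = (filler,)
        return merged


# The concept "go" as listed on the slide (instrument fillers read as alternatives).
GO = Concept(
    name="go",
    is_a="motion-event",
    slots={
        "agent": ("animal",),
        "instrument": ("body-part", "vehicle"),
        "source": ("location",),
        "destination": ("location",),
        "start-time": ("temporal-unit",),
        "end-time": ("temporal-unit",),
    },
)

# The sense drive-V1 narrows agent and instrument, as on the slide.
DRIVE_V1 = LexicalSense("drive-V1", GO, {"agent": "human (adult)", "instrument": "car"})

for slot, fillers in DRIVE_V1.sem_struc().items():
    print(slot, fillers)
```

Slots the sense does not mention (source, destination, the two times) keep the concept's general fillers, which is the point of anchoring a lexicon in an ontology.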

  11. Simplified TMR • Mary drove from Boston to New York on Wednesday • go • agent Mary • instrument car • source Boston • destination New York • start-time Wednesday • end-time Wednesday
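The simplified TMR above can be sketched as a plain mapping in which fillers taken from the sentence are combined with defaults contributed by the word sense (here, the instrument "car" that comes with "drove"). The function `build_tmr` and the dictionary layout are hypothetical and only illustrate the shape of the result, not how OntoParser produces it.

```python
def build_tmr(head, sense_defaults, sentence_fillers):
    """Combine defaults from the word sense with fillers found in the sentence."""
    tmr = {"head": head}
    tmr.update(sense_defaults)    # e.g. the verb sense supplies instrument=car
    tmr.update(sentence_fillers)  # explicit fillers from the text override defaults
    return tmr


tmr = build_tmr(
    head="go",
    sense_defaults={"instrument": "car"},
    sentence_fillers={
        "agent": "Mary",
        "source": "Boston",
        "destination": "New York",
        "start-time": "Wednesday",
        "end-time": "Wednesday",
    },
)

for slot, filler in tmr.items():
    print(f"{slot}: {filler}")
```

Note that "car" never appears in the sentence; it enters the representation through the sense of "drove".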

  12. hakia OntoSem Overview • OntoParse crawled pages • Understand their meaning • Anticipate queries about this meaning • Where were the drugs smuggled? • What was shipped to the United States?

  13. “Information Underload” in Keyword Search • Keyword cancer • Unreported pages: • “. . . malignancy . . .” • “. . . tumor . . .” • “. . . growth . . .” • “. . . positive biopsy . . .” • “. . . bad biopsy results . . .”
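The gap between keyword matching and concept matching can be shown with a toy example. The page texts, the `CONCEPT_EXPRESSIONS` table, and both search functions below are invented for illustration; a real system would disambiguate words such as "growth" in context rather than look them up in a flat list.

```python
# Invented page snippets; only the first contains the literal keyword.
PAGES = [
    "new cancer treatment approved",
    "the malignancy was detected early",
    "the tumor was removed",
    "a positive biopsy was reported",
    "bad biopsy results came back",
    "stock market news",
]

# Hypothetical table of expressions that can point to one concept (from the slide).
CONCEPT_EXPRESSIONS = {
    "CANCER": [
        "cancer",
        "malignancy",
        "tumor",
        "growth",
        "positive biopsy",
        "bad biopsy results",
    ],
}


def keyword_search(keyword, pages):
    """Return pages that contain the literal keyword."""
    return [page for page in pages if keyword in page.lower()]


def concept_search(concept, pages):
    """Return pages that contain any expression listed for the concept."""
    expressions = CONCEPT_EXPRESSIONS[concept]
    return [page for page in pages if any(e in page.lower() for e in expressions)]


print(len(keyword_search("cancer", PAGES)), "page(s) by keyword")
print(len(concept_search("CANCER", PAGES)), "page(s) by concept")
```

The four pages found only by the concept lookup are the "unreported pages" of the slide.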

  14. Parallelizing and Generalizing • query: “drug for migraine” • also desired specific results from general “drug” • page text: ibuprofen • page text: aspirin • query: “does aspirin work for headaches” • also desired parallel results from “aspirin” • page text: ibuprofen • page text: tylenol
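Both behaviors can be read as walks over an is-a hierarchy: generalizing collects the descendants of a general concept, and parallelizing collects the siblings of a specific one. The tiny hierarchy in `PARENT` is an assumption made for this sketch (the slide does not give one), as are the function names.

```python
# Hypothetical child -> parent links; not taken from the hakia ontology.
PARENT = {
    "painkiller": "drug",
    "aspirin": "painkiller",
    "ibuprofen": "painkiller",
    "tylenol": "painkiller",
}


def descendants(concept):
    """Generalizing: everything below a general concept, e.g. for the query term 'drug'."""
    found = []
    for child, parent in PARENT.items():
        if parent == concept:
            found.append(child)
            found.extend(descendants(child))
    return found


def siblings(concept):
    """Parallelizing: other concepts with the same parent, e.g. for 'aspirin'."""
    parent = PARENT.get(concept)
    if parent is None:
        return []
    return sorted(c for c, p in PARENT.items() if p == parent and c != concept)


print(descendants("drug"))
print(siblings("aspirin"))
```

With this hierarchy, a page mentioning ibuprofen is reachable from "drug" by descent and from "aspirin" by sibling lookup.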

  15. Parallelizing and Generalizing Resource Optimization • Distribution of information about a sense between lexicon and ontology • Put more constraints into the sem-struc of the sense or • Create a daughter of the general concept • Acquire the sense of “aspirin” that is, roughly, “painkiller used specifically to treat headaches” • Closest concept I find in the ontology is PAINKILLER • Put the constraints (made-of(sem acetylsalicylic-acid)) (instrument-of(sem heal(theme(sem headache)))) • Into the sem-struc of the lexicon sense • Or acquire a new daughter concept of PAINKILLER called ASPIRIN
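The two acquisition options on this slide can be compared side by side. In the sketch below, the sense name `aspirin-N1` (modeled on `drive-V1`), the `is-a DRUG` link, the nested-dictionary encoding of the constraints, and the `resolve` helper are all hypothetical; only the constraints themselves come from the slide.

```python
# Constraints from the slide, encoded as nested dictionaries for illustration.
ASPIRIN_CONSTRAINTS = {
    "made-of": "acetylsalicylic-acid",
    "instrument-of": {"heal": {"theme": "headache"}},
}

# Assumed starting ontology; the parent of PAINKILLER is a guess.
ONTOLOGY = {"PAINKILLER": {"is-a": "DRUG"}}

# Option 1: keep the ontology as is, put the constraints into the sense's sem-struc.
lexicon_option_1 = {
    "aspirin-N1": {"concept": "PAINKILLER", "sem-struc": dict(ASPIRIN_CONSTRAINTS)},
}

# Option 2: add a daughter concept ASPIRIN that carries the constraints itself.
ontology_option_2 = dict(ONTOLOGY)
ontology_option_2["ASPIRIN"] = {"is-a": "PAINKILLER", **ASPIRIN_CONSTRAINTS}
lexicon_option_2 = {"aspirin-N1": {"concept": "ASPIRIN", "sem-struc": {}}}


def resolve(lexicon, ontology, sense):
    """Collect the constraints of a sense: the concept's own, then the sem-struc's."""
    entry = lexicon[sense]
    constraints = {k: v for k, v in ontology[entry["concept"]].items() if k != "is-a"}
    constraints.update(entry["sem-struc"])
    return constraints


print(resolve(lexicon_option_1, ONTOLOGY, "aspirin-N1"))
print(resolve(lexicon_option_2, ontology_option_2, "aspirin-N1"))
```

Both options resolve to the same constraints for the word; they differ in where the knowledge lives, and a daughter concept is also visible to other senses and to hierarchy walks, while a sem-struc constraint is local to one lexicon entry.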

  16. Meaning QDexing • Where was JFK killed? • KILL • beneficiary PRESIDENT “John F. Kennedy” • location *?* • President Kennedy was assassinated in Dallas, Texas, ... . [wikipedia] • KILL • beneficiary PRESIDENT “John F. Kennedy” • location CITY “Dallas” • ...
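The question and the stored sentence on this slide share a frame that differs only in the slot marked `*?*`. A minimal sketch of answering by frame matching follows; it is an illustration of that idea only, not hakia's QDex implementation, and the `answer` function and dictionary layout are assumptions. Slot names and fillers are kept as they appear on the slide.

```python
WILDCARD = "*?*"  # marks the slot the question asks about


def answer(query, facts):
    """Return bindings for wildcard slots from every fact that matches the query frame."""
    answers = []
    for fact in facts:
        if fact.get("head") != query.get("head"):
            continue
        bindings = {}
        for slot, value in query.items():
            if slot == "head":
                continue
            if slot not in fact:
                break  # the fact lacks a slot the query needs
            if value == WILDCARD:
                bindings[slot] = fact[slot]
            elif fact[slot] != value:
                break  # a constrained slot disagrees
        else:
            answers.append(bindings)  # reached only when no slot failed
    return answers


# Frame for "Where was JFK killed?"
query = {
    "head": "KILL",
    "beneficiary": ("PRESIDENT", "John F. Kennedy"),
    "location": WILDCARD,
}

# Frame for "President Kennedy was assassinated in Dallas, Texas, ..."
facts = [
    {
        "head": "KILL",
        "beneficiary": ("PRESIDENT", "John F. Kennedy"),
        "location": ("CITY", "Dallas"),
    },
]

print(answer(query, facts))
```

The match succeeds although the question says "JFK" and "killed" while the text says "President Kennedy" and "assassinated", because both have already been mapped to the same frame.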

  17. Inferences • A device made of small training grenades stuffed with black powder is thrown at a Manhattan building at 3:55 a.m. • THROW • theme BOMB • agent ANIMAL • There was small-scale property-damage, but no injuries. • DESTROY • theme ASSET • instrument BOMB

  18. Effects of Implementation on hakia OntoSem • Sense for non-critical gaps: • stop words and adverbs not in the first wave • Sense for direction of scalability: • further fast-acquired lexicons • increase importance of existing semantic constraints • Assessment of robustness: • With how many missed senses do we still deliver a search result? • When do we decide not to QDex parser result? • Ultimately, access to real performance data: People will use hakia

  19. Effects of hakia OntoSem on Internet Search • Relates a text to a much larger number of texts on semantic, meaningful connections and associations, like a human • Pursues inferences, entailments, presuppositions, etc. • Catches relevant web pages even if the actual words in a query do not appear there • Rejects irrelevant pages even if some actual words in a query appear in them • Improves the quality of the search beyond anything attainable by bag-of-words methods • Introduces a new era of human-caliber searches

  20. Thank you
