This presentation discusses the problem of effective language and communication, and proposes anchoring language to universal meaning as a solution. It explores the concept of wordnets and introduces The Global Wordnet Grid as a way to connect wordnets for different languages. The future goal is to ensure equal access to knowledge and information on the Internet for all, and to develop systems that understand language.
Piek Vossen, Irion Technologies / Vrije Universiteit Amsterdam
6th International Plain Language Conference, October 11-14, 2007, Amsterdam
The Global Wordnet Grid: anchoring languages to universal meaning
Overview
• Problem: effective language and communication
  • From human to human
  • From human to machine
  • From machine to machine
  • From human to machine and back to human, maybe via other machines...
• Solution: anchoring language to universal meaning
  • Wordnets: networks of words related through meaning
  • The Global Wordnet Grid: wordnets for languages connected to each other through an ontology
• Future:
  • Equal access to the knowledge and information on the Internet for all people, regardless of language and background
  • Systems that start to understand language
Problem
Language is inherently vague and ambiguous
• Communication through language:
  • mediates between the expectations of the Speaker and the Hearer => half a word is enough
• Language is not fully descriptive but minimally sufficient:
  • Do not bother the Hearer with information that is already known => rely on background knowledge
  • Use a minimal set of words and expressions to avoid overloading memory => words and expressions have multiple meanings
"gavagai": understanding is fundamentally impossible
[Figure: one word, different concepts in our heads: a sweet pet you want to hug; rabbit with carrots and rosemary; a divine appearance announcing spring; Plato with a beard.]
W.V.O. Quine (1964): inscrutability of reference
Full understanding is fundamentally impossible. BUT?
• People do communicate...
• People even communicate with computers...
• As long as language is effective:
  • meaning = to have the desired effect!
  • Link language to useful content!
What is effective computer-mediated language?
• Computers store information and knowledge in textual form:
  • People search for information and knowledge by 'querying' computers
  • Effective Computer-Mediated Communication (CMC) = find what you need and nothing else
• Computers analyze information and knowledge:
  • Collect data and send alerts, reports and facts
• Computers connect people:
  • Support communication between people by analyzing communication or translating languages
[Diagram: an Information Seeker expresses a concept in words as a query; an Information Provider expresses a concept in words as information. Both sides are reduced to strings and matched through an index of strings (ape ... energy ... mass ... zebra).]
Conceptual match, linguistic mismatch
[Diagram: the seeker's query "my cell phone" and the provider's text "mobile" express the same concept, but the strings differ, so the index of strings (ape ... mobile ... zebra) does not connect them.]
Conceptual mismatch, linguistic match
[Diagram: the seeker's query "my cell phone" and the provider's text "nerve cells" express different concepts, but both match the string "cell" in the index of strings (ape ... cell ... zebra).]
Conceptual mismatch, linguistic match
[Diagram: the seeker's query "police cell" and the provider's text "nerve cells" express different concepts, but both match the string "cell" in the index of strings (ape ... cell ... zebra).]
Conceptual match, linguistic mismatch
[Diagram: the seeker's query "neuron" and the provider's text "nerve cells" express the same concept, but the strings differ, so the index of strings (ape ... cell ... zebra) does not connect them.]
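The four cases on the preceding slides can be illustrated with a minimal sketch. It assumes a hypothetical toy lexicon (`CONCEPT_OF`) whose concept labels are invented for illustration, and a crude plural-stripping tokenizer standing in for a real index of strings; neither comes from the presentation.

```python
# Hypothetical toy lexicon: expression -> concept label.
# The labels are invented for illustration; they are not WordNet identifiers.
CONCEPT_OF = {
    "cell phone": "MobilePhone",
    "mobile": "MobilePhone",
    "nerve cells": "Neuron",
    "neuron": "Neuron",
    "police cell": "JailCell",
}


def _tokens(text):
    """Lowercase, split on whitespace, and strip a trailing 's' (crude stemming)."""
    return {w[:-1] if w.endswith("s") else w for w in text.lower().split()}


def string_match(query, document):
    """Linguistic match: the two expressions share at least one token."""
    return bool(_tokens(query) & _tokens(document))


def concept_match(query, document):
    """Conceptual match: both expressions map to the same concept label."""
    q = CONCEPT_OF.get(query.lower())
    d = CONCEPT_OF.get(document.lower())
    return q is not None and q == d


if __name__ == "__main__":
    pairs = [
        ("cell phone", "mobile"),
        ("cell phone", "nerve cells"),
        ("police cell", "nerve cells"),
        ("neuron", "nerve cells"),
    ]
    for query, doc in pairs:
        print(query, "|", doc, "| string:", string_match(query, doc),
              "| concept:", concept_match(query, doc))
```

A string index only sees the first column of results; the mismatches arise whenever the two columns disagree.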
Recall & Precision
Query: “cell”, sent to a search engine for a database with all documents, e.g. “nerve cell”, “police cell”, “cell phone”, “mobile phones”
[Diagram: the set of found documents and the set of relevant documents overlap in an intersection.]
recall = intersection / relevant
precision = intersection / found
Recall < 20% for basic search engines! (Blair & Maron 1985)
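As a worked illustration of the two ratios, here is a minimal sketch. The document ids and the choice of which documents count as relevant are assumptions made for the example: a user who wants phones issues the query "cell" against the four example documents.

```python
from typing import Set, Tuple


def recall_precision(found: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) for a set of found and a set of relevant documents."""
    hits = found & relevant  # the intersection
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(found) if found else 0.0
    return recall, precision


# Hypothetical document ids for the four example documents.
documents = {
    "d1": "nerve cell",
    "d2": "police cell",
    "d3": "cell phone",
    "d4": "mobile phones",
}

# Plain string search for "cell" finds every document containing that token.
found = {doc_id for doc_id, text in documents.items() if "cell" in text.split()}

# Assumption: the user is looking for phones, so d3 and d4 are relevant.
relevant = {"d3", "d4"}

if __name__ == "__main__":
    recall, precision = recall_precision(found, relevant)
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case the string query misses "mobile phones" (lower recall) and returns the nerve and police cells (lower precision).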
Useless dialogues with Alice-bot
It is useful to anchor meaning!
• Anchoring already takes place all over the world through standardization:
  • measures and units: meter, liter, kilo
  • terminological databases, legal definitions, contracts
  • international cooperation
  • ontologies: definitions of the meaning of concepts in a formal knowledge representation system (first-order logic), so that a computer can reason with them
Solution
How can we anchor the meaning of words?
• We can anchor words to each other:
  • semantic network or wordnet
• We can anchor words to logical implications:
  • a formal ontology
Relational model of meaning
[Diagram: words linked to each other by meaning relations: animal, cat, dog, kitten, puppy, man, woman, boy, girl.]
Princeton WordNet
• Developed by George Miller and his team at Princeton University, as the implementation of a mental model of the lexicon
• Organized around the notion of a synset: a set of synonyms in a language that represent a single concept
• Semantic relations between concepts
• Covers over 100,000 concepts and over 120,000 English words
Wordnet: a network of semantically related words
[Diagram: synsets connected by semantic relations. Synsets shown: {conveyance; transport}, {vehicle}, {motor vehicle; automotive vehicle}, {car; auto; automobile; machine; motorcar}, {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}, {bumper}, {car door}, {doorlock}, {hinge; flexible joint}, {armrest}, {car mirror}, {car window}.]
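A wordnet of this kind can be sketched as a small graph of synsets. The synonym sets below are taken from the slide; the hypernym links between them are my reading of the figure and should be treated as an assumption, as should the short identifiers used as keys.

```python
# Synsets from the slide, keyed by a hypothetical short identifier.
SYNSETS = {
    "conveyance": frozenset({"conveyance", "transport"}),
    "vehicle": frozenset({"vehicle"}),
    "motor vehicle": frozenset({"motor vehicle", "automotive vehicle"}),
    "car": frozenset({"car", "auto", "automobile", "machine", "motorcar"}),
    "cruiser": frozenset({"cruiser", "squad car", "patrol car", "police car", "prowl car"}),
    "cab": frozenset({"cab", "taxi", "hack", "taxicab"}),
}

# Assumed hypernym ("is a kind of") links: child synset -> parent synset.
HYPERNYM = {
    "cab": "car",
    "cruiser": "car",
    "car": "motor vehicle",
    "motor vehicle": "vehicle",
    "vehicle": "conveyance",
}


def synset_of(word):
    """Return the identifier of the first synset containing the word, or None."""
    for synset_id, members in SYNSETS.items():
        if word in members:
            return synset_id
    return None


def hypernym_chain(synset_id):
    """Follow hypernym links upward and return the list of ancestors."""
    chain = []
    while synset_id in HYPERNYM:
        synset_id = HYPERNYM[synset_id]
        chain.append(synset_id)
    return chain


if __name__ == "__main__":
    start = synset_of("taxi")
    print(start, "->", " -> ".join(hypernym_chain(start)))
```

Looking up a synonym such as "taxi" lands on the same synset as "cab", which is what lets a system treat the two words as one concept.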
[Diagram: language-specific wordnets linked through an Inter-Lingual-Index (records such as Car, Train, Vehicle), which is in turn linked to domains (Transport: Road, Water, Air) and to ontologies (SUMO, DOLCE: Object, Device, TransportDevice). Words shown per language: German: Fahrzeug, Auto, Zug; Dutch: voertuig, auto, trein; English: vehicle, car, train; Estonian: liiklusvahend, auto, killavoor; Spanish: vehículo, auto, tren; French: véhicule, voiture, train; Italian: veicolo, auto, treno; Czech: dopravní prostředník, auto, vlak.]
Wordnet family
• Princeton WordNet (Fellbaum 1998): 115,000 concepts
• EuroWordNet (Vossen 1998): 8 languages
• BalkaNet (Tufis 2004): 6 languages
• Global Wordnet Association: all languages
Wordnets as autonomous language-specific structures
[Diagram: two hierarchies side by side. Terms shown under WordNet 1.5: object; artifact, artefact (a man-made object); natural object (an object occurring naturally); instrumentality; implement; device; container; tool; instrument; block; body; box; spoon; bag. Terms shown under the Dutch Wordnet: voorwerp {object}; blok {block}; lichaam {body}; werktuig {tool}; bak {box}; lepel {spoon}; tas {bag}.]
Complex equivalence relations
1. Multiple Targets (1:many)
   Dutch wordnet: schoonmaken (to clean) matches 4 senses of clean in WordNet 1.5:
   • make clean by removing dirt, filth, or unwanted substances from
   • remove unwanted substances from, such as feathers or pits, as of chickens or fruit
   • remove in making clean; "Clean the spots off the rug"
   • remove unwanted substances from (as in chemistry)
2. Multiple Sources (many:1)
   Dutch wordnet: versiersel near_synonym versiering
   Target record: decoration
3. Multiple Targets and Sources (many:many)
   Dutch wordnet: toestel near_synonym apparaat
   Target records: machine; device; apparatus; tool
Complex equivalence relations
Gaps in the English WordNet:
• genuine, cultural gaps: the concept is unknown in English culture:
  • Dutch: klunen, to walk on skates over land from one frozen body of water to another
• pragmatic gaps: the concept is known but is not expressed by a single lexicalized form in English:
  • Dutch: kunstproduct = artifact substance <=> artifact object
From EuroWordNet to Global WordNet
• Global Wordnet Association: http://www.globalwordnet.org
• Biennial conference: India (2002), Czech Republic (2004), Korea (2006), Hungary (2008), ....
• Currently, wordnets exist for more than 40 languages, including: Arabic, Bantu, Basque, ...., Chinese, Bulgarian, Estonian, Hebrew, ...., Icelandic, Japanese, Kannada, Korean, Latvian, Latin, .... Nepali, Persian, Romanian, Sanskrit, Tamil, Thai, Turkish, .... Zulu
• Many of these languages are genetically and typologically unrelated
Some downsides
• Construction is not done uniformly
• Coverage differs
• Not all wordnets can communicate with one another:
  • not linked
  • linked to different versions: 1.5, 1.6, 1.7, 2.0 and now 3.0, 3.1
  • linked with different relations
• Proprietary rights restrict free access and usage
• A lot of the semantics is duplicated
• Complex and obscure equivalence relations due to linguistic differences between English and other languages
Next step: Global WordNet Grid
[Diagram: language-specific wordnets linked to a shared Inter-Lingual Ontology (Object, Device, TransportDevice). Words shown per language: German: Fahrzeug, Auto, Zug; English: vehicle, car, train; Spanish: vehículo, auto, tren; Italian: veicolo, auto, treno; Dutch: voertuig, auto, trein; Estonian: liiklusvahend, auto, killavoor; French: véhicule, voiture, train; Czech: dopravní prostředník, auto, vlak.]
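The idea of connecting wordnets through a shared ontology can be sketched as a lookup keyed by concept. The words are those on the slide; linking the "vehicle" words to `TransportDevice` and the label `Car` are assumptions made for illustration, as are the language codes and the function name.

```python
# Hypothetical grid: ontology concept -> {language code: word}.
# "TransportDevice" appears on the slide; "Car" is an invented label.
GRID = {
    "TransportDevice": {
        "de": "Fahrzeug", "en": "vehicle", "es": "vehículo", "it": "veicolo",
        "nl": "voertuig", "et": "liiklusvahend", "fr": "véhicule",
    },
    "Car": {
        "de": "Auto", "en": "car", "es": "auto", "it": "auto",
        "nl": "auto", "et": "auto", "fr": "voiture", "cs": "auto",
    },
}


def concept_for(word, lang):
    """Return the ontology concept a word is anchored to, or None if unknown."""
    for concept, words in GRID.items():
        if words.get(lang) == word:
            return concept
    return None


def translate(word, src, dst):
    """Map a word to another language by going through the shared concept."""
    concept = concept_for(word, src)
    if concept is None:
        return None
    return GRID[concept].get(dst)


if __name__ == "__main__":
    print(translate("voertuig", "nl", "fr"))
    print(translate("voiture", "fr", "en"))
```

No wordnet needs a direct link to any other wordnet here: each language only points at the ontology, which is the design the Grid proposes.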
The Ontology: main features
• Formal, artificial ontology serves as universal index of concepts
• List of concepts is not just based on the lexicon of a particular language (unlike in EuroWordNet) but uses ontological observations:
  • Lexicalization in a language is not sufficient to warrant inclusion in the ontology
  • Lexicalization in all or many languages may be sufficient
  • Ontological observations will be used to define the concepts in the ontology
• Concepts are related in a type hierarchy
• Concepts are defined with axioms: Knowledge Interchange Format (KIF), based on first-order predicate calculus and atomic elements
Concepts by ontological observations
• Types and Roles among the hyponyms of dog in Wordnet:
  • husky, lapdog; toy dog; hunting dog; working dog; dalmatian, coach dog, carriage dog; basenji; pug, pug-dog; Leonberg; Newfoundland; Great Pyrenees; spitz; griffon, Brussels griffon, Belgian griffon; corgi, Welsh corgi; poodle, poodle dog; Mexican hairless; pooch, doggie, doggy, barker, bow-wow; cur, mongrel, mutt
• Current WordNet treatment:
  (1) a husky is a kind of dog
  (2) a husky is a kind of working dog
• What’s wrong? (2) is defeasible, (1) is not:
  *This husky is not a dog => RIGID TYPE
  This husky is not a working dog => ROLE, NON-RIGID
Ontology versus wordnet
• Ontology: hierarchy of disjoint types:
  Canine: PoodleDog; NewfoundlandDog; GermanShepherdDog; Husky
• Wordnet:
  • NAMES for TYPES: {poodle}EN, {poedel}NL, {pudoru}JP
    (instance x Poodle)
  • LABELS for ROLES: {watchdog}EN, {waakhond}NL, {banken}JP
    ((instance x Canine) and (role x GuardingProcess))
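The distinction between names for types and labels for roles can be sketched as a small data structure. This is only an illustration under stated assumptions: the `Definition` class and the helper functions are hypothetical, and the two fields mirror the `instance` and `role` parts of the KIF expressions on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Definition:
    """A word's anchor in the ontology: a rigid type plus an optional role."""
    type_: str
    role: Optional[str] = None


# (word, language) -> definition, following the examples on the slide.
LEXICON = {
    ("poodle", "EN"): Definition("Poodle"),
    ("poedel", "NL"): Definition("Poodle"),
    ("pudoru", "JP"): Definition("Poodle"),
    ("watchdog", "EN"): Definition("Canine", "GuardingProcess"),
    ("waakhond", "NL"): Definition("Canine", "GuardingProcess"),
    ("banken", "JP"): Definition("Canine", "GuardingProcess"),
}


def names_a_type(word, lang):
    """True if the word names a rigid type, False if it labels a role."""
    return LEXICON[(word, lang)].role is None


def same_meaning(a, b):
    """Two (word, language) pairs mean the same if their definitions are equal."""
    return LEXICON[a] == LEXICON[b]


if __name__ == "__main__":
    print(names_a_type("poodle", "EN"))
    print(names_a_type("watchdog", "EN"))
    print(same_meaning(("watchdog", "EN"), ("waakhond", "NL")))
```

Role words never need their own node in the type hierarchy in this sketch; they reuse an existing type and add a role.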
Properties of the Ontology
• Minimal: terms are distinguished by essential properties only
• Comprehensive: includes all distinct concept types of all Grid languages
• Allows definitions via KIF of all words that express non-rigid, non-essential properties of types
• Logically valid, allows inferencing
Ontology versus Wordnet
• Not added to the type hierarchy: {straathond}NL (a dog that lives in the streets)
  ((instance x Canine) and (habitat x Street))
• Added to the type hierarchy: {klunen}NL (to walk on skates over land from one frozen body of water to the next)
  KluunProcess => WalkProcess
  Axioms: (and (instance x Human) (instance y Walk) (instance z Skates) (wear x z) (instance s1 Skate) (instance s2 Skate) (before s1 y) (before y s2) etc…
• National dishes, customs, games, ....
Ontology versus Wordnet
• Refer to sets of types in specific circumstances or to concepts that are dependent on these types; next to {rivierwater}NL there are many others:
  {theewater}NL (water used for making tea)
  {koffiewater}NL (water used for making coffee)
  {bluswater}NL (water used for extinguishing fire)
• Relate to linguistic phenomena:
  • gender, perspective, aspect, diminutives, politeness, pejoratives, part-of-speech constraints
KIF expression for gender marking
• {teacher}EN ((instance x Human) and (agent x TeachingProcess))
• {Lehrer}DE ((instance x Man) and (agent x TeachingProcess))
• {Lehrerin}DE ((instance x Woman) and (agent x TeachingProcess))
KIF expression for perspective
sell: subj(x), direct obj(z), indirect obj(y)
buy: subj(y), direct obj(z), indirect obj(x)
FinancialTransaction:
(and (instance x Human) (instance y Human) (instance z Entity) (instance e FinancialTransaction) (source x e) (destination y e) (patient e))
The same process, but a different perspective through subject and object realization: Russian has two verbs for marry; apprendre in French can mean both teach and learn.
Advantages of the Global Wordnet Grid
• Shared and uniform world knowledge:
  • universal inferencing
  • uniform text analysis and interpretation
• More compact and less redundant databases
• Clearer notion of how languages map to the knowledge:
  • better criteria for expressing knowledge
  • better criteria for understanding variation
Future
Language technology: a hole in one!
[Diagram: expressions such as "golf club(s)", "clubs for golf", "golf sticks", "golf clubs", "Golf at the club" and "Tiger Woods" are related to each other through linguistic analysis, a thesaurus, synonyms and a semantic network.]
Index concepts rather than words
• Meaning of a word in context:
  • Domain of the document:
    • Juventus => football
  • Topic of the paragraph:
    • transfer scandal => business, crime
  • Phrase: linguistically motivated combination of words:
    • [wing player]football player in [police cell]jail
• Topic of the query:
  • Can I order chicken wings? => food
  • Phrase:
    • [chicken wings]dish
Expansion with clear hyponymy: expansion from a type to roles
[Diagram: dog with related words: hunting dog, puppy, dachshund, lapdog, poodle, bitch, street dog, watchdog, short hair dachshund, long hair dachshund.]
Expansion with clear hyponymy: expansion from a role to types and other roles
[Diagram: dog with related words: hunting dog, puppy, dachshund, lapdog, poodle, bitch, street dog, watchdog, short hair dachshund, long hair dachshund.]
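Query expansion over such a network can be sketched as a breadth-first walk over hyponym links, with roles added only on request. The words come from the slides, but which words count as types or roles and how they are linked is an assumption made for this example; the function name and the `include_roles` flag are hypothetical.

```python
from collections import deque

# Assumed type hierarchy: type -> more specific types.
TYPE_HYPONYMS = {
    "dog": ["dachshund", "poodle"],
    "dachshund": ["short hair dachshund", "long hair dachshund"],
}

# Assumed roles that instances of a type can play.
ROLES_OF = {
    "dog": ["hunting dog", "lapdog", "watchdog", "street dog"],
}


def expand(term, include_roles=False):
    """Return the term plus all of its hyponyms, optionally followed by its roles."""
    expanded = []
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in expanded:
            continue  # guard against cycles or duplicates in the data
        expanded.append(current)
        queue.extend(TYPE_HYPONYMS.get(current, []))
    if include_roles:
        expanded.extend(r for r in ROLES_OF.get(term, []) if r not in expanded)
    return expanded


if __name__ == "__main__":
    print(expand("dog"))
    print(expand("dog", include_roles=True))
```

Keeping roles behind a flag reflects the point of the slides: expanding along rigid types is safe, while expanding into roles is a separate decision.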
[Diagram: a triangle linking Thought, Expression (携帯電話, keitaidenwa, Japanese for mobile phone) and Objects in reality, connected to Texts, an Ontology, and Knowledge & information.]
• Useful and effective behavior:
  • reason over knowledge
  • collect information and data
  • deliver services and be helpful
Automotive ontology: (http://www.ontoprise.de)
Dialogue system
[Diagram: components: Question Analysis (word, concept), Text Analysis, Topic detection (information: products, mobile, accessories, headphone, repair), Search Engine, Dialogue Manager, Website.]
• User Model:
  • Intention
  • Satisfaction
  • Emotion
• Information State:
  • Positive
  • Negative
  • Relations
• Example dialogue:
  • System: Can I help you?
  • User: My headphone is broken.
  • System: Would you like repair or products?
  • User: I want to buy a new one.
  • System: Can you say more about products?
  • User: It is for my cell phone.
  • System: Can you give more details?
  • User: It is a Nokia 6110.
  • System: I got the following accessories for you. Please have a look.
  • User: That is not what I want!
Communicative dialog system
• Prevents deadlocks:
  • Detects vagueness and ambiguity (which meaning of cell?)
  • Detects topic changes
  • Uses negative feedback: “No jails, I want cell phones!”
• Can handle out-of-domain questions (users do not know what the system knows):
  • "We do not have hotel rooms but we do have electronic equipment".
  • "No, we do not have portophones but we do have other electronic equipment such as cell phones"
[Diagram: concept hierarchy with the nodes space, object, equipment, room, hotel room, cell phone, portophone.]
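The out-of-domain behavior can be sketched as a search up the concept hierarchy for the nearest shared hypernym with something in stock. The node names are from the slide; the parent links, the one-item catalog, and the function names are assumptions made for illustration, not a description of the actual system.

```python
# Assumed hierarchy: concept -> parent concept.
PARENT = {
    "hotel room": "room",
    "room": "space",
    "cell phone": "equipment",
    "portophone": "equipment",
    "equipment": "object",
}

# Hypothetical catalog of what the system can actually offer.
CATALOG = {"cell phone"}


def ancestors(concept):
    """Return the hypernyms of a concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def alternative(request):
    """Find the nearest hypernym of the request that covers a catalog item.

    Returns (hypernym, catalog item), or None if nothing related is on offer.
    """
    for hypernym in ancestors(request):
        for item in sorted(CATALOG):
            if hypernym in ancestors(item):
                return hypernym, item
    return None


def respond(request):
    """Produce an answer in the spirit of the examples on the slide."""
    if request in CATALOG:
        return f"Yes, we have {request}s."
    found = alternative(request)
    if found is None:
        return f"We do not have {request}s."
    hypernym, item = found
    return (f"No, we do not have {request}s but we do have other "
            f"{hypernym} such as {item}s.")


if __name__ == "__main__":
    print(respond("portophone"))
    print(respond("hotel room"))
```

For the hotel room request this sketch finds no shared hypernym and simply declines; producing the slide's fuller answer would need the system to describe its own domain as well.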
THANK YOU FOR YOUR ATTENTION