Word sense disambiguation: Study on WordNet ontology. Akilan Velmurugan. Computer Networks – CS 790G.
Overview
• What is WSD?
• How WordNet is analyzed as a complex network
• What are the results
• Project methodology
• Area of study
• Key findings/results
• New approaches
• Improvement techniques
• Conclusion
Project Description
• Objective
  • Study of WSD
  • Effects of WSD in the word sense ontology
  • Characteristics of WordNet
• Results
  • How to match words with other words
  • Parameters taken for the study of word sense
  • Improve them by making the necessary changes
  • Study network characteristics
WordNet - overview
• Machine-readable semantic dictionary whose entries are interlinked by semantic relations
• Developed at Princeton University as a large lexical database for the English language
• One of the most widely used linguistic resources
• Free for public use (distributed under Princeton's permissive WordNet license rather than the GPL)
• Forms a scale-free network with a small average shortest path, with words as nodes and concepts as links
• Easily navigable
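
To make the "words as nodes, concepts as links" picture concrete, here is a minimal sketch that links two words whenever they share a synset and then measures the average shortest path by breadth-first search. The synsets in `TOY_SYNSETS` are hand-written for illustration and are not read from WordNet; the function names are likewise hypothetical.

```python
from collections import deque
from itertools import combinations

# Toy synsets (illustrative only, not real WordNet data).
TOY_SYNSETS = [
    ("car", "gondola"),
    ("car", "railway car"),
    ("car", "automobile", "motorcar"),
]


def build_word_graph(synsets):
    """Words are nodes; two words are linked if they share a synset (concept)."""
    graph = {}
    for synset in synsets:
        for word in synset:
            graph.setdefault(word, set())
        for a, b in combinations(synset, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, start):
    """Geodesic (hop-count) distance from start to every reachable word."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def average_shortest_path(graph):
    """Mean distance over all ordered pairs of distinct, connected words."""
    total = 0
    pairs = 0
    for start in graph:
        for target, d in bfs_distances(graph, start).items():
            if target != start:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


toy_graph = build_word_graph(TOY_SYNSETS)
print(average_shortest_path(toy_graph))
```

On a graph this small the number only illustrates the computation; the small-world claim on the slide refers to the full lexical network.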
WordNet (Structure)
• Shows the relations in the form of
  • Noun, verb, adjective, adverb
  • Synonym
  • Hypernym (is a kind of …)
  • Hyponym (… is a kind of)
  • Troponym (particular ways to …)
  • Meronym (parts of …)
  • … about 25 relations in all
• Also available for online navigation
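
The hypernym and hyponym relations are inverses of each other, which a small lookup table can illustrate. This is only a sketch: the tables `HYPERNYM` and `MERONYMS` are hand-written toy data, not entries read from the WordNet database, and the helper names are assumptions made for this example.

```python
# Toy relation tables (hand-written for illustration, not read from WordNet).
HYPERNYM = {
    "coupe": "car",
    "sedan": "car",
    "taxi": "car",
    "car": "motor vehicle",
}
MERONYMS = {"car": ["wheel"]}  # parts of a car


def hypernym_chain(word, hypernym=HYPERNYM):
    """Follow 'is a kind of' links upward until no hypernym is recorded."""
    chain = []
    while word in hypernym:
        word = hypernym[word]
        chain.append(word)
    return chain


def hyponyms_of(word, hypernym=HYPERNYM):
    """Invert the hypernym table: every word whose direct hypernym is `word`."""
    return sorted(child for child, parent in hypernym.items() if parent == word)


print(hypernym_chain("coupe"))
print(hyponyms_of("car"))
print(MERONYMS["car"])
```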
WordNet online - by Princeton University [Screenshot: WordNet online]
WordNet Browser [Screenshot: WordNet application]
WordNet (working)
• WSD:
  • Corpus-based approaches
    • Set of samples from which the system learns
  • Knowledge-based approaches
    • Machine-readable dictionary with relations
• WordNet research
  • Open source
  • Ranking of synsets derived from word frequencies in the British National Corpus
  • Top 1000
  • Content manipulation of text
  • Dataset I – controlled and calibrated study
  • Dataset II – collected using Mechanical Turk, using pairs
[Figure: WordNet Database]
Word Sense Disambiguation (WSD)
• Task of determining the meaning of an ambiguous word in a given context
• Example: bank
  • Edge of a river, or
  • Financial institution that accepts money
• Refers to the resolution of lexical semantic ambiguity; its goal is to attribute the correct senses to words (an AI-complete problem)
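
One simple knowledge-based way to pick between the two senses of "bank" is to count the words a context shares with each dictionary definition (a simplified Lesk-style overlap). The slides do not name this method, so treat it as an illustrative assumption; the glosses are the two definitions given above, and the stopword list and function names are made up for the example.

```python
import re

# Small hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "at", "he", "her", "in", "into",
             "of", "on", "or", "she", "that", "the", "to"}

# Glosses taken from the two senses listed on the slide.
TOY_GLOSSES = {
    "bank (river)": "edge of a river",
    "bank (finance)": "financial institution that accepts money",
}


def content_words(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, glosses=TOY_GLOSSES):
    """Return the sense whose gloss shares the most content words with the context."""
    context_words = content_words(context)
    return max(
        glosses,
        key=lambda sense: len(context_words & content_words(glosses[sense])),
    )


print(simplified_lesk("He sat on the bank of the river and watched the water."))
print(simplified_lesk("She went to the bank to deposit money into her account."))
```

With glosses this short, ties are common and are broken by dictionary order, so real systems use longer glosses, related synsets, or corpus statistics.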
WSD: Area of Research
• Assigning the correct sense to words, using an electronic dictionary as the source of word definitions
• Open research field in Natural Language Processing (NLP)
• Hard problem, and a popular area of research
• Used in speech synthesis to identify the correct sense of a word
JavaScript Visual WordNet [Screenshot: Visual WordNet]
Visual Thesaurus [Screenshot: Visual Thesaurus]
WordNet – Theoretical aspects
• WordNet – word sense ontology
  • Symbols are words
  • Synset: list of words and the semantic relations between them
• Word sense disambiguation
  • WordNet structure using latent semantics
  • Variable lexical notation for a concept
  • Citibase – Thesaurus
  • Semantic relatedness
  • And a few others…
WSD: using latent semantics
• Measures the semantic distance of concepts
• Relatedness and betweenness are calculated
• Matrix form of the WordNet data structure is used
• Can be integrated with other applications
• Uses the Singular Value Decomposition (SVD) algorithm
• Example: multiple synsets are
  • {car, gondola}
  • {car, railway car}
  • {car, automobile}
  • {Motor vehicle}, {Coupe}, {Sedan}, {Taxi}
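
As a rough illustration of the matrix view, the three "car" synsets can be written as a word-by-synset incidence matrix and its dominant singular triplet extracted. The sketch below uses plain power iteration on A^T A as a stand-in for a full SVD routine, which is an assumption of this example and not necessarily what the cited work used; the matrix layout and function names are also hypothetical.

```python
import math

# Rows are words, columns are the synsets {car, gondola}, {car, railway car},
# {car, automobile}. An entry is 1 when the word belongs to the synset.
WORDS = ["car", "gondola", "railway car", "automobile"]
TOY_MATRIX = [
    [1.0, 1.0, 1.0],  # car appears in all three synsets
    [1.0, 0.0, 0.0],  # gondola
    [0.0, 1.0, 0.0],  # railway car
    [0.0, 0.0, 1.0],  # automobile
]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def top_singular_triplet(a, iterations=100):
    """Power iteration on A^T A: returns (sigma, u, v) for the largest singular value."""
    at = transpose(a)
    v = [1.0 / (i + 1) for i in range(len(at))]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    av = matvec(a, v)
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


sigma, u, v = top_singular_triplet(TOY_MATRIX)
for word, loading in zip(WORDS, u):
    print(f"{word}: {loading:.3f}")
```

The word with the largest loading on the leading left singular vector is the one shared by all three synsets, which is the kind of latent structure a truncated SVD is meant to expose.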
MDS example
[Figure: geodesic distance matrix S, embedded with MDS and grouped with k-means; node groups {1, 2, 3, 4, 10, 12} and {5, 6, 7, 8, 9, 11, 13}]
Source: Lecture 18, Community Structure, by Prof. Gunes
WSD: variable lexical notations for a concept
• Generic concept notation:
$$D = I \cup J \cup K$$
$$\therefore\ J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$$
$$J = D \cap (\overline{I} \cap \overline{K})$$
since $B = D \cup E \cup F$,
$$D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$$
$$D = B \cap (\overline{E} \cap \overline{F})$$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
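WSD: variable lexical notations for a concept
$$J = D \cap (\overline{I} \cap \overline{K}) = \bigl(B \cap (\overline{E} \cap \overline{F})\bigr) \cap (\overline{I} \cap \overline{K})$$
$$J = B \cap \bigl((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K})\bigr)$$
when J = fly, D = fish lure, I = spinner, K = troll
Introducing Boolean operators: AND for $\cap$, OR for $\cup$, NOT for the complement (overbar)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications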
WSD: variable lexical notations for a concept
• (“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”))
• Then, with B = lure, E = ground bait, F = stool pigeon:
• (“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND (((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)))
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
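
The rewriting of "fly" can be sketched as a small query builder: OR together the lexical variants of the broader concept, then AND in a NOT for every sibling concept to be excluded. The function names are hypothetical and the builder is only one way to generate the expressions shown on the slide; it uses plain ASCII quotes.

```python
def any_of(terms):
    """OR together the lexical variants of the broader concept."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def none_of(terms):
    """AND together the negations of one group of sibling concepts."""
    return "(" + " AND ".join(f'(NOT "{t}")' for t in terms) + ")"


def concept_query(broader_terms, excluded_groups):
    """Build: broader concept AND NOT each sibling, one group per hierarchy level."""
    exclusions = [none_of(group) for group in excluded_groups]
    if len(exclusions) == 1:
        excluded = exclusions[0]
    else:
        excluded = "(" + " AND ".join(exclusions) + ")"
    return f"{any_of(broader_terms)} AND {excluded}"


# One level up: J = D AND (NOT I AND NOT K)
print(concept_query(["fisherman's lure", "fish lure"], [["spinner", "troll"]]))

# Two levels up: J = B AND ((NOT E AND NOT F) AND (NOT I AND NOT K))
print(concept_query(
    ["bait", "decoy", "lure"],
    [["ground bait", "stool pigeon"], ["spinner", "troll"]],
))
```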
Thesaurus as a complex network
• As a directed graph
  • Sinks: the 73,046 terms with k_out = 0
  • Sources (root words): the 30,260 terms with at least one outgoing link (k_out > 0)
    • Absolute source: no incoming links (k_in = 0)
    • Normal source: k_out > 0 and k_in > 0
    • Bridge source: no outgoing links to root words (k_out(source) = 0)
[Figure legend: 1 – Normal source, 2 – Bridge source, 3 – Absolute source, 4 – Sink]
Source: arXiv:cond-mat/0312586 v1 2003
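
The node classes above follow directly from in- and out-degrees, as the sketch below shows on a handful of made-up edges. Because the slide does not say whether "bridge" excludes the other source types, this example reports it as a separate flag, which is an assumption; the edge list and function name are also hypothetical.

```python
def classify_terms(edges):
    """Label each term as sink / absolute source / normal source, plus a bridge flag.

    A bridge source is taken here to be a source none of whose outgoing links
    point to another source (all of its links end in sinks).
    """
    nodes = set()
    out_links = {}
    in_degree = {}
    for src, dst in edges:
        nodes.update((src, dst))
        out_links.setdefault(src, set()).add(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1

    sources = {n for n in nodes if out_links.get(n)}  # k_out > 0
    labels = {}
    for n in nodes:
        if n not in sources:
            labels[n] = ("sink", False)  # k_out = 0
            continue
        kind = "absolute source" if in_degree.get(n, 0) == 0 else "normal source"
        is_bridge = not (out_links[n] & sources)
        labels[n] = (kind, is_bridge)
    return labels


# Toy thesaurus links (entry -> listed synonym), invented for illustration.
TOY_EDGES = [
    ("happy", "glad"),
    ("glad", "happy"),
    ("glad", "content"),
    ("joyful", "happy"),
    ("cheery", "content"),
]

for term, label in sorted(classify_terms(TOY_EDGES).items()):
    print(term, label)
```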
WSD: Semantic relatedness and word sense disambiguation
• Concepts that occur more frequently and closer to each other are “more related” to each other than concepts that appear less frequently and farther apart
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
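
One possible way to turn "more frequent and closer" into a number is to sum 1/distance over every co-occurrence of two words inside a window. The slide gives no formula, so the scoring rule, the `window` parameter, and the sample sentence below are all assumptions made purely for illustration.

```python
def relatedness(tokens, word_a, word_b, window=3):
    """Sum 1/distance over co-occurrences of the two words within `window` tokens.

    More co-occurrences raise the score (frequency); nearer ones count more
    (closeness). This scoring rule is an illustrative assumption.
    """
    positions_a = [i for i, t in enumerate(tokens) if t == word_a]
    positions_b = [i for i, t in enumerate(tokens) if t == word_b]
    score = 0.0
    for i in positions_a:
        for j in positions_b:
            distance = abs(i - j)
            if 0 < distance <= window:
                score += 1.0 / distance
    return score


tokens = "river bank water river bank money".split()
print(relatedness(tokens, "river", "bank"))
print(relatedness(tokens, "bank", "money"))
```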
WordNet Relationship
• Semantic relatedness
  • Involves relationships among words
    • car-wheel (meronym)
    • hot-cold (antonym)
    • pencil-paper (functional)
    • penguin-Antarctica (association)
    • bank-trust company (synonym)
• Probability and distance calculation
• Frequency of synsets or words
• Performance in NLP applications
WordNet Relationship Browser [Screenshot: WordNet Relationship Browser]
WordNet Connect
• Program to find all possible connections between two words in WordNet
• Used in computing semantic opposition within the word sense ontology
• The WordNet lexical database is used to read the semantic relations
• Capabilities such as the number of paths, the shortest path, and the overall network structure are studied
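
The kind of query described here (every connection between two words, how many there are, and which is shortest) can be sketched with a depth-first enumeration of simple paths. This is not the WordNet Connect implementation; the toy graph, the `max_length` cap, and the function name are assumptions for illustration.

```python
def all_simple_paths(graph, start, goal, max_length=None):
    """Enumerate every path from start to goal that visits no word twice."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == goal:
            paths.append(list(path))
            return
        if max_length is not None and len(path) - 1 >= max_length:
            return  # stop once the path already has max_length links
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in path:
                path.append(neighbor)
                extend(path)
                path.pop()

    extend([start])
    return paths


# Toy undirected relation graph (invented for illustration).
TOY_GRAPH = {
    "hot": {"warm", "cold"},
    "warm": {"hot", "cool"},
    "cool": {"warm", "cold"},
    "cold": {"hot", "cool"},
}

paths = all_simple_paths(TOY_GRAPH, "hot", "cold")
print(len(paths))            # number of connections
print(min(paths, key=len))   # shortest connection
```

Enumerating all simple paths grows exponentially with graph size, so on the full WordNet graph a length cap such as `max_length` would be needed in practice.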
WordNet Connect [Screenshots: WordNet Connect, three slides]
Future work
• WordNet structure in terms of a complex network
• Key assumptions
  • WordNet lexical dictionary analyzed in terms of a source node and a target node, with an additional reference node
  • Achieve a cost-effective path that is conditionally related to the mean reference node
  • Control the path traversal with a relation of focus
  • Include Common File Number to make it more efficient
Conclusion
• A single visualization cannot reveal the entire structure of WordNet
• There are different ways of analyzing the effectiveness of the overall system
• A new method to evaluate the usefulness of the WordNet network structure