WORDNET Approach on word sense techniques - AKILAN VELMURUGAN
What is WORDNET
• Machine-readable semantic dictionary whose entries are interlinked by semantic relations
• Developed at Princeton University
• Large lexical database of the English language
• Language forms a scale-free network with a small average shortest path, with words as nodes and concepts as links
Source: http://wordnet.princeton.edu/
Use of WordNet
• Easily navigable
• Used as an online dictionary for English
• Freely available to the public
• Structured to show relations in the form of:
  - noun, verb, adjective, adverb
  - synonym
  - hypernym (is a kind of …)
  - hyponym (… is a kind of)
  - troponym (particular ways to …)
  - meronym (parts of …)
• WordNet application
Source: http://wordnet.princeton.edu/
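The relation types listed above can be illustrated with a minimal sketch over a tiny hand-made lexicon. The entries in `RELATIONS` and the helpers `related` and `hyponyms` are assumptions made up for illustration; they are not WordNet data and not the WordNet API.

```python
# Toy lexicon: each relation maps a word to the words it is linked to.
# All entries are hand-written illustrations, not taken from WordNet.
RELATIONS = {
    "hypernym": {"dog": ["canine"], "canine": ["carnivore"]},
    "meronym": {"car": ["wheel", "engine"]},
    "troponym": {"walk": ["stroll", "march"]},
    "synonym": {"car": ["automobile"]},
}


def related(word, relation):
    """Return the words linked to `word` by `relation` (empty list if none)."""
    return RELATIONS.get(relation, {}).get(word, [])


def hyponyms(word):
    """Hyponymy is the inverse of hypernymy, so derive it by reversing the links."""
    return sorted(w for w, parents in RELATIONS["hypernym"].items() if word in parents)


print(related("dog", "hypernym"))  # a dog is a kind of canine
print(hyponyms("canine"))          # words that are a kind of canine
print(related("car", "meronym"))   # parts of a car
```

Storing only the hypernym direction and deriving hyponyms by reversal keeps the two relations from drifting out of sync in this toy setup.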
A few representations of WordNet
• Schema representation
• Graph theory
• Tree structure
• Force graph structure
• WordNet Explorer
• Visual interface for WordNet
Using RDF Schema and OWL ontology
• WordNet classes and properties are represented with terms such as wn:word and wn:wordsense
Source: www.w3.org/.../WNET/wordnet-sw-20040713.html
Represented using graph theory
• Can be modeled as a directed or an undirected graph
Source: www.nodebox.net/code/index.php/Graph
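A minimal sketch of the difference between the directed and the undirected view, using a plain adjacency map and breadth-first search. The word links in `EDGES` and the helper names are hypothetical and chosen only for illustration.

```python
from collections import deque


def build_graph(edges, directed=True):
    """Build an adjacency map from (source, target) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
        if not directed:
            graph[dst].add(src)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the number of links, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Hypothetical links between words (assumption, not WordNet data).
EDGES = [("dog", "canine"), ("canine", "carnivore"), ("cat", "carnivore")]

directed = build_graph(EDGES, directed=True)
undirected = build_graph(EDGES, directed=False)

print(shortest_path_length(directed, "dog", "cat"))    # unreachable when links are one-way
print(shortest_path_length(undirected, "dog", "cat"))  # reachable via the shared parent
```

In the directed view a path must follow the link direction, so two words that only share a parent are not connected; the undirected view connects them through that parent.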
Represented using tree structure
• Uses tokens and lexical relations
Source: www.docs.huihoo.com/nltk/0.9.5/en/ch02.html
Represented using force graph structure
• Presentation of words and meanings as graph nodes, and relations as edges between them
Source: www.code.google.com/p/synonym/
Represented in WordNet Explorer
• Applies visual principles to lexical semantics
Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm
Background study on word sense
• Word ontology
• Word Sense Disambiguation (WSD)
• Variable lexical notation for a concept
• i-level generic notation
• i-level specific notation
• Semantic relatedness in WSD
• Experiment results
• Thesaurus as a complex network
• Visual interface for WordNet
Flow of study: WordNet – synsets – word ontology – set algebra – rules for representing lexical notations – semantic relatedness between concepts – concept distribution statistics – degree of semantic relatedness :: WSD (Word Sense Disambiguation) – SemCor – test cases – WSD on a complex network – WSD in an English thesaurus – future work
Source: http://kylescholz.com/projects/wordnet
WordNet – common-sense ontology
• Symbols are words
• Concept meanings are synsets
• Each synset is represented by one or more words
• Words that represent the same synset are synonyms
• Synonyms and polysemous words
• A synset comprises a list of words and a list of semantic relations to other synsets
• Part I – list of words, each with the list of synsets that the word represents
• Part II – set of semantic relations between synsets (is-a, part-of, substance-of, member-of)
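The two-part organisation described above can be sketched as two small data structures. The synset identifiers (such as `fly.fish_lure`) and the helper functions are hypothetical labels invented for this illustration, not WordNet's own identifiers or API.

```python
# Part I: each word maps to the synsets it can represent.
# Synset identifiers are made-up labels for illustration.
WORD_TO_SYNSETS = {
    "fly": ["fly.insect", "fly.fish_lure"],
    "lure": ["lure.decoy"],
    "bait": ["lure.decoy"],
    "decoy": ["lure.decoy"],
}

# Part II: semantic relations between synsets as (source, relation, target).
SYNSET_RELATIONS = [
    ("fly.fish_lure", "is-a", "fish_lure"),
    ("fish_lure", "is-a", "lure.decoy"),
]


def synonyms(word):
    """Other words that share at least one synset with `word`."""
    own = set(WORD_TO_SYNSETS.get(word, []))
    return sorted(
        other
        for other, synsets in WORD_TO_SYNSETS.items()
        if other != word and own & set(synsets)
    )


def is_polysemous(word):
    """A word is polysemous when it represents more than one synset."""
    return len(WORD_TO_SYNSETS.get(word, [])) > 1


def related_synsets(synset, relation):
    """Targets reachable from `synset` through one link of type `relation`."""
    return [dst for src, rel, dst in SYNSET_RELATIONS if src == synset and rel == relation]


print(synonyms("lure"))
print(is_polysemous("fly"))
print(related_synsets("fly.fish_lure", "is-a"))
```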
WSD: variable lexical notations for a concept
• Generic concept notation:
$$D = I \cup J \cup K$$
$$\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$$
$$J = D \cap (\overline{I} \cap \overline{K})$$
since $B = D \cup E \cup F$,
$$D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$$
$$D = B \cap (\overline{E} \cap \overline{F})$$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
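The set identities above can be checked on toy data with Python sets. Note that J = D − (I ∪ K) only holds when I, J and K are pairwise disjoint (and likewise D, E and F); the slide does not state this, so it is treated here as an assumption. The set members and the use of B as the universe for complements are also illustrative choices.

```python
# Toy members; every name below is made up for illustration.
# Assumption: the sets I, J, K, E, F are pairwise disjoint.
I = {"spinner"}
J = {"fly"}
K = {"troll"}
E = {"ground bait"}
F = {"stool pigeon"}

D = I | J | K  # D = I ∪ J ∪ K
B = D | E | F  # B = D ∪ E ∪ F (used here as the universe)


def complement(s):
    """Complement relative to the universe B."""
    return B - s


# J = D ∩ (not I ∩ not K)
j_from_d = D & (complement(I) & complement(K))
# D = B ∩ (not E ∩ not F)
d_from_b = B & (complement(E) & complement(F))
# J = B ∩ ((not E ∩ not F) ∩ (not I ∩ not K))
j_from_b = B & ((complement(E) & complement(F)) & (complement(I) & complement(K)))

print(j_from_d == J, d_from_b == D, j_from_b == J)
```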
WSD: variable lexical notations for a concept
$$J = D \cap (\overline{I} \cap \overline{K}) = (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$$
$$J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$$
where J = fly, D = fish lure, I = spinner, K = troll
Introducing Boolean operators: AND for $\cap$, OR for $\cup$, NOT for the complement (overbar)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
WSD: variable lexical notations for a concept
• (“fly”) becomes: (“fisherman's lure” OR “fish lure”) AND ( (NOT “spinner”) AND (NOT “troll”) )
Then, with B = lure, E = ground bait, F = stool pigeon:
• (“fly”) becomes: (“bait” OR “decoy” OR “lure”) AND ( ((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)) )
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
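The Boolean expansions above follow one pattern: OR together the names of the ancestor concept, then AND in a NOT clause for each sibling excluded at every level. A minimal sketch of that pattern follows; the function names are hypothetical and this is not the paper's implementation. The output chains the exclusion groups with AND rather than nesting them, which is logically equivalent because AND is associative.

```python
def quoted(term):
    """Wrap a term in double quotes for use in a query string."""
    return f'"{term}"'


def any_of(terms):
    """OR together alternative names of the same concept."""
    return "(" + " OR ".join(quoted(t) for t in terms) + ")"


def none_of(terms):
    """AND together the negations of the excluded sibling concepts."""
    return "(" + " AND ".join(f"(NOT {quoted(t)})" for t in terms) + ")"


def sense_query(ancestor_names, excluded_groups):
    """Ancestor names, minus every group of excluded siblings."""
    parts = [any_of(ancestor_names)] + [none_of(g) for g in excluded_groups]
    return " AND ".join(parts)


# One level up: "fly" as a fish lure that is neither a spinner nor a troll.
level_1 = sense_query(["fisherman's lure", "fish lure"], [["spinner", "troll"]])

# Two levels up: "fly" as a lure, excluding siblings at both levels.
level_2 = sense_query(
    ["bait", "decoy", "lure"],
    [["ground bait", "stool pigeon"], ["spinner", "troll"]],
)

print(level_1)
print(level_2)
```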
Notation for synset
• i-level generic notation for a synset
If $S_k$ is a synset and $F_i$ is the synset located i links away from $S_k$ along the hypernym links, then the i-level generic notation for $S_k$ is: [formula not preserved]
• Note: $F_i$ is the parent node of $F_{i-1}$, $F_{i-1}$ is the parent node of $F_{i-2}$, …
• i-level specific notation for a synset
$J = P \cup Q \cup R$; when $P = T$, $Q = U$, $R = V \cup W$, $\therefore J = T \cup U \cup (V \cup W)$
If $S$ is a synset and $L_i$ is the set of synsets $C_{ik}$ located i links away from $S$ along the hyponym links, then the i-level specific regular notation for $S$ is: [formula not preserved]
• Note: if $C_{ik}$ is null, then $C_{(i-1)k}$ is used ($C_{(i-1)k}$ is a leaf node in that case)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
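A minimal sketch of how the synsets used by the two notations could be located: walking i hypernym links up for the generic case, and i hyponym links down (falling back to the leaf when a branch ends early) for the specific case. The toy hierarchy reuses the lure example from the earlier slides; the function names, and returning `None` when the hypernym chain is shorter than i, are assumptions.

```python
# Toy hyponym tree built from the lure example; parent -> children.
HYPONYMS = {
    "lure": ["fish lure", "ground bait", "stool pigeon"],
    "fish lure": ["fly", "spinner", "troll"],
}

# Hypernym links are the reverse of the hyponym links.
HYPERNYM = {child: parent for parent, kids in HYPONYMS.items() for child in kids}


def ancestor_at(synset, i):
    """Synset reached after following i hypernym links (None if the chain ends)."""
    node = synset
    for _ in range(i):
        node = HYPERNYM.get(node)
        if node is None:
            return None
    return node


def descendants_at(synset, i):
    """Synsets i hyponym links below `synset`; a leaf stands in for a missing level."""
    level = [synset]
    for _ in range(i):
        next_level = []
        for node in level:
            children = HYPONYMS.get(node, [])
            # Leaf fallback: keep the leaf itself when it has no children.
            next_level.extend(children if children else [node])
        level = next_level
    return level


print(ancestor_at("fly", 1))
print(ancestor_at("fly", 2))
print(descendants_at("lure", 2))
```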
WSD: Semantic relatedness and word sense disambiguation
• Procedure for determining the semantic relatedness of two given WordNet synsets
• Conception 1: Concepts that appear more frequently and closer to each other are "more related" than concepts that appear less frequently and farther apart.
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
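Conception 1 combines two signals, frequency and closeness. The slides do not give the actual procedure, so the following is only one hypothetical way to turn the idea into a score: every pair of occurrences within a maximum distance contributes the inverse of its distance. The function name, the inverse-distance weighting and the `max_distance` parameter are all assumptions.

```python
def relatedness(tokens, a, b, max_distance=5):
    """Sum of 1/distance over all occurrence pairs of `a` and `b` that lie
    within `max_distance` tokens of each other (hypothetical scoring rule)."""
    positions_a = [i for i, t in enumerate(tokens) if t == a]
    positions_b = [i for i, t in enumerate(tokens) if t == b]
    score = 0.0
    for i in positions_a:
        for j in positions_b:
            distance = abs(i - j)
            if 0 < distance <= max_distance:
                # Closer pairs weigh more; more pairs add more terms.
                score += 1.0 / distance
    return score


# Made-up token sequence for illustration.
TOKENS = ["fly", "lure", "fish", "river", "fly", "fish", "bank", "money"]

fly_fish = relatedness(TOKENS, "fly", "fish")
fly_money = relatedness(TOKENS, "fly", "money")
print(fly_fish > fly_money)  # "fish" co-occurs with "fly" more often and more closely
```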
WSD: Tested on four random texts
• i-level generic notation: i = 1, 2, 3
• Size of the context window (target words vs. context words): 3, 5, 7
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
Thesaurus as a complex network
As a directed graph:
• sinks: the 73,046 terms with k_out = 0
• sources: the 30,260 terms with at least one outgoing link (k_out > 0), i.e. the root words
• absolute source: no incoming links (k_in = 0)
• normal source: k_out > 0 and k_in > 0
• bridge source: no outgoing links to root words (k_out(source) = 0)
Figure labels: 1 – normal source, 2 – bridge source, 3 – absolute source, 4 – sink
Source: arXiv:cond-mat/0312586 v1 2003
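The node types above can be derived from an edge list alone. This is a minimal sketch on a made-up four-node graph. Reporting bridge sources as a separate set (rather than as a fourth exclusive label) is an assumption, since the slide does not say whether a bridge source also counts as a normal or absolute source.

```python
def classify(edges):
    """Label each node as sink, absolute source or normal source, and
    return the set of bridge sources separately."""
    out_links = {}
    in_degree = {}
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        out_links.setdefault(dst, set())
        in_degree[dst] = in_degree.get(dst, 0) + 1
        in_degree.setdefault(src, 0)

    # Root words (sources) are the nodes with k_out > 0.
    roots = {n for n, targets in out_links.items() if targets}

    labels = {}
    for node in out_links:
        if node not in roots:
            labels[node] = "sink"             # k_out = 0
        elif in_degree[node] == 0:
            labels[node] = "absolute source"  # k_out > 0, k_in = 0
        else:
            labels[node] = "normal source"    # k_out > 0, k_in > 0

    # Bridge sources: root words none of whose outgoing links reach a root word.
    bridges = {n for n in roots if not (out_links[n] & roots)}
    return labels, bridges


# Made-up graph: a -> b -> c -> d
labels, bridges = classify([("a", "b"), ("b", "c"), ("c", "d")])
print(labels)
print(bridges)
```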
Thesaurus as a complex network
[Figures: frequency of outgoing links; frequency of incoming links]
Source: arXiv:cond-mat/0312586 v1 2003
Thesaurus as a complex network
Incoming vs. outgoing frequency
Frequency distribution
• k_out – for root words
• k_in – for all words
  - root words in k_out
  - all words in k_in
  - root words in k_in
  - non-root words in k_in
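The four distributions listed above can be computed from an edge list with `collections.Counter`. This is a minimal sketch on a made-up graph; the function name and the dictionary keys are assumptions.

```python
from collections import Counter


def degree_distributions(edges):
    """Map each degree value to the number of words having that degree."""
    k_out = Counter(src for src, _ in edges)
    k_in = Counter(dst for _, dst in edges)
    nodes = set(k_out) | set(k_in)
    roots = set(k_out)  # root words: at least one outgoing link
    return {
        # Counter lookups return 0 for words with no links of that kind.
        "k_out_root": Counter(k_out[n] for n in roots),
        "k_in_all": Counter(k_in[n] for n in nodes),
        "k_in_root": Counter(k_in[n] for n in roots),
        "k_in_non_root": Counter(k_in[n] for n in nodes - roots),
    }


# Made-up edge list for illustration.
dist = degree_distributions([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
for name, counts in dist.items():
    print(name, dict(sorted(counts.items())))
```

Plotting each counter on log-log axes is the usual way to look for the heavy tail that a scale-free network would show.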
Extension of WordNet
• Transforming a tree structure into a matrix structure
• WordNet in other languages (Japanese, Korean, Thai)
• ImageNet interlinked with WordNet
• REBUILDER – a repository of software designs
• Retrieves designs using a Bayesian network and WordNet
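The slide does not say which matrix form is meant by the tree-to-matrix transformation. As one possible reading, the sketch below converts a parent-to-children tree into an adjacency matrix; the function name and the toy tree are assumptions.

```python
def tree_to_matrix(tree):
    """Convert a parent -> children mapping into (labels, adjacency matrix).
    matrix[r][c] is 1 when labels[r] is the parent of labels[c]."""
    labels = sorted(set(tree) | {c for kids in tree.values() for c in kids})
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for parent, children in tree.items():
        for child in children:
            matrix[index[parent]][index[child]] = 1
    return labels, matrix


# Toy tree reusing the lure example.
TREE = {"lure": ["fish lure", "ground bait"], "fish lure": ["fly"]}

labels, matrix = tree_to_matrix(TREE)
print(labels)
for row in matrix:
    print(row)
```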