SIMS 290-2: Applied Natural Language Processing Marti Hearst October 25, 2004
Next Few Classes • This week: lexicons and ontologies • Today: • WordNet’s structure, computing term similarity • Wed: • Guest lecture: Prof. Charles Fillmore on FrameNet • Next week: Enron labeling in class • The entire assignment will be due on Nov 15 • Following week: Question-Answering
Text Categorization Assignment • Great job, you learned a lot! • Comparing to a baseline • Selecting features • Comparing relative usefulness of features • Training, testing, cross-validation • I learned a lot too! (from your results) • (I’ll send you your feedback today)
Text Categorization Assignment • Features • Boosting the weights of terms in the subject line is helpful. • Stemming does help in some circumstances (it often works well with SVM, for example), but not always. • Counter-intuitively, stemming can increase the number of features in our implementation, because conflated forms let more terms pass the minimum-document-occurrence cutoff. • The Porter stemmer can also preserve a useful distinction it might be expected to collapse: it stems "gaseous" to "gase" rather than "gas", so "gas" as motorcycle fuel is not conflated with "gaseous" in the science newsgroup (a stemming sketch follows below).
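As a concrete illustration of the stemming point above, here is a minimal sketch using NLTK's PorterStemmer as a stand-in for whatever stemmer your pipeline used; exact stems can differ across Porter variants, so treat the printed output as something to inspect rather than a fixed result.

```python
# A minimal sketch (not the assignment's code): check which distinctions a
# Porter stemmer preserves, using NLTK's implementation as a stand-in.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
for word in ["gas", "gases", "gaseous"]:
    print(word, "->", stemmer.stem(word))

# Whether "gas" and "gaseous" end up sharing a stem depends on the Porter
# variant; the point is simply to inspect which terms get conflated before
# trusting stemming to help (or hurt) on a particular newsgroup pair.
```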
Text Categorization Assignment • Features • Tokenizations that keep more than the default alphabetic tokens are helpful, probably in part because they capture domain-name information, but also because they capture technical terms. • It's probably best to use the Weka feature selector to tell you what *kind* of features are performing well, rather than to restrict yourself to only the selected features. • I'm surprised that no one tried bigrams or noun-noun compounds as features.
Text Categorization Assignment • Feature Weighting • Tf.idf: Almost everyone who tried it found it worked no better than raw term frequency (there were exceptions). • Binary feature weights with minimum document-count thresholds can be a good substitute. • An interesting variation on tf.idf is to do it in a class-based manner: • weight terms more highly when they occur mostly in one class rather than across classes (a sketch follows below). • A couple of students tried this and got good results on the diverse comparison, but less good results on the homogeneous one. This makes sense, since the measure would not help as much in distinguishing similar newsgroups that share many terms.
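A rough sketch of the class-based weighting idea, under my own assumptions about the data layout (the function name and document format below are illustrative, not from the assignment): terms concentrated in one newsgroup get higher weight than terms spread across all of them.

```python
# Class-based term weighting sketch: boost terms that are concentrated in one
# class's documents relative to terms that appear in every class.
import math
from collections import Counter, defaultdict

def class_based_weights(docs):
    """docs: list of (class_label, token_list). Returns {(label, term): weight}."""
    df_in_class = defaultdict(Counter)   # per-class document frequency of each term
    docs_in_class = Counter()
    for label, tokens in docs:
        docs_in_class[label] += 1
        for term in set(tokens):
            df_in_class[label][term] += 1

    n_classes = len(docs_in_class)
    weights = {}
    for label in docs_in_class:
        for term, df in df_in_class[label].items():
            # how many classes the term shows up in: fewer classes -> higher weight
            classes_with_term = sum(1 for c in df_in_class if term in df_in_class[c])
            tf = df / docs_in_class[label]
            weights[(label, term)] = tf * (math.log(n_classes / classes_with_term) + 1.0)
    return weights

docs = [("sci.space", "the shuttle reached orbit".split()),
        ("rec.motorcycles", "the bike needs gas".split())]
print(class_based_weights(docs)[("sci.space", "orbit")])  # higher than ("sci.space", "the")
```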
Text Categorization Assignment • Classifiers • Naïve-Bayes Multinomial was a clear winner. • SVM worked well most of the time, but not as well as NBM. • Naive Bayes seemed to be more robust to unseen information; the kernel estimator seems to improve on the default Naive Bayes settings. • VotedPerceptron worked very well, but it only does binary classification, so people who found it did very well on the diverse set could not transfer it to the homogeneous set.
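For reference, a minimal sketch of the same comparison using scikit-learn as a modern stand-in for the Weka classifiers discussed above (the category choice, cutoff, and cross-validation setup are my assumptions, not the assignment's).

```python
# Compare multinomial Naive Bayes against a linear SVM on newsgroup text,
# using bag-of-words counts with a minimum document-frequency cutoff.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

data = fetch_20newsgroups(subset="train",
                          categories=["sci.space", "rec.motorcycles"])

for name, clf in [("MultinomialNB", MultinomialNB()), ("LinearSVC", LinearSVC())]:
    pipe = make_pipeline(CountVectorizer(min_df=3), clf)
    scores = cross_val_score(pipe, data.data, data.target, cv=5)
    print(name, round(scores.mean(), 3))
```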
Today • Lexicons, Semantic Nets and Ontologies • The Structure of WordNet • Computing Similarities • Automatic Acquisition of New Terms
Lexicons, Semantic Nets, and Ontologies • Lexicons are (typically) word lists augmented with some subset of: • Parts-of-speech • Different word senses • Synonyms • Semantic Nets • Include links to other terms • IS-A, Part-Of, etc. • Sometimes this term is used for what I call ontologies • Ontologies • Represent concepts and relationships among concepts • Language independent (in principle) • Sometimes include inference rules • Different from definition in philosophy • The science of what is, of the kinds and structures of objects, properties, events, processes and relations in every area of reality
One approach to linking ontologies and lexicons • [Diagram: language-specific lexicons and grammars (Language A, Language B, others) and proprietary terminologies (ICPC, SNOMED, ICD, MEDRA) linked through a formal domain ontology (LinKBase) and a linguistic ontology] • Adapted from slide by W. Ceusters, www.landc.be
Example Ontological Relation Types • IS-SPATIAL-PART-OF, HAS-SPATIAL-PART, IS-PROPER-SPAT.-PART-OF, HAS-PROPER-SPATIAL-PART • IS-TANG.-SPAT.-PART-OF, HAS-TANG.-SPAT.-PART, IS-NON-TANG.-SPAT.-PART-OF, HAS-NON-TANG.-SPAT.-PART • HAS-SPATIAL-POINT-REFERENCE, HAS-CONNECTING-REGION, HAS-EXTERNAL-CONNECTING-REGION, HAS-OVERLAPPING-REGION, HAS-PARTIAL-SPATIAL-OVERLAP • HAS-DISCRETED-REGION, HAS-DISCONNECTED-REGION • IS-INSIDE-CONVEX-HULL-OF, IS-PARTLY-IN-CONVEX-HULL-OF, IS-OUTSIDE-CONVEX-HULL-OF • IS-TOPO-INSIDE-OF, IS-GEO-INSIDE-OF, IS-SPAT.-EQUIV.-OF • Adapted from slide by W. Ceusters, www.landc.be
Example of applying an ontology: joint anatomy • joint HAS-HOLE joint space • joint capsule IS-OUTER-LAYER-OF joint • meniscus • IS-INCOMPLETE-FILLER-OF joint space • IS-TOPO-INSIDE joint capsule • IS-NON-TANGENTIAL-MATERIAL-PART-OF joint • joint • IS-CONNECTOR-OF bone X • IS-CONNECTOR-OF bone Y • synovia • IS-INCOMPLETE-FILLER-OF joint space • synovial membrane IS-BONAFIDE-BOUNDARY-OF joint space This doesn’t include the linguistic side Adapted from slide by W. Ceusters, www.landc.be
Linking Lexicons and Ontologies • [Diagram: concepts such as Generalised Possession, Human, Healthcare phenomenon, Having a healthcare phenomenon, Patient, Risk Factor, and Patient at risk connected by IS-A, Has-possessor / Has-possessed / Is-possessor-of, Has-Healthcare-phenomenon, and Is-Risk-Factor-Of relations, specializing down to Patient at risk for osteoporosis, Risk factor for osteoporosis, and Osteoporosis] • Adapted from slide by W. Ceusters, www.landc.be
Linking different lexicons • [Diagram: Snomed-RT "Convulsion" and "Seizure" (related by ISA) and MESH-2001 "Seizures" and "Convulsions" (related by IS-narrower-than) are each linked via Has-CCC to L&C concepts, where L&C Seizure and L&C Convulsion are IS-A children of L&C Health crisis, and L&C Epileptic convulsion is an IS-A child of both] • Adapted from slide by W. Ceusters, www.landc.be
WordNet • A big lexicon with properties of a semantic net • Started as a language project by Dr. George Miller and Dr. Christiane Fellbaum at Princeton • First became available in 1990 • Now on version 2.0
WordNet • Huge amounts of research (and products) use it
WordNet Relations • Original core relations: • Synonymy • Polysemy • Metonymy • Hyponymy/Hyperonymy • Meronymy • Antonymy • New, useful additions for NLP • Glosses • Links between derivationally and semantically related noun/verb pairs. • Domain/topical terms • Groups of similar verbs • Others on the way • Disambiguation of terms in glosses • Topical clustering.
Synonymy • Different ways of expressing related concepts • Examples: cat, feline, Siamese cat • Synonyms are almost never truly substitutable: • Used in different contexts • Have different implications • This is a point of contention.
Polysemy • Most words have more than one sense • Homonymy: same word, different meaning • bank (river) vs. bank (financial) • Polysemy: different senses of the same word • That dog has floppy ears. / She has a good ear for jazz. • bank (financial) has several related senses: the building, the institution, the notion of where money is stored
Metonymy • Use one aspect of something to stand for the whole • The building stands for the institution of the bank. • Newscast: "The White House released new figures today." • Waitperson: "The ham sandwich spilled his drink."
Hyponymy/Hyperonymy • ISA relation • Related to Superordinate and Subordinate level categories • hyponym(robin,bird) • hyponym(bird,animal) • hyponym(emu,bird) • A is a hypernym of B if B is a type of A • A is a hyponym of B if A is a type of B
Meronymy • Parts-of relation • part of(beak, bird) • part of(bark, tree) • Transitive conceptually but not lexically: • The knob is a part of the door. • The door is a part of the house. • ? The knob is a part of the house ?
Antonymy • Lexical opposites • antonym(large, small) • antonym(big, small) • antonym(big, little) • but not large, little • Many antonymous relations can be reliably detected by looking for statistical correlations in large text collections. (Justeson & Katz 91)
Using WordNet in Python • from wordnet import * • from wntools import *
Using WordNet to Determine Similarity • The "meet" function in the Python WordNet tool finds the closest common parent (lowest common hypernym) of two terms
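The slides use the older pywordnet package shown above; as a modern equivalent (not the tool on the slide), NLTK's WordNet interface exposes the same idea through lowest_common_hypernyms, along with relation-browsing calls for the hypernym/hyponym and meronym relations discussed earlier.

```python
# Modern stand-in for pywordnet's meet(): find the closest common parent of two
# synsets with NLTK (requires nltk.download('wordnet')).
from nltk.corpus import wordnet as wn

cat = wn.synset('cat.n.01')
dog = wn.synset('dog.n.01')
print(cat.lowest_common_hypernyms(dog))       # closest shared ancestor, e.g. carnivore

# The relations from the earlier slides are available as methods on synsets:
print(dog.hypernyms())                        # ISA parents
print(wn.synset('bird.n.01').hyponyms())      # more specific kinds of bird
print(wn.synset('tree.n.01').part_meronyms()) # parts of a tree
```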
Similarity by Path Length • Count the edges (is-a links) between two concepts and scale • Leacock and Chodorow, 1998 • lch(c1,c2) = -log [ length(c1,c2) / (2 * max-depth) ] • Wu and Palmer, 1994 • wup(c1,c2) = 2 * depth(lcs(c1,c2)) / [depth(c1) + depth(c2)]
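Both path-based measures above are implemented in NLTK's WordNet interface (as well as in the Perl package mentioned a few slides on); a quick illustration:

```python
# Leacock-Chodorow and Wu-Palmer similarity via NLTK's WordNet interface.
from nltk.corpus import wordnet as wn

cat, dog = wn.synset('cat.n.01'), wn.synset('dog.n.01')
print(cat.lch_similarity(dog))   # -log( path_length / (2 * taxonomy depth) )
print(cat.wup_similarity(dog))   # 2*depth(lcs) / (depth(c1) + depth(c2))
```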
Problems with Path Length • The lengths of the paths are irregular across the hierarchies • Words that should be in the same hierarchy might not be • How do we relate terms that are not in the same hierarchies? • The "tennis problem": • Player • Racquet • Ball • Net • Are all in separate hierarchies • The WordNet team is working on developing such linkages
Similarity by Information Content • IC estimated from a corpus of text (Resnik, 1995) • IC(concept) = -log(P(concept)) • Specific concept: high IC (pitchfork) • General concept: low IC (instrument) • To estimate it: • Count occurrences of each concept • Given a word, increment the count of all concepts associated with that word • increment bank as financial institution and also as river shore • Assume senses occur uniformly, lacking evidence to the contrary (e.g., sense-tagged text) • Counts propagate up the hierarchy (a sketch follows below)
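A rough sketch of that counting procedure under my own simplifications (nouns only, uniform credit across senses; the names below are illustrative, not Resnik's code):

```python
# Estimate IC(concept) = -log P(concept): each word occurrence credits all of
# its noun synsets and their hypernym ancestors, so counts flow up the hierarchy.
import math
from collections import Counter
from nltk.corpus import wordnet as wn

def estimate_ic(tokens):
    counts, total = Counter(), 0
    for tok in tokens:
        senses = wn.synsets(tok, pos=wn.NOUN)
        if not senses:
            continue
        total += 1
        credit = 1.0 / len(senses)        # no sense tags: spread the count uniformly
        for s in senses:
            for concept in {s} | set(s.closure(lambda x: x.hypernyms())):
                counts[concept] += credit  # propagate up to every ancestor
    return {c: -math.log(n / total) for c, n in counts.items()}

ic = estimate_ic("he leaned the pitchfork against a farm instrument".split())
print(sorted(ic.items(), key=lambda kv: kv[1])[:3])  # most general concepts have the lowest IC
```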
Information Content as Similarity • Resnik, 1995 • res(c1,c2) = IC(lcs(c1,c2)) • Jiang and Conrath, 1997 • jcn(c1,c2) = 1 / [IC(c1) + IC(c2) – 2*res(c1,c2)] • Lin, 1998 • lin(c1,c2) = 2*res(c1,c2) / [IC(c1) + IC(c2)] • All of these (and more!) are implemented in a Perl package • WordNet::Similarity (and the related SenseRelate tools), Pedersen et al. • http://wn-similarity.sourceforge.net/
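The same measures are also available in NLTK, using information-content files pre-computed from a corpus (here the Brown corpus); this is a stand-in for the Perl package, not the package itself.

```python
# Resnik, Jiang-Conrath, and Lin similarity with NLTK and Brown-corpus IC
# (requires nltk.download('wordnet') and nltk.download('wordnet_ic')).
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
cat, dog = wn.synset('cat.n.01'), wn.synset('dog.n.01')
print(cat.res_similarity(dog, brown_ic))   # IC of the lowest common subsumer
print(cat.jcn_similarity(dog, brown_ic))   # 1 / (IC(c1) + IC(c2) - 2*IC(lcs))
print(cat.lin_similarity(dog, brown_ic))   # 2*IC(lcs) / (IC(c1) + IC(c2))
```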
Rearranging WordNet • Try to fix the top-level hierarchies • Parse the glosses for more information • eXtended WordNet project: http://xwn.hlt.utdallas.edu/
Augmenting WordNet • Lexico-syntactic Patterns (Hearst 92, 97)
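A toy sketch of one such lexico-syntactic pattern, the "NP such as NP, NP, and NP" pattern from Hearst (1992), implemented here with a crude regular expression over plain text rather than the part-of-speech patterns of the original work (the function name is illustrative):

```python
# Extract (hypernym, hyponyms) pairs with the "X such as Y1, Y2 and Y3" pattern.
# This toy version grabs single-word hypernyms and comma/and/or-separated lists;
# the real patterns operate over noun phrases, not raw words.
import re

SUCH_AS = re.compile(r"(\w+)\s+such as\s+([\w\s,]+?)(?:\.|$)", re.IGNORECASE)

def hearst_such_as(text):
    pairs = []
    for hypernym, tail in SUCH_AS.findall(text):
        hyponyms = [h.strip() for h in re.split(r",\s*|\s+and\s+|\s+or\s+", tail) if h.strip()]
        pairs.append((hypernym, hyponyms))
    return pairs

print(hearst_such_as("The patient had injuries such as bruises, cuts and broken bones."))
# -> [('injuries', ['bruises', 'cuts', 'broken bones'])]
```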
Acquisition using the Web • Towards Terascale Knowledge Acquisition, Pantel and Lin '04 • Use a co-occurrence model and a huge collection (the Web) to find similar terms • Input: a cluster of related words • Feature vectors are computed for each word from its contexts, e.g., "catch ___" • Compute the mutual information between the word and the context • "Average" the features over each class to create a grammatical template for that class
Acquisition using the Web • Use this template to find new examples of this class of terms (but it makes many errors)
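A toy illustration of the mutual-information scoring step with made-up co-occurrence counts (this is not Pantel and Lin's system, just pointwise mutual information between a word and a context feature such as "catch ___"):

```python
# pmi(word, context) = log[ P(word, context) / (P(word) * P(context)) ]
import math
from collections import Counter

pair_counts = Counter({("cold", "catch ___"): 40, ("ball", "catch ___"): 120,
                       ("cold", "have a ___"): 200, ("ball", "kick the ___"): 90})
total = sum(pair_counts.values())
word_counts, ctx_counts = Counter(), Counter()
for (w, c), n in pair_counts.items():
    word_counts[w] += n
    ctx_counts[c] += n

def pmi(word, context):
    p_joint = pair_counts[(word, context)] / total
    p_word = word_counts[word] / total
    p_ctx = ctx_counts[context] / total
    return math.log(p_joint / (p_word * p_ctx))

# "ball" is more strongly associated with the "catch ___" context than "cold" is
print(pmi("ball", "catch ___"), pmi("cold", "catch ___"))
```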
Next Time • FrameNet • A background paper is on the class website • (Not required to read it beforehand)