Concept Hierarchy Induction by Philipp Cimiano presented by Joseph Park
Concept Hierarchies • Structure information into categories • Provide a level of generalization • Form the backbone of any ontology
Common Approaches • Machine readable dictionaries • Lexico-syntactic patterns • Distributional similarity • Co-occurrence analysis
Machine readable dictionaries • Exploit the regular structure of dictionary definitions • Find a hypernym for the defined word • The hypernym is the head of the first NP of the definition (the genus or kernel term) • spring "the season between winter and summer and in which leaves and flowers appear" • hornbeam "a type of tree with a hard wood, sometimes used in hedges" • launch "a large usu. motor-driven boat used for carrying people on rivers, lakes, harbors, etc."
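The genus-term heuristic can be sketched as follows. This is a toy approximation, not a parser: it assumes the first NP ends at one of a hypothetical list of boundary words, drops a leading determiner, and skips "empty" heads such as "type of". The function name `genus_term` and both word lists are illustrative assumptions.

```python
# Toy genus-term extractor for dictionary glosses (illustrative only).
DETERMINERS = {"a", "an", "the"}
# Hypothetical boundary words assumed to close the first NP of a gloss.
BOUNDARIES = {"between", "with", "used", "in", "which", "that", "for", "of", "and", "or"}
# "Empty" heads: in "a type of tree" the useful genus term is "tree".
EMPTY_HEADS = {"type", "kind", "sort"}


def genus_term(gloss: str) -> str:
    """Return the head of the (approximated) first NP of a gloss."""
    tokens = gloss.lower().replace(",", " ").split()
    if tokens and tokens[0] in DETERMINERS:
        tokens = tokens[1:]
    # Skip "type of", "kind of", ... before looking for the NP.
    while len(tokens) >= 2 and tokens[0] in EMPTY_HEADS and tokens[1] == "of":
        tokens = tokens[2:]
    chunk = []
    for tok in tokens:
        if tok in BOUNDARIES:
            break
        chunk.append(tok)
    # English NPs are head-final, so the last token of the chunk is the head.
    return chunk[-1] if chunk else ""
```

On the three glosses above this yields "season", "tree" and "boat"; glosses with other structures would need a real chunker.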
Lexico-syntactic patterns • Hearst patterns • Hearst1: NP such as {NP,}* {(and | or)} NP • Hearst2: such NP as {NP,}* {(and | or)} NP • Hearst3: NP {,NP}* {,} or other NP • Hearst4: NP {,NP}* {,} and other NP • Hearst5: NP including {NP,}* {(and | or)} NP • Hearst6: NP especially {NP,}* {(and | or)} NP • Desirable properties of such patterns: • They should occur frequently and in many text genres • They should accurately indicate the relation of interest • They should be recognizable with little or no pre-encoded knowledge
Example of using a Hearst pattern • 'Such injuries as bruises, wounds and broken bones...' • hyponym(bruise, injury) • hyponym(wound, injury) • hyponym(broken bone, injury)
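A minimal regex sketch of Hearst1 and Hearst2 that reproduces the example above. It assumes lowercase text, a single-word hypernym, an enumeration that runs up to the next punctuation mark other than a comma, and a crude suffix-stripping lemmatizer; a real system would find NPs with a POS tagger or chunker. All names (`hyponyms`, `singular`, `PATTERNS`) are hypothetical.

```python
import re

# Simplification (assumption): an enumeration is a run of lowercase words
# separated by commas/spaces, ending at the first other punctuation mark.
ENUM = r"[a-z]+(?:(?:,\s*|\s+)[a-z]+)*"
PATTERNS = {
    "Hearst1": re.compile(rf"\b(?P<hyper>[a-z]+),? such as (?P<hypos>{ENUM})"),
    "Hearst2": re.compile(rf"\bsuch (?P<hyper>[a-z]+) as (?P<hypos>{ENUM})"),
}
# Split an enumeration on commas and on "and"/"or".
SPLIT = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def singular(phrase: str) -> str:
    """Crude lemmatizer: singularize only the head (last word) of the NP."""
    *mods, head = phrase.split()
    if head.endswith("ies"):
        head = head[:-3] + "y"
    elif head.endswith("s") and not head.endswith("ss"):
        head = head[:-1]
    return " ".join(mods + [head])


def hyponyms(text: str) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the patterns above."""
    pairs = []
    lowered = text.lower()
    for pattern in PATTERNS.values():
        for m in pattern.finditer(lowered):
            hyper = singular(m.group("hyper"))
            for hypo in SPLIT.split(m.group("hypos")):
                if hypo:
                    pairs.append((singular(hypo), hyper))
    return pairs
```

For the example sentence, `hyponyms("Such injuries as bruises, wounds and broken bones...")` returns `[("bruise", "injury"), ("wound", "injury"), ("broken bone", "injury")]`. Without an NP chunker, an enumeration followed by more words in the same clause (e.g. "... broken bones were reported") would swallow those words too.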
Distributional similarity • Distributional hypothesis • Words are similar to the extent that they share similar contexts • “You shall know a word by the company it keeps” (Firth)
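The distributional hypothesis can be made concrete with context vectors: count the words within a fixed window around each word, then compare two words by the cosine of their vectors. The window size of 2, the toy corpus and the function names are assumptions for illustration only.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy corpus.
CORPUS = [
    "i drink hot coffee every morning",
    "i drink hot tea every morning",
    "i drive my car to work",
]


def context_vectors(sentences: list[str], window: int = 2) -> dict[str, Counter]:
    """Count the words occurring within `window` positions of each word."""
    vectors: dict[str, Counter] = {}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(ctx)
    return vectors


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0
```

With `vecs = context_vectors(CORPUS)`, "coffee" and "tea" occur in identical contexts, so their cosine is 1.0, while "coffee" and "car" share no context words and score 0.0.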
Co-occurrence analysis • Collocation • Document-based subsumption • a term t1 is more specific than a term t2 if t2 also appears in all the documents in which t1 appears
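Document-based subsumption reduces to a check over the documents that contain t1. In this sketch documents are sets of terms; the optional `threshold` parameter is our own addition for a relaxed variant (1.0 gives the strict definition above), and it, the function name and the toy documents are hypothetical.

```python
# Hypothetical toy collection: each document is the set of terms it contains.
DOCS = [
    {"hotel", "accommodation", "pool"},
    {"hotel", "accommodation"},
    {"apartment", "accommodation"},
    {"apartment", "kitchen"},
]


def more_specific(t1: str, t2: str, docs: list[set[str]], threshold: float = 1.0) -> bool:
    """True if t2 occurs in (at least a `threshold` fraction of) the documents containing t1."""
    if t1 == t2:
        return False
    with_t1 = [doc for doc in docs if t1 in doc]
    if not with_t1:
        return False
    share = sum(t2 in doc for doc in with_t1) / len(with_t1)
    return share >= threshold
```

Here "hotel" is more specific than "accommodation", but not the other way round, since "hotel" is missing from one of the three documents that mention "accommodation".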
Three More Approaches • Formal Concept Analysis (FCA) • Guided Clustering • Learning from heterogeneous sources of evidence
Formal Concept Analysis • Set-theoretical approach • Parse corpus (extract dependencies) • Verb-pp-complement • Verb-object • Verb-subject • Extract surface dependencies (section 4.1.4)
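Once dependencies are extracted, each noun can be described by the verbs it occurs with (e.g. "book a hotel" → attribute `bookable`), and FCA computes the formal concepts of this object/attribute table. The sketch below uses naive enumeration: the intents are all intersections of object intents plus the full attribute set. The toy context and names are hypothetical, and practical FCA tools use faster algorithms such as NextClosure.

```python
# Hypothetical toy context: nouns mapped to attributes derived from verbs.
CONTEXT = {
    "hotel": {"bookable"},
    "apartment": {"bookable", "rentable"},
    "car": {"bookable", "rentable", "driveable"},
    "bike": {"bookable", "rentable", "driveable", "rideable"},
    "excursion": {"bookable", "joinable"},
    "trip": {"bookable", "joinable"},
}


def formal_concepts(context: dict[str, set[str]]) -> list[tuple[frozenset, frozenset]]:
    """Enumerate all (extent, intent) pairs, largest extents first."""
    all_attrs = frozenset().union(*context.values())
    # Closed intents = full attribute set plus all intersections of object intents.
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {intent & frozenset(attrs) for intent in intents}
    concepts = []
    for intent in intents:
        # Extent: every object that has all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def is_subconcept(c1: tuple[frozenset, frozenset], c2: tuple[frozenset, frozenset]) -> bool:
    """Concepts are ordered by inclusion of their extents."""
    return c1[0] <= c2[0]
```

Ordering the concepts by extent inclusion gives the hierarchy: for instance ({car, bike}, {bookable, rentable, driveable}) lies below ({apartment, car, bike}, {bookable, rentable}), and the top concept groups all six nouns under {bookable}.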
Guided Clustering • Uses hypernyms from WordNet and Hearst patterns
Heterogeneous sources of evidence • Naïve threshold classifier • Uses Hearst patterns for corpus patterns • Uses Google API for web patterns • Uses Hearst patterns over downloaded pages • Uses WordNet senses • Uses ‘head’-heuristic (r-match) • Uses corpus based subsumption • Uses document based subsumption
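One way to picture the combination of evidence: each source maps a candidate pair (hyponym, hypernym) to a score in [0, 1], and a threshold θ decides. Averaging the scores, the value θ = 0.4, the hard-coded Hearst scores and all names are assumptions for this sketch, not the setup of the original work. Only two sources are modeled: the head heuristic (read here as "the hypernym is the last word of a multiword hyponym") and a precomputed Hearst-pattern score.

```python
from typing import Callable

Feature = Callable[[str, str], float]

# Hypothetical precomputed Hearst-pattern scores for (hyponym, hypernym) pairs.
HEARST_SCORES = {("bruise", "injury"): 0.9, ("wound", "injury"): 0.8}


def head_match(hypo: str, hyper: str) -> float:
    """Head heuristic: 'broken bone' is-a 'bone' because 'bone' is its head."""
    words = hypo.split()
    return 1.0 if len(words) > 1 and words[-1] == hyper else 0.0


def hearst_score(hypo: str, hyper: str) -> float:
    return HEARST_SCORES.get((hypo, hyper), 0.0)


def threshold_classifier(features: dict[str, Feature], theta: float) -> Callable[[str, str], bool]:
    """Predict is-a when the mean feature score reaches theta (assumed rule)."""
    def classify(hypo: str, hyper: str) -> bool:
        if not features:
            return False
        scores = [f(hypo, hyper) for f in features.values()]
        return sum(scores) / len(scores) >= theta
    return classify


classify = threshold_classifier({"head": head_match, "hearst": hearst_score}, theta=0.4)
```

With these toy scores, `classify("bruise", "injury")` and `classify("broken bone", "bone")` both return True, and `classify("wound", "bone")` returns False. Each further source in the list above would simply be one more feature function.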