This study analyzes the topology of WordNet using various metrics like branching factor, depth versus height, and cluster coefficients. It delves into multiple inheritance and specificity examples, providing insights into the WordNet structure.
The Topology of WordNet: some metrics • Ann Devitt and Carl Vogel • Computational Linguistics Group, Trinity College Dublin, Ireland
Introduction • Measures • WordNet “sub-hierarchies” • Multiple inheritance • Branching Factor • Depth versus Height • Cluster coefficients • Specificity pilot study Ann Devitt, TCD
Terminology • WordNet as a directed acyclic graph • "Node" and "synset" are used interchangeably
Dimensional distribution
Overlap between hierarchies • 2072 synsets belong to more than 1 top hierarchy • 35 synsets belong to more than 2 top hierarchies
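One way such overlap counts could be obtained is to collect, for every node, the set of top-level nodes it can reach by following parent links. The sketch below does this on a toy graph; the node names, the edges, and the helper `tops` are illustrative assumptions, not WordNet data or the authors' procedure.

```python
from functools import lru_cache

# Toy hypernym graph: node -> list of parents (assumed edges, not WordNet data).
parents = {
    "abstraction": [],
    "event": [],
    "entity": [],
    "act": ["event"],
    "group_action": ["act", "abstraction"],
    "object": ["entity"],
}

@lru_cache(maxsize=None)
def tops(node):
    """Return the set of root nodes reachable from `node` via parent links."""
    if not parents[node]:
        return frozenset([node])
    return frozenset().union(*(tops(p) for p in parents[node]))

# Nodes that sit under more than one top-level hierarchy.
overlap = [node for node in parents if len(tops(node)) > 1]
print(overlap)
```

Memoising `tops` keeps the traversal linear in the number of edges, which matters once the graph has tens of thousands of nodes.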
Some overlap examples • Abstraction and Event: 948 synsets, e.g. group action • Entity and Group: 250 nodes, e.g. weaponry
Multiple inheritance • 2.6% of nodes • Normal distribution throughout depth • Significantly different in different taxonomies: χ²(8, N = 75180) = 324.27, p ≤ 0.001
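As a minimal sketch of how the share of multiply inherited nodes could be measured, the snippet below counts nodes with more than one parent in a toy graph. The graph and the function name `multiple_inheritance_share` are assumptions made for illustration only.

```python
# Toy hypernym graph: node -> list of parents (assumed edges, not WordNet data).
parents = {
    "entity": [],
    "organism": ["entity"],
    "causal_agent": ["entity"],
    "person": ["organism", "causal_agent"],
    "animal": ["organism"],
    "dog": ["animal"],
}

def multiple_inheritance_share(parents):
    """Return the fraction of nodes that have more than one parent."""
    multi = [node for node, ps in parents.items() if len(ps) > 1]
    return len(multi) / len(parents)

share = multiple_inheritance_share(parents)
print(f"{share:.1%} of nodes have more than one parent")
```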
Specificity examples • Parents = 1, depth < 3: damnation, office • Parents = 1, depth > 8: beagle, palomino • Parents > 1, depth < 3: person, artefact • Parents > 1, depth > 8: sea bass, self-condemnation, bombardon
Branching Factor • Number of children + 1 • Including leaf nodes: range 1 – 573, average 2.023 • Excluding leaf nodes: average 5.793 • 97% less than 20
Branching factor • Overall low branching factor • Same distribution in all sub-hierarchies • Large number of nodes in total • Greater overall depth in paths • Not a shallow structure, despite 55,000 leaf nodes
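The definition used on the slides (number of children + 1) can be sketched as follows, with averages taken once over all nodes and once over non-leaf nodes only. The toy hierarchy is an assumption for illustration and is not WordNet data.

```python
from statistics import mean

# Toy hierarchy: node -> list of children (assumed edges, not WordNet data).
children = {
    "entity": ["organism", "object"],
    "organism": ["animal", "plant", "person"],
    "object": [],
    "animal": ["dog"],
    "plant": [],
    "person": [],
    "dog": [],
}

def branching_factor(node):
    # Slide definition: number of children + 1, so a leaf node scores 1.
    return len(children[node]) + 1

# Branching factors over all nodes, and over non-leaf nodes only.
all_nodes = [branching_factor(n) for n in children]
internal = [branching_factor(n) for n in children if children[n]]

print("including leaves:", min(all_nodes), max(all_nodes), mean(all_nodes))
print("excluding leaves:", mean(internal))
```

Because every leaf contributes the minimum value of 1, a hierarchy with many leaves pulls the overall average down sharply, which is the pattern the two reported averages show.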
Depth vs Height • Depth: maximum = 18, normal distribution • Height: maximum = 5, Zipfian distribution; 93.6% of nodes are 1 or 2 nodes from a leaf node
Depth vs Height • Reported distributions are the same across the different sub-hierarchies • Depth is a more informative measure
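A minimal sketch of the two distance measures on a toy graph follows. The slides do not say how depth and height are resolved when a node has several parents or children, so the choices here (shortest distance to a root, shortest distance to a leaf, both counted from 0) are assumptions, as are the node names and edges.

```python
from functools import lru_cache

# Toy DAG: node -> list of children (assumed edges, not WordNet data).
children = {
    "entity": ["organism", "object"],
    "organism": ["animal", "person"],
    "object": ["artefact"],
    "animal": ["dog"],
    "person": [],
    "artefact": [],
    "dog": ["beagle"],
    "beagle": [],
}

# Invert the child lists to obtain each node's parents.
parents = {node: [] for node in children}
for node, kids in children.items():
    for kid in kids:
        parents[kid].append(node)

@lru_cache(maxsize=None)
def depth(node):
    # Assumption: depth is the shortest distance to a root (root = 0).
    if not parents[node]:
        return 0
    return 1 + min(depth(p) for p in parents[node])

@lru_cache(maxsize=None)
def height(node):
    # Assumption: height is the shortest distance to a leaf (leaf = 0).
    if not children[node]:
        return 0
    return 1 + min(height(c) for c in children[node])

for node in children:
    print(node, depth(node), height(node))
```

Under this reading a node high in the hierarchy can still have a small height whenever one of its children is a leaf, which would be consistent with height saturating at a small maximum while depth keeps growing.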
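Clustering coefficient • Measure of graph connectivity • Ratio of the number of connections between nodes to the possible number of connections: 2Σ_i / (k_i (k_i – 1)), where k_i is the number of neighbours of node i and Σ_i the number of connections between those neighbours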
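The ratio above can be sketched in the form given by Watts and Strogatz: for a node with k neighbours, twice the number of links among those neighbours divided by k(k – 1). The toy graph, the undirected treatment of edges, and the convention of returning 0 for nodes with fewer than two neighbours are all assumptions made for this illustration.

```python
from itertools import combinations

# Toy undirected graph as adjacency sets (assumed edges, not WordNet data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

def clustering_coefficient(graph, node):
    """Ratio of actual to possible links among the neighbours of `node`."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # assumed convention: no neighbour pairs, so report 0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))

for node in graph:
    print(node, clustering_coefficient(graph, node))
```

In a strict tree no two neighbours of a node are ever directly linked, so this first-order coefficient is 0 for every node; a non-zero value requires a triangle, for example a node whose parent and grandparent are both direct neighbours.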
Cluster coefficients • First-order measure • Not useful for WordNet • Only 62 nodes have a coefficient > 0 • Does not form clusters readily
Cluster coefficients • Second-order measure • Average 0.337 • Normal distribution • May form clusters of wider diameter
Pilot Study Aims • Do people have a notion of generality/specificity for concepts? • Do people agree on what is more/less general/specific? • What features of WordNet do these judgments correlate with?
Sample ranking task I • Axis, axis of rotation – (the center around which something rotates) • River boat – (a boat used on rivers or to ply a river) • Remains – (any object that is left unused or still extant; “I threw out the remains of my dinner”)
Sample ranking task II • rational motive - (a motive that can be defended by reasoning or logical argument) • disapproval - (the act of disapproving or condemning) • harmony, concord, concordance - (agreement of opinions)
Do people agree on what is more/less general/specific? Yes • Cochran's Q statistic (Cochran 1950) • H0: any agreement between respondents is due to chance • Overall, for 11 respondents: Cochran's Q = 165.859, 44 degrees of freedom, asymptotic significance p < .001
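For readers unfamiliar with the test, Cochran's Q for a table of binary outcomes can be sketched as below, using the standard formula with k treatments (columns) and k – 1 degrees of freedom. The data table is invented for illustration, and how the study's responses were actually coded into such a table is not stated in the slides.

```python
def cochran_q(table):
    """Cochran's Q for rows (blocks) of 0/1 outcomes, one column per treatment.

    Returns the statistic and its degrees of freedom (k - 1).
    """
    k = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)  # grand total of successes
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - n * n)
    denominator = k * n - sum(r * r for r in row_totals)
    # The denominator is 0 if every row is all 0s or all 1s; not handled here.
    return numerator / denominator, k - 1

# Invented example: 4 blocks (rows) by 3 treatments (columns).
table = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
q, df = cochran_q(table)
print(q, df)
```

Under the null hypothesis Q is approximately chi-square distributed with k – 1 degrees of freedom, so a p-value can be read from a chi-square table or computed with a statistics library.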
What WordNet features correlate? • Depth: less deep = more general • Children: inconclusive • Sisters: fewer sisters = more general • Sub-hierarchy: did not seem to affect judgments, but did increase the difficulty of the task
Conclusion • WordNet metrics: inheritance (sub-hierarchy and parentage), branching factor, distance (depth and height), clustering • Pilot study: suggests where to go with a larger study
Bibliography • W. G. Cochran: The comparison of percentages in matched samples. Biometrika 37, 256–266 (1950) • D. S. Touretzky: The Mathematics of Inheritance Systems. Los Altos, CA: Morgan Kaufmann (1986) • D. J. Watts and S. H. Strogatz: Collective dynamics of 'small-world' networks. Nature 393, 440–442 (1998)
Multiple Inheritance vs Depth