
Graph theoretic analyses:

Large-Scale Organization of Semantic Networks Mark Steyvers Josh Tenenbaum Stanford University. Graph theoretic analyses:. Collaboration network of film actors, scientists Watts & Strogatz (1998); Newman (2001) Neural network of worm: C. elegans Watts & Strogatz (1998) WWW


Presentation Transcript


  1. Large-Scale Organization of Semantic Networks Mark Steyvers Josh Tenenbaum Stanford University

  2. Graph theoretic analyses: • Collaboration network of film actors, scientists: Watts & Strogatz (1998); Newman (2001) • Neural network of worm C. elegans: Watts & Strogatz (1998) • WWW: Barabasi & Albert (1999)

  3. Overview • Link structure of semantic networks: Small-world & scale free • What produces such link structures? Semantic growth • Relation to age-of-acquisition effects • Behavioral effects of link structure

  4. Word Association: Nelson et al. (1999) • n words = 5,000+

  5. Roget’s Thesaurus Categories 1,000 Word forms 29,000+

  6. WordNet: George Miller • Word senses 99,000+ • Word forms 122,000+

  7. One class of Small World Networks (values listed in the order Word Association, Roget, WordNet) • n = number of nodes: 5018, 200,000+, 30,000+ • 1. Short path lengths: L = average length of shortest path between two nodes: 3.04, 10.6, 5.6 (random graphs: 3.03, 10.61, 5.43) • 2. Local clustering: C = 3 x (number of triangles) / (number of connected triples of vertices), ranging from C=0 to C=1: .186, .029, .875 (random graphs: .004, .000, .613) • 3. Power-law degree distribution: γ = exponent in power-law distribution: 3.01, 3.19, 3.11 (random graphs: -, -, -)
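The two small-world statistics defined on this slide can be computed directly from an adjacency structure. The following is a minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the function names and the toy graph are illustrative and are not taken from the presentation.

```python
from collections import deque
from itertools import combinations


def average_path_length(adj):
    """L: mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from `source`
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering(adj):
    """C = 3 x triangles / connected triples of vertices."""
    closed, triples = 0, 0
    for node, nbrs in adj.items():
        # Every pair of neighbors forms a connected triple centered on `node`.
        for a, b in combinations(sorted(nbrs), 2):
            triples += 1
            if b in adj[a]:
                closed += 1  # each triangle is counted once per corner, i.e. 3 times
    return closed / triples if triples else 0.0


# Hypothetical toy graph: a triangle (a, b, c) with one extra node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(clustering(toy_graph))           # 3 closed triples out of 5
print(average_path_length(toy_graph))  # 16 / 12
```

Counting closed triples per center node avoids enumerating triangles separately, since each triangle contributes exactly three closed triples.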

  8. Power law vs. exponential degree distributions • Power law: power-law tail, with HUBS • Exponential: exponential tail, e.g., random graphs (Erdős-Rényi) or Watts & Strogatz (1998) model
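For reference, the two tail shapes contrasted on this slide are commonly written as below. The symbols P(k) for the degree distribution, γ for the power-law exponent, and κ for the exponential decay scale are assumed standard notation, not the slide's own formulas.

```latex
P(k) \sim k^{-\gamma} \quad \text{(power-law tail)}
\qquad\qquad
P(k) \sim e^{-k/\kappa} \quad \text{(exponential tail)}
```

A power-law tail decays slowly enough that a few nodes with very high degree (hubs) are expected, whereas an exponential tail makes such nodes vanishingly rare.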

  9. Power-law exponents of the three networks: γ=3.01, γ=3.19, γ=3.11

  10. Zipf’s (1949) “Law of Meaning”: number of meanings vs. word frequency rank • Slope in rank plot: α = .466 • Adamic (2000): γ = 1 + 1/α • Slope in distribution plot: γ = 3.15
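The conversion attributed to Adamic (2000) on this slide can be checked with the slide's own numbers. This is a small sketch; the function name is illustrative.

```python
def distribution_exponent(rank_slope):
    """Convert a rank-plot slope (alpha) to a distribution exponent: 1 + 1/alpha."""
    return 1.0 + 1.0 / rank_slope


# Rank-plot slope of .466 from the slide.
print(round(distribution_exponent(0.466), 2))  # 3.15
```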

  11. Overview • Link structure of semantic networks: Small-world & scale free • What produces such link structures? Semantic growth • Relation to age-of-acquisition effects • Behavioral effects of link structure

  12. H.A. Simon (1955). Power laws in distributions: • Scientists by number of papers published • Cities by population • Income by size -> “rich get richer” growth-like stochastic process • Barabasi et al. (1999). Power laws in WWW • in-degree & out-degree -> growth processes

  13. Proposal: Power-law degree distributions in semantic networks are signature of semantic growth • within individual; lexical development • across speakers; language evolution Disclaimer: We will not describe in detail any specific psychological mechanism

  14. Growing Network Model • Representation: • Nodes represent words or concepts • Edges represent semantic relations or associations • Variables: • k_i = degree of node i • u_i = utility of node i based on word frequency:

  15. Start with small fully connected network with M nodes • A new node is inserted: • Choose a local neighborhood i • (a neighborhood i of a node is formed by node i and its neighbors) • Make M connections into neighborhood • repeat n times until network is large enough

  16. Preferentially make M connections to nodes with high utility (figure: example network with a new node, existing nodes labeled by utility) • Preferentially choose large neighborhoods (figure: example network with a new node, existing nodes labeled by neighborhood size)
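The growth procedure on slides 15 and 16 can be sketched roughly as follows. Several details are assumptions rather than the authors' specification: the neighborhood is chosen with probability proportional to its size (k_i + 1), the M targets are drawn without replacement with probability proportional to utility, and the utility function is supplied by the caller (uniform by default), since the transcript does not preserve the utility formula. The name `grow_network` and its parameters are hypothetical.

```python
import random


def grow_network(n_total, m, utility=None, seed=0):
    """Grow an undirected network to n_total nodes, adding m edges per new node."""
    rng = random.Random(seed)
    if utility is None:
        utility = lambda node: 1.0  # assumption: equal utilities when none are given

    # Start with a small fully connected network of m nodes.
    adj = {i: set(range(m)) - {i} for i in range(m)}

    for new in range(m, n_total):
        # Choose a local neighborhood, favoring large ones (assumed: weight = k_i + 1).
        nodes = list(adj)
        sizes = [len(adj[v]) + 1 for v in nodes]
        center = rng.choices(nodes, weights=sizes, k=1)[0]

        # The neighborhood is the chosen node plus its neighbors.
        candidates = [center] + sorted(adj[center])

        # Make m connections into the neighborhood, favoring high utility.
        targets = []
        for _ in range(m):
            weights = [utility(v) for v in candidates]
            pick = rng.choices(candidates, weights=weights, k=1)[0]
            candidates.remove(pick)  # sample without replacement
            targets.append(pick)

        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj


network = grow_network(n_total=200, m=3)
edge_count = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), edge_count)
```

Every neighborhood contains at least m nodes (the seed nodes have m - 1 neighbors, later nodes at least m), so drawing m distinct targets is always possible.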

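  17. Model comparison (n = 5018 and <k> = 22 for all three) • Path length L: Word Association 3.04; Growing Network Model 2.84 (.04); Barabasi & Albert (1999) Model 2.85 • Clustering coefficient C: Word Association .186; Growing Network Model .185 (.007); Barabasi & Albert (1999) Model .020 • Power-law coefficient γ: Word Association 3.01; Growing Network Model 2.86 (.077); Barabasi & Albert (1999) Model 2.83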

  18. Power-laws in non-growing semantic representations?

  19. LSA: Latent Semantic Analysis • e.g., Landauer & Dumais (1997) • Analyzed co-occurrence statistics in a large corpus • Placed 60,000+ words in 300-dimensional space • Good semantic neighbors (example words: Hawaii, relax, volcano, lava, soothe, ache) • Convert LSA space to graph by variable thresholding on similarity measure
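Converting a vector space into a graph by thresholding similarity can be illustrated as below. This is a simplified sketch: it uses cosine similarity and a single global threshold, whereas the slide mentions "variable thresholding" without specifying the scheme. The 2-dimensional vectors are made-up toy values, not LSA coordinates.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold):
    """Link two words whenever their cosine similarity reaches the threshold."""
    words = list(vectors)
    adj = {w: set() for w in words}
    for i, w1 in enumerate(words):
        for w2 in words[i + 1:]:
            if cosine(vectors[w1], vectors[w2]) >= threshold:
                adj[w1].add(w2)
                adj[w2].add(w1)
    return adj


# Hypothetical toy vectors (real LSA vectors would have 300 dimensions).
toy_vectors = {
    "volcano": (1.0, 0.1),
    "lava": (0.9, 0.2),
    "relax": (0.1, 1.0),
    "soothe": (0.2, 0.9),
}
graph = similarity_graph(toy_vectors, threshold=0.9)
print(graph)
```

Raising or lowering the threshold trades off graph density against the quality of the retained links.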

  20. Tversky & Hutchinson (1986) • Low dimensional geometric models are not suitable for representing conceptual similarity relations: there is an upper bound on the number of points that can share the same nearest neighbor

  21. Ferrer & Solé (submitted): Connect two words if they co-occur within a small contextual window • Slide window over large corpus • No good semantic neighborhoods: volcano -> was -> head -> ache (word association: volcano -> hawaii -> relax -> soothe -> ache) or tick -> tock -> made -> wonderful -> universe (word association: tick -> dog -> master -> universe)
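The window-based construction described on this slide can be sketched as follows. The window definition used here (each word is linked to the next `window` words) and the function name are assumptions for illustration; the slide does not give the exact window size or tokenization.

```python
def cooccurrence_graph(tokens, window=2):
    """Link each token to the tokens that follow it within `window` positions."""
    adj = {}
    for i, word in enumerate(tokens):
        adj.setdefault(word, set())
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:  # skip self-links from repeated words
                adj[word].add(other)
                adj.setdefault(other, set()).add(word)
    return adj


# Hypothetical one-sentence "corpus".
tokens = "the volcano was active".split()
links = cooccurrence_graph(tokens, window=1)
print(links["volcano"])
```

Even this tiny example shows the issue raised on the slide: "volcano" ends up linked to function words such as "the" and "was" rather than to semantically related words.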

  22. Overview • Link structure of semantic networks: Small-world & scale free • What produces such link structures? Semantic growth • Relation to age-of-acquisition effects • Behavioral effects of link structure

  23. Age of acquisition (AoA) effects • Naming and lexical decision tasks • Carroll & White (1973); Brysbaert et al. (2000) • Locus of AoA effects? • Brown & Watson (1987); Lambon Ralph et al. (1998) • AoA is really cumulative frequency effect? • Lewis, Gerhand & Ellis (1999) • Need framework to understand AoA effects.

  24. Prediction of model: early acquired nodes have more connections (figure panels: t=1…15, t=16…50, t=51…150) • Do words acquired early in life have more connections?

  25. Language Evolution Words acquired early in English language are words with high degree (work in progress)

  26. Overview • Link structure of semantic networks: Small-world & scale free • What produces such link structures? Semantic growth • Relation to age-of-acquisition effects • Behavioral effects of link structure

  27. Behavioral effects of structural variables • Naming and lexical decision latencies • Degree centrality • Authority (eigenvector centrality) • Proposal: In cognitive system, search is biased toward facts, concepts or words with high centrality
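The two structural variables named on this slide can be computed as sketched below, assuming an undirected graph stored as a dictionary of neighbor sets. Eigenvector centrality is approximated by power iteration on the adjacency matrix plus the identity (an implementation choice made here to avoid oscillation on bipartite graphs; it leaves the leading eigenvector unchanged). The function names and the star-shaped toy graph are illustrative.

```python
import math


def degree_centrality(adj):
    """Degree of each node (number of neighbors)."""
    return {node: len(nbrs) for node, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=100):
    """Leading eigenvector of the adjacency matrix, via power iteration on A + I."""
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        # A node's new score is its own score plus the scores of its neighbors.
        new = {node: score[node] + sum(score[nb] for nb in adj[node]) for node in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        score = {node: v / norm for node, v in new.items()}
    return score


# Hypothetical toy graph: one hub connected to three leaves.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}
degrees = degree_centrality(star)
scores = eigenvector_centrality(star)
print(degrees["hub"], max(scores, key=scores.get))
```

Under the slide's proposal, nodes with higher scores (here, the hub) would be the ones reached first during search.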

  28. Semantic Dementia Hodges & Patterson (1995)

  29. Conclusion • Link structure of semantic networks: • shows non-trivial patterns • shows signature of growth processes • “rich get richer” • respecting local neighborhoods • is relevant for search strategies: central words might be searched first • Paper will be available at www-psych.stanford.edu/~msteyver

  30. But… Early acquired words become more central in your model, but maybe words that are more central are acquired earlier

  31. Earliest year of quotation (in OED) vs. k (connectivity)
