Phonological neighbors in a small world: What can graph theory tell us about word learning?

Michael S. Vitevitch • Department of Psychology, University of Kansas • NIH-NIDCD R03 DC 04259 • NIH-NIDCD R01 DC 006472

Graph Theory • Graphically represent complex systems • Graph or Network • Vertices or Nodes • Edges or Links/Connections • Examples of systems • Diamond (crystals) • WWW • power grid • Interstate highway system

Ordered Graph

Random Graph

Graph Theory • Between ordered and random graphs are small-world graphs • Small path length (Six Degrees of Separation) • High clustering coefficient (relative to random) • “Probability” of my two friends being friends with each other. • 0 to 1
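As a minimal sketch of the clustering coefficient described above (the "probability" that two of my friends are friends with each other), the code below counts how many pairs of a node's neighbors are themselves linked. The friendship network, the names in it, and the function name `clustering_coefficient` are made up for illustration.

```python
from itertools import combinations

def clustering_coefficient(graph, node):
    """Fraction of pairs of a node's neighbors that are linked to each other (0 to 1)."""
    neighbors = graph[node]
    if len(neighbors) < 2:
        return 0.0  # convention used in this sketch when no pair of neighbors exists
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

# Hypothetical undirected friendship network stored as adjacency sets.
friends = {
    "me": {"ann", "bob", "cy"},
    "ann": {"me", "bob"},
    "bob": {"me", "ann"},
    "cy": {"me"},
}

cc_me = clustering_coefficient(friends, "me")    # only ann-bob are linked: 1 of 3 pairs
cc_ann = clustering_coefficient(friends, "ann")  # ann's two friends know each other
```

Averaging this value over all nodes gives one common network-level clustering coefficient; whether the figures on these slides were computed that way is not stated here.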

Movie Actors (Watts & Strogatz, 1998) • Large and complex system • 225,226 actors • Internet Movie Database circa 1998 • Node = Actor • Connection = Co-starred in a movie

Movie Actors • Despite large size, the network of actors exhibits small-world behavior • Average of 3 links between any two actors • Clustering coefficient (random) = .00027 • Clustering coefficient (actors) = .79 • Over 2000 times larger

Graph Theory • Some small-world graphs also are scale-free. • Degree = number of links per node • These systems exhibit interesting characteristics • Efficient processing • Development/growth • Robustness of the system to attack/failure

Graph Theory • A randomly connected network has a bell-shaped degree distribution. • Most nodes have the average number of links • Few nodes have more links than average • Few nodes have fewer links than average • Scale = stereotypical node (characterized by mean)

Graph Theory • In a network with a scale-free degree distribution there is no stereotypical node. • The degree-distribution follows a power-law • Many nodes have few connections • Few nodes have many connections

Scale-free network The power-law degree distribution of a scale-free network emerges as a result of: • Growth • New nodes are added to the system over time. • Preferential Attachment • New nodes tend to form links with nodes that are highly connected.
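The two ingredients above, growth and preferential attachment, can be sketched in a few lines. This is an illustrative simulation in the spirit of the Barabási-Albert model, not the analysis reported in this talk; the seed network, the parameter `links_per_new_node`, and the function name `grow_network` are assumptions of the sketch.

```python
import random

def grow_network(n_nodes, links_per_new_node=2, seed=0):
    """Grow a network one node at a time using preferential attachment."""
    rng = random.Random(seed)
    m = links_per_new_node
    degree = {}
    edges = []
    # Each node appears in this list once per link it has, so a uniform draw
    # from the list picks a node with probability proportional to its degree.
    link_ends = []

    def add_edge(a, b):
        edges.append((a, b))
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
        link_ends.extend([a, b])

    # Seed network: m + 1 fully connected nodes (an assumption of this sketch).
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            add_edge(i, j)

    # Growth: every new node links to m distinct existing nodes.
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(link_ends))
        for t in sorted(targets):
            add_edge(new, t)
    return degree, edges

degree, edges = grow_network(500)
early = sum(degree[i] for i in range(50)) / 50        # mean degree, first 50 nodes
late = sum(degree[i] for i in range(450, 500)) / 50   # mean degree, last 50 nodes
```

Comparing `early` with `late` shows the signature of this growth process: nodes that enter the network early tend to end up with more links than nodes that enter late.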

Implications of graph structure • The structure of a network constrains the processes that operate in the system and influences how well the system withstands damage.

Viewing the mental lexicon as a graph • Does the mental lexicon have a small-world structure? • Does the mental lexicon have a scale-free structure? • How does the structure of the mental lexicon influence various processes?

The mental lexicon as a graph • Nodes = 19,340 word-forms in database • Nusbaum, Pisoni & Davis (1984) • Links = phonologically related • One phoneme metric (Luce & Pisoni, 1998)

Links in the mental lexicon • One phoneme metric • cat has as neighbors • scat, at, hat, cut, can, etc. • dog is NOT a phonological neighbor of cat • Steyvers & Tenenbaum (2003) • Ferrer i Cancho & Solé (2001)
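A minimal sketch of the one-phoneme metric: two words are linked if one can be turned into the other by substituting, adding, or deleting a single phoneme. The ASCII phoneme labels below are rough illustrative transcriptions, not the coding used in the database, and the helper name `is_neighbor` is hypothetical.

```python
def is_neighbor(a, b):
    """True if phoneme sequences a and b differ by one substitution, addition, or deletion."""
    if a == b:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    # Addition/deletion: dropping one phoneme from the longer word gives the shorter.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))

# Illustrative transcriptions (assumed for this sketch).
words = {
    "cat": ("k", "ae", "t"),
    "scat": ("s", "k", "ae", "t"),
    "at": ("ae", "t"),
    "hat": ("h", "ae", "t"),
    "cut": ("k", "uh", "t"),
    "can": ("k", "ae", "n"),
    "dog": ("d", "aw", "g"),
}

# Build the network: each word maps to the set of its phonological neighbors.
lexicon = {
    w: {v for v in words if is_neighbor(words[w], words[v])}
    for w in words
}
cat_neighbors = lexicon["cat"]
```

In this toy lexicon the degree of a word, `len(lexicon[w])`, is its neighborhood density.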

The mental lexicon as a graph • Small-world network • Relatively small path-length • Relatively high clustering coefficient • Scale-free topology • Power-law degree distribution • Growth • Preferential attachment

The mental lexicon as a graph Pajek (Batagelj & Mrvar, 2004) • Program for analysis and visualization of large networks

Graph of Adult Lexicon

The mental lexicon as a graph

Path-length Average distance between two nodes • cat-mat-mass-mouse • Average path length = 6.05

Diameter Longest path length • D = 29 • connect & rehearsal • connect, collect, elect, affect, effect, infect, insect, inset, insert, inert, inurn, epergne, spurn, spin, sin, sieve, live, liver, lever, leva, leaven, heaven, haven, raven, riven, rivet, revert, reverse, rehearse, rehearsal
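Both measures can be computed with a breadth-first search from every node: the average of all shortest distances is the path length, and the largest one is the diameter. The sketch below uses a toy network built around the cat-mat-mass-mouse example; the extra word "hat" and the function names are assumptions added for illustration, and the sketch is not the Pajek computation.

```python
from collections import deque

def bfs_distances(graph, start):
    """Shortest number of links from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def path_stats(graph):
    """Return (average shortest path length, diameter) over all connected pairs."""
    lengths = []
    for a in graph:
        d = bfs_distances(graph, a)
        lengths.extend(d[b] for b in graph if b != a and b in d)
    return sum(lengths) / len(lengths), max(lengths)

# Toy lexicon (adjacency sets); each link joins words one phoneme apart.
toy_lexicon = {
    "cat": {"mat", "hat"},
    "hat": {"cat", "mat"},
    "mat": {"cat", "hat", "mass"},
    "mass": {"mat", "mouse"},
    "mouse": {"mass"},
}

mean_path, diameter = path_stats(toy_lexicon)
cat_to_mouse = bfs_distances(toy_lexicon, "cat")["mouse"]  # cat-mat-mass-mouse
```

Pairs with no connecting path are skipped here, so in a lexicon with isolated words the statistics would describe only the connected pairs.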

Clustering Coefficient A small-world network has a larger clustering coefficient (by orders of magnitude) than a random network • C = .045 • Over 250 times greater

The mental lexicon is a small-world network • Relatively short path length • Large clustering coefficient

Does the mental lexicon have a scale-free topology? • A degree distribution that follows a power law. • Growth • Preferential attachment

Power-law degree distribution

Power-law degree distribution Degree distribution for the lexicon: γ = 1.96 (approaching the range 2 < γ < 3)
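One way to estimate a power-law exponent such as the 1.96 reported above is a maximum-likelihood approximation for integer data (Clauset, Shalizi & Newman, 2009). The slides do not say how the exponent was fitted, so this is a swapped-in technique shown only as a sketch; the function names, the cutoff `k_min`, and the synthetic degrees are all assumptions.

```python
import math
import random

def estimate_exponent(degrees, k_min=1):
    """Approximate maximum-likelihood exponent for degrees >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum

def sample_degrees(n, exponent, k_min, seed=0):
    """Draw approximately power-law integer degrees (synthetic data for checking)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        u = rng.random()
        # Inverse-transform sample from a continuous power law, then round.
        x = (k_min - 0.5) * (1.0 - u) ** (-1.0 / (exponent - 1.0))
        out.append(int(x + 0.5))
    return out

# Synthetic check: the estimate should land near the exponent used to generate the data.
synthetic = sample_degrees(20000, exponent=2.5, k_min=10)
exponent_hat = estimate_exponent(synthetic, k_min=10)
```

The approximation is known to be less accurate when `k_min` is very small, so an estimate for a real degree distribution would need a more careful fit than this.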

Growth & Preferential Attachment • Growth • Children (and adults) learn new words • New words are added to the language over time • Preferential attachment

Preferential Attachment • Words that are added to the lexicon early in life should have more links than words that are added to the lexicon later in life. • Storkel (2004) • Relationship between AoA and Density

Preferential Attachment • Phonological neighborhoods should become “denser” over time. • Charles-Luce and Luce (1990, 1995) • Analyzed words in the lexicons of adults and of 5- and 7-year-old children. • Neighborhoods for words in the adult lexicon were denser than the neighborhoods for those same words in the 5- and 7-year-old lexicons.

Preferential Attachment • Words with denser neighborhoods should be easier to learn/acquire. • Storkel (2001, 2003) • Pre-school age children learned novel words that had common sound sequences/dense neighborhoods more rapidly than novel words that had rare sound sequences/sparse neighborhoods. • Adults, too (Storkel, Armbrüster & Hogan, submitted)

Advantage of this structure Topological robustness • Damage does not result in catastrophic failure

Topological robustness • Damage tends to affect less connected nodes. • Hubs maintain integrity of the whole system. • Even if a hub is damaged, the presence of other hubs will absorb the extra processing load. • Only if every node has been damaged will a scale-free network catastrophically fail.
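A small sketch of how such robustness can be measured: remove some nodes and ask what fraction of the surviving nodes still belong to the largest connected piece. The toy network (two hubs, H1 and H2, plus six less connected nodes) and the function name are hypothetical.

```python
def largest_component_fraction(graph, removed):
    """Share of surviving nodes that sit in the largest connected component."""
    alive = set(graph) - set(removed)
    if not alive:
        return 0.0
    seen = set()
    best = 0
    for start in alive:
        if start in seen:
            continue
        # Depth-first walk over the surviving nodes reachable from start.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in graph[node]:
                if nb in alive and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        best = max(best, size)
    return best / len(alive)

# Hypothetical network: two linked hubs and six low-degree nodes.
edge_list = [
    ("H1", "H2"),
    ("H1", "l1"), ("H2", "l1"),
    ("H1", "l2"), ("H2", "l2"),
    ("H1", "l3"), ("H2", "l3"),
    ("H1", "l4"), ("H1", "l5"),
    ("H2", "l6"),
]
network = {}
for a, b in edge_list:
    network.setdefault(a, set()).add(b)
    network.setdefault(b, set()).add(a)

after_leaf = largest_component_fraction(network, ["l4"])             # low-degree node lost
after_one_hub = largest_component_fraction(network, ["H1"])          # one hub lost
after_both_hubs = largest_component_fraction(network, ["H1", "H2"])  # all hubs lost
```

In this toy case, losing a low-degree node leaves the rest fully connected, losing one hub leaves most nodes connected through the other hub, and the network fragments only when both hubs are gone.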

Topological robustness in the mental lexicon • Speech production errors occur more for words with sparse than dense neighborhoods. • Vitevitch (1997; 2002) and Vitevitch & Sommers (2003) • Same pattern for errors in patients with aphasia • Gordon & Dell (2001) • More errors in STM for words with sparse than dense neighborhoods. • Roodenrys et al. (2002)

Mental lexicon as a scale-free network • The present analysis suggests the lexicon has a scale-free topology. • Evidence from several areas is consistent with predictions derived from a scale-free lexicon.

Do the characteristics of graph theory have any psychological reality?

Psychological reality of graph theory • k (degree) = Neighborhood Density • Luce & Pisoni (1998) • Vitevitch (2002) • Clustering Coefficient • Probability of two neighbors of a word being neighbors with each other.

Clustering Coefficient Experiment • Auditory Lexical Decision Task (n = 57) • Words varying in clustering coefficient • Frequency • Familiarity • Neighborhood Density • NHF • Phonotactic Probability • Onsets

hive wise

Clustering Coefficient Experiment

Clustering Coefficient Experiment In spoken word recognition • k (degree), neighborhood density • Words with sparse neighborhoods are responded to more quickly than words with dense neighborhoods. • Clustering Coefficient • Words with high CC are responded to more quickly than words with low CC.

When does a scale-free lexicon emerge?

When does a scale-free lexicon emerge? • Traditional benchmark for “vocabulary spurt” is 50 words (about 18 mo.) • (e.g., Goldfield & Reznick, 1996; Mervis & Bertrand, 1995). • Various mechanisms have been proposed for the vocabulary spurt • (e.g., Golinkoff et al. 2000; Nazzi & Bertoncini, 2003).

MacArthur Communicative Development Inventory (CDI) Estimates words known by 16- to 30-month-old children • A word is counted as known at the earliest age at which 50% of the children knew it. • 16, 18, 19, 30 months of age
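The 50% criterion above can be sketched as a simple lookup: for each word, find the youngest age at which at least half of the children are reported to know it. The proportions below are invented for illustration and are not CDI data; the function name is hypothetical.

```python
def earliest_age_known(proportion_by_age, threshold=0.5):
    """Youngest age (in months) at which the proportion of children reaches the threshold."""
    for age in sorted(proportion_by_age):
        if proportion_by_age[age] >= threshold:
            return age
    return None  # the word never reaches the threshold in the sampled ages

# Made-up proportions of children who know each word at each age (months).
known = {
    "ball": {16: 0.62, 18: 0.75, 19: 0.80, 30: 0.99},
    "sock": {16: 0.31, 18: 0.48, 19: 0.55, 30: 0.97},
    "tray": {16: 0.02, 18: 0.05, 19: 0.07, 30: 0.41},
}

age_known = {word: earliest_age_known(p) for word, p in known.items()}

# Words assumed to be in the lexicon at 18 months under this criterion.
lexicon_at_18 = sorted(w for w, age in age_known.items() if age is not None and age <= 18)
```

A lexicon built this way for each age can then be turned into a network and analyzed with the same measures as the adult lexicon.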

Network Statistics

Emergence of a scale-free lexicon “Vocabulary spurt” is often observed: • 18 to 19 months of age • 50 words • Signals reorganization in the lexicon.

Emergence of a scale-free lexicon A scale-free network emerged at the same age/developmental milestone. • This may lead to highly efficient word learning and language processing.

Emergence of a scale-free lexicon Variability in age/vocabulary size associated with this developmental milestone may be due to different initial starting states. • The first few sound patterns that are learned may play a large role in determining how easily subsequent words are acquired. • Mandel, Jusczyk & Pisoni (1995)
