SI/EECS 767 Yang Liu January 29, 2010 Language networks
Human language described as a complex network (Solé et al., 2005) Introduction
Analyzing statistical properties • Building models to explain the patterns • Studying the origins and evolution of human language • Statistical approaches to natural language processing Incentives
Words as vertices: co-occurrence networks (Dorogovtsev & Mendes, 2001; Masucci & Rodgers, 2008), semantic networks (Steyvers & Tenenbaum, 2005), syntactic networks (Ferrer i Cancho et al., 2004) • Sentences as vertices (Erkan & Radev, 2004) • Documents as vertices (Menczer, 2004) Categorization
Language as an evolving word web (Dorogovtsev & Mendes, 2001)
Proposes a theory of how language evolves • Treats human language as a complex network of distinct words • Words are connected with their nearest neighbors (co-occurrence network) • In the papers of Ferrer and Solé (2001, 2002), the degree distribution consists of two power-law parts with different exponents Introduction
Preferential attachment • Yields a power-law degree distribution, but the average degree does not change • In the word web, by contrast, the total number of connections increases more rapidly than the number of vertices, so the average degree grows The model
At each time step, • a new vertex (word) is added; t, the total number of vertices, plays the role of time; • the new vertex is connected to some old vertex i with probability proportional to its degree k_i; • ct new edges emerge between old words (c is a constant coefficient); these new edges appear between vertices i and j with probability p ∝ k_i k_j The model
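To make these growth rules concrete, here is a minimal simulation sketch (illustrative, not code from the paper); the values of T and c, the two-vertex seed, and the endpoint-list sampling trick are assumptions of the sketch.

```python
# Minimal sketch of the Dorogovtsev-Mendes word-web growth rules described
# above. Sampling a uniform element of `ends` picks a vertex with probability
# proportional to its degree, which implements preferential attachment;
# multi-edges are allowed for simplicity.
import random
from collections import Counter

def grow_word_web(T=5000, c=0.01, seed=42):
    random.seed(seed)
    ends = [0, 1]                    # seed: vertices 0 and 1 joined by one edge
    degree = Counter({0: 1, 1: 1})
    for t in range(2, T):
        # the new word t attaches to one old word chosen preferentially (p ~ k_i)
        old = random.choice(ends)
        ends += [t, old]
        degree[t] += 1
        degree[old] += 1
        # about c*t new edges appear between old words, both ends chosen
        # preferentially, so a pair (i, j) is picked with p ~ k_i * k_j
        for _ in range(int(c * t)):
            i, j = random.choice(ends), random.choice(ends)
            if i != j:
                ends += [i, j]
                degree[i] += 1
                degree[j] += 1
    return degree

deg = grow_word_web()
print("vertices:", len(deg), " average degree:", sum(deg.values()) / len(deg))
```

With c > 0 the simulated average degree grows roughly as 2 + ct, illustrating the point above that connections accumulate faster than vertices.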
Two word webs by Ferrer and Solé (2001, 2002) • Obtained from ¾ of a million words of the British National Corpus • 470,000 vertices • Average degree ≈ 72 Data
Continuum approximation • k(s,t): the average degree of the vertices born at time s and observed at time t • ct ≈ 70 >> 1 Solving the model
The degree distribution has two regions separated by the crossover point Solving the model
Below this point: stationary degree distribution • Above this point: non-stationary degree distribution Solving the model (Figure: empty and filled circles show the degree distributions for the two word webs of Ferrer and Solé (2001, 2002))
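For reference, a compact restatement of the continuum calculation sketched on these slides; the rate equation follows from the growth rules above, and the two exponents are the ones reported by Dorogovtsev & Mendes (2001).

```latex
% Each step delivers 1 + 2ct edge ends to old vertices, shared out by
% preferential attachment, so the average degree k(s,t) of a word born at
% time s obeys
\frac{\partial k(s,t)}{\partial t}
  = (1 + 2ct)\,\frac{k(s,t)}{\int_0^t k(u,t)\,du},
\qquad
\int_0^t k(u,t)\,du \approx 2t + ct^{2}.

% Solving this and converting to a degree distribution gives the two
% power-law regions separated by the crossover degree k_{\mathrm{cross}}:
P(k) \sim
\begin{cases}
k^{-3/2}, & k \ll k_{\mathrm{cross}} \quad\text{(stationary part)},\\
k^{-3},   & k \gg k_{\mathrm{cross}} \quad\text{(non-stationary part)}.
\end{cases}
```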
Interested only in the degree distribution • Clustering coefficients do not match • The total number of words of degree greater than k_cross does not change • The size of the kernel lexicon does not depend on the total number of distinct words in the language Discussion
Network properties of written human language (Masucci & Rodgers, 2008)
The words (including punctuation marks) are vertices, and two vertices are linked if they are neighbors in the text • Directed network Topology of the network
8,992 vertices, 117,687 edges, mean degree <k> = 13.1 • P(k) ∝ k^(-1.9) • Zipf's law with slope -1.2 Network statistics
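A minimal sketch of how such a directed co-occurrence network can be built and measured; the toy token list and the use of networkx are assumptions of the sketch, not details from the paper.

```python
# Build the directed co-occurrence network described above: tokens (words and
# punctuation marks) are vertices, and an edge runs from each token to the
# token immediately following it in the text.
import networkx as nx

def cooccurrence_network(tokens):
    G = nx.DiGraph()
    for a, b in zip(tokens, tokens[1:]):   # link each token to its right neighbor
        G.add_edge(a, b)
    return G

tokens = "the dog chased the cat . the cat ran away .".split()
G = cooccurrence_network(tokens)
print(G.number_of_nodes(), "vertices,", G.number_of_edges(), "edges")
print("mean degree <k> =", G.number_of_edges() / G.number_of_nodes())
# P(k) and the Zipf (rank-frequency) curve can be estimated from the same
# graph and token stream, e.g. with collections.Counter over degrees and tokens.
```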
The number of edges between words grows faster than the number of vertices: N(t) ∝ t^(1.8) Growth properties
The mean clustering coefficient <c> = 0.19 Nearest neighbors' properties
Repeated binary structures of words • Reproduced by local preferential attachment (PA)
Starts with a chain of 20 connected vertices • At each time step, add a new vertex and connect it to some vertex i with p ∝ k_i • m(t) - 1 new edges emerge between old words with p ∝ k_i k_j The models (D-M model)
<c> = 0.16 • Captures the average clustering and the global growth behavior • Misses the internal structure D-M model
Includes local PA, with p(t) ≈ 0.1 t^(0.16) • Start with a chain of 20 connected vertices • At each time step, add a new vertex and connect it to some old vertex i (not among its nearest neighbors) with p ∝ k_i • m(t) - 1 times: with probability p(t), link the new vertex to an old vertex i in its nearest neighborhood through local PA (p ∝ k_i); with probability 1 - p(t), link it to an old vertex i outside its nearest neighborhood with global PA Model 2
<c> = 0.08 • Captures the global and nearest-neighbor behavior but not the average clustering coefficient Model 2
Different words in written human language display different statistical distributions, according to their functions Model 3
Start with a chain of 20 connected vertices • At each time step, add a new vertex and connect it to some old vertex i (not among its nearest neighbors) with p ∝ k_i • m(t) - 1 times: with probability q = 0.05, link the new vertex to one of the three fixed vertices; with probability p(t), link it to an old vertex i in its nearest neighborhood through local PA (p ∝ k_i); with probability 1 - p(t) - 3q, link it to an old vertex i outside its nearest neighborhood with global PA Model 3
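A minimal simulation sketch of the Model 3 rules above (which extend Model 2's local PA with the three preselected vertices). The specific m(t) schedule and the reading that the extra links are attached to the newly added word are assumptions of the sketch.

```python
# Sketch of Model 3: one global-PA link for each new word, then m(t)-1 extra
# links that go to one of three preselected vertices (total prob. 3q), to the
# new word's neighborhood by local PA (prob. p(t)), or elsewhere by global PA.
import random
import networkx as nx

def pa_choice(G, candidates):
    # choose a vertex from `candidates` with probability proportional to degree
    return random.choices(candidates, weights=[G.degree(v) for v in candidates])[0]

def model3(T=2000, q=0.05, seed=1):
    random.seed(seed)
    G = nx.Graph()
    nx.add_path(G, range(20))              # start: chain of 20 connected vertices
    fixed = [0, 1, 2]                      # the three preselected vertices
    for t in range(20, T):
        p_t = 0.1 * t ** 0.16              # local-PA probability p(t)
        m_t = max(2, int(t ** 0.8) // 50)  # illustrative m(t): edges grow faster than vertices
        old_nodes = list(G.nodes())
        first = pa_choice(G, old_nodes)    # global PA for the first link
        G.add_edge(t, first)
        for _ in range(m_t - 1):
            r = random.random()
            neigh = list(G.neighbors(t))
            if r < 3 * q:
                target = random.choice(fixed)              # preselected vertices
            elif r < 3 * q + p_t:
                target = pa_choice(G, neigh)               # local PA
            else:
                others = [v for v in old_nodes if v not in neigh]
                target = pa_choice(G, others)              # global PA
            G.add_edge(t, target)
    return G

G = model3()
print("<c> =", nx.average_clustering(G))
```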
<c> = 0.20 Model 3
New growth mechanisms: (1) local PA; (2) the allocation of a set of preselected vertices Conclusions
The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth (Steyvers & Tenenbaum, 2005)
There are general principles governing the structure of network representations for natural language semantics • The small-world structure arises from a scale-free organization Introduction
Concepts that enter the network early are expected to show higher connectivity • One aspect of semantic development: the growth of semantic networks by differentiation of existing nodes • The model grows through a process of differentiation analogous to mechanisms of semantic development, which allows it to produce both small-world and scale-free structure Model
Free association norms • WordNet • Roget's thesaurus Analysis of semantic networks
Associative networks • Created two networks: directed and undirected Methods
Bipartite graph • Word nodes and semantic category nodes • A connection is made between a word node and a category node when the word falls into that semantic category • Converted to a simple graph (one-mode projection) for calculating the clustering coefficient Roget's thesaurus
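A minimal sketch of this construction; the toy words and categories and the networkx bipartite projection call are assumptions of the sketch.

```python
# Bipartite word/category graph for a thesaurus-style resource, then a
# one-mode projection onto the word nodes so that an ordinary clustering
# coefficient can be computed.
import networkx as nx
from networkx.algorithms import bipartite

B = nx.Graph()
words = ["happy", "glad", "cheerful", "sad"]
categories = ["joy", "sorrow"]
B.add_nodes_from(words, bipartite=0)
B.add_nodes_from(categories, bipartite=1)
B.add_edges_from([("happy", "joy"), ("glad", "joy"), ("cheerful", "joy"),
                  ("sad", "sorrow")])

# two words become linked if they share at least one semantic category
W = bipartite.projected_graph(B, words)
print(sorted(W.edges()))
print("clustering coefficient:", nx.average_clustering(W))
```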
120,000+ word forms • 99,000+ word meanings • Links between forms and forms, meanings and meanings, and forms and meanings • Treated as an undirected graph WordNet
Previous models • BA model: clustering coefficient too low • WS model: no scale-free structure Growing network model
At each time step, a new node with M links is added to the network by randomly choosing some existing node i for differentiation, and then connecting the new node to M randomly chosen nodes in the semantic neighborhood of node i. Model A: undirected
Set n equal to the size of the target network • Set M equal to ½ <k>
Assume the direction of each arc is chosen randomly and independently of the other arcs • The arc points toward the old node with probability α and toward the new node with probability 1 - α Model B: directed
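A minimal simulation sketch of Models A and B as stated on these slides; the seed clique, the choice M = 11, and whether node i itself may be one of the M targets are assumptions of the sketch, not details taken from the paper.

```python
# Growth by differentiation: each new node picks a random existing node i and
# links to M nodes sampled from i's neighborhood (Model A). For Model B each
# arc is oriented toward the old node with probability alpha, else toward the
# new node.
import random
import networkx as nx

def grow_semantic_network(n=5000, M=11, alpha=0.95, seed=7):
    random.seed(seed)
    G = nx.DiGraph()
    for a in range(M + 1):                     # small fully connected seed
        for b in range(a + 1, M + 1):
            G.add_edge(a, b)
    for new in range(M + 1, n):
        i = random.choice(list(G.nodes()))     # node chosen for differentiation
        hood = sorted(set(G.successors(i)) | set(G.predecessors(i)) | {i})
        for old in random.sample(hood, min(M, len(hood))):
            if random.random() < alpha:        # Model B arc orientation
                G.add_edge(new, old)
            else:
                G.add_edge(old, new)
    return G

G = grow_semantic_network()
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "arcs")
```

Ignoring the arc directions recovers the undirected Model A.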
Models A and B are tested only on the association networks, with α = 0.95 • Results are averaged over 50 simulations Results
Patterns in syntactic dependency networks (Ferrer i Cancho et al., 2004)
Co-occurrence networks fail to capture the characteristic long-distance correlations of words in sentences • The proportion of incorrect syntactic dependency links is high • A precise definition of the syntactic link is required Introduction
Defined according to the dependency grammar formalism • Vertices are words; links go from the modifier to its head The syntactic dependency network
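As an illustration of this definition, a minimal sketch; the hand-written (modifier, head) pairs stand in for the output of a dependency-parsed corpus.

```python
# Vertices are words; a directed link goes from each modifier to its head.
import networkx as nx

# toy (modifier, head) pairs for two sentences; in practice these would come
# from a dependency-annotated corpus
parses = [
    [("the", "dog"), ("dog", "barked"), ("loudly", "barked")],
    [("the", "cat"), ("cat", "slept")],
]

G = nx.DiGraph()
for sentence in parses:
    for modifier, head in sentence:
        G.add_edge(modifier, head)

print(G.number_of_nodes(), "word vertices,", G.number_of_edges(), "dependency links")
```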