Introduction to ERGM/p* model Kayo Fujimoto, Ph.D. Based on presentation slides by Nosh Contractor and Mengxiao Zhu
Four parts of ERGM • Observed network data: network statistics (or counts) of each configuration • ERG modeling: conditional probability and change statistics • Estimation and simulation: estimate parameters by a simulation method (MCMC ML estimation); goodness of fit test (convergence t-test); compare observed and simulated graphs • Recent developments in ERGM: new model specification
Exponential Random Graph Model (ERGM) • ERGMs take the form of a probability distribution over graphs: $P(Y = y) = \exp\{\theta' g(y)\} / k(\theta)$ • Y is the set of tie indicator variables Yij • y is a realization, the observed network • g(y) is a vector of network statistics • θ is a parameter vector corresponding to g(y) • k(θ) is a normalizing factor, calculated by summing exp{θ'g(y)} over all possible networks on the same set of nodes
Observed network Graph statistics (or counts) of each configuration
Network Statistics Examples for Undirected Networks • Example (sums run over the five nodes): • Edge: 6 • 2-Star: 1+3+1+6+0 = 11 • 3-Star: 0+1+0+4+0 = 5 • 4-Star: 1 • Triangle: 2 [Figure: undirected network on nodes a, b, c, d, e]
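The counts on this slide can be checked with a short script. This is only a sketch: the figure itself is not recoverable, so the edge list below is a hypothetical one chosen because it reproduces the slide's numbers (node degrees 2, 3, 2, 4, 1). It relies on the fact that a node of degree d is the centre of C(d, k) k-stars.

```python
from itertools import combinations
from math import comb

# Hypothetical edge list: the slide's figure is lost, so these six edges were
# picked only because they reproduce the counts shown on the slide.
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
nodes = sorted({v for e in edges for v in e})
adj = {v: set() for v in nodes}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

def k_stars(k):
    """Number of k-stars: each node of degree d is the centre of C(d, k) stars."""
    return sum(comb(len(adj[v]), k) for v in nodes)

def triangles():
    """Number of node triples in which all three pairs are tied."""
    return sum(1 for a, b, c in combinations(nodes, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

stats = {"edge": len(edges), "2-star": k_stars(2), "3-star": k_stars(3),
         "4-star": k_stars(4), "triangle": triangles()}
print(stats)
```

With this edge list the script gives 6 edges, 11 two-stars, 5 three-stars, 1 four-star and 2 triangles, matching the slide.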
A Simple Example of ERGM • Homogeneity assumption • Number of configurations (possible networks on n nodes): • Directed network: $2^{n(n-1)}$ • Undirected network: $2^{n(n-1)/2}$
A Simple ERG model • Predict the network using the edge count L(y): $P(Y = y) = \exp\{\theta L(y)\} / k(\theta)$ • θ can take different values: • θ = 0, θ = -0.69, θ = 0.69 • L(y) can take the following values: • L(y) = 0, L(y) = 1, L(y) = 2, L(y) = 3
Example 1: θ = 0, L = 0 • Probability of getting a network with 0 edges • Model: ERGM formula
Example 1: θ = 0, L = 1 • Probability of getting a network with 1 edge • Model: ERGM formula
Example 1: θ = 0, L = 2 • Probability of getting a network with 2 edges • Model: ERGM formula
Example 1: θ = 0, L = 3 • Probability of getting a network with 3 edges • Model: ERGM formula
Example 1: θ = 0 • Model: ERGM formula
Example 2: θ = -0.69 • Model: ERGM formula
Example 3: θ = 0.69 • Model: ERGM formula
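The three examples can be reproduced by brute-force enumeration. The sketch below assumes the example network is undirected with three nodes (so three possible edges and eight possible graphs), which is what the range L(y) = 0, ..., 3 suggests; the slides' own figures and numbers are not recoverable, and the function name is illustrative.

```python
from itertools import product
from math import exp

N_DYADS = 3  # assumption: 3 nodes, undirected, hence 3 possible edges

def edge_count_distribution(theta):
    """Return [P(L(y) = 0), ..., P(L(y) = 3)] under P(y) = exp(theta * L(y)) / k(theta)."""
    graphs = list(product([0, 1], repeat=N_DYADS))    # all 2^3 = 8 graphs
    weights = [exp(theta * sum(y)) for y in graphs]   # exp(theta * L(y)) for each graph
    k = sum(weights)                                  # normalizing factor k(theta)
    dist = [0.0] * (N_DYADS + 1)
    for y, w in zip(graphs, weights):
        dist[sum(y)] += w / k                         # pool graphs with the same edge count
    return dist

for theta in (0.0, -0.69, 0.69):
    print(theta, [round(p, 3) for p in edge_count_distribution(theta)])
```

For θ = 0 every graph is equally likely, so the edge-count probabilities are 1/8, 3/8, 3/8, 1/8. A negative θ shifts mass toward sparse graphs and a positive θ toward dense ones.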
Why Change Statistics? • Number of configurations: huge sample space
ERG modeling Conditional Probability and Change Statistics
Conditional Probability vs. Total Probability • Total probability of the whole network • It becomes computationally infeasible when the network gets large, because the normalizing factor sums over every possible network • Introduce the conditional probability of individual edges • This reduces the sample space
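To see how quickly the sample space grows, the sketch below counts the possible networks on n nodes: each of the n(n-1)/2 undirected dyads, or n(n-1) directed dyads, is either present or absent. The function names are illustrative only.

```python
def n_undirected_graphs(n):
    """Number of undirected graphs on n labelled nodes: 2^(n(n-1)/2)."""
    return 2 ** (n * (n - 1) // 2)

def n_directed_graphs(n):
    """Number of directed graphs on n labelled nodes: 2^(n(n-1))."""
    return 2 ** (n * (n - 1))

for n in (3, 5, 10, 20):
    print(n, n_undirected_graphs(n), n_directed_graphs(n))
```

Even at n = 10 an undirected network has 2^45 (about 3.5 × 10^13) possible graphs, so summing over all of them to obtain k(θ) is not practical.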
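Avoid the Calculation on Sample Space • Let $y_{ij}^+$ and $y_{ij}^-$ be the network y with the tie between i and j set to present and absent, and let $y_{ij}^c$ be the rest of the network • Conditional probability that an edge exists: $P(Y_{ij} = 1 \mid y_{ij}^c) = \exp\{\theta' g(y_{ij}^+)\} / [\exp\{\theta' g(y_{ij}^+)\} + \exp\{\theta' g(y_{ij}^-)\}]$ • Conditional probability that the edge is absent: $P(Y_{ij} = 0 \mid y_{ij}^c) = \exp\{\theta' g(y_{ij}^-)\} / [\exp\{\theta' g(y_{ij}^+)\} + \exp\{\theta' g(y_{ij}^-)\}]$ • The normalizing factor k(θ) cancels in both expressions • Logit p* model: model the log odds that Yij exists: $\log[P(Y_{ij} = 1 \mid y_{ij}^c) / P(Y_{ij} = 0 \mid y_{ij}^c)] = \theta'[g(y_{ij}^+) - g(y_{ij}^-)]$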
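Change Statistics (logit p* model) • From the end of the last slide, with $y_{ij}^+$ and $y_{ij}^-$ denoting y with the tie between i and j set to present and absent, we have: $\log[P(Y_{ij} = 1 \mid y_{ij}^c) / P(Y_{ij} = 0 \mid y_{ij}^c)] = \theta'[g(y_{ij}^+) - g(y_{ij}^-)]$ • Define the change statistics as: $\delta_{ij}(y) = g(y_{ij}^+) - g(y_{ij}^-)$ • Model the log odds of a tie being present rather than absent: $\operatorname{logit} P(Y_{ij} = 1 \mid y_{ij}^c) = \theta' \delta_{ij}(y)$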
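A minimal sketch of the change-statistic calculation, assuming a model whose statistics g(y) are the edge count and the triangle count. The helper names, the three-node example graph and the parameter values are all hypothetical; the point is that only the difference in statistics between "tie present" and "tie absent" is needed, never k(θ).

```python
from itertools import combinations
from math import exp

def graph_stats(adj):
    """g(y) = (edge count, triangle count) for an undirected graph stored as {node: set(neighbours)}."""
    nodes = sorted(adj)
    n_edges = sum(len(adj[v]) for v in nodes) // 2
    n_tri = sum(1 for a, b, c in combinations(nodes, 3)
                if b in adj[a] and c in adj[a] and c in adj[b])
    return (n_edges, n_tri)

def with_tie(adj, i, j, present):
    """Copy of the graph with the tie between i and j forced present or absent."""
    new = {v: set(nb) for v, nb in adj.items()}
    if present:
        new[i].add(j)
        new[j].add(i)
    else:
        new[i].discard(j)
        new[j].discard(i)
    return new

def change_statistics(adj, i, j):
    """delta_ij(y) = g(y with tie ij present) - g(y with tie ij absent)."""
    plus = graph_stats(with_tie(adj, i, j, True))
    minus = graph_stats(with_tie(adj, i, j, False))
    return tuple(p - m for p, m in zip(plus, minus))

def conditional_tie_probability(theta, adj, i, j):
    """P(Y_ij = 1 | rest of the network), from logit = theta' * delta_ij(y)."""
    logit = sum(t * d for t, d in zip(theta, change_statistics(adj, i, j)))
    return 1.0 / (1.0 + exp(-logit))

# Hypothetical example: path a - b - c; adding the tie a - c creates 1 edge and closes 1 triangle.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
theta = (-1.0, 0.5)   # hypothetical (edge, triangle) parameters
print(change_statistics(adj, "a", "c"))
print(conditional_tie_probability(theta, adj, "a", "c"))
```

Here the change statistics are (1, 1), so the log odds of the tie a-c are -1.0 + 0.5 = -0.5.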
Estimation and Simulation (Markov Chain Monte Carlo Maximum Likelihood Method)
Review: Maximum Likelihood Estimation (MLE) • Likelihood functions • Estimate the parameter θ given the observed network • Maximum Likelihood Estimation • Find θ values such that the observed statistics are equal to the expected statistics • Approximate the MLE by simulation
Procedures for simulating ERG distribution • Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMCMLE) • 1. Simulate a distribution of random graphs from a starting set of parameter values • 2. Refine the parameter values by comparing the distribution of graphs against the observed graph • 3. Repeat this process until the parameter estimates stabilize
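The three steps can be sketched for the simplest case, an edge-only model. This is an illustrative toy, not the algorithm implemented in any particular package: it uses a basic Metropolis sampler that toggles one dyad at a time, and a Newton-like update that moves θ until the simulated mean edge count matches the observed one. The function names, gain, sample sizes and the example of 1 observed edge on 3 dyads are all assumptions.

```python
import random
from math import exp

def simulate_edge_counts(theta, n_dyads, n_samples, rng, thin=5):
    """Step 1: Metropolis sampler for the edge-only ERGM; returns sampled edge counts."""
    y = [0] * n_dyads                        # start from the empty graph
    counts = []
    for step in range(n_samples * thin):
        d = rng.randrange(n_dyads)           # propose toggling one dyad
        delta = 1 if y[d] == 0 else -1       # change in L(y) if the toggle is accepted
        if rng.random() < min(1.0, exp(theta * delta)):
            y[d] = 1 - y[d]
        if (step + 1) % thin == 0:
            counts.append(sum(y))
    return counts

def mcmc_mle(observed_edges, n_dyads, n_iter=30, n_samples=2000, gain=0.5, seed=0):
    """Steps 2 and 3: refine theta until simulated and observed edge counts agree."""
    rng = random.Random(seed)
    theta = 0.0                              # starting parameter value
    for _ in range(n_iter):
        counts = simulate_edge_counts(theta, n_dyads, n_samples, rng)
        mean = sum(counts) / len(counts)
        var = sum((c - mean) ** 2 for c in counts) / len(counts)
        # The derivative of the expected edge count with respect to theta is Var(L),
        # so this is a damped Newton step toward E[L] = observed L.
        theta -= gain * (mean - observed_edges) / max(var, 1e-6)
    return theta

theta_hat = mcmc_mle(observed_edges=1, n_dyads=3)
print(round(theta_hat, 2))
```

For this toy case the exact MLE is known in closed form, log((1/3) / (2/3)) = log(1/2) ≈ -0.69, so the simulated estimate should land close to that value up to Monte Carlo noise.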
Convergence t-statistics • Test the adequacy of the estimated parameter values • One t-statistic for each configuration • |t| < 0.1: good fit • NOTE: If the parameter estimates do not converge, the model may be degenerate
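A sketch of how such a t-statistic could be computed for one configuration, assuming the usual definition: the difference between the observed statistic and the mean of the statistic over graphs simulated from the estimated parameters, divided by the standard deviation of the simulated values. The function name and the numbers are made up for illustration.

```python
from statistics import mean, stdev

def convergence_t_ratio(observed, simulated):
    """(observed statistic - mean of simulated statistics) / sd of simulated statistics."""
    return (observed - mean(simulated)) / stdev(simulated)

# Hypothetical edge counts from graphs simulated under the estimated parameters.
simulated_edges = [10, 12, 11, 9, 13, 11, 10, 12]

for observed in (11, 14):
    t = convergence_t_ratio(observed, simulated_edges)
    print(observed, round(t, 2), "ok" if abs(t) < 0.1 else "not converged")
```

With these numbers an observed count of 11 sits exactly at the simulated mean (t = 0), while an observed count of 14 is more than two standard deviations away.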
A Simple Example of MCMCMLE • Model: the edge-count model $P(Y = y) = \exp\{\theta L(y)\} / k(\theta)$ • Observed network y: [figure] • Goal: find the θ value such that the observed number of edges is equal to the expected number of edges
Given the observed network y: • If θ can be chosen only from the three cases θ = 0, θ = -0.69 and θ = 0.69, then θ = -0.69 is preferred because it gives the highest probability for the observed network
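The comparison can be checked numerically under two assumptions that the lost figures do not let us confirm: the network has three nodes (three possible undirected edges), and the observed network has exactly one edge, which is the case for which θ = -0.69 would be preferred. For the edge-only model the normalizing factor has the closed form k(θ) = (1 + e^θ)^3.

```python
from math import exp

def prob_of_graph(theta, n_edges, n_dyads=3):
    """P(Y = y) for one specific graph with n_edges edges; k(theta) = (1 + e^theta)^n_dyads."""
    return exp(theta * n_edges) / (1.0 + exp(theta)) ** n_dyads

def expected_edges(theta, n_dyads=3):
    """Expected edge count: each dyad is present with probability e^theta / (1 + e^theta)."""
    return n_dyads * exp(theta) / (1.0 + exp(theta))

observed_edges = 1   # assumption: the observed network has one edge
candidates = (0.0, -0.69, 0.69)
for theta in candidates:
    print(theta, round(prob_of_graph(theta, observed_edges), 3), round(expected_edges(theta), 2))

best = max(candidates, key=lambda t: prob_of_graph(t, observed_edges))
print("preferred theta:", best)
```

Under these assumptions θ = -0.69 gives the observed graph the highest probability, and it is also the value at which the expected number of edges is approximately 1, the observed count.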
Markov dependence (Frank and Strauss, 1986) • Potential ties are dependent only if they share a common actor • Equivalently, two possible network ties are conditionally independent unless they share a common actor • Once the homogeneity assumption is imposed, we obtain the following configurations…
Markov random graph models (non-directed networks) • Density or edge (θ) • Two-star (σ2) • Three-star (σ3) • Triangle (τ)
Problems of degeneracy for Markov random graph models • Certain parameter values place almost all of the probability mass on either the empty or the full graph • Simulation studies showed that Markov random graph models are degenerate for many empirical networks with a high level of clustering, such as networks with: • A few very high degree nodes • Some regions of high triangulation
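Degeneracy can be illustrated exactly on a tiny network. The sketch below enumerates all 2^10 undirected graphs on five nodes for a model with an edge and a triangle parameter and reports the distribution of the edge count. The parameter values (-2 for edges, 3 for triangles) are hypothetical, chosen only to show the effect.

```python
from itertools import combinations, product
from math import exp

N = 5
DYADS = list(combinations(range(N), 2))     # the 10 possible edges
TRIPLES = list(combinations(range(N), 3))   # the 10 possible triangles

def edge_triangle_distribution(theta_edge, theta_tri):
    """Exact P(L(y) = l), l = 0..10, for an edge + triangle model, by full enumeration."""
    weight_by_edges = [0.0] * (len(DYADS) + 1)
    for y in product([0, 1], repeat=len(DYADS)):
        present = {d for d, v in zip(DYADS, y) if v}
        n_edges = len(present)
        n_tri = sum(1 for a, b, c in TRIPLES
                    if (a, b) in present and (a, c) in present and (b, c) in present)
        weight_by_edges[n_edges] += exp(theta_edge * n_edges + theta_tri * n_tri)
    k = sum(weight_by_edges)                # normalizing factor
    return [w / k for w in weight_by_edges]

dist = edge_triangle_distribution(theta_edge=-2.0, theta_tri=3.0)
print([round(p, 3) for p in dist])
```

Although the negative edge parameter on its own favours sparse graphs, the positive triangle parameter makes the complete graph so heavily weighted that it takes nearly all of the probability mass, which is the pattern the slide describes.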
Two possible explanations for the degeneracy problem (Snijders et al., 2006) • The Markov dependence assumption may be too restrictive • The representation of transitivity by the total number of triangles might be too simplistic • New specifications for higher-order network dependency
New development in ERGM Partial conditional dependence assumption and new model specification
Partial conditional dependence (Social circuit dependence) • Two possible network ties are conditionally dependent if their observation would lead to a 4-cycle [Figure: nodes i, j, k, l; legend: possible edges, observed edges]
Partial conditional dependence (Example) [Figure: four actors: Father A, Daughter A, Father B, Daughter B]
Difference between the two types of dependence assumptions • Markov dependence assumptions • Partial conditional dependence assumptions [Figure: one panel per assumption on nodes i, j, k, l; legend: potential tie; ties which affect the formation of the potential tie; ties with no effect on the potential tie]
New Specifications of ERGM • Represent structural parameters similar to the Markov parameters • Effects of higher-order configurations are incorporated within a single configuration parameter • Three new statistics for non-directed networks • Alternating k-stars • Alternating k-triangles • Alternating independent two-paths
Examples of new specifications • Alternating k-star configuration (degree distribution) • Alternating k-triangle (tendency to form triads) • Alternating k-two-path (tendency to form cycles)
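As an illustration of the first of these, the alternating k-star statistic of Snijders et al. (2006) combines all star counts into one number with alternating signs and geometrically decreasing weights: $u_\lambda(y) = \sum_{k=2}^{n-1} (-1)^k S_k(y) / \lambda^{k-2}$, where $S_k(y)$ is the number of k-stars. The sketch below computes it from a degree sequence; the function name is illustrative, λ = 2 is an assumed damping value, and the degree sequence is a hypothetical one.

```python
from math import comb

def alternating_k_star(degrees, lam=2.0):
    """u = S_2 - S_3/lam + S_4/lam^2 - ... , with S_k = sum over nodes of C(degree, k)."""
    n = len(degrees)
    total = 0.0
    for k in range(2, n):                         # k = 2, ..., n - 1
        s_k = sum(comb(d, k) for d in degrees)    # number of k-stars in the graph
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total

# Hypothetical degree sequence of a five-node network (11 two-stars, 5 three-stars, 1 four-star).
print(alternating_k_star([2, 3, 2, 4, 1]))
```

For this degree sequence the value is 11 - 5/2 + 1/4 = 8.75. Because higher-order stars enter with shrinking weights, a single parameter on this statistic can model the whole degree distribution without letting very high-degree nodes dominate.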
Interpretation of the parameters • Positive alternating k-star parameter • Networks with some higher-degree nodes are more probable, suggesting a core-periphery structure • Positive alternating k-triangle parameter • Triangulation in the network, as well as a tendency for triangles to group together in larger, higher-order “clumps” • Positive alternating k-two-path parameter • Tendency for 4-cycles in the network
Summary for model construction • Random variables • Each network tie (Yij) among the nodes of a network • A random tie variable Yij = 1 if a tie from i to j exists, Yij = 0 otherwise • yij is the observed value of the variable Yij • Dependence assumptions • Define contingencies among network variables • Determine the type of parameters in the model • Ties may also depend on node-level attributes (homophily) • Homogeneity assumption • Simplify parameters by imposing homogeneity constraints • Estimation procedures • Find the best parameter values based on the observed network • Use simulation (MCMCMLE)
Software for ERGM • SIENA (Snijders and colleagues) • PNet (Robins and colleagues) • Statnet (Butts and colleagues)
References • Harrigan, Nicholas. “Exponential Random Graph (ERG) models and their application to the study of corporate elites.” • Robins, Garry (manuscript). “Exponential Random Graph (p*) models for social networks.” Published on the MelNet website. • Robins, G., Pattison, P., Kalish, Y., Lusher, D. (2007). “An introduction to exponential random graph (p*) models for social networks.” Social Networks, 29, 173-191. • Snijders, T.A.B., Pattison, P., Robins, G., Handcock, M. (2006). “New specifications for exponential random graph models.” Sociological Methodology, 36, 99-153.
Thank you for your attention Any questions?