Introduction to ERGM/p* model

Kayo Fujimoto, Ph.D. Based on presentation slides by Nosh Contractor and Mengxiao Zhu

Four parts of ERGM
• Observed network data
  • Network statistics (or counts) of each configuration
• ERG modeling
  • Conditional probability and change statistics
• Estimation and simulation
  • Estimate parameters by simulation method: MCMC ML estimation
  • Goodness of fit test (convergence t-test): compare observed and simulated graphs
• Recent development in ERGM
  • New model specification

Exponential Random Graph Model (ERGM)
• ERGMs take the form of a probability distribution of graphs: $P(Y = y) = \frac{\exp\{\theta' g(y)\}}{k(\theta)}$
• Y is a set of tie indicator variables Yij
• y is a realization, the observed network
• g(y) is a vector of network statistics
• θ is a parameter vector corresponding to g(y)
• k(θ) is a normalizing factor calculated by summing exp{θ'g(y)} over all possible network configurations

Observed network: graph statistics (or counts) of each configuration

Network Statistics Examples for Undirected Networks
Example (figure: undirected network on five nodes a, b, c, d, e):
• Edge: 6
• 2-Star: 1+3+1+6+0 = 11
• 3-Star: 0+1+0+4+0 = 5
• 4-Star: 1
• Triangle: 2
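The counts in this example can be reproduced with a short script. The sketch below is illustrative only: the edge list is a hypothetical graph chosen so that its degrees (4, 3, 2, 2, 1) and triangles match the slide's counts, because the original figure is not available and the assignment of labels to nodes is an assumption. A node of degree d is the centre of C(d, k) k-stars, which is where the per-node terms such as 1+3+1+6+0 come from.

```python
from itertools import combinations
from math import comb

# Hypothetical edge list (assumption): chosen to match the counts on the slide.
edges = [("e", "a"), ("e", "b"), ("e", "c"), ("e", "d"), ("a", "b"), ("b", "c")]

def neighbours(edge_list):
    """Build an adjacency map {node: set of neighbours} for an undirected graph."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_star_count(adj, k):
    """A node with degree d is the centre of C(d, k) k-stars."""
    return sum(comb(len(nbrs), k) for nbrs in adj.values())

def triangle_count(adj):
    """Count unordered node triples in which all three ties are present."""
    return sum(1 for a, b, c in combinations(sorted(adj), 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

adj = neighbours(edges)
stats = {
    "edge": len(edges),
    "2-star": k_star_count(adj, 2),
    "3-star": k_star_count(adj, 3),
    "4-star": k_star_count(adj, 4),
    "triangle": triangle_count(adj),
}
print(stats)
```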

A Simple Example of ERGM
• Homogeneity assumption
• Number of configurations (possible networks on n nodes):
  • Directed network: $2^{n(n-1)}$
  • Undirected network: $2^{n(n-1)/2}$

A Simple ERG model
• Predict the network using the edge count L(y): $P(Y = y) = \frac{\exp\{\theta L(y)\}}{k(\theta)}$
• θ can take different values:
  • θ = 0, θ = -0.69, θ = 0.69
• L(y) can take the following values:
  • L(y) = 0, L(y) = 1, L(y) = 2, L(y) = 3

Example 1: θ = 0, L = 0
• Probability of getting a network with 0 edges when θ = 0 (figure: model and ERGM formula)

Example 1: θ = 0
• (Figure: model and ERGM formula evaluated at θ = 0)

Example 2: θ = -0.69
• (Figure: model and ERGM formula evaluated at θ = -0.69)

Example 3: θ = 0.69
• (Figure: model and ERGM formula evaluated at θ = 0.69)
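The three examples can be checked by brute force. The sketch below assumes an undirected network on three nodes, so there are three dyads and eight possible graphs; this is inferred from L(y) ranging over 0 to 3 and is not stated on the slides. The function name and layout are illustrative. It enumerates every graph, computes the normalizing factor k(θ), and reports the probability of each edge count.

```python
from itertools import product
from math import exp

def edge_model_distribution(theta, n_dyads=3):
    """Return [P(L(y) = m) for m = 0..n_dyads] under P(y) = exp(theta * L(y)) / k(theta)."""
    graphs = list(product([0, 1], repeat=n_dyads))    # every tie pattern
    weights = [exp(theta * sum(y)) for y in graphs]   # exp{theta * L(y)}
    k = sum(weights)                                  # normalizing factor k(theta)
    dist = [0.0] * (n_dyads + 1)
    for y, w in zip(graphs, weights):
        dist[sum(y)] += w / k
    return dist

for theta in (0.0, -0.69, 0.69):
    print(theta, [round(p, 3) for p in edge_model_distribution(theta)])
```

A negative θ shifts probability toward sparse graphs and a positive θ toward dense graphs; θ = 0 makes all eight graphs equally likely.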

Why Change Statistics?
• Number of configurations: huge sample space

ERG modeling: Conditional Probability and Change Statistics

Conditional Probability vs. Total Probability
• Total probability of the whole network
  • It becomes infeasible to calculate as the size of the network gets large
• Introduce the conditional probability of edges
  • Reduces the sample space

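Avoid the Calculation on Sample Space
• Notation: $y_{ij}^{c}$ is the rest of the network (all ties other than $Y_{ij}$); $y_{ij}^{+}$ and $y_{ij}^{-}$ are the network with the tie from i to j present and absent
• Conditional probability of an edge being present: $P(Y_{ij}=1 \mid y_{ij}^{c}) = \frac{\exp\{\theta' g(y_{ij}^{+})\}}{\exp\{\theta' g(y_{ij}^{+})\} + \exp\{\theta' g(y_{ij}^{-})\}}$
• Conditional probability of an edge being absent: $P(Y_{ij}=0 \mid y_{ij}^{c}) = \frac{\exp\{\theta' g(y_{ij}^{-})\}}{\exp\{\theta' g(y_{ij}^{+})\} + \exp\{\theta' g(y_{ij}^{-})\}}$
• Logit p* model: model the log odds of Yij being present; the normalizing factor k(θ) cancels: $\log \frac{P(Y_{ij}=1 \mid y_{ij}^{c})}{P(Y_{ij}=0 \mid y_{ij}^{c})} = \theta' [g(y_{ij}^{+}) - g(y_{ij}^{-})]$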

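Change Statistics (logit p* model)
• From the end of the last slide, with $y_{ij}^{+}$ and $y_{ij}^{-}$ the network with the tie from i to j present and absent, and $y_{ij}^{c}$ the rest of the network, we have: $\log \frac{P(Y_{ij}=1 \mid y_{ij}^{c})}{P(Y_{ij}=0 \mid y_{ij}^{c})} = \theta' [g(y_{ij}^{+}) - g(y_{ij}^{-})]$
• Define the change statistics as: $\delta_{ij}(y) = g(y_{ij}^{+}) - g(y_{ij}^{-})$
• Model the log odds of a tie being present versus absent: $\operatorname{logit} P(Y_{ij}=1 \mid y_{ij}^{c}) = \theta' \delta_{ij}(y)$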
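As an illustration of change statistics, the sketch below computes them for a model whose statistics are the edge count and the triangle count. The four-node graph, the parameter values, and all function names are hypothetical. The change statistic is obtained by evaluating g on the network with the tie forced present and forced absent, then taking the difference.

```python
from itertools import combinations

def statistics(edge_set, nodes):
    """g(y) = (number of edges, number of triangles) for an undirected graph."""
    has = lambda u, v: frozenset((u, v)) in edge_set
    triangles = sum(1 for a, b, c in combinations(nodes, 3)
                    if has(a, b) and has(a, c) and has(b, c))
    return (len(edge_set), triangles)

def change_statistics(edge_set, nodes, i, j):
    """delta_ij(y) = g(y with tie i-j present) - g(y with tie i-j absent)."""
    dyad = frozenset((i, j))
    g_plus = statistics(edge_set | {dyad}, nodes)
    g_minus = statistics(edge_set - {dyad}, nodes)
    return tuple(p - m for p, m in zip(g_plus, g_minus))

def conditional_log_odds(theta, delta):
    """Log odds of the tie being present given the rest of the network."""
    return sum(t * d for t, d in zip(theta, delta))

nodes = ["a", "b", "c", "d"]
# Hypothetical network: a and b are both tied to c and to d, but not to each other.
y = {frozenset(p) for p in [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d")]}

delta = change_statistics(y, nodes, "a", "b")
print(delta)                                      # one new edge, two new triangles
print(conditional_log_odds((-1.0, 0.5), delta))   # hypothetical theta values
```

Only the local neighbourhood of the dyad matters here, which is why the conditional form avoids summing over the whole sample space.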

Estimation and Simulation (Markov Chain Monte Carlo Maximum Likelihood Method)

Review: Maximum Likelihood Estimation (MLE)
• Likelihood functions
  • Estimate the parameter θ given the observed network
• Maximum likelihood estimation
  • Find the θ values such that the observed statistics are equal to the expected statistics
  • Approximate the MLE by simulation

Procedures for simulating ERG distribution
• Markov Chain Monte Carlo Maximum Likelihood Estimation (MCMCMLE)
  1. Simulate a distribution of random graphs from a starting set of parameter values
  2. Refine the parameter values by comparing the distribution of graphs against the observed graph
  3. Repeat this process until the parameter estimates stabilize
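The three steps can be sketched for the edge-only model. This is a deliberately simplified illustration, not the procedure implemented in any particular ERGM package: it assumes three dyads and one observed edge (hypothetical values), uses a Metropolis sampler that toggles one dyad at a time, and refines θ with a damped Newton-style update whose step size shrinks as 1/t. All function names and tuning constants are assumptions.

```python
import math
import random

def simulate_edge_counts(theta, n_dyads, n_samples, rng, burn_in=200, thin=5):
    """Step 1: Metropolis sampler for P(y) proportional to exp(theta * L(y))."""
    y = [0] * n_dyads
    counts = []
    for step in range(burn_in + n_samples * thin):
        d = rng.randrange(n_dyads)        # propose toggling one dyad
        delta = 1 - 2 * y[d]              # change statistic: +1 (add) or -1 (remove)
        if rng.random() < math.exp(min(0.0, theta * delta)):
            y[d] = 1 - y[d]               # accept the toggle
        if step >= burn_in and (step - burn_in) % thin == 0:
            counts.append(sum(y))         # record L(y) for the sampled graph
    return counts

def fit_edge_parameter(observed_edges, n_dyads, n_iter=60, seed=0):
    """Steps 2 and 3: move theta until simulated and observed edge counts agree."""
    rng = random.Random(seed)
    theta = 0.0                           # starting parameter value
    for t in range(1, n_iter + 1):
        counts = simulate_edge_counts(theta, n_dyads, 500, rng)
        mean = sum(counts) / len(counts)
        var = sum((c - mean) ** 2 for c in counts) / len(counts)
        # Damped update: (observed - simulated mean) / variance, scaled by 1/t.
        theta += (observed_edges - mean) / max(var, 0.05) / t
    return theta

theta_hat = fit_edge_parameter(observed_edges=1, n_dyads=3)
print(round(theta_hat, 2), "vs exact MLE", round(math.log(0.5), 2))
```

For this toy model the exact maximum likelihood estimate is available in closed form, log((1/3)/(2/3)) = log 0.5, which gives a reference point for the simulated estimate.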

Convergence t-statistics
• Test the adequacy of the estimated parameter values
• One t-statistic for each configuration
• |t| < 0.1: good fit
NOTE: If the parameter estimates do not converge, the model may be degenerate
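A common form of this statistic compares the observed count of a configuration with the counts in graphs simulated from the estimated parameters: t = (observed - mean of simulated) / standard deviation of simulated. The sketch below assumes that form; the simulated edge counts are made-up numbers for illustration.

```python
from statistics import mean, stdev

def convergence_t(observed, simulated):
    """t = (observed statistic - simulated mean) / simulated standard deviation."""
    return (observed - mean(simulated)) / stdev(simulated)

# Hypothetical edge counts from ten simulated graphs.
simulated_edges = [1, 0, 2, 1, 1, 0, 2, 1, 1, 1]

t_converged = convergence_t(1, simulated_edges)   # observed count equals the simulated mean
t_off = convergence_t(2, simulated_edges)         # observed count far from the simulated mean
print(abs(t_converged) < 0.1, abs(t_off) < 0.1)
```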

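A Simple Example of MCMCMLE
• Model: $P(Y = y) = \frac{\exp\{\theta L(y)\}}{k(\theta)}$, where L(y) is the number of edges
• Observed network y: (figure)
• Goal: find the θ value such that the observed number of edges is equal to the expected number of edges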

• Given the observed network y: (figure)
• If θ can be chosen from the three cases θ = 0, θ = -0.69, and θ = 0.69, then θ = -0.69 is preferred because it gives the highest probability for the observed network
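The comparison can be worked by hand under one assumption that is not recoverable from the slide: that the observed network has exactly one edge among three nodes (three dyads). In that case k(θ) = (1 + e^θ)^3, and with the rounded values e^{-0.69} ≈ 0.50 and e^{0.69} ≈ 1.99 the three candidate probabilities are:

```latex
\begin{align*}
P(Y = y \mid \theta) &= \frac{e^{\theta L(y)}}{k(\theta)}
                      = \frac{e^{\theta}}{(1 + e^{\theta})^{3}}
                      \quad \text{for } L(y) = 1, \\[4pt]
\theta = 0:      &\quad \frac{1}{2^{3}} = 0.125, \\
\theta = -0.69:  &\quad \frac{0.50}{(1.50)^{3}} \approx 0.148, \\
\theta = 0.69:   &\quad \frac{1.99}{(2.99)^{3}} \approx 0.074 .
\end{align*}
```

Under this assumption θ = -0.69 gives the largest value, and it is also where the expected number of edges, 3e^θ/(1 + e^θ) ≈ 1, matches the observed count.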

Markov dependence (Frank and Strauss, 1986)
• Potential ties are dependent only if they share a common actor
• Two possible network ties are conditionally independent unless they share a common actor
• Once the homogeneity assumption is imposed, we obtain the following configurations…

Markov random graph models (non-directed networks)
• Density or edge (θ)
• Two-star (σ2)
• Three-star (σ3)
• Triangle (τ)

Problems of degeneracy for Markov random models
• Certain parameter values place almost all of the probability mass on either the empty or the full graph
• Simulation studies showed that Markov random graph models are degenerate for many empirical networks with high levels of clustering
  • A few very high degree nodes
  • Some regions of high triangulation
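Degeneracy can be seen directly on a network small enough to enumerate. The sketch below is illustrative only: it uses five nodes (1,024 possible undirected graphs) and hypothetical parameter values, an edge parameter of -5 and a triangle parameter of +5, and computes the exact probability of each edge count under an edge-plus-triangle model.

```python
from itertools import combinations, product
from math import exp

def edge_triangle_distribution(n, theta, tau):
    """Exact [P(L(y) = m)] under P(y) proportional to exp(theta * edges + tau * triangles)."""
    dyads = list(combinations(range(n), 2))
    triples = list(combinations(range(n), 3))
    index = {d: pos for pos, d in enumerate(dyads)}
    mass = [0.0] * (len(dyads) + 1)
    for y in product([0, 1], repeat=len(dyads)):      # every possible graph
        edges = sum(y)
        triangles = sum(1 for a, b, c in triples
                        if y[index[(a, b)]] and y[index[(a, c)]] and y[index[(b, c)]])
        mass[edges] += exp(theta * edges + tau * triangles)
    k = sum(mass)                                     # normalizing factor
    return [m / k for m in mass]

# Hypothetical parameter values chosen to illustrate degeneracy.
dist = edge_triangle_distribution(n=5, theta=-5.0, tau=5.0)
print("P(empty) + P(full) =", round(dist[0] + dist[-1], 3))
```

With these values nearly all of the probability sits on the empty graph and the full graph, so a typical simulated graph looks nothing like a moderately dense observed network.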

Two possibilities for the degeneracy problem (Snijders et al., 2006)
• The Markov dependence assumption may be too restrictive
• The representation of transitivity by the total number of triangles might be too simplistic
• → New specification of higher-order network dependency

New development in ERGM: partial conditional dependence assumption and new model specification

Partial conditional dependence (Social circuit dependence)
• Two possible network ties are conditionally dependent if their observation would lead to a 4-cycle
• (Figure: nodes i, j, k, l; the legend distinguishes possible edges from observed edges)

Partial conditional dependence (Example)
• (Figure: four actors, Father A, Daughter A, Father B, Daughter B)

Difference between the two types of dependence assumptions
• Markov dependence assumptions vs. partial conditional dependence assumptions
• (Figure: one panel per assumption on nodes i, j, k, l. Legend: the potential tie; ties which affect the formation of the potential tie; ties with no effect on the potential tie)
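The two assumptions can be written as small predicates on pairs of dyads. The sketch below is a simplified reading of the definitions above, with hypothetical function names: Markov dependence asks only whether the two dyads share a node, while social circuit dependence asks whether two disjoint dyads would close a 4-cycle given the ties already observed.

```python
def markov_dependent(dyad_a, dyad_b):
    """Markov dependence: two distinct dyads are dependent iff they share an actor."""
    a, b = set(dyad_a), set(dyad_b)
    return a != b and bool(a & b)

def social_circuit_dependent(dyad_a, dyad_b, observed):
    """Social circuit dependence for two disjoint dyads (i, j) and (k, l):
    dependent iff adding both ties would complete a 4-cycle with observed ties.
    Dyads that share a node are outside this condition, so this returns False for them."""
    (i, j), (k, l) = dyad_a, dyad_b
    if {i, j} & {k, l}:
        return False
    has = lambda u, v: frozenset((u, v)) in observed
    return (has(i, k) and has(j, l)) or (has(i, l) and has(j, k))

# Hypothetical observed ties: i-k and j-l are present.
observed = {frozenset(("i", "k")), frozenset(("j", "l"))}

print(markov_dependent(("i", "j"), ("k", "l")))                     # no shared actor
print(social_circuit_dependent(("i", "j"), ("k", "l"), observed))   # i-j-l-k-i would be a 4-cycle
```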

New Specifications of ERGM
• Represent structural parameters similar to the Markov parameters
• Effects are incorporated within a single configuration parameter
• Three new statistics for non-directed networks
  • Alternating k-stars
  • Alternating k-triangles
  • Alternating independent two-paths

Examples of new specifications
• Alternating k-star configuration (degree distribution)
• Alternating k-triangle (tendency to form triads)
• Alternating k-two-path (tendency to form cycles)
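As a concrete case, the alternating k-star statistic in Snijders et al. (2006) combines all k-star counts S_k into one number with alternating signs and geometrically decreasing weights: u = S_2 - S_3/λ + S_4/λ² - … . The sketch below assumes that form with λ = 2 (a hypothetical choice here) and applies it to the degree sequence 2, 3, 2, 4, 1 implied by the earlier five-node example; the function name is illustrative.

```python
from math import comb

def alternating_k_star(degrees, lam=2.0):
    """u = sum over k = 2..n-1 of (-1)^k * S_k / lam^(k-2), with S_k the number of k-stars."""
    n = len(degrees)
    total = 0.0
    for k in range(2, n):                        # k = 2, ..., n - 1
        s_k = sum(comb(d, k) for d in degrees)   # a node of degree d centres C(d, k) k-stars
        total += (-1) ** k * s_k / lam ** (k - 2)
    return total

# Degree sequence from the five-node example: S_2 = 11, S_3 = 5, S_4 = 1.
u = alternating_k_star([2, 3, 2, 4, 1])
print(u)   # 11 - 5/2 + 1/4 = 8.75
```

The decreasing weights mean that very high-order stars contribute little, which is what lets one parameter summarize the whole degree distribution.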

Interpretation of the parameter
• Positive alternating k-star parameter
  • Networks with some higher degree nodes are highly probable → core-periphery structure
• Positive alternating k-triangle parameter
  • Triangulation in the network, as well as a tendency for triangles themselves to group together in larger, higher-order "clumps"
• Positive alternating k-two-path parameter
  • Tendency for 4-cycles in the network

Summary for model construction
• Random variables
  • Each network tie (Yij) among nodes of a network
  • A random tie variable Yij = 1 if a tie from i to j exists, Yij = 0 otherwise
  • yij is the observed value of the variable Yij
• Dependence assumptions
  • Define contingencies among network variables
  • Determine the type of parameters in the model
  • Ties also depend on node-level attributes (homophily)
• Homogeneity assumption
  • Simplify parameters by imposing homogeneity constraints
• Estimation procedures
  • Find the best parameter values based on the observed network
  • Use simulation (MCMCMLE)

Software for ERGM
• SIENA (Snijders and colleagues)
• PNet (Robins and colleagues)
• Statnet (Butts and colleagues)

Reference
• Harrigan, Nicholas. "Exponential Random Graph (ERG) models and their application to the study of corporate elites."
• Robins, Garry (manuscript). "Exponential Random Graph (p*) models for social networks." Published on the MelNet website.
• Robins, G., Pattison, P., Kalish, Y., Lusher, D. (2007). "An introduction to exponential random graph (p*) models for social networks." Social Networks, 29, 173-191.
• Snijders, T.A.B., Pattison, P., Robins, G., Handcock, M. (2006). "New specifications for exponential random graph models." Sociological Methodology, 36, 99-153.

Thank you for your attention Any questions?
