
What's the topic for this lecture?


holly-moore

Presentation Transcript


  1. What's the topic for this lecture? • An introduction to a multivariate modeling framework for network data • Exponential random graph models (ERGMs, also known as p* models) • The software used in this presentation is the statnet suite of packages for R • Statnet is documented in a special volume of the Journal of Statistical Software: "Statistical Modeling of Social Networks with 'statnet'", Vol. 24, issues 1-9 (2008)

  2. Why use multivariate statistics? • A phenomenon often has to be explained by more than one variable • This is particularly true for social phenomena • In a multivariate analysis we combine several independent variables to predict the values of a dependent variable • In other words, we construct a model that assumes a causal relationship between the variable we want to explain and a set of variables that, according to our theory, cause the phenomenon

  3. Example of a multivariate model • We want to explain income variation in an organisation • Annual salary is our dependent variable (Y) • What explains income? • Number of years that the individual has been employed (X1) • Education (X2) • Position in organisation (X3) • Gender (X4)

  4. Bivariate or multivariate models • If the relationship between the variables is linear: • The bivariate linear model • $Y = a + bX$ • The multivariate linear model • $Y = a + b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4$
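The multivariate linear model is usually fitted by ordinary least squares. The code below is a rough plain-Python sketch with no statistics library. It solves the normal equations (X'X)b = X'y by Gaussian elimination. The function names `solve` and `fit_ols` and the toy data are hypothetical and not part of the lecture. The data are generated exactly from Y = 2 + 3X1 - X2, so the fit should recover those coefficients.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [a, b1, ..., bk] for Y = a + b1*X1 + ... + bk*Xk."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 estimates the intercept a
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data: (X1, X2) pairs and Y = 2 + 3*X1 - X2 without noise.
rows = [(0, 1), (1, 0), (2, 3), (3, 1), (4, 5), (5, 2)]
y = [2 + 3 * x1 - x2 for x1, x2 in rows]
coef = fit_ols(rows, y)  # approximately [2.0, 3.0, -1.0]
```

In practice one would call a statistics package (in R, `lm()`). The hand-written solver only shows what is being estimated.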

  5. How to interpret a multivariate model • A multivariate model is not a series of bivariate relationships calculated at the same time • i.e. it is not $Y = a + b_1X_1$ OR $Y = a + b_2X_2$ OR $Y = a + b_3X_3$ • In a multivariate model we calculate the partial effect of each independent variable, i.e. its unique contribution to the model • $b_1$ is the partial effect of $X_1$ on $Y$: the expected change in $Y$ for a one-unit increase in $X_1$ when $X_2$ and $X_3$ are held constant
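The sketch below shows why the multivariate coefficient differs from the bivariate one. It uses hypothetical data in which X2 is correlated with X1:

- The bivariate slope of Y on X1 absorbs part of X2's effect.
- The partial slope from the two-predictor model recovers X1's own contribution.

The formulas are the standard two-predictor least-squares solutions, written with centred sums of squares and cross-products. The data are constructed as Y = 1 + 2X1 + 3X2 exactly.

```python
def centred_cross(u, v):
    """Sum of (u - mean(u)) * (v - mean(v))."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def bivariate_slope(x, y):
    """Slope b in Y = a + bX, ignoring every other variable."""
    return centred_cross(x, y) / centred_cross(x, x)


def partial_slopes(x1, x2, y):
    """Slopes (b1, b2) in Y = a + b1*X1 + b2*X2, each holding the other variable constant."""
    s11, s22, s12 = centred_cross(x1, x1), centred_cross(x2, x2), centred_cross(x1, x2)
    s1y, s2y = centred_cross(x1, y), centred_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical data: X2 moves together with X1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b_bivariate = bivariate_slope(x1, y)   # inflated: also picks up X2's effect
b1, b2 = partial_slopes(x1, x2, y)     # 2 and 3, the values used to build y
```

The bivariate slope comes out at about 4.49 instead of 2, because X1 and X2 rise together.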

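  6. Why use an ERGM? • To test hypotheses about the processes that generate a particular network structure • An ERGM whose terms are all dyad-independent (as in the models below) can be estimated with logistic regression on the dyads; when terms create dependence between ties, the pseudolikelihood is only an approximation and MCMC-based maximum likelihood is used instead • Our goal is to build a model that can predict links between nodes in the network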
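In its standard form, an ERGM gives every possible graph y the probability P(Y = y) = exp(θ·g(y)) / κ(θ). Here g(y) is a vector of network statistics, and κ(θ) sums exp(θ·g(y)) over all graphs. When every statistic is a sum over dyads, the dyads are independent and each tie follows a logistic model.

The brute-force sketch below checks this for an edges-only model on a hypothetical 4-node undirected network. It enumerates all 2^6 = 64 graphs. This is only a toy illustration; enumeration is impossible for a real network. The value of θ is made up.

```python
import itertools
import math


def tie_probability_by_enumeration(theta, n_nodes=4):
    """Marginal P(Y_01 = 1) under an edges-only ERGM, found by listing every graph."""
    dyads = list(itertools.combinations(range(n_nodes), 2))  # dyads[0] == (0, 1)
    kappa = 0.0     # normalising constant: sum of exp(theta * edges) over all graphs
    with_tie = 0.0  # the same sum restricted to graphs that contain the 0--1 tie
    for ties in itertools.product((0, 1), repeat=len(dyads)):
        weight = math.exp(theta * sum(ties))  # g(y) = number of edges in graph y
        kappa += weight
        if ties[0] == 1:
            with_tie += weight
    return with_tie / kappa


def inverse_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


theta = -1.2  # hypothetical edges coefficient
p_enumerated = tie_probability_by_enumeration(theta)
p_logistic = inverse_logit(theta)  # what a logistic regression with only an intercept gives
```

The two probabilities agree because κ(θ) factorises into (1 + e^θ) per dyad.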

  7. Logistic regression • The dependent variable is binary (1 or 0) • A logistic regression estimates the log-odds of the dependent variable • The log-odds are written as a linear function of the independent variables: • $\ln\left(\frac{P(Y=1)}{1-P(Y=1)}\right) = a + b_1X_1 + b_2X_2 + b_3X_3$ • The logistic regression is estimated by maximum likelihood (MLE) • MLE is an iterative procedure that finds the coefficient estimates ($a, b_1, b_2, b_3$) that maximise the likelihood of the observed data
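Maximum likelihood for a logistic regression generally has no closed form, so it is found iteratively. The sketch below uses Newton-Raphson for a model with an intercept and one predictor, in plain Python. The function name and the toy data are hypothetical.

With a single binary predictor the MLE is known exactly, which gives a simple check:

- the intercept is the log-odds in the X = 0 group;
- the slope is the difference in log-odds between the two groups.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Newton-Raphson MLE for ln(p / (1 - p)) = a + b*x; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # score: gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix: minus the Hessian
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (information) * step = score for the 2x2 case.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


# Hypothetical data: 2 of 10 cases are 1 when X = 0, and 6 of 10 when X = 1.
x = [0] * 10 + [1] * 10
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
a, b = fit_logistic(x, y)
# Expected: a = ln(0.2 / 0.8) = ln(0.25), b = ln(0.6 / 0.4) - ln(0.25) = ln(6)
```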

  8. Homophily theory • Lazarsfeld and Merton (1964) • Most human communication will occur between a source and a receiver who are alike • When individuals share common meanings, beliefs and mutual understandings, communication between them is more likely to be effective • Gender homophily in organisations has been observed in many studies • Our hypothesis is that gender homophily is a salient factor in research collaboration

  9. Does gender structure co-authorship networks? • Data: publications from the Department of Psychology, Umeå University • Publication years: 2007 onwards • Number of published items = 51 articles • Number of authors = 114 (male = 64, female = 50) • Number of authors employed at the psychology department = 24 (male = 16, female = 8)

  10. The co-authorship network

  11. The ERG model • The dependent variable is binary (Y = 1 if there is a co-authorship link between two authors, 0 otherwise) • We will build a model that tries to predict the existence of co-authorship links • We will use a set of node attributes as independent variables • It is also possible to use edge attributes in an ERG model
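Each pair of authors can be treated as one observation, which gives one row per dyad for the dependent variable. The sketch below builds that table for an undirected network given as an edge list. The function name, author labels and ties are hypothetical. With n authors there are n(n-1)/2 rows, which is 6441 for the 114 authors in this data set.

```python
import itertools


def dyad_table(nodes, edges):
    """One row (i, j, y) per unordered pair; y = 1 if i and j co-authored."""
    tie_set = {frozenset(e) for e in edges}  # undirected: (i, j) is the same as (j, i)
    return [(i, j, int(frozenset((i, j)) in tie_set))
            for i, j in itertools.combinations(nodes, 2)]


# Hypothetical mini-network of four authors and two co-authorship ties.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("C", "B")]
rows = dyad_table(nodes, edges)
# rows: (A,B,1), (A,C,0), (A,D,0), (B,C,1), (B,D,0), (C,D,0)
```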

  12. Independent variables and hypothesis • Node attributes • Number of authorships for each node • Employed at the department (1 = employed, 0 = not employed) • Gender (1 = female, 0 = male) • The hypothesis • Co-authorships are affected by gender homophily, i.e. a link is more probable if the authors are of the same sex • There are two types of homophily: baseline homophily, which follows from the composition of the group of potential collaborators, and inbreeding homophily, which goes beyond that composition • We will estimate the effect of inbreeding homophily

  13. Intercept (baseline) model • Intercept model output:

      ==========================
      Summary of model fit
      ==========================

      Formula:   psyk ~ edges

      Maximum Likelihood Results:
            Estimate Std. Error MCMC s.e. p-value
      edges -3.06538    0.06039        NA  <1e-04 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      For this model, the pseudolikelihood is the same as the likelihood.

  14. Interpretation of the intercept • -3.06538 is the log-odds of a co-authorship link between two randomly chosen authors • i.e. $\ln\left(\frac{P(Y=1)}{1-P(Y=1)}\right)$ • Converted back to a probability with the inverse logit, $P(Y=1) = \frac{e^{-3.06538}}{1+e^{-3.06538}} \approx 0.045$, it equals the density (connectivity) of the network, i.e. the number of existing links divided by the number of possible links • The interpretation of the intercept changes when we introduce our independent variables
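The intercept can be turned back into a probability with the inverse logit, exp(b) / (1 + exp(b)). The sketch below does this for the estimate on slide 13. The tie count assumes an undirected network containing all 114 authors (6441 possible pairs). The slides imply this but do not state it, so the implied figure of about 287 ties is a derived value, not a reported one.

```python
import math


def inverse_logit(x):
    """Convert log-odds to a probability."""
    return math.exp(x) / (1.0 + math.exp(x))


edges_coef = -3.06538                # intercept from the slide 13 output
density = inverse_logit(edges_coef)  # P(tie) for a random pair, about 0.0446
possible_pairs = 114 * 113 // 2      # assumption: undirected, all 114 authors included
expected_ties = density * possible_pairs
```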

  15. Model 2 • Formula in the R console: psyk ~ edges + nodematch("gender", diff = FALSE) + nodefactor("gender") + nodematch("department", diff = FALSE) + nodefactor("department") + nodecov("production") • Terms used in the formula: • edges is the intercept term of the model • nodefactor() gives the main effect of a categorical attribute • nodematch(attr, diff = FALSE) gives uniform homophily: a single coefficient for ties between nodes with the same attribute value • With diff = TRUE we get differential homophily: a separate coefficient for each attribute value • nodecov() gives the main effect of a numeric attribute
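In an undirected network, each term contributes a change statistic to a dyad's log-odds. The sketch below computes these for a single pair of authors, using the term definitions documented for the ergm package:

- edges: always 1;
- nodematch with diff = FALSE: 1 if the two values are equal;
- nodefactor: for each non-baseline level, how many of the two nodes have that level;
- nodecov: the sum of the two numeric values.

Level 0 is treated as the baseline, consistent with the `nodefactor.gender.1` label in the output. The attribute keys follow the names in the output (`dept`, `prod`), and the author values are hypothetical.

```python
def change_stats(attrs_i, attrs_j):
    """Change statistics for adding the tie i--j in an undirected network.

    attrs_i and attrs_j are dicts with the hypothetical keys 'gender' (0/1),
    'dept' (0/1) and 'prod' (number of authorships).
    """
    stats = {"edges": 1}
    for name in ("gender", "dept"):
        # nodematch(name, diff = FALSE): one shared homophily indicator
        stats[f"nodematch.{name}"] = int(attrs_i[name] == attrs_j[name])
        # nodefactor(name): endpoints at level 1 (level 0 is the baseline)
        stats[f"nodefactor.{name}.1"] = int(attrs_i[name] == 1) + int(attrs_j[name] == 1)
    # nodecov("prod"): sum of the two endpoints' numeric values
    stats["nodecov.prod"] = attrs_i["prod"] + attrs_j["prod"]
    return stats


# Hypothetical pair: two female authors, only i employed at the department.
i = {"gender": 1, "dept": 1, "prod": 3}
j = {"gender": 1, "dept": 0, "prod": 1}
s = change_stats(i, j)
```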

  16. Model 2

      Maximum Likelihood Results:
                           Estimate Std. Error MCMC s.e. p-value
      edges                -3.94783    0.22617        NA  <1e-04 ***
      nodematch.gender      0.12807    0.12478        NA  0.3048
      nodefactor.gender.1   0.18282    0.09190        NA  0.0467 *
      nodematch.dept       -0.25089    0.16747        NA  0.1342
      nodefactor.dept.1    -0.13087    0.14687        NA  0.3729
      nodecov.prod          0.21710    0.02113        NA  <1e-04 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      For this model, the pseudolikelihood is the same as the likelihood.
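The coefficients on slide 16 can be combined into a predicted tie probability. Multiply each coefficient by its change statistic, sum, and apply the inverse logit. The sketch below does this for a hypothetical pair of authors.

It also computes exp(0.12807) ≈ 1.14, the odds ratio for a same-gender pair with everything else held constant. That coefficient is not significant (p = 0.3048), so the ratio should not be over-read.

```python
import math

# Estimates from the Model 2 output (slide 16).
coef = {
    "edges": -3.94783,
    "nodematch.gender": 0.12807,
    "nodefactor.gender.1": 0.18282,
    "nodematch.dept": -0.25089,
    "nodefactor.dept.1": -0.13087,
    "nodecov.prod": 0.21710,
}


def tie_probability(change_stats):
    """Inverse logit of the linear predictor sum(coefficient * change statistic)."""
    eta = sum(coef[name] * value for name, value in change_stats.items())
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical pair: two female authors, one of them employed at the department,
# with 3 and 1 authorships respectively.
pair = {
    "edges": 1,
    "nodematch.gender": 1,
    "nodefactor.gender.1": 2,
    "nodematch.dept": 0,
    "nodefactor.dept.1": 1,
    "nodecov.prod": 4,
}
p = tie_probability(pair)  # log-odds -2.71659, probability roughly 0.06
same_gender_odds_ratio = math.exp(coef["nodematch.gender"])
```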

  17. Model with a combined gender-department node attribute

      Maximum Likelihood Results:
                              Estimate Std. Error MCMC s.e. p-value
      edges                  -3.780522   0.318871        NA  <1e-04 ***
      nodematch.gendept.1    -0.470124   0.427773        NA   0.272
      nodematch.gendept.2     0.809230   0.821364        NA   0.325
      nodematch.gendept.3     0.042986   0.278974        NA   0.878
      nodematch.gendept.4     0.115658   0.279259        NA   0.679
      nodefactor.gendept.2   -0.148699   0.232859        NA   0.523
      nodefactor.gendept.3   -0.172505   0.191351        NA   0.367
      nodefactor.gendept.4    0.004574   0.194113        NA   0.981
      nodecov.prod            0.211575   0.021842        NA  <1e-04 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
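The combined attribute crosses gender with department employment, which gives four categories. The output shows one nodematch coefficient per category, i.e. differential homophily (diff = TRUE). The slides do not say which code from 1 to 4 stands for which combination, so the coding in `gendept` below is a hypothetical assumption.

The sketch builds the combined attribute and computes a pair's log-odds from the slide 17 estimates. Category 1 is the nodefactor baseline, matching the absence of a `nodefactor.gendept.1` row.

```python
# Estimates from the slide 17 output.
coef = {
    "edges": -3.780522,
    "nodematch.gendept.1": -0.470124,
    "nodematch.gendept.2": 0.809230,
    "nodematch.gendept.3": 0.042986,
    "nodematch.gendept.4": 0.115658,
    "nodefactor.gendept.2": -0.148699,
    "nodefactor.gendept.3": -0.172505,
    "nodefactor.gendept.4": 0.004574,
    "nodecov.prod": 0.211575,
}


def gendept(gender, employed):
    """HYPOTHETICAL coding: 1 = male/not employed, 2 = female/not employed,
    3 = male/employed, 4 = female/employed."""
    return 1 + gender + 2 * employed


def log_odds(cat_i, cat_j, prod_i, prod_j):
    """Log-odds of a tie between two authors with the given categories and authorship counts."""
    eta = coef["edges"] + coef["nodecov.prod"] * (prod_i + prod_j)
    if cat_i == cat_j:
        # Differential homophily: a separate coefficient for each category.
        eta += coef[f"nodematch.gendept.{cat_i}"]
    for cat in (cat_i, cat_j):
        # nodefactor: category 1 is the baseline and has no coefficient.
        eta += coef.get(f"nodefactor.gendept.{cat}", 0.0)
    return eta


# Hypothetical pair: two employed female authors with 3 and 1 authorships.
a = gendept(1, 1)
b = gendept(1, 1)
eta = log_odds(a, b, 3, 1)
```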
