Course handouts. Introduction to Multilevel Models: Getting started with your own data. University of Bristol, Monday 31st March to Friday 4th April 2008.
Resources
Centre for Multilevel Modelling: http://www.mlwin.com/
Provides access to general information about multilevel modelling and MLwiN.
Includes the Multilevel newsletter (free electronic publication): http://www.mlwin.com/publref/newsletters.html
Email discussion group: www.jiscmail.ac.uk/multilevel/
Lemma will include a training repository: http://www.ncrm.ac.uk/nodes/lemma/about.php
1.0 Introductions
Participants introduce themselves: Who are you? Where are you from?
2.00 Multilevel Data Structures
Multilevel modelling is designed to explore and analyse data that come from populations with a complex structure. In any complex structure we can identify atomic units: the units at the lowest level of the system. The response or y variable is measured on the atomic units. Often, but not always, these atomic units are individuals. Individuals are then grouped into higher-level units, for example students into schools. By convention we then say that students are at level 1 and schools are at level 2 in our structure.
2.01 Levels, classifications and units
A level (e.g. pupils, schools, households, areas) is made up of a number of individual units (e.g. particular pupils, particular schools). The terms classification and level can be used somewhat interchangeably, but level implies a nested hierarchical relationship of units (in which each lower-level unit nests in one, and only one, higher-level unit) whereas classification does not.
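The nesting condition that separates a level from a classification can be checked directly on a data set. The sketch below is only an illustration: the function name `is_strictly_nested` and the pupil and school labels are hypothetical and not part of the course materials.

```python
from collections import defaultdict


def is_strictly_nested(links):
    """Return True if every lower-level unit belongs to exactly one higher-level unit.

    `links` is an iterable of (lower_unit, higher_unit) pairs.
    """
    parents = defaultdict(set)
    for lower, higher in links:
        parents[lower].add(higher)
    return all(len(higher_units) == 1 for higher_units in parents.values())


# Hypothetical pupil-to-school links (labels invented for illustration).
hierarchy = [("pupil1", "schoolA"), ("pupil2", "schoolA"), ("pupil3", "schoolB")]
# The same data after pupil3 is also recorded in schoolA: no longer a strict hierarchy.
not_hierarchy = hierarchy + [("pupil3", "schoolA")]

print(is_strictly_nested(hierarchy))
print(is_strictly_nested(not_hierarchy))
```

If the check fails, school is still a classification of pupils, but it cannot be treated as a level in a strict hierarchy.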
2.02 Two-level hierarchical structures
Students within schools
[Unit diagram, one node per unit: schools Sc1 to Sc4, with students St1 to St3 in Sc1, St1 and St2 in Sc2, St1 to St3 in Sc3 and St1 to St4 in Sc4. Classification diagram, one node per classification: Student nested within School.]
Students within a school are more alike than a random sample of students. This is the ‘clustering’ effect of schools.
2.03 Data frame for student within school example
1 Do males make greater progress than females?
2* Does the gender gap vary across schools?
3* Are males more or less variable in their progress than females?
4* What is the between-school variation in students’ progress?
5* Is School X (that is, a specific school) different from other schools in the sample in its effect?
6* Are schools more variable in their progress for students with low prior attainment?
7 Do students make more progress in private than public schools?
8* Are students in public schools less variable in their progress?
* Requires a multilevel model to answer
2.04 Variables, levels, fixed and random classifications
Given that school type (state or private) classifies schools, we could redraw our classification diagram, Student within School, as Student within School within School type. Do we now have a 3-level multilevel model?
We can divide classifications into two types: fixed classifications and random classifications. The distinction has important implications for how we handle the classifying variable in a statistical analysis. For a classification to be a level in a multilevel model it must be a random classification. It turns out that school type is not a random classification.
2.05 Random and Fixed Classifications
A classification is a random classification if its units can be regarded as a random sample from a wider population of units. For example, the students and schools in our example are a random sample from a wider population of students and schools. However, school type, or indeed student gender, has a small fixed number of categories. There is no wider population of school types or genders to sample from. Traditional or single-level statistical models have only one random classification, which classifies the units on which measurements are made, typically people. Multilevel models have more than one random classification.
2.06 Other examples of two-level hierarchical structures
• Repeated measures, panel data
• Multivariate response models
2.07 Repeated Measures data
In the previous example we have measures on an individual at two occasions: a current and a prior test score. We can analyse change (that is, progress) by specifying current attainment as the response and prior attainment as a predictor variable. However, when there are measurements on more than two occasions there are advantages in treating occasion as a level nested within individuals. Such a two-level strict hierarchical structure is known as a repeated measures or panel design.
2.08 Classification, unit diagrams and data frames for repeated measures structures
[Classification diagram: Measurement occasion nested within Person. Unit diagram: persons P1, P2, P3, ..., with occasions O1 to O4 in P1, O1 and O2 in P2 and O1 to O3 in P3.]
Wide form: 1 row per individual
Long form: 1 row per occasion (required by MLwiN)
2.09 Repeated Measures (cont.)
• Atomic units are occasions, not individuals.
• We model between-individual variation in growth (growth curves).
• In a multilevel repeated measures model, data need not be balanced or equally spaced.
• Explanatory variables can be time-invariant (gender) or time-varying (age).
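Moving from wide form (one row per individual) to long form (one row per occasion) can be sketched in a few lines. This is a minimal illustration in plain Python under stated assumptions: the function `wide_to_long`, the column names `y1` to `y3` and the values are all hypothetical, and a missing occasion is simply dropped, which is why the resulting data need not be balanced.

```python
def wide_to_long(wide_rows, id_key, occasion_keys):
    """Convert wide rows (one per individual) to long rows (one per occasion)."""
    long_rows = []
    for row in wide_rows:
        for occasion, key in enumerate(occasion_keys, start=1):
            value = row.get(key)
            if value is None:
                continue  # no measurement at this occasion: no row is created
            long_rows.append({id_key: row[id_key], "occasion": occasion, "y": value})
    return long_rows


# Hypothetical wide-form data: person 2 was not measured at occasion 2.
wide = [
    {"person": 1, "y1": 10.0, "y2": 12.5, "y3": 13.0},
    {"person": 2, "y1": 9.0, "y2": None, "y3": 11.0},
]
long_form = wide_to_long(wide, "person", ["y1", "y2", "y3"])
for row in long_form:
    print(row)
```

Time-invariant variables such as gender would be copied onto every row for that person; time-varying variables such as age would take a different value on each row.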
2.10 Multivariate responses within individuals
Sometimes we may wish to model not a single response (y variable) but several. For example, we may wish to consider English and Mathematics exam scores for students jointly, as two possibly related responses. We can regard this as a multilevel model with subjects (English and Maths) nested within students. A multilevel multivariate response model can estimate the covariance (or correlation) matrix between responses and handle missing data efficiently.
[Unit diagram: subjects (E = English, M = Maths) nested within students St1, St2, St3, St4, ...; not every student has both responses.]
2.11 Data frames for multivariate response models
Wide form: 1 row per individual
Long form: 1 row per measurement (required by MLwiN)
2.12 Three level structures
Students:classes:schools
[Unit diagram: students St1, St2, ... nested within classes C1, C2 nested within schools Sc1, Sc2, Sc3. Classification diagram: Student nested within Class nested within School.]
Multilevel models allow a different number of students in each class and a different number of classes in each school. Bennett (1976) used a single-level model to assess whether teaching styles affected test scores for reading and mathematics at age 11. The results prompted a call for a return to traditional or formal teaching methods. This analysis did not take account of the dependency structures in the data: students in a class are more similar than a random sample of students, and likewise classes in a school. Subsequent multilevel analysis found the effects of traditional methods to be non-significant.
2.13 Data Frame for 3 level model, students: classes: schools
2.14 Other three level structures
• Repeated measures within students within schools. This allows us to look at how learning trajectories vary across students and schools.
• Multivariate responses on four health behaviours (drinking, smoking, exercise and diet) on individuals within communities. Such a design allows us to assess how correlated the behaviours are at the individual level and at the community level, taking account of other characteristics at both levels. We can also assess the extent to which there are unhealthy communities as well as unhealthy individuals.
• A repeated cross-sectional design with students:cohorts:schools.
2.15 Repeated cross-sectional design
[Unit diagram: schools Sc1, Sc2, Sc3, ..., each with cohorts 1990 and 1991, and students St1, St2, ... within each cohort. Classification diagram: Student nested within Cohort nested within School.]
Above are unit and classification diagrams for exam scores from a group of students who entered school in 1990 and a further group who entered in 1991. The model can be extended to handle an arbitrary number of cohorts. In a multilevel sense we do not have 2 cohort units but 2S cohort units, where S is the number of schools.
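The count of 2S cohort units follows from treating each cohort as a unit within its school, so a unit is identified by the (school, year) pair rather than by the year alone. A small sketch with invented records (the school labels, years and student labels are illustrative only):

```python
# Hypothetical student records: (school, cohort year, student label within cohort).
records = [
    ("Sc1", 1990, "St1"), ("Sc1", 1990, "St2"), ("Sc1", 1991, "St1"),
    ("Sc2", 1990, "St1"), ("Sc2", 1991, "St1"), ("Sc2", 1991, "St2"),
    ("Sc3", 1990, "St1"), ("Sc3", 1991, "St1"),
]

schools = {school for school, _, _ in records}
cohort_years = {year for _, year, _ in records}
# A cohort unit is a cohort within a particular school.
cohort_units = {(school, year) for school, year, _ in records}

print("schools (S):", len(schools))
print("distinct cohort years:", len(cohort_years))
print("cohort units at level 2:", len(cohort_units))
```

With three schools and two entry years, the level-2 identifier takes six distinct values, not two.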
2.16 Four level hierarchical structures
By now you should be getting a feel for how basic random classifications such as people, time, multivariate responses, institutions, families and areas can be combined within a multilevel framework to model a wide variety of nested population structures. Here are some examples of 4-level nested structures.
• students within classes within schools within LEAs
• multivariate responses within repeated measures within students within schools
• repeated measures within patients within doctors within hospitals
• people within households within postcode sectors within regions
As a final example of a strict hierarchy we will consider a doubly nested repeated measures structure.
2.17 Repeated measures within students within cohorts within schools
[Unit diagram: schools Sc1, Sc2, ..., each with cohorts 1990 and 1991; students St1, St2, ... within each cohort; measurement occasions O1 and O2 within each student. Classification diagram: Measurement occasion nested within Student nested within Cohort nested within School.]
Cohorts are now repeated measures on schools and tell us about the stability of school effects over time. Measurement occasions are repeated measures on students and can tell us about students’ learning trajectories.
2.18 Non-hierarchical structures
So far all our examples have involved exact nesting, with lower-level units nested in one and only one higher-level unit. That is, we have been dealing with strict hierarchies. But social reality can be more complicated than that. In fact we have found that we need two non-hierarchical structures which, in combination with strict hierarchies, have been able to deal with all the different types of designs, realities and research questions that we have met:
• Cross-classified structures
• Multiple membership structures
2.19 Cross-classified Model
[Unit diagram: schools S1 to S4 and areas A1 to A3, each linked to pupils P1 to P12. Classification diagram: Pupil nested within both School and Area.]
In this structure schools are not nested within areas. For example:
• Pupils 2 and 3 attend school 1 but come from different areas.
• Pupils 6 and 10 come from the same area but attend different schools.
Schools are not nested within areas and areas are not nested within schools. School and area are cross-classified.
2.20 Tabulation of students by school and area to reveal a cross-classified structure
[Two tables of pupils, with schools 1 to 4 as rows and areas 1 to 3 as columns, each shown beside its unit diagram.]
Nested structure: all elements in a row lie in a single column (School 1: P1, P2, P3; School 2: P4, P5; School 3: P6, P7, P8; School 4: P9, P10, P11, P12).
Cross-classified structure: elements in a row span multiple columns and elements in a column span multiple rows (School 1: P1, P3 and P2; School 2: P5 and P4; School 3: P6, P7 and P8; School 4: P10 and P9, P11, P12, with each school's pupils split across two areas).
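The same tabulation can be produced from a data frame with one row per pupil. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the assignment of pupils to particular areas is invented (chosen only so that pupils 2 and 3 share a school but not an area, and pupils 6 and 10 share an area but not a school).

```python
from collections import defaultdict


def cross_tabulate(pupils):
    """Map each (school, area) cell to the list of pupils falling in it."""
    table = defaultdict(list)
    for pupil, school, area in pupils:
        table[(school, area)].append(pupil)
    return dict(table)


def structure(pupils):
    """Describe how school and area relate: nested one way, the other, or crossed."""
    areas_per_school = defaultdict(set)
    schools_per_area = defaultdict(set)
    for _, school, area in pupils:
        areas_per_school[school].add(area)
        schools_per_area[area].add(school)
    if all(len(areas) == 1 for areas in areas_per_school.values()):
        return "schools nested within areas"  # every row lies in a single column
    if all(len(schools) == 1 for schools in schools_per_area.values()):
        return "areas nested within schools"  # every column lies in a single row
    return "cross-classified"                 # rows and columns both span several cells


# Hypothetical assignments of (pupil, school, area).
nested = [("P1", "S1", "A1"), ("P2", "S1", "A1"), ("P3", "S1", "A1"),
          ("P4", "S2", "A1"), ("P5", "S2", "A1"),
          ("P6", "S3", "A2"), ("P7", "S3", "A2"), ("P8", "S3", "A2"),
          ("P9", "S4", "A3"), ("P10", "S4", "A3"), ("P11", "S4", "A3"), ("P12", "S4", "A3")]
crossed = [("P1", "S1", "A1"), ("P2", "S1", "A2"), ("P3", "S1", "A1"),
           ("P4", "S2", "A2"), ("P5", "S2", "A1"),
           ("P6", "S3", "A2"), ("P7", "S3", "A2"), ("P8", "S3", "A3"),
           ("P9", "S4", "A3"), ("P10", "S4", "A2"), ("P11", "S4", "A3"), ("P12", "S4", "A3")]

print(structure(nested))
print(structure(crossed))
print(cross_tabulate(crossed)[("S1", "A1")])
```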
2.21 Data frame for pupils in a cross-classification of schools and areas
2.22 Other examples of cross-classified structures
• Exam marks within a cross-classification of student and examiner, where a student’s paper is marked by more than one examiner to get an indication of examiner reliability. Note that in this case we have at most one level 1 unit (mark) per cell in the cross-classification.
[Table: students 1 to 4 as rows and examiners 1 to 3 as columns; each student has two marks (student 1: m1, m2; student 2: m3, m4; student 3: m5, m6; student 4: m7, m8).]
• Students within a cross-classification of primary school by secondary school. We may have students’ exam scores at age 16 and wish to assess the relative effects of primary and secondary schools on attainment at age 16.
• Patients within a cross-classification of GP practice and hospital.
2.24 Multiple membership models
[Unit diagram: schools S1 to S4 and pupils P1 to P12, with pupils P1 and P8 highlighted. Classification diagram labels: Teacher, Pupil.]
Here atomic units are seen as nested within more than one unit from a higher-level classification:
• Health outcomes where patients are treated by a number of nurses: patients are multiple members of nurses.
• Students move schools, so some pupils are multiple members of schools.
2.23 Combining structures: cross-classifications and multiple membership relationships
[Unit diagram: schools S1 to S4, pupils P1 to P12 and areas A1 to A3, with pupils P1, P7 and P8 highlighted. Annotations: Pupil 8 has moved schools but still lives in the same area; Pupil 7 has moved areas but still attends the same school.]
Let’s take the cross-classified model of the previous slide, but suppose Pupil 1 moves in the course of the study from residential area 1 to 2 and from school 1 to 2. Now, in addition to schools being crossed with residential areas, pupils are multiple members of both areas and schools.
2.24 Classification diagram for multiple membership model
[Unit diagram: schools S1 to S4, pupils P1 to P12 and areas A1 to A3, with pupils P1, P7 and P8 highlighted. Classification diagram: Student linked to both School and Area.]
• Students nested within a cross-classification of school by area
• Students multiple members of schools
• Students multiple members of areas
2.25 Combining structures: crossed, nested and multiple membership relationships
[Unit diagram: hospitals H1, H2; nurses N1 to N4; patients P1 to P6; GP practices GP1 to GP3. Classification diagram: Hospital, Nurse, GP practice, Patient.]
• Patients can be treated by more than one nurse during their stay in hospital: patients are multiple members of nurses.
• Nurses work in only one hospital: nurses are nested within hospitals.
• Patients are nested within referring GPs.
• GPs are crossed with nurses.
• GPs are crossed with hospitals.
2.26 Distinguishing Variables and Levels
[Diagram: pupils P1 to P12 within schools S1 to S4 within school type (state, private).]
Is school type a third level? NO! School type is not a random classification; it is a fixed classification, and therefore a variable, not a level.
• A classification is random if its units can be regarded as a random sample from a wider population of units, e.g. pupils and schools.
• A fixed classification has a small fixed number of categories. For example, state and private are not two types sampled from a large number of types; on the basis of these two we cannot generalise to a wider population of types of school. The same applies to gender.
3.0 Work with a partner to discuss which type of multilevel data structure corresponds to each participant’s data (20 mins)
• Draw free-hand a classification diagram, giving labels for units at each level and linking the nodes by appropriate arrows to reflect nested, crossed or multiple membership (MM) relationships.
• Complete a schematic data frame for your data set. Either use the overheads provided or whatever software you find convenient.
4.0 Discussion of Exercise 3.0 Each participant takes 2 minutes to present the multilevel structure for their research problem
5: Modelling varying relations: from graphs to equations “There are NO general laws in social science that are constant over time and independent of the context in which they are embedded” Rein (quoted in King, 1976)
5.1 Varying relations plot
[Plot: x-axis is Rooms, 1 to 8, also shown centred as -4 to 3.]
• Simple set up - Two-level model - houses at level 1 nested within districts at level 2
• Single continuous response: price of a house
• Single continuous predictor: size = number of rooms; this variable has been centred around the average size of 5
5.3 General Structure for Statistical models
• Response = general trend + fluctuations
• Response = systematic component + stochastic element
• Response = fixed + random
• Specific case: the single-level simple regression model
Response = Systematic Part + Random Part
Price of house = Price of average-sized house (Intercept) + Cost of extra room (Slope) + House residual variation (Residual)
5.4 Simple regression model
[Plot: x-axis is Rooms, 1 to 8, also shown centred as -4 to 3.]
$y_i$ is the outcome, price of a house
$x_{1i}$ is the predictor, number of rooms, which we shall deviate around its mean, 5
5.5 Simple regression model (cont)
$y_i = \beta_0 + \beta_1 x_{1i} + e_{0i}$
$y_i$ is the price of house i
$x_{1i}$ is the individual predictor variable
$\beta_0$ is the intercept
$\beta_1$ is the fixed slope term
$e_{0i}$ is the residual/random term, one for every house
Summarizing the random term: ASSUME IID
• Mean of the random term is zero
• Constant variability (homoscedasticity)
• No patterning of the residuals (i.e. they are independent)
$\sigma^2_{e0}$ is the between-house variance, conditional on size
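As a worked illustration of the single-level model, the sketch below fits the intercept and slope by ordinary least squares on a handful of invented house prices. The data values and variable names are hypothetical; the point is only that, with rooms centred on 5, the intercept is the predicted price of an average-sized house and the slope is the cost of an extra room.

```python
# Hypothetical data: number of rooms and price (arbitrary units) for six houses.
rooms = [3, 4, 5, 5, 6, 7]
price = [60.0, 72.0, 80.0, 84.0, 95.0, 107.0]

# Centre the predictor around the average size of 5 rooms.
x = [r - 5 for r in rooms]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(price) / n

# Ordinary least squares estimates of slope and intercept.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, price))
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta1 = sxy / sxx               # cost of an extra room
beta0 = y_bar - beta1 * x_bar   # price of an average-sized (5-room) house

# One residual per house; their variance estimates the between-house variance.
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, price)]
residual_variance = sum(e ** 2 for e in residuals) / (n - 2)

print("intercept:", beta0)
print("slope:", beta1)
print("residual variance:", residual_variance)
```

The residuals sum to zero by construction, which matches the assumption that the random term has mean zero.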
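5.6 Random intercepts model
[Plot labels: Premium, Citywide line, Discount.]
Micro-model: $y_{ij} = \beta_{0j} + \beta_1 x_{1ij} + e_{0ij}$
$\beta_{0j}$: differential shift for each district j; the subscript j indexes the intercept
Macro-model (index parameter as a response): $\beta_{0j} = \beta_0 + u_{0j}$
Price of average house in district j = citywide price + differential for district j
Substitute macro into micro...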
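5.7 Random intercepts COMBINED model
Substituting the macro model into the micro model yields
$y_{ij} = (\beta_0 + u_{0j}) + \beta_1 x_{1ij} + e_{0ij}$
Grouping the random parameters in brackets
$y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + e_{0ij})$
• Fixed part: $\beta_0 + \beta_1 x_{1ij}$
• Random part (Level 2): $u_{0j}$
• Random part (Level 1): $e_{0ij}$
• District and house differentials are independent: $\mathrm{Cov}(u_{0j}, e_{0ij}) = 0$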
5.8 The meaning of the random terms
• Level 2: between districts
$\sigma^2_{u0}$ is the between-district variance, conditional on size
• Level 1: within districts, between houses
$\sigma^2_{e0}$ is the within-district, between-house variance, conditional on size
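One way to see what the two random terms mean is to generate data from a random intercepts model: price = citywide intercept + slope × centred rooms + a district differential + a house differential. The sketch below is a simulation under stated assumptions, not an analysis of real data: the function name, the parameter values and the use of normal draws are all choices made for illustration.

```python
import random


def simulate_random_intercepts(n_districts, n_houses, beta0, beta1, sd_u0, sd_e0, seed=0):
    """Simulate houses (level 1) nested within districts (level 2)."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_districts):
        u0j = rng.gauss(0.0, sd_u0)        # district differential: drawn once per district
        for _ in range(n_houses):
            x = rng.randint(1, 8) - 5      # number of rooms, centred around 5
            e0ij = rng.gauss(0.0, sd_e0)   # house differential: drawn once per house
            y = beta0 + beta1 * x + (u0j + e0ij)  # fixed part + random part
            rows.append({"district": j, "x": x, "u0j": u0j, "e0ij": e0ij, "y": y})
    return rows


# Illustrative parameter values (assumptions, not estimates from any data set).
rows = simulate_random_intercepts(
    n_districts=50, n_houses=20, beta0=80.0, beta1=10.0, sd_u0=8.0, sd_e0=5.0
)
print(len(rows), "houses in", len({r["district"] for r in rows}), "districts")
```

Every house in a district shares the same district differential, which shifts that district's line up (a premium) or down (a discount) relative to the citywide line; the house differential varies within the district.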
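5.9 Variants on the same model
• Combined model: $y_{ij} = \beta_0 + \beta_1 x_{1ij} + (u_{0j} + e_{0ij})$
• Combined model in full: $y_{ij} = \beta_0 x_{0ij} + \beta_1 x_{1ij} + (u_{0j} x_{0ij} + e_{0ij} x_{0ij})$
• $x_{0ij}$ is the constant; a set of 1’s
• Differentials at each level: $u_{0j}$ (district) and $e_{0ij}$ (house)
• In MLwiN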
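5.11 Random intercepts and slopes model
Micro-model: $y_{ij} = \beta_{0j} x_{0ij} + \beta_{1j} x_{1ij} + e_{0ij} x_{0ij}$
Note: the subscript j indexes both the intercept $\beta_{0j}$ and the slope $\beta_{1j}$, associated with the constant $x_{0ij}$ (a set of 1’s) and the number of rooms $x_{1ij}$, respectively
Macro-model (Random Intercepts): $\beta_{0j} = \beta_0 + u_{0j}$
Macro-model (Random Slopes): $\beta_{1j} = \beta_1 + u_{1j}$
Slope for district j = citywide slope + differential slope for district j
Substitute macro models into micro model...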
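5.12 Random slopes model
Substituting the macro models into the micro model yields
$y_{ij} = (\beta_0 + u_{0j}) x_{0ij} + (\beta_1 + u_{1j}) x_{1ij} + e_{0ij} x_{0ij}$
where $x_{0ij}$ is the constant and $x_{1ij}$ is the centred number of rooms. Multiplying the parameters with the associated variable and grouping them into fixed and random parameters yields the combined model:
$y_{ij} = \beta_0 x_{0ij} + \beta_1 x_{1ij} + (u_{0j} x_{0ij} + u_{1j} x_{1ij} + e_{0ij} x_{0ij})$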
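5.13 Characteristics of random intercepts & slopes model
• Fixed part: $\beta_0$ (citywide intercept) and $\beta_1$ (citywide slope)
• Random part (Level 2): $u_{0j}$ and $u_{1j}$, with intercept variance $\sigma^2_{u0}$, slope variance $\sigma^2_{u1}$ and intercept/slope covariance $\sigma_{u0u1}$
• Random part (Level 1): $e_{0ij}$, with variance $\sigma^2_{e0}$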
5.14 Interpreting varying relationship plot through mean and variance-covariances
Intercepts: terms associated with Constant. Slopes: terms associated with Predictor.

Graph   Intercept mean   Intercept variance   Slope mean   Slope variance   Intercept/Slope covariance
A       +                0                    +            0                undefined
B       +                +                    +            0                undefined
C       +                +                    +            +                +
D       +                +                    +            +                -
E       +                +                    +            +                0
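The rows of this table can be connected to the shape of a plot through the level-2 variance implied by a random intercepts and slopes model. For a level-2 random part $u_{0j} + u_{1j} x$, the between-group variance at predictor value $x$ is $\sigma^2_{u0} + 2\sigma_{u0u1} x + \sigma^2_{u1} x^2$. The sketch below is an illustration only: the function names and the numerical values are hypothetical, and `table_row` simply maps the signs of the variance terms onto the graph labels A to E used above.

```python
def level2_variance(x, var_u0, var_u1, cov_u01):
    """Between-group variance at predictor value x: Var(u0 + u1 * x)."""
    return var_u0 + 2.0 * cov_u01 * x + var_u1 * x ** 2


def table_row(var_u0, var_u1, cov_u01=0.0):
    """Return the graph label (A to E) matching the signs in the table, else None."""
    if var_u1 == 0:
        # No slope variance: the covariance is undefined.
        return "A" if var_u0 == 0 else "B"
    if var_u0 == 0:
        return None  # slope variance without intercept variance is not in the table
    if cov_u01 > 0:
        return "C"
    if cov_u01 < 0:
        return "D"
    return "E"


# Illustrative values: positive intercept variance, slope variance and covariance.
for x in (-2, 0, 2):
    print(x, level2_variance(x, var_u0=4.0, var_u1=1.0, cov_u01=1.0))
print(table_row(4.0, 1.0, 1.0))
```

With a positive covariance the between-group variance grows as the predictor increases; with a negative covariance it shrinks over part of the range; with zero slope variance it is the same at every value of the predictor.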
[Five plots, each with attain on the y-axis and pre-test on the x-axis.]