Genetic Markers as Instrumental Variables
Stephanie von Hinke Kessler Scholder, George Davey Smith, Debbie A. Lawlor, Carol Propper, Frank Windmeijer
Royal Statistical Society, 4 April 2011
Introduction
• Application: Child adiposity (fat mass) and academic performance
• Causal effect from adiposity to educational outcomes, e.g.
  • Overweight children experience higher absenteeism in school
  • Overweight children are more likely to have sleep problems
  • Overweight children may be treated differently by peers and teachers
• Reverse causation, e.g.
  • Poor school outcomes may cause obesity
• Association driven by other unobserved factors that affect both adiposity and academic outcomes, e.g.
  • Time discount rates
Introduction (cont.)
• We use ‘Mendelian randomization’ to make causal inferences about the effect of child adiposity on academic outcomes
  • This exploits the random allocation of an individual’s genotype at conception
• Strictly, allocation is random only at the family trio level (parents and child), but at the population level genetic markers have been shown to be largely unrelated to other background characteristics
Mechanisms
[Diagram: nodes G_i, A_i and S_i, with unobserved factors e_i (e.g. family wealth, SES)]
Genetics
• Single Nucleotide Polymorphism (SNP): a change at one particular location on the DNA sequence
• Humans carry two alleles (variants) at each location on the DNA sequence, one inherited from each parent
• Individuals can be:
  • Homozygous for the common allele
  • Heterozygous
  • Homozygous for the rare allele
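The three genotype groups above are commonly coded as the number of copies of the rare allele (0, 1 or 2). A minimal sketch of that coding follows; the allele letters and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical labels: "A" is assumed to be the rare allele, "T" the common one.
RARE_ALLELE = "A"

def rare_allele_count(genotype: str, rare: str = RARE_ALLELE) -> int:
    """Return how many copies (0, 1 or 2) of the rare allele a genotype carries."""
    if len(genotype) != 2:
        raise ValueError("a genotype at one SNP consists of exactly two alleles")
    return sum(1 for allele in genotype if allele == rare)

def classify(genotype: str, rare: str = RARE_ALLELE) -> str:
    """Map a two-letter genotype to one of the three groups on the slide."""
    labels = {
        0: "homozygous for the common allele",
        1: "heterozygous",
        2: "homozygous for the rare allele",
    }
    return labels[rare_allele_count(genotype, rare)]

for genotype in ("TT", "AT", "AA"):
    print(genotype, rare_allele_count(genotype), classify(genotype))
```

The allele count is the form in which a SNP would typically enter a regression as an instrument.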
Methodology
• We specify an education production function; a simple OLS model where:
  • S_i is child educational performance
  • A_i is child adiposity (fat mass)
  • A vector of child & family background characteristics
  • Indicators for parental health and behaviour
  • e_i is the error term
• Instrumental Variables
  • Assumption 1:
  • Assumption 2:
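The model and the two assumptions on this slide can be sketched as follows, assuming a linear specification and the standard instrumental-variable conditions (relevance and exogeneity). S_i, A_i, G_i and e_i follow the labels on the Mechanisms diagram; X_i (background characteristics), P_i (parental health and behaviour) and the coefficient names are notation assumed here, not taken from the slides.

```latex
\begin{align*}
S_i &= \alpha + \beta A_i + X_i'\gamma + P_i'\delta + e_i \\
\text{Assumption 1 (relevance):}\quad \operatorname{Cov}(G_i, A_i) &\neq 0 \\
\text{Assumption 2 (exogeneity):}\quad \operatorname{Cov}(G_i, e_i) &= 0
\end{align*}
% With a single instrument and no further covariates, the IV estimator is the ratio
\[
\hat{\beta}_{IV} = \frac{\widehat{\operatorname{Cov}}(G_i, S_i)}{\widehat{\operatorname{Cov}}(G_i, A_i)}
\]
```

Under these two conditions the ratio isolates the part of the adiposity-outcome association that runs through the genotype, which is why OLS and IV can disagree when e_i is correlated with A_i.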
Methodology (cont.)
1. Suitable and robust genetic instrument
• Consistent and robust associations should have been shown in a large number of independent studies
  • Many genetic associations found in specific samples fail to replicate in larger independent samples (Colhoun et al., 2003)
• Prior knowledge on the genotype-phenotype association
• Even if a suitable and robust genetic instrument is available, it may explain little of the variation in the observed phenotype; if alleles shift adiposity by only a very small amount, it is this small shift alone that identifies the effect on educational attainment
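How much of the phenotype an instrument explains is usually summarised by the first-stage R² and F-statistic. The sketch below illustrates this on simulated data only; the allele frequency (0.4), the per-allele effects and the sample size are assumptions made for the example, not values from the study.

```python
import random

def first_stage_strength(g, a):
    """OLS of adiposity a on allele count g (with intercept).

    Returns (slope, r_squared, f_statistic) for the single instrument.
    """
    n = len(g)
    mean_g = sum(g) / n
    mean_a = sum(a) / n
    sgg = sum((x - mean_g) ** 2 for x in g)
    saa = sum((y - mean_a) ** 2 for y in a)
    sga = sum((x - mean_g) * (y - mean_a) for x, y in zip(g, a))
    slope = sga / sgg
    r_squared = sga ** 2 / (sgg * saa)
    # With one regressor, F equals the squared t-statistic of the slope.
    f_statistic = r_squared / (1.0 - r_squared) * (n - 2)
    return slope, r_squared, f_statistic

def simulate(n, effect, seed=0, rare_allele_freq=0.4):
    """Simulated data: allele counts in {0, 1, 2} and standardised adiposity."""
    rng = random.Random(seed)
    g = [(rng.random() < rare_allele_freq) + (rng.random() < rare_allele_freq)
         for _ in range(n)]
    a = [effect * x + rng.gauss(0.0, 1.0) for x in g]
    return g, a

# Illustrative per-allele effects, in standard deviations of adiposity.
for effect in (0.02, 0.10, 0.30):
    slope, r2, f = first_stage_strength(*simulate(n=4000, effect=effect))
    print(f"effect={effect:.2f}  slope={slope:.3f}  R^2={r2:.4f}  F={f:.1f}")
```

A small per-allele shift gives a low R² and a low F-statistic, which is the weak-instrument concern raised in the last bullet.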
Methodology (cont.)
Our instruments: FTO and MC4R
Frayling et al. (2007) use 38,759 individuals aged 7–80 from 13 different cohorts
They find a positive association between FTO and all proxies for adiposity:
• for individuals in all cohorts
• in all countries
• of all ages and
• of both sexes, with no difference between males and females
No association with birth weight or height
Each copy of the FTO risk allele increases weight for 11-year-olds by 0.8–1 kg
Similar, though slightly smaller, associations are found for MC4R using 77,228 adults and 5,988 children (Loos et al., 2008)
Methodology (cont.)
• Mechanisms
  • SNPs associated with increased consumption of fat and energy
• Population Stratification
  • Ethnicity
• Linkage Disequilibrium
  • Degree of linkage is a function of the distance between the loci
• Pleiotropy
  • FTO and type II diabetes
Data
• ALSPAC (Avon Longitudinal Study of Parents and Children)
Data (cont.)
Outcome: nationally set KS3 exam (age 14, standardised)
• National Pupil Database, a census of all English state school pupils
Child adiposity:
• Direct measure of child fat mass (DXA scan, age 11, standardised)
Contextual variables:
• Birth weight, breastfed, age (in months), household composition
• Family income (sq), social class, employment status, mother’s and grandparents’ education, lone parenthood, local area deprivation (IMD)
Maternal health and behaviour:
• Smoking/drinking during pregnancy, mother’s age at birth
• Mother’s locus of control, EPDS, CCEI
• Parental investment in child: teaching scores, activity scores
Descriptive Statistics
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01; standard deviations in parentheses.
Descriptive Statistics (cont.)
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01.
Descriptive Statistics (cont.)
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01.
Descriptive Statistics (cont.)
[Figure: non-parametric regression of KS3 on fat mass]
OLS Results
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01; robust standard errors in parentheses.
First stage IV Results
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01; robust standard errors in parentheses.
Second stage IV Results
[Table]
Notes: * p<0.10; ** p<0.05; *** p<0.01; robust standard errors in parentheses.
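The contrast between the OLS and IV slides can be illustrated on simulated data. The sketch below is not the authors' estimation: it uses one instrument, no covariates, and made-up parameters (allele frequency 0.4, a first-stage effect of 0.3, a true causal effect of zero, and an unobserved confounder that raises adiposity and lowers the outcome). With a single instrument, two-stage least squares reduces to the ratio of the reduced-form covariance to the first-stage covariance.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def ols_slope(a, s):
    """Slope from regressing outcome s on adiposity a (with intercept)."""
    return cov(a, s) / cov(a, a)

def iv_slope(g, a, s):
    """Single-instrument IV (Wald ratio): reduced form over first stage."""
    return cov(g, s) / cov(g, a)

def simulate(n=20000, true_effect=0.0, seed=1):
    """Simulated genotype g, adiposity a and outcome s with confounding."""
    rng = random.Random(seed)
    g, a, s = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.4) + (rng.random() < 0.4)  # allele count 0/1/2
        ui = rng.gauss(0.0, 1.0)                           # unobserved confounder
        ai = 0.3 * gi + ui + rng.gauss(0.0, 1.0)           # first stage
        si = true_effect * ai - ui + rng.gauss(0.0, 1.0)   # outcome equation
        g.append(gi)
        a.append(ai)
        s.append(si)
    return g, a, s

g, a, s = simulate()
print(f"OLS slope: {ols_slope(a, s):.3f}")
print(f"IV slope:  {iv_slope(g, a, s):.3f}")
```

In this simulation OLS picks up the confounder and reports a negative association, while the IV estimate stays close to the true effect of zero, at the cost of a noisier estimate.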
Conclusion
We discuss the conditions that need to be met for genetic markers to be used as instrumental variables
We relate the epidemiology literature to that in economics
We exploit the richness of the ALSPAC data
Our application examines the effect of children’s fat mass on their educational outcomes
• OLS: Heavier children perform worse in school tests
• IV: No evidence of fat mass affecting outcomes
Mendelian randomization: some comments
• Strength of our instruments
• Even with strictly exogenous instruments, need to recognise limitations
  • Power: large standard errors
  • We need larger sample sizes, more variants, or markers with larger effects
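The power comment can be made concrete with the textbook large-sample variance of the single-instrument IV estimator. This is a sketch under homoskedastic errors and no covariates; the notation (σ_e² for the error variance, σ_A² for the variance of adiposity, ρ_GA for the correlation between genotype and adiposity, n for the sample size) is assumed here rather than taken from the slides.

```latex
\[
\operatorname{Var}\!\left(\hat{\beta}_{IV}\right) \approx \frac{\sigma_e^2}{n\,\sigma_A^2\,\rho_{GA}^2},
\qquad
\operatorname{Var}\!\left(\hat{\beta}_{OLS}\right) \approx \frac{\sigma_e^2}{n\,\sigma_A^2},
\qquad
\frac{\operatorname{se}(\hat{\beta}_{IV})}{\operatorname{se}(\hat{\beta}_{OLS})} \approx \frac{1}{|\rho_{GA}|}
\]
```

For example, an instrument that explains 1% of the variation in adiposity (ρ_GA² = 0.01, an illustrative value) gives IV standard errors about ten times the OLS ones. The variance falls with a larger n or a larger ρ_GA², which is what larger samples, more variants, or markers with larger effects provide.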