Integrating Genetic and Biomarker Data with Social Science Research: Genetics • Jason Fletcher • Assistant Professor, Health Policy and Administration, Yale University • RWJ Health and Society Scholar, Columbia University
Goals • Introduce some terminology • Requires multiple exposures • Focus • Limitations • What findings from genetics should you believe? • Opportunities • How might social scientists use genetic data? • Advances in both genetics and social science
Data Opportunities • Currently Available—DNA data • Add Health • National longitudinal sample, 15K, Age 12-30, siblings, school friends, focus on health • Fragile Families • National longitudinal sample, 5K, Mothers and children, lower income/immigrant samples • Wisconsin Longitudinal Study • 1957 HS grads and sibs, long follow up • Framingham Heart Study • Medical focus, multigenerational study • Many international datasets
Eventually available(?) • Health and Retirement Study • National longitudinal study, ages 50+, spouses, health and aging • Panel Study of Income Dynamics • National longitudinal study, multigenerational families/all ages, income and labor market, health • National Longitudinal Survey of Youth • National longitudinal study, labor market focus, multigenerational, siblings
Outline • Background • Behavioral genetics (non-molecular) • Molecular genetics • Integration with Social Science • Gene X Environment interactions • Instrumental variables
Behavioral genetics • Family based/twin studies • No DNA data • Decomposition of variance of outcomes into three components • A=Heritability • C=Shared/Common environment • E=Unshared/Unique environment • Heritability estimates (h2) • Comparison of correlation of MZ twins with DZ twins
The basic BG model • Variation in phenotype (outcome/observable characteristic) is a function of variation in additive genetic (genotype) and environmental contributions (shared and unshared)
Classic twin design • A=genotype; C=common environment; E=unique environment • Identical/Monozygotic (MZ) twins share 100% of genetic makeup • Fraternal/Dizygotic (DZ) twins share ~50% of genetic makeup • Equal environments assumption
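The MZ/DZ comparison can be made concrete with Falconer's formulas. Under the assumptions listed above (additive genetic effects, equal environments), the twin correlations satisfy r_MZ = A + C and r_DZ = A/2 + C, which can be solved for the three components. The sketch below is illustrative only: the function name and the example correlations are assumptions, not values from any study cited here.

```python
def ace_from_twin_correlations(r_mz, r_dz):
    """Falconer-style ACE decomposition from twin correlations.

    Assumes r_MZ = A + C and r_DZ = 0.5 * A + C
    (additive genetics, equal environments, random mating).
    """
    a = 2.0 * (r_mz - r_dz)  # A: heritability (h2)
    c = 2.0 * r_dz - r_mz    # C: shared/common environment
    e = 1.0 - r_mz           # E: unique environment (includes measurement error)
    return {"A": a, "C": c, "E": e}


# Illustrative correlations, not estimates from any dataset.
estimates = ace_from_twin_correlations(r_mz=0.6, r_dz=0.4)
print(estimates)
```

The three components sum to 1 by construction, so any violation of the assumptions (for example, unequal environments) is absorbed into one of the estimates rather than flagged.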
Credibility Test? Schonemann 1997
Example from literature: Boardman et al (2008) • Regression based approach • g is zygosity (0.5 for MZ, 0 for DZ), is coefficient of interest • Equal environments issue • Dressed the same, same room, same playmates
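The regression equation itself is not reproduced on this slide, so the sketch below assumes a DeFries-Fulker-style specification, y1 = b0 + b1*y2 + b2*g + b3*(y2*g), with g coded as on the slide (0.5 for MZ, 0 for DZ). Under that assumed coding, b1 is the DZ slope and b3 recovers h2. All data are simulated, and the names `simulate_pairs`, `slope`, `TRUE_H2`, and `TRUE_C2` are hypothetical. Because the model is saturated in g, the interaction coefficient is computed from the two group-specific slopes rather than from a pooled fit.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_pairs(n, shared_genes, h2, c2, rng):
    """Simulate n twin pairs with unit-variance phenotypes.

    shared_genes is the share of additive genetic variance the pair
    has in common: 1.0 for MZ, 0.5 for DZ.
    """
    e2 = 1.0 - h2 - c2
    y1, y2 = [], []
    for _ in range(n):
        a_common = rng.gauss(0, 1)  # genetic factor shared by the pair
        c = rng.gauss(0, 1)         # common environment
        twins = []
        for _ in range(2):
            a = (shared_genes ** 0.5) * a_common \
                + ((1 - shared_genes) ** 0.5) * rng.gauss(0, 1)
            twins.append(h2 ** 0.5 * a + c2 ** 0.5 * c
                         + e2 ** 0.5 * rng.gauss(0, 1))
        y1.append(twins[0])
        y2.append(twins[1])
    return y1, y2


rng = random.Random(0)
TRUE_H2, TRUE_C2 = 0.5, 0.2  # assumed values for the simulation only
mz = simulate_pairs(20000, 1.0, TRUE_H2, TRUE_C2, rng)
dz = simulate_pairs(20000, 0.5, TRUE_H2, TRUE_C2, rng)

# With g = 0 for DZ pairs, the DZ slope is b1 (about 0.5*h2 + c2).
b1 = slope(dz[1], dz[0])
# With g = 0.5 for MZ pairs, the MZ slope is b1 + 0.5*b3, so b3 estimates h2.
b3 = (slope(mz[1], mz[0]) - b1) / 0.5
print(round(b1, 2), round(b3, 2))
```

The simulation builds in equal environments for MZ and DZ pairs; if MZ twins shared more of their environment, the extra similarity would load onto b3 and inflate the heritability estimate.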
Findings • Positive affect is highly heritable (~.6) • Adding controls lowers the heritability estimate to 0.3 for women but only to 0.52 for men • Controls: socioeconomic Xs, emotional support, stress
Some key assumptions/issues • Random mating in the parents' generation • Assortative mating tends to deflate h2 • Equal environments assumption • Violations tend to inflate h2 • External validity to non-twins • Gene-environment correlation • Inflates h2 • Gene-environment interaction • Subsumed in h2
2nd Design: Adoptions • Correlation between two adopted siblings • C • Correlation between two non-adopted siblings • 1/2A+C • Assumes equal environments • What about gene-environment correlations? Selection of adoptees
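The two sibling correlations above identify the components directly: the adopted-sibling correlation estimates C, and the gap between the non-adopted and adopted correlations estimates half of A. A minimal sketch, assuming equal environments and no selective placement of adoptees; the function name and the example correlations are illustrative, not taken from any study.

```python
def adoption_decomposition(r_adopted, r_biological):
    """Solve r_adopted = C and r_biological = 0.5 * A + C for A and C."""
    c = r_adopted
    a = 2.0 * (r_biological - r_adopted)
    return {"A": a, "C": c}


# Illustrative sibling correlations only.
adoption_estimates = adoption_decomposition(r_adopted=0.15, r_biological=0.35)
print(adoption_estimates)
```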
Recent Innovation: Adoption Studies • Sacerdote (2007) • Quasi-Random assignment of adoptees • Gene-environment correlation • h2 • 41% for college graduation • 44% for education attainment • 33% for income • 5% for alcohol use • 27% for tobacco use • 17% for overweight status
Innovations: Twin Studies • Random mating • Mating parameter in robustness checks • Genotype siblings in order to estimate assortative mating parameter • Equal environments assumption • Use survey questions that measure shared environments • Richer family level data • Cousins, siblings, parents, etc.
A Puzzle: Heritability vs. Measured Genetic Variation • Large heritability estimates (~.3) • Small measured variation using genetic data
Additional new directions • Variation of h2 by study population • Gender, Race, Country, Time Period • Can this tell us anything about gene x environment interactions?
Quiz from Collegeboard.com • If a person has a disorder with h2=1, then the person will suffer from the disorder • False; phenylketonuria (PKU) has h2=1, but mental retardation can be prevented through diet • The heritability of having fingers on each hand is 1 or close to 1 • False; it is close to zero because variation in the number of fingers is usually environmental in origin • Heritability and inherited are nearly the opposite in meaning • True; for example, equalizing school environments will increase heritability of achievement • The heritability of behaviors of identical twins is 1 • False; it is zero, because identical twins do not differ genetically • Source: http://apcentral.collegeboard.com/apc/members/homepage/45829.html
Discussion/Questions • What do we learn from h2 estimates? • What are the policy implications of estimates? • Heritability estimates set no upper limit on the potential effect of reducing or eliminating variation in environmental factors that currently vary in response to genotype, as many do. Nor do they set an upper limit on the effect of creating new environments. • Heritability estimates do set an upper limit on the effect of reducing or eliminating environmental variations that are independent of genotype, but other statistics usually provide even better estimates of these effects. • There is no evidence that genetically based inequalities are harder to eliminate than other inequalities. • Until we know how genes affect specific forms of behavior, heritability estimates will tell us almost nothing of importance (Jencks).
Molecular Genetics • Describe a few concepts • How do scientists/biologists/geneticists use genetic data? • Sources: • http://www.psych.umn.edu/courses/fall09/mcguem/psy5137/lectures.htm
Properties of Genetic Material • Specify a code for protein synthesis (i.e., code for the sequence of amino acids in a polypeptide chain) • Duplicate or replicate during both mitosis and meiosis
Deoxyribonucleic Acid (DNA) • Double stranded • Strands are held together by (hydrogen) bonds that form between the nucleotide bases of the DNA molecule Adenine (A) <====> Thymine (T) Guanine (G) <====> Cytosine (C)
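The pairing rules above mean that one strand fully determines the other. A short sketch of that idea; the function names and the example sequence are illustrative. The reverse complement is included because the two strands run in opposite directions, so the partner strand is conventionally read in reverse.

```python
# Watson-Crick pairing: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand):
    """Base-by-base partner of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand):
    """Partner strand read in its own 5'-to-3' direction."""
    return complement(strand)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```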
Length of Human Genome • ~ 3,000,000,000 bases of DNA • 1 kilo base (kb) = 1000 bases • 1 mega base (Mb) = 1,000,000 bases • 1 giga base (Gb) = 1,000,000,000 bases • Average protein has ~ 400 amino acids, requiring 1200 DNA bases or 1200bp
Translation The basic informational unit is 3 nucleotide bases (called a codon). Each codon specifies a single amino acid. There are 4*4*4=64 possible sequences but only 20 possible amino acids.
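The codon count and the protein-length arithmetic from the previous slide can be checked directly. This is a small sketch; `coding_bases` is a hypothetical helper name.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triple of bases is a possible codon: 4 * 4 * 4 = 64.
codons = ["".join(triple) for triple in product(BASES, repeat=3)]
num_codons = len(codons)


def coding_bases(n_amino_acids):
    """DNA bases needed to encode a polypeptide, at 3 bases per amino acid."""
    return 3 * n_amino_acids


print(num_codons)         # 64
print(coding_bases(400))  # 1200
```

With 64 codons and only 20 amino acids, several codons must map to the same amino acid, which is why the genetic code is described as redundant.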
Gene A sequence of DNA (a locus on a chromosome) that is involved in (“codes for”) the synthesis of a functional polypeptide (proteins consist of one or more polypeptides). “Modern Definition” (circa 2006): A locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions
Non-coding DNA • ~98% of human DNA does not code directly for protein • Pseudogenes (evolutionary relics) • Repetitive DNA • Interspersed • VNTR: minisatellite repeats (10-30 bp) and microsatellite repeats (< 10 bp) • Regulatory regions
Gene Structure • Typical gene is composed of multiple • exons – Expressed sequences of DNA that are translated into protein • introns - Intervening DNA sequences that are not translated
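The exon/intron layout can be pictured as a set of intervals on the gene sequence, with only the exon intervals kept and joined. The sketch below uses a made-up toy sequence and made-up coordinates (0-based, end-exclusive); introns are written in lowercase purely for readability.

```python
def splice(sequence, exons):
    """Join the exon intervals of a gene in order, dropping the introns."""
    return "".join(sequence[start:end] for start, end in exons)


# Toy gene: exon, intron, exon, intron, exon (all hypothetical).
toy_gene = "ATGGCC" + "gtaagt" + "TTTGGA" + "gtcag" + "TAA"
toy_exons = [(0, 6), (12, 18), (23, 26)]

spliced = splice(toy_gene, toy_exons)
print(spliced)  # ATGGCCTTTGGATAA
```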
Genetic Variation Genetic variation between individuals refers to differences in the DNA sequence • Originally arose through (gametic) mutation. • An estimated 99.8% - 99.9% of our DNA is common • But then .1% of 3,000,000,000 = 3 million differences
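The arithmetic behind the "3 million differences" figure, extended to the other end of the quoted range. This is a sketch using integer arithmetic; the helper name is illustrative.

```python
GENOME_BASES = 3_000_000_000  # approximate length of the human genome


def differing_bases(differences_per_thousand):
    """Expected number of differing bases between two genomes."""
    return GENOME_BASES * differences_per_thousand // 1000


print(differing_bases(1))  # 99.9% shared -> 3,000,000 differences
print(differing_bases(2))  # 99.8% shared -> 6,000,000 differences
```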
The Genetic Basis for Human Variation Derived from dbSNP release 128 http://www.ncbi.nlm.nih.gov/SNP/
Types of Genetic Variation • Chromosomal/Structural: Variations (or rearrangements) in the amount of genetic material inherited • Polymorphisms: – Variations in the DNA sequence • SNPs (~10,000,000) • VNTR (STR, SSR)
Types of Genetic Variation: Variable Number of Tandem Repeats (VNTR) • Microsatellite: Small number of bases (<10) repeated a variable number of times (usually < 100)(>100,000)
Huntington’s disease is an example of a microsatellite triplet repeat in a coding region
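A microsatellite genotype is essentially a repeat count. The sketch below counts the longest uninterrupted run of a repeat unit in a sequence. The toy sequence is made up, and CAG is used as the unit only as an example of a triplet repeat.

```python
import re


def longest_tandem_repeat(sequence, unit):
    """Largest number of back-to-back copies of `unit` found in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Toy sequence: a run of 7 CAG copies, a spacer, then a run of 3.
toy_sequence = "GGC" + "CAG" * 7 + "TTA" + "CAG" * 3
print(longest_tandem_repeat(toy_sequence, "CAG"))  # 7
```

Two people can carry the same repeat unit at the same locus but differ in the number of copies, which is what makes these loci "variable number" polymorphisms.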
How do researchers link genetic variation to outcomes? • Candidate gene examinations • Sometimes from animal models • Specifically examine a small number of polymorphisms and an outcome • Sometimes use family based designs • Replication • Gene association studies/Genome wide association studies (GWAS) • Gene-finding exercise (atheoretical)
Ex: Corder et al. Science 1993, p. 921-923 • Alzheimer and APOE
Ex 2: BRCA1 and Breast Cancer • Mutations thought to account for 45% of families with high breast cancer incidence