Human Genetic Variation: Genetics of Complex Diseases
The Human Genome Project - Goals
• Determine the sequences of the 3 billion chemical base pairs that make up human DNA
• Improve tools for data analysis
The Human Genome Project “What we are announcing today is that we have reached a milestone…that is, covering the genome in…a working draft of the human sequence.” “But our work previously has shown… that having one genetic code is important, but it's not all that useful.” “I would be willing to make a prediction that within 10 years, we will have the potential of offering any of you the opportunity to find out what particular genetic conditions you may be at increased risk for…” (Washington, DC, June 26, 2000)
The Vision of Personalized Medicine: genetic and epigenetic variants, together with measurable environmental and behavioral factors, would be used to personalize diagnosis and treatment.
Example: Warfarin An anticoagulant drug, useful in the prevention of thrombosis.
Example: Warfarin
• Warfarin was originally used as a rat poison.
• The optimal dose varies across the population.
• Genetic variants (in VKORC1 and CYP2C9) contribute to the variation in each person's optimal dose.
Association Studies Studying complex diseases by comparing cases to controls
Where should we look first? SNP = Single Nucleotide Polymorphism
person 1: …AAGCTAAATTTG…
person 2: …AAGCTAAGTTTG…
person 3: …AAGCTAAGTTTG…
person 4: …AAGCTAAATTTG…
person 5: …AAGCTAAGTTTG…
Most common SNPs have only two possible alleles.
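Finding the SNP in the five sequences above amounts to scanning aligned columns for positions where more than one base appears. The sketch below assumes the reads are already aligned and of equal length (real data must first be aligned to a reference genome); `find_snps` is a hypothetical helper, not part of any library.

```python
from collections import Counter


def find_snps(sequences):
    """Return {position: Counter of alleles} for every column that varies.

    Assumes the sequences are already aligned and of equal length.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    snps = {}
    for pos in range(length):
        alleles = Counter(seq[pos] for seq in sequences)
        if len(alleles) > 1:  # more than one base observed: polymorphic site
            snps[pos] = alleles
    return snps


people = [
    "AAGCTAAATTTG",  # person 1
    "AAGCTAAGTTTG",  # person 2
    "AAGCTAAGTTTG",  # person 3
    "AAGCTAAATTTG",  # person 4
    "AAGCTAAGTTTG",  # person 5
]

for pos, alleles in find_snps(people).items():
    print(pos, dict(alleles))  # prints: 7 {'A': 2, 'G': 3}
```

Position 7 (0-based) is the only variable column, and it has exactly two alleles, A and G, as expected for a typical SNP.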
Disease Association Studies
Cases:
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGTC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCAACATGAGATCGGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACATGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGTC
AGAGCAGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
Controls:
AGAGCAGTCGACATGTATAGTCTACATGAGATCGACATGAGATCGGTAGAGCAGTGAGATCAACATGATAGCC
AGAGCAGTCGACATGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCAGTCGACATGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGTC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCAACATGATAGCC
AGAGCAGTCGACAGGTATAGTCTACATGAGATCGACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGCCTACATGAGATCGACATGAGATCTGTAGAGCCGTGAGATCGACATGATAGCC
AGAGCCGTCGACAGGTATAGTCTACATGAGATCAACATGAGATCTGTAGAGCAGTGAGATCGACATGATAGTC
Associated SNP: the position where 7 of 8 case sequences carry G (…GATCGGTAG…) and 7 of 8 control sequences carry T (…GATCTGTAG…).
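Spotting the associated SNP by eye means comparing, column by column, how often each allele appears in cases versus controls. A minimal sketch of that comparison follows; it assumes aligned, equal-length sequences, and `frequency_differences` plus the toy sequences are hypothetical illustrations, not the slide's data or a published method. A large gap only flags a candidate; whether it is real or chance is the question the following slides address.

```python
from collections import Counter


def frequency_differences(cases, controls):
    """For each polymorphic column, return (allele, case_freq, control_freq)
    for the allele whose frequency differs most between cases and controls."""
    result = {}
    for pos in range(len(cases[0])):
        case_counts = Counter(seq[pos] for seq in cases)
        ctrl_counts = Counter(seq[pos] for seq in controls)
        alleles = sorted(set(case_counts) | set(ctrl_counts))
        if len(alleles) < 2:
            continue  # same base everywhere: not a SNP

        def gap(allele):
            return abs(case_counts[allele] / len(cases)
                       - ctrl_counts[allele] / len(controls))

        best = max(alleles, key=gap)  # ties go to the alphabetically first allele
        result[pos] = (best,
                       case_counts[best] / len(cases),
                       ctrl_counts[best] / len(controls))
    return result


# Hypothetical toy data (not the sequences on the slide)
cases = ["AGCA", "AGCA", "AGCT", "AGTA"]
controls = ["ATCA", "ATCT", "ATCA", "AGCT"]

diffs = frequency_differences(cases, controls)
top = max(diffs, key=lambda p: abs(diffs[p][1] - diffs[p][2]))
print(top, diffs[top])  # prints: 1 ('G', 1.0, 0.25)
```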
Genotyping technology [Figure: DNA and short sequence fragments (AGACTAACC…, ACGAATCCT…, GGACTTACC…, GCACAACCT…, GGGATTAAC…)]
Genotyping technologies
• The cost of genotyping has dropped dramatically over the last decade.
• At the beginning of the decade, genotyping one SNP in one individual cost more than $1.
• The price is now about 0.03 cents.
• Growth is exponential: the number of genotypes per dollar doubles about every 10 months.
• This is faster than Moore’s law, which doubles about every 18 months.
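The 10-month figure can be sanity-checked against the two prices on the slide. The sketch below assumes a steady exponential decline, takes "> $1" as exactly $1.00 and "the decade" as 120 months; `implied_doubling_time` is a hypothetical helper.

```python
import math


def implied_doubling_time(start_price, end_price, months):
    """Months per doubling of genotypes-per-dollar, assuming the price fell
    at a steady exponential rate between the two observations."""
    doublings = math.log2(start_price / end_price)
    return months / doublings


# Assumptions: "> $1" is taken as exactly $1.00 and "the decade" as 120 months.
start_price = 1.00        # dollars per SNP per individual, start of decade
end_price = 0.03 / 100    # 0.03 cents, converted to dollars

print(round(implied_doubling_time(start_price, end_price, 120), 1))  # 10.3

# Growth factor over the same 120 months at each doubling time
for label, doubling_months in [("genotyping", 10), ("Moore's law", 18)]:
    print(label, round(2 ** (120 / doubling_months)))
```

A roughly 3,300-fold price drop over 120 months is about 11.7 doublings, or one every ~10 months. Since the starting price was above $1, the true doubling time would be slightly shorter. Compounded over a decade, a 10-month doubling gives about a 4,000-fold gain versus roughly 100-fold for an 18-month doubling.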
Public Genotype Data Growth (2001 to 2007)
• HapMap Phase 2: 5,000,000+ SNPs, 600,000,000+ genotypes
• TSC data (Nucleic Acids Research): 35,000 SNPs, 4,500,000 genotypes
• Perlegen data (Science): 1,570,000 SNPs, 100,000,000 genotypes
• NCBI dbSNP (Genome Research): 3,000,000 SNPs, 286,000,000 genotypes
• Daly et al. (Nature Genetics): 103 SNPs, 40,000 genotypes
• Gabriel et al. (Science): 3,000 SNPs, 400,000 genotypes
Association Studies Genetic variants such as Single Nucleotide Polymorphisms (SNPs) are tested for association with the trait.
Published Genome-Wide Associations through 6/2009: 439 published GWA at p < 5 × 10^-8 (NHGRI GWA Catalog, www.genome.gov/GWAStudies)
Preliminary Definitions
• SNP (single nucleotide polymorphism): a genetic variant at a single base position, where different individuals may carry different alleles.
• Most SNPs are bi-allelic: only two alleles are observed in the population.
• Risk allele: the allele that is more common in cases than in controls (denoted R).
• Nonrisk allele: the allele that is more common in controls (denoted N).
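The R/N labels follow directly from comparing allele frequencies in the two groups. A small sketch, assuming a bi-allelic SNP and allele counts given as dictionaries; `label_alleles` is a hypothetical helper. The usage line plugs in the associated SNP from the case/control sequences earlier (7 of 8 cases carry G, 7 of 8 controls carry T).

```python
def label_alleles(case_counts, control_counts):
    """Return (risk, nonrisk) alleles for a bi-allelic SNP, or None if the
    allele frequencies are identical in cases and controls.

    case_counts / control_counts map allele -> count, e.g. {"G": 7, "T": 1}.
    """
    alleles = sorted(set(case_counts) | set(control_counts))
    if len(alleles) != 2:
        raise ValueError("expected a bi-allelic SNP")
    n_case = sum(case_counts.values())
    n_ctrl = sum(control_counts.values())
    a, b = alleles
    # Frequency of allele a in cases minus its frequency in controls
    freq_gap = case_counts.get(a, 0) / n_case - control_counts.get(a, 0) / n_ctrl
    if freq_gap > 0:
        return a, b  # a is more common in cases: R = a, N = b
    if freq_gap < 0:
        return b, a  # b is more common in cases: R = b, N = a
    return None      # equal frequencies: neither allele is a risk allele


print(label_alleles({"G": 7, "T": 1}, {"G": 1, "T": 7}))  # ('G', 'T')
```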
Other Structural Variants
• Inversion
• Deletion
• Copy number variant
Chance or Real Association?
At the associated SNP in the case and control sequences above, G is far more common in cases and T is far more common in controls. Is this difference a real association, or could it have arisen by chance?
Hypothesis testing
• We want to distinguish between two hypotheses:
• Null hypothesis: the allele frequency is the same in cases and controls (the SNP has nothing to do with the disease).
• Alternative hypothesis: the allele frequency differs between cases and controls (the SNP is correlated with the disease).
• Intuitively, we ask how likely the observed difference would be if the null hypothesis were true.
How does it work?
• For every SNP we can construct a contingency table of allele counts (R and N) in cases and in controls.
• From the table we compute a test statistic T.
• The probability, under the null hypothesis, of obtaining a value of T at least as large as the observed one is the p-value.
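The slides do not name the statistic, but the T/p-value pairs in the examples that follow are consistent with a Pearson chi-square test with 1 degree of freedom on a 2×2 allele-count table. The sketch below assumes that test, without a continuity correction; the function names and the counts in the usage line are hypothetical.

```python
import math


def allelic_chi_square(case_r, case_n, control_r, control_n):
    """Pearson chi-square statistic (no continuity correction) for the 2x2 table

                   R          N
        cases      case_r     case_n
        controls   control_r  control_n
    """
    n = case_r + case_n + control_r + control_n
    numerator = n * (case_r * control_n - case_n * control_r) ** 2
    denominator = ((case_r + case_n) * (control_r + control_n)
                   * (case_r + control_r) * (case_n + control_n))
    return numerator / denominator


def chi_square_1df_pvalue(t):
    """P(X >= t) for X ~ chi-square with 1 degree of freedom.

    Uses P(X >= t) = P(|Z| >= sqrt(t)) = erfc(sqrt(t / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(t / 2))


# Hypothetical counts: 60 R / 40 N alleles in cases, 40 R / 60 N in controls
t = allelic_chi_square(60, 40, 40, 60)
print(t, round(chi_square_1df_pvalue(t), 4))  # 8.0 0.0047
```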
Example:
• For one SNP, the contingency table gives T = 0.02.
• The p-value is 0.8875 (about an 89% chance of getting T ≥ 0.02 under the null hypothesis).
Example:
• For another SNP, the contingency table gives T = 11.11.
• The p-value is low: about 0.001 (10^-3).
Example:
• For a third SNP, the contingency table gives T = 83.33.
• The p-value is extremely low: about 10^-19.
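Assuming T follows a chi-square distribution with 1 degree of freedom under the null hypothesis, the three p-values can be reproduced from T alone. This is a sketch under that assumption; `chi_square_1df_pvalue` is a hypothetical helper.

```python
import math


def chi_square_1df_pvalue(t):
    """P(X >= t) for X ~ chi-square with 1 degree of freedom.

    Uses P(X >= t) = P(|Z| >= sqrt(t)) = erfc(sqrt(t / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(t / 2))


for t in (0.02, 11.11, 83.33):
    print(f"T = {t}: p = {chi_square_1df_pvalue(t):.2g}")
```

The results come out near 0.89, 0.0009 and 7 × 10^-20, consistent with the rounded values on the slides (0.8875, 10^-3 and 10^-19).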
Challenge 1: Correcting for multiple testing
• In a typical Genome-Wide Association Study (GWAS), we test millions of SNPs.
• If we use a p-value threshold of 0.05 for each test, about 5% of the SNPs that have nothing to do with the disease will be “found” to be associated by chance.
• This must be corrected for; different statistical methods are used.
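One of the simplest corrections is the Bonferroni correction, which divides the desired overall error rate by the number of tests. It is shown here as an illustration, not as the specific method the slides have in mind; the function names are hypothetical. For one million tests, it gives the 5 × 10^-8 threshold used in the NHGRI catalog slide.

```python
def expected_false_positives(alpha, num_null_tests):
    """Expected number of null SNPs passing an uncorrected per-test threshold."""
    return alpha * num_null_tests


def bonferroni_threshold(alpha, num_tests):
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    return alpha / num_tests


num_snps = 1_000_000  # assumed number of SNPs tested, all null
print(f"{expected_false_positives(0.05, num_snps):,.0f}")  # 50,000
print(f"{bonferroni_threshold(0.05, num_snps):.0e}")       # 5e-08
```

Bonferroni is conservative when nearby SNPs are correlated (tests are not independent), which is one reason other correction methods are also used.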
Challenge 2: Correcting genotyping errors
• How can we detect genotyping errors?
• Test whether each SNP is in Hardy-Weinberg Equilibrium.
• If we have mother-father-child trios, we can check Mendelian consistency.
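Both checks can be sketched in a few lines. The HWE test below assumes the common 1-degree-of-freedom chi-square goodness-of-fit test on genotype counts (in practice it is often applied to controls only, since a true association can distort HWE in cases). The trio check assumes unphased two-letter genotypes such as "AG". Both function names are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts.

    Returns (statistic, p_value); a very small p-value flags a possible
    genotyping error (or a genuine departure from HWE).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    if min(expected) == 0:
        return 0.0, 1.0  # monomorphic SNP: nothing to test
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, math.erfc(math.sqrt(stat / 2))


def mendelian_consistent(mother, father, child):
    """True if the child's two alleles can be one from each parent.

    Genotypes are unphased two-character strings such as "AG".
    """
    c1, c2 = child
    return (c1 in mother and c2 in father) or (c2 in mother and c1 in father)


print(hwe_chi_square(25, 50, 25))            # (0.0, 1.0): perfect HWE
print(hwe_chi_square(50, 0, 50)[0])          # 100.0: no heterozygotes, suspicious
print(mendelian_consistent("AG", "GG", "AA"))  # False: father cannot pass an A
```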