Hunting Disease Genes in the Wilds of the Genome -- II

Hunting Disease Genes in the Wilds of the Genome -- II. Richard A. Spritz, M.D. April 8, 2010 richard.spritz@ucdenver.edu 303-724-3107. HMGP. Why Find Disease Genes?. The Future? Personalized Medicine.

Presentation Transcript


  1. Hunting Disease Genes in the Wilds of the Genome -- II Richard A. Spritz, M.D. April 8, 2010 richard.spritz@ucdenver.edu 303-724-3107 HMGP

  2. Why Find Disease Genes?

  3. The Future? Personalized Medicine • Optimized individualized treatments based on genetic diagnosis of disease susceptibilities • Preventative treatments tailored to one’s specific disease risks (“personalized medicine”)

  4. How Do You Find Disease Genes? I. Hypothesis-driven approaches • Candidate gene association • Candidate gene sequencing • Most hypotheses wrong! II. Hypothesis-free approaches • Genomewide association • (Genomewide expression) • Genomewide sequencing: exome, full-genome

  5. Common, Complex Diseases • Asthma • Autism • Obesity • Preterm birth • Cleft lip/palate • IBD • Diabetes • Cancers • Common traits like height

  6. Common, Complex Diseases: Utility of Experimental Approaches [Figure: risk allele frequency (common to rare) plotted against effect size (odds ratio, small to large), with regions marked for GWAS, re-sequencing, and linkage.]

  7. Hypothesis-Driven Approaches: Candidate genes. Depends on: biological hypothesis (biological candidate); positional hypothesis / information (positional candidate) • Sometimes successful in Mendelian disorders • Low yield in polygenic, multifactorial (“complex”) disorders: pathogenic sequence variants not obvious, often present in normal individuals • Most hypotheses wrong!

  8. Candidate Gene Association Study. Concept: Causal disease variation in a gene suggested by known biology is ‘tagged’ by nearby polymorphic DNA markers; test for co-occurrence. Because: DNA sequence variations very close together on the same piece of DNA tend not to be separated by recombination over long periods, and so are non-randomly co-inherited (“linkage disequilibrium”). Therefore: Genotype known variants in a candidate gene as surrogates for unknown disease-causing variants; can’t discover ‘new’ genes; most hypotheses wrong!
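
Linkage disequilibrium between a marker and an unknown causal variant is commonly summarized by the coefficient D and the squared correlation r². The following is a minimal sketch of those two standard measures; the function name and all frequencies are hypothetical values chosen for illustration, not data from the talk.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r2


# Hypothetical case 1: the two alleles always travel together (perfect tagging).
d_tight, r2_tight = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.3)

# Hypothetical case 2: the haplotype frequency equals the product of the
# allele frequencies, so the marker carries no information about the variant.
d_none, r2_none = ld_stats(p_ab=0.09, p_a=0.3, p_b=0.3)

print(round(r2_tight, 3), round(r2_none, 3))
```

An r² near 1 means the genotyped marker is a good surrogate for the untyped variant; an r² near 0 means an association test at the marker says little about it.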

  9. Candidate Gene Association Studies • Typically compares SNP allele (or genotype) frequencies in cases versus controls (“case-control” study design) • Easy statistics (Fisher exact test, Chi-square) • Must Bonferroni correct for multiple-testing • Must ethnically match cases and controls • Easy, cheap • Most powerful for common risk alleles • Can detect common alleles with small allele-specific effects (i.e. “complex”, polygenic traits) • Most common published type of “genetic study” • Most hypotheses wrong!
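
As a rough illustration of the "easy statistics" mentioned above, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts and then applies a Bonferroni threshold. The counts, the number of tests, and the function names are assumptions made for this example only.

```python
import math


def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) on allele counts."""
    n = case_a + case_b + ctrl_a + ctrl_b
    num = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2
    den = ((case_a + case_b) * (ctrl_a + ctrl_b)
           * (case_a + ctrl_a) * (case_b + ctrl_b))
    chi2 = num / den
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def bonferroni_significant(p, n_tests, alpha=0.05):
    """True if p survives a Bonferroni correction for n_tests tests."""
    return p < alpha / n_tests


# Hypothetical allele counts: 1000 case chromosomes, 1000 control chromosomes.
chi2, p = allelic_chi_square(case_a=240, case_b=760, ctrl_a=200, ctrl_b=800)
print(round(chi2, 2), round(p, 3))

# The same result judged alone and as one of 10 SNPs tested (assumed number).
print(bonferroni_significant(p, n_tests=1), bonferroni_significant(p, n_tests=10))
```

In this made-up example the SNP looks nominally significant on its own but does not survive correction for ten tests, which is why the true number of tests performed matters.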

  10. Two Fatal Flaws in Gene-by-Gene Case-Control Design • Must apply multiple-testing correction; true denominator often not known • Must ethnically match cases & controls; otherwise, differences in allele frequencies may reflect different genetic backgrounds of cases vs. controls, not disease association • Matching is difficult or impossible: even in a “homogeneous” population, occult admixture (“stratification”) can lead to false positives • Even true associations vary between populations • ~96% of published positive case-control associations are false positives due to population stratification and publication bias

  11. “Population stratification” and false-positive case-control genetic association studies [Figure: Population 1 and Population 2 (blue/green just indicates overall genetic background) and the disease combine into an admixed study population 1/2; Prof. Wizard’s case-control study compares cases and controls. Eureka!]
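
The arithmetic behind a stratification false positive can be sketched directly. In the example below the allele has no effect on disease within either population; the populations differ only in allele frequency and disease prevalence. All numbers and names are hypothetical.

```python
def allele_freq_in_group(pops, affected):
    """Expected allele frequency among cases (affected=True) or controls.

    pops is a list of (share_of_study_population, disease_prevalence,
    allele_frequency) tuples, one per ancestral population.
    """
    weights = [
        share * (prev if affected else 1 - prev) for share, prev, _ in pops
    ]
    total = sum(weights)
    return sum(w * freq for w, (_, _, freq) in zip(weights, pops)) / total


# Hypothetical admixed study population: equal shares of two populations.
pops = [
    (0.5, 0.02, 0.2),  # population 1: low prevalence, allele frequency 0.2
    (0.5, 0.10, 0.6),  # population 2: high prevalence, allele frequency 0.6
]

case_freq = allele_freq_in_group(pops, affected=True)
control_freq = allele_freq_in_group(pops, affected=False)
print(round(case_freq, 3), round(control_freq, 3))
```

Cases are drawn mostly from the high-prevalence population, so the allele looks enriched in cases even though it is unrelated to disease in either group.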

  12. “Family-based” association studies: • Compare allele transmission from parents to patients • Much less prone to false-positives • Require nuclear families; difficult for adult disease (parents often not available/living)

  13. “Family-Based” Association Studies • Avoid stratification; each family is its own control • “Transmission disequilibrium test” (TDT) compares the transmission frequency of marker alleles from parents to affected offspring in “trios” with the theoretical 50%
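
The TDT statistic is usually computed as a McNemar-type chi-square on transmissions from heterozygous parents. A minimal sketch follows; the transmission counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one marker allele.

    transmitted / untransmitted: number of heterozygous parents who did /
    did not pass the allele to an affected child. Under no association
    both are expected to be equal (50% transmission).
    """
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Hypothetical counts from 100 informative (heterozygous) parents.
chi2, p = tdt(transmitted=60, untransmitted=40)
print(chi2, round(p, 4))
```

Only heterozygous parents are counted because a homozygous parent always transmits the same allele and so carries no information about preferential transmission.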

  14. Hypothesis-Free Approaches Genome-Wide Association Studies (GWAS) • Relatively recent approach (>300 published): • Genotype hundreds of thousands to millions of SNPs across genome using microarrays; extremely expensive • Case-control or family-based (trio) design • Requires no hypotheses about pathogenesis; can discover new genes • Can discover common alleles with small effects • Can provide very fine localization

  15. Hypothesis-free approaches: Genome-wide association studies (GWAS) • Can apply appropriate multiple-testing correction: “genomewide significance” is P < 5 × 10⁻⁸ • Still requires ethnic matching of cases and controls, but can correct for population stratification: “principal components” analysis; genomic inflation factor, “genomic control” • Can discover new, unknown genes; power similar to candidate gene case-control study • Case-control “associations” require independent confirmation
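
The genomewide significance threshold is usually explained as a Bonferroni correction of α = 0.05 for roughly one million independent tests. The sketch below shows that arithmetic and applies it to a few per-SNP P values; the SNP names and P values are invented for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed: about one million independent common-variant tests.
threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical per-SNP P values.
p_values = {"snp_1": 3e-9, "snp_2": 2e-6, "snp_3": 0.04}
hits = [snp for snp, p in p_values.items() if p < threshold]
print(threshold, hits)
```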

  16. The Genomewide Association Study (GWAS) Manolio TA. N Engl J Med 2010;363:166-176.

  17. Meta-Analysis of Genomewide Association Studies Manolio TA. N Engl J Med 2010;363:166-176.

  18. Genomewide Dataset: “Quantile-Quantile (QQ) Plot” [Figure: two QQ plots, one with genomic inflation factor 1.11 and one with genomic inflation factor 1.00.] Correct test statistics by “Genomic Control” method
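
A common form of the genomic control method estimates the inflation factor λ as the median observed chi-square statistic divided by the median expected under the null (about 0.455 for 1 degree of freedom), then divides every statistic by λ. The sketch below assumes that form; the five test statistics are made up and far fewer than a real genomewide dataset.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(chi2_values):
    """Genomic inflation factor: observed median / expected null median."""
    return statistics.median(chi2_values) / CHI2_1DF_MEDIAN


def genomic_control(chi2_values):
    """Deflate test statistics by lambda (never inflate if lambda < 1)."""
    lam = max(genomic_inflation(chi2_values), 1.0)
    return [x / lam for x in chi2_values]


# Hypothetical per-SNP chi-square statistics.
observed = [0.1, 0.3, 0.505, 0.9, 2.5]
lam_before = genomic_inflation(observed)
corrected = genomic_control(observed)
lam_after = genomic_inflation(corrected)
print(round(lam_before, 2), round(lam_after, 2))
```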

  19. Genome-Wide Association Studies: “Manhattan plot” [Figure: per-SNP -log10(P values) across the genome for association of SNP allele frequency differences between patients with generalized vitiligo versus controls (all Caucasian).]

  20. Genome-Wide Association Studies • Very large number of SNPs tested (500,000 – 2,000,000) presents a huge multiple-testing problem; requires at least ~1000 cases and ~1000 controls • Many SNPs are in linkage disequilibrium (i.e. correlated); simple Bonferroni correction is too strict (assumes independence) • Can minimize the number of SNPs genotyped by genotyping “tagSNPs” (SNPs that ‘tag’ specific haplotype blocks from HapMap) • “Significant” associations require confirmation by an independent follow-up association study of specific SNPs to reduce multiple-testing complexity

  21. Personalized Medicine: The case of the ‘missing heritability’ • Disease risk genes found by GWAS account for only a small fraction of genetic risk • > Type 1 diabetes: ~50 genes, ~6.5% of genetic risk • Is there a virtually unlimited number of additional genes, each conferring small additional risk? • > Maybe, but probably not • Have we under-estimated the fraction of genetic risk already accounted for? • > Maybe. GWAS misses rare risk alleles • Have we over-estimated the total genetic component of risk? • > Maybe, but not ten-fold

  22. Hypotheses of Common, “Complex” Disease • Common disease, common variant hypothesis (Reich & Lander, 2001) • versus • Rare variant hypothesis (Pritchard, 2001; Pritchard and Cox, 2002)

  23. Complex Diseases: Utility of Experimental Approaches [Figure: risk allele frequency (common to rare) plotted against effect size (odds ratio, small to large), with regions marked for GWAS, re-sequencing, and linkage.]

  24. Personalized Medicine: The case of the ‘missing heritability’ • Disease risk genes found by GWAS account for only a small fraction of genetic risk • > Type 1 diabetes: ~50 genes, ~6.5% of genetic risk • Implies that detailed prediction via personalized medicine may not be realistic • Is there a virtually unlimited number of additional genes, each conferring small additional risk? • > Maybe, but probably not • Have we under-estimated the fraction of genetic risk already accounted for? • > Maybe. GWAS misses rare risk alleles • Have we over-estimated the total genetic component of risk? • > Maybe, but not ten-fold • What does that mean for personalized medicine? Will it work? • > Maybe. Odds Ratio v. Population Attributable Risk
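
The contrast between odds ratio and population attributable risk can be illustrated with Levin's formula, PAR = p(RR − 1) / (1 + p(RR − 1)), where p is the frequency of the risk factor and RR is the relative risk (which the odds ratio approximates when the disease is rare). The slide does not say which formula the speaker had in mind, so this choice is an assumption, and the frequencies and relative risks below are hypothetical.

```python
def population_attributable_risk(freq, relative_risk):
    """Levin's formula: fraction of cases attributable to the risk factor."""
    excess = freq * (relative_risk - 1)
    return excess / (1 + excess)


# Hypothetical common variant: carried by 40% of people, modest effect.
par_common = population_attributable_risk(freq=0.4, relative_risk=1.2)

# Hypothetical rare variant: carried by 0.1% of people, large effect.
par_rare = population_attributable_risk(freq=0.001, relative_risk=5.0)

print(round(par_common, 4), round(par_rare, 4))
```

With these made-up inputs the rare variant matters far more to the individual who carries it, while the common variant accounts for a larger share of cases in the population.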

  25. Deep re-sequencing: Combined hypothesis-based and hypothesis-free approaches • High-throughput DNA sequencing • Biological candidate genes • GWAS signals (specific genes or genes within regions) • Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help) • Prioritize for follow-up functional analyses

  26. Exome/Genome sequencing: Hypothesis-free approach • High-throughput DNA sequencing • - Genome • - Exome (1% of genome) • Must distinguish potentially causal variants from non-pathological variation (1000 Genomes Project data will help) • Predict based on Mendelian inheritance • Compare across unrelated families • Prioritize for follow-up functional analyses
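
One simple way to "compare across unrelated families" is to keep only genes that carry a candidate variant in every affected family. The sketch below does that with a set intersection; the family labels and gene names are placeholders, not real data.

```python
def shared_candidate_genes(families):
    """Genes carrying a candidate variant in every family."""
    gene_sets = [set(genes) for genes in families.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical per-family lists of genes with rare, potentially causal variants.
families = {
    "family_1": ["GENE_A", "GENE_B", "GENE_C"],
    "family_2": ["GENE_B", "GENE_D"],
    "family_3": ["GENE_B", "GENE_C"],
}

shared = shared_candidate_genes(families)
print(sorted(shared))
```

Requiring a hit in every family assumes a single causal gene; with genetic heterogeneity a looser rule (for example, most families) may be more appropriate.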

  27. Variant Prioritization in Exome/Genome Sequencing • Missense (non-synonymous) substitutions • Most rare (<1%) missense may be deleterious • > MAQ, Bowtie, SOAP2 • Nonsense, frameshift mutations • Splice junction mutations • Exonic splice enhancer mutations • > SKIPPY • INDELs, CNVs, translocations • > GSNAP • ENSEMBL Regulatory Feature variants
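
A prioritization pass of this kind can be sketched as a filter on allele frequency and predicted consequence, followed by a ranking. The record layout, the ranking order, and the example variants below are illustrative assumptions; the sketch does not call or reproduce any of the tools named on the slide.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str           # placeholder gene label
    consequence: str    # predicted effect on the transcript
    allele_freq: float  # frequency in a reference population


# Assumed ranking: lower number = higher priority for follow-up.
CONSEQUENCE_RANK = {
    "nonsense": 0,
    "frameshift": 0,
    "splice_site": 1,
    "missense": 2,
}


def prioritize(variants, max_freq=0.01):
    """Keep rare variants with a ranked consequence, most severe first."""
    kept = [
        v for v in variants
        if v.consequence in CONSEQUENCE_RANK and v.allele_freq < max_freq
    ]
    return sorted(kept, key=lambda v: (CONSEQUENCE_RANK[v.consequence], v.allele_freq))


# Hypothetical variant calls.
variants = [
    Variant("GENE_A", "missense", 0.002),
    Variant("GENE_B", "nonsense", 0.0001),
    Variant("GENE_C", "missense", 0.15),    # too common, dropped
    Variant("GENE_D", "synonymous", 0.001),  # unranked consequence, dropped
    Variant("GENE_E", "splice_site", 0.004),
]

ranked = prioritize(variants)
print([v.gene for v in ranked])
```

The 1% cutoff mirrors the "rare (<1%)" figure on the slide; the relative ordering of consequence classes is a judgment call that a real pipeline would tune.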

  28. GENETICS
