Genome-wide association studies: what they can and can't tell us about disease biology
Hunter Fraser
Polymorphisms • SNPs • Insertions/deletions • Copy-number variation • Inversions • Translocations
Causes of disease • Environmental • Correlation vs. causation • Infinite possibilities • Genetic • Causality clear (in a properly designed study) • Finite number of polymorphisms (~10^7 common)
Genome-wide association studies (GWAS) • Polymorphisms are the basis for the genetic component of variation in disease risk • The genetic component (heritability) often explains >50% of the variation in disease risk • Goal: for every polymorphism, determine which diseases it affects
Genome-wide association studies (GWAS) [Figure: diseases vs. polymorphisms]
Genome-wide association studies (GWAS) • Main idea: look for genetic differences between people with and without a disease • First, need a method that can genotype thousands to millions of polymorphisms at once: microarrays • Second, need to know where the common polymorphisms are: the HapMap project • Third, need HUGE cohorts to detect subtle allele frequency differences
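The core comparison can be sketched as a per-SNP test of allele frequency in cases versus controls. The sketch below assumes a simple allelic 2×2 chi-square test on genotype counts; the talk does not say which test was used, real studies often use logistic regression with covariates, and the function name and input format are hypothetical.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Allelic 2x2 chi-square test for one biallelic SNP.

    Assumption: a simple allelic test, not necessarily the one used in the talk.
    case_counts, control_counts: (n_AA, n_Aa, n_aa) genotype counts
    (hypothetical input format).
    Returns (freq of allele a in cases, freq in controls, chi2, p).
    """
    def allele_counts(genotypes):
        n_AA, n_Aa, n_aa = genotypes
        # Each person carries two alleles.
        return 2 * n_AA + n_Aa, n_Aa + 2 * n_aa  # (count of A, count of a)

    table = [allele_counts(case_counts), allele_counts(control_counts)]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return table[0][1] / row_sums[0], table[1][1] / row_sums[1], chi2, p
```

With 5,000 cases and 5,000 controls, `allelic_chi_square((2000, 2000, 1000), (2500, 2000, 500))` compares allele frequencies of 40% and 30% and yields a vanishingly small p-value; the same frequencies with only 50 people per group, `allelic_chi_square((20, 20, 10), (25, 20, 5))`, give p > 0.1. This is why subtle differences need very large cohorts.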
Genome-wide association studies (GWAS) [Figure: association results plotted against genomic position]
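A plot of this kind typically shows each SNP's association strength as -log10(p) against its genomic position. Because a GWAS tests up to millions of SNPs, a much stricter threshold than 0.05 is needed. The sketch assumes the commonly used genome-wide cutoff of 5×10^-8; the function name and input format are hypothetical.

```python
import math

# Assumption: the conventional genome-wide significance threshold of 5e-8,
# roughly a Bonferroni correction of 0.05 for ~1 million independent tests.
GENOME_WIDE_ALPHA = 5e-8


def manhattan_points(results, alpha=GENOME_WIDE_ALPHA):
    """Turn per-SNP GWAS results into plot coordinates.

    results: list of (chromosome, position, p_value) tuples with integer
    chromosome numbers (hypothetical input format).
    Returns (chromosome, position, -log10(p), passes_threshold) tuples,
    ordered along the genome.
    """
    points = []
    for chrom, pos, p in sorted(results):
        points.append((chrom, pos, -math.log10(p), p < alpha))
    return points
```

For example, `manhattan_points([(2, 1500, 3e-9), (1, 40000, 0.2), (1, 10000, 1e-4)])` orders the SNPs by chromosome and position and flags only the chromosome 2 SNP as genome-wide significant.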
Genome-wide association studies (GWAS) • End result of a successful GWAS: • What does this actually tell us? • How to predict disease risk from genotype? • What polymorphisms cause disease? • What genes are involved?
What do GWAS tell us? • How to predict disease risk from genotype? • Potentially, but in practice GWAS hits explain only a few percent of the genetic component (“missing heritability”) • What polymorphisms cause disease? • Not directly: GWAS genotype “tag SNPs” that are in linkage disequilibrium (LD) with causal polymorphisms, so a hit marks a region rather than the causal variant itself
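Linkage disequilibrium between a tag SNP and a nearby causal polymorphism is commonly summarized as r². The sketch computes it from phased two-SNP haplotypes coded 0/1 (a hypothetical input format), using the standard definitions D = p_AB - p_A·p_B and r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)).

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs from phased
    chromosomes, each allele coded 0 or 1 (hypothetical input format).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at SNP 2
    # Frequency of the haplotype carrying allele 1 at both SNPs.
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return d * d / denom
```

When r² is close to 1, the tag SNP and the causal SNP show nearly the same association with disease, so the GWAS signal alone cannot say which of them is causal.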
What do GWAS tell us? • What genes are involved? • An important question for our understanding of disease biology • Candidate genes are nearly always guessed, and the guesses are biased by prior knowledge • There is almost never direct evidence implicating a particular gene, and most hits are intergenic • Transcriptional enhancers can act over long distances, so the nearest gene is not necessarily the affected one, making this a nontrivial problem
Inferring disease genes • Need an unbiased, systematic method to infer disease genes from GWAS hits • One solution: integrate the results with a separate GWAS of gene expression “traits” (expression quantitative trait loci, eQTL) [Figure: DNA → genotyping array; RNA → expression array]
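eQTL mapping • How does this tell us about disease genes? • If SNP X is associated with both the expression of gene Y and disease Z, either this is a coincidence or SNP X affects disease Z via its effect on gene Y, making Y a “disease gene” [Figure: SNP X → expression of gene Y → disease Z]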
Affymetrix exon arrays • ~6 million probes • Cover nearly all exons in the human genome
The data set • Exon arrays on lymphoblastoid cell lines • 89 YRI (Yoruba from Nigeria) and 87 CEPH (European-American from Utah), genotyped at >3 million SNPs (HapMap)
The analysis • Compute transcript variation for each exon • Compare transcript variation patterns to SNP genotypes • Integrate with disease GWAS results
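The first step, computing transcript variation for each exon, needs an exon-level measure that is separated from overall gene expression. One common approach for exon arrays, assumed here rather than taken from the talk, is a splicing index: the log2 ratio of an exon's signal to its gene's overall signal in the same sample. The sketch uses the median over the gene's exons as the gene-level signal; the function name and data layout are hypothetical.

```python
import math
from statistics import median


def splicing_index(exon_signals):
    """Per-sample log2(exon signal / gene-level signal) for one gene.

    Assumption: a splicing-index style measure, not necessarily the method
    used in the talk.
    exon_signals: dict mapping exon ID -> list of signals, one per sample
    (hypothetical layout; all lists must have the same length).
    The gene-level signal for a sample is the median over the gene's exons.
    """
    exons = list(exon_signals)
    n_samples = len(exon_signals[exons[0]])
    index = {exon: [] for exon in exons}
    for s in range(n_samples):
        gene_level = median(exon_signals[e][s] for e in exons)
        for e in exons:
            index[e].append(math.log2(exon_signals[e][s] / gene_level))
    return index
```

For the toy input `{"e1": [100, 200], "e2": [100, 200], "e3": [25, 200]}`, exon e3 gets an index of -2 in the first sample and 0 in the second, marking it as under-represented relative to the rest of the gene in that sample only.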
Comparing transcripts to SNPs • Compare each exon to all HapMap SNPs • The strongest correlations are local: SNPs near the exon(s) they affect
Comparing transcripts to SNPs • Calculate the correlation between each exon’s expression level and all SNPs within 100 kb [Figure: SNPs in a window extending 100 kb on either side of the exon]
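The per-exon scan can be sketched as follows: correlate an exon's expression with the genotype dosage (0, 1 or 2 copies of one allele) of every SNP within 100 kb of it. Pearson correlation is assumed because the slides report r values; significance testing and multiple-testing correction are not shown, and the function names and input format are hypothetical.

```python
import math

WINDOW_BP = 100_000  # 100 kb on either side of the exon, as on the slide


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant expression: r is undefined
    return sxy / math.sqrt(sxx * syy)


def cis_scan(exon_start, exon_end, expression, snps, window=WINDOW_BP):
    """Correlate one exon's expression with every SNP within `window` bp of it.

    expression: the exon's expression level in each individual.
    snps: list of (snp_id, position, dosages), where dosages are 0/1/2
    genotype codes in the same individual order (hypothetical input format).
    Returns [(snp_id, r)] sorted by |r|, strongest first.
    """
    hits = []
    for snp_id, pos, dosages in snps:
        if exon_start - window <= pos <= exon_end + window:
            hits.append((snp_id, pearson_r(dosages, expression)))
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits
```

SNPs outside the window are never tested, which keeps the number of tests per exon small and focuses the scan on the local effects that dominate.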
Significant associations: 1,061 exons, ~10 SNPs/exon
Top hit: IRF5 • A transcription factor that acts downstream of Toll-like receptors, with variants associated with lupus [Figure: IRF5 probeset signal by genotype; r = 0.97, no overlap between genotype groups]
Second hit: OAS1 (chr12) • 2’,5’-oligoadenylate synthetase 1 • Splice-site variants contribute to viral infection susceptibility and type 1 diabetes (T1D) (Bonnevie-Nielsen et al., 2005) • Three known splice variants involving exons 5/6
Second hit: OAS1 (chr12) [Figure: two OAS1 probesets; r = 0.93]
The analysis • Compute transcript variation for each exon • Compare transcript variation patterns to SNP genotypes • Integrate with disease GWAS results
Genome-wide association studies • Could some disease-associated SNPs influence disease through splicing? • Compiled a list of 68 disease-associated SNPs • 4 overlap with splicing-associated SNPs (0.1 expected by chance) • All 4 are from autoimmune diseases: of 24 autoimmune SNPs, 0.04 overlaps expected • Suggests that polymorphic transcript variation is tissue-specific, consistent with the expression data coming from lymphoblastoid (immune-derived) cell lines
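How surprising are 4 overlaps when only 0.1 are expected? One simple way to gauge it, assumed here rather than taken from the talk, is to model the number of chance overlaps as Poisson with mean equal to the expected count and compute P(X ≥ 4). The function name is hypothetical.

```python
import math


def poisson_upper_tail(k, lam, n_terms=200):
    """P(X >= k) for X ~ Poisson(lam).

    Assumption: chance overlaps are modeled as Poisson(expected count).
    Sums the upper tail directly, avoiding the cancellation in
    1 - P(X < k) when the tail probability is tiny.
    """
    term = math.exp(-lam)  # P(X = 0)
    for i in range(1, k + 1):
        term *= lam / i  # P(X = i) from P(X = i - 1); ends at P(X = k)
    total = 0.0
    for i in range(k + 1, k + 1 + n_terms):
        total += term  # adds P(X = i - 1)
        term *= lam / i  # advance to P(X = i)
    return total
```

For the numbers on the slide, `poisson_upper_tail(4, 0.1)` is roughly 4×10^-6 and `poisson_upper_tail(4, 0.04)` (autoimmune SNPs only) is roughly 1×10^-7. The Poisson model treats overlaps as independent rare events; LD between nearby disease SNPs could violate that.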
Autoimmune-associated SNP • One SNP was associated with multiple autoimmune conditions (T1D, CD) • Located near PTPN2, a tyrosine phosphatase involved in immune regulation • PTPN2 is known to have two splice forms, only one of which has a nuclear localization signal (NLS) (Ibarra-Sanchez et al., 2000) • The two isoforms have very different target proteins
Autoimmune-associated SNP • Could this SNP cause autoimmune diseases by changing the ratio of PTPN2 splice forms? [Figure: PTPN2 probeset]
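One way to frame this question is to compare, across the three genotype groups at the SNP, the fraction of PTPN2 signal coming from one splice form. The sketch assumes isoform-level signals are already available for each individual; the isoform labels, function name, and input layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def isoform_fraction_by_genotype(genotypes, isoform_a, isoform_b):
    """Mean fraction of isoform A, a / (a + b), per genotype group.

    genotypes: dosage (0, 1 or 2) of the SNP for each individual.
    isoform_a, isoform_b: signals for two splice forms in the same
    individuals (hypothetical inputs, e.g. from isoform-specific probesets).
    """
    groups = defaultdict(list)
    for g, a, b in zip(genotypes, isoform_a, isoform_b):
        groups[g].append(a / (a + b))
    return {g: mean(fractions) for g, fractions in sorted(groups.items())}
```

With the toy input `([0, 0, 1, 1, 2, 2], [90, 80, 50, 50, 10, 20], [10, 20, 50, 50, 90, 80])`, the isoform A fraction falls from about 0.85 to 0.5 to 0.15 with increasing dosage. A shift like this would be consistent with the SNP changing the splice-form ratio, but would not by itself show that the ratio change causes disease.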