
Inference of cis and trans regulatory variation in the human genome

Manolis Dermitzakis, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Cambridge, UK. md4@sanger.ac.uk


Presentation Transcript


  1. Inference of cis and trans regulatory variation in the human genome Manolis Dermitzakis The Wellcome Trust Sanger Institute Wellcome Trust Genome Campus Cambridge, UK md4@sanger.ac.uk

  2. Gene expression
     • Altered patterns of gene expression → disease.
     • e.g., Type 1 diabetes, Burkitt’s lymphomas.
     • Widespread intraspecific variation.
     • Heritable genetic variation for transcript levels.
     • Familial aggregation of expression profiles (Cheung et al. 2003).
     • In humans, ~30% of surveyed loci exhibited a genetic component for expression differences (Monks et al. 2004; Schadt et al. 2003).
     • Much of the influential variation is located cis- to the coding locus.
     • In humans, mouse, and maize, 35%-50% of the genetic basis for intraspecific differences in transcription level is cis- to the coding locus (e.g. Morley et al. 2004; Schadt et al. 2003; Stranger et al. 2005; Cheung et al. 2005, etc.).

  3. Nature of regulatory variation
     [Diagram labels: DNA with REG and GENE; Expression; i) Pre-mRNA, ii) mRNA, iii) Protein, iv) DNA]
     Stranger and Dermitzakis, Human Genomics 2005

  4. Effects of Copy Number Variation on gene expression

  5. Gene expression association mapping
     [Figure: expression by genotype class AA, AG, GG]
     Stranger et al. PLoS Genet 2005

  6. Phenotypic variation space

  7. illumina Human 6 x 2 gene GEX arrays

  8. Beads in Wells

  9. Whole-genome gene expression
     [Diagram labels: cell line, RNA, IVT1 and IVT2, replicates rep1 to rep4]
     • ~48,000 transcripts
     • 24,000 RefSeq
     • 24,000 other transcripts
     • 270 HapMap individuals:
     • CEU: 30 trios, 90 total
     • CHB: 45 unrelated
     • JPT: 45 unrelated
     • YRI: 30 trios, 90 total
     • 2 IVTs (in vitro transcription reactions) each person
     • 2 replicate hybridizations each IVT
     • Quantile normalization of all replicates of each individual.
     • Median normalization across all individuals of a population.
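
The two normalization steps on this slide can be illustrated with a minimal sketch. The slide does not give the exact procedure, so the details here are assumptions: quantile normalization replaces each replicate's values with the rank-wise mean across replicates, and median normalization shifts each individual's profile so that all individuals share a common median. The function names are hypothetical.

```python
from statistics import mean, median

def quantile_normalize(replicates):
    """Give every replicate the same distribution: the rank-wise mean across replicates."""
    n = len(replicates[0])
    sorted_reps = [sorted(rep) for rep in replicates]
    # Reference distribution: mean of the values holding each rank.
    reference = [mean(rep[i] for rep in sorted_reps) for i in range(n)]
    normalized = []
    for rep in replicates:
        # Probe indices ordered by intensity (ties broken by position).
        order = sorted(range(n), key=lambda i: rep[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

def median_normalize(individuals):
    """Assumed form: shift each profile so its median equals the median of all medians."""
    target = median(median(ind) for ind in individuals)
    return [[x - median(ind) + target for x in ind] for ind in individuals]

# Toy data: two replicates of one individual, three probes each.
reps = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
print(quantile_normalize(reps))
```

After quantile normalization the replicates contain identical sets of values, differing only in which probe holds which rank.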

  10. Within- and between-individual variation
     • 2 replicates; single YRI individual: r² (all genes) = 0.990; detected genes (0.98 in both samples): 12,076; r² (detected) = 0.994
     • 2 YRI individuals: r² (all genes) = 0.964; detected genes (0.98 in both samples): 11,529; r² (detected) = 0.964
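
As a rough illustration of how such r² values could be computed, the sketch below takes the squared Pearson correlation between two expression profiles, optionally restricted to genes "detected" in both samples. Treating 0.98 as a minimum detection score is an assumption based on the slide's wording, and the function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def detected_in_both(det_a, det_b, threshold=0.98):
    """Indices of genes whose detection score meets the threshold in both samples (assumed rule)."""
    return [i for i, (a, b) in enumerate(zip(det_a, det_b))
            if a >= threshold and b >= threshold]

# Toy example: three genes, two samples.
expr_a, expr_b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
keep = detected_in_both([0.99, 0.50, 1.00], [0.98, 0.99, 0.97])
print(r_squared(expr_a, expr_b), keep)
```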

  11. 14,925 genes; HapMap SNPs
     • Phase I HapMap; MAF > 0.05; ~1 SNP per 5 kb
     • CEU: 762,447 SNPs
     • CHB: 695,601
     • JPT: 689,295
     • YRI: 799,242
     • Samples: 60 CEU, 45 CHB, 44 JPT, 60 YRI

  12. Copy Number Variation dataset
     • Genome Structural Variation Consortium
     • Redon et al. Nature in press
     • Array-CGH using a whole genome tile path array
     • 26,563 clones; 93.7% euchromatic genome
     • Median clone size ~170 kb
     • All 270 HapMap individuals
     • Quantitative values (log2 ratios) representing diploid genome copy number, not genotypes.
     • 1117 CNVs called from log2 ratios
     • Calls based on standard deviation of log2 ratios
     • Many CNVs experimentally verified

  13. SNP cis-analysis: SNPs within 1Mb of probe midpoint
     [Diagram labels: 1Mb on either side of the probe, 1Mb window, probe, gene, SNPs]
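
The window selection described here can be sketched in a few lines. This is only an illustration: the SNP identifiers and coordinates are made up, all positions are assumed to lie on the same chromosome, and the strict inequality follows the "SNP-probe distance < 1Mb" wording used later in the deck.

```python
def snps_in_cis_window(probe_midpoint, snp_positions, window=1_000_000):
    """Return IDs of SNPs lying strictly within `window` bp of the probe midpoint.

    snp_positions maps a SNP ID to its base-pair coordinate on the probe's chromosome.
    """
    return [snp for snp, pos in snp_positions.items()
            if abs(pos - probe_midpoint) < window]

# Hypothetical SNPs around a probe whose midpoint is at 1.5 Mb.
snps = {"rs_a": 500_000, "rs_b": 1_600_000, "rs_c": 2_400_000, "rs_d": 2_500_001}
print(snps_in_cis_window(1_500_000, snps))
```

The CNV analysis uses the same idea with clone midpoints and a 2Mb window, which corresponds to calling the function with `window=2_000_000`.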

  14. Association analysis
     • Additive association model: linear regression on genotype coded as a count, e.g. CC = 0, CT = 1, TT = 2.
     • Reported for each test: slope of line, p-value, r².
     [Plot x-axis: genotype coded 0, 1, 2]
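
A minimal sketch of the additive model, assuming ordinary least squares of expression on the 0/1/2 genotype code. It returns the slope, intercept, and r² only; the p-value needs a t distribution, which the Python standard library lacks (a routine such as `scipy.stats.linregress` reports it). The function name and toy data are hypothetical.

```python
GENOTYPE_CODE = {"CC": 0, "CT": 1, "TT": 2}  # additive coding from the slide

def additive_regression(genotypes, expression):
    """Least-squares fit of expression on coded genotype; returns (slope, intercept, r2)."""
    x = [GENOTYPE_CODE[g] for g in genotypes]
    n = len(x)
    mx, my = sum(x) / n, sum(expression) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, expression))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in expression)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2

# Toy data: expression rises by one unit per T allele.
print(additive_regression(["CC", "CT", "TT", "CT"], [1.0, 2.0, 3.0, 2.0]))
```

The same fit applies to the CNV analysis if the genotype code is replaced by the clone's log2 ratio.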

  15. CNV cis-analysis: clone midpoint within 2Mb of probe midpoint
     [Diagram labels: 2Mb on either side of the probe, 1Mb window, probe, gene, clones]

  16. Linear regression for CNV and expression
     [Plot x-axis: clone signal (log2 ratio)]

  17. Multiple-test correction
     1. Bonferroni: whole-genome; cis-
     2. False Discovery Rate (FDR): whole-genome; cis-
     3. Permutations: whole-genome; cis-
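
The first two corrections can be sketched as follows. The slide names only "False Discovery Rate", so using the Benjamini-Hochberg step-up procedure here is an assumption, as are the function names and the 0.05 level.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test when its p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected

p = [0.001, 0.02, 0.03, 0.5]
print(bonferroni(p))          # the stricter, family-wise control
print(benjamini_hochberg(p))  # the more permissive FDR control
```

Whether the tests counted are whole-genome or cis- only changes the number of tests `m`, and therefore the stringency of both corrections.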

  18. Permutation design
     [Diagram: GENOTYPES matrix (g11 … g1n; g21 … g2n; g31 … g3n; … ; gi1 … gin) alongside GENE EXPRESSION values (Exp1, Exp2, Exp3, … , Expi); the expression values are permuted relative to the genotypes]
     • 10,000 permutations; each time keep lowest p-value
     • Null distribution of 10,000 extreme p-values
     • Compare observed p-values to the tails of the null
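
The design above can be sketched for a single gene as follows. Several details are assumptions rather than the authors' implementation: the sketch keeps the largest r² per permutation instead of the lowest p-value (for simple linear regression with a fixed sample size the two rank tests identically, since the p-value decreases monotonically as r² increases), and the empirical p-value uses a +1 correction. Function names and toy data are hypothetical.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation; 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def best_r2(genotypes, expression):
    """Most extreme statistic across all cis-SNPs of one gene."""
    return max(r_squared(snp, expression) for snp in genotypes)

def permutation_p_value(genotypes, expression, n_perm=10_000, seed=0):
    """Empirical p-value of the best observed association for one gene.

    genotypes: list of SNPs, each a list of 0/1/2 codes per individual.
    expression: one expression value per individual, in the same order.
    """
    rng = random.Random(seed)
    observed = best_r2(genotypes, expression)
    shuffled = list(expression)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-expression pairing
        if best_r2(genotypes, shuffled) >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_perm + 1)

# Toy gene: one SNP that perfectly tracks expression, one that does not.
geno = [[0, 1, 2, 0, 1, 2, 0, 1, 2, 1], [1, 1, 0, 2, 0, 1, 2, 0, 1, 1]]
expr = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 2.0]
print(permutation_p_value(geno, expr, n_perm=1000))
```

Shuffling the expression vector as a whole, rather than each SNP separately, preserves the correlation structure among SNPs, which is why the extreme value per permutation accounts for the multiple SNPs tested per gene.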

  19. Significant expression – cis-SNP associations
     • CEU genes: 323
     • CHB genes: 348
     • JPT genes: 370
     • YRI genes: 411
     • 888 non-redundant genes
     • 67 genes in all 4 populations (8%)
     • 333 genes in at least 2 populations (37%)
     • ~6% of genes exhibit significant cis- association
     Permutation threshold 0.001; SNP-probe distance < 1Mb

  20. Significant expression – cis-CNV clone associations
     • CEU genes: 40
     • CHB genes: 32
     • JPT genes: 40
     • YRI genes: 42
     • 99 non-redundant genes
     • 7 genes associated in all 4 populations (7%)
     • 34 genes in at least 2 populations (34%)
     Permutation threshold 0.001; clone-probe distance < 2Mb

  21. Some genes ABC1, ABHD6, ACY1L2, ADAT1, ARNT, ARSA, ASAHL, ATP13A, B7, BBS2, BLK, C14orf130, C14orf4, C14orf52, C1orf16, C20orf22, C21orf107, C7orf13, C7orf29, C7orf31, C8orf13, C9orf95, CARD8, CAT, CD151, CD79B, CDKN1A, CDKN2B, CGI-111, CGI-62, CGI-96, CHCHD2, CHI3L2, CHRNE, CNN2, CP110, CPEB4, CPNE1, CRIPT, CSTB, CTNS, CTSH, CTSK, DCLRE1B, DCTD, DERP6, dJ383J4.3, DKFZp434N035, DKFZP566H073, DKFZP566J2046, DKFZP586D0919, DKFZp761A132, DNAJD1, DOM3Z, DPYSL4, DSCR5, DTNB, ECHDC3, EGFL5, EIF2B2, ENTPD1, ERMAP, FCGR2A, FDX1, FKBP1A, FLJ10252, FLJ10904, FLJ12994, FLJ12998, FLJ13576, FLJ14009, FLJ14753, FLJ20444, FLJ20635, FLJ21347, FLJ21616, FLJ22374, FLJ22573, FLJ22635, FLJ23235, FLJ34443, FLJ35827, FLJ36888, FLJ37970, FLJ40432, FLJ46603, FLJ90036, FUT10, GAA, GSTM1, GSTM2, GSTT1, H17, HABP4, HIBCH, HLA-C, HLA-DQA1, HLA-DQA2, hmm1412, hmm23621, hmm26268, hmm31752, hmm31999, hmm3577, hmm3587, hmm5445, hmm665, hmm8232, HNLF, Hs.119946, Hs.124623, Hs.135624, Hs.153573, Hs.158943, Hs.164463, Hs.169006, Hs.171169, Hs.212658, Hs.245997, Hs.26039, Hs.264076, Hs.311977, Hs.333841, Hs.379903, Hs.396207, Hs.400876, Hs.40696, Hs.431200, Hs.43687, Hs.453941, Hs.460359, Hs.465789, Hs.466924, Hs.467281, Hs.482037, Hs.485895, Hs.490095, Hs.495422, Hs.506072, Hs.517172, Hs.519979, Hs.5855, Hs.6637, HSRTSBETA, IFIT5, IL16, IL21R, IMAGE3451454, IMMT, IPP, IREB2, IRF5, KIAA0265, KIAA0483, KIAA0643, KIAA0748, KIAA1463, KIAA1627, LCMT1, LOC113386, LOC132001, LOC132321, LOC135043, LOC151963, LOC282956, LOC283710, LOC283970, LOC284184, LOC284293, LOC285407, LOC286353, LOC339231, LOC339803, LOC339804, LOC340435, LOC347981, LOC348094, LOC348180, LOC374758, LOC375097, LOC375399, LOC378075, LOC388918, LOC389362, LOC389763, LOC399987, LOC400410, LOC400566, LOC400642, LOC400684, LOC400933, LOC401075, LOC401135, LOC401284, LOC51240, LOC90637, LOC90693, MAN1A2, MCMDC1, MGC10120, MGC12458, MGC13186, MGC19764, MGC20235, MGC20481, MGC20781, MGC22773, MGC24665, MGC2752, MGC3794, MGC9084, MMRP19, 
MRPL21, MRPL43, MTERF, MYOM2, NDUFA10, NDUFS5, NMNAT3, NUDT2, OAS1, PACSIN2, PASK, PBX4, PCTAIRE2BP, PEX5, PEX6, PGS1, PHACS, PHC2, PHEMX, PIP5K1C, PIP5K2A, PKHD1L1, POLR2J, PP3856, PP784, PPA2, PPFIA1, PPIL3, PTER, QRSL1, R29124_1, RABEP1, RAPGEFL1, RDH5, RPAP1, RPL13, RPL36AL, RPL8, RPLP2, RPS16, RPS6KB2, SARS2, SERPINB10, SF1, SH3GLB2, SHMT1, SIAT4C, SIVA, SKIV2L, SNAP29, SNX11, SOD2, SPG7, SQSTM1, ST7L, STAT6, STK25, SYNGR1, SYNGR3, TAP2, TAPBP-R, TBC1D4, TCL6, TEF, TGM5, THAP5, THAP6, THOC3, TIMM10, TINP1, TMEM8, TMPIT, TRAPPC4, TRIM4, TSGA10, TSGA2, TUBB, UBE2G1, UGT2B11, UGT2B17, UGT2B7, UROS, USMG5, VPS28, WARS2, WBSCR27, WWOX, XRRA1, ZNF266, ZNF384, ZNF493, ZNF587, ZNF79, ZNF85, ZRANB1. • UGT2B7, UGT2B11, UGT2B17 • GSTM1

  22. Genomic location of associations [Figure panels: SNP, CNV]

  23. [Figure panels: SNPs, CNVs]

  24. Effects of Copy Number Variation on gene expression
     [Diagram panels, labelled by direction of effect: positive; positive or negative; negative; positive or negative]

  25. Negative or positive slope in CNV associations 80% positive 20% negative

  26. What is the overlap between SNP and CNV effects? Do SNPs capture the CNV effects through Linkage Disequilibrium?

  27. LD between CNV and SNP
     [Diagram labels: SNP allele A with two copies of Gene X, 2x expression; SNP allele G with one copy of Gene X, 1x expression]

  28. CNVs and SNPs mostly capture different effects
     • Relative impact on gene expression: 82% SNPs, 18% CNVs.
     • Only 13% of genes with a CNV association also had a SNP association in the same population.
     • Biased toward large effect size.
     • CNV and SNP variation are highly correlated (p-value 0.001).
     • Lack of overlapping effects is not due to CNVs in regions of segmental duplications (SDs), which contain few HapMap SNPs.
     • Percentage of associated clones overlapping SDs does not differ from all clones overlapping SDs (p-value: 0.016).
     • Also, the probability that a CNV signal is captured by SNPs does not depend on whether the CNV is in an SD (17.3%) or outside of SDs (15.9%).

  29. Phase II HapMap (2.2m SNPs)

  30. Direction of allelic effect
     [Diagram: effect in POP1 versus POP2; cases labelled AGREEMENT and OPPOSITE]

  31. Direction of allelic effects 95% have the same direction

  32. Trans effects
     [Diagram labels: mirnaSNPs, spliceSNPs, rSNPs, nsSNPs; GENE, REG, DNA]
     • Genome-wide associations
     • Dissect regulatory networks

  33. Regulatory variants have the highest impact on regulatory networks

  34. Conclusions
     • Large number of genes with significant expression variation within and between human population samples, and strong association between individual genes and specific SNPs and CNVs.
     • Little overlap between SNP and CNV signals.
     • Replication of significant signals across populations.
     • Promising approach for identification of functionally variable regulatory regions.
     • Cis-regulatory variation is mostly responsible for genome-wide regulatory variation.

  35. Pre-publication data release: www.sanger.ac.uk/genevar/

  36. Acknowledgements Cambridge University Mark Dunning Simon Tavaré Barbara Stranger Matthew Forrest Catherine Ingle Antigone Dimas Christine Bird Alexandra Nica Claude Beazley Panos Deloukas Cornell University Andy Clark illumina Jill Orwick Mark Gibbs Genome Structural Variation Consortium Matt Hurles, Richard Redon, Nigel Carter, Charles Lee, Chris Tyler-Smith, Stephen Scherer, The HapMap Consortium Wellcome Trust for funding

  37. Wellcome Trust Advanced Courses Working with the HapMap 2-5 April 2007 Closing date for applications: 10 January 2007 Wellcome Trust Genome Campus, Hinxton, Cambridge This 4-day residential workshop will provide a comprehensive overview of the International HapMap Project, including practical experience of working with the HapMap data to map phenotypic traits to locations in the human genome. Theoretical lectures will be combined with hands-on practical sessions and introduction to relevant databases and tools. Course instructors: Paul de Bakker (MIT), Manolis Dermitzakis (Sanger Institute), Mike Feolo (NIH/NCBI), Jonathan Marchini (Oxford University), Gil McVean (Oxford University), Steve Sherry (NIH/NCBI), Albert Vernon Smith (CSHL), Barbara Stranger (Sanger Institute), Eleftheria Zeggini (Wellcome Trust Center for Human Genetics) Speakers: Lon Cardon (Wellcome Trust Center for Human Genetics), Panos Deloukas (Sanger Institute), John Todd (Cambridge University) Full information and application details at: www.wellcome.ac.uk/advancedcourses
