Regulatory variation and its functional consequences Chris Cotsapas cotsapas@broadinstitute.org
Motivating questions • How do phenotypes vary across individuals? • Regulatory changes drive cellular and organismal traits • Likely also drive evolutionary differences • How are genes (co)regulated? • Pathways, processes, contexts
Regulatory variation • What do “interesting” variants do? • Genetic changes to: • Coding sequence ** • Gene expression levels • Splice isoform levels • Methylation patterns • Chromatin accessibility • Transcription factor binding kinetics • Cell signaling • Protein-protein interactions • ~88% of GWAS hits are regulatory
Genetic variation alters regulation • Protein levels • Maize (Damerval 94) • Expression levels • Yeast, maize, mouse, humans (Brem 02, Schadt 03, Stranger 05, Stranger 07) • RNA splicing • Humans (Pickrell 12, Lappalainen 13) • Methylation and DNase I peak strength • Humans (Degner 12; Gibbs 12)
Genetics of gene expression (eQTL) • cis-eQTL • The position of the eQTL maps near the physical position of the gene. • Promoter polymorphism? • Insertion/Deletion? • Methylation, chromatin conformation? • trans-eQTL • The position of the eQTL does not map near the physical position of the gene. • Regulator? • Direct or indirect? Modified from Cheung and Spielman 2009 Nat Gen
cis-eQTL analysis: test SNPs within a pre-defined distance of the gene [Figure: gene and probe flanked by 1 Mb windows; SNPs inside the window are tested]
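The window selection on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `cis_snps`, the tuple layout for SNPs, and the choice to measure the window from the gene's start and end coordinates (rather than from the TSS or the probe) are assumptions, not details given in the slides.

```python
from typing import List, Tuple

WINDOW_BP = 1_000_000  # 1 Mb window, the distance shown on the slide

def cis_snps(gene_chrom: str, gene_start: int, gene_end: int,
             snps: List[Tuple[str, str, int]],
             window: int = WINDOW_BP) -> List[str]:
    """Return IDs of SNPs on the gene's chromosome within `window` bp of the gene.

    `snps` holds (snp_id, chromosome, position) tuples; this layout is hypothetical.
    """
    lo = max(0, gene_start - window)
    hi = gene_end + window
    return [snp_id for snp_id, chrom, pos in snps
            if chrom == gene_chrom and lo <= pos <= hi]

# Toy data, invented for illustration only.
snps = [
    ("rs_a", "1", 500_000),
    ("rs_b", "1", 2_400_000),
    ("rs_c", "1", 3_600_000),
    ("rs_d", "2", 2_000_000),
]
selected = cis_snps("1", 2_000_000, 2_050_000, snps)
print(selected)
```

Only SNPs returned by this filter would then go forward to the association test for that gene, which keeps the number of tests per gene small compared with a genome-wide scan.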
QT association • Analysis of the relationship between a dependent or outcome variable (phenotype) and one or more independent or predictor variables (SNP genotype) • Linear regression equation: $Y_i = b_0 + b_1 X_i + e_i$ • Logistic regression equation: $\ln\left(\frac{p_i}{1 - p_i}\right) = b_0 + b_1 X_i + e_i$ [Figure: continuous trait value (y-axis) against number of A1 alleles, 0, 1 or 2 (x-axis); fitted line with intercept $b_0$ and slope $b_1$]
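As a minimal sketch of the linear model above, the intercept and slope can be estimated by ordinary least squares with the A1 allele count (0, 1 or 2) as the predictor. The function name `fit_additive_model` and the toy data are illustrative assumptions; a real analysis would also report a standard error and p-value for the slope and include covariates.

```python
from typing import Sequence, Tuple

def fit_additive_model(genotypes: Sequence[float],
                       trait: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of trait = b0 + b1 * genotype; returns (b0, b1)."""
    n = len(genotypes)
    if n != len(trait) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(genotypes) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in genotypes)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotypes, trait))
    b1 = sxy / sxx               # slope: change in trait per extra A1 allele
    b0 = mean_y - b1 * mean_x    # intercept: expected trait at zero A1 alleles
    return b0, b1

# Toy data, invented for illustration only.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
b0, b1 = fit_additive_model(genotypes, expression)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```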
eQTL analysis: a GWAS for every gene [Figure: one association scan per gene: gene 1, gene 2, gene 3, gene 4, gene 5, ..., gene N]
cis-eQTLs are rather common Nica et al PLoS Genet 2011
Cis-eQTLs cluster around TSS Stranger et al PLoS Genet 2012
trans hotspots (yeast) Brem et al Science 2002
Application to GWAS • Does regulatory variation alter phenotype? • Candidate genes, perturbations underlying organismal phenotypes
Rationale • How do disease/trait variants actually alter biology? • If they change regulation, then: • Change in gene expression/isoform use • Phenotypic consequence*
Compare patterns of association [Figure: GWAS peak compared with the eQTL for gene 1 and the eQTL for gene 2]
Pearson’s covariance for windows of 51 SNPs between -log(p) in 2 traits • Detect a peak when the effect is the same • No peak when there are independent hits near each other [Figure: CD GWAS p and eQTL p along the region]
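A rough sketch of the windowed comparison follows. It assumes the statistic is the Pearson correlation coefficient of -log10(p) in sliding windows of consecutive SNPs; the slide says "covariance" and does not give the log base or the step size, so those choices, and the function names, are assumptions made for illustration.

```python
import math
from typing import List, Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

def windowed_correlation(p_gwas: Sequence[float], p_eqtl: Sequence[float],
                         window: int = 51) -> List[float]:
    """Correlate -log10(p) of two traits in sliding windows of `window` SNPs."""
    if len(p_gwas) != len(p_eqtl):
        raise ValueError("both traits must be scored on the same SNPs")
    a = [-math.log10(p) for p in p_gwas]
    b = [-math.log10(p) for p in p_eqtl]
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]

# Toy p-values, invented for illustration; window shrunk to 7 SNPs.
gwas = [0.5, 0.1, 1e-4, 1e-8, 1e-4, 0.1, 0.5]
eqtl_same = [0.6, 0.2, 1e-3, 1e-6, 1e-3, 0.2, 0.6]     # peak at the same SNP
eqtl_other = [1e-6, 1e-3, 0.2, 0.6, 0.5, 0.3, 0.4]     # separate nearby hit
shared = windowed_correlation(gwas, eqtl_same, window=7)
independent = windowed_correlation(gwas, eqtl_other, window=7)
print(shared, independent)
```

In the toy data the shared signal gives a correlation close to 1, while the nearby but independent hit gives a negative value, which mirrors the two cases named on the slide.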
Crohn’s/eQTL analysis • CD meta-analysis (GWAS only) • CEU HapMap LCL eQTL data • Overlapping SNPs only (eQTL data has 610K SNPs, most in CD meta-analysis) • Test 133 associations (total 1054 tests) [Figure: GWAS peak compared with the eQTL for gene 1 and the eQTL for gene 2]
Crohn’s/eQTL analysis A peak implies that the same effect drives GWAS and eQTL
MS/eQTL analysis A peak implies that the same effect drives GWAS and eQTL
Open question • Does regulatory variation reveal co-regulation? • A.K.A. where are the trans-eQTLs?
Whole-genome eQTL analysis is an independent GWAS for expression of each gene [Figure: one association scan per gene: gene 1, gene 2, gene 3, gene 4, gene 5, ..., gene N]
Issues with trans mapping • Power • Genome-wide significance is 5e-8 • Multiple testing on ~20K genes • Sample sizes clearly inadequate • Data structure • Bias corrections deflate variance • Non-normal distributions • Sample sizes • Far too small
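To give a rough sense of the power problem, assume a plain Bonferroni correction across genes on top of the genome-wide threshold (the slide does not name a correction method, so this is an illustrative assumption). The per-test threshold for a trans scan over about 20K genes would then be:

```latex
\alpha_{\text{per test}} = \frac{5 \times 10^{-8}}{2 \times 10^{4}} = 2.5 \times 10^{-12}
```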
But… • Assume that trans-eQTLs affect many genes… • …and you can use cross-trait methods!
Cross-phenotype meta-analysis • $S_{\mathrm{CPMA}} \sim \frac{L(\mathrm{data} \mid \lambda \neq 1)}{L(\mathrm{data} \mid \lambda = 1)}$ • Cotsapas et al, PLoS Genetics
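One way to read the likelihood ratio above is as a test on the rate of an exponential distribution: under the null, p-values for a SNP across traits are uniform, so -ln(p) follows an exponential distribution with rate λ = 1. The sketch below assumes that reading, plugs in the maximum-likelihood rate for the alternative, and reports twice the log likelihood ratio. The function name `cpma_score` and the exact scaling are assumptions, not taken from the slides.

```python
import math
from typing import Sequence

def cpma_score(p_values: Sequence[float]) -> float:
    """Twice the log likelihood ratio of rate = MLE versus rate = 1
    for an exponential model of -ln(p) across traits (assumed form)."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    x = [-math.log(p) for p in p_values]
    n, total = len(x), sum(x)
    if total <= 0.0:
        raise ValueError("all p-values equal 1; the rate estimate is undefined")
    rate_hat = n / total                                  # MLE of the rate
    loglik_alt = n * math.log(rate_hat) - rate_hat * total
    loglik_null = -total                                  # n*ln(1) - 1*total
    return 2.0 * (loglik_alt - loglik_null)

# Toy p-values for one SNP across ten transcripts, invented for illustration.
print(cpma_score([1e-4, 3e-3, 0.02, 0.2, 0.5, 1e-5, 0.01, 0.04, 0.3, 0.6]))
```

Note that this statistic grows whenever the estimated rate departs from 1 in either direction, so p-values that are larger than expected also raise it; a one-sided version would need an extra check on the sign of the departure.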
CPMA for correlated traits • Empirical assessment to account for correlation • Simulate Z scores under covariance, recalculate CPMA • Construct distribution of CPMA for dataset, call significance • With Ben Voight, U Penn
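The empirical procedure can be sketched as follows. This is a simplified illustration under stated assumptions: traits share a single equicorrelation `rho` (a real analysis would simulate from the estimated covariance matrix of the transcripts), Z scores are converted to two-sided p-values, and `cpma_score` is the assumed exponential-rate likelihood ratio rather than a confirmed implementation. All names are hypothetical.

```python
import math
import random
from typing import List, Sequence

def cpma_score(p_values: Sequence[float]) -> float:
    """Assumed CPMA form: 2 * log LR of exponential rate = MLE versus rate = 1."""
    x = [-math.log(p) for p in p_values]
    n, total = len(x), sum(x)
    rate_hat = n / total
    return 2.0 * (n * math.log(rate_hat) - rate_hat * total + total)

def simulate_null_scores(n_traits: int, rho: float, n_sims: int,
                         seed: int = 0) -> List[float]:
    """CPMA scores for null Z scores with pairwise correlation `rho`."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_sims):
        shared = rng.gauss(0.0, 1.0)   # common factor inducing the correlation
        z = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
             for _ in range(n_traits)]
        # Two-sided normal p-value, floored to avoid log(0).
        p = [max(math.erfc(abs(v) / math.sqrt(2.0)), 1e-300) for v in z]
        scores.append(cpma_score(p))
    return scores

def empirical_p(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores at or above the observed score (add-one rule)."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

null = simulate_null_scores(n_traits=50, rho=0.3, n_sims=200, seed=1)
print(empirical_p(25.0, null))
```

The add-one rule keeps the empirical p-value strictly above zero, so its resolution is limited by the number of simulations.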
Experimental design • Data: 108 CEU individuals* and 109 YRI individuals*; 8368 transcripts detectable on Illumina arrays; 610,180 SNPs with MAF > 0.15 in CEU and YRI, LD pruned (r2 < 0.2) • Association (plink), run in each population: Transcript ~ SNP, sex, giving CEU p-values and YRI p-values • CPMA on each set of p-values, giving CEU CPMA scores and YRI CPMA scores • > 95th percentile of simulated CPMA * Stranger et al Nat Genet 2007 (LCL data; publicly available)
Target sets of genes • trans-acting variant: SNP with CPMA evidence • Target genes: genes affected by trans-acting variant (i.e. regulon)
Prediction 1 • Allelic effects should be conserved between two populations • Binomial test on paired observations for all genes with P < 0.05 in at least one population • True for 1124/1311 SNPs (binomial p < 0.05) [Figure: genes with pCEU < 0.05 and genes with pYRI < 0.05]
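For a single SNP, the paired test can be sketched as a sign test: count the genes whose allelic effect has the same direction in both populations and compare that count with the 50% expected by chance. The one-sided alternative, the function names, and the toy effect sizes are assumptions made for illustration.

```python
import math
from typing import Sequence, Tuple

def binomial_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))

def direction_concordance(effects_ceu: Sequence[float],
                          effects_yri: Sequence[float]) -> Tuple[int, int, float]:
    """Return (genes agreeing in sign, genes compared, one-sided binomial p)."""
    pairs = [(a, b) for a, b in zip(effects_ceu, effects_yri)
             if a != 0 and b != 0]          # drop genes with no direction
    agree = sum(1 for a, b in pairs if (a > 0) == (b > 0))
    return agree, len(pairs), binomial_upper_tail(agree, len(pairs))

# Toy effect sizes for genes with P < 0.05 in at least one population;
# the values are invented for illustration only.
ceu = [0.4, -0.2, 0.3, 0.5, -0.1, 0.2, 0.6, -0.3, 0.1, 0.2]
yri = [0.3, -0.1, 0.2, 0.4, -0.2, 0.1, 0.5, -0.2, -0.1, 0.3]
agree, n_genes, p_value = direction_concordance(ceu, yri)
print(agree, n_genes, p_value)
```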
Prediction 2 • Target genes should overlap • Identify by mixture-of-Gaussians classification • Empirical p from distribution of overlaps between N_CEU and N_YRI genes across SNPs • True for 600/1311 SNPs (empirical p < 0.05) [Figure: overlap of genes with pCEU < 0.05 and genes with pYRI < 0.05]
What about the target genes? • Regulons: • Encode proteins more connected than expected by chance www.broadinstitute.org/mpg/dapple.php Rossin et al 2011 PLoS Genetics
What about the target genes? • Regulons: • Encode proteins enriched for TF targets (ENCODE LCL data) • 24/67 filtered TFs significant • Binomial overlap test [Figure: overlap between trans target genes and ChIP-seq LCL target genes]
Summary • Regulatory variation is common • It affects gene expression levels • Likely many other types: • DNA accessibility, chromatin states • Transcript splicing, processing, turnover • Has phenotypic consequences • GWAS • Some cellular assays (not discussed here)
Open questions • Discover regulatory elements (cis) • Promoters, enhancers etc • Gene regulatory circuits (trans) • Dynamics of regulation • Splicing variation, processing, degradation • Phenotypic consequences • Cellular assays required • Tie in to organismal phenotype
Next-gen sequencing data • RNAseq, GTEx
GTEx – Genotype-Tissue EXpression • An NIH Common Fund project • Current: 35 tissues from 50 donors • Scale up: 20K tissues from 900 donors • Novel methods groups: 5 current + RFA
How can we make RNAseq useful? • Standard eQTLs • Montgomery et al, Pickrell et al Nature 2010 • Isoform eQTLs • Depth of sequence! • Long genes are preferentially sequenced • Abundant genes/isoforms ditto • Power!? • Mapping biases due to SNPs
RNAseq combined with other techs • Regulons: TF gene sets via ChIP-seq • Look for trans effects • Open chromatin states (DNase I; methylation) • Find active genes • Changes in epigenetic marks correlated to RNA • Genetic effects • RNA/DNA comparisons • Simultaneous SNP detection/genotyping • RNA editing ???