Accounting for Non-Genetic Factors Improves the Power of eQTL Studies. John Winn with Oliver Stegle, Leopold Parts, Anitha Kannan and Richard Durbin. Acknowledgements: Barbara Stranger and Manolis Dermitzakis for access to their gene expression data.
Accounting for Non-Genetic Factors Improves the Power of eQTL Studies. John Winn with Oliver Stegle, Leopold Parts, Anitha Kannan and Richard Durbin. Acknowledgements: Barbara Stranger and Manolis Dermitzakis for access to their gene expression data. University of Manchester, 17th June 2008
The Challenge: Gene function Phase 1: Human Genome project
Genetic causes of common diseases • Finding genetic causes for cancer, diabetes, heart disease, obesity etc. is challenging because these diseases depend on variations in multiple genes, almost all with weak effects. • Association studies: investigate differences in variation across the entire genome between diseased and control persons. Gene expression can also be used as an intermediary.
SNPs …TAAGTGACTAGATGATTACATGAGACTACTATGA… …TAAATGACTAGATGACTACATGAGAGTACTATGA… …TAAGTGACTAGATGATTACATGAGAGTACTATGA… …TAAATGACTAGATGACTACATGAGACTACTATGA… [Figure labels: loci (position in genome); individuals; Gene A; Gene B; haplotypes] Data: HapMap SNPs • 270 individuals from 3 populations • 3.1 million Single Nucleotide Polymorphisms (SNPs) genome-wide
Data: HapMap microarray • Gene expression profile for all HapMap individuals • 270 individuals • 40,000 probes • Epstein-Barr virus–transformed lymphoblastoid cell lines (Stranger et al., Nature Genetics 2007) [Figure: example of a 40,000-probe spotted microarray]
Overview of approach [Diagram labels: SNP data; haplotype model; haplotype (coding, non-coding); environmental factors; haplotype/expression association; protein activity of gene; expression level of gene; microarray measurements; interaction model; expression/disease association; phenotype, e.g. disease: T1D, T2D, obesity, G2C, cancer, malaria]
Progress so far [Diagram labels: SNP data; haplotype model; block and population structure model; haplotype (coding, non-coding); environmental factors; genetic + non-genetic association model; protein activity of gene; expression level of gene; microarray measurements; interaction model; gene co-expression model; phenotype, e.g. disease: T1D, T2D, obesity, G2C, cancer, malaria]
Haplotype modelling [Diagram repeated from "Progress so far"]
Haplotype model: overview [Diagram labels: loci; recombination hotspot; individual haplotypes; haplotype block; ancestral haplotypes]
Full haplotype model with phasing [Graphical model. Node groups: recombination; pedigree transition; ancestral indices; phase; genotype; ancestral library. Variables shown, indexed over loci 1, …, k, k+1, …, N: s, m, x, R, τ, t, y; also π; plate j = 1:J]
Example results [Figure panels: probability of recombination; ancestral usage for each individual; population structure; haplotype blocks. Axis label: individuals]
Association modelling [Diagram repeated from "Progress so far"]
Expression Quantitative Trait Loci (eQTL) • Goal of eQTL: identification of relations between SNPs and the expression profile [Figure: SNP data, one sequence per individual (ACTAAGTGACTAGACTACATGAGACTACTATGAC, ACTAAATGACTAGACTACATGAGAGTACTAGGAC, ACTAAGTGACTAGACTACATGAGAGTACTATGAC, ACTAAATGACTAGGCTACATGAGACTACTACGAC, ACTAAGTGACTAGGCTACATGAGACTACTATGAC, ACTAAGTGACTAGGCTACATGAGAGTACTATGAC), linked by relations to the gene expression profile of each individual]
eQTL challenges. Finding associations between genetic variation and expression is challenging because: • There is a vast number of potential associations but relatively little data • Unmeasured non-genetic influences act on expression levels, e.g. environmental or developmental • SNP measurements exhibit linkage • Population structure induces false associations
Overview • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks
eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks
Standard eQTL: model for a single SNP. LOD scores • Log-odds of P(b_g), per gene y_g and SNP s • Permutation testing to assess significance of associations
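The slide's model figure is not reproduced in the transcript, so the following is only a minimal sketch of a single-SNP scan under stated assumptions: expression of one gene is modelled as a linear function of the SNP genotype plus Gaussian noise, the LOD score is taken as the classical log10 likelihood ratio against a no-effect model (which may differ from the talk's P(b_g)-based definition), and significance comes from permuting the expression values across individuals. The function names `lod_score` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def lod_score(snp, expr):
    """log10 likelihood ratio: linear SNP effect vs. no effect (Gaussian noise).

    ASSUMPTION: classical LOD, not necessarily the talk's exact definition.
    """
    snp = np.asarray(snp, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n = expr.size
    # Null model: expression is a constant mean plus noise.
    rss_null = np.sum((expr - expr.mean()) ** 2)
    # Alternative model: intercept plus a linear effect of the genotype.
    design = np.column_stack([np.ones(n), snp])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    rss_alt = np.sum((expr - design @ coef) ** 2)
    # With maximum-likelihood noise variances the ratio reduces to this form.
    return 0.5 * n * np.log10(rss_null / rss_alt)


def permutation_pvalue(snp, expr, n_perm=1000, seed=0):
    """Empirical p-value: how often a permuted scan matches the observed LOD."""
    rng = np.random.default_rng(seed)
    observed = lod_score(snp, expr)
    null = np.array(
        [lod_score(snp, rng.permutation(expr)) for _ in range(n_perm)]
    )
    # Add-one correction keeps the estimate strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Permuting expression rather than genotype keeps the SNP's allele frequencies fixed while breaking any link to the gene, which is the usual reason for choosing this null.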
FA-eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks
FA-eQTL model • The algorithm effectively performs eQTL on the residuals of the non-genetic factor model
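The "eQTL on the residuals" idea can be illustrated with a small simulation. This is a sketch under assumptions, not the talk's algorithm: the non-genetic factor model is approximated here by removing the top principal components of the expression matrix (a stand-in for the factor analysis model), and the LOD score is the classical likelihood-ratio form. The names `remove_hidden_factors`, `lod_score` and `demo`, and all simulation parameters, are hypothetical.

```python
import numpy as np


def lod_score(snp, expr):
    """Classical LOD: log10 likelihood ratio of a linear SNP effect vs. none."""
    n = expr.size
    rss_null = np.sum((expr - expr.mean()) ** 2)
    design = np.column_stack([np.ones(n), snp])
    coef, *_ = np.linalg.lstsq(design, expr, rcond=None)
    rss_alt = np.sum((expr - design @ coef) ** 2)
    return 0.5 * n * np.log10(rss_null / rss_alt)


def remove_hidden_factors(expr, n_factors):
    """Subtract the top principal components from an individuals x genes matrix.

    ASSUMPTION: PCA stands in for the talk's non-genetic factor model.
    """
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    low_rank = (u[:, :n_factors] * s[:n_factors]) @ vt[:n_factors]
    return centered - low_rank


def demo(seed=0, n_individuals=200, n_genes=50):
    """Simulate one broad hidden factor and one SNP acting on gene 0."""
    rng = np.random.default_rng(seed)
    snp = rng.integers(0, 3, size=n_individuals).astype(float)
    hidden = rng.normal(size=n_individuals)      # unmeasured non-genetic factor
    loadings = np.linspace(2.0, 4.0, n_genes)    # broad effect on every gene
    expr = np.outer(hidden, loadings) + rng.normal(size=(n_individuals, n_genes))
    expr[:, 0] += 0.8 * snp                      # true association with gene 0
    before = lod_score(snp, expr[:, 0])
    after = lod_score(snp, remove_hidden_factors(expr, 1)[:, 0])
    return before, after
```

Calling `demo()` returns the LOD score of the true association before and after removing the hidden factor; in this simulated setting the residual-based score should be the larger of the two, because the factor's variance no longer masks the SNP effect.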
Experimental results: top relations on chromosomes 2, 7, 11 and X
Experimental results • LOD scores of almost all associations increase • Three times as many significant associations as for standard eQTL (2% false discovery rate) • Most associations found are cis [Figure: number of associations at 2% false discovery rate]
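Counting associations at a fixed false discovery rate can be sketched as follows. The transcript does not say how the 2% FDR was computed, so this is an assumption: the expected number of false positives at a LOD threshold is estimated from scores obtained by re-running the scan on permuted data, and the FDR estimate is that expectation divided by the number of observed scores passing the threshold. The function name `count_significant` and its arguments are hypothetical.

```python
import numpy as np


def count_significant(observed, null, n_perm, fdr=0.02):
    """Largest number of associations whose estimated FDR stays within `fdr`.

    observed: LOD scores from the real scan.
    null:     pooled LOD scores from `n_perm` permuted scans of the same size.
    ASSUMPTION: permutation-based FDR estimate, chosen for illustration.
    """
    observed = np.sort(np.asarray(observed, dtype=float))[::-1]  # descending
    null = np.sort(np.asarray(null, dtype=float))                # ascending
    best = 0
    for rank, threshold in enumerate(observed, start=1):
        # Number of permuted scores at or above the threshold.
        n_null = null.size - np.searchsorted(null, threshold, side="left")
        expected_false = n_null / n_perm  # average per permuted scan
        if expected_false / rank <= fdr:
            best = rank
    return best
```

Applying the same function to the scores of two methods, with each method's own permutation null, gives the kind of count comparison the slide reports.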
Block FA-eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks
Block model sensitivity • Exploiting the block model also increases sensitivity • Blocking significantly reduces the number of tests [Figure: number of associations at 2% false discovery rate]
Conclusions • When associating human genetic variation with gene expression or disease, we must account for: haplotype block structure; population structure; non-genetic sources of variation • Future directions: new datasets with more individuals and more disease information; a joint model of expression, proteins, pathways and disease to improve understanding of disease mechanisms; parallelised genome-wide processing
Co-expression [Diagram repeated from "Progress so far"]
Modelling Co-expression [Graphical model labels: SNP at different loci; non-genetic causes; genetic factors; non-genetic factors; genetic weight vector; non-genetic weight vector; noise; gene expression levels. Symbols shown: s_j, x, w_g, θ, y_gj. Plates: individuals 1..J, genes 1..G]
Modelling Co-expression [Graphical model labels: SNP at different loci; non-genetic causes; genetic factors; non-genetic factors; genetic weight vector; non-genetic weight matrix; noise; gene expression levels. Symbols shown: s_j, x, w_g, θ, z_kj, y_gj. Plates: individuals 1..J, factors 1..K, genes 1..G]
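One reading of this diagram is that each expression level is a sum of a genetic term, a non-genetic factor term and noise. The sketch below simulates data from that reading; the mapping of symbols to roles (s_j as SNP genotypes, w_g as genetic weights, z_kj as non-genetic factors, θ as their weights), the Laplace distribution for the genetic weights and every parameter value are assumptions made for illustration, not details taken from the slides. The function name `simulate_expression` is hypothetical.

```python
import numpy as np


def simulate_expression(n_individuals=100, n_genes=20, n_snps=5,
                        n_factors=3, noise_sd=0.1, seed=0):
    """Draw y[g, j] = w[g] . s[:, j] + theta[g] . z[:, j] + noise (ASSUMED form)."""
    rng = np.random.default_rng(seed)
    # s_j: genotypes coded 0/1/2 at each SNP, one column per individual.
    snps = rng.integers(0, 3, size=(n_snps, n_individuals)).astype(float)
    # z_kj: unobserved non-genetic factors.
    factors = rng.normal(size=(n_factors, n_individuals))
    # w_g: genetic weights, heavy-tailed so most are near zero (ASSUMED Laplace).
    genetic_w = rng.laplace(scale=0.1, size=(n_genes, n_snps))
    # theta: dense non-genetic weights giving broad effects across genes.
    nongenetic_w = rng.normal(size=(n_genes, n_factors))
    noise = rng.normal(scale=noise_sd, size=(n_genes, n_individuals))
    expr = genetic_w @ snps + nongenetic_w @ factors + noise
    return expr, snps, factors, genetic_w, nongenetic_w
```

Data simulated this way gives a ground truth against which an association method can be checked, since the true genetic weights are returned alongside the expression matrix.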
Modelling Co-expression • Non-genetic factors: broad effects, non-sparse • Genetic factors: heavy-tailed prior on mixing matrix, sparse [Graphical model labels: π; A; C = 1..2; z_jk co-expression factors; θ FA weight matrix; y_jg gene expression levels. Plates: factors 1..K, individuals 1..J, genes 1..G]
Experimental Results on Yeast • Brem Yeast, 6000 genes, 3000 genetic markers • FA-eQTL 0.01% FPR
Experimental Results on Yeast II • Brem Yeast, 6000 genes, 3000 genetic markers • FA-eQTL + co-expression 0.01% FPR
Evaluation of non-genetic factor models • Evaluation based on predictive performance • Intuition: models that capture long-range correlations explain non-genetic effects From 4 training/test splits (75%)
Relation finding with haplotype blocks • Genetic variations due to mutation and recombination • Recombination hotspots • Haplotype block model: ancestral library; block boundaries [Figure: individuals by SNPs, colour coded by ancestral indices, with block boundaries marked]