University of Manchester, 17th June 2008

Accounting for Non-Genetic Factors Improves the Power of eQTL Studies. John Winn with Oliver Stegle, Leopold Parts, Anitha Kannan and Richard Durbin. Acknowledgements: Barbara Stranger and Manolis Dermitzakis for access to their gene expression data.

Presentation Transcript


  1. Accounting for Non-Genetic Factors Improves the Power of eQTL Studies

  2. The Challenge: Gene function. Phase 1: Human Genome Project

  3. Genetic causes of common diseases • Finding genetic causes for cancer, diabetes, heart disease, obesity etc. is challenging because they depend on variations in multiple genes, almost all with weak effects. • Association studies: investigate differences in variation across the entire genome between diseased and control persons. Gene expression can also be used as an intermediary.

  4. SNPs. Figure: aligned haplotypes …TAAGTGACTAGATGATTACATGAGACTACTATGA…, …TAAATGACTAGATGACTACATGAGAGTACTATGA…, …TAAGTGACTAGATGATTACATGAGAGTACTATGA…, …TAAATGACTAGATGACTACATGAGACTACTATGA… (labels: loci (position in genome), individuals, haplotypes, Gene A, Gene B). Data: HapMap SNPs • 270 individuals from 3 populations • 3.1 million single nucleotide polymorphisms (SNPs) genome-wide
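A SNP is simply a position at which aligned haplotypes disagree. As a minimal sketch (plain Python; the function name `polymorphic_sites` is hypothetical), the four sequences on this slide can be scanned column by column:

```python
def polymorphic_sites(haplotypes):
    """Return 0-based positions where aligned haplotypes do not all carry the same base."""
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("haplotypes must be aligned (equal length)")
    # A column is polymorphic if it contains more than one distinct base.
    return [k for k in range(length) if len({h[k] for h in haplotypes}) > 1]


haplotypes = [
    "TAAGTGACTAGATGATTACATGAGACTACTATGA",
    "TAAATGACTAGATGACTACATGAGAGTACTATGA",
    "TAAGTGACTAGATGATTACATGAGAGTACTATGA",
    "TAAATGACTAGATGACTACATGAGACTACTATGA",
]
sites = polymorphic_sites(haplotypes)
print(sites)  # [3, 15, 25]
```

Real genotyping must also cope with measurement error and unphased diploid calls; this only illustrates what the slide means by a SNP.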

  5. Data: HapMap microarray • Gene expression profile for all HapMap individuals • 270 individuals • 40,000 probes • Epstein-Barr virus-transformed lymphoblastoid cell lines (Stranger et al., Nature Genetics 2007). Figure: example of a 40,000-probe spotted microarray

  6. Overview of approach. Diagram components: SNP data; haplotype (haplotype model); environmental factors; haplotype/expression association; non-coding; coding; expression level of gene; protein activity of gene; microarray measurements; interaction model; expression/disease association; phenotype, e.g. disease (T1D, T2D, obesity, G2C, cancer, malaria).

  7. Progress so far. Diagram components: haplotype model; block and population structure model; SNP data; haplotype; environmental factors; genetic + non-genetic association model; non-coding; coding; expression level of gene; protein activity of gene; microarray measurements; interaction model; gene co-expression model; phenotype, e.g. disease (T1D, T2D, obesity, G2C, cancer, malaria).

  8. Haplotype modelling. (Overview diagram as in slide 7.)

  9. Haplotype model: overview. Figure labels: loci, recombination hotspot, individual haplotypes, haplotype block, ancestral haplotypes.
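The picture on this slide, individual haplotypes pieced together from a few ancestral haplotypes with switches concentrated at recombination hotspots, can be illustrated with a toy generative sketch. It rests on several assumptions (the names `sample_mosaic`, `ancestral` and `recomb_prob` are hypothetical; ancestors are drawn uniformly; there is no mutation, phasing or pedigree) and is not the model used in the talk:

```python
import random


def sample_mosaic(ancestral, recomb_prob, rng):
    """Sample one haplotype as a mosaic of ancestral haplotypes.

    ancestral   : list of J equal-length allele sequences (a hypothetical ancestral library)
    recomb_prob : recomb_prob[k] = probability of recombination between loci k-1 and k
                  (recomb_prob[0] is ignored); values near 1 mark hotspots
    Returns (haplotype, ancestral_indices).
    """
    n_loci = len(ancestral[0])
    idx = rng.randrange(len(ancestral))
    indices = [idx]
    for k in range(1, n_loci):
        if rng.random() < recomb_prob[k]:
            # Recombination: copy from a newly drawn (possibly the same) ancestor.
            idx = rng.randrange(len(ancestral))
        indices.append(idx)
    haplotype = [ancestral[j][k] for k, j in enumerate(indices)]
    return haplotype, indices


ancestral = ["AAGTCC", "GACTTA", "AGCTCA"]  # hypothetical library, J = 3
recomb = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]     # one hotspot before locus 3: two blocks
hap, idx = sample_mosaic(ancestral, recomb, random.Random(42))
print("".join(hap), idx)
```

With a single hotspot the ancestral index is constant within each block, which is the haplotype-block structure the model is designed to exploit.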

  10. On real data

  11. Full haplotype model with phasing. Graphical model (figure) with node labels s_1 … s_N, m_1 … m_N, x_1 … x_N, t_1 … t_N, τ_1 … τ_N, R_1 … R_N, y_1 … y_N and π, and a plate j = 1:J. Component labels: recombination, pedigree transition, ancestral indices, phase, genotype, ancestral library.

  12. Example results. Figure labels: probability of recombination; ancestral usage for each individual; individuals; population structure; haplotype blocks.

  13. Association modelling. (Overview diagram as in slide 7.)

  14. Expression Quantitative Trait Loci (eQTL). Figure: SNP data for six individuals (aligned sequences such as ACTAAGTGACTAGACTACATGAGACTACTATGAC) linked by relations to each individual's gene expression profile. • Goal of eQTL: identify relations between SNPs and the expression profile

  15. eQTL challenges. Associating genetic variation with expression is challenging because: • There is a vast number of potential associations and relatively little data • Unmeasured non-genetic influences (e.g. environmental, developmental) affect expression levels • SNP measurements exhibit linkage • Population structure induces false associations

  16. Overview • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks

  17. eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks

  18. Standard eQTL • Model for a single SNP • LOD scores: log-odds of P(b_g), per gene y_g and SNP s • Permutation testing to assess the significance of associations
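The slide does not give the single-SNP model explicitly. A common setup, assumed here, regresses expression on genotype dosage (0/1/2) and scores the fit as LOD = (n/2) log10(RSS0 / RSS1), comparing the model with a SNP effect (RSS1) to one without (RSS0); significance then comes from permuting expression values across individuals. A plain-Python sketch with hypothetical helper names and toy data:

```python
import math
import random


def lod_score(genotypes, expression):
    """LOD of y = mu + b*s + Gaussian noise versus y = mu + noise (maximum likelihood fits)."""
    n = len(expression)
    ms = sum(genotypes) / n
    my = sum(expression) / n
    sxx = sum((s - ms) ** 2 for s in genotypes)
    syy = sum((y - my) ** 2 for y in expression)
    sxy = sum((s - ms) * (y - my) for s, y in zip(genotypes, expression))
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or constant expression: nothing to test
    rss1 = max(syy - sxy * sxy / sxx, 1e-12 * syy)  # guard against a perfect fit
    return 0.5 * n * math.log10(syy / rss1)


def permutation_p_value(genotypes, expression, n_perm=999, seed=0):
    """Observed LOD and its permutation p-value (shuffling expression across individuals)."""
    rng = random.Random(seed)
    observed = lod_score(genotypes, expression)
    shuffled = list(expression)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if lod_score(genotypes, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


genotypes = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2, 1, 0]
expression = [0.1, -0.2, 1.1, 0.9, 2.2, 1.8, 0.0, 1.05, 2.1, 1.9, 0.95, -0.1]
observed_lod, p_value = permutation_p_value(genotypes, expression)
print(f"LOD = {observed_lod:.2f}, permutation p = {p_value:.4f}")
```

Genome-wide studies usually permute once per gene and record the maximum LOD over all SNPs, so that the threshold accounts for the many tests; the single-SNP version above only shows the mechanics.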

  19. FA-eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks

  20. FA-eQTL model

  21. FA-eQTL model • The algorithm effectively performs eQTL on the residuals of the non-genetic factor model
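Why residuals help: hidden non-genetic factors that drive a gene's expression act as noise in a single-SNP test, so removing them first recovers power. The sketch below (NumPy, simulated data) substitutes plain PCA for the talk's non-genetic factor model, which is an assumption, and scores each test with a regression LOD, (n/2) log10(RSS0 / RSS1); all names and parameter values are hypothetical:

```python
import numpy as np


def lod(s, y):
    """Single-SNP LOD: linear model y = mu + b*s + noise versus y = mu + noise."""
    s = s - s.mean()
    y = y - y.mean()
    sxx, syy, sxy = s @ s, y @ y, s @ y
    rss1 = syy - sxy ** 2 / sxx
    return 0.5 * len(y) * np.log10(syy / rss1)


def remove_top_factors(Y, k):
    """Return centred Y (individuals x genes) minus its rank-k PCA reconstruction.

    PCA stands in here for the non-genetic factor model (an assumption)."""
    Yc = Y - Y.mean(axis=0)
    U, S, Vt = np.linalg.svd(Yc, full_matrices=False)
    return Yc - (U[:, :k] * S[:k]) @ Vt[:k]


# Hypothetical simulation: J individuals, G genes, K hidden non-genetic factors.
rng = np.random.default_rng(0)
J, G, K = 100, 200, 3
X = 2.0 * rng.standard_normal((J, K))           # hidden non-genetic factors
W = rng.standard_normal((G, K))                 # factor loadings per gene
W[0] = 2.0                                      # gene 0 is strongly driven by the factors
s = rng.binomial(2, 0.5, size=J).astype(float)  # one SNP, genotypes coded 0/1/2
Y = X @ W.T + 0.5 * rng.standard_normal((J, G))
Y[:, 0] += s                                    # gene 0 also carries a genetic effect

lod_raw = lod(s, Y[:, 0])
lod_resid = lod(s, remove_top_factors(Y, K)[:, 0])
print(f"LOD on raw expression: {lod_raw:.1f}, on residuals: {lod_resid:.1f}")
```

PCA is only a rough stand-in: it has no explicit noise model, and it could absorb part of a genuine genetic signal if that signal were shared by many genes.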

  22. Experimental results: top relations in chromosomes 2, 7, 11 and X

  23. Experimental results • The LOD scores of almost all associations increase • Three times as many significant associations as standard eQTL (2% false discovery rate) • Most associations found are cis. Figure: number of associations at 2% false discovery rate
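The slides report counts at a 2% false discovery rate but do not say how the FDR was controlled (permutation-based estimates are also common in eQTL work). Purely as an illustration, the Benjamini-Hochberg step-up procedure turns a list of p-values into the set of associations declared significant at level q; the function name and p-values below are hypothetical:

```python
def benjamini_hochberg(p_values, q=0.02):
    """Return indices (in input order) of hypotheses rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    # Find the largest rank whose p-value lies under the line rank * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k = rank
    # Reject every hypothesis up to that rank (step-up rule).
    return sorted(order[:k])


p_values = [0.039, 0.001, 0.205, 0.008, 0.041, 0.06, 0.042, 0.074]
print(benjamini_hochberg(p_values, q=0.02))  # [1]
print(benjamini_hochberg(p_values, q=0.05))  # [1, 3]
```

Because stronger LOD scores translate into smaller p-values, raising almost all LOD scores (as FA-eQTL does) lets more associations pass the same FDR threshold.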

  24. Validation of results

  25. Block FA-eQTL • Standard eQTL • FA-eQTL: accounting for non-genetic factors • FA-eQTL on haplotype blocks

  26. Block FA-eQTL

  27. Block model sensitivity • Exploiting the block model also increases sensitivity • Blocking substantially reduces the number of tests. Figure: number of associations at 2% false discovery rate
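Testing per block rather than per SNP means one test per (gene, block) pair instead of one per (gene, SNP) pair. A hedged sketch of a block-level test, assuming each individual is summarised by a single ancestral index within the block (a simplification, since individuals carry two haplotypes) and scoring a one-way Gaussian ANOVA against a common mean as a LOD; `block_lod` is a hypothetical name:

```python
import math
from collections import defaultdict


def block_lod(ancestral_index, expression):
    """LOD for expression explained by per-individual ancestral index within one block
    (one mean per ancestral index) versus a single common mean, Gaussian noise."""
    n = len(expression)
    grand = sum(expression) / n
    rss0 = sum((y - grand) ** 2 for y in expression)
    if rss0 == 0:
        return 0.0  # constant expression: nothing to explain
    groups = defaultdict(list)
    for a, y in zip(ancestral_index, expression):
        groups[a].append(y)
    rss1 = 0.0
    for ys in groups.values():
        m = sum(ys) / len(ys)
        rss1 += sum((y - m) ** 2 for y in ys)
    rss1 = max(rss1, 1e-12 * rss0)  # guard against a perfect fit
    return 0.5 * n * math.log10(rss0 / rss1)


# Toy block: three ancestral haplotypes with clearly different expression levels.
print(round(block_lod([0, 0, 1, 1, 2, 2], [1.0, 1.2, 3.0, 3.1, 5.0, 4.9]), 2))
```

A block may span many SNPs, so the multiple-testing burden shrinks by roughly the average number of SNPs per block.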

  28. Conclusions • When associating human genetic variation with gene expression or disease, one must account for: • Haplotype block structure • Population structure • Non-genetic sources of variation • Future directions • New datasets with more individuals and more disease information • Joint model of expression, proteins, pathways and disease to improve understanding of disease mechanisms • Parallelised genome-wide processing

  29. Co-expression modelling

  30. Co-expression. (Overview diagram as in slide 7.)

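  31. Modelling Co-expression. Graphical model: $y_{gj} = \mathbf{w}_g^\top \mathbf{s}_j + \boldsymbol{\theta}_g^\top \mathbf{x}_j + \epsilon_{gj}$, where $\mathbf{s}_j$ are the SNPs at different loci (genetic factors), $\mathbf{w}_g$ the genetic weight vector, $\mathbf{x}_j$ the non-genetic causes (non-genetic factors), $\boldsymbol{\theta}_g$ the non-genetic weight vector, $\epsilon_{gj}$ noise and $y_{gj}$ the gene expression levels; plates: individuals 1..J, genes 1..G.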

  32. Modelling Co-expression. Graphical model labels: SNPs at different loci s_j (genetic factors), genetic weight vector w_g, non-genetic causes x (non-genetic factors), non-genetic weight matrix θ, z_kj, noise, gene expression levels y_gj; plates: individuals 1..J, factors 1..K, genes 1..G.

  33. Modelling Co-expression • Non-genetic factors: broad effects, non-sparse • Genetic factors: heavy-tailed prior on the mixing matrix, sparse. Graphical model labels: π, A, C = 1..2, co-expression factors z_jk, FA weight matrix θ, gene expression levels y_jg; plates: individuals 1..J, factors 1..K, genes 1..G
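The slide does not spell out the heavy-tailed prior. The labels π and C = 1..2 are consistent with, but do not confirm, a two-component mixture. Assuming a two-component Gaussian scale mixture (mixing weight `pi`, one narrow and one wide component; all names and values hypothetical), the sketch shows why such a prior favours sparse weights: compared with a Gaussian of equal variance it puts far more mass near zero and in the tails, i.e. it has much higher kurtosis.

```python
import random


def sample_weights(n, rng, pi=0.9, narrow_sd=0.1, wide_sd=2.0):
    """Two-component Gaussian scale mixture: with probability pi a weight is near zero,
    otherwise it is drawn from a broad component."""
    return [rng.gauss(0.0, narrow_sd if rng.random() < pi else wide_sd) for _ in range(n)]


def kurtosis(xs):
    """Sample kurtosis m4 / m2^2 (equals 3 for a Gaussian)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / m2 ** 2


def frac_near_zero(xs, eps=0.1):
    """Fraction of weights with |w| < eps."""
    return sum(abs(x) < eps for x in xs) / len(xs)


rng = random.Random(0)
mix = sample_weights(100_000, rng)
sd = (0.9 * 0.1 ** 2 + 0.1 * 2.0 ** 2) ** 0.5  # Gaussian with the mixture's variance
gauss = [rng.gauss(0.0, sd) for _ in range(100_000)]
print(f"kurtosis: mixture {kurtosis(mix):.1f}, Gaussian {kurtosis(gauss):.2f}")
print(f"|w| < 0.1: mixture {frac_near_zero(mix):.2f}, Gaussian {frac_near_zero(gauss):.2f}")
```

Under such a prior most genetic weights are shrunk towards zero while a few can be large, which matches the slide's contrast between sparse genetic factors and broad, non-sparse non-genetic factors.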

  34. Experimental Results on Yeast • Brem yeast data: 6,000 genes, 3,000 genetic markers • FA-eQTL, 0.01% FPR

  35. Experimental Results on Yeast II • Brem yeast data: 6,000 genes, 3,000 genetic markers • FA-eQTL + co-expression, 0.01% FPR

  36. Backup Slides

  37. Evaluation of non-genetic factor models • Evaluation based on predictive performance • Intuition: models that capture long-range correlations explain non-genetic effects • Figure: results from 4 training/test splits (75%)
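The slide evaluates non-genetic factor models by predictive performance over 4 training/test splits but does not name the metric. As an assumed proxy, the sketch below learns k PCA directions on 75% of individuals and measures how much of the held-out individuals' expression those directions fail to reconstruct; models that capture the broad, long-range correlations leave less unexplained. The function name, data and parameters are hypothetical:

```python
import numpy as np


def heldout_error(Y, k, n_splits=4, train_frac=0.75, seed=0):
    """Mean squared held-out residual after projecting test individuals
    onto k factor directions learned on the training individuals."""
    rng = np.random.default_rng(seed)
    J = Y.shape[0]
    n_train = int(round(train_frac * J))
    errors = []
    for _ in range(n_splits):
        perm = rng.permutation(J)
        train, test = Y[perm[:n_train]], Y[perm[n_train:]]
        mu = train.mean(axis=0)
        _, _, Vt = np.linalg.svd(train - mu, full_matrices=False)
        V = Vt[:k].T                 # genes x k factor directions
        Tc = test - mu
        resid = Tc - Tc @ V @ V.T    # expression the k factors fail to reconstruct
        errors.append(float(np.mean(resid ** 2)))
    return float(np.mean(errors))


# Hypothetical data: 3 broad hidden factors shared across all genes, plus noise.
rng = np.random.default_rng(1)
J, G = 100, 200
Y = rng.standard_normal((J, 3)) @ rng.standard_normal((3, G)) + 0.5 * rng.standard_normal((J, G))
for k in (0, 1, 3, 10):
    print(k, round(heldout_error(Y, k), 3))
```

Reconstruction error keeps falling slightly as k grows past the true number of factors, so a held-out likelihood under a probabilistic factor model would be a better criterion for choosing k; this proxy only shows the train/test mechanics.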

  38. Relation finding with haplotype blocks. Figure: individuals × SNPs, colour-coded by ancestral index, with block boundaries marked. • Genetic variation arises through mutation and recombination • Recombination hotspots • Haplotype block model: ancestral library, block boundaries
