Introduction to DESeq and edgeR packages

Introduction to DESeq and edgeR packages. Peter A.C. ’t Hoen. Poisson distribution.

Presentation Transcript


  1. Introduction to DESeq and edgeR packages Peter A.C. ’t Hoen

  2. Poisson distribution • discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate and independently of the time since the last event • λ = expected number of occurrences, k = number of occurrences
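
For reference, the standard Poisson probability mass function, with λ the expected number of occurrences and k the observed number, can be written as follows. This is the textbook formula, stated here as a reminder rather than transcribed from the slide.

```latex
P(K = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
\qquad\text{with}\qquad \mathbb{E}[K] = \operatorname{Var}(K) = \lambda
```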

  3. Count process • Poisson distribution: Y_t ~ Poisson(λ_t) with λ_t = p n_t • t: tag, λ: true expression, Y: observed expression, p: probability, n: total number of RNA molecules • Truncated Poisson distribution: zero can mean not expressed or not counted • Count variance ~ λ_t • Murray F Freeman and John W Tukey. Ann Math Statist, 21:607-611, (1950)
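
A minimal sketch of the count process, assuming the slide's rate is read as a sampling probability p times a molecule count n_t for one tag. The numbers and the function names are hypothetical. It checks numerically that the Poisson variance equals the rate, and shows how likely a zero count is even for an expressed tag.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k counts when the expected count is lam (> 0)."""
    # Work on the log scale so that large k does not overflow k!.
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_mean_and_variance(lam, k_max=200):
    """Mean and variance obtained by summing the pmf over 0..k_max."""
    probs = [poisson_pmf(k, lam) for k in range(k_max + 1)]
    mean = sum(k * pr for k, pr in enumerate(probs))
    var = sum((k - mean) ** 2 * pr for k, pr in enumerate(probs))
    return mean, var


# Hypothetical values: sampling probability p and molecule count n_t for one tag.
p = 1e-3
n_t = 12_000
lam_t = p * n_t  # expected count for tag t

mean, var = poisson_mean_and_variance(lam_t)
print(f"lambda_t={lam_t:.1f} mean={mean:.4f} variance={var:.4f}")
print(f"P(Y_t = 0) = {poisson_pmf(0, lam_t):.2e}")
```

Under a pure Poisson model the variance-to-mean ratio is 1 at every expression level, which is the baseline the later slides compare against.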

  4. Negative binomial distribution • discrete probability distribution of the number of successes in a sequence of Bernoulli trials before a specified (non-random) number r of failures occurs • also arises as a continuous mixture of Poisson distributions where the mixing distribution of the Poisson rate is a gamma distribution. That is, we can view the negative binomial as a Poisson(λ) distribution, where λ is itself a random variable, distributed according to Gamma(r, p/(1 − p)).
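
The mixture view can be checked numerically. The sketch below compares the negative binomial pmf (k successes before the r-th failure, success probability p) with a Poisson pmf averaged over a gamma distribution with shape r and scale p/(1 − p). The midpoint-rule integration, the parameter values, and the function names are illustrative choices, not part of the presentation.

```python
import math


def nb_pmf(k, r, p):
    """P(k successes before the r-th failure) with success probability p."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + k * math.log(p) + r * math.log(1.0 - p))


def poisson_pmf(k, lam):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def gamma_pdf(lam, shape, scale):
    log_pdf = ((shape - 1.0) * math.log(lam) - lam / scale
               - math.lgamma(shape) - shape * math.log(scale))
    return math.exp(log_pdf)


def mixture_pmf(k, r, p, upper=200.0, steps=20000):
    """Integrate Poisson(k; lam) * Gamma(lam; r, p/(1-p)) over lam (midpoint rule)."""
    scale = p / (1.0 - p)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h  # midpoint of the i-th interval, always > 0
        total += poisson_pmf(k, lam) * gamma_pdf(lam, r, scale)
    return total * h


r, p = 3.0, 0.6  # illustrative parameter values
for k in (0, 2, 5, 10):
    print(k, round(nb_pmf(k, r, p), 6), round(mixture_pmf(k, r, p), 6))
```

The two columns should agree to within the integration error, which is the sense in which the negative binomial is a "gamma-Poisson" distribution.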

  5. edgeR (1) • Robinson, Smyth (Biostatistics, 2008; Bioinformatics 2007) • Package available from Bioconductor with very informative vignette • Y_ij ~ NB(μ_ij, φ) • Var(Y_ij) = μ_ij (1 + μ_ij × φ) • Negative binomial (gamma-Poisson) with mean μ • φ is the overdispersion parameter (biological variation) • φ = 0 gives the Poisson distribution
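
The variance function on this slide, Var(Y) = μ(1 + μφ), is easy to tabulate. The sketch below uses made-up means and dispersions to show that the overdispersion term dominates at high expression and vanishes when φ = 0.

```python
def nb_variance(mu, phi):
    """Variance of a count with mean mu and dispersion phi: mu * (1 + mu * phi)."""
    return mu * (1.0 + mu * phi)


# Illustrative means and dispersions (not taken from the presentation).
for mu in (10.0, 100.0, 1000.0):
    for phi in (0.0, 0.01, 0.1):
        var = nb_variance(mu, phi)
        print(f"mu={mu:7.1f} phi={phi:4.2f} var={var:10.1f} var/mu={var / mu:6.1f}")
```

With φ = 0 the ratio var/mu stays at 1 (Poisson); with φ > 0 it grows linearly in μ.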

  6. Overdispersion in our data

  7. edgeR (2) • Test per gene • Y_gij ~ NB(μ_gij, φ_g) where μ_gij = M_j × p_gi • Var(Y_gij) = μ_gij (1 + μ_gij × φ_g) • p_gi is the proportion of tags for tag g in sample i • M_j is the library size for sample i and library j • φ_g is the dispersion parameter for tag g
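
As a small illustration of how the library size enters the per-tag model, the sketch below multiplies a tag proportion by each library size to get the expected count, then applies the negative binomial variance with a tag-specific dispersion. All values and names are hypothetical.

```python
def expected_count(library_size, proportion):
    """Expected count for one tag: library size times the tag's proportion."""
    return library_size * proportion


def nb_variance(mu, phi):
    """Negative binomial variance mu * (1 + mu * phi)."""
    return mu * (1.0 + mu * phi)


# Hypothetical library sizes M_j and a hypothetical proportion and dispersion for one tag.
library_sizes = {"lib1": 2_000_000, "lib2": 3_500_000}
tag_proportion = 5e-6
phi_g = 0.05

for name, size in library_sizes.items():
    mu = expected_count(size, tag_proportion)
    sd = nb_variance(mu, phi_g) ** 0.5
    print(f"{name}: expected count {mu:.1f}, standard deviation {sd:.2f}")
```

The same tag proportion yields different expected counts in libraries of different depth, which is why the test is framed in terms of proportions rather than raw counts.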

  8. edgeR (3) • Estimation of the common dispersion parameter by conditioning on the sum of counts and maximizing the common likelihood l_C(φ) = Σ_g l_g(φ) • Common dispersion parameter OR weighted linear combination of common and individual likelihoods: WL(φ_g) = l_g(φ_g) + α l_C(φ_g), where α is the weight given to the common likelihood
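
The idea of a common likelihood and a weighted combination can be sketched as follows. This is a deliberate simplification and not edgeR's conditional (qCML) estimator: it assumes replicate libraries of equal size, plugs in each tag's sample mean, uses the unconditional negative binomial log-likelihood, and maximizes over a grid. The weight is called `alpha` here, and the count table is made up.

```python
import math


def nb_loglik(counts, mu, phi):
    """Log-likelihood of counts under NB with mean mu and dispersion phi (both > 0)."""
    r = 1.0 / phi
    total = 0.0
    for y in counts:
        total += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                  + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return total


def tag_loglik(counts, phi):
    """Per-tag log-likelihood l_g(phi), with the mean fixed at the sample mean."""
    mu = sum(counts) / len(counts)
    return nb_loglik(counts, mu, phi)


def common_loglik(table, phi):
    """Common log-likelihood l_C(phi): sum of the per-tag log-likelihoods."""
    return sum(tag_loglik(counts, phi) for counts in table)


def weighted_loglik(counts, table, phi, alpha):
    """Weighted log-likelihood: l_g(phi) + alpha * l_C(phi)."""
    return tag_loglik(counts, phi) + alpha * common_loglik(table, phi)


def argmax_on_grid(func, grid):
    return max(grid, key=func)


# Hypothetical counts: rows are tags, columns are replicate libraries of equal size.
table = [[12, 30, 19, 8], [150, 95, 210, 120], [40, 44, 61, 23], [5, 14, 2, 9]]
grid = [0.01 * i for i in range(1, 201)]  # candidate dispersions 0.01 .. 2.00

phi_common = argmax_on_grid(lambda phi: common_loglik(table, phi), grid)
print(f"common dispersion: {phi_common:.2f}")

for alpha in (0.0, 1.0, 10.0):
    estimates = [argmax_on_grid(lambda phi: weighted_loglik(c, table, phi, alpha), grid)
                 for c in table]
    print(f"alpha={alpha:4.1f} tagwise dispersions:",
          [round(e, 2) for e in estimates])
```

With `alpha = 0` each tag gets its own estimate; as `alpha` grows, the tagwise estimates are pulled toward the common dispersion.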

  9. edgeR (4) • Exact test, replacing hypergeometric probabilities with NB-derived probabilities (qCML), for single-factor experiments • Generalized linear models with the Cox-Reid profile-adjusted likelihood (CR) method for multifactorial experiments
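
A minimal sketch of an NB-based exact test for one tag in a two-group comparison, in the spirit of Fisher's exact test. It assumes equal library sizes and a known dispersion φ, so that each group sum is again negative binomial and the split of the total count between groups does not depend on the mean. The function names and example counts are hypothetical; this is not edgeR's implementation.

```python
import math


def log_nb_weight(z, size):
    """log of Gamma(z + size) / (Gamma(size) * z!), the NB coefficient for a sum z."""
    return math.lgamma(z + size) - math.lgamma(size) - math.lgamma(z + 1)


def nb_exact_test(counts_a, counts_b, phi):
    """Two-sided p-value, conditioning on the total count (equal library sizes)."""
    r = 1.0 / phi
    # The sum of n NB counts with size r and a shared mean has size n * r.
    size_a, size_b = len(counts_a) * r, len(counts_b) * r
    z_a, z_b = sum(counts_a), sum(counts_b)
    total = z_a + z_b
    # Unnormalized log-probability of every possible split (a, total - a).
    log_w = [log_nb_weight(a, size_a) + log_nb_weight(total - a, size_b)
             for a in range(total + 1)]
    shift = max(log_w)
    weights = [math.exp(v - shift) for v in log_w]
    observed = weights[z_a]
    # Add up all splits that are no more likely than the observed one.
    tail = sum(w for w in weights if w <= observed * (1.0 + 1e-9))
    return min(1.0, tail / sum(weights))


# Hypothetical counts for one tag, two replicates per condition.
print(nb_exact_test([5, 5], [5, 5], phi=0.1))
print(nb_exact_test([0, 0], [10, 10], phi=0.1))
print(nb_exact_test([0, 0], [10, 10], phi=1.0))
```

A larger dispersion makes the same difference in counts less surprising, so the p-value for the third call is larger than for the second.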

  10. edgeR: what is new? • Exact test not able to work with confounders → replaced by generalized linear model with log-likelihood ratio test • Abundance trending in dispersion estimates

  11. Dispersion trend • [Figure: dispersion versus abundance]

  12. Dispersion trending (after filtering for low abundance) • [Figure: dispersion versus abundance]

  13. DESeq (1) • Anders and Huber: Genome Biology (2010) 11:R106 • Roughly the same principles as edgeR • No multifactorial analysis implemented yet

  14. DESeq (2) • (1) Y_ij ~ NB(μ_ij, σ²_ij) • (2) μ_ij = s_j q_i,ρ(j), where s_j is the scaling factor for sample j and q_i,ρ(j) is the proportional concentration of tag i in condition ρ • (3) σ²_ij = μ_ij + s_j² ν_i,ρ(j), where the first term is the count noise and the second term is the extra variance; ν_i,ρ(j) is a smooth function depending on q_i,ρ(j) (concentration)
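
The DESeq mean and variance model can be sketched in a few lines. The smooth function ν is estimated from the data in DESeq; here it is replaced by a hypothetical placeholder ν(q) = c·q², chosen only so that the example runs. The size factors and the concentration are made up.

```python
def raw_variance(q, c=0.05):
    """Hypothetical stand-in for the smooth function nu(q); DESeq fits this from data."""
    return c * q * q


def deseq_mean(size_factor, q):
    """Expected count: scaling factor of the sample times the tag concentration."""
    return size_factor * q


def deseq_variance(size_factor, q):
    """Count noise (the mean) plus extra variance size_factor**2 * nu(q)."""
    count_noise = deseq_mean(size_factor, q)
    extra_variance = size_factor ** 2 * raw_variance(q)
    return count_noise + extra_variance


q = 50.0  # hypothetical concentration of one tag in one condition
for s_j in (0.5, 1.0, 2.0):
    mu = deseq_mean(s_j, q)
    var = deseq_variance(s_j, q)
    print(f"s_j={s_j:3.1f} mean={mu:6.1f} variance={var:7.1f}")
```

With this particular placeholder the variance reduces to μ + c·μ², that is, a negative binomial with constant dispersion c; a different smooth function lets the dispersion change with concentration.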

  15. DESeq (3): variance trend with expression • Purple: Poisson • Dashed orange: edgeR (before trending) • Orange: DESeq • You can derive: squared CV is 1/μ + φ
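
The squared coefficient of variation follows directly from the negative binomial variance with mean μ and dispersion φ; a short derivation, assuming that variance function:

```latex
\operatorname{Var}(Y) = \mu\,(1 + \mu\varphi) = \mu + \varphi\mu^{2}
\quad\Longrightarrow\quad
\mathrm{CV}^{2} = \frac{\operatorname{Var}(Y)}{\mu^{2}} = \frac{1}{\mu} + \varphi
```

The 1/μ term is the Poisson (count) noise, which fades at high expression, while φ remains as a floor set by biological variation.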

  16. DESeq (3) • Differences with edgeR: • Complete shrinkage to trended dispersion; limited tagwise dispersion estimates • Different variance estimates for different sample groups allowed • Deals better with samples with large differences in read depth?

  17. DESeq (4): statistical testing • Analogous to the initial edgeR implementation: exact test on the NB probabilities in the two conditions

  18. Conclusions • edgeR and DESeq are comparable implementations of statistical tests using the NB distribution • edgeR and DESeq produce largely similar results • Implementation of generalized linear models in edgeR allows for testing with confounders • Results are comparable to limma for genes with medium to high expression: modeling of stochastic effects is particularly important for lowly expressed genes

  19. Comparison to limma (on sqrt scaled data)
