Accurate Estimation of Gene Expression Levels from Digital Gene Expression Sequencing Data

Accurate Estimation of Gene Expression Levels from Digital Gene Expression Sequencing Data. Marius Nicolae and Ion Măndoiu (University of Connecticut, USA). Outline: DGE/SAGE-Seq protocol, EM algorithm, Experimental results, Conclusions. RNA-Seq Protocol.


Presentation Transcript


  1. Accurate Estimation of Gene Expression Levels from Digital Gene Expression Sequencing Data • Marius Nicolae and Ion Măndoiu (University of Connecticut, USA)

  2. Outline • DGE/SAGE-Seq protocol • EM algorithm • Experimental results • Conclusions

  3. RNA-Seq Protocol • Make cDNA & shatter into fragments • Sequence fragment ends • Map reads • Gene Expression (GE) • Isoform Expression (IE) • Isoform Discovery (ID) [Figure: gene with exons A B C D E; isoforms A B C and A C D E]

  4. DGE Protocol • Cleave with anchoring enzyme (AE) • Attach primer for tagging enzyme (TE) • Cleave with tagging enzyme • Map tags • Gene Expression (GE) [Figure: poly-A (AAAAA) mRNA with CATG sites marked AE, TCCRAC marked TE; exons A B C D E]
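
The protocol steps above can be mimicked in silico to see which tags a transcript could produce. The following is only a minimal sketch under stated assumptions: the function name `candidate_tags`, the default `tag_len`, and the convention that a tag starts at (and includes) the anchoring enzyme site are illustrative choices, not details taken from the slides. Sites are numbered from the 3' end, so `j = 1` is the site closest to the poly-A tail.

```python
def candidate_tags(transcript, ae_site="CATG", tag_len=21):
    """Return (j, tag) pairs for every anchoring-enzyme site in `transcript`.

    Assumptions (illustrative, not from the slides): the transcript is given
    5'->3', a tag starts at the AE recognition site and includes it, and
    j = 1 denotes the site closest to the 3' end.
    """
    # Collect all (possibly overlapping) occurrences of the recognition site.
    starts = []
    pos = transcript.find(ae_site)
    while pos != -1:
        starts.append(pos)
        pos = transcript.find(ae_site, pos + 1)

    tags = []
    # Walk the sites from the 3' end towards the 5' end.
    for j, start in enumerate(reversed(starts), start=1):
        tag = transcript[start:start + tag_len]
        # Skip sites too close to the 3' end to yield a full-length tag.
        if len(tag) == tag_len:
            tags.append((j, tag))
    return tags
```

For example, `candidate_tags("AACATGTTTTCATGGGGGAA", tag_len=8)` reports the 3'-most site first, followed by the upstream site.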

  5. Our Approach Previous methods: • Discard ambiguous tags [Asmann et al. 09, Zaretzki et al. 10] • Heuristics to rescue some ambiguous tags [Wu et al. 10] New DGE-EM algorithm: • Uses all tags, including all ambiguous ones • Uses quality scores • Takes into account partial digest and gene isoforms

  6. Tag Formation Probability

  7. Tag-Isoform Compatibility
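
The formula on this slide did not survive extraction. Since slide 5 says the algorithm uses quality scores, one plausible way to turn them into a tag-to-site compatibility weight is sketched below. This is an assumption, not the authors' stated definition: each base contributes `1 - eps` when it matches the site sequence and `eps / 3` when it does not, where `eps` is the standard Phred error probability `10^(-Q/10)`. The function names are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def compatibility_weight(tag, site_seq, quals):
    """Weight for the hypothesis that `tag` was sequenced from `site_seq`.

    Assumed model (not confirmed by the slides): bases are independent; a
    matching base contributes (1 - eps), a mismatching base contributes
    eps / 3, i.e. the error is equally likely to be any of the other bases.
    """
    if not (len(tag) == len(site_seq) == len(quals)):
        raise ValueError("tag, site sequence and qualities must have equal length")
    weight = 1.0
    for observed, expected, q in zip(tag, site_seq, quals):
        eps = phred_to_error(q)
        weight *= (1 - eps) if observed == expected else eps / 3
    return weight
```

Under this model a tag with one low-quality mismatch still gets a small positive weight for the site, so it can be shared among candidate sites instead of being discarded.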

  8. DGE-EM Algorithm
       assign random values to all f(i)
       while not converged
         init all n(i,j) to 0
         for each tag t
           for (i,j,w) in t
             E-step: update n(i,j)
         for each isoform i
           M-step: update f(i)
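
The loop structure above can be sketched in runnable form. The E-step and M-step formulas were lost in extraction, so the updates below are assumptions rather than the authors' exact equations: a tag is taken to arise from the j-th site (counted from the 3' end) of isoform i with probability proportional to `f(i) * p * (1 - p)^(j - 1)`, where `p` is the cutting probability, and the M-step divides each isoform's expected tag count by the probability `1 - (1 - p)^k_i` that it yields any tag. The names `dge_em`, `num_sites`, `max_iters`, and `tol` are hypothetical.

```python
import random


def dge_em(tags, num_sites, p=0.5, max_iters=1000, tol=1e-10, seed=0):
    """EM sketch for isoform frequencies f(i) from DGE tags.

    tags:      list of tags; each tag is a list of (i, j, w) triples giving
               isoform i, site index j (1 = closest to 3' end), weight w.
    num_sites: dict mapping isoform i to its number of AE sites k_i >= 1.
    p:         assumed cutting probability, 0 < p <= 1.
    """
    rng = random.Random(seed)
    isoforms = sorted(num_sites)

    # Assign random values to all f(i), normalised to sum to 1.
    f = {i: rng.random() + 1e-6 for i in isoforms}
    total = sum(f.values())
    f = {i: v / total for i, v in f.items()}

    for _ in range(max_iters):
        n = {}  # n(i, j): expected number of tags from site j of isoform i

        # E-step: split each tag among its compatible (isoform, site) pairs.
        for tag in tags:
            shares = [(i, j, w * f[i] * p * (1 - p) ** (j - 1)) for i, j, w in tag]
            z = sum(s for _, _, s in shares)
            if z == 0:
                continue  # tag has no support under the current estimate
            for i, j, s in shares:
                n[(i, j)] = n.get((i, j), 0.0) + s / z

        # M-step: re-estimate f(i) from expected counts, correcting for the
        # chance that an isoform molecule produces no tag at all.
        per_isoform = {i: 0.0 for i in isoforms}
        for (i, _j), count in n.items():
            per_isoform[i] += count
        new_f = {i: per_isoform[i] / (1 - (1 - p) ** num_sites[i]) for i in isoforms}
        total = sum(new_f.values())
        new_f = {i: v / total for i, v in new_f.items()}

        converged = max(abs(new_f[i] - f[i]) for i in isoforms) < tol
        f = new_f
        if converged:
            break
    return f
```

As a small check of the idea: with complete digestion (`p=1.0`), three tags unique to isoform A, one unique to B, and two tags compatible with both, the ambiguous tags are shared in proportion to the current estimates, and the frequencies settle at 0.75 and 0.25.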

  9. MAQC Data (UHRR, HBRR) DGE: • 9 Illumina libraries, 238M 20bp tags [Asmann et al. 09] • Anchoring enzyme DpnII (GATC) RNA-Seq: • 6 libraries, 47-92M 35bp reads each [Bullard et al. 10] qPCR: • Quadruplicate measurements for 832 Ensembl genes [MAQC Consortium 06]

  10. Compared Algorithms DGE: • Uniq [Asmann et al. 09, Zaretzki et al. 10] • DGE-EM RNA-Seq: • IsoEM [Nicolae et al. 10] • Cufflinks [Trapnell et al. 10]

  11. DGE-EM vs. Uniq on HBRR Library 4

  12. DGE vs. RNA-Seq

  13. DGE vs. RNA-Seq

  14. DGE vs. RNA-Seq

  15. Synthetic Data • 1-30M tags, lengths 14-26bp • UCSC hg19 genome and known isoforms • Simulated expression levels: gene expression for 5 tissues from the GNFAtlas2; geometric expression for the isoforms of each gene • Anchoring enzymes from REBASE: DpnII (GATC) [Asmann et al. 09]; NlaIII (CATG) [Wu et al. 10]; CviJI (RGCY, R=G or A, Y=C or T)
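
Locating cut sites for a degenerate recognition sequence such as RGCY can be sketched with a regular expression. This is an illustrative helper, not part of the authors' software: the name `site_positions` and the small `IUPAC` table (covering only the codes that appear on the slide) are assumptions.

```python
import re

# Only the IUPAC codes needed for the enzymes on the slide:
# R = G or A, Y = C or T.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]"}


def site_positions(sequence, recognition_site):
    """Return 0-based start positions of all matches, including overlaps."""
    pattern = "".join(IUPAC[base] for base in recognition_site)
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence)]
```

Because RGCY matches four distinct 4-mers while GATC and CATG match one each, a degenerate site is expected to occur more often in a transcript.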

  16. MPE for 30M 21bp tags • RNA-Seq: 8.3 MPE
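
The slides do not expand the abbreviation; assuming MPE stands for median percent error (the median over genes of the relative error between estimated and true expression, in percent), it could be computed as in the sketch below. The function name and the choice to skip genes with zero true expression are illustrative assumptions.

```python
from statistics import median


def median_percent_error(estimated, true):
    """Median of |estimated - true| / true * 100 over paired values.

    Genes with a true value of zero are skipped, since their relative
    error is undefined (an assumption of this sketch).
    """
    errors = [abs(e - t) / t * 100 for e, t in zip(estimated, true) if t > 0]
    return median(errors)
```

Read this way, an MPE of 8.3 would mean that half of the genes are estimated to within 8.3% of their true expression level.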

  17. Conclusions • Introduced new DGE-EM algorithm • Improves accuracy over previous methods by using ambiguous tags and considering isoforms and partial digestion • Source code freely available at http://www.dna.engr.uconn.edu/software/DGE-EM • First direct comparison of RNA-Seq and DGE protocols • Best inference algorithms yield comparable cost-normalized accuracy on MAQC data • Simulations suggest possible DGE protocol improvements: enzymes with degenerate recognition sites (e.g. CviJI); optimizing cutting probability

  18. Questions? • Acknowledgements: Work supported in part by NSF awards IIS-0546457 and IIS-0916948

  19. Anchoring Enzyme Statistics

  20. RNA-Seq

  21. DGE enzyme GATC p=1.0

  22. DGE enzyme CATG p=1.0

  23. DGE enzyme RGCY p=1.0

  24. DGE enzyme GATC p=.5

  25. DGE enzyme CATG p=.5

  26. DGE enzyme RGCY p=.5
