MCReg is a Monte Carlo regression-based algorithm for accurate isoform frequency estimation from RNA-Seq data, supporting transcript-level gene expression analysis. Experimental results show higher frequency estimation accuracy than IsoEM (r² = 0.97 versus 0.92).
Monte-Carlo Regression Algorithm for Isoform Frequency Estimation from RNA-Seq Data • Alex Zelikovsky, Department of Computer Science, Georgia State University • Joint work with Adrian Caciula (GSU), Serghei Mangul (UCLA), James Lindsay, Ion Mandoiu (UCONN) • IEEE ICCABS 2013, New Orleans, LA
Outline • RNA-Seq: Introduction • MCReg: Monte Carlo Regression based Algorithm • Experimental Results • Conclusions and Future Work
Genome-Guided RNA-Seq Protocol • RNA-Seq enables transcript-level resolution of gene expression • From RNA, through the process of hybridization, make cDNA and shatter it into fragments • Sequence fragment ends • Map reads to genome • Downstream analyses: Gene Expression (GE), Isoform Expression (IE), Isoform Discovery (ID) • [Figure labels: A B C D E; A B C; A C D E] [Nicolae et al., 11]
Outline • RNA-Seq: Introduction • MCReg: Monte Carlo Regression based Algorithm • Observed Read Distribution • MC-Based Estimation of Expected Read Distribution • Regression-Based Estimation of Isoform Frequencies • Experimental Results • Conclusions and Future Work
MCReg: Monte-Carlo Regression • Motivation: reducing the error rate is critical for distinguishing similar transcripts, especially when one is a subset of another • [Screenshot from a genome browser]
General Method Overview • Map paired-end reads onto the library of known isoforms using an ungapped aligner (e.g., Bowtie) • B. Langmead, C. Trapnell, et al., “Ultrafast and memory-efficient alignment of short DNA sequences to the human genome,” Genome Biology, vol. 10, no. 3, p. R25, 2009. • Group reads that map to the same set of transcripts into classes • Estimate the expected read distribution by Monte Carlo simulation, e.g., with the Grinder simulator • F. E. Angly et al., “Grinder: a versatile amplicon and shotgun sequence simulator,” Nucleic Acids Research, 2012. • Solve the regression: the least-squares formulation can be solved with a constrained quadratic programming solver • M. S. Andersen et al., “CVXOPT: A Python package for convex optimization,” available at cvxopt.org, 2013.
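The read-grouping step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(read_id, transcript_id)` input format, the function names, and the toy alignments are all assumptions made for the example, standing in for real aligner output.

```python
from collections import Counter

def group_reads_into_classes(alignments):
    """Count reads per class, where a class is the set of transcripts a read maps to.

    `alignments` is an iterable of (read_id, transcript_id) pairs; this input
    format is hypothetical and stands in for parsed aligner output.
    """
    transcripts_by_read = {}
    for read_id, transcript_id in alignments:
        transcripts_by_read.setdefault(read_id, set()).add(transcript_id)
    # Reads that hit exactly the same set of transcripts fall into the same class.
    return Counter(frozenset(ts) for ts in transcripts_by_read.values())

def observed_distribution(class_counts):
    """Normalize class counts into observed class frequencies."""
    total = sum(class_counts.values())
    return {cls: n / total for cls, n in class_counts.items()}

# Toy alignments (invented): r1 and r2 map to both T1 and T2, r3 only to T2, r4 only to T1.
alignments = [
    ("r1", "T1"), ("r1", "T2"),
    ("r2", "T1"), ("r2", "T2"),
    ("r3", "T2"),
    ("r4", "T1"),
]
classes = group_reads_into_classes(alignments)
observed = observed_distribution(classes)
```

Using a `frozenset` as the class key makes the class independent of the order in which alignments are reported.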
Observed Read Distribution
Monte-Carlo-Based Estimation of Expected Read Distribution
MC-Based Estimation of Expected Read Distribution
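The idea of Monte Carlo estimation of the expected read distribution can be illustrated with a toy simulation. This sketch does not use Grinder or Bowtie: it samples fixed-length substrings uniformly from made-up isoform sequences and "maps" each read by exact substring match. The sequences, read length, and function name are assumptions chosen so that T2 is a sub-transcript of T1.

```python
import random
from collections import Counter

def simulate_class_probabilities(isoforms, read_len, n_reads, seed=0):
    """Estimate P(read class | source isoform) by sampling reads from each isoform.

    Toy stand-in for a read simulator plus an aligner: reads are substrings
    drawn at uniformly random start positions, and a read's class is the set
    of isoforms whose sequence contains it exactly.
    """
    rng = random.Random(seed)
    probs = {}
    for name, seq in isoforms.items():
        counts = Counter()
        for _ in range(n_reads):
            start = rng.randrange(len(seq) - read_len + 1)
            read = seq[start:start + read_len]
            cls = frozenset(t for t, s in isoforms.items() if read in s)
            counts[cls] += 1
        probs[name] = {cls: c / n_reads for cls, c in counts.items()}
    return probs

# Invented exon sequences; T2 (exons A+B) is a sub-transcript of T1 (exons A+B+C).
EXON_A, EXON_B, EXON_C = "ACGTACGT", "GGCCGGCC", "TTATTATT"
isoforms = {"T1": EXON_A + EXON_B + EXON_C, "T2": EXON_A + EXON_B}
expected = simulate_class_probabilities(isoforms, read_len=4, n_reads=2000)
```

In this toy setup every read sampled from T2 also matches T1, so T2 contributes only to the shared class, while T1 also produces reads that fall in a T1-only class. That asymmetry is what lets a regression separate a sub-transcript from its super-transcript.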
Regression-Based Estimation of Isoform Frequencies
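A constrained least-squares fit of this kind might look like the sketch below. Several things here are assumptions, since the slides do not spell out the formulation: the objective ||D f − o||², the constraints f ≥ 0 and Σf = 1, and the matrix `D` and vector `o` are illustrative. The talk mentions a quadratic programming solver such as CVXOPT; to stay dependency-free, this sketch swaps in projected gradient descent instead.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {f : f >= 0, sum(f) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # keep the threshold from the last index that passes
    return [max(x - theta, 0.0) for x in v]

def estimate_frequencies(D, o, steps=2000):
    """Minimize ||D f - o||^2 over the simplex by projected gradient descent.

    D[c][i]: expected fraction of isoform i's reads that fall in class c.
    o[c]:    observed fraction of reads in class c.
    """
    n_classes, n_iso = len(D), len(D[0])
    # Step size 1/L, where L = 2 * ||D||_F^2 bounds the gradient's Lipschitz constant.
    lr = 1.0 / (2.0 * sum(x * x for row in D for x in row))
    f = [1.0 / n_iso] * n_iso
    for _ in range(steps):
        residual = [sum(D[c][i] * f[i] for i in range(n_iso)) - o[c]
                    for c in range(n_classes)]
        grad = [2.0 * sum(D[c][i] * residual[c] for c in range(n_classes))
                for i in range(n_iso)]
        f = project_to_simplex([f[i] - lr * grad[i] for i in range(n_iso)])
    return f

# Invented example: rows are classes {T1,T2} and {T1}; columns are isoforms T1, T2.
D = [[0.6, 1.0],
     [0.4, 0.0]]
o = [0.88, 0.12]  # what read fractions f = [0.3, 0.7] would produce under D
estimated = estimate_frequencies(D, o)
```

Here `f` is the fraction of reads originating from each isoform; any conversion to transcript-level frequencies is outside the scope of this sketch.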
Simulation Setup
Experimental Results • Frequency estimation accuracy was assessed using the coefficient of determination r². For IsoEM r² = 0.92, while for MCReg r² = 0.97. The results show better correlation than IsoEM, mainly because of sub-transcript cases, where IsoEM skewed the estimated frequency toward super-transcripts.
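For reference, an accuracy score of this kind can be computed as below. The slides do not define r² precisely; this sketch assumes it is the squared Pearson correlation between true and estimated frequencies. The frequency vectors are invented for illustration and are unrelated to the reported 0.92 and 0.97.

```python
def r_squared(true_freqs, estimated_freqs):
    """Squared Pearson correlation between true and estimated frequencies."""
    n = len(true_freqs)
    mean_t = sum(true_freqs) / n
    mean_e = sum(estimated_freqs) / n
    cov = sum((t - mean_t) * (e - mean_e)
              for t, e in zip(true_freqs, estimated_freqs))
    var_t = sum((t - mean_t) ** 2 for t in true_freqs)
    var_e = sum((e - mean_e) ** 2 for e in estimated_freqs)
    return cov * cov / (var_t * var_e)

# Invented frequencies for three isoforms.
true_freqs = [0.1, 0.2, 0.7]
perfect = r_squared(true_freqs, true_freqs)
skewed = r_squared(true_freqs, [0.3, 0.1, 0.6])
```

An estimator that shifts frequency from a sub-transcript to its super-transcript, as in the `skewed` example, scores below a perfect estimate.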
Thanks!