Scalable Algorithms for Next-Generation Sequencing Data Analysis
Ion Mandoiu
UTC Associate Professor in Engineering Innovation
Department of Computer Science & Engineering
Next-Generation Sequencing
• Illumina HiSeq
• Roche/454
• SOLiD 5500
• Ion Proton
• PacBio RS
• Oxford Nanopore
Ongoing Projects
• Transcriptome Analysis
  - Transcriptome quantification and differential expression analysis
  - Computational deconvolution of heterogeneous samples
  - Transcriptome and meta-transcriptome assembly
• Viral quasispecies
  - Quasispecies reconstruction from NGS reads
  - IBV evolution and vaccine optimization
  - Transmission graphs
• Immunoinformatics
  - Genomics-guided immunotherapy
  - Deep panning for early cancer detection
• Sequencing error correction, genome assembly and scaffolding, metabolomics, biomarker selection, …
• More info & software at http://dna.engr.uconn.edu
Transcriptome Quantification
• IsoEM algorithm for isoform expression estimation
  - Incorporates fragment length distribution, hexamer bias correction, …
• RNA-PhASE pipeline for allele-specific isoform expression
[Figure: results on Ion Torrent MAQC datasets; diagram labels: A B C, A C]
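The idea behind expectation-maximization for isoform expression can be sketched in a few lines. This is a minimal illustration, not the IsoEM implementation: it leaves out the fragment length distribution and hexamer bias correction mentioned above, and the function name `em_isoform_abundance` and the read-compatibility input format are assumptions made for this example.

```python
def em_isoform_abundance(read_compat, n_isoforms, n_iter=200):
    """Estimate the fraction of reads coming from each isoform.

    read_compat: one dict per read, mapping isoform index -> weight, where the
    weight stands for the probability of observing the read given the isoform
    (hypothetical input format; a real model would derive it from alignments).
    """
    theta = [1.0 / n_isoforms] * n_isoforms  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_isoforms
        # E-step: split each read among its compatible isoforms
        for compat in read_compat:
            total = sum(theta[j] * w for j, w in compat.items())
            if total == 0.0:
                continue  # read has no support under the current estimate
            for j, w in compat.items():
                counts[j] += theta[j] * w / total
        assigned = sum(counts)
        if assigned == 0.0:
            break
        # M-step: abundances are the normalized expected read counts
        theta = [c / assigned for c in counts]
    return theta


# Toy input: 3 reads unique to isoform 0, 1 unique to isoform 1, 4 ambiguous.
reads = [{0: 1.0}] * 3 + [{1: 1.0}] + [{0: 1.0, 1: 1.0}] * 4
theta = em_isoform_abundance(reads, n_isoforms=2)
print(theta)  # converges toward [0.75, 0.25]
```

The ambiguous reads end up divided in the same 3:1 ratio as the unique reads, which is the fixed point of the update.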
Differential Expression
• Fast estimation enables the use of accurate bootstrapping-based methods
[Figure: MAQC 454 datasets, UHRR (SRX002934) vs HBRR (SRX002935)]
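A bootstrapping-based test needs many re-estimations per sample, which is why a fast estimator matters. The sketch below shows one generic form of the idea: resample reads with replacement in each condition, re-estimate expression, and report how often the fold change clears a threshold. The estimator here is a plain read fraction standing in for a real quantification method, and all names, the threshold, and the toy read lists are assumptions for illustration only.

```python
import math
import random


def estimate_expression(reads, gene):
    # Stand-in estimator: fraction of reads assigned to the gene,
    # with a pseudocount so that ratios are always defined.
    return (reads.count(gene) + 0.5) / (len(reads) + 1.0)


def bootstrap_fold_change_support(reads_a, reads_b, gene,
                                  min_fold=2.0, n_boot=200, seed=0):
    """Fraction of bootstrap replicates with |log2 fold change| >= log2(min_fold)."""
    rng = random.Random(seed)
    threshold = math.log2(min_fold)
    hits = 0
    for _ in range(n_boot):
        # Resample each condition's reads with replacement
        sample_a = rng.choices(reads_a, k=len(reads_a))
        sample_b = rng.choices(reads_b, k=len(reads_b))
        log_fc = math.log2(estimate_expression(sample_a, gene)
                           / estimate_expression(sample_b, gene))
        if abs(log_fc) >= threshold:
            hits += 1
    return hits / n_boot


# Toy data (made up): gene g1 is about 3x higher in condition A.
reads_a = ["g1"] * 300 + ["g2"] * 700
reads_b = ["g1"] * 100 + ["g2"] * 900
print(bootstrap_fold_change_support(reads_a, reads_b, "g1"))
print(bootstrap_fold_change_support(reads_a, reads_b, "g2"))
```

A gene would be called differentially expressed when the support exceeds a chosen cutoff; the cutoff trades sensitivity against false positives.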
Computational Deconvolution of Heterogeneous Samples
• Goal: characterize expression of mesoderm progenitor cells
• Whole-transcriptome expression data for NSB cell mixtures + single-cell qPCR data for a few genes
• Three-step approach
  - Cluster single-cell qPCR data and infer “reduced” cell type signatures
  - Infer mixing proportions based on reduced signatures using quadratic programming
  - Infer full expression signatures based on mixing proportions, solving one quadratic program per gene
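The second step above (mixing proportions from reduced signatures) is a small constrained least-squares problem: minimize the squared error between the observed mixture and the signature-weighted sum, with proportions that are non-negative and sum to one. The sketch below solves it with projected gradient descent as a dependency-free stand-in for a quadratic programming solver; the function names and the toy signature matrix are assumptions, not taken from the slides.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        t = (cumulative - 1.0) / j
        if u_j - t > 0:
            tau = t  # keep the threshold of the largest valid prefix
    return [max(x - tau, 0.0) for x in v]


def infer_mixing_proportions(signatures, mixture, n_iter=2000):
    """Minimize ||S p - m||^2 over the simplex by projected gradient descent.

    signatures: rows are genes, columns are cell types (must not be all zero).
    mixture: observed expression of the same genes in the mixed sample.
    """
    n_genes, n_types = len(signatures), len(signatures[0])
    # Conservative step: 1 / (2 * squared Frobenius norm) bounds the curvature
    step = 1.0 / (2.0 * sum(s * s for row in signatures for s in row))
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        residual = [sum(row[k] * p[k] for k in range(n_types)) - mixture[g]
                    for g, row in enumerate(signatures)]
        grad = [2.0 * sum(signatures[g][k] * residual[g] for g in range(n_genes))
                for k in range(n_types)]
        p = project_to_simplex([p[k] - step * grad[k] for k in range(n_types)])
    return p


# Toy signatures (made up): 3 genes x 2 cell types, mixed at 30% / 70%.
S = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
m = [3.0, 7.0, 5.0]
print(infer_mixing_proportions(S, m))  # approaches [0.3, 0.7]
```

The third step has the same shape with the roles swapped: proportions are fixed and each gene's per-type expression is the unknown, giving one small program per gene.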
Reference-Guided Transcriptome Reconstruction
[Figure: a gene with exons 1 to 7 and candidate transcripts t1 to t4 built from exon subsets, including 1 3 4 5 6 7, 1 2 3 4 5 7, and 1 3 4 5 7]
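TRIP: Transcriptome Reconstruction using Integer Programming
• Select the smallest set of putative transcripts that yields a good statistical fit between the fragment length distribution
  - empirically determined during library preparation
  - implied by “mapping” read pairs
[Figure: a read pair mapped to a transcript with exons 1, 2, 3 (each labeled 200) implies a fragment length of 500; mapped to a transcript with exons 1 and 3 it implies 300. Fragment length distribution: mean 500, std. dev. 50]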
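The fragment-length reasoning on this slide can be made concrete with a small sketch. It computes the fragment length a read pair implies on each candidate transcript, accepts a transcript for that pair when the length is within a few standard deviations of the mean, and then searches for the smallest transcript set that explains every pair. The exhaustive search is a stand-in for the integer program and only works on tiny instances; the read-pair encoding, the `max_dev` cutoff, and all names are assumptions for illustration, and the actual TRIP formulation is richer than this covering simplification.

```python
from itertools import combinations


def implied_fragment_length(transcript, exon_len, pair):
    """Fragment length implied by a read pair on one transcript, or None.

    transcript: tuple of exon ids in order.
    pair: (start_exon, start_offset, end_exon, end_offset), offsets measured
    from the beginning of the respective exon (hypothetical encoding).
    """
    start_exon, start_offset, end_exon, end_offset = pair
    if start_exon not in transcript or end_exon not in transcript:
        return None
    i, j = transcript.index(start_exon), transcript.index(end_exon)
    if i > j:
        return None
    if i == j:
        return end_offset - start_offset
    inner = sum(exon_len[e] for e in transcript[i + 1:j])
    return (exon_len[start_exon] - start_offset) + inner + end_offset


def smallest_explaining_set(transcripts, exon_len, pairs, mean, sd, max_dev=2.0):
    """Smallest set of transcripts such that every pair fits at least one."""
    names = sorted(transcripts)

    def fits(name, pair):
        length = implied_fragment_length(transcripts[name], exon_len, pair)
        return length is not None and abs(length - mean) <= max_dev * sd

    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if all(any(fits(n, p) for n in subset) for p in pairs):
                return subset
    return None


# Values from the slide: three exons of length 200, mean 500, std. dev. 50.
exon_len = {1: 200, 2: 200, 3: 200}
transcripts = {"t_123": (1, 2, 3), "t_13": (1, 3)}
pair = (1, 50, 3, 150)  # read offsets are made up for the example
print(implied_fragment_length(transcripts["t_123"], exon_len, pair))  # 500
print(implied_fragment_length(transcripts["t_13"], exon_len, pair))   # 300
print(smallest_explaining_set(transcripts, exon_len, [pair], 500, 50))
```

On the transcript that skips exon 2 the same pair implies a fragment four standard deviations below the mean, so only the three-exon transcript gives a good fit.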
De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
• Uncultured bacterial symbiont produces bryostatins
  - Symbiont absent in Northern Atlantic populations
De Novo (Meta)Transcriptome Assembly of Bugula neritina and its Symbiont
• Developing a scalable multi-sample metatranscriptome assembly pipeline based on differential-coverage clustering of reads
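Differential-coverage clustering relies on sequences from the same organism rising and falling together across samples. The sketch below groups sequence units by the cosine similarity of their per-sample coverage profiles using a simple greedy pass. It is only an illustration of the principle, not the pipeline under development: the choice of unit (for example k-mers or preliminary contigs), the similarity threshold, the greedy strategy, and the toy coverage values are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two coverage profiles (0.0 if either is all zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_by_coverage(profiles, min_similarity=0.95):
    """Greedy clustering: join the first cluster whose representative is similar.

    profiles: dict mapping a unit name to its coverage in each sample.
    """
    clusters = []  # list of (representative profile, member names)
    for name in sorted(profiles):
        profile = profiles[name]
        for representative, members in clusters:
            if cosine(representative, profile) >= min_similarity:
                members.append(name)
                break
        else:
            clusters.append((profile, [name]))
    return [members for _, members in clusters]


# Toy coverage over three samples (made up); the third sample lacks the symbiont.
profiles = {
    "h1": [30, 28, 31],
    "h2": [60, 55, 62],
    "s1": [12, 10, 0],
    "s2": [25, 22, 0],
}
print(cluster_by_coverage(profiles))  # [['h1', 'h2'], ['s1', 's2']]
```

Samples in which one organism is absent, such as populations without the symbiont, are what make the two profiles easy to tell apart.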
Acknowledgements
Sahar Al Seesi, Abdul Banday, Amir Bayegan, Gabriel Ilie, Caroline Jakuba, James Lindsay, Rahul Kanadia, Craig Nelson, Marius Nicolae, Adrian Caciula, Nicole Lopanik, Serghei Mangul, Yvette Temate Tiagueu, Alex Zelikovsky