Bioinformatics pipeline for detection of immunogenic cancer mutations by high-throughput mRNA sequencing
Jorge Duitama¹, Ion Mandoiu¹, and Pramod Srivastava²
¹University of Connecticut, Department of Computer Science & Engineering
²University of Connecticut Health Center
Immunology Background
J.W. Yewdell, E. Reits, and J. Neefjes. Making sense of mass destruction: quantitating MHC class I antigen presentation. Nature Reviews Immunology, 3:952-961, 2003
Cancer Immunotherapy
[Figure: tumor mRNA sequencing → tumor-specific epitope discovery (candidate peptides such as ISETDLSLL and CALRRNESL, scored with SYFPEITHI) → peptide synthesis → immune system training → tumor remission. Mouse image source: http://www.clker.com/clipart-simple-cartoon-mouse-2.html]
Analysis Pipeline
[Figure: tumor mRNA (PE) reads undergo CCDS mapping and genome mapping; CCDS-mapped and genome-mapped reads are combined by read merging; mapped reads go to variant detection, yielding tumor-specific mutations, and then to epitope prediction, yielding tumor-specific CTL epitopes; unmapped reads go to gene fusion & novel transcript detection.]
Variant Calling Methods
• Binomial: test used, e.g., in [Levi et al 07, Wheeler et al 08] for calling SNPs from genomic DNA
• Posterior: picks the genotype with the highest posterior probability given the reads, assuming uniform priors
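The two calling rules can be sketched in Python for a single site. This is a minimal illustration, not the pipeline's code: the per-base error rate `ERROR_RATE`, the significance level `alpha`, the null hypothesis used in `binomial_call` (every alternative-allele read is a sequencing error), and the read error model in `posterior_call` are all assumptions chosen for the example.

```python
import math

# Assumed per-base sequencing error rate; not a value taken from the slides.
ERROR_RATE = 0.01


def binomial_sf(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def binomial_call(alt_reads: int, total_reads: int,
                  error_rate: float = ERROR_RATE, alpha: float = 0.001) -> bool:
    """Call a variant when the alternative-allele count is unlikely under the
    assumed null hypothesis that all alternative reads are sequencing errors."""
    return binomial_sf(alt_reads, total_reads, error_rate) < alpha


def posterior_call(bases: str, ref: str, alt: str,
                   error_rate: float = ERROR_RATE) -> tuple[str, dict[str, float]]:
    """Pick the genotype with the highest posterior probability given the
    observed bases at one site, assuming uniform genotype priors."""
    genotypes = {"ref/ref": (ref, ref), "ref/alt": (ref, alt), "alt/alt": (alt, alt)}
    log_lik = {}
    for name, (a1, a2) in genotypes.items():
        total = 0.0
        for b in bases:
            # Assumed model: a read samples either allele with probability 1/2,
            # and reports that allele's base with probability 1 - e, otherwise
            # one of the three other bases with probability e/3 each.
            p1 = 1 - error_rate if b == a1 else error_rate / 3
            p2 = 1 - error_rate if b == a2 else error_rate / 3
            total += math.log(0.5 * (p1 + p2))
        log_lik[name] = total
    # With uniform priors the posterior is the normalized likelihood.
    top = max(log_lik.values())
    norm = sum(math.exp(v - top) for v in log_lik.values())
    posterior = {g: math.exp(v - top) / norm for g, v in log_lik.items()}
    return max(posterior, key=posterior.get), posterior
```

For example, `posterior_call("AAAAAGGGGG", "A", "G")[0]` is `"ref/alt"`, and `binomial_call(5, 20)` is `True` under the default parameters. Working in log space keeps the likelihoods from underflowing at high coverage.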
Epitope Prediction
• Predictions include MHC binding, TAP transport efficiency, and proteasomal cleavage
C. Lundegaard et al. MHC Class I Epitope Binding Prediction Trained on Small Data Sets. In Lecture Notes in Computer Science, 3239:217-225, 2004
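MHC binding predictors such as SYFPEITHI score individual peptides, so each missense mutation first has to be turned into candidate peptides that contain the mutated residue. A minimal sketch, assuming a single amino-acid substitution at a 0-based position; the function name `peptides_spanning` is hypothetical, and the default peptide lengths of 8 to 11 residues reflect typical MHC class I ligands rather than a setting from the slides.

```python
def peptides_spanning(protein: str, pos: int, alt_aa: str,
                      lengths: tuple[int, ...] = (8, 9, 10, 11)) -> list[tuple[str, str]]:
    """Return (reference, mutated) peptide pairs of the given lengths that
    contain the substituted residue at 0-based position `pos`."""
    if not 0 <= pos < len(protein):
        raise ValueError("mutation position lies outside the protein")
    mutated = protein[:pos] + alt_aa + protein[pos + 1:]
    pairs = []
    for k in lengths:
        # A window starting at s covers pos when s <= pos <= s + k - 1;
        # it must also fit inside the protein (s + k <= len(protein)).
        for s in range(max(0, pos - k + 1), min(pos, len(protein) - k) + 1):
            pairs.append((protein[s:s + k], mutated[s:s + k]))
    return pairs
```

Each pair keeps the reference and mutated peptide aligned, which is what a later comparison of their predicted scores needs.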
Accuracy Assessment of Variant Detection
• 63 million Illumina mRNA reads generated from blood cell tissue of HapMap individual NA12878 (NCBI SRA accession number SRX000566)
• We selected HapMap SNPs in known exons covered by at least one read mapped by any method (22,362 homozygous reference; 7,893 heterozygous or homozygous variant)
• True positives: called variants for which the HapMap genotype is heterozygous or homozygous variant
• False positives: called variants for which the HapMap genotype is homozygous reference
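The true/false positive definitions above translate directly into a tally over the evaluated SNPs. In this sketch the function `assess_calls` and the genotype labels `hom_ref`, `het`, and `hom_var` are hypothetical, and normalizing by the number of variant and homozygous-reference sites (7,893 and 22,362 in this data set) is one possible choice of rates, assumed here rather than taken from the slides.

```python
from collections import Counter

# Hypothetical genotype labels for the evaluated HapMap SNPs.
HOM_REF, HET, HOM_VAR = "hom_ref", "het", "hom_var"


def assess_calls(called_sites, hapmap_genotypes):
    """Tally true and false positives among called variant sites.

    called_sites: iterable of site identifiers called as variant.
    hapmap_genotypes: dict mapping each evaluated site to HOM_REF, HET or HOM_VAR.
    """
    counts = Counter()
    for site in set(called_sites):
        genotype = hapmap_genotypes.get(site)
        if genotype in (HET, HOM_VAR):
            counts["TP"] += 1  # called, and HapMap says the site carries the variant
        elif genotype == HOM_REF:
            counts["FP"] += 1  # called, but HapMap says homozygous reference
        # Sites outside the evaluated set are ignored.
    positives = sum(g in (HET, HOM_VAR) for g in hapmap_genotypes.values())
    negatives = sum(g == HOM_REF for g in hapmap_genotypes.values())
    return {
        "TP": counts["TP"],
        "FP": counts["FP"],
        # Assumed normalization: fraction of variant / homozygous-reference sites called.
        "TP_rate": counts["TP"] / positives if positives else 0.0,
        "FP_rate": counts["FP"] / negatives if negatives else 0.0,
    }
```

Running the same tally once per mapping and calling strategy, and once per minimum alternative-allele coverage, gives the kind of comparison shown in the following slides.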
Comparison of Variant Calling Strategies (Genome Mapping, Alt. coverage 1)
Comparison of Variant Calling Strategies (Genome Mapping, Alt. coverage 3)
Comparison of Mapping Strategies (Posterior, Alt. coverage 3)
Results on Meth A Reads
• 6.75 million Illumina reads from mRNA isolated from a mouse tumor cell line
• We found 6,775 variant candidates using hard merge mapping, posterior variant calling, and a minimum of three reads supporting each alternative allele
• 934 variants produced 1,439 epitopes whose mutated peptide has a SYFPEITHI score higher than 15
Distribution of SYFPEITHI Score Differences Between Mutated and Reference Peptides
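The counts in the Meth A results and the score-difference distribution can both be derived from a table of per-peptide scores. A minimal sketch, assuming the scores were produced beforehand by an external predictor (the code does not run SYFPEITHI); the `EpitopeScore` record and the `summarize` function are hypothetical, and computing differences only over peptides that pass the threshold is an assumption.

```python
from dataclasses import dataclass


@dataclass
class EpitopeScore:
    variant_id: str         # hypothetical identifier of the source mutation
    mutated_peptide: str
    mutated_score: float    # precomputed score (e.g., SYFPEITHI) of the mutated peptide
    reference_score: float  # precomputed score of the matching reference peptide


def summarize(scores, threshold=15.0):
    """Return (number of variants, number of epitopes, score differences)
    for peptides whose mutated score is strictly above `threshold`."""
    kept = [s for s in scores if s.mutated_score > threshold]
    variants = {s.variant_id for s in kept}
    # Positive difference: the mutated peptide scores higher than its reference.
    differences = [s.mutated_score - s.reference_score for s in kept]
    return len(variants), len(kept), differences
```

Feeding in one record per scored peptide yields counts of the same kind as those reported for Meth A (variants and epitopes), plus the raw values behind a histogram of score differences.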
Validation Results
• Mutations reported by [Noguchi et al 94] were found by this pipeline
• We are performing Sanger sequencing of PCR amplicons to confirm reported mutations
• We are using mass spectrometry to confirm presentation of the epitopes on the cell surface
Conclusions & Ongoing Work
• We implemented a bioinformatics pipeline for characterizing tumor immunomes
• Identified tumor epitopes are being evaluated for therapeutic effect
• Ongoing work:
  • Detect short immunogenic indels and novel transcripts
  • Include predictions of TAP transport efficiency and proteasomal cleavage
  • Refine the posterior method to make mutation detection more robust in the presence of differential allelic expression
Acknowledgments
• Brent Graveley and Duan Fei (UCHC)
• NSF awards IIS-0546457, IIS-0916948, and DBI-0543365
• UCONN Research Foundation UCIG grant
Questions? • Thanks