
Novel Peptide Identification using ESTs and Genomic Sequence

Nathan Edwards, Center for Bioinformatics and Computational Biology, University of Maryland, College Park


Presentation Transcript


  1. Novel Peptide Identification using ESTs and Genomic Sequence
     Nathan Edwards
     Center for Bioinformatics and Computational Biology
     University of Maryland, College Park

  2. Sample Preparation for Peptide Identification
     • Enzymatic Digest and Fractionation

  3. Mass Spectrometer
     • Sample
     • Ionizer: MALDI, Electro-Spray Ionization (ESI)
     • Mass Analyzer: Time-Of-Flight (TOF), Quadrupole, Ion-Trap
     • Detector: Electron Multiplier (EM)

  4. Single Stage MS
     [Figure: MS spectrum, x-axis m/z]

  5. Tandem Mass Spectrometry (MS/MS)
     • Precursor selection
     [Figure: spectra, x-axis m/z]

  6. Tandem Mass Spectrometry (MS/MS)
     • Precursor selection + collision induced dissociation (CID)
     [Figure: MS/MS spectrum, x-axis m/z]

  7. Peptide Identification
     • For each (likely) peptide sequence
       1. Compute fragment masses
       2. Compare with spectrum
       3. Retain those that match well
     • Peptide sequences from protein sequence databases
       • Swiss-Prot, IPI, NCBI’s nr, ...
     • Automated, high-throughput peptide identification in complex mixtures

  8. What goes missing?
     • Known coding SNPs
     • Novel coding mutations
     • Alternative splicing isoforms
     • Alternative translation start-sites
     • Microexons
     • Alternative translation frames

  9. Why should we care?
     • Alternative splicing is the norm!
       • Only 20-25K human genes
       • Each gene makes many proteins
     • Proteins have clinical implications
       • Biomarker discovery
     • Evidence for SNPs and alternative splicing stops with transcription
       • Genomic assays, ESTs, mRNA sequence
     • Little hard evidence for translation start site

  10. Novel Splice Isoform

  11. Novel Splice Isoform

  12. Novel Frame

  13. Novel Frame

  14. Novel Mutation
      • Ala2→Pro, associated with familial amyloid polyneuropathy

  15. Novel Mutation

  16. Genomic Peptide Sequences
      • Genomic DNA
        • Exons & introns, 6 frames, large (3Gb → 6Gb)
      • ESTs
        • No introns, 6 frames, large (4Gb → 8Gb)
        • Used by gene, protein, and alternative splicing annotation pipelines
        • Highly redundant, nucleotide error rate ~ 1%

  17. Compressed EST Database
      • Six-frame translation of all ESTs
        • Optionally, ESTs that map to a gene
      • Eliminate ORFs < 30 amino-acids
      • Amino-acid 30-mers
        • Observed in at least two ESTs
      • Represent AA 30-mers in C3 FASTA database
        • Complete, Correct, Compact
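
The filtering steps on slide 17 can be illustrated with a short Python sketch. It assumes the standard genetic code, treats an "ORF" as any stop-free stretch of a translated frame, and translates ambiguous codons to `X`; the slides do not specify these details, and the function names (`six_frame_orfs`, `supported_kmers`) are hypothetical. The final C3 FASTA encoding is not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def six_frame_orfs(dna, min_len=30):
    """Stop-free stretches of at least min_len residues in all six frames."""
    orfs = []
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            protein = "".join(
                CODON.get(strand[i:i + 3], "X")  # unknown codon -> X (assumption)
                for i in range(frame, len(strand) - 2, 3)
            )
            orfs.extend(p for p in protein.split("*") if len(p) >= min_len)
    return orfs


def supported_kmers(ests, k=30, min_ests=2):
    """Amino-acid k-mers observed in at least min_ests distinct ESTs."""
    support = Counter()
    for est in ests:
        kmers = set()
        for orf in six_frame_orfs(est, min_len=k):
            kmers.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        support.update(kmers)  # each EST contributes at most once per k-mer
    return {kmer for kmer, n in support.items() if n >= min_ests}
```

For example, `supported_kmers([est_a, est_b])` with the default `k=30` returns only those 30-mers that both ESTs (or any two in a larger list) agree on, which is how a requirement of two observations can screen out k-mers caused by isolated sequencing errors.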

  18. SBH-graph
      Sequences: ACDEFGI, ACDEFACG, DEFGEFGI

  19. Compressed SBH-graph
      Sequences: ACDEFGI, ACDEFACG, DEFGEFGI

  20. Sequence Databases & CSBH-graphs
      • Original sequences correspond to paths
      Sequences: ACDEFGI, ACDEFACG, DEFGEFGI

  21. Sequence Databases & CSBH-graphs
      • All k-mers represented by an edge have the same count
      [Figure: edge counts 1, 2, 2, 1, 2]

  22. cSBH-graphs
      • Quickly determine those that occur twice
      [Figure: edge counts 2, 2, 1, 2]

  23. Compressed-SBH-graph
      [Figure: edge counts 2, 2, 1, 2; sequence ACDEFGI]
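
The graph construction on slides 18 to 23 can be sketched as follows, using the three example sequences from the slides. This is an illustrative reading rather than the author's implementation: the choice of k = 4 is an assumption (the transcript does not state k), the function names are hypothetical, and the C2/C3 enumeration algorithms are not shown. Nodes are (k-1)-mers, edges are k-mers, and each edge carries the number of input sequences that contain it.

```python
from collections import Counter, defaultdict


def edge_counts(sequences, k):
    """Count, for each k-mer (edge), how many sequences contain it."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def compress(counts):
    """Merge k-mer edges along non-branching paths with a constant count.

    Isolated cycles without any junction node are ignored in this sketch.
    """
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for kmer in counts:
        out_edges[kmer[:-1]].append(kmer)  # edge leaves its (k-1)-prefix
        in_edges[kmer[1:]].append(kmer)    # and enters its (k-1)-suffix

    def is_junction(node):
        ins, outs = in_edges.get(node, []), out_edges.get(node, [])
        return (len(ins) != 1 or len(outs) != 1
                or counts[ins[0]] != counts[outs[0]])

    paths = []
    for node in out_edges:
        if not is_junction(node):
            continue
        for kmer in out_edges[node]:
            path, count, nxt = kmer, counts[kmer], kmer[1:]
            while not is_junction(nxt):    # extend through simple nodes
                kmer = out_edges[nxt][0]
                path += kmer[-1]
                nxt = kmer[1:]
            paths.append((path, count))
    return sorted(paths)


sequences = ["ACDEFGI", "ACDEFACG", "DEFGEFGI"]
compressed = compress(edge_counts(sequences, k=4))
twice = [path for path, count in compressed if count >= 2]
```

With k = 4 the compressed graph has five edges, two with count 1 and three with count 2. The count-2 edges (`ACDEF`, `DEFG`, `EFGI`) overlap by k-1 residues and chain into ACDEFGI, which appears consistent with the sequence shown on slide 23.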

  24. Compressed EST Database
      • Gene centric compressed EST peptide sequence database
        • 20,774 sequence entries
        • ~8 Gb vs 223 Mb
        • ~35 fold compression
        • 22 hours becomes 15 minutes
        • E-values improve by similar factor!
      • Makes routine EST searching feasible
        • Search ESTs instead of IPI?
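
The figures on slide 24 can be checked with a little arithmetic. The sketch below assumes decimal units (1 Gb = 1000 Mb) and, for the E-value remark, the common approximation that an E-value scales linearly with the size of the search space; `rescaled_e_value` is a hypothetical helper, not part of any search engine.

```python
# Figures quoted on the slide; decimal units are an assumption.
est_db_mb = 8 * 1000          # "~8 Gb" six-frame EST translation
compressed_mb = 223           # compressed 30-mer database
search_before_min = 22 * 60   # "22 hours"
search_after_min = 15         # "15 minutes"

compression = est_db_mb / compressed_mb
speedup = search_before_min / search_after_min


def rescaled_e_value(e_value, shrink_factor):
    """E-value after shrinking the search space by shrink_factor,
    assuming the E-value is proportional to search-space size."""
    return e_value / shrink_factor


print(f"compression ~{compression:.0f}x, speedup ~{speedup:.0f}x")
```

Under the linear assumption, a hit with an E-value of 0.35 against the uncompressed translation would score about 0.01 against a database 35 times smaller.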

  25. Conclusions
      • Peptides identify more than just proteins
      • Compressed peptide sequence databases make routine EST searching feasible
      • cSBH-graph + edge counts + C2/C3 enumeration algorithms
        • Minimal FASTA representation of k-mer sets

  26. Collaborators
      • Chau-Wen Tseng, Xue Wu
        • Computer Science
      • Catherine Fenselau, Crystal Harvey
        • Biochemistry
      • Calibrant Biosystems
      • Thanks to PeptideAtlas, X!Tandem
