Proteomic Characterization of Alternative Splicing and Coding Polymorphism

Nathan Edwards, Center for Bioinformatics and Computational Biology, University of Maryland, College Park

Why don’t we see more novel peptides? • Tandem mass spectrometry doesn’t discriminate against novel peptides... but protein sequence databases do! • Searching traditional protein sequence databases biases the results towards well-understood protein isoforms!

What goes missing? • Known coding SNPs • Novel coding mutations • Alternative splicing isoforms • Alternative translation start-sites • Microexons • Alternative translation frames

Why should we care? • Alternative splicing is the norm! • Only 20-25K human genes • Each gene makes many proteins • Proteins have clinical implications • Biomarker discovery • Evidence for SNPs and alternative splicing stops with transcription • Genomic assays, ESTs, mRNA sequence • Little hard evidence for translation start sites

Novel Splice Isoform • Human Jurkat leukemia cell-line • Lipid-raft extraction protocol, targeting T cells • von Haller, et al. MCP 2003. • LIME1 gene: • LCK interacting transmembrane adaptor 1 • LCK gene: • Leukocyte-specific protein tyrosine kinase • Proto-oncogene • Chromosomal aberration involving LCK in leukemias. • Multiple significant peptide identifications

Novel Splice Isoform

Novel Mutation • HUPO Plasma Proteome Project • Pooled samples from 10 male & 10 female healthy Chinese subjects • Plasma/EDTA sample protocol • Li, et al. Proteomics 2005. (Lab 29) • TTR gene • Transthyretin (pre-albumin) • Defects in TTR are a cause of amyloidosis. • Familial amyloidotic polyneuropathy • late-onset, dominant inheritance

Novel Mutation • Ala2→Pro associated with familial amyloid polyneuropathy

Novel Mutation

Searching Expressed Sequence Tags (ESTs) • Pros: • No introns! • Primary splicing evidence for annotation pipelines • Evidence for dbSNP • Often derived from clinical cancer samples • Cons: • No frame • Large (8 Gb) • “Untrusted” by annotation pipelines • Highly redundant • Nucleotide error rate ~ 1%

Compressed EST Peptide Sequence Database • For all ESTs mapped to a UniGene gene: • Six-frame translation • Eliminate ORFs < 30 amino-acids • Eliminate amino-acid 30-mers observed once • Compress to C2 FASTA database • Complete, Correct for amino-acid 30-mers • Gene-centric peptide sequence database: • Size: 223 Mb vs 8 Gb, 20774 FASTA entries • Running time: 15 mins vs 22 hours • E-values: 50-fold reduction • Download: • http://www.umiacs.umd.edu/~nedwards
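The first three steps of the construction above (six-frame translation, dropping ORFs shorter than 30 amino acids, and dropping amino-acid 30-mers observed only once) can be illustrated with a minimal Python sketch. This is not the pipeline used to build the released database: the function names are hypothetical, an "ORF" is assumed here to mean a stop-to-stop open stretch, "observed once" is assumed to mean "seen in only one EST", and the C2 compression step is not shown.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA string; non-ACGT symbols are left unchanged."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate whole codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_orfs(est, min_len=30):
    """Yield stop-to-stop ORFs of at least min_len residues from all six frames."""
    est = est.upper()
    for strand in (est, reverse_complement(est)):
        for frame in range(3):
            for orf in translate(strand[frame:]).split("*"):
                if len(orf) >= min_len:
                    yield orf


def repeated_kmers(ests, k=30):
    """Return amino-acid k-mers seen in at least two ESTs.

    Assumption: each k-mer is counted once per EST, so a k-mer repeated
    within a single EST still counts as "observed once".
    """
    counts = Counter()
    for est in ests:
        seen = set()
        for orf in six_frame_orfs(est, min_len=k):
            seen.update(orf[i:i + k] for i in range(len(orf) - k + 1))
        counts.update(seen)
    return {kmer for kmer, n in counts.items() if n >= 2}
```

Requiring a 30-mer to be supported by more than one EST is one plausible way to suppress peptides that arise only from sequencing errors, given the roughly 1% nucleotide error rate; the surviving 30-mers would then be passed to the compression step.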

Back to the lab... • Current LC/MS/MS workflows identify a few peptides per protein • ...not sufficient for protein isoforms • Need to raise the sequence coverage to (say) 80% • ...protein separation prior to LC/MS/MS analysis
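To make the coverage target concrete, sequence coverage can be taken as the fraction of a protein's residues that fall inside at least one identified peptide. The sketch below is illustrative only: the function name is hypothetical, and it assumes exact string matching of unmodified peptides, ignoring issues such as I/L ambiguity and post-translational modifications.

```python
def sequence_coverage(protein, peptides):
    """Fraction of residues in `protein` covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for pep in peptides:
        if not pep:
            continue  # skip empty strings
        start = protein.find(pep)
        while start != -1:
            # Mark every residue of this occurrence as covered.
            for i in range(start, start + len(pep)):
                covered[i] = True
            start = protein.find(pep, start + 1)
    return sum(covered) / len(protein)
```

With only a few peptides per protein this value is typically far below the (say) 80% suggested above, which is why a variant residue or splice junction can easily go unobserved.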

Future informatics directions... • Combine results from multiple searches from multiple engines • Fast, automated triage of “significant false-positive” peptide identifications • Compressed EST peptide sequence database for other species • Mouse, Rat, Zebrafish, Chicken, Cow, A. thaliana, ?? • Relational database and web-application infrastructure • Interactive browser data-grid, flexible web-services export • Java Applet MS/MS viewers, GFF for Genome Browser

Conclusions • Peptides identify more than just proteins • Untapped source of disease biomarkers • Functional vs silencing variants • Compressed peptide sequence databases make routine EST searching feasible • Statistically significant peptide identification is only the first step

Acknowledgements • Catherine Fenselau, Steve Swatkoski • UMCP Biochemistry • Chau-Wen Tseng, Xue Wu • UMCP Computer Science • Cheng Lee • Calibrant Biosystems • PeptideAtlas, HUPO PPP, X!Tandem • Funding: NCI
