Data Validation and Annotation: PRIDEViewer and PIKE. Bioinformatics analysis from proteomics data. ProteoRed Bioinformatics Workshop, Salamanca. Alberto Medina-Aunon. March 15th, 2010.
Main Topics • Mass spectrometry and protein and peptide validation. • PRIDEViewer: Description. • Examples: Use cases. • Experiment context: Linking functional information to our proteins. • PIKE: Description. • Examples: Use cases.
MS Validation: The Easiest Way • Starting from: • Mass spectrum/spectra • Tentative identification/sequence • Search engine • Candidate: AFLLAMAARTGFRTR
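As a rough illustration of what checking a candidate sequence against a spectrum involves, the sketch below computes the singly charged b- and y-ion m/z values for the candidate peptide and counts how many fall near observed peaks. It is not the PRIDEViewer implementation: the function names, the 0.5 Da tolerance and the assumption of an unmodified peptide are all illustrative choices.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions): singly charged m/z lists, unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), read from the N terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), read from the C terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def match_peaks(theoretical, observed, tol=0.5):
    """Count theoretical m/z values with an observed peak within tol Da."""
    return sum(any(abs(t - o) <= tol for o in observed) for t in theoretical)


# Hypothetical usage: the observed peak list would come from the spectrum.
b_ions, y_ions = fragment_ions("AFLLAMAARTGFRTR")
```

A high fraction of matched b and y ions supports the tentative identification; a search engine applies the same idea with a proper scoring model.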
How to do it • By hand: • Feasible only for a few sequences/spectra. • We cannot read every file format (for instance, binary files). • Semi-automatically: • Using PRIDE files as input: PRIDEViewer
Validation study • Starting from one public proteomics repository (EBI PRIDE): • Retrieve a set of available experiments. • Check how complete each experiment is. • Repeat the protein and peptide identification. • VALIDATE THE EXPERIMENT…
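The completeness check in the steps above can be sketched as a simple triage over experiment records. The record fields (`available`, `n_spectra`, `n_identifications`) and the category labels are hypothetical, chosen only to mirror the reasons for exclusion reported in the validation rounds; the counts in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Hypothetical summary of one repository entry (not a PRIDE schema)."""
    accession: int
    available: bool
    n_spectra: int
    n_identifications: int


def triage(exp):
    """Return why an experiment cannot be re-checked, or 'checkable'."""
    if not exp.available:
        return "not available"
    if exp.n_spectra == 0:
        return "no spectra"
    if exp.n_identifications == 0:
        return "no identifications"
    return "checkable"


# Illustrative records only; spectra and identification counts are invented.
records = [
    Experiment(9984, True, 0, 120),
    Experiment(9985, True, 350, 0),
    Experiment(9944, True, 400, 95),
]
checkable = [e.accession for e in records if triage(e) == "checkable"]
```

Only the experiments labelled `checkable` go on to the repeated protein and peptide identification.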
Validation – First Round: PRIDE Accession 1642 • Why? If we explore the data, we find a delta mass of around 32 Da.
Validation – First Round: PRIDE Accession 1642 • Hypotheses: • The first and third sequences show a mass variation of around 32 Da. • Is there a modification at the C or N terminus? If so, the second sequence would show it as well. • Is one residue, or more than one, modified? • We extract the amino acids common to both sequences: D, A, S, I, C, M and G. • We compare them with the described modifications that have a mass variation of 32 Da.
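The reasoning above (take the residues shared by the affected sequences, then keep only the modifications near 32 Da that can sit on one of them) can be sketched as follows. The modification table is a small illustrative one, not the table used in the study: the delta masses are the monoisotopic masses of O2 and S, while the residue specificities are assumptions that should be checked against a modification database. The sketch lists candidates only and does not say which one was finally selected.

```python
def common_residues(seq_a, seq_b):
    """Amino acids present in both sequences."""
    return set(seq_a) & set(seq_b)


# (name, monoisotopic delta mass in Da, residues it may modify).
# Illustrative entries; the site sets are assumptions.
CANDIDATE_MODS = [
    ("Dioxidation", 31.989829, {"M", "C", "W"}),
    ("Persulfide", 31.972071, {"C", "D"}),
    ("Oxidation", 15.994915, {"M", "W"}),
]


def explain_delta(delta, shared, mods, tol=0.05):
    """Modifications within tol Da of delta that target a shared residue."""
    hits = []
    for name, mass, sites in mods:
        targets = sites & shared
        if abs(mass - delta) <= tol and targets:
            hits.append((name, sorted(targets)))
    return hits


# Shared residues as listed on the slide: D, A, S, I, C, M and G.
shared = set("DASICMG")
candidates = explain_delta(32.0, shared, CANDIDATE_MODS)
```

Each surviving candidate can then be set as a variable modification when the identification is repeated.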
Validation – First Round: PRIDE Accession 1642 • Only this modification could explain a property common to both sequences, so we select it for the next round.
Validation – Second Round: Latest Experiments. Retrieved by hand
Validation – Second Round: Latest Experiments • PRIDE accession ids 10470 to 11257 (787 experiments): • None is suitable for checking. • No information about the identification is available. • PRIDE accession ids 10000 to 10074 (74 experiments): • One dataset could be checked: 10042 to 10060 (dataset title: Low abundance proteome of human red blood cells captured by combinatorial peptide libraries).
Validation – Third Round: Recent Experiments. Retrieved by hand • Experiment ids: 9900 to 9999 • Two datasets are suitable for checking: • 9900 to 9942: LC-MALDI experiments (Tannerella forsythia). • 9944 to 9949: Rattus norvegicus. • 9984: Zebrafish (no spectra). • 9985 to 9992: Homo sapiens (no identifications). • 44 not available.
Study summary • Around 1000 experiments were downloaded from the PRIDE central repository. • Around 100 of them were suitable for testing. • Fewer than 50% of those were successfully validated.
In summary • There is a lot of data in the repositories (PRIDE). • There is a lot of missing information. • It is not possible to check the data automatically. • PRIDEViewer can save us a lot of time.
Protein Set • At other times, a mistake in an identification is not so significant, as long as we can still reach the goal of the experiment. • For instance, finding the proteins involved in a particular function or biological process.
PIKE http://proteo.cnb.csic.es/ PIKE: Protein Information and Knowledge Extractor WWW
PIKE http://proteo.cnb.csic.es/ Information asked by user
First example: medium-complexity protein list (containing 57 proteins). J Proteome Res. 2005 Nov-Dec;4(6):2435-41.
Second example: Human Plasma Proteins from PRIDE (HPPP). PRIDE Accession 65
Third example: The Human Plasma Proteome: a nonredundant list. Mol Cell Proteomics. 2004 Apr;3(4):311-26. Epub 2004 Jan 12. "We have merged four different views of the human plasma proteome, based on different methodologies, into a single nonredundant list of 1175 distinct gene products …."
Conclusion • PIKE is a suitable and useful bioinformatics tool for small- or large-scale proteomics projects. • PIKE's main characteristic is its ability to systematically access and automatically retrieve comprehensive biological information contained in common databases. • The resulting information is output in a wide range of standard formats that can be directly viewed, exported, or downloaded for additional analysis.