
Data Validation and Annotation: PRIDEViewer and PIKE Bioinformatics analysis from proteomics data

ProteoRed Bioinformatics Workshop, Salamanca. Alberto Medina-Aunon. March 15th, 2010.





Presentation Transcript


  1. Data Validation and Annotation: PRIDEViewer and PIKE. Bioinformatics analysis from proteomics data. ProteoRed Bioinformatics Workshop, Salamanca. Alberto Medina-Aunon. March 15th, 2010

  2. Main Topics • Mass spectrometry and protein and peptide validation • PRIDEViewer: Description. • Examples: Use cases. • Experiment context: Linking functional information to our proteins. • PIKE: Description. • Examples: Use cases.

  3. MS Validation. The easiest way • Starting from: • Mass spectrum/spectra • Tentative identification/sequence • Search engine • Candidate: AFLLAMAARTGFRTR
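
As an illustration of what checking a candidate sequence against a spectrum involves, the following minimal Python sketch computes the monoisotopic mass and the singly charged b/y fragment ions for the candidate on this slide. It is not taken from the presentation: the function names, the 0.5 Da tolerance, and the simple matching score are assumptions made for the example, and a real search engine such as Mascot scores matches very differently.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O, added once per intact peptide
PROTON = 1.00728   # charge carrier for singly charged ions


def peptide_mass(sequence):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def fragment_ions(sequence):
    """Singly charged b and y ions; b_ions[k] and y_ions[k] are complementary."""
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)          # N-terminal fragment
        y_ions.append(sum(masses[i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def matched_fraction(observed_mz, theoretical_mz, tolerance=0.5):
    """Fraction of theoretical ions with an observed peak within the tolerance.

    Hypothetical scoring helper, used here only to illustrate the idea.
    """
    if not theoretical_mz:
        return 0.0
    hits = sum(
        1 for t in theoretical_mz
        if any(abs(t - o) <= tolerance for o in observed_mz)
    )
    return hits / len(theoretical_mz)


candidate = "AFLLAMAARTGFRTR"  # candidate sequence from the slide
b_ions, y_ions = fragment_ions(candidate)
print(f"Peptide mass: {peptide_mass(candidate):.4f} Da")
print(f"First b ions: {[round(m, 3) for m in b_ions[:3]]}")
```

Given a peak list from the spectrum, `matched_fraction(peaks, b_ions + y_ions)` gives a rough idea of how much of the fragment series the candidate explains.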

  4. How to do it • By hand: • Only feasible for a few sequences/spectra. • We cannot read every file format (for instance, binary files). • Semi-automatically: • Using PRIDE files as input: PRIDEViewer

  5. PRIDEViewer Experiment info

  6. PRIDEViewer Sample and Instrument info

  7. PRIDEViewer Spectra and identifications

  8. PRIDEViewer Gel separation

  9. PRIDEViewer Mascot interface

  10. One Example: Identification using 5 peptides

  11. Example Mascot output

  12. Another example: 350 input spectra

  13. Validation study • Starting from one public proteomics repository (EBI PRIDE): • Retrieve a set of available experiments. • Check how complete each experiment is. • Repeat the protein and peptide identification. VALIDATE THE EXPERIMENT...
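
The completeness check in the second step can be sketched as a small triage script. This is a hypothetical illustration, not how PRIDEViewer works: the element names used here (`Experiment`, `ExperimentAccession`, `spectrum`, `PeptideItem`) are assumed to follow the PRIDE XML layout and should be checked against the actual schema, and the embedded sample document and its accession are invented placeholders.

```python
import xml.etree.ElementTree as ET

# Tiny invented sample; element names are ASSUMED to mirror PRIDE XML.
SAMPLE = """<ExperimentCollection>
  <Experiment>
    <ExperimentAccession>EXAMPLE-1</ExperimentAccession>
    <mzData>
      <spectrumList count="1"><spectrum id="1"/></spectrumList>
    </mzData>
    <GelFreeIdentification>
      <PeptideItem><Sequence>AFLLAMAARTGFRTR</Sequence></PeptideItem>
    </GelFreeIdentification>
  </Experiment>
  <Experiment>
    <ExperimentAccession>EXAMPLE-2</ExperimentAccession>
    <mzData>
      <spectrumList count="1"><spectrum id="1"/></spectrumList>
    </mzData>
  </Experiment>
</ExperimentCollection>"""


def triage(xml_text):
    """Report, per experiment, whether it has both spectra and peptide IDs."""
    root = ET.fromstring(xml_text)
    report = []
    for experiment in root.iter("Experiment"):
        n_spectra = sum(1 for _ in experiment.iter("spectrum"))
        n_peptides = sum(1 for _ in experiment.iter("PeptideItem"))
        report.append({
            "accession": experiment.findtext("ExperimentAccession", default="unknown"),
            "spectra": n_spectra,
            "peptides": n_peptides,
            # Re-identification needs spectra AND reported identifications.
            "checkable": n_spectra > 0 and n_peptides > 0,
        })
    return report


for row in triage(SAMPLE):
    print(row)
```

An experiment that lacks either spectra or identifications is flagged as not checkable, which corresponds to the "no spectra" and "no identifications" cases reported in the later rounds.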

  14. Validation using PRIDE: http://www.ebi.ac.uk/pride/

  15. PRIDE: Searching experiments: Biomart

  16. Validation. First Round. Biomart

  17. Validation- First Round: PRIDE Accession 1642

  18. First View: Mascot Results

  19. Validation – First Round: PRIDE Accession 1642. Why? If we explore the data, we find a delta mass of around 32 Da.

  20. Validation – First Round: PRIDE Accession 1642 • Hypothesis: • The first and third sequences show a mass variation of around 32 Da. • Is there a modification at the C or N terminus? If so, the second sequence would show it as well. • Is one residue (or more than one) modified? • We extract the amino acids common to both sequences: D, A, S, I, C, M and G. • We compare them with the described modifications that have a mass variation of 32 Da.
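
The reasoning on this slide can be sketched in a few lines of Python. The two sequences below are invented placeholders chosen only so that they share the residues listed on the slide (the real sequences from accession 1642 are not in this transcript), and the modification table is a small illustrative subset with a non-exhaustive choice of target residues, not a full modification database. The sketch does not say which modification the author selected.

```python
# Illustrative, non-exhaustive table: (name, monoisotopic delta in Da, target residues).
# Target residues are a simplified assumption for this example.
CANDIDATE_MODS = [
    ("Dioxidation", 31.98983, {"M", "C", "W"}),  # two oxygen atoms
    ("Sulfide", 31.97207, {"C"}),                # one sulfur atom
]


def common_residues(sequences):
    """Amino acids present in every sequence."""
    return set.intersection(*(set(seq) for seq in sequences))


def explain_delta(delta, shared, mods, tolerance=0.05):
    """Names of modifications matching the delta mass on a shared residue."""
    return [
        name for name, mass, sites in mods
        if abs(mass - delta) <= tolerance and sites & shared
    ]


# Hypothetical placeholder sequences, NOT the ones from the experiment.
sequences = ["DASICMGK", "MGCISADR"]
shared = common_residues(sequences)
print(sorted(shared))
print(explain_delta(31.99, shared, CANDIDATE_MODS, tolerance=0.01))
```

Tightening the mass tolerance is what separates candidates with nearly identical nominal mass; at a loose tolerance both entries in the table match a shift of about 32 Da.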

  21. Validation – First Round: PRIDE Accession 1642. Only this modification could explain a property common to both sequences, so we select it in the next round.

  22. Validation – First Round: PRIDE Accession 1642

  23. Validation – Second Round: Latest Experiments. Retrieved by hand

  24. Validation – Second Round: Latest experiments • PRIDE accession id: 10470 to 11257 (787 experiments). • None is suitable to check. • No information regarding the identification is available. • PRIDE accession id: 10000 to 10074 (74 experiments). • One dataset could be checked: 10042 to 10060. (Dataset title: Low abundance proteome of human red blood cells captured by combinatorial peptide libraries)

  25. PRIDE Accession 10053

  26. Mascot output

  27. PRIDE Accession 10060

  28. Mascot output: No identification

  29. Validation – Third Round: Recent experiments. Retrieved by hand • Experiment id: 9900 to 9999 • Two datasets are suitable to check: • 9900 to 9942: LC-MALDI experiments (Tannerella forsythia). • 9944 to 9949: Rattus norvegicus. • Not suitable: • 9984: Zebrafish (no spectra). • 9985 to 9992: Homo sapiens (no identifications). • 44 experiments are not available.

  30. Validation – Third Round: Experiment 9900

  31. Validation – Third Round: Experiment 9900

  32. Validation – Third Round: Experiment 9900. Summary

  33. Study summary • Around 1000 PRIDE experiments were downloaded from the PRIDE central repository. • Around 100 of them were suitable to test. • Less than 50% of those were successfully validated.

  34. In summary • There is a lot of data within the repositories (PRIDE). • There is a lot of missing information. • It is not possible to check the data automatically. • PRIDEViewer can save us a lot of time.

  35. Protein Set • At other times, a mistake in an identification is not so significant, as long as we can still reach the goal of the experiment. • For instance, finding the proteins involved in a particular function or biological process.

  36. PIKE: Protein Information and Knowledge Extractor. http://proteo.cnb.csic.es/

  37. PIKE http://proteo.cnb.csic.es/

  38. PIKE http://proteo.cnb.csic.es/

  39. PIKE http://proteo.cnb.csic.es/ Information asked by user

  40. PIKE http://proteo.cnb.csic.es/

  41. PIKE output. CSV

  42. PIKE output

  43. First example: medium-complexity protein list (containing 57 proteins). J Proteome Res. 2005 Nov-Dec;4(6):2435-41.

  44. First example: medium-complexity protein list (containing 57 proteins)

  45. Second example: Human Plasma Proteins from PRIDE (HPPP). PRIDE Accession 65

  46. Third example: The Human Plasma Proteome: A non redundant list. Mol Cell Proteomics. 2004 Apr;3(4):311-26. Epub 2004 Jan 12. "We have merged four different views of the human plasma proteome, based on different methodologies, into a single nonredundant list of 1175 distinct gene products ..."

  47. Third example: The Human Plasma Proteome: A non redundant list. Mol Cell Proteomics. 2004 Apr;3(4):311-26. Epub 2004 Jan 12.

  48. Conclusion • PIKE is a suitable and useful bioinformatics tool for small- or large-scale proteomics projects. • PIKE's main characteristic is its ability to systematically access and automatically retrieve comprehensive biological information contained in common databases. • The resulting information is output in a wide range of standard formats that can be directly viewed, exported, or downloaded for additional analysis.

  49. Questions?
