Daehee Hwang, Leroy Hood
Institute for Systems Biology
Why Prequips for Systems Biology with proteomic data?
• Need for visualization, analysis, and integration of multiple proteomic datasets at the raw data level, the peptide level, the protein level, and in multi-sample analysis
• Need for an interface between proteomic data and systems biology analytical tools such as network and pathway analyses
Integration of proteomic data at various levels
[Diagram: three parallel stacks, each going from raw data (MS, MS/MS) to peptide Id + quantitation to protein Id + quantitation via the Trans-Proteomic Pipeline. Question marks sit between the stacks: communication not possible!]
Pep3D: Quality Assessment
[Diagram: Pep3D and Prequips shown with the proteomics stack (raw data (MS, MS/MS), peptide Id + quantitation, protein Id + quantitation, multi-sample; Trans-Proteomic Pipeline) and the Gaggle tools: STRING (interaction database), Cytoscape (network analysis), Mayday and TIGR (microarray data analysis), KEGG (pathway database).]
Properties:
• quality assessment
• 2D gel-like visualization
Pep3D: Quality Assessment
[Diagram: Pep3D instance 1 and Pep3D instance 2 side by side: communication not possible!]
Interface to Systems Biology
[Diagram: the proteomics stack (raw data (MS, MS/MS), peptide Id + quantitation, protein Id + quantitation; Trans-Proteomic Pipeline) on one side and the Gaggle tools on the other: STRING (interaction database), Cytoscape (network analysis), Mayday and TIGR (microarray data analysis), KEGG (pathway database). A question mark sits between them: communication not possible!]
Prequips Overview
[Diagram: Prequips shown with the proteomics stack (raw data (MS, MS/MS), peptide Id + quantitation, protein Id + quantitation, multi-sample; Trans-Proteomic Pipeline) and the Gaggle tools: STRING (interaction database), Cytoscape (network analysis), Mayday and TIGR (microarray data analysis), KEGG (pathway database).]
Key properties of Prequips:
• handles multiple samples at all levels
• integrates high-level analysis tools
• is extensible
Integration of proteomic datasets at various levels
[Diagram: Trans-Proteomic Pipeline workflow]
• Mass spectrometer: raw data, e.g. mzXML, mzData, ...
• Database search, validation, peptide quantification: peptide-level data, e.g. pepXML, AnalysisXML, ...
• Protein inference, protein quantitation: protein-level data, e.g. protXML, ...
• Further analysis: results, annotation
Data model
• Project: multi-sample analysis, viewers, perspectives
• Single-sample analysis: protein level, peptide level, raw data
• Data structures: core and meta information at each level
• Data providers:
  • protein-level data source, e.g. protXML files
  • peptide-level data source, e.g. pepXML, dta or AnalysisXML files
  • raw data level, e.g. mzXML or mzData files
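The layering on this slide can be sketched in a few lines of Python. This is only an illustration of the project / single-sample analysis / level structure with core and meta information. Every class name, field name, file name, and value below is hypothetical and is not taken from the Prequips code base, which is written in Java.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LevelData:
    """One level of an analysis: core records plus free-form meta information."""
    source: str                                   # hypothetical file name
    core: List[dict] = field(default_factory=list)
    meta: Dict[str, object] = field(default_factory=dict)


@dataclass
class SingleSampleAnalysis:
    """Raw, peptide, and protein level data for one sample."""
    name: str
    raw: LevelData
    peptide: LevelData
    protein: LevelData


@dataclass
class Project:
    """A project groups single-sample analyses for multi-sample views."""
    name: str
    analyses: List[SingleSampleAnalysis] = field(default_factory=list)

    def protein_table(self) -> Dict[str, Dict[str, float]]:
        """Multi-sample view: protein id -> {sample name -> ratio}."""
        table: Dict[str, Dict[str, float]] = {}
        for analysis in self.analyses:
            for row in analysis.protein.core:
                table.setdefault(row["id"], {})[analysis.name] = row["ratio"]
        return table


def make_sample(name: str, ratio: float) -> SingleSampleAnalysis:
    # Illustrative values only; not measured data.
    return SingleSampleAnalysis(
        name=name,
        raw=LevelData(source=f"{name}.mzXML"),
        peptide=LevelData(source=f"{name}.pepXML"),
        protein=LevelData(source=f"{name}.protXML",
                          core=[{"id": "protein_a", "ratio": ratio}],
                          meta={"annotation": "example"}),
    )


project = Project("demo", [make_sample("sample_1", 1.0), make_sample("sample_2", 2.5)])
table = project.protein_table()
```

Keeping meta information next to, but separate from, the core records is what lets a multi-sample view such as `protein_table` stay simple while annotations and analysis results accumulate per level.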
Case Study: Toponomic change in drug-treated Mø
[Figure: profiles of Calreticulin, BiP, ATPase, Bcl2, and Lamp1 across fractions 2 to 20 for Mock1, Mock2, and Thapsigargin; labels 8% and 28%; iTRAQ channels 114, 115, 116, 117.]
Visualization: Single experiment
• project manager
• all scans of the Mock 1 experiment
• peak map for run 29
• level 1 spectrum and corresponding CID (level 2) spectra
• CID spectra that have been selected
• detailed information about one of the level 2 spectra
Visualization: Multiple experiments
• (polymer?) contamination in all 4 runs (this would be hard to see with Pep3D)
• color scale: green = 0, red = 1
Visualization: assess, quantify, etc.
[Mock-up (software is under development): a grid of maps 1 to 6 and a grid of maps 1 to 4, each map plotted as retention time (min to max) against m/z (min to max), with some maps marked X. Note on one map: "Doesn't really match the remaining 3 maps!"]
Prequips & the Gaggle
[Diagram: the Gaggle Boss at the center, connected to DAVID, the KEGG Browser, Prequips, Cytoscape, Mayday, and the R statistical environment.]
Exchange of data structures such as name lists, lists of name-value pairs, matrices, and networks.
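The hub-and-spoke exchange described on this slide can be sketched as follows. This is a toy model of the idea only: the four payload types mirror the data structures named on the slide, but the class names, method names, and behavior (`Boss`, `Goose`, `broadcast`, and so on) are assumptions made for illustration and are not the Gaggle API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class NameList:
    names: List[str]


@dataclass
class NameValuePairs:
    values: Dict[str, float]


@dataclass
class Matrix:
    row_names: List[str]
    column_names: List[str]
    rows: List[List[float]]


@dataclass
class Network:
    nodes: List[str]
    edges: List[Tuple[str, str]]


@dataclass
class Goose:
    """A connected tool (hypothetical stand-in); it just records what it receives."""
    name: str
    inbox: list = field(default_factory=list)

    def receive(self, sender: str, payload) -> None:
        self.inbox.append((sender, payload))


class Boss:
    """Central relay (hypothetical): tools talk to the boss, never to each other."""

    def __init__(self) -> None:
        self.geese: Dict[str, Goose] = {}

    def register(self, goose: Goose) -> None:
        self.geese[goose.name] = goose

    def broadcast(self, sender: str, payload, target: Optional[str] = None) -> None:
        """Deliver to one named tool, or to every tool except the sender."""
        for name, goose in self.geese.items():
            if name == sender:
                continue
            if target is None or name == target:
                goose.receive(sender, payload)


boss = Boss()
prequips, cytoscape, mayday = Goose("Prequips"), Goose("Cytoscape"), Goose("Mayday")
for tool in (prequips, cytoscape, mayday):
    boss.register(tool)

# Send a selection of protein names to one tool, then a network to all others.
boss.broadcast("Prequips", NameList(["protein_a", "protein_b"]), target="Cytoscape")
boss.broadcast("Cytoscape", Network(["protein_a", "protein_b"],
                                    [("protein_a", "protein_b")]))
```

Because every tool only needs to understand the small set of shared payload types, adding a new tool means writing one adapter to the relay rather than one per existing tool.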
Cytoscape
Overall mouse protein-protein interaction map in Cytoscape
Analysis: Feature extraction
• protein table
• filters
• Gaggle plugin for interaction with other tools
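As a rough illustration of what filtering a protein table before a broadcast might look like, the sketch below keeps proteins that pass an identification probability cutoff and a fold-change cutoff. The column names, thresholds, and values are all assumptions made for the example; the slide does not say which filters Prequips provides.

```python
import math
from typing import Dict, List


def filter_proteins(rows: List[Dict[str, object]],
                    min_probability: float = 0.9,
                    min_abs_log2_ratio: float = 1.0) -> List[str]:
    """Return ids of rows passing both the probability and fold-change cutoffs."""
    selected = []
    for row in rows:
        if row["probability"] < min_probability:
            continue  # identification not confident enough
        if abs(math.log2(row["ratio"])) < min_abs_log2_ratio:
            continue  # change between conditions too small
        selected.append(row["id"])
    return selected


# Made-up rows for illustration; not data from the case study.
protein_table = [
    {"id": "protein_a", "probability": 0.99, "ratio": 2.5},
    {"id": "protein_b", "probability": 0.50, "ratio": 4.0},
    {"id": "protein_c", "probability": 0.95, "ratio": 1.1},
    {"id": "protein_d", "probability": 0.97, "ratio": 0.25},
]

selection = filter_proteins(protein_table)
```

The resulting list of ids is exactly the kind of name list that could then be handed to other tools.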
Analysis: Feature extraction
• calreticulin
• Gaggle plugin: selection for broadcast
Analysis: Feature selection
[Figure: Mock1, Mock2, Thapsigargin]
Prequips to Gaggle
[Same Gaggle Boss diagram as on the "Prequips & the Gaggle" slide.]
Gaggle to Cytoscape
[Same Gaggle Boss diagram as on the "Prequips & the Gaggle" slide.]
Integration: Network Analysis
[Figure: network for Thapsigargin colored by the 114 iTRAQ ratio, with labeled modules: chaperones, actin filament regulation, proteasome complex, ribosome large subunit.]
Cytoscape to Prequips
[Same Gaggle Boss diagram as on the "Prequips & the Gaggle" slide.]
Analysis: Feature extraction, module selection
• the ids sent from Cytoscape through the Gaggle
• proteasome proteins
Analysis: Functional enrichment
The proteasome complex is enriched compared to a mouse genome background.
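Enrichment against a background is commonly scored with a hypergeometric tail probability. The sketch below shows that calculation as one plausible way to read the statement above; the slide does not say which statistic was actually used, and the function name and the toy numbers are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           selected: int, hits: int) -> float:
    """P(X >= hits) when `selected` items are drawn without replacement from a
    population of `population` items, `annotated` of which carry the term."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(comb(annotated, k) * comb(population - annotated, selected - k)
               for k in range(hits, upper + 1))
    return tail / total


# Toy numbers for illustration only: a background of 10 genes, 3 of which carry
# the annotation, and a selection of 3 genes that all carry it.
p_all_hits = hypergeom_enrichment_p(population=10, annotated=3, selected=3, hits=3)
p_any = hypergeom_enrichment_p(population=10, annotated=3, selected=3, hits=0)
```

A small tail probability means that seeing this many annotated proteins in the selection would be unlikely if the selection were a random draw from the background.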
Prequips Summary
[Diagram: Prequips shown with the proteomics stack (raw data (MS, MS/MS), peptide Id + quantitation, protein Id + quantitation, multi-sample; Trans-Proteomic Pipeline) and the Gaggle tools: STRING (interaction database), Cytoscape (network analysis), Mayday and TIGR (microarray data analysis), KEGG (pathway database).]
Key properties of Prequips:
• handles multiple samples at all levels
• integrates high-level analysis tools
• is extensible
Conclusion
• General and extensible software for systems biology research with mass spectrometry proteomics data
• Integrates data from various sources for visualization and analysis
• An interactive environment that supports (visual) data exploration
Software details
• implemented in Java
• based on the Eclipse Rich Client Platform
• extremely modular architecture
• multiple plugin interfaces, e.g. viewers, data providers, algorithms
• meta information framework: analysis results, sequence information, annotation, ...
• data structures as plugins: a requirement for supporting future analytical tools and data sources
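The plugin idea listed above can be illustrated with a small registry of data providers. Prequips itself is built in Java on the Eclipse Rich Client Platform, so the Python below is only a language-neutral sketch of the pattern; the interface, class names, and file extensions chosen here are assumptions, not the actual Prequips plugin interfaces.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class DataProvider(ABC):
    """Hypothetical plugin interface: one provider per family of file formats."""

    extensions: Tuple[str, ...] = ()

    @abstractmethod
    def load(self, path: str) -> Dict[str, str]:
        """Return a description of the loaded data (stubbed in this sketch)."""


class RawDataProvider(DataProvider):
    extensions = (".mzxml", ".mzdata")

    def load(self, path: str) -> Dict[str, str]:
        return {"level": "raw", "source": path}


class PeptideDataProvider(DataProvider):
    extensions = (".pepxml",)

    def load(self, path: str) -> Dict[str, str]:
        return {"level": "peptide", "source": path}


class PluginRegistry:
    """The core only knows the interface; concrete providers are registered later."""

    def __init__(self) -> None:
        self._providers: List[DataProvider] = []

    def register(self, provider: DataProvider) -> None:
        self._providers.append(provider)

    def load(self, path: str) -> Dict[str, str]:
        for provider in self._providers:
            # str.endswith accepts a tuple of suffixes.
            if path.lower().endswith(provider.extensions):
                return provider.load(path)
        raise LookupError(f"no data provider registered for {path}")


registry = PluginRegistry()
registry.register(RawDataProvider())
registry.register(PeptideDataProvider())

raw = registry.load("run29.mzXML")          # hypothetical file names
peptides = registry.load("run29.pepXML")
```

Supporting a new file format then means registering one more provider, with no change to the core or to existing viewers and algorithms.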
Acknowledgements
• Special thanks to Nils Gehlenborg
• Hood Lab: Inyoul Lee
• Kay Nieselt
• Aebersold Lab: Nichole King, James Eddes, Eric Deutsch, Ning Zhang, David Shteynberg, Wei Yan, and Andrew Garbutt
• Paul Shannon for help with the Gaggle
[Diagram labels: Prequips, Gaggle, Mayday Core, Visualization, WEKA Library, Machine Learning, SBEAMS, SBEAMS installation, R, R environment, Bioconductor, Database, PostgreSQL database, MySQL database, Excel, anything else.]