XML Standards for Proteomics Data
• Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt
• Department of Computing Science and the Institute of Biomedical and Life Sciences, University of Glasgow

Proteomics
1. 2D-PAGE to separate proteins
2. Image analysis to determine the volume of protein spots
3. Mass spectrometry (MS) to characterise protein spots
4. Database searches to identify proteins
[Figure: four-stage workflow, numbered 1 to 4: 2D-PAGE, Image Analysis, Mass Spectrometry, Database Search]

Proteomics Data Issues
• Many different instruments for data collection
• Great variety of software used for analysis
• Access to external databases
  • For protein identification
  • Protein characterisation after ID
• High-throughput techniques generate very large data sets
[Figure: three groups of data sources. Instruments: scanner, MS. Software: image analysis, MS viewer. Databases: genome, microarray, publications, more...]

A Standard Model for Proteomics
• Improve management of laboratory workflows
• Data integration: link local data to external data sources
• Development of public databases, enabling:
  • Queries over protocols, raw data and analysis
  • Experiments to be reproduced or re-analysed by other research groups
  • Co-analysis of proteome data with genome, transcriptome and other resources

Biological Collaborators
• Parasitology research group
  • Investigating host-parasite response with Toxoplasma gondii
• Ras/Raf pathway research at the Beatson Institute
• Functional Genomics Facility at the IBLS
Functional Genomics Facility - http://www.gla.ac.uk/departments/ibls/ASU/fgf/

MAGE model for Proteomics
• The MAGE model has been developed to store microarray protocols, data and analysis
• A similar model will facilitate integration between microarray and proteome data
• Aspects of the model require few modifications to be applicable to proteomics
• We are developing a new representation of 2D gel analysis and MS data

Experimental Protocols in MAGE
• MAGE model is extensible
• Protocol is generated as an ordered list: events, materials and hardware
• Few changes required to focus on protein extraction rather than mRNA production
[Figure: MAGE packages: Protocol, ArrayDesign, BioEvent, BioMaterial, BioAssay, Array]

Experimental Protocols for 2D gels
• MAGE model is extensible
• Protocol is generated as an ordered list: events, materials and hardware
• Few changes required to focus on protein extraction rather than mRNA production
[Figure: adapted packages: Protocol, 2D_PAGE_Setup, BioEvent, BioMaterial, BioAssay, 2D_PAGE]

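The idea of a protocol as an ordered list of events, each with its materials and hardware, can be illustrated with a minimal Python sketch. This is not the MAGE object model: the class names `ProtocolStep` and `Protocol`, the method names, and the example step contents are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProtocolStep:
    """One entry in the ordered protocol list (hypothetical structure)."""
    event: str                                        # what was done
    materials: List[str] = field(default_factory=list)
    hardware: List[str] = field(default_factory=list)


@dataclass
class Protocol:
    """An ordered list of steps; list order is the order of execution."""
    name: str
    steps: List[ProtocolStep] = field(default_factory=list)

    def add_step(self, event, materials=(), hardware=()):
        # Append keeps the steps in the order they were performed.
        self.steps.append(ProtocolStep(event, list(materials), list(hardware)))
        return self

    def describe(self):
        # Number the events from 1 in protocol order.
        return [f"{i}. {s.event}" for i, s in enumerate(self.steps, start=1)]


def build_2d_page_protocol():
    """Illustrative protocol only; the materials and hardware are assumptions."""
    return (
        Protocol("2D-PAGE sample preparation")
        .add_step("protein extraction", ["cell lysate"], ["centrifuge"])
        .add_step("isoelectric focusing", ["IPG strip"], ["IEF unit"])
        .add_step("SDS-PAGE", ["polyacrylamide gel"], ["electrophoresis tank"])
    )
```

Replacing the first step with an mRNA extraction event would give a microarray-style protocol without changing the structure, which is the sense in which few changes are needed.
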
Proteomics Data Model
• Image analysis identifies spots observable on the gel
• Important to store raw data and analysis from MS
• Separate package for cross-gel analysis, e.g. time series
[Figure: data packages, linked from Protocol: 2D_PAGE, Protein_Spots, Data_Analysis, Multiple_Analysis, MS_Setup, MS_Data, BioSequence]

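As a rough illustration of how gel spots and their MS data might be captured in XML, the sketch below builds a small document with Python's standard `xml.etree.ElementTree`. The element names `Experiment`, `gelImage`, `spots` and `spot` are taken from the XML dictionary shown on the indexing slide; the `msData` element and the `id`, `volume` and `href` attributes are assumptions, not part of the model presented here.

```python
import xml.etree.ElementTree as ET


def build_experiment(spots):
    """Build an XML tree for one gel.

    spots: iterable of (spot_id, volume, ms_file) tuples. The tuple layout
    and the attribute names are hypothetical.
    """
    experiment = ET.Element("Experiment")
    gel = ET.SubElement(experiment, "gelImage")
    spots_el = ET.SubElement(gel, "spots")
    for spot_id, volume, ms_file in spots:
        spot = ET.SubElement(spots_el, "spot", id=spot_id, volume=str(volume))
        if ms_file is not None:
            # Link the spot to its raw MS data (assumed element name).
            ET.SubElement(spot, "msData", href=ms_file)
    return experiment


def spots_with_ms(experiment):
    """Return the ids of spots that have MS data attached."""
    return [s.get("id") for s in experiment.iter("spot")
            if s.find("msData") is not None]
```

Calling `ET.tostring(build_experiment(...), encoding="unicode")` serialises the tree, so the same structure can be stored, exchanged or queried.
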
Proteomics Model
• Experimental protocol packages require few changes from MAGE
• New data model includes MS data and statistical analysis between gels
• Model incorporates storage of external database searches
[Figure: package overview. Protocol: Protocol, BioEvent, 2D_PAGE_Setup, BioMaterial, BioAssay. Data: 2D_PAGE, Experiment, Protein_Spots, Multiple_Analysis, Data_Analysis, MS_Setup, MS_Data, BioSequence. Shared: Annotation, Audit&Security, Common, BQS, Description, Measurement]

Proteomics Database and Indexing Technology
• A prototype database for proteomics has been developed
• We have developed a specialised index structure for XML to improve query performance
• The index has so far been tested with 800 MB of protein data [1]
[Figure: XML index over the data stores, made up of a data path tree with numbered nodes and an XML dictionary mapping codes to element names: 1 Experiment, 2 gelImage, 3 spots, 4 spot, …]
[1] Protein Information Resource - http://pir.georgetown.edu/

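The slide does not describe the index structure in detail, so the following is only a generic path-index sketch in Python, assuming a dictionary that assigns an integer code to each element name and a table keyed by root-to-node code paths. The class `PathIndex`, its methods and the sample document are hypothetical and should not be read as the authors' implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class PathIndex:
    """Generic path index: element-name dictionary plus path -> values table."""

    def __init__(self):
        self.dictionary = {}             # element name -> integer code
        self.paths = defaultdict(list)   # tuple of codes -> text values

    def _code(self, tag):
        # Assign codes 1, 2, 3, ... in order of first appearance.
        if tag not in self.dictionary:
            self.dictionary[tag] = len(self.dictionary) + 1
        return self.dictionary[tag]

    def add_document(self, root):
        self._walk(root, ())

    def _walk(self, node, prefix):
        path = prefix + (self._code(node.tag),)
        text = (node.text or "").strip()
        if text:
            self.paths[path].append(text)
        for child in node:
            self._walk(child, path)

    def lookup(self, *tags):
        """Return the text values stored under the exact path of tags."""
        try:
            path = tuple(self.dictionary[t] for t in tags)
        except KeyError:
            return []                    # unknown element name: nothing indexed
        return list(self.paths.get(path, []))


SAMPLE = ("<Experiment><gelImage><spots>"
          "<spot>spot A</spot><spot>spot B</spot>"
          "</spots></gelImage></Experiment>")


def build_sample_index():
    idx = PathIndex()
    idx.add_document(ET.fromstring(SAMPLE))
    return idx
```

With such a table, a query such as `lookup("Experiment", "gelImage", "spots", "spot")` is answered from the index alone, without re-parsing the stored documents.
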
Related Research
Databases:
• SWISS-2DPAGE, LIMS systems
Standards:
• Proteomics Standards Initiative (PSI)
  • Standards for protein-protein interactions and mass spectrometry
• PEDRo system with PEML: Proteomics Experiment Markup Language
PSI: http://psidev.sourceforge.net/

Work In Progress
• Working towards an XML standard for proteomics
• Creating standards for capturing statistical processing of large data sets
• Developing XML indexing technology to improve data integration and query power
• Developing a proteome database that uses XML indexing and a standard model

Contact
• jonesa@dcs.gla.ac.uk
• Bioinformatics Research Centre - www.brc.dcs.gla.ac.uk
Acknowledgements
• Researchers in Jonathan Wastling's lab for input into the model.
• Dr Ashwin Kotiwaliwale at the Beatson for the collaboration on the prototype database.
• The Functional Genomics Facility is supported by a Wellcome Trust grant for £2.4M.
• My research is supported by an MRC Bioinformatics PhD studentship; Ela Hunt is supported by an MRC Fellowship.