

caGrid Version 0.5 Reference Implementation: RProteomics. caBIG Architecture Workspace Face to Face, Georgetown University, August 16th-18th, 2005. Patrick McConnell, Duke Comprehensive Cancer Center, patrick.mcconnell@duke.edu. Shannon Hastings, Ohio State University, hastings@bmi.osu.edu.


Presentation Transcript


  1. caGrid Version 0.5 Reference Implementation: RProteomics
     caBIG Architecture Workspace Face to Face, Georgetown University, August 16th-18th, 2005
     Patrick McConnell, Duke Comprehensive Cancer Center, patrick.mcconnell@duke.edu
     Shannon Hastings, Ohio State University, hastings@bmi.osu.edu

  2. Outline • High Level Overview of Proteomics • Data Model • Project Architecture • Process of getting to “Silver” level compliance • Functionality Exposed to Grid • Process of Grid Enablement • Demo/Screenshots • Lessons Learned / Technical Difficulties / Wish List • Acknowledgements

  3. Proteomics Overview
     • Goal
       • Find biomarker
       • Build predictive model
     • Proteins are split into peptide fragments
     • Mass is measured by time-of-flight (TOF)
     • Mass of peptides can be used to identify proteins
     • Peptides can undergo a second MS to help identification
     Source: http://www.appliedbiosystems.com/catalog/myab/StoreCatalog/products/CategoryDetails.jsp?hierarchyID=101&category3rd=112051&trail=no
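The time-of-flight bullet can be made concrete with the textbook relation for an idealised linear TOF analyser: an ion accelerated through a potential U and drifting a length L arrives after t = L * sqrt((m/z) / (2 e U)). The sketch below assumes that idealised model (no reflectron, no delayed extraction); the function names and the voltage and tube length used in the usage note are illustrative and do not come from the presentation.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # coulombs per unit charge
DALTON_KG = 1.66053906660e-27          # kilograms per dalton


def flight_time_s(mz_da, accel_voltage_v, tube_length_m):
    """Flight time (s) of an ion with mass-to-charge ratio mz_da (Da per charge)."""
    mass_per_charge_kg = mz_da * DALTON_KG
    return tube_length_m * math.sqrt(
        mass_per_charge_kg / (2 * ELEMENTARY_CHARGE_C * accel_voltage_v)
    )


def mz_from_flight_time(t_s, accel_voltage_v, tube_length_m):
    """Invert the relation: m/z = 2 e U (t / L)^2, returned in Da per charge."""
    kg_per_charge = (
        2 * ELEMENTARY_CHARGE_C * accel_voltage_v * (t_s / tube_length_m) ** 2
    )
    return kg_per_charge / DALTON_KG
```

For example, `mz_from_flight_time(flight_time_s(1000.0, 20000.0, 1.0), 20000.0, 1.0)` recovers the original m/z up to rounding, and a four-fold heavier ion takes twice as long to arrive.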

  4. Proteomics Data • A modest study can be on the order of 10 GB of data

  5. Project Overview
     • RProteomics is a development project in the Proteomics SIG of the ICR Workspace
     • Developing analytical routines for proteomics data
       • Denoising, background removal, peak identification, spectral alignment, normalization, peptide quantitation
     • Focus is on analytics
       • NOT databases, LIMS, protein identification
     • RProteomics is a critical step in the proteomics pipeline
       • LIMS -> repository -> RProteomics -> classification -> protein identification
     • RProteomics provides integration
       • Q5 classification has been integrated
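To give a feel for two of the routines named above, here is a minimal sketch of background removal and peak identification on a list of intensities. It is not the RProteomics algorithm, which the slides do not describe: the rolling-minimum baseline and the median-based cutoff are stand-in techniques chosen for brevity, and the parameters `window_size` and `threshold_multiplier` only echo the WindowSize and ThreshholdMultiplier service parameters listed on the Data Model slide, whose actual semantics are an assumption here.

```python
from statistics import median


def remove_background(intensities, window_size):
    """Subtract a rolling-minimum baseline (a stand-in for real background removal)."""
    half = window_size // 2
    corrected = []
    for i, value in enumerate(intensities):
        lo = max(0, i - half)
        hi = min(len(intensities), i + half + 1)
        baseline = min(intensities[lo:hi])
        corrected.append(value - baseline)
    return corrected


def find_peaks(intensities, threshold_multiplier):
    """Return indices of local maxima above threshold_multiplier * median intensity."""
    cutoff = threshold_multiplier * median(intensities)
    peaks = []
    for i in range(1, len(intensities) - 1):
        value = intensities[i]
        if value > cutoff and value > intensities[i - 1] and value >= intensities[i + 1]:
            peaks.append(i)
    return peaks
```

A typical call chains the two steps: `find_peaks(remove_background(spectrum, 11), 5)`. Production code would add denoising and alignment before peak picking, as the bullet list indicates.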

  6. Statistics: Background Removal

  7. Statistics: Denoising

  8. Statistics: Spectral Alignment

  9. Statistics: Protein Quantitation

  10. Data Model
     • mzXML
       • Encodes raw spectra data (mz-intensity pairs)
       • Some metadata about instrumentation
       • Utilizes base64 encoding for binary data
     • scanFeatures
       • Encodes analysis results as a set of features
       • Some metadata about the experiment
       • Utilizes base64 encoding for binary data
     • Service parameters
       • JpegImage (GWSDL=scanFeatures.xsd, )
       • Lsid (GWSDL=scanFeatures.xsd, )
       • WindowSize (GWSDL=scanFeatures.xsd, )
       • ThreshholdMultiplier (GWSDL=scanFeatures.xsd, )
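The mzXML bullets say that spectra are m/z-intensity pairs stored as base64-encoded binary. A minimal sketch of that encoding follows, assuming 32-bit big-endian (network order) floats interleaved as m/z, intensity. That is a common mzXML layout, but the slide does not state the precision or byte order, so treat both as assumptions; the function names are illustrative.

```python
import base64
import struct


def encode_peaks(pairs):
    """Pack (mz, intensity) pairs as big-endian float32 values, then base64-encode."""
    flat = [value for pair in pairs for value in pair]
    raw = struct.pack(f">{len(flat)}f", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_peaks(text):
    """Decode base64 text back into a list of (mz, intensity) tuples."""
    raw = base64.b64decode(text)
    count = len(raw) // 4  # four bytes per float32 value
    values = struct.unpack(f">{count}f", raw)
    return list(zip(values[0::2], values[1::2]))
```

Round-tripping `decode_peaks(encode_peaks(pairs))` returns the input pairs whenever the values are exactly representable as 32-bit floats; other values come back rounded to float32 precision.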

  11. Project Architecture

  12. Project Architecture

  13. Process of getting “Silver” level compliance
     • Programming and messaging interfaces
       • Apache Axis for web services
       • Wrapped functionality with Java interfaces that “made sense”
     • Vocabularies, terminologies, and ontologies
     • Data elements
       • Wrote tool for XML Schema to XMI conversion
       • Manually curated UML
       • Went through semantic connecting process
     • Information models
       • XML Schema to begin with, so information models were easy

  14. Functionality Exposed to the Grid
     • Analytical service: no security requirements
     • Discuss its input and output and what it does scientifically
     • Functionality to be exposed:
       • 20+ more statistical methods
       • Data access methods, translation methods (planned, not yet in scope)

  15. Process of Grid Enablement
     • Process
       • Create or extract the data types using XML Schema
       • Upload the data types into the caGrid GME
       • Use the Analytical Toolkit Portal to create and modify the grid service interface
       • Implement the generated server stub by making the appropriate calls into the original, non-grid-enabled RProteomics application
       • Compile and deploy
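The stub-implementation step amounts to a thin delegation layer: the service method unpacks a typed request and calls into the existing application. The sketch below illustrates that pattern only. It is written in Python for brevity rather than the Java used by the project, and every name in it (`DenoiseRequest`, `LegacyRProteomics`, `DenoiseServiceStub`) is hypothetical; none is generated by the Analytical Toolkit Portal or taken from RProteomics.

```python
from dataclasses import dataclass


@dataclass
class DenoiseRequest:
    """Hypothetical parameter object; fields mirror the service parameters in spirit."""
    lsid: str
    window_size: int
    threshold_multiplier: float


class LegacyRProteomics:
    """Stand-in for the original, already-tested non-grid application."""

    def denoise(self, lsid, window_size, threshold_multiplier):
        # A real implementation would run the statistical routine here.
        return f"denoised:{lsid}:{window_size}:{threshold_multiplier}"


class DenoiseServiceStub:
    """Grid-facing stub: no analysis logic of its own, only delegation."""

    def __init__(self, backend):
        self._backend = backend

    def denoise(self, request):
        return self._backend.denoise(
            request.lsid, request.window_size, request.threshold_multiplier
        )
```

Keeping the stub free of analysis logic means the original application can still be tested on its own, and wrapping the parameters in one object keeps the service signature stable when parameters are added later.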

  16. Demo and/or Screenshots • Demonstration of RProteomics GUI with grid functionality

  17. Lessons Learned / Technical Difficulties / Wish List
     • Think grid from the beginning
       • Have an idea what the service interface will be ahead of time
       • Wrap parameters with objects
     • Technology is complex
       • XML, Schema, CDEs, Globus, Web Services, etc.
     • Installation is complex
       • Have to have working knowledge of Tomcat, Axis, Ant, environment variables, etc.
       • Need to have compatible versions of each component, esp. Java 1.4.2_04
     • Wish list
       • Wizard for grid-enabling existing code
       • Documentation of every aspect of installation and functionality
       • Clone Shannon for each development project

  18. Lessons Learned / Technical Difficulties / Wish List • Starting with a non-grid-enabled application that has been tested and is stable made wrapping it as a grid service easier to debug. • Need a standard mechanism for dealing with large data objects. • Some sort of lazy-loaded object/pointer would be sufficient. • Integrating the toolkit portal into some standard IDEs might make development even easier.
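The lazy-loaded object/pointer idea can be sketched as a small reference type that carries only an identifier across the service boundary and fetches the payload on first access. This is one possible shape under stated assumptions, not a caGrid mechanism: `LazyDataRef` and the `loader` callback are hypothetical names, and the loader would be whatever retrieval call the deployment provides.

```python
class LazyDataRef:
    """Holds an identifier and defers loading the large payload until it is needed."""

    def __init__(self, identifier, loader):
        self.identifier = identifier
        self._loader = loader   # callable: identifier -> data
        self._data = None
        self._loaded = False

    def get(self):
        """Load the payload on first call, then return the cached copy."""
        if not self._loaded:
            self._data = self._loader(self.identifier)
            self._loaded = True
        return self._data
```

A service can then pass `LazyDataRef("some-id", fetch_spectrum)` between components (with `fetch_spectrum` supplied by the caller) and pay the transfer cost only where the spectrum is actually read.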

  19. Acknowledgements
     • Duke, ICR Developer
       • Patrick McConnell, Project lead
       • Richard Haney, Architect and developer of statistical systems
       • Salvatore Mungal, Middle-tier Java developer
       • Mark Peedin, Database developer
     • Northwestern University, Collaborator
       • Simon Lin, Proteomics domain expert
     • Oregon Health Sciences University, ICR Adopter
       • Shannon McWeeney
       • Veena Rajaraman
     • University of Pennsylvania, ICR Adopter
       • David Fenstermacher
       • Craig Street
     • University of North Carolina, Collaborator
       • Cristoph Borchers, Proteomics scientist
     • OSU, caGRID Team
       • Shannon Hastings
       • Scott Oster
       • Stephen Langella
       • Tahsin Kurc
       • Joel Saltz
     • Architecture
       • Arumani Manisundaram
       • Avinash Krishnakant
     • VCDE
       • Brian Davis, Workspace Lead
       • George Komatsoulis, VCDE lead
       • Claire Wolfe, VCDE curator
       • Salvatore Mungal, VCDE mentor
