This presentation discusses the use of provenance services for traceability in biomedical analysis. It covers the requirements from users, the bigger picture of user expectations, and the implementation of a provenance service called CRISTAL. The presentation highlights the importance of a robust provenance system in ensuring confidence in research results.
Research Traceability using Provenance Services for Biomedical Analysis Dr Peter Bloodsworth CCCS Research Centre UWE, Bristol, UK peter.bloodsworth@cern.ch HealthGrid Presentation: 29th of June 2010
Talk Structure • The neuGRID Project. • Requirements from Users. • The Bigger Picture. • A Provenance Service. • CRISTAL. • Conclusion.
The neuGRID Consortium • Provincia Lombardo Veneta Fatebenefratelli, ITALY • Neuralyse Europe (Prodema Medical), SWITZERLAND • University of the West of England, Bristol, UK • Maat Gknowledge, SPAIN • Vrije Universiteit Medical Centre, THE NETHERLANDS • Karolinska Institutet, SWEDEN • HealthGrid, FRANCE • CF consulting s.r.l., ITALY
Project Objectives • To build a new user-friendly Grid-based research e-Infrastructure. • Collection/archiving of large amounts of imaging data. • Paired with computationally intensive data analyses. • To enable EU neuroscientists to carry out cutting-edge research. • Imaging of degenerative brain diseases.
neuGRID Provenance Requirements Provenance in neuGRID relates to: 1. Data provenance (source, quality control applied and other facets). 2. Workflow provenance (author, versioning, certification, etc.). 3. Analysis result provenance (data set, workflow chosen, settings, errors, etc.).
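The three kinds of provenance listed above can be pictured as linked records. The following is only a minimal sketch: the class names, field names and sample values are assumptions made for illustration, not the neuGRID or CRISTAL data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataProvenance:
    """Where a data set came from and which quality controls it passed."""
    source: str
    quality_controls: Tuple[str, ...] = ()


@dataclass(frozen=True)
class WorkflowProvenance:
    """Who wrote a workflow, which version it is, and whether it is certified."""
    author: str
    version: str
    certified: bool = False


@dataclass(frozen=True)
class AnalysisResultProvenance:
    """Links a result to the data set and workflow that produced it."""
    data_set: DataProvenance
    workflow: WorkflowProvenance
    settings: Tuple[Tuple[str, str], ...] = ()
    errors: Tuple[str, ...] = ()

    def chain(self) -> Tuple[str, str, str]:
        """Summarise the record as (data source, workflow id, outcome)."""
        outcome = "failed" if self.errors else "ok"
        workflow_id = f"{self.workflow.author}@{self.workflow.version}"
        return (self.data_set.source, workflow_id, outcome)


# Hypothetical example values.
scan = DataProvenance(source="site-A/scan-001", quality_controls=("visual-check",))
pipeline = WorkflowProvenance(author="jdoe", version="1.2", certified=True)
result = AnalysisResultProvenance(scan, pipeline, settings=(("smoothing", "8mm"),))
print(result.chain())
```

Keeping the records immutable reflects the idea that provenance, once captured, should not be edited after the fact.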
The Bigger Picture • Real-world end users care about doing their research and getting their results. • They don’t care about the grid, certificates or virtual organisations. • They don’t want to learn grid-speak. • They don’t all want to do the same things in the same way. • They expect services that help them to do their work. • They expect reliability and a high level of integration between services.
The neuGRID Provenance Service
The Provenance Architecture • Provenance API • Translator • CRISTAL Core • Provenance DB
Service Wrapper • Provides a web service-based interface to the Provenance Service • Consists of methods for • Creating workflows • Creating workflow instances • Storing workflow provenance • Retrieving workflow provenance
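The four groups of methods above can be sketched as a small in-memory class. This is an illustration under stated assumptions: the class name, method names, signatures and record layout are hypothetical and are not the actual neuGRID web service interface, which the slides do not show.

```python
import itertools
from typing import Any, Dict, List


class InMemoryProvenanceService:
    """Hypothetical stand-in for the service wrapper; stores everything in memory."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._workflows: Dict[int, str] = {}
        self._instances: Dict[int, int] = {}  # instance id -> workflow id
        self._records: Dict[int, List[Dict[str, Any]]] = {}

    def create_workflow(self, definition: str) -> int:
        """Register a workflow definition and return its id."""
        workflow_id = next(self._ids)
        self._workflows[workflow_id] = definition
        return workflow_id

    def create_workflow_instance(self, workflow_id: int) -> int:
        """Create one run of a registered workflow and return the instance id."""
        if workflow_id not in self._workflows:
            raise KeyError(f"unknown workflow {workflow_id}")
        instance_id = next(self._ids)
        self._instances[instance_id] = workflow_id
        self._records[instance_id] = []
        return instance_id

    def store_provenance(self, instance_id: int, record: Dict[str, Any]) -> None:
        """Append a provenance record to a workflow instance."""
        self._records[instance_id].append(dict(record))

    def retrieve_provenance(self, instance_id: int) -> List[Dict[str, Any]]:
        """Return copies of all records stored for a workflow instance."""
        return [dict(record) for record in self._records[instance_id]]


service = InMemoryProvenanceService()
wf = service.create_workflow("cortical-thickness pipeline (placeholder definition)")
run = service.create_workflow_instance(wf)
service.store_provenance(run, {"step": "registration", "status": "done"})
print(service.retrieve_provenance(run))
```

A real deployment would expose these operations over a web service and persist them in the provenance database rather than in dictionaries.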
Translator • To prevent lock-in to a specific workflow format, the Provenance Service includes an adaptor-based translator that converts user workflows into the CRISTAL workflow format. • Acts as a bridge between users and the CRISTAL core.
CRISTAL Core • Provenance management is handled internally by CRISTAL. • Workflows need to be translated between the user format and the CRISTAL format.
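An adaptor-based translator of the kind described above can be sketched as a registry with one adaptor per input format. Everything here is an assumption for illustration: the two input formats ("json" and "lines"), the function names, and the use of a plain list of step names as a stand-in for the CRISTAL workflow format.

```python
import json
from typing import Callable, Dict, List

# Stand-in for the CRISTAL workflow format: an ordered list of step names.
InternalWorkflow = List[str]
Adaptor = Callable[[str], InternalWorkflow]

_ADAPTORS: Dict[str, Adaptor] = {}


def register_adaptor(fmt: str) -> Callable[[Adaptor], Adaptor]:
    """Decorator that registers an adaptor for one user workflow format."""
    def decorator(func: Adaptor) -> Adaptor:
        _ADAPTORS[fmt] = func
        return func
    return decorator


@register_adaptor("json")
def from_json(text: str) -> InternalWorkflow:
    """Hypothetical JSON format: {"steps": [{"name": ...}, ...]}."""
    return [step["name"] for step in json.loads(text)["steps"]]


@register_adaptor("lines")
def from_lines(text: str) -> InternalWorkflow:
    """Hypothetical plain-text format: one step name per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def translate(fmt: str, text: str) -> InternalWorkflow:
    """Pick the adaptor for the given format and convert the workflow."""
    try:
        adaptor = _ADAPTORS[fmt]
    except KeyError:
        raise ValueError(f"no adaptor registered for format {fmt!r}") from None
    return adaptor(text)


print(translate("lines", "skull-strip\nregister\nsegment\n"))
print(translate("json", '{"steps": [{"name": "skull-strip"}, {"name": "register"}]}'))
```

Supporting a new workflow format then means adding one adaptor function, with no change to the core that consumes the internal form.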
CRISTAL was designed to track the development of LHC detector components at CERN
CRISTAL in neuGRID Overview [Figure: CRISTAL process and data tracking. Diagram labels: Researcher, Analysis Suite, Input Data, LORIS, Derived Data, Provenance Data, Workflow steps, Analysis data, Histories. Caption: A Complete Analysis Knowledge Base]
CRISTAL Main Functions • Complete capture of system functionality in workflows. • As every action is represented by a workflow activity, every operation is recorded and stored in a replayable way. • Every piece of data, including descriptions, is versioned, so all previous states of items are available. • Several interfaces exist to bridge to other components for database storage, job distribution, definition management, etc.
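The combination of recorded actions and versioned data described above can be illustrated with a small event log that is replayed to rebuild any earlier state. This is a generic sketch of the idea, not CRISTAL's implementation; the class, its methods and the sample values are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class VersionedItem:
    """Hypothetical item whose state is derived purely from recorded events."""
    name: str
    _events: List[Tuple[str, Any]] = field(default_factory=list)

    def record(self, key: str, value: Any) -> int:
        """Store one change and return the version number it produced."""
        self._events.append((key, value))
        return len(self._events)

    def state_at(self, version: int) -> Dict[str, Any]:
        """Replay the first `version` events to rebuild that state."""
        if not 0 <= version <= len(self._events):
            raise ValueError(f"version {version} does not exist")
        state: Dict[str, Any] = {}
        for key, value in self._events[:version]:
            state[key] = value
        return state

    def current(self) -> Dict[str, Any]:
        """Return the latest state."""
        return self.state_at(len(self._events))


item = VersionedItem("scan-001")
item.record("status", "raw")
item.record("status", "qc-passed")
item.record("operator", "jdoe")
print(item.state_at(1))
print(item.current())
```

Because nothing is overwritten, every previous state stays available, which is the property the slide attributes to CRISTAL's versioning.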
Further Developments • Composite jobs. If some tasks are clustered together, they should be executed by CRISTAL as a composite activity. • In composite jobs, each sub-job should send feedback to CRISTAL as soon as it completes its execution. • The Glueing Service should hold user-related information to map users to jobs and provenance data. • The Querying Service should query both CRISTAL provenance and LORIS data. • The translation component in the pipeline service should map user workflows to CRISTAL workflows. The translation should be two-way.
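The composite-job behaviour proposed above, where each sub-job reports back as soon as it finishes, can be sketched as follows. The class name, method names and status strings are assumptions for illustration; the slides do not specify how CRISTAL would implement composite activities.

```python
from typing import Dict, List, Tuple


class CompositeActivity:
    """Hypothetical composite activity that tracks feedback from its sub-jobs."""

    def __init__(self, name: str, sub_jobs: List[str]) -> None:
        if not sub_jobs:
            raise ValueError("a composite activity needs at least one sub-job")
        self.name = name
        self._status: Dict[str, str] = {job: "pending" for job in sub_jobs}
        # Feedback is kept in arrival order so the history can be inspected later.
        self.feedback_log: List[Tuple[str, str]] = []

    def report(self, sub_job: str, outcome: str) -> None:
        """Called by a sub-job as soon as it completes."""
        if sub_job not in self._status:
            raise KeyError(f"unknown sub-job {sub_job!r}")
        self._status[sub_job] = outcome
        self.feedback_log.append((sub_job, outcome))

    @property
    def finished(self) -> bool:
        """True once every sub-job has reported an outcome."""
        return all(status != "pending" for status in self._status.values())


activity = CompositeActivity("preprocess", ["skull-strip", "register"])
activity.report("register", "done")      # sub-jobs may finish in any order
print(activity.finished)
activity.report("skull-strip", "done")
print(activity.finished)
```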
Conclusions • A robust provenance system is necessary if users are to have confidence in and use the neuGRID infrastructure for their research. • Provenance is important throughout neuGRID, from data input through to analysis output. • Errors that occur at any stage may affect the final results. • Provenance can be thought of as a chain of evidence that spans: • Data provenance (source, quality control applied and other facets). • Workflow provenance (source, versioning, certification, etc.). • Analysis result provenance (data set, workflow chosen, settings, errors, etc.). • We need CRISTAL, a resource that is both powerful and flexible in the way it captures provenance data.
Question Time None like this please!!