PIMS data management and harvesting
General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?
Information Management System • Information Management System (IMS) is a joint database and information management system • A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data • Information management is the handling of knowledge acquired by many disparate sources in a way that optimizes access by all who have a share in that knowledge
Scientific goals • Recording laboratory information • A lot of data keeping • 10,000s of experiments • 1,000,000s of samples • Data interchange and interoperation • Collaboration in protein production • Share data between stages and sites • Data transfer to beamline or NMR ops • Data mining and reporting • Analysis • Negative results can be mined to improve methods • Scientific publications • Data deposition
PIMS • Protein Information Management System • Started in January 2005 • Five-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC) • Based on the Protein Production Data Model paper • “Design of a data model for developing laboratory information management and analysis systems for protein production.” Proteins. 2005 Feb 1;58(2):278-84.
Scope of PIMS • Molecular Biology: Target selection, Bioinformatics import, Target optimisation, Cloning, Expression, Purification & Concentration, Crystallisation, Microcrystals export • Crystallography: Data collection, Phasing, Model building, Refinement
BBSRC SPoRT funding • Scottish Structural Proteomics Facility (SSPF): Universities of Dundee, St. Andrews, Glasgow and Warwick • Membrane Protein Structure Initiative (MPSI): Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury • Protein Information Management System (PIMS): CCP4, Diamond • Oxford Protein Production Facility • IBBMC, University Paris Sud • European Bioinformatics Institute • York Structural Biology Laboratory • Daresbury Laboratory • Other UK protein scientists • Other protein scientists worldwide • Diagram labels: BBSRC funding, PIMS, SSPF, MPSI, Stakeholders
Collaborations • Seamless data transfer and a consistent UI ... • ... from target to structure deposition • ... so far as possible • Bioinformatics: SSPF pipeline, EBI workflow • Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT) • Data transfer: e-HTPX • Data collection: DNA, X-track • Structure solution: CCP4, CCPN • Instruments: Kendro, CSols
General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?
Design • The data model • focuses on what data should be stored • is used to design the entities (classes or tables) that we are dealing with, their various attributes, and their relationships • The goal of the data model is to make sure that all the data objects required are completely and accurately represented
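To make "entities, attributes and relationships" concrete, here is a minimal Python sketch. It is not the PIMS data model: the class names `Target`, `Sample` and `Experiment`, their attributes, and the helper `samples_for_target` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """Entity: the protein under investigation (attributes are hypothetical)."""
    name: str
    sequence: str


@dataclass
class Sample:
    """Entity: a physical sample; the `target` attribute is a relationship."""
    label: str
    target: Target


@dataclass
class Experiment:
    """Entity: one laboratory step that consumes and produces samples."""
    category: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)


def samples_for_target(experiments: List[Experiment], target: Target) -> List[str]:
    """Return the labels of every output sample linked to the given target."""
    return [s.label for e in experiments for s in e.outputs if s.target is target]


# Illustrative data only: two experiments chained through a shared sample.
target = Target(name="example-target", sequence="MKT")
plasmid = Sample(label="plasmid-001", target=target)
protein = Sample(label="protein-001", target=target)
experiments = [
    Experiment(category="Cloning", outputs=[plasmid]),
    Experiment(category="Expression", inputs=[plasmid], outputs=[protein]),
]
```

Calling `samples_for_target(experiments, target)` walks the relationships from experiments to samples to target, which is the kind of tracking question a complete and accurate model is meant to answer.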
Reliability • Loss of data is inexcusable • Must be able to correct wrong data • Must keep audit trails • Must allow future changes • All made feasible by • Data model • Database • Software engineering standards
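One way to combine "correct wrong data" with "keep audit trails" is to log every correction next to the value it replaces. The sketch below assumes a simple in-memory record; `AuditedRecord`, `AuditEntry` and the field name `concentration_mg_ml` are hypothetical and do not describe how PIMS implements auditing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the audit trail."""
    when: datetime
    user: str
    field_name: str
    old_value: Any
    new_value: Any


class AuditedRecord:
    """A record whose corrections are appended to a history, never lost."""

    def __init__(self, **values: Any) -> None:
        self._values: Dict[str, Any] = dict(values)
        self.history: List[AuditEntry] = []

    def get(self, field_name: str) -> Any:
        return self._values[field_name]

    def correct(self, user: str, field_name: str, new_value: Any) -> None:
        """Change a value and log who changed it, when, and what it was before."""
        old_value = self._values.get(field_name)
        self.history.append(
            AuditEntry(datetime.now(timezone.utc), user, field_name, old_value, new_value)
        )
        self._values[field_name] = new_value


# Illustrative use: a mistyped concentration is corrected, and the old value survives.
record = AuditedRecord(concentration_mg_ml=1.0)
record.correct(user="alice", field_name="concentration_mg_ml", new_value=10.0)
```

After the correction the record reports the new value, while `record.history` still holds the original one together with the user and timestamp.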
Ancestry • HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories. Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8. Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A. • OPPF, based on Nautilus • MOLE: a data management application based on a protein production data model. Proteins. 2005 Feb 1;58(2):285-9. Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.
PIMS • The aim is to provide a Laboratory Information Management System (LIMS) • for laboratories that produce proteins from target genes • that can be incorporated into commercial software in the area of biotech and protein production • Improve the quality of the experimental data deposited into the PDB • by providing software for lab scientists to harvest their daily experimental data from protein production to structure • My roles • Data Model • Database / Persistence layer / Java API • Java Applet development
General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?
Why is Data Modelling Important? • A Data Model is a plan for building a database • detailed enough to be used to create the physical structure • simple enough to communicate to the end user the data structure • The Unified Modelling Language (UML)
Data Model • Related to protein production & crystallisation • Suitable for large & small facilities • Required to reproduce the samples & experiments involved • Used for tracking samples, experiments & results • Developed to help software developers to collect, store and exchange information through the provision of a common platform
Protein production work is generally the investigation of a particular protein, the Target • The work often aims to produce a derivative of the Target, such as a single domain or complexes • Area covered: target, protein production, crystallisation, NMR tube, X-Ray, NMR, phasing, structure
Change Control Board • The data model is a work in progress • The science is developing too • Local protocols, which are novel and confidential • Not easy work • Thanks to… • Geoff Barton (Dundee) • Steve Prince (Manchester) • Anne Poupon (IBBMC) • Jon Diprose (OPPF) • Alun Ashton (Diamond) • Rasmus Fogh (CCPN)
Implemented in UML (Object Domain) • Developed within a framework provided by the CCPN project (www.ccpn.ac.uk) • Information stored in the UML Data Model is used to automatically generate the SQL schema, Java Application Program Interfaces (APIs) and documentation • Diagram labels: UML Data Model, Generation machinery framework, XML schema, Python API, Doc, SQL schema, Java API
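The idea of generating several artefacts from one model can be sketched in a few lines of Python. This is not the CCPN generation machinery: the `MODEL` dictionary stands in for the UML model, and `generate_sql` and `generate_doc` are hypothetical helpers written for this illustration.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical miniature "model": entity name -> list of (attribute, SQL type).
MODEL: Dict[str, List[Tuple[str, str]]] = {
    "target": [("name", "TEXT NOT NULL"), ("sequence", "TEXT")],
    "sample": [("label", "TEXT NOT NULL"), ("target_id", "INTEGER REFERENCES target(id)")],
}


def generate_sql(model: Dict[str, List[Tuple[str, str]]]) -> List[str]:
    """Emit one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, attributes in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {sql_type}" for name, sql_type in attributes]
        statements.append(f"CREATE TABLE {entity} (\n  " + ",\n  ".join(columns) + "\n);")
    return statements


def generate_doc(model: Dict[str, List[Tuple[str, str]]]) -> str:
    """Emit a one-line description per entity from the same model."""
    lines = []
    for entity, attributes in model.items():
        lines.append(f"{entity}: " + ", ".join(name for name, _ in attributes))
    return "\n".join(lines)


SCHEMA = generate_sql(MODEL)
DOC = generate_doc(MODEL)

# Check that the generated schema is valid SQL by loading it into an in-memory database.
connection = sqlite3.connect(":memory:")
for statement in SCHEMA:
    connection.execute(statement)
tables = [
    row[0]
    for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Because the schema and the documentation come from the same source, a change to the model propagates to both without hand editing.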
Architecture • The API provides methods to access the underlying DB to store and retrieve data • This allows applications to manipulate data without a detailed knowledge of the way in which the data is stored • Various different applications make use of the API • LIMS • Any High Throughput applications (non-GUI) • They are able to exchange data easily • Diagram labels: Tools (GUI, standalone applications, …), Java API, Persistence layer, storage API, DB, SQL schema
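A minimal sketch of this layering, assuming an SQLite database and a hypothetical `SampleApi` class (the real PIMS API is in Java and is not reproduced here): callers store and retrieve samples through methods and never see the SQL.

```python
import sqlite3
from typing import List


class SampleApi:
    """Hypothetical API layer: the table layout is private to this class."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._connection = connection
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(id INTEGER PRIMARY KEY, label TEXT, stage TEXT)"
        )

    def add_sample(self, label: str, stage: str) -> int:
        """Store a sample and return its database identifier."""
        cursor = self._connection.execute(
            "INSERT INTO sample (label, stage) VALUES (?, ?)", (label, stage)
        )
        self._connection.commit()
        return cursor.lastrowid

    def labels_at_stage(self, stage: str) -> List[str]:
        """Retrieve the labels of all samples recorded at a given stage."""
        rows = self._connection.execute(
            "SELECT label FROM sample WHERE stage = ? ORDER BY id", (stage,)
        )
        return [row[0] for row in rows]


# Any client (a GUI or a non-GUI high-throughput script) uses the same calls.
api = SampleApi(sqlite3.connect(":memory:"))
api.add_sample("plasmid-001", "Cloning")
api.add_sample("protein-001", "Purification")
api.add_sample("protein-002", "Purification")
```

If the storage changes, only `SampleApi` needs rewriting; applications that call `add_sample` and `labels_at_stage` are unaffected, which is what lets different tools exchange data easily.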
From data model to application • Data Model • Use cases • Scientific logic into requirements • Specifications • security, performance, usability, etc • Java API • Test data • UI Design • Application
Modular Construction • http://www.pims-lims.org/project/use-case-suite.html • Training & Support • Workflow • Reporting • Visualisation • Data Mining • Scheduling • Data Capture • Mobile Data Collection • Instrument Management • Inventory Management • Sample Management • Bioinformatics • System Administration • Setup & Configuration • Access Rights Management • Project Management • Reference Data
Reference data • Supplier details • Protocols • documenting set of editable default protocols • user interface design with Ed Daniel • Reagents • protocol-related reference samples • chemical hazard information • e.g. R and S-phrases • documenting lab chemicals as ‘MolComponents’ • includes synonyms, formula, CAS-number and mass • naming system under discussion with NKI • ~400 identified, ~180 based on crystallisation screens
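A reference record for a lab chemical with synonyms, formula, CAS number and mass might look like the sketch below. The `MolComponent` class here only borrows the name from the slide; its fields, the `build_index` helper and the example entry are assumptions for illustration, not the PIMS definition. The check-digit function follows the published CAS Registry Number rule (weighted digit sum modulo 10).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MolComponent:
    """Hypothetical reference record for one lab chemical."""
    name: str
    formula: str
    cas_number: str
    mass: float  # molar mass in g/mol
    synonyms: List[str] = field(default_factory=list)


def cas_check_digit_ok(cas_number: str) -> bool:
    """Validate a CAS number: weight digits 1, 2, 3, ... from the right, sum, mod 10."""
    digits = cas_number.replace("-", "")
    body, check = digits[:-1], digits[-1]
    total = sum(int(d) * weight for weight, d in enumerate(reversed(body), start=1))
    return total % 10 == int(check)


def build_index(components: List[MolComponent]) -> Dict[str, MolComponent]:
    """Map the name and every synonym (lower-cased) to its component."""
    index: Dict[str, MolComponent] = {}
    for component in components:
        for key in [component.name, *component.synonyms]:
            index[key.lower()] = component
    return index


# Illustrative entry: sodium chloride.
salt = MolComponent(
    name="sodium chloride",
    formula="NaCl",
    cas_number="7647-14-5",
    mass=58.44,
    synonyms=["NaCl", "table salt"],
)
index = build_index([salt])
```

Indexing by synonym means a lookup such as `index["table salt"]` resolves to the same record as the systematic name, which matters while the naming system is still under discussion.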
Instrument management • Analytical Data: A Tower of Babel • Integration • CSols produces a widely used Instrument Integration Package • If the PIMS I/O is implemented in a reasonable timescale, CSols may develop a PIMS Driver • Diagram labels: Kendro/Thermo, LC, MS, IR, NMR
General Introduction Design a LIMS Protein Production Data Model What can PIMS do for you?
What can PIMS do for you? Not a lot right now Whatever you want, eventually ... ... as long as it's data management for protein production
Version 0.2 • October 2005 • Then incremental delivery • … for one customer at a time and integrate with trunk • … and repeat until project complete
Applet Protocol Editor • Choose a step from a list • Draw a temperature step • List the protocol's completed steps at the bottom of the screen and reload them from there • Record the protocol in the DB • Display the list of protocols from the DB in the explorer and reload any one of them
Applet Workflow • Select the experiment categories from the tabs • Drag and drop the selected experiments • Build a workflow or load an existing one • Associate a protocol with an experiment
A collaborative framework • … to develop a family of LIMSes • Developers have difficulty in justifying the time required to create the software needed • The biologist doesn't want to wait • The result is a rapidly written LIMS that is fragile and cannot scale if the project grows • Need a generic LIMS • which helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project • and which welcomes plug-ins for novel methods
Conclusion • Each “Click” could be a lot of coding ... • What do molecular biologists really want? • Expectations are High! • Users make an indispensable contribution • Tell us when it's not good enough ... • ... we will respond
Acknowledgements • PIMS developer group: Chris Morris (CCP4), Anne Pajon (EBI), Ed Daniel (Daresbury), Peter Troshin (MPSI), Jo van Niekerk (SSPF), Susy Griffiths (YSBL), Jon Diprose (OPPF), Katherine Pilicheva (OPPF), Anne Poupon (IBBMC), Eric Oeuillet (IBBMC), Sabrina Haquin (IBBMC), Alun Ashton (Diamond) • EBI-MSD: Kim Henrick, Wim Vranken, John Ionides • CCPN: Wayne Boucher, Rasmus Fogh, Tim Stevens, Dan