
The CCPN Project

The CCPN Project. Tim Stevens and Wayne Boucher, October 2005. CCPN at Göteborg: Day 1: Introduction to CCPN; The CcpNmr applications; Analysis basics; Future developments; Analysis advanced. CCPN at Göteborg: Day 2: An overview of the data model; API Tutorial; Analysis Macros


Presentation Transcript


  1. The CCPN Project Tim Stevens and Wayne Boucher October 2005

  2. CCPN at Göteborg: Day 1 • Introduction to CCPN • The CcpNmr applications • Analysis basics • Future developments • Analysis advanced

  3. CCPN at Göteborg: Day 2 • An overview of the data model • API Tutorial • Analysis Macros • Widgets and Popups

  4. CCPN Overview

  5. The CCPN Project • Collaborative Computing Project for NMR • Started in 1999 • Collaborators in several countries • Developers at University of Cambridge and EBI • Unifying platform for NMR software • Similar to CCP4 (X-ray) • Main goals: • Data standards and data exchange • Software development and distribution • Meetings to determine and disseminate best practice • Open source access

  6. People • Cambridge • Ernest Laue • Rasmus Fogh • Dan O’Donovan • EBI, Hinxton • Kim Henrick • John Ionides • Wim Vranken • Anne Pajon

  7. History • Workshops: • EBI (2000, 2001) • Washington (2000) • Funding: • BBSRC (2000-2003, 2003-2006) • NMRQUAL (2001-2004) • TEMBLOR (2002-2005) • NMR-EXTEND (2005-2008)

  8. NMR Software • Problem - Heterogeneous development • Lots of proprietary data formats • Lots of stand-alone programs • Data is ‘lost’ along the way • Dedicated converters needed • Not acceptable for structural genomics projects • Solution - Unity • Data standards • Ease of transfer between programs • Completeness, integrity, deposition, data mining • Libraries

  9. Data Format vs. Data Model • Data format - How data is stored • STAR • XML • SQL • Tab-separated ascii • Data model - What data means • RCSB (PDB) mmCIF • XML DTD or schemas • SQL schema
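
The distinction on slide 9 can be sketched in a few lines of Python. This is a minimal illustration, not CCPN code: the `ChemicalShift` class and all four function names are invented here. One model class carries the meaning, and two interchangeable formats (XML and tab-separated text) only decide how it is stored.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class ChemicalShift:
    """Data model: what the data means (hypothetical, not the CCPN model)."""
    atom_name: str
    value_ppm: float


# Data format 1: XML
def to_xml(shift):
    elem = ET.Element("ChemicalShift", atomName=shift.atom_name,
                      value=repr(shift.value_ppm))
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    elem = ET.fromstring(text)
    return ChemicalShift(elem.get("atomName"), float(elem.get("value")))


# Data format 2: tab-separated ASCII
def to_tsv(shift):
    return f"{shift.atom_name}\t{shift.value_ppm!r}"


def from_tsv(line):
    name, value = line.split("\t")
    return ChemicalShift(name, float(value))
```

Both round trips return an equal `ChemicalShift`, although the stored text differs completely between the two formats.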

  10. CCPN Approach • Data model rather than data format • Format independent • Language independent • Scientifically descriptive (NMR) • Library (API): in memory manipulation • Create, update, delete & query objects • One for each language • Error checking • I/O modules: load/store data from/to disk • One for each (storage format, language) • Bookkeeping
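
The split on slide 10 between an in-memory API and separate I/O modules might look roughly like the sketch below. Every name in it (`Project`, `Molecule`, `ApiError`, `store_xml`, `load_xml`) is an assumption made for illustration and does not reproduce the real CCPN API. The API classes create, update, delete and query objects with error checking; the I/O functions are the only code that knows about the storage format.

```python
import xml.etree.ElementTree as ET


class ApiError(Exception):
    """Raised when a call would leave the in-memory data invalid."""


class Molecule:
    def __init__(self, project, name, sequence):
        self.project = project
        self.name = name
        self.sequence = sequence

    def set_sequence(self, sequence):          # update
        if not isinstance(sequence, str):
            raise ApiError("sequence must be a string")
        self.sequence = sequence

    def delete(self):                          # delete
        del self.project._molecules[self.name]


class Project:
    def __init__(self, name):
        self.name = name
        self._molecules = {}

    def new_molecule(self, name, sequence=""):  # create
        if name in self._molecules:
            raise ApiError(f"molecule {name!r} already exists")
        molecule = Molecule(self, name, sequence)
        self._molecules[name] = molecule
        return molecule

    def find_molecule(self, name):              # query
        return self._molecules.get(name)

    def molecules(self):
        return [self._molecules[key] for key in sorted(self._molecules)]


# I/O module: one such pair per (storage format, language) combination.
def store_xml(project):
    root = ET.Element("Project", name=project.name)
    for molecule in project.molecules():
        ET.SubElement(root, "Molecule", name=molecule.name,
                      sequence=molecule.sequence)
    return ET.tostring(root, encoding="unicode")


def load_xml(text):
    root = ET.fromstring(text)
    project = Project(root.get("name"))
    for elem in root.findall("Molecule"):
        project.new_molecule(elem.get("name"), elem.get("sequence"))
    return project
```

Application code in this sketch never touches XML directly, so swapping in an SQL-backed pair of functions would leave it unchanged.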

  11. Application View (diagram) • User • GUI • Application1, Application2, Application3 • API • In-memory representation (Python, Java, C++, Perl) • I/O • Data store (XML, SQL)

  12. Model-Driven Architecture • UML: Unified Modelling Language • Abstract representation of semantics • Pictorial • Mapping from UML: to anything • Multi-language • Multi-format • Architecture neutral (e.g. distributed or not) • Power: good and bad • CCPN uses Object Domain as its UML tool • Python as scripting language
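
As a rough illustration of the "mapping from UML to anything" idea, the sketch below generates Python accessor code from a small model description. It assumes a toy metamodel held in a dictionary (`MODEL`) and invented function names; the real CCPN generators work from a UML model and are far more elaborate.

```python
# Toy metamodel: one class with typed attributes (an assumption for this sketch).
MODEL = {"name": "Peak", "attributes": [("height", "float"), ("annotation", "str")]}


def generate_class_source(model):
    """Emit Python source with a type-checked getter/setter pair per attribute."""
    lines = [f"class {model['name']}:", "    def __init__(self):"]
    for attr, _ in model["attributes"]:
        lines.append(f"        self._{attr} = None")
    for attr, type_name in model["attributes"]:
        lines += [
            f"    def get_{attr}(self):",
            f"        return self._{attr}",
            f"    def set_{attr}(self, value):",
            f"        if not isinstance(value, {type_name}):",
            f"            raise TypeError('{attr} must be {type_name}')",
            f"        self._{attr} = value",
        ]
    return "\n".join(lines) + "\n"


def build_class(model):
    """Compile the generated source and return the resulting class."""
    namespace = {}
    exec(generate_class_source(model), namespace)
    return namespace[model["name"]]
```

A second generator reading the same `MODEL` could emit Java source or an SQL schema instead, which is the sense in which the model is language and format independent.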

  13. MEMOPS framework (diagram) • UML Model: Package 1, Package 2, Package 3 • Autogeneration • Handcoded (1%) • Documentation • APIs: Python, Java, C, Perl • Storage: XML, SQL • Other labels: User, Application, Deposition, Program Developers, Domain Experts

  14. Data Model Packages (diagram labels): Reference CcpNmr Programs Citations Experimental Laboratory NMR Protocols Samples Nuclei and Molecule Structure Isotopes Molecule Targets Sequence Structure and Compound Compound Coordinates Source Preparation Molecular System Residue Project Organisms, Template Tracking Taxonomy X-ray Crystallisation Crystallography

  15. UML Example

  16. CCPN API • Classes for developers • Mainly getters and setters • More than just code stubs • Constraints (e.g. cardinality) enforced • Links the hard part • Mostly (> 99%) auto generated from UML • Some helper functions and constraints hand coded • Currently around 360k lines in Python and 650k lines in Java
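
The points about getters, setters, enforced cardinality and links can be sketched as follows. The class and method names are hypothetical and only loosely inspired by NMR concepts; they are not the generated CCPN classes. A mandatory attribute (cardinality 1..1) is checked in its setter, and a two-way link is kept consistent from both ends, which is the part that is easy to get wrong by hand.

```python
class ConstraintError(ValueError):
    """Raised when a model constraint such as cardinality is violated."""


class SpinSystem:
    def __init__(self, serial):
        self._serial = serial
        self._resonances = []          # 0..* link end

    def get_serial(self):
        return self._serial

    def get_resonances(self):
        return tuple(self._resonances)


class Resonance:
    def __init__(self, isotope_code):
        self._spin_system = None       # 0..1 link end
        self.set_isotope_code(isotope_code)

    def get_isotope_code(self):
        return self._isotope_code

    def set_isotope_code(self, value):
        # Cardinality 1..1: the attribute may never be empty or missing.
        if not isinstance(value, str) or not value:
            raise ConstraintError("isotope_code is mandatory")
        self._isotope_code = value

    def get_spin_system(self):
        return self._spin_system

    def set_spin_system(self, spin_system):
        if spin_system is not None and not isinstance(spin_system, SpinSystem):
            raise ConstraintError("spin_system must be a SpinSystem or None")
        old = self._spin_system
        if old is spin_system:
            return
        # Update both ends so the link can be followed in either direction.
        if old is not None:
            old._resonances.remove(self)
        if spin_system is not None:
            spin_system._resonances.append(self)
        self._spin_system = spin_system
```

Moving a resonance from one spin system to another through `set_spin_system` removes it from the old one automatically, so the two link ends cannot drift apart.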

  17. Developer Benefits • Specified data model and API • No I/O code • Concentrate on science, not bookkeeping • Extendible • Application data can be assigned to any object • UML model can be extended (packages) • Notification system • Register interest when specified attribute changes (class, not object, level) • Undo/Redo (in future)
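
A class-level notification system of the kind described on slide 17 could be sketched like this. The names `Notifying`, `register_notify` and `Peak` are assumptions for illustration, not the CCPN notifier interface. Interest is registered against a class and attribute name, so the callback fires for every object of that class.

```python
class Notifying:
    """Base class with a registry keyed by (class, attribute name)."""
    _listeners = {}

    @classmethod
    def register_notify(cls, attr, callback):
        Notifying._listeners.setdefault((cls, attr), []).append(callback)

    def _set(self, attr, value):
        old = getattr(self, "_" + attr)
        setattr(self, "_" + attr, value)
        if old == value:
            return                      # no change, so nothing to report
        # Walk the class hierarchy so listeners on base classes also fire.
        for klass in type(self).__mro__:
            for callback in Notifying._listeners.get((klass, attr), ()):
                callback(self, attr, old, value)


class Peak(Notifying):
    def __init__(self, height=0.0):
        self._height = height

    def get_height(self):
        return self._height

    def set_height(self, value):
        self._set("height", value)
```

A GUI table listing peaks, for example, could register one callback on `Peak` and `"height"` and be refreshed whenever any peak changes, without holding a reference to each individual object.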

  18. Current Status of API • Stable and released: • Python and XML code generation • NMR, molecule description and structure data model • In testing stages: • Java and SQL database code generation • Protein production data model • Preliminary: • X-ray crystallography data model

  19. CcpNmr Applications

  20. Structural Biology Pipeline (diagram) • NMR machine • Data processing • Spectrum analysis • Structure calculation • Databases

  21. NMR Applications (diagram) • CcpNmr Processing • CcpNmr Analysis • Validation software • ARIA 2.0 • CCPN Data Model • Reference data • NMRStar 3.0 • CcpNmr FormatConverter • Other formats (NmrView, XEasy, …)

  22. Main CcpNmr Applications • Format Converter • Conversion to and from legacy formats • Analysis • Graphical analysis (e.g. assignment) program • Processing (coming soon) • Azara “process” wrapped in data model

  23. CcpNmr Format Converter • Import/export of data formats to the Data Model • For harvesting/deposition purposes • Allow people to use or try out the data model • Interaction with existing programs • Fully or partially handles: • Ansig, Auremol, Autoassign, Azara, Bruker, Charmm, CNS/XPLOR/ARIA, Concoord, Diana/Dyana/Cyana, Discover, Fasta, Felix, Module, .mol, Molmol, Monte, NmrDraw, NMRPipe, NMR-STAR (v2.1.1, v3.0), NmrView, Pdb, Pipp, Pistachio, Pronto, Sparky, Talos, Varian, XEasy • Sequences, chemical compounds, coordinates, NMR measurements, constraints and peak lists, processing and acquisition parameters.

  24. Format Converter - The NMR Translator (diagram) • Input: peaks and chemical shifts (XEasy, NmrView, ...); acquisition parameters (Bruker, Varian) • Format specific readers • Generic peak converter, generic chemical shift converter, generic acquisition parameters converter • Data model entry • CCPN Data Model • Format specific writers • Output: peaks and chemical shifts (XEasy, NmrView, ...); processing parameters (Azara, NMRPipe)
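
The reader, generic model, writer arrangement of slide 24 can be sketched with two toy peak-list formats. Both layouts ("space" and "csv") are invented for this example and are not the real XEasy or NmrView file formats; the `Peak` class and registry names are likewise assumptions. The point is that each format needs one reader and one writer against the common model, rather than a dedicated converter for every pair of formats.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Generic, format-independent peak (hypothetical stand-in for the data model)."""
    serial: int
    positions_ppm: tuple


def read_space_format(text):
    """Toy format: 'serial pos1 pos2 ...' per line, '#' starts a comment line."""
    peaks = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        peaks.append(Peak(int(fields[0]), tuple(float(f) for f in fields[1:])))
    return peaks


def write_space_format(peaks):
    return "".join(
        " ".join([str(p.serial)] + [f"{x:.3f}" for x in p.positions_ppm]) + "\n"
        for p in peaks)


def read_csv_format(text):
    """Toy format: header line, then 'serial,pos1,pos2,...' per line."""
    peaks = []
    for line in text.splitlines()[1:]:
        fields = line.split(",")
        peaks.append(Peak(int(fields[0]), tuple(float(f) for f in fields[1:])))
    return peaks


def write_csv_format(peaks):
    lines = ["serial,positions"]
    for p in peaks:
        lines.append(",".join([str(p.serial)] + [f"{x:.3f}" for x in p.positions_ppm]))
    return "\n".join(lines) + "\n"


READERS = {"space": read_space_format, "csv": read_csv_format}
WRITERS = {"space": write_space_format, "csv": write_csv_format}


def convert(text, source, target):
    """Any source to any target, always passing through the generic model."""
    return WRITERS[target](READERS[source](text))
```

Adding a third format to this sketch means writing one more reader and one more writer, after which it can be converted to and from every format already registered.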

  25. Format Converter Design • Wim Vranken (EBI) • Set of Python scripts • Accessed via: • Tkinter (Tcl/Tk) • custom Python scripts • http://www.ebi.ac.uk/msd-srv/docs/NMR/NMRtoolkit/main.html

  26. CcpNmr Analysis • Requirements • Cross platform • Scalable • Extensible • Open and easy scripting language • Modern graphical user interface • Uses CCPN data model and API • Software • Python, Tcl/Tk, C, OpenGL • (Java, X, Motif) • OS • Linux, Sun, SGI, OSX (Windows)

  27. Spectrum Windows • N-dim. windows • Multiple spectra • Automatic mapping • Contours on fly • Aliasing • Strips & cells • Mouse and key • Blocked data • Azara • Felix • NMRPipe • UCSF

  28. Graphical Interface • Menus and popup dialogues • CcpNmr widgets • Main objects • Spectra • Windows • Peaks • Resonances • Molecules • Structures

  29. Assignment • Peak finding and fitting • Rich assignment model • Mainly mouse-driven • Can assign to atoms • Ambiguous contributions • Existing structure • Short resonance list • Multiple peaks easily • Navigation

  30. The CLOUDS Protocol • Automated assignment & structure determination • Miguel Llinas, Alex Grishaev, et al. • Spatial distribution of anonymous resonances generated with NOEs • Integrated within CCPN • An Analysis module • Data Model glues modules • Functional platform • Distribution network • Workflow (diagram): Spectra; Pick Peaks, Link Shifts & Combine; Pick Peaks & Normalise; Spin Systems; NOE intensities; Relaxation Matrix Optimisation; Distance Constraints; Hydrogen Atom Molecular Dynamics; Proton Clouds; Chain Fitting & Molecular Replacement; Chain Assignment; Full Structure Calculation; Protein Structure

  31. The CLOUDS Protocol (figures) • A fitted protein backbone • A family of Clouds

  32. Other Features • Works with FormatConverter • Chemical compounds database • NMR reference information • Hard copy • PostScript • PDF • Table export • Rate analysis • Macros • Structures

  33. CcpNmr Analysis Tutorial Part I

  34. CCPN Future

  35. Extend-NMR • EU STREP application funded to fully integrate software from: • Bruker (TOPSPIN, acquisition) • Billeter, Orekhov (Garant, Munin, MDD) • Kalbitzer (Auremol) • Llinas (CLOUDS) • Nilges (Inferential Structure Determination) • Bonvin (Haddock, RECOORD) • Vriend, Vuister (Queen, What-Check) • Henrick, Vranken (NMR database) • Focus on complexes and development of better software methodology

  36. LIMS Collaborations • PIMS project collaboration • Protein production LIMS (with EBI, Sport Consortia, OPPF and Poupon) • EU STREP application (SFGLIMS) to work with: • Poupon (Protein Production) • Perrakis (Biophysical methods, crystallisation) • Bricogne (X-ray data collection and structure generation) • Prilusky, Sussman (Bioinformatics, data mining)

  37. Data Model Extensions • EXTEND-NMR • New NMR applications • Solid state NMR • PIMS • LIMS for protein production • SFGLIMS • LIMS for NMR and X-ray structure determination • X-ray • Chemoinformatics • (Metabolomics?)

  38. Code Generation Plans • C++/C/FORTRAN code • Needed for Extend-NMR and for CcpNmr Processing • Needed for interface to CYANA, NMRPIPE, AUTOPSY, etc. • Java/Database code • Extend for LIMS, high-throughput projects, NMRVIEW • Basic Machinery • Upgrades for long term extensibility/maintainability and performance

  39. API Languages and Formats • Languages: Python, Java, C++, Perl • Formats: XML, SQL • For all languages: • Metamodel • Documentation • For all formats: • Schemas • I/O mappings

  40. New Core API technology • Reduce burden of adding new languages, formats • Languages (Python, Java, C++, Perl) • Storage formats (XML, SQL) • Code categories (diagram): language & format independent (most of the logic); language dependent only; format dependent only; language & format dependent • Code required for new format • Code required for new language

  41. Core API technology, cont. • Remodelling of implementation details • Storages, collection types, root objects, etc. • Complex data types • e.g. rotation matrix • Client/Server architecture • For PIMS and SFGLIMS

  42. Analysis Development • Beyond CLOUDS • Large proteins, homologues • Processing linked in • Couplings (RDCs, TROSY), dihedral constraints • Titrations (Ka, Kd) • Chain states (alternate conformations) • Solid State NMR • Organic chemistry NMR (1D) • Publication-ready diagrams and tables • Windows version

  43. Developments in Extend-NMR • Integrated Bayesian, maximum entropy, … methods for data-processing, analysis and structure calculation • ‘Molecular replacement’ for NMR • Further RECOORD development • Databank for Experimental NMR spectra (DEN) • MSD database analysis

  44. Licenses • GPL • Data model • Scripts which produce APIs • LGPL • Generic libraries • Widget libraries • Format Converter • CCPN • Analysis

  45. Resources, 1 • SourceForge: • CVS repository for code • API and FormatConverter releases • http://sourceforge.net/projects/ccpn • CCPN: • Meetings, workshops • API, FormatConverter and Analysis releases • http://www.ccpn.ac.uk

  46. Resources, 2 • EBI: • Format Converter • Databases (MSD group) • http://www.ebi.ac.uk/msd-srv/docs/NMR/NMRtoolkit/main.html • JISCMAIL: • Email list • http://www.jiscmail.ac.uk/lists/ccpnmr.html • (http://www.jiscmail.ac.uk/lists/nmrgen.html)

  47. CcpNmr Analysis Tutorial Part II

  48. CCPN at Göteborg: Day 2 • An overview of the data model • API Tutorial • Analysis Macros • Widgets and Popups

  49. Major Data Model Packages

  50. CCPN Packages • Groupings of related data • e.g. NMR, X-ray, Molecular description • Connections between packages • e.g. NMR loads Nucleus (isotope) information • Allows lazy loading • Only load relevant data • Only load when a link is queried • Save only modified • Reference packages • Chemical compound, Reference chemical shifts • Packages shown (diagram): Molecule, ChemComp, People, MolSystem, Nucleus, Sample, Coordinates, Nmr
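
Lazy loading and "save only modified" can be sketched as below. This assumes each package is backed by a loader callable and stores simple key/value data; `Package`, `Project` and the `writer` callback are hypothetical names, not CCPN classes. A package is read only when first queried, and saving skips every package that was never changed.

```python
class Package:
    def __init__(self, name, loader):
        self.name = name
        self._loader = loader      # called at most once, on first access
        self._data = None
        self.is_modified = False

    @property
    def is_loaded(self):
        return self._data is not None

    def _ensure_loaded(self):
        if self._data is None:
            self._data = dict(self._loader())

    def get(self, key):
        self._ensure_loaded()
        return self._data.get(key)

    def set(self, key, value):
        self._ensure_loaded()
        if self._data.get(key) != value:
            self._data[key] = value
            self.is_modified = True


class Project:
    def __init__(self, loaders):
        self.packages = {name: Package(name, loader)
                         for name, loader in loaders.items()}

    def save(self, writer):
        """Write only modified packages; return the names that were written."""
        saved = []
        for package in self.packages.values():
            if package.is_modified:
                writer(package.name, dict(package._data))
                package.is_modified = False
                saved.append(package.name)
        return saved
```

In this sketch, querying the Nucleus package leaves the Nmr package untouched on disk, and a later save writes nothing until some package is actually changed.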
