Edinburgh 9 September 2008. Memops Data modelling and automatic code generation. Memops - main points. Code generation framework Data access subroutine libraries Fully automatic code generation from model Several programming languages in parallel Precise, detailed, validated data. Memops.
Edinburgh 9 September 2008 Memops: Data modelling and automatic code generation
Memops - main points • Code generation framework • Data access subroutine libraries • Fully automatic code generation from model • Several programming languages in parallel • Precise, detailed, validated data
Memops • Introduction • Code generation • Generated libraries • Applications of Memops
The CCPN Project • Collaborative Computing Project for NMR • Since 1999 • Unifying platform for NMR software similar to CCP4 for X-ray crystallography • Community-based, open-source, software development • Code generation, data model, applications, meetings
NMR Structural Biology Pipeline [Figure: pipeline stages Sample Preparation, NMR Machine, Data Processing, Spectrum Analysis and Structure Calculation, with a Repository Database; annotated "Slow, complex, interactive"]
Native Anarchy [Figure: programs for Task1, Task2 and Task3 connected to one another by many separate Convert steps]
With Data Standard [Figure: programs for Task1, Task2 and Task3 each connected by a Convert step to a central Data Standard]
Data standard - objectives • Lossless data transfer between programs - different approaches and architectures • All data needed for pipeline software • Creating data, not analysing end results • Intermediate results needed • Comprehensive, detailed, complex • Completeness, integrity of changing data • Precisely defined standard • A single central description • Validation directly against standard
CCPN approach • Standard API, no stable format • easier to maintain as model changes • Abstract data model • Exact correspondence to APIs • API implementations for several languages • Transparent access to XML or DB storage • Complete validation of model rules and constraints
Memops • Introduction • Code generation • Generated libraries • Applications of Memops
Automatic Code generation • Model will change over time • Several parallel implementations • Synchronisation between APIs and model • Maintenance and debugging • Resources are limited • Automatic Code Generation • Write and debug once and for all • Any domain, from Astrophysics to Zoology • Quick and simple to extend model • E.g. Application-specific packages
Code Generation Framework [Figure labels: Software Developers, Domain Experts, MEMOPS framework, Handcoded (< 1%), Autogeneration, Documentation, UML Model (Package 1, Package 2, Package 3), APIs (Python, Java, C), Storage (SQL, XML), User, Application, Wrappers, Deposition]
Code Generation [Figure: edit UML in ObjectDomain; export UML data; MetaModel held as in-memory model (Python objects) and on-disk model (XML file); autogeneration produces API code, schemas, mappings etc. Legend: CCPN code, off-the-shelf, files, CCPN generated]
API generator • Written in Python • Modular • Different generators share code [Figure: generator classes TextWriter, ModelTraverse, ApiGen, FileApiGen, PyLanguage, PyType, PyApiGen, PyFileApiGen]
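The idea of generating API code from a model can be illustrated with a very small sketch. This is not the Memops generator: the `MODEL` dictionary, `generate_class` and `build_api` are hypothetical names invented for this example, and a real model would carry far more information (links, cardinalities, constraints). The sketch only shows the principle of writing the access code once, as a generator, and deriving every class from the model description.

```python
# Hypothetical miniature model: class name -> {attribute name: Python type}.
MODEL = {
    "Atom": {"name": str, "elementSymbol": str},
}


def generate_class(class_name, attributes):
    """Return Python source for a class with type-checked get/set methods."""
    lines = [f"class {class_name}:", "    def __init__(self, **values):"]
    for attr in attributes:
        cap = attr[0].upper() + attr[1:]
        lines.append(f"        self.set{cap}(values[{attr!r}])")
    for attr, typ in attributes.items():
        cap = attr[0].upper() + attr[1:]
        lines += [
            f"    def get{cap}(self):",
            f"        return self._{attr}",
            f"    def set{cap}(self, value):",
            f"        if not isinstance(value, {typ.__name__}):",
            f"            raise TypeError({attr!r} + ' must be {typ.__name__}')",
            f"        self._{attr} = value",
        ]
    return "\n".join(lines) + "\n"


def build_api(model):
    """Generate and execute the source for every class in the model."""
    namespace = {}
    for class_name, attributes in model.items():
        exec(generate_class(class_name, attributes), namespace)
    return namespace


api = build_api(MODEL)
atom = api["Atom"](name="CA", elementSymbol="C")
print(atom.getName())
```

Adding an attribute to `MODEL` is enough to get the matching accessor pair, which is the property the slides rely on when they say the model is quick and simple to extend.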
Memops • Introduction • Code generation • Generated libraries • Applications of Memops
Model features • Packages to subdivide model, code, and data files • Objects. Unique context, compare-by-identity • Complex data types. Different contexts, compare-by-value • Simple data types, PositiveInt, enumerations, … • Attributes and links: • Cardinality, frozen/modifiable, derived • Unique/ordered collections (sets, lists, unique lists) • Ad hoc constraints on attributes, simple and complex data types, and objects.
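The distinction between objects (compare-by-identity) and complex data types (compare-by-value) can be sketched in Python as follows. The class names `Coordinates` and `Residue` and their fields are assumptions chosen for illustration, not classes taken from the generated API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coordinates:
    """Complex data type: no identity of its own, equal if the values are equal."""
    x: float
    y: float
    z: float


@dataclass(eq=False)
class Residue:
    """Object: eq=False keeps the default identity comparison."""
    seqCode: int
    name: str


same_value_a = Coordinates(1.0, 2.0, 3.0)
same_value_b = Coordinates(1.0, 2.0, 3.0)
residue_a = Residue(1, "ALA")
residue_b = Residue(1, "ALA")

print(same_value_a == same_value_b)  # values compared
print(residue_a == residue_b)        # identities compared
```

A value type can safely appear in several contexts at once, whereas an object lives in exactly one place in the data hierarchy.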
Molstructure model package [UML class diagram: StructureEnsemble, Chain, Residue, Atom, Model and Coord, with links to ccp.molecule.MolSystem.MolSystem, ccp.molecule.MolSystem.Chain, ccp.molecule.MolSystem.Residue, ccp.molecule.MolSystem.Atom and ccp.molecule.ChemComp.ChemAtom. Attributes shown include +ensembleId: Int, +atomNamingSystem: Line, +resNamingSystem: Line, +seqId: Int, +seqCode: Int, +seqInsertCode: Line, +altLocationCode: Line, +x: Float, +y: Float, +z: Float, +bFactor: Float = 0.0, +occupancy: Float = 1.0]
CCPN APIs • Application Programming Interface • Object oriented • Data accessed in memory as if stored in the data model • Implementations come with: • Integrated, transparent I/O (file or database) • Complete validity checking • Protection against casual change (data encapsulation) • Versioning and backwards compatibility • Event notifier system • Slot for application-specific data
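Three of the features listed above (data encapsulation, validity checking on every change, and an event notifier) can be sketched together. The names `Notifying`, `register_notify` and the validity rule for `code` are assumptions made for this example and are not claimed to match the generated API.

```python
from collections import defaultdict


class Notifying:
    """Base class with a shared registry of change callbacks."""
    _callbacks = defaultdict(list)

    @classmethod
    def register_notify(cls, func_name, callback):
        # Callbacks are keyed by (class name, name of the modifying function).
        Notifying._callbacks[(cls.__name__, func_name)].append(callback)

    def _notify(self, func_name):
        for callback in Notifying._callbacks[(type(self).__name__, func_name)]:
            callback(self)


class Chain(Notifying):
    def __init__(self, code):
        self._code = None          # stored privately; access goes through get/set
        self.setCode(code)

    def getCode(self):
        return self._code

    def setCode(self, value):
        # Validity check happens before the data are changed.
        if not isinstance(value, str) or not value or " " in value:
            raise ValueError("code must be a non-empty string without spaces")
        self._code = value
        self._notify("setCode")    # tell listeners only after a valid change


seen = []
Chain.register_notify("setCode", lambda obj: seen.append(obj.getCode()))
chain = Chain("A")
chain.setCode("B")
```

A graphical application could use such callbacks to redraw a display whenever the underlying data change, without the data layer knowing about the interface.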
Python+XML at runtime [Figure: User application (science code, user interface, utility functions); Python API (data get, set; validity check); XML I/O code (generic XML read/write); XML I/O mappings (what to do for which element); XML parser; Data Storage in XML files (user data in CCPN XML format). Legend: CCPN code, off-the-shelf, application code, files, CCPN generated]
Java+DB at runtime [Figure: Application (science code, user interface, utility functions); optional presentation layer with custom queries in HQL (Hibernate Query Language); Java API; Hibernate mappings; Hibernate; Database Schema; Database. Legend: CCPN code, off-the-shelf, application code, files, CCPN generated]
Now Available • Version 2.0 just released • Python+XML, Java+XML, C+XML, Java+DB (with Hibernate) • Available under GPL license from SourceForge or www.ccpn.ac.uk • CCPN Data Standard: • NMR, Macromolecules, LIMS • 46 packages • 552 classes and data types • Python+XML implementation 800,000+ lines of code
Memops • Introduction • Code generation • Generated libraries • Applications of Memops
CcpNmr Suite • Analysis • Interactive NMR analysis • FormatConverter • Convert between 30+ NMR and structure formats • Built on top of CCPN model (Python+XML) • Version 2.0 released • Widely used in macromolecular NMR
ExtendNMR NMR pipeline • Integrated macromolecular NMR pipeline - from sample to structure • Pre-existing programs from 8 groups • In-memory conversion to internal data structures • Integrated versions released: • ARIA (NMR structure generation) • Bruker TOPSPIN, manufacturer's processing/analysis package
BIOXDM • Software pipeline for on-synchrotron crystallography • Exploit new technology (goniometers) • Experiment optimisation, acquisition, and on-line processing • Independent data model, with Memops machinery • Java+DB implementation for runtime concurrent access
EUROCarbDB • Distributed deposition database • Glycobiology and glycomics • NMR, MS, HPLC and topology • Java. Database storage using Hibernate • CCPN model Java+DB implementation slots in as-is
Funding acknowledgements • BBSRC CCPN grants • European Union grants • EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts • Industry support • AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline • Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’
People • Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing) • Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova • Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett • Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004-01195)
Overview • Packages • The Implementation package • Objects • DataTypes and DataObjTypes • Access control
ARIA – structure generation from NMR data Custom conversion Application ARIA XML ARIA Data Model CCPN Data Model CCPN XML • ARIA imports • Peak Lists • Constraints • Sequences • Chemical shifts • ARIA exports • Peak Assignments • Filtered Constraints • Violations • Structures
API functions • ‘get’ and ‘set’ (Attributes and links) • ‘add’ and ‘remove’ (Collection attributes and links) • ‘sorted’ (Unordered collection links) • ‘findFirst’ and ‘findAll’ (Collection links) • Simple filtering (attribute == value) • create and ‘new’ (Objects) • Normal and ‘factory function’ object creation • delete (Objects) • ‘Delete’ function – cascades to objects rendered invalid by deletion • checkValid, checkAllValid (Objects) • API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
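The function families listed above can be illustrated with a hand-written stand-in for two linked classes. This is a sketch under assumptions: the `Chain` and `Residue` classes below are simplified inventions, and the method names merely follow the naming pattern on the slide rather than reproducing the generated code. It shows a factory function (`newResidue`), a two-way parent-child link, simple `attribute == value` filtering, and a delete that cascades to children.

```python
class Chain:
    def __init__(self, code):
        self._code = code
        self._residues = []        # parent -> child side of the two-way link
        self.isDeleted = False

    def newResidue(self, seqCode, name):
        """Factory function: create a child already attached to this parent."""
        return Residue(self, seqCode, name)

    def getResidues(self):
        return tuple(self._residues)

    def sortedResidues(self):
        return sorted(self._residues, key=lambda r: r.getSeqCode())

    def findAllResidues(self, **conditions):
        """Simple filtering: keep children where every attribute == value."""
        return [r for r in self._residues
                if all(getattr(r, "_" + key) == value
                       for key, value in conditions.items())]

    def findFirstResidue(self, **conditions):
        matches = self.findAllResidues(**conditions)
        return matches[0] if matches else None

    def delete(self):
        # Cascade: children cannot exist without their parent.
        for residue in list(self._residues):
            residue.delete()
        self.isDeleted = True


class Residue:
    def __init__(self, chain, seqCode, name):
        self._chain = chain        # child -> parent side of the two-way link
        self._seqCode = seqCode
        self._name = name
        self.isDeleted = False
        chain._residues.append(self)

    def getChain(self):
        return self._chain

    def getSeqCode(self):
        return self._seqCode

    def getName(self):
        return self._name

    def delete(self):
        self._chain._residues.remove(self)
        self.isDeleted = True


chain = Chain("A")
chain.newResidue(2, "GLY")
chain.newResidue(1, "ALA")
first_gly = chain.findFirstResidue(name="GLY")
```

Keeping both directions of each link in memory is what makes navigation cheap in either direction, at the cost of the strong coupling between classes that the slide mentions.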
FormatConverter - The NMR Translator [Figure: format-specific readers (XEasy, NmrView, ... for peaks and for chemical shifts; Bruker, Varian for acquisition parameters) feed a generic peak converter, a generic chemical shift converter and a generic acquisition parameters converter; data model entry into the CCPN Data Model; format-specific writers (XEasy, NmrView, ... for peaks and for chemical shifts; Azara, NMRPipe for processing parameters)]
ExtendNMR: ARIA • Structure generation from macromolecular NMR data, ambiguous distance constraints • One of two leading programs • Python and scripts, with CNS dynamics engine • All input and output integrated to CCPN standard
ExtendNMR: Bruker TOPSPIN • NMR processing program of major NMR instrument company • Java. In-memory conversion to CCPN Java+XML implementation • CCPN output in current TOPSPIN release, expanded in upcoming release.
Data Model v. Data Format • Abstract model (UML): Atom (+elementName: String = C) linked to Bond (+bondOrder: Float = 1.0); each Bond has 2 +atoms, each Atom has * +bonds • Relational Database: tables Atom, Atom_Bond_Connect, Bond • XML: <Atom ID="AT1" elementName="C"> <BondList> <Bond IDREF="BD1"/> ... </BondList> </Atom> <Bond ID="BD1" bondOrder="1.0"> <Atom1 IDREF="AT1"/> <Atom2 IDREF="AT2"/> </Bond>
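The point of this slide, one abstract model with several interchangeable storage formats, can be sketched with the Python standard library. The in-memory dictionaries, the `Molecule` wrapper element and the column names are assumptions made for this example; only the Atom, Bond and Atom_Bond_Connect names come from the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# One in-memory model: two atoms joined by one bond.
atoms = {"AT1": "C", "AT2": "C"}
bonds = {"BD1": {"bondOrder": 1.0, "atoms": ("AT1", "AT2")}}


def to_xml(atoms, bonds):
    """Serialise the model as XML, using ID/IDREF for the atom-bond link."""
    root = ET.Element("Molecule")  # hypothetical wrapper element
    for atom_id, element in atoms.items():
        ET.SubElement(root, "Atom", ID=atom_id, elementName=element)
    for bond_id, bond in bonds.items():
        node = ET.SubElement(root, "Bond", ID=bond_id,
                             bondOrder=str(bond["bondOrder"]))
        for index, atom_id in enumerate(bond["atoms"], start=1):
            ET.SubElement(node, f"Atom{index}", IDREF=atom_id)
    return ET.tostring(root, encoding="unicode")


def to_database(atoms, bonds):
    """Store the same model relationally, with a connecting table for the link."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Atom (id TEXT PRIMARY KEY, elementName TEXT)")
    db.execute("CREATE TABLE Bond (id TEXT PRIMARY KEY, bondOrder REAL)")
    db.execute("CREATE TABLE Atom_Bond_Connect (atom_id TEXT, bond_id TEXT)")
    db.executemany("INSERT INTO Atom VALUES (?, ?)", list(atoms.items()))
    for bond_id, bond in bonds.items():
        db.execute("INSERT INTO Bond VALUES (?, ?)", (bond_id, bond["bondOrder"]))
        db.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                       [(atom_id, bond_id) for atom_id in bond["atoms"]])
    return db


xml_text = to_xml(atoms, bonds)
database = to_database(atoms, bonds)
```

Application code that works only with the in-memory model is unaffected by which of the two storage functions is used, which is the argument for standardising on the API rather than on a file format.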
Packages • Partition model, code, and data • Import each other • Can be omitted • All import Implementation and AccessControl • Each has a TopObject • No links between data from rival TopObjects (different extents of data)
Root and TopObjects [UML class diagram: memops.Implementation.MemopsRoot (+name: Word = ccpProject, +override: Boolean = False, +currentUserId: Word = user, +newGuid(), +getPackageLocator()) with +currentChemComp and +currentMolecule links; memops.Implementation.TopObject (+guid: Line, +getPackageLocator()); ccp.molecule.ChemComp.ChemComp, ccp.molecule.Molecule.Molecule, ccp.molecule.Molecule.MolResidue, ccp.molecule.ChemComp.AbstractChemAtom, ccp.molecule.ChemComp.ChemAtom, and ccp.molecule.ChemComp.ChemBond with 2 +chemAtoms]
TopObjects • One in every package • Ultimate parent to all objects in package • Have globally unique identifier (‘guid’) • currentXyz links from root • Links can constrain links between descendants • In file implementations: • Hold links to storage and backup locations • Live in Implementation as almost empty shell
Overview • Packages • The Implementation package • Objects • DataTypes and DataObjTypes • Access control
CcpNmr Analysis • NMR Assignment Program • Inspired by ANSIG and Sparky • Demonstrates CCPN approach • Modern interface and scripting • Scalable and extensible • Operating Systems • Linux, Sun, SGI, OSX, Windows • Languages • Python • Data model interaction • Tk Graphical interface • Scripting • C • OpenGL/Tk contours • Structure display • Mathematical operations
Implementation Package • Model and Code: • Supertypes that define all objects • Objects • DataTypes • DataObjTypes • Basic data types • Data – how to access the real data: • Data location pointers • Current-package pointers • Implementation data are not part of the data set, and are not in the database. • Represent view or session?
Data Location [UML class diagram: TopObject (+guid: Line, +getPackageLocator()); MemopsRoot (+name: Word = ccpProject, +override: Boolean = False, +currentUserId: Word = user, +newGuid(), +getPackageLocator()); FileStorageObject (+isLoaded: Boolean, +isModified: Boolean, +isReading: Boolean, +isModifiable: Boolean = True, +createdBy: Word, +lastUnlockedBy: Word, +setIsModifiable(), +touch(), +saveTo(repository), +removeFrom(repository), +save(), +backup()); Repository and PackageLocator, with attributes +name: Line, +format: StorageFormat = xml, +url: Url, +targetName: Word = any, +getFileLocation(packageName); links +activeRepositories, +repositories {ordered}, +stored, +backedUp, +backup]
Objects and their Supertypes [UML class diagram: DataObject, MemopsObject, «DataType» ComplexDataType, ImplementationObject, TopObject, MemopsRoot, DbMemopsRoot, FileMemopsRoot, DbTopObject, FileTopObject and FileStorageObject, with ccp.molecule.Molecule.Molecule (+currentMolecule) and ccp.molecule.Molecule.MolResidue as examples. Attributes shown include +applicationData: ApplicationData, +isDeleted: Boolean, +className: Word, +packageName: Word, +packageShortName: Word, +qualifiedName: Line, +inConstructor: Boolean, +getExpandedKey(), +getQualifiedName(); file-based operations include +saveModified(), +saveAll(), +refreshTopObjects(packageName), +backupAll(), +importData(filePath), +loadFrom(repository), +load(), +restore()]
Simple Data Types • Boolean, String, Long, Double, Int, Float, Dict • Token, SpacelessString, Text, SingleLine, NonNegativeInt, PositiveInt, StringKeyDict • DateTime, LongWord, Word, Line, PositiveFloat, NonNegativeFloat, FloatRatio • Any, UrlProtocol, PositiveDouble, NonNegativeDouble
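Constrained simple data types of this kind can be sketched as subclasses of Python's built-in types. The constraints below are assumptions inferred from the type names alone (for instance that a Line contains no line breaks and a Word contains no whitespace); the actual Memops definitions may add further rules such as length limits.

```python
class NonNegativeInt(int):
    """Integer constrained to values >= 0 (rule inferred from the name)."""
    def __new__(cls, value):
        result = super().__new__(cls, value)
        if result < 0:
            raise ValueError(f"{cls.__name__} must be >= 0")
        return result


class PositiveInt(NonNegativeInt):
    """Integer constrained to values > 0; reuses the parent check."""
    def __new__(cls, value):
        result = super().__new__(cls, value)
        if result == 0:
            raise ValueError("PositiveInt must be > 0")
        return result


class Line(str):
    """String without line breaks (assumed rule)."""
    def __new__(cls, value):
        if "\n" in value or "\r" in value:
            raise ValueError(f"{cls.__name__} must not contain line breaks")
        return super().__new__(cls, value)


class Word(Line):
    """Non-empty string without any whitespace (assumed rule)."""
    def __new__(cls, value):
        if not value or any(ch.isspace() for ch in value):
            raise ValueError("Word must be non-empty and contain no whitespace")
        return super().__new__(cls, value)


project_name = Word("ccpProject")
serial = PositiveInt(3)
```

Because the constrained types subclass `str` and `int`, validated values can still be passed to any code that expects the plain built-in types.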