
Memops Data modelling and automatic code generation

Edinburgh 9 September 2008. Memops Data modelling and automatic code generation. Memops - main points. Code generation framework Data access subroutine libraries Fully automatic code generation from model Several programming languages in parallel Precise, detailed, validated data. Memops.


Presentation Transcript


  1. Edinburgh, 9 September 2008 • Memops: Data modelling and automatic code generation

  2. Memops - main points • Code generation framework • Data access subroutine libraries • Fully automatic code generation from model • Several programming languages in parallel • Precise, detailed, validated data

  3. Memops • Introduction • Code generation • Generated libraries • Applications of Memops

  4. The CCPN Project • Collaborative Computing Project for NMR • Since 1999 • Unifying platform for NMR software similar to CCP4 for X-ray crystallography • Community-based, open-source, software development • Code generation, data model, applications, meetings

  5. NMR Structural Biology Pipeline • Sample Preparation • NMR Machine • Data Processing • Spectrum Analysis • Structure Calculation • Slow, complex, interactive • Repository Database

  6. Native Anarchy [Diagram labels: several Task1, Task2 and Task3 programs, linked directly to one another by Convert steps]

  7. With Data Standard [Diagram labels: several Task1, Task2 and Task3 programs, each linked by a Convert step to a central DataStandard]

  8. Data standard - objectives • Lossless data transfer between programs - different approaches and architectures • All data needed for pipeline software • Creating data, not analysing end results • Intermediate results needed • Comprehensive, detailed, complex • Completeness, integrity of changing data • Precisely defined standard • A single central description • Validation directly against standard

  9. CCPN approach • Standard API, no stable format • easier to maintain as model changes • Abstract data model • Exact correspondence to APIs • API implementations for several languages • Transparent access to XML or DB storage • Complete validation of model rules and constraints

  10. Memops • Introduction • Code generation • Generated libraries • Applications of Memops

  11. Automatic Code generation • Model will change over time • Several parallel implementations • Synchronisation between APIs and model • Maintenance and debugging • Resources are limited • Automatic Code Generation • Write and debug once and for all • Any domain, from Astrophysics to Zoology • Quick and simple to extend model • E.g. Application-specific packages

  12. Code Generation Framework [Diagram labels: Domain Experts; MEMOPS framework Developers; Software; UML Model (Package 1, Package 2, Package 3); Autogeneration; Handcoded (< 1%); Documentation; APIs (Python, Java, C); Storage (SQL, XML); Wrappers; User; Application; Deposition]

  13. Code Generation [Diagram labels: ObjectDomain UML data; edit UML; Export; On-disk model (XML file); In-memory model (Python objects); MetaModel; Autogeneration; API code, schemas, mappings etc. Legend: CCPN code; off-the-shelf; files; CCPN generated]

  14. API generator • Written in Python • Modular • Different generators share code [Diagram labels: TextWriter, ModelTraverse, PyApiGen, FileApiGen, PyLanguage, ApiGen, PyType, PyFileApiGen]
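
The idea of generating an API from a model can be illustrated with a toy generator. This is only a minimal sketch under stated assumptions: the `MODEL` dictionary format and the functions `generate_class` and `generate_api` are hypothetical and are not the Memops metamodel or its generator classes. It shows source text for classes with validating `get`/`set` methods being written from a model description, then loaded and used.

```python
# Hypothetical miniature model: class name -> {attribute: (type, default)}.
# This dictionary layout is an assumption for illustration only.
MODEL = {
    "Atom": {"name": (str, "C"), "x": (float, 0.0)},
}


def generate_class(class_name, attributes):
    """Return Python source text for one class with validating accessors."""
    lines = [f"class {class_name}:", "    def __init__(self):"]
    for attr, (_type, default) in attributes.items():
        lines.append(f"        self._{attr} = {default!r}")
    for attr, (type_, _default) in attributes.items():
        cap = attr[0].upper() + attr[1:]
        type_name = type_.__name__
        lines.append(f"    def get{cap}(self):")
        lines.append(f"        return self._{attr}")
        lines.append(f"    def set{cap}(self, value):")
        lines.append(f"        if not isinstance(value, {type_name}):")
        lines.append(f"            raise TypeError('{attr} must be {type_name}')")
        lines.append(f"        self._{attr} = value")
    return "\n".join(lines) + "\n"


def generate_api(model):
    """Generate source for every class in the model."""
    return "\n".join(generate_class(name, attrs) for name, attrs in model.items())


# Generate the source, load it, and use the generated class.
source = generate_api(MODEL)
namespace = {}
exec(source, namespace)
atom = namespace["Atom"]()
atom.setX(1.5)
```

Adding an attribute to `MODEL` and regenerating is all that is needed to extend the toy API, which is the property the slides emphasise: the model is edited, and the code follows from it.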

  15. Memops • Introduction • Code generation • Generated libraries • Applications of Memops

  16. Model features • Packages to subdivide model, code, and data files • Objects. Unique context, compare-by-identity • Complex data types. Different contexts, compare-by-value • Simple data types, PositiveInt, enumerations, … • Attributes and links: • Cardinality, frozen/modifiable, derived • Unique/ordered collections (sets, lists, unique lists) • Ad-hoc constraints on attributes, simple and complex datatypes, and objects.
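
The distinction between objects, complex data types and constrained simple data types can be sketched in plain Python. The names `positive_int`, `Coordinates` and `ModelObject` below are hypothetical and chosen only for illustration; they are not classes from the CCPN model.

```python
from dataclasses import dataclass


def positive_int(value):
    """Validate a PositiveInt-style simple data type: an int greater than zero."""
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"expected a positive integer, got {value!r}")
    return value


@dataclass(frozen=True)
class Coordinates:
    """Complex data type: no identity of its own, compared by value."""
    x: float
    y: float
    z: float


class ModelObject:
    """Object: two instances are distinct even if their attributes match."""

    def __init__(self, serial):
        # The constraint is checked when the attribute is set.
        self.serial = positive_int(serial)


same_value = Coordinates(1.0, 2.0, 3.0) == Coordinates(1.0, 2.0, 3.0)
same_object = ModelObject(1) == ModelObject(1)
```

Here `same_value` is true because the frozen dataclass compares field by field, while `same_object` is false because a plain class falls back to identity comparison.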

  17. ccp.molecule.MolSystem.MolSystem StructureEnsemble 1 * +code: Word Chain +ensembleId: Int +name: Text +atomNamingSystem: Line +code: Line +keywords: Line +resNamingSystem: Line +getChain() 1 ...: * +getEnsembleValidations() 1 +coordChains 1 1 * 1 * * Residue Model ccp.molecule.MolSystem.Chain +seqId: Int +serial: Int +seqCode: Int 1 +name: Line +seqInsertCode: Line = +details: Text * +getResidue() ccp.molecule.MolSystem.Residue 1 1 1 1 * * Coord * Atom 1 +altLocationCode: Line = ccp.molecule.MolSystem.Atom +name: Word 1 +x: Float +elementSymbol: Word +y: Float * +z: Float +getAtom() +bFactor: Float = 0.0 1 +getElementSymbol() 1 +occupancy: Float = 1.0 +getChemAtom() ccp.molecule.ChemComp.ChemAtom Molstructure model package

  18. CCPN APIs • Application Programming Interface • Object oriented • Data accessed in memory as if stored in the data model • Implementations come with: • Integrated, transparent I/O (file or database) • Complete validity checking • Protection against casual change (data encapsulation) • Versioning and backwards compatibility • Event notifier system • Slot for application-specific data
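
Two of the features listed above, validity checking on modification and an event notifier, can be sketched together. This is an assumption-laden illustration: the `Notifier` class, its `register`/`notify` methods and the validation rule are hypothetical and do not reproduce the CCPN notifier interface.

```python
class Notifier:
    """Minimal registry mapping (class name, operation) to callbacks."""

    def __init__(self):
        self._callbacks = {}

    def register(self, class_name, operation, callback):
        self._callbacks.setdefault((class_name, operation), []).append(callback)

    def notify(self, obj, operation):
        key = (type(obj).__name__, operation)
        for callback in self._callbacks.get(key, []):
            callback(obj)


notifier = Notifier()


class Chain:
    """Encapsulated data: the attribute is only changed through setCode."""

    def __init__(self, code):
        self._code = None
        self.setCode(code)

    def getCode(self):
        return self._code

    def setCode(self, value):
        # Validity check before the change, notification after it.
        if not isinstance(value, str) or not value or " " in value:
            raise ValueError("code must be a non-empty string without spaces")
        self._code = value
        notifier.notify(self, "setCode")


# An application (for example a GUI) listens for changes it cares about.
changes = []
notifier.register("Chain", "setCode", lambda chain: changes.append(chain.getCode()))
chain = Chain("A")
chain.setCode("B")
```

Because the check and the notification live in the setter, application code cannot bypass either without reaching into private attributes.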

  19. Python+XML at runtime [Diagram labels: User application (science code, user interface, utility functions); Python API (data get, set; validity check); XML I/O code (generic XML read/write); XML I/O mappings (what to do for which element); XML parser; Data storage: XML files (user data in CCPN XML format). Legend: CCPN code; off-the-shelf; application code; files; CCPN generated]

  20. Java+DB at runtime [Diagram labels: Science code, user interface, utility functions; Presentation layer; HQL custom queries (Hibernate Query Language); Optional; Java API; Hibernate mappings; Hibernate; Database schema; Database. Legend: CCPN code; off-the-shelf; application code; files; CCPN generated]

  21. Now Available • Version 2.0 just released • Python+XML, Java+XML, C+XML, Java+DB (with Hibernate) • Available under GPL license from SourceForge or www.ccpn.ac.uk • CCPN Data Standard: • NMR, Macromolecules, LIMS • 46 packages • 552 classes and data types • Python+XML implementation 800,000+ lines of code

  22. Memops • Introduction • Code generation • Generated libraries • Applications of Memops

  23. CcpNmr Suite • Analysis • Interactive NMR analysis • FormatConverter • Convert between 30+ NMR and structure formats • Built on top of CCPN model (Python+XML) • Version 2.0 released • Widely used in macromolecular NMR

  24. CcpNmr Analysis

  25. ExtendNMR NMR pipeline • Integrated macromolecular NMR pipeline - from sample to structure • Pre-existing programs from 8 groups • In-memory conversion to internal data structures • Integrated versions released: • ARIA (NMR structure generation) • Bruker TOPSPIN, manufacturer's processing/analysis package

  26. BIOXDM • Software pipeline for on-synchrotron crystallography • Exploit new technology (goniometers) • Experiment optimisation, acquisition, and on-line processing • Independent data model, with Memops machinery • Java+DB implementation for runtime concurrent access

  27. EUROCarbDB • Distributed deposition database • Glycobiology and glycomics • NMR, MS, HPLC and topology • Java. Database storage using Hibernate • CCPN model Java+DB implementation slot in as-is

  28. Funding acknowledgements • BBSRC CCPN grants • European Union grants • EXTEND-NMR, EU-NMR, NMR-Life, NMRQUAL, and TEMBLOR contracts • Industry support • AstraZeneca, Dupont Pharma (now BMS), Genentech, GlaxoSmithKline • Peter Keller (BIOXDM) thanks Synchrotron ‘Soleil’, the Global Phasing Consortium and EU FP6 ‘BIOXHIT’

  29. People • Authors: Prof. Ernest Laue, Wayne Boucher, Rasmus Fogh, Tim Stevens, John Ionides, Wim Vranken (EBI), Peter Keller (Global Phasing) • Collaborators at U. Cambridge: Dan O’Donovan, Wolfgang Rieping, Alan da Silva, Darima Lamazhapova • Collaborators at EBI (MSD), Hinxton: Kim Henrick, Anne Pajon, Chris Penkett • Special thanks to: Bruker Biospin GmbH (TOPSPIN), Michael Nilges (ARIA), Bas Leeflang (EUROCarbDB; FP6 contract RIDS-CT-2004-01195)

  30. END

  31. Overview • Packages • The Implementation package • Objects • DataTypes and DataObjTypes • Access control

  32. ARIA – structure generation from NMR data • ARIA imports • Peak Lists • Constraints • Sequences • Chemical shifts • ARIA exports • Peak Assignments • Filtered Constraints • Violations • Structures [Diagram labels: Application; ARIA XML; ARIA Data Model; Custom conversion; CCPN Data Model; CCPN XML]

  33. API functions • ‘get’ and ‘set’ (Attributes and links) • ‘add’ and ‘remove’ (Collection attributes and links) • ‘sorted’ (Unordered collection links) • ‘findFirst’ and ‘findAll’ (Collection links) • Simple filtering (attribute == value) • ‘create’ and ‘new’ (Objects) • Normal and ‘factory function’ object creation • ‘delete’ (Objects) • ‘Delete’ function – cascades to objects rendered invalid by deletion • ‘checkValid’, ‘checkAllValid’ (Objects) • API classes are strongly coupled. For efficiency reasons object-to-object links are two-way.
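
The function families listed above can be sketched on a two-class toy model. The classes and method names below (`newResidue`, `findFirstResidue`, `findAllResidues`) follow the naming pattern on the slide, but they are written by hand here as an assumption about what such an API looks like; they are not the generated CCPN code.

```python
class Chain:
    def __init__(self, code):
        self.code = code
        self._residues = []          # ordered child collection
        self.isDeleted = False

    def newResidue(self, **attrs):
        """Factory function: create a child already linked to its parent."""
        return Residue(self, **attrs)

    def getResidues(self):
        return tuple(self._residues)

    def findAllResidues(self, **conditions):
        """Simple filtering: keep children where every attribute == value."""
        return [r for r in self._residues
                if all(getattr(r, key) == value for key, value in conditions.items())]

    def findFirstResidue(self, **conditions):
        found = self.findAllResidues(**conditions)
        return found[0] if found else None

    def delete(self):
        """Cascade: children are rendered invalid when the parent is deleted."""
        for residue in list(self._residues):
            residue.delete()
        self.isDeleted = True


class Residue:
    def __init__(self, chain, seqId, name):
        self.chain = chain               # link from child to parent ...
        self.seqId = seqId
        self.name = name
        self.isDeleted = False
        chain._residues.append(self)     # ... and from parent to child (two-way)

    def delete(self):
        self.chain._residues.remove(self)
        self.isDeleted = True


chain = Chain("A")
ala = chain.newResidue(seqId=1, name="ALA")
gly = chain.newResidue(seqId=2, name="GLY")
first_gly = chain.findFirstResidue(name="GLY")
```

The two-way link is what makes the classes strongly coupled: `Residue` has to update the private collection of `Chain` so that both directions stay consistent.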

  34. FormatConverter - The NMR Translator [Diagram labels: Format specific readers: Peaks (XEasy, NmrView, ...), Chemical shifts (XEasy, NmrView, ...), Acquisition parameters (Bruker, Varian); Generic peak converter; Generic chemical shift converter; Generic acquisition parameters converter; Data model entry; CCPN Data Model; Format specific writers: Peaks (XEasy, NmrView, ...), Chemical shifts (XEasy, NmrView, ...), Processing parameters (Azara, NMRPipe)]

  35. ExtendNMR: ARIA • Structure generation from macromolecular NMR data, ambiguous distance constraints • One of two leading programs • Python and scripts, with CNS dynamics engine • All input and output integrated to CCPN standard

  36. ARIA: CCPN object selection

  37. ExtendNMR: Bruker TOPSPIN • NMR processing program of major NMR instrument company • Java. In-memory conversion to CCPN Java+XML implementation • CCPN output in current TOPSPIN release, expanded in upcoming release.

  38. Data Model v. Data Format • Abstract model (UML): Atom (+elementName: String = C) and Bond (+bondOrder: Float = 1.0), joined by an association with ends +atoms and +bonds and multiplicities 2 and * • Relational Database: Atom, Atom_Bond_Connect, Bond • XML: <Atom ID="AT1" elementName="C"> <Bond ID="BD1" bondOrder="1.0"> <BondList> <Atom1 IDREF="AT1"/> <Bond IDREF="BD1"/> <Atom2 IDREF="AT2"/> . </Bond> . </BondList> </Atom>
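
The point of this slide, one abstract model with several interchangeable storage formats, can be made concrete with a short sketch. The in-memory dictionaries, the `Molecule` root element and the `AtomRef` element are assumptions made for this example; only the table names Atom, Bond and Atom_Bond_Connect and the attribute names come from the slide.

```python
import sqlite3
import xml.etree.ElementTree as ET

# In-memory model content: two atoms joined by one bond.
atoms = {"AT1": "C", "AT2": "C"}
bonds = {"BD1": {"bondOrder": 1.0, "atoms": ("AT1", "AT2")}}


def to_xml(atoms, bonds):
    """Serialise the model content as an XML string."""
    root = ET.Element("Molecule")
    for atom_id, element in atoms.items():
        ET.SubElement(root, "Atom", ID=atom_id, elementName=element)
    for bond_id, bond in bonds.items():
        node = ET.SubElement(root, "Bond", ID=bond_id,
                             bondOrder=str(bond["bondOrder"]))
        for atom_id in bond["atoms"]:
            ET.SubElement(node, "AtomRef", IDREF=atom_id)
    return ET.tostring(root, encoding="unicode")


def to_database(atoms, bonds):
    """Store the same content in relational tables with a link table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Atom (id TEXT PRIMARY KEY, elementName TEXT)")
    db.execute("CREATE TABLE Bond (id TEXT PRIMARY KEY, bondOrder REAL)")
    db.execute("CREATE TABLE Atom_Bond_Connect (atom_id TEXT, bond_id TEXT)")
    db.executemany("INSERT INTO Atom VALUES (?, ?)", list(atoms.items()))
    db.executemany("INSERT INTO Bond VALUES (?, ?)",
                   [(bond_id, bond["bondOrder"]) for bond_id, bond in bonds.items()])
    db.executemany("INSERT INTO Atom_Bond_Connect VALUES (?, ?)",
                   [(atom_id, bond_id)
                    for bond_id, bond in bonds.items()
                    for atom_id in bond["atoms"]])
    return db


xml_text = to_xml(atoms, bonds)
db = to_database(atoms, bonds)
```

The many-to-many Atom-Bond link becomes nested references in XML and a separate link table in the database, yet application code that works against the model does not need to know which one is in use.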

  39. Packages

  40. Packages • Partition model, code, and data • Import each other • Can be omitted • All import Implementation and AccessControl • Each has a TopObject • No links between data from rival TopObjects (different extents of data)

  41. memops.Implementation.MemopsRoot +name: Word = ccpProject +override: Boolean = False +currentUserId: Word = user +newGuid()‏ 1 1 +getPackageLocator()‏ 1 +currentChemComp +currentMolecule * * * ccp.molecule.ChemComp.ChemComp memops.Implementation.TopObject ccp.molecule.Molecule.Molecule +guid: Line 1 1 1 1 +getPackageLocator()‏ * ccp.molecule.Molecule.MolResidue +chemAtoms * * * 2 ccp.molecule.ChemComp.AbstractChemAtom ccp.molecule.ChemComp.ChemBond +chemAtoms ccp.molecule.ChemComp.ChemAtom Root and TopObjects

  42. TopObjects • One in every package • Ultimate parent to all objects in package • Have globally unique identifier (‘guid’) • currentXyz links from root • Links can constrain links between descendants • In file implementations: • Hold links to storage and backup locations • Live in Implementation as almost empty shell

  43. Overview • Packages • The Implementation package • Objects • DataTypes and DataObjTypes • Access control

  44. CcpNmr Analysis • NMR Assignment Program • Inspired by ANSIG and Sparky • Demonstrates CCPN approach • Modern interface and scripting • Scalable and extensible • Operating Systems • Linux, Sun, SGI, OSX, Windows • Languages • Python • Data model interaction • Tk Graphical interface • Scripting • C • OpenGL/Tk contours • Structure display • Mathematical operations

  45. Implementation Package • Model and Code: • Supertypes that define all objects • Objects • DataTypes • DataObjTypes • Basic data types • Data – how to access the real data: • Data location pointers • Current-package pointers • Implementation data are not part of the data set, and are not in the database. • Represent view or session?

  46. TopObject MemopsRoot FileStorageObject 1 +guid: Line +name: Word = ccpProject +isLoaded: Boolean +override: Boolean = False +isModified: Boolean +getPackageLocator()‏ +currentUserId: Word = user +isReading: Boolean * +isModifiable: Boolean = True 1 +newGuid()‏ +createdBy: Word +getPackageLocator()‏ +lastUnlockedBy: Word +setIsModifiable()‏ 1 +touch()‏ +saveTo(repository)‏ +removeFrom(repository)‏ +save()‏ +activeRepositories +repositories * +backup()‏ Repository {ordered} * +name: Line 1 * 1 +backedUp +backup +format: StorageFormat = xml PackageLocator +url: Url * +targetName: Word = any +getFileLocation(packageName)‏ * 1..* +stored +repositories {ordered} Data Location

  47. DataObject MemopsObject «DataType» ComplexDataType +applicationData: ApplicationData +isDeleted: Boolean +className: Word ImplementationObject +getExpandedKey()‏ +packageName: Word +packageShortName: Word +qualifiedName: Line ccp.molecule.Molecule.MolResidue 1 +inConstructor: Boolean 1 +root * +getQualifiedName()‏ 1 TopObject MemopsRoot +guid: Line +name: Word = ccpProject 1 +topObject +override: Boolean = False DbMemopsRoot +getPackageLocator()‏ +currentUserId: Word = user * 1 1 +newGuid()‏ * +getPackageLocator()‏ ccp.molecule.Molecule.Molecule FileMemopsRoot +currentMolecule +saveModified()‏ +saveAll()‏ DbTopObject +refreshTopObjects(packageName)‏ FileStorageObject +backupAll()‏ +isLoaded: Boolean +importData(filePath)‏ +isModified: Boolean +isReading: Boolean +isModifiable: Boolean = True FileTopObject +createdBy: Word +loadFrom(repository)‏ +lastUnlockedBy: Word +load()‏ +setIsModifiable()‏ +restore()‏ +touch()‏ +saveTo(repository)‏ +removeFrom(repository)‏ +save()‏ +backup()‏ Objects and their Supertypes

  48. Simple Data Types (each a DataType): Boolean, String, Long, Double, Int, Float, Dict, Token, SpacelessString, Text, SingleLine, NonNegativeInt, PositiveInt, StringKeyDict, DateTime, LongWord, Word, Line, PositiveFloat, NonNegativeFloat, FloatRatio, Any, UrlProtocol, PositiveDouble, NonNegativeDouble

  49. Complex Data Types
