
Macromolecular Structure Middleware



Presentation Transcript


  1. Macromolecular Structure Middleware • OpenMMS: An Ontology Driven Architecture

  2. Overview • The mmCIF Ontology • OpenMMS Toolkit • Macromolecular Structure (MMS) Metamodel • Parser, XML • SQL / Corba Servers and Clients • Corba • UML and the future...

  3. How do we “Enable” Science? • Promote well defined Macromolecular Structure (MMS) Specifications • Distribution – Open Interfaces • Now: • flat files • W3 browsing and searching • Future: • XML, SQL, CORBA

  4. Why OpenMMS? • Allows programmers to more easily create efficient, high-performance and robust applications. • A Java-only toolkit that creates XML, CORBA and relational DB representations of the mmCIF Macromolecular Structure data. • Source code is publicly available, so users can easily modify the metamodel or create an entirely new one.

  5. What Do We Mean by an Ontology Driven Architecture? What do we mean by an Ontology? A bridge between Our World of Natural Language and the World of Machines.

  6. mmCIF Dictionary and Data Files • Based on Ontology for Macromolecular Structure defined by the International Union of Crystallography • Replaces the older 80-Column PDB files • mmCIF Dictionary contains over 140 Category and 1600 Item definitions • Open, Extensible • Provides a well-defined reference standard for data distribution

  7. OpenMMS Toolkit Data Flow (diagram; labels only): mmCIF Data Files (Reference Standard), mmCIF Parsers, XML Files, Relational Database, Corba Server, Applications

  8. Metamodel Information Flow (diagram; labels only): mmCIF Dictionary; mmCIF Ontology Metamodel; Metamodel Framework; Corba IDL, SQL Schema, XML DTD, Java Data Loaders, JDBC Loaders

  9. What can OpenMMS do? • PDBase program will load any or all PDB files into any SQL-92 compatible database (Oracle, mySQL, Sybase...) • Translate any PDB file into an XML file. • Contains Two Corba servers: • Reference server will cache and serve data read from PDB flat files. • DB server will cache and serve data read from a SQL database (very quickly...) • All Source code written in Java and publicly available.

  10. Some Advantages of Using an Ontology Driven Architecture • Scales to very large Ontologies • More reliable and maintainable code • Transfer between representations • Scientific Correctness of representation • Help in maintaining backward compatibility

  11. How does one actually represent an ontology? (OpenMMS Internal Metamodel Overview) Diagram labels: Root Visitor Abstract Class Module Module Interface Struct Visitor Subclass Struct Struct Field Field
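The metamodel idea on slides 8 and 11 can be illustrated with a minimal, visitor-style sketch. It is not the OpenMMS metamodel: the class names (`ModuleNode`, `StructNode`, `FieldNode`, `SqlVisitor`, `IdlVisitor`), the three field types, and the type mappings are all assumptions made for illustration. The point is only that one in-memory description of a category can be walked by different visitors to produce different representations, here a SQL table definition and an IDL struct.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature metamodel; none of these names come from OpenMMS. */
class MetamodelSketch {

    /** One callback per node kind, so each output format is a separate visitor. */
    interface Visitor {
        void beginModule(ModuleNode m);
        void beginStruct(StructNode s);
        void field(FieldNode f);
        void endStruct(StructNode s);
        void endModule(ModuleNode m);
    }

    static final class FieldNode {
        final String name, type; // type is "string", "float" or "long" in this sketch
        FieldNode(String name, String type) { this.name = name; this.type = type; }
    }

    static final class StructNode {
        final String name;
        final List<FieldNode> fields = new ArrayList<>();
        StructNode(String name) { this.name = name; }
        StructNode add(String fieldName, String type) {
            fields.add(new FieldNode(fieldName, type));
            return this;
        }
    }

    static final class ModuleNode {
        final String name;
        final List<StructNode> structs = new ArrayList<>();
        ModuleNode(String name) { this.name = name; }
        /** Walks module -> structs -> fields, calling the visitor at each step. */
        void accept(Visitor v) {
            v.beginModule(this);
            for (StructNode s : structs) {
                v.beginStruct(s);
                for (FieldNode f : s.fields) v.field(f);
                v.endStruct(s);
            }
            v.endModule(this);
        }
    }

    /** Emits one CREATE TABLE statement per struct (assumed type mapping). */
    static final class SqlVisitor implements Visitor {
        final StringBuilder out = new StringBuilder();
        private boolean first;
        public void beginModule(ModuleNode m) { }
        public void beginStruct(StructNode s) {
            out.append("CREATE TABLE ").append(s.name).append(" (");
            first = true;
        }
        public void field(FieldNode f) {
            if (!first) out.append(", ");
            first = false;
            String sqlType = f.type.equals("float") ? "REAL"
                    : f.type.equals("long") ? "INTEGER" : "VARCHAR(255)";
            out.append(f.name).append(' ').append(sqlType);
        }
        public void endStruct(StructNode s) { out.append(");\n"); }
        public void endModule(ModuleNode m) { }
    }

    /** Emits an IDL module containing one struct per metamodel struct. */
    static final class IdlVisitor implements Visitor {
        final StringBuilder out = new StringBuilder();
        public void beginModule(ModuleNode m) { out.append("module ").append(m.name).append(" {\n"); }
        public void beginStruct(StructNode s) { out.append("  struct ").append(s.name).append(" {"); }
        public void field(FieldNode f) { out.append(' ').append(f.type).append(' ').append(f.name).append(';'); }
        public void endStruct(StructNode s) { out.append(" };\n"); }
        public void endModule(ModuleNode m) { out.append("};\n"); }
    }

    /** A made-up module with one struct, loosely modelled on a few atom_site items. */
    static ModuleNode example() {
        ModuleNode m = new ModuleNode("Sketch");
        m.structs.add(new StructNode("atom_site")
                .add("id", "string").add("occupancy", "float").add("b_iso_or_equiv", "float"));
        return m;
    }

    static String toSql(ModuleNode m) { SqlVisitor v = new SqlVisitor(); m.accept(v); return v.out.toString(); }
    static String toIdl(ModuleNode m) { IdlVisitor v = new IdlVisitor(); m.accept(v); return v.out.toString(); }

    public static void main(String[] args) {
        System.out.print(toSql(example()));
        System.out.print(toIdl(example()));
    }
}
```

Adding a further representation (for example an XML DTD) would mean writing one more visitor, with no change to the model classes; that separation is the property the slide attributes to the metamodel framework.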

  12. mmCIF Parsers • General Purpose, Low-level access to data • Parsers available in many languages • OpenMMS toolkit includes Java Parser • Uses “Builder” Design Pattern • An application subclasses Abstract Builder class and stores data into its data structures
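The Builder arrangement described on slide 12 might look roughly like the sketch below. The names `AbstractBuilder`, `item`, `beginDataBlock` and `CellBuilder` are hypothetical and are not taken from the OpenMMS source; the parser also handles only a tiny subset of mmCIF (a `data_` line and single-line `_category.item value` pairs, with no `loop_` or quoted values). The input values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BuilderSketch {

    /** Hypothetical callback base class; the real OpenMMS class and methods may differ. */
    abstract static class AbstractBuilder {
        void beginDataBlock(String name) { }
        abstract void item(String category, String item, String value);
        void endDataBlock() { }
    }

    /** Parses only "data_NAME" lines and single-line "_category.item value" pairs. */
    static void parse(String text, AbstractBuilder builder) {
        boolean open = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // blank line or comment
            if (line.startsWith("data_")) {
                if (open) builder.endDataBlock();
                builder.beginDataBlock(line.substring(5));
                open = true;
            } else if (line.startsWith("_")) {
                String[] parts = line.split("\\s+", 2);
                int dot = parts[0].indexOf('.');
                if (parts.length < 2 || dot < 0) continue; // outside this sketch's subset
                builder.item(parts[0].substring(1, dot), parts[0].substring(dot + 1), parts[1]);
            }
        }
        if (open) builder.endDataBlock();
    }

    /** An application-side builder that keeps only the items of the "cell" category. */
    static final class CellBuilder extends AbstractBuilder {
        final Map<String, String> cell = new LinkedHashMap<>();
        @Override
        void item(String category, String item, String value) {
            if (category.equals("cell")) cell.put(item, value);
        }
    }

    public static void main(String[] args) {
        // Made-up values, for illustration only.
        String text = "data_EXAMPLE\n_entry.id EXAMPLE\n_cell.length_a 10.0\n_cell.length_b 20.5\n";
        CellBuilder builder = new CellBuilder();
        parse(text, builder);
        System.out.println(builder.cell);
    }
}
```

The parser never decides how data is stored; each application supplies a subclass that keeps what it needs and ignores the rest.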

  13. MMS in XML • Large Flat Files (open and close tags) • Tables can be grouped by rows or columns • XML from SQL Query • Many requests from Web browsers don’t really need or want all the data • SW available from DB Vendors and ISVs for creating XML files from SQL result sets • Smaller files load faster
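To make "tables can be grouped by rows or columns" concrete, here is a small sketch that serializes the same table both ways. The element names (`row`, and one element per column) and the space-separated column encoding are assumptions for illustration, not the OpenMMS XML layout; the column form also assumes values contain no whitespace.

```java
import java.util.Arrays;
import java.util.List;

class XmlTableSketch {

    /** Escapes the characters that are unsafe in XML text content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Row grouping: one <row> element per record, one child element per column. */
    static String byRows(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">");
        for (List<String> row : rows) {
            sb.append("<row>");
            for (int c = 0; c < cols.size(); c++) {
                sb.append('<').append(cols.get(c)).append('>')
                  .append(escape(row.get(c)))
                  .append("</").append(cols.get(c)).append('>');
            }
            sb.append("</row>");
        }
        return sb.append("</").append(table).append('>').toString();
    }

    /** Column grouping: one element per column holding all values, space-separated. */
    static String byColumns(String table, List<String> cols, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder("<" + table + ">");
        for (int c = 0; c < cols.size(); c++) {
            sb.append('<').append(cols.get(c)).append('>');
            for (int r = 0; r < rows.size(); r++) {
                if (r > 0) sb.append(' ');
                sb.append(escape(rows.get(r).get(c)));
            }
            sb.append("</").append(cols.get(c)).append('>');
        }
        return sb.append("</").append(table).append('>').toString();
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("id", "type_symbol");
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("1", "N"),
                Arrays.asList("2", "C"));
        System.out.println(byRows("atom_site", cols, rows));
        System.out.println(byColumns("atom_site", cols, rows));
    }
}
```

Column grouping writes each tag once per table rather than once per record, which is one way the open and close tag overhead noted on the slide can be reduced.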

  14. Relational DB Expression • SQL-92 Compatible • Schemas for all the standard DB vendors • Fast and Flexible Keyword searches • PDBase loader allows structures to be selectively loaded • Oracle Instance Tested • 14,556 Structures • 16GB, 88 Million Atom Records

  15. A very high-level (and very rough) classification of communication • Person-to-Person communication • email • Person-to-Machine communication • HTTP/HTML • Machine-to-Machine communication • CORBA, SQL, .NET, SOAP • Not Communications -> Data Formats • XML, mmCIF (STAR), many more …

  16. What is CORBA? Common Object Request Broker Architecture • Defines a family of open software interface specifications for distributed object computing. http://www.omg.org

  17. What is an Object? “A Data Structure with an Attitude” Programs = Algorithms + Data Structure Object Oriented Programming Principle: Partition the parts of algorithms with the data structures they use

  18. Side View of a Distributed Application (diagram; labels only): Client (e.g. a Java Applet); Server (e.g. Mainframe Computer); Middleware and IDL on each side; Network: Internet (TCP/IP)

  19. The “Hourglass” view of the Internet • Applications • OO High-Level Interface: HTTP, Corba, .NET • TCP, RTP, ... → Reliable Bitstream • IP → Unreliable Datagrams • Copper, Glass, Radio Spectrum (ATM, Ethernet, V.90, SONET...)

  20. Where is Corba? • Inside every Java Runtime Environment. • Commonly used in middle tier and backend (e.g. database) connections. • Open Source and Commercial Implementations Available • Usually buried deep inside the software • Difficult or impossible to tell when it is being used

  21. What is Distributed Object Computing? • Extends the benefits of object-oriented technology across process and machine boundaries to encompass entire networks. • Attempts to make remote objects appear to programmers as if they were local objects in the same process. This is called location transparency.
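Location transparency can be sketched without an ORB by putting a stand-in "remote" stub behind the same interface as a local object. Everything here is an in-process simulation using `java.lang.reflect.Proxy`: no network, IDL compiler, or CORBA runtime is involved, and the names `EntryService`, `LocalEntryService`, `remoteStub` and `wireLog` are hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class LocationTransparencySketch {

    /** The interface the client programs against (hypothetical). */
    interface EntryService {
        String getVersion();
    }

    /** An ordinary in-process implementation. */
    static final class LocalEntryService implements EntryService {
        public String getVersion() { return "sketch-1.0"; }
    }

    /** Stands in for the requests an ORB would marshal and send over the network. */
    static final List<String> wireLog = new ArrayList<>();

    /** Builds a stub that records each call and then forwards it to the target. */
    static EntryService remoteStub(EntryService target) {
        InvocationHandler handler = (proxy, method, args) -> {
            wireLog.add(method.getName()); // a real stub would serialize the request here
            return method.invoke(target, args);
        };
        return (EntryService) Proxy.newProxyInstance(
                EntryService.class.getClassLoader(),
                new Class<?>[] { EntryService.class },
                handler);
    }

    /** Client code: identical whether the service is local or behind a stub. */
    static String describe(EntryService service) {
        return "version=" + service.getVersion();
    }

    public static void main(String[] args) {
        EntryService local = new LocalEntryService();
        EntryService remote = remoteStub(local);
        System.out.println(describe(local));
        System.out.println(describe(remote));
        System.out.println("calls that went through the stub: " + wireLog);
    }
}
```

The `describe` method cannot tell which kind of object it was given, which is the property the slide calls location transparency. A real distributed call can additionally fail or be slow in ways a local call cannot, so the transparency is a programming convenience rather than a guarantee.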

  22. Advantages of Distributed Object Computing • Easier (and faster) for programmers to create distributed applications • Increases Reliability • Increases Maintainability • Increases Portability • Increases Extensibility

  23. The Alphabet Soup • OMG = Object Management Group: consortium of 800+ companies, founded in 1989. • IDL = Interface Definition Language

  24. Boundaries, Interfaces • The key is to focus on boundaries, interfaces, how things fit together • Not on the internal details of how they’re built; assume that will be diverse & changing • (Diagram caption: Shape of boundary is defined in IDL)

  25. Boundaries, Interfaces • The glue that binds parts together is the ORB • The interface to an object can be distributed over a network • (Diagram caption: Shape of boundary is defined in IDL)

  26. Corba Independence • Open Standard for Distributed Object Oriented Design • Independent of Hardware Platform • Independent of Operating System • Independent of Programming Language • Independent of Object Location

  27. Object Request Broker • ORBs mediate between objects and things that use them (clients) • (Diagram labels: Client, IDL, Object Request Broker, IDL, Object)

  28. Terminology • IIOP • The Internet Inter-ORB Protocol, defined in the Spec as a vendor-independent, wire-level network protocol on top of TCP/IP. This allows ORB implementations of different vendors to interoperate.

  29. ORBs: Medium for Integration (diagram; labels only): ORB, ORB, ORB joined by Corba / IIOP (Internet Inter-ORB Protocol); language labels: Java, C++, Perl, C, Ada, Java, VB, ActiveX

  30. Corba Facilities: Industry Standards in Vertical Markets • Manufacturing • Finance • Life Sciences Research • C4I • Many others...

  31. Using Corba to access Macromolecular Structure Data • No Parsing of Flat Files • Direct Access to Binary Data Structures • Strongly Typed Data • Granularity of Access • Indices and Presence Flags Pre-computed • Highest Performance

  32. OMG/LSR Macromolecular Structure Adoption Process • August 1999 RFP issued • March 2000 Initial Submission • September 2000 Revised Submission • February 2001 Adopted Spec by the OMG • 4Q 2001 OpenMMS LSR/MMS1.0 compliant implementation source code publicly available • February 2002 Approved as a Formal OMG Available Specification.

  33. Using the CORBA MMS Server • An excerpt from legacy PDB Formatted File ATOM Record (4hhb.ent)
      ...
      ATOM 6 CG1 VAL A 1 7.009 20.127 5.418 6.00 61.79 ...
      ATOM 7 CG2 VAL A 1 5.246 18.533 5.681 6.00 80.12 ...
      ATOM 8 N LEU A 2 9.096 18.040 3.857 7.00 26.44 ...
      ATOM 9 CA LEU A 2 10.600 17.889 4.283 6.00 26.32 ...
      ATOM 10 C LEU A 2 11.265 19.184 5.297 6.00 32.96 ...
      ATOM 11 O LEU A 2 10.813 20.177 4.647 8.00 31.90 ...
      ATOM 12 CB LEU A 2 11.099 18.007 2.815 6.00 29.23 ...
      ATOM 13 CG LEU A 2 11.322 16.956 1.934 6.00 37.71 ...
      ATOM 14 CD1 LEU A 2 11.468 15.596 2.337 6.00 39.10 ...
      ATOM 15 CD2 LEU A 2 11.423 17.268 .300 6.00 37.47 ...
      ...
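For contrast with the typed access shown on the following slides, this is roughly what a client has to do to read one legacy ATOM record itself. The sketch assumes the standard fixed-column PDB layout (serial in columns 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z in 31-38, 39-46, 47-54); the transcript above has lost that column alignment, so the example first rebuilds an aligned line from the values of ATOM 6. The class and method names are hypothetical.

```java
import java.util.Locale;

class PdbAtomSketch {

    /** The handful of ATOM fields this sketch extracts. */
    static final class Atom {
        final int serial;
        final String name;
        final String resName;
        final char chain;
        final int resSeq;
        final double x, y, z;
        Atom(int serial, String name, String resName, char chain, int resSeq,
             double x, double y, double z) {
            this.serial = serial; this.name = name; this.resName = resName;
            this.chain = chain; this.resSeq = resSeq;
            this.x = x; this.y = y; this.z = z;
        }
    }

    /** Builds a column-aligned ATOM line (columns 1-66 only) for demonstration. */
    static String formatAtom(int serial, String name, String resName, char chain, int resSeq,
                             double x, double y, double z, double occupancy, double bFactor) {
        return String.format(Locale.ROOT,
                "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f",
                "ATOM", serial, name, " ", resName, chain, resSeq, " ",
                x, y, z, occupancy, bFactor);
    }

    /** Slices fields out by column position; substring indices are 0-based, end-exclusive. */
    static Atom parseAtom(String line) {
        if (line.length() < 54 || !line.startsWith("ATOM")) {
            throw new IllegalArgumentException("not a complete ATOM record");
        }
        int serial = Integer.parseInt(line.substring(6, 11).trim());
        String name = line.substring(12, 16).trim();
        String resName = line.substring(17, 20).trim();
        char chain = line.charAt(21);
        int resSeq = Integer.parseInt(line.substring(22, 26).trim());
        double x = Double.parseDouble(line.substring(30, 38).trim());
        double y = Double.parseDouble(line.substring(38, 46).trim());
        double z = Double.parseDouble(line.substring(46, 54).trim());
        return new Atom(serial, name, resName, chain, resSeq, x, y, z);
    }

    public static void main(String[] args) {
        // Values copied from the ATOM 6 line in the excerpt above.
        String line = formatAtom(6, "CG1", "VAL", 'A', 1, 7.009, 20.127, 5.418, 6.00, 61.79);
        Atom atom = parseAtom(line);
        System.out.println(atom.name + " " + atom.resName + " " + atom.resSeq
                + " (" + atom.x + ", " + atom.y + ", " + atom.z + ")");
    }
}
```

Every field is a string slice that must be trimmed and converted, and any misalignment in the file breaks the parse. That per-client work is what the "No Parsing of Flat Files" and "Strongly Typed Data" points on slide 31 refer to.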

  34. LSR/MMS “ATOM Record” • DsLSRMacromolecularStructure.idl excerpt:
      struct AtomSite {
          string id;
          IndexId type_symbol;
          AtomIndex label;
          IndexId label_entity;
          VectorXYZ cartn;
          float occupancy;
          float b_iso_or_equiv;
      };

  35. Example Code and Resulting Output
      Entry e = entryFactory.get_entry_from_id("4hhb");
      AtomSite[] a = e.get_atom_site_list();
      for (int i = 0; i < a.length; i++) {
          System.out.println(a[i].id + " " + a[i].type_symbol.id + " (" + a[i].cartn.x + ", " + a[i].cartn.y + ", " + a[i].cartn.z + ")");
      }
      produces:
      1 N (11.065, 7.352, 9.598)
      2 C (12.436, 7.764, 9.902)
      3 C (12.883, 7.09, 11.208)
      4 O (12.088, 7.0, 12.147)
      5 C (12.611, 9.264, 10.06)
      ...

  36. What are the alternatives to Corba? • TCP/IP Sockets - Byte stream • DCOM, COM+, OLE, .NET (Microsoft Only) • DCOM ↔ Corba Bridges are available from several vendors • SOAP (Simple Object Access Protocol) • XML Based

  37. Unified Modeling Language (UML): What do all those arrows and boxes mean? • Schematic Language for Defining SW • Graphical Representations • UML = Things, Relations and Diagrams • 9 types of Diagrams • The most commonly used diagram is the “Class Diagram”

  38. UML Class Diagram Example (diagram; labels only): classes Identifier, ModificationDateList, EntryIdList, EntryId; EntryFactory with operations get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); ModificationDate with attributes Entry_id : EntryId and date : TimeBase::TimeT; two association ends marked *

  39. UML Class Diagram Basics • Class_Name ← Underlined for Class Instances, Italics for Abstract Classes • var1: Type, var2: Type ← Variables • method1(), method2(), method3() ← Methods • Details may be omitted if not important

  40. UML Relationships (diagram; labels only): Dependency; Association (multiplicity labels 0..1 and *); Generalization (Inheritance); Aggregation (multiplicity label *)

  41. UML Example (diagram; labels only): classes Identifier, ModificationDateList, EntryIdList, EntryId; EntryFactory with operations get_version(), get_entry_id_list(), get_entry_modification_dates(), native_formats_supported(), get_native_entry_representation(); ModificationDate with attributes Entry_id : EntryId and Date : TimeBase::TimeT; two association ends marked *

  42. XMI: XML Metadata Interchange • UML is a graphical representation; need some way to exchange UML models between applications • XMI is used to store and transmit UML models • XML based • Defines XML tags for classes, relationships between classes etc.

  43. OMG MDA • Platform Independent Models (PIMs) that define the interface are defined in UML • The PIMs are translated to Platform Specific Models (PSMs) such as Corba, SOAP, .NET or XML Schemas • The Corba servers and clients may be the same, but now the interface is defined in UML and the IDL is then generated from the UML

  44. MDA Platform Independent to Platform Dependent Translation (diagram; labels only): UML; .NET, Corba, SOAP, XML

  45. Thanks and Acknowledgments • Phil Bourne • John Westbrook • David Benton • Karl Konnerth • Lynn TenEyck
