
caGrid 1.0 Service Architecture

caGrid 1.0 Service Architecture. caBIG™ Annual Meeting 2007, February 5-7, 2007. See PowerPoint "notes" section for annotations on these slides. Scott Oster (oster@bmi.osu.edu) - Ohio State University. Agenda: High Level Overview, caGrid Service Architecture, Metadata Infrastructure.


Presentation Transcript


  1. caGrid 1.0 Service Architecture caBIG™ Annual Meeting 2007 February 5-7th, 2007 See PowerPoint "notes" section for annotations on these slides Scott Oster (oster@bmi.osu.edu) - Ohio State University

  2. Agenda • High Level Overview • caGrid Service Architecture • Metadata Infrastructure • Component Highlight • Identifier Services Framework • Requirements and Features Overview • caBIO case study

  3. What is caBIG? • Common, widely distributed infrastructure that permits the cancer research community to focus on innovation • Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange • Collection of interoperable applications developed to common standards • Cancer research data is available for mining and integration

  4. What is caGrid? • Development project of Architecture Workspace, aimed at helping define and implement Gold Compliance • No requirements on implementation technology will be necessary for Gold compliance • Specifications will be created defining requirements for interoperability • caGrid provides core infrastructure, and tooling to provide “a way” to achieve Gold compliance • Gold compliance creates the G in caBIG™ • Gold => Grid => connecting Silver Systems

  5. What is Grid? • A lot of different things to a lot of different people • Evolution of distributed computing to support sciences and engineering • Some common themes prevail: • Sharing of resources (computational, storage, data, etc) • Secure Access (global authentication, local authorization, policies, trust, etc) • Open Standards • Virtualization • “The real and specific problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” • I. Foster, C. Kesselman, S. Tuecke. International J. Supercomputer Applications, 15(3), 2001. • A good general overview can be found here: http://gridcafe.web.cern.ch/gridcafe/

  6. History of caGrid • caGrid 1.0 is a revolutionary release of the caGrid infrastructure • It is generally not compatible with caGrid 0.5.x releases, though conceptually, it plays the same role with increased stability, functionality, and scope

  7. caGrid Community Involvement • caGrid itself provides no real “data” or “analysis” to caBIG™; it's the enabling infrastructure which allows the community to do so • Community members add value to the grid as applications, services, and processes (for example: shared workflows) • caGrid provides the necessary core services, APIs, and tooling • The real “value” of the grid comes from bringing this information to the “end user” • Community members develop end user applications which consume the resources provided by the grid

  8. What is a Community Provided caGrid Service? • Silver compatible systems are exposed to the Grid as caGrid Services • caDSR models are used for all data types, and transported over the grid in a common fashion • Standardized, common pattern and mechanism for remote access • Language and implementation technology independent • Common security infrastructure for authentication and authorization • Standardized service metadata models and metadata advertisement mechanisms • Community provided service types: • Data Services • Expose data to the grid in a unified way • Analytical Services • Expose analytical operations to the grid

  9. caGrid exposing caDSR data • In caDSR, nearly all items can be individually versioned, but a specific Project version represents a curated collection of Packages, Classes, and Attributes with consistent semantics • Data is exposed in caGrid as instances defined in a specific Project • Data Services expose instance data as a subset of Classes from a given Project • Analytical Services operate over instances of Classes from potentially numerous Projects

  10. caGrid exposing Silver Systems • Object Oriented APIs and data resources are developed using Object types and information models registered in the caDSR • These “silver systems” are grid-enabled by defining a grid service interface that defines the functionality to be exposed to the grid • The grid service interface uses the same Object types as the existing system, but leverages a platform and language neutral representation (XML) of them • The grid service implementation maps service invocations to API calls or queries into the existing system
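
A minimal sketch of the wrapping pattern described on this slide, in Python. The class names `GeneApi` and `GeneGridService`, the method names, and the XML element names are hypothetical and only illustrate the idea of mapping a service invocation onto an existing API call and returning a language-neutral XML form; they are not caGrid or caBIO interfaces.

```python
import xml.etree.ElementTree as ET


class GeneApi:
    """Stands in for an existing object-oriented 'silver' API (hypothetical)."""

    def find_by_symbol(self, symbol):
        return {"symbol": symbol, "name": "breast cancer 1"}


def to_xml(gene):
    """Serialize the object to a platform-neutral XML representation."""
    elem = ET.Element("Gene")
    for key, value in gene.items():
        ET.SubElement(elem, key).text = value
    return ET.tostring(elem, encoding="unicode")


class GeneGridService:
    """Grid-facing interface: each operation delegates to the existing API."""

    def __init__(self, api):
        self._api = api

    def get_gene(self, symbol):
        return to_xml(self._api.find_by_symbol(symbol))


service = GeneGridService(GeneApi())
response = service.get_gene("BRCA1")
```

The point of the design is that the existing API stays untouched; only the thin service layer knows about the XML form.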

  11. caGrid Metadata Infrastructure Goals • Support strongly typed grid • Syntactic and Semantic interoperability • Programmatic! • Smooth transition from Application to Grid and back • Leverage wealth of existing metadata • Enable service Advertisement and Discovery

  12. caGrid Data Description Infrastructure • Client and service APIs are object oriented, and operate over well-defined and curated data types • Objects are defined in UML and converted into ISO/IEC 11179 Administered Components, which are in turn registered in the Cancer Data Standards Repository (caDSR) • Object definitions draw from controlled terminology and vocabulary registered in the Enterprise Vocabulary Services (EVS), and their relationships are thus semantically described • XML serialization of objects adheres to XML schemas registered in the Global Model Exchange (GME)

  13. Service Layers

  14. Service Layers: caBIO Data Service example • Introduce-managed Security constraints • GTS-managed Trusted Authorities • CSM/Grid Grouper Authorization • Common Data Service Operations (WSDL) • CQL, CQLResult, Data Service Faults (XSD) • caBIO Schemas (XSD) • caGrid Metadata Schemas (XSD) • WS-Enumeration Operations and Types (WSDL, XSD) • Introduce-generated ServiceMetadata • Introduce-generated DomainModel • Introduce-provided common operation implementations (Resource Property, Security Metadata) • caGrid-provided CQL implementation to query ApplicationService • Introduce managed configuration points: • Index Service Location • Data Service Component Implementations (CQL Processor, Validators) • ApplicationService Information • Other options • Introduce-generated code to manage service group registration and maintenance • Introduce-generated Resource to manage metadata • Introduce-generated Resources to manage enumerations

  15. Service Metadata: All Services • Common Service Metadata • Provided by all services • Details service’s capabilities, operations, contact information, hosting research center • Service operation’s inputs and outputs defined in terms of structure and semantics extracted from caDSR and EVS • Majority auto-generated by Introduce

  16. Service Metadata: All Services • This screen shot illustrates the UML model of the Common Service Metadata. The details of the UML model cannot be seen by the viewer, but it illustrates, as described, the structure and semantics extracted from the caDSR.

  17. Service Metadata: Service Security • Service Security Metadata • Provided by all services • Details the service’s requirements on communication channel for each operation • Can be used by client to programmatically negotiate an acceptable means of communication • For example: Does operation X allow anonymous clients, or are credentials required? • Auto-generated by Introduce

  18. Service Metadata: Service Security • This screen shot illustrates the UML model of the Service Security Metadata. The details of the service’s requirements on the communication channel for each operation (transport, secure message and secure conversation) are modeled.
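
The negotiation described on the Service Security slides can be sketched roughly as follows, in Python. The dictionary layout, the operation names, and the function `choose_mechanism` are hypothetical; only the three channel kinds (transport, secure message, secure conversation) and the anonymous-versus-credentials question come from the slides. This is not the caGrid security metadata schema.

```python
# Hypothetical per-operation requirements a service might advertise.
SERVICE_SECURITY = {
    "query": {
        "mechanisms": ["transport", "secure_conversation"],
        "anonymous_allowed": True,
    },
    "submit": {
        "mechanisms": ["secure_message"],
        "anonymous_allowed": False,
    },
}


def choose_mechanism(operation, client_mechanisms, has_credentials):
    """Return the first mechanism acceptable to both sides, or None."""
    requirements = SERVICE_SECURITY[operation]
    if not requirements["anonymous_allowed"] and not has_credentials:
        return None  # credentials are required but the client has none
    for mechanism in requirements["mechanisms"]:
        if mechanism in client_mechanisms:
            return mechanism
    return None  # no common mechanism
```

A client would call something like `choose_mechanism("query", ["transport"], False)` before invoking the operation, and fall back to authenticating first when the result is `None`.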

  19. Service Metadata: Data Service • Data Service Metadata • Provided by all data services • Describes the Domain Model being exposed, in terms of a UML model linked to semantics • Provides information needed to formulate the Object-Oriented Query • As with common metadata, data types defined in terms of structure and semantics extracted from caDSR and EVS • Auto-generated by Introduce

  20. Service Metadata: Data Service • This screen shot illustrates the UML model of the Data Service Metadata, showing the exposed structure and semantics extracted from the caDSR. This information/metadata can be used to formulate the Object-Oriented Query.

  21. Metadata Creation • Metadata can be added to services in Introduce by selecting the desired XML type • The “caGrid Metadata” extension automatically synchronizes the Service Metadata instance using the local Introduce service model and the caDSR grid service • The “Data” extension automatically synchronizes the Domain Model instances using the information in the Data tab, and the caDSR grid service • Provides ability to specify whether or not to publish each model to Index Service

  22. Advertisement and Discovery Overview • Advertisement: • The caGrid Grid Service Owner composes service metadata describing the service to the grid and publishes it to the grid. The service metadata describes properties of the grid services that caGrid users and other grid services may query. • Discovery: • A caGrid Researcher specifies search criteria describing a service. The researcher submits the discovery request to a discovery service, which identifies a list of services matching the criteria, and returns the list to the researcher. • Use Cases: • Publish advertisement of a Grid Service • Remove advertisement of a Grid Service • Update advertisement of a Grid Service • Discover advertisement of a Grid Service

  23. Advertisement and Discovery Process • All services register their service location and metadata information to an Index Service • The Index Service subscribes to the standardized metadata and aggregates their contents • Clients can discover services using a discovery API which facilitates inspection of data types • Leveraging semantic information in EVS (from which service metadata is drawn), services can be discovered by the semantics of their data types

  24. Service Discovery Process • Clients formulate a query over the caGrid standard metadata • Examples: • “Find me all the services from Cancer Center X” • “Which Analytical services take Genes as input?” • “Which Data services expose data relating to lung cancer?” • “Find me all the services with some metadata mentioning the string ‘macromolecules’” • This query is sent to the caGrid Index Service, which returns the addresses of the services satisfying the query • The client can then further interrogate the satisfying services by asking for all of their metadata or service descriptions • Finally the client invokes the desired services as appropriate
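
The discovery flow on this slide can be sketched with a small in-memory registry, in Python. The `ServiceEntry` fields, the `IndexService` class, and the example addresses are hypothetical stand-ins for the standard metadata and the Index Service; the real discovery API is not shown in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceEntry:
    address: str
    kind: str  # "data" or "analytical"
    research_center: str
    input_types: list = field(default_factory=list)
    exposed_types: list = field(default_factory=list)
    description: str = ""


class IndexService:
    """Toy registry: services register metadata, clients query it."""

    def __init__(self):
        self._entries = []

    def register(self, entry):
        self._entries.append(entry)

    def discover(self, predicate):
        # Return only addresses; the client interrogates services afterwards.
        return [e.address for e in self._entries if predicate(e)]


index = IndexService()
index.register(ServiceEntry(
    "https://example.org/svc/geneData", "data", "Cancer Center X",
    exposed_types=["Gene"], description="gene and macromolecules data"))
index.register(ServiceEntry(
    "https://example.org/svc/geneCluster", "analytical", "Cancer Center Y",
    input_types=["Gene"]))

by_center = index.discover(lambda e: e.research_center == "Cancer Center X")
takes_genes = index.discover(
    lambda e: e.kind == "analytical" and "Gene" in e.input_types)
mentions = index.discover(lambda e: "macromolecules" in e.description)
```

Each of the three queries mirrors one of the example questions above: by hosting center, by input data type, and by free-text match.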

  25. Future Structures from the Building Blocks • The semantic data registration, data type registration, standard service metadata aggregation and registration, and service interface description provide the building blocks for interesting higher level applications and services • Workflow/Data Pathway Discovery • Create a graph of potential data flow scenarios from one service to another from a starting data type (type matching) • Identify Semantically compatible Services • Identify services which work with the same EVS concepts, but use different data models (create a data mapping, prompt for harmonization) • Dynamic Graphical Invocation • Starting with an EVS concept, find Data Services providing data types based on that concept, query for data from one, find Analytical Services operating on that data type, request the WSDL/XSD of the operation, create a GUI dynamically from that information to fill out the additional parameters, invoke the service
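
As a rough illustration of the Workflow/Data Pathway Discovery idea (type matching from a starting data type), here is a Python sketch. The service names and data types in `SERVICES` are invented for the example, and breadth-first search is just one reasonable way to build the graph of data flow scenarios; the slide does not prescribe an algorithm.

```python
from collections import deque

# Hypothetical services, each described only by (input type, output type).
SERVICES = {
    "geneToProtein": ("Gene", "Protein"),
    "proteinToPathway": ("Protein", "Pathway"),
    "geneToSequence": ("Gene", "Sequence"),
}


def find_pathway(start_type, goal_type):
    """Breadth-first search for the shortest service chain by type matching."""
    queue = deque([(start_type, [])])
    seen = {start_type}
    while queue:
        current, path = queue.popleft()
        if current == goal_type:
            return path
        for name, (in_type, out_type) in SERVICES.items():
            if in_type == current and out_type not in seen:
                seen.add(out_type)
                queue.append((out_type, path + [name]))
    return None  # no chain of services connects the two types
```

For example, `find_pathway("Gene", "Pathway")` chains the two services whose output and input types line up.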

  26. caGrid Components • Leverage existing technologies: • caDSR, EVS, Mobius GME: Common data elements, controlled vocabularies, schema management • Globus Toolkit (currently version 4.0.3) • Core grid services infrastructure • Service deployment, service registry, invocation, base security infrastructure • Additional Core Infrastructure • Higher-level security services • Grid service access to metadata components (caDSR, EVS, GME, etc) • Workflow, Identifier, Federated Query services • Service Provider Tooling (Introduce) • Graphical service development and configuration environment • Abstractions from grid service infrastructure for Data and Analytical services • Deployment wizards • Client Tooling • Installer • High-level APIs for interacting with core components and services • Graphical Tools (administration tools, sample applications, etc) • Production Deployment and Support of Infrastructure Services

  27. caGrid Production Environment

  28. caGrid Projects • The caGrid release is oriented around a number of individual projects • Build process manages inter-project dependencies • Each project provides a specific set of functionality, and is self contained once caGrid is built • Grid Services: • authentication-service, cadsr, dorian, evs, fqp, gme, gridgrouper, gts, index, syncgts, workflow, ws-naming, ws-transfer • Grid Service Components and Extensions: • authz, bulkDataTransfer, cabigextensions, data, sdkQuery, sdkQuery32, service-security-provider, ws-enum, ws-handlesystem • Utilities and APIs: • AntInstallerFramework, core, discovery, graph, gridca, metadata, metadatautils, opensaml • Applications: • installer, introduce, portal, security-ui

  29. Metadata Services • Cancer Data Standards Repository (caDSR) • caBIG projects register their data models as Common Data Elements (CDEs) which are semantically harmonized and then centrally stored and managed in the caDSR • The caDSR grid service provides: • Model discovery and traversal • caGrid standard metadata generation capabilities • Enterprise Vocabulary Services (EVS) • EVS is a set of services and resources that address the need for controlled vocabulary • The EVS grid service provides: • Query access to the data semantics and controlled vocabulary managed by the EVS • Global Model Exchange (GME) • GME is a DNS-like data definition registry and exchange service that is responsible for storing and linking together data models in the form of XML schema. • The GME grid service provides: • Access to the authoritative structural representation of data types on the grid • Globus Information Services: Index Service • The Globus Information Services infrastructure provides a generic framework for aggregation of service metadata, a registry of running Grid services, and a dynamic data-generating and indexing node, suitable for use in a hierarchy or federation of services • The Index grid service provides: • Yellow and white pages for the grid

  30. caGrid Security Components • Dorian • Grid User Account Management • Enables Identity Management and Federation • Authentication Service • Provides a uniform authentication interface on which applications can be built, and a framework for issuing SAML assertions for existing credential providers such that they may be easily integrated with Dorian and other grid credential providers • Grid Trust Service (GTS) • Creation and Management of a federated trust fabric. • Supports applications and services in deciding whether or not signers of digital credentials/user attributes can be trusted. • Grid Grouper • Grid Group / VO Management • Enables Group/VO Based Authorization • Authorization Support • Provides a framework to perform service authorization based on permissions from both the Common Security Module (CSM) as well as Grid Grouper groups • Security Communication Metadata • Metadata providing the ability for two parties to negotiate a communication mechanism which meets the service’s requirements • Grid CA • APIs and Command Line for platform independent certificate authority

  31. Data Service Overview • caGrid Data Services provide capability to expose silver data resources to the Grid • Specialization of caGrid grid services to expose data through a common query interface • Meet all base service requirements of caGrid services • Present an object view of data sources • Exposed objects are registered in caDSR and their XML representation in GME • Data Service Metadata describes information model • Queries made with CQL Query objects • Results returned as objects nested in a CQL Query Result Set • Graphical Development tool, implemented as an extension to the Introduce Toolkit, is used to create the new grid service

  32. Data Service Query Language • Specifies a target object (result) type and selects the instances which satisfy the specified properties and nested object properties • Allows path navigation • Provides logical grouping • Provides name/predicate/value filtering on properties of objects • Recursively defined • Ability to return full Objects, Set of attributes, count of results, or distinct attribute values

  33. Example CQL Query

  34. Example CQL Query This screen shot illustrates the caGrid CQL Query. A partial UML model is shown with the Gene class identified with a red circle. The corresponding CQL query targeting the Gene class is also shown in red.

  35. Example CQL Query LIKE “BRCA%”

  36. Example CQL Query LIKE “BRCA%” This screen shot illustrates the caGrid CQL Query. A partial UML model is shown with the Gene class identified with a red circle and the symbol attribute demarked as “LIKE “BRCA%””. The corresponding CQL query targeting the Gene class is also shown in red and the attribute name and predicate shown in blue.

  37. Example CQL Query LIKE “BRCA%”

  38. Example CQL Query LIKE “BRCA%” This screen shot illustrates the caGrid CQL Query. A partial UML model is shown with the Gene class identified with a red circle and the symbol attribute demarked as “LIKE “BRCA%””. The corresponding CQL query targeting the Gene class is also shown in red and the attribute name and predicate shown in dark blue with the class Taxon linked by association role name shown in light blue.

  39. Example CQL Query LIKE “BRCA%” = “Homo sapiens”

  40. Example CQL Query This screen shot illustrates the caGrid CQL Query. A partial UML model is shown with the Gene class identified with a red circle and the symbol attribute demarked as “LIKE “BRCA%””. The corresponding CQL query targeting the Gene class is also shown in red and the attribute name and predicate shown in dark blue with the class Taxon linked by association shown in light blue. The attribute name of the class Taxon is shown in green. (Figure labels: LIKE “BRCA%”; = “Homo sapiens”)
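
Since the query itself survives only as a screen shot description, the following Python sketch models the same shape of query (target class, attribute predicate, nested association, logical group) and evaluates it recursively over in-memory objects. The dictionary layout, the role name `taxon`, the attribute name `scientificName`, and the predicate label `EQUAL_TO` are assumptions made for illustration; this is a toy model and not the CQL XML schema or the caGrid CQL processor.

```python
import re


def like(value, pattern):
    """SQL-style LIKE where % matches any run of characters."""
    regex = "^" + ".*".join(re.escape(p) for p in pattern.split("%")) + "$"
    return re.match(regex, value) is not None


def matches(obj, cond):
    """Recursively evaluate a condition against one object."""
    kind = cond["kind"]
    if kind == "attribute":
        value = obj.get(cond["name"])
        if value is None:
            return False
        if cond["predicate"] == "LIKE":
            return like(value, cond["value"])
        return value == cond["value"]  # treated as EQUAL_TO
    if kind == "association":
        related = obj.get(cond["role"])
        return related is not None and matches(related, cond["condition"])
    if kind == "group":
        results = [matches(obj, member) for member in cond["members"]]
        return all(results) if cond["logic"] == "AND" else any(results)
    raise ValueError("unknown condition kind: " + kind)


def run_query(objects, query):
    return [o for o in objects
            if o["class"] == query["target"] and matches(o, query["condition"])]


genes = [
    {"class": "Gene", "symbol": "BRCA1",
     "taxon": {"class": "Taxon", "scientificName": "Homo sapiens"}},
    {"class": "Gene", "symbol": "BRCA2",
     "taxon": {"class": "Taxon", "scientificName": "Mus musculus"}},
    {"class": "Gene", "symbol": "TP53",
     "taxon": {"class": "Taxon", "scientificName": "Homo sapiens"}},
]

query = {
    "target": "Gene",
    "condition": {"kind": "group", "logic": "AND", "members": [
        {"kind": "attribute", "name": "symbol",
         "predicate": "LIKE", "value": "BRCA%"},
        {"kind": "association", "role": "taxon", "condition":
            {"kind": "attribute", "name": "scientificName",
             "predicate": "EQUAL_TO", "value": "Homo sapiens"}},
    ]},
}

symbols = [g["symbol"] for g in run_query(genes, query)]
```

The recursion in `matches` is what "recursively defined" means in practice: an association's condition can itself be an attribute, another association, or a group.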

  41. Federated Query Processor • Provides a mechanism to perform basic distributed aggregations and joins of queries over multiple data services • As caGrid data services all use a uniform query language, CQL, the Federated Query Infrastructure can be used to express queries over any combination of caGrid data services • Federated queries are expressed with a query language, DCQL, which is an extension to CQL to express such concepts as joins, aggregations, and target services • Implemented as a stateful grid service, queries may be executed asynchronously and results retrieved at a later time • Supports secure deployments wherein result ownership is enforced • Coupled with semantic discovery capabilities of caGrid, provides a powerful framework for data discovery, mining, and integration
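
To make the idea of a join across data services concrete, here is a small Python sketch that combines result rows from two hypothetical services on a shared key. The hash-join strategy, the function `federated_join`, and the row fields are assumptions for illustration; the slides do not say how the Federated Query Processor evaluates DCQL joins.

```python
def federated_join(left_rows, right_rows, left_key, right_key):
    """Hash join: index the right-hand results, then probe with the left."""
    index = {}
    for row in right_rows:
        index.setdefault(row[right_key], []).append(row)
    return [(left, right)
            for left in left_rows
            for right in index.get(left[left_key], [])]


# Pretend these came back from two separate data services.
gene_rows = [{"symbol": "BRCA1", "geneId": 1}, {"symbol": "TP53", "geneId": 2}]
expression_rows = [
    {"geneId": 1, "tissue": "breast"},
    {"geneId": 1, "tissue": "ovary"},
    {"geneId": 3, "tissue": "lung"},
]

joined = federated_join(gene_rows, expression_rows, "geneId", "geneId")
pairs = [(gene["symbol"], expr["tissue"]) for gene, expr in joined]
```

Because every data service answers the same query language, the join logic only needs to agree on which attributes act as keys.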

  42. Workflow Service • Provides capability to describe “orchestrations” of service invocations and data movement • Uses existing de facto standard language: Business Process Execution Language (BPEL) • Interoperability with other clients, editors, engines • Implemented as a stateful grid service, workflows can be created, stopped, paused, resumed, and cancelled and results retrieved at a later time • Coupled with semantic discovery, service metadata, and registration of data type structures in caGrid, provides a powerful framework for analyzing data • Services can be dynamically discovered and federated queries can be invoked as part of a workflow

  43. Identifier Services Framework • Requirements and Features Overview • caBIO Integration • Frank Siebenlist (franks@mcs.anl.gov) - University of Chicago / Argonne National Laboratory • Doug Mason (masondo@mail.nih.gov) - NIH/NCI

  44. caGrid’s Identifier Services Framework • Identifier • “Naming” of individual Data-Objects • Globally Unique Name for each Data-Object • Services • Create/modify/delete name-object bindings • Resolve name to data-object • Framework • Provide for Trust Fabric => Binding Integrity • Policy-driven Administration => Curator Model • Fully Integrated with caGrid’s Architecture and Implementation

  45. Why (Standardized) Data-Object Identifiers? • Efficiency • Passing by reference vs by value (Data-Object can be many Mbytes) • Data-Object Equality test through String comparison (inequality test is not a requirement…) • Consistency • Standardized way of referencing objects • Standard identifier => data-object resolution mechanism • Meta-data binding to standard object reference • Well-known primary/foreign key for (distributed) JOINs • Name for policy expression for data-object access • Name for audit entries about data-object related activities • … • Possible correlation of all of the above…

  46. Data-Object Identifier Properties • Identifier is a String • Identifier is a forever globally unique name for a single Data-Object • Identifier can be (globally) resolved to associated Data-Object • Data-Objects are immutable, almost immutable or mutable… • Identifier value is a “meaningless” opaque string for the consumer • Resolution information embedded in Identifier Name • Only meaningful for resolution service related components • Identifier is a Uniform Resource Identifier (URI) • URI scheme will be made completely transparent to identifier-producing applications and consumers. • “bigid:” - at least until we have learned more about its usage… (… and to avoid distracting scheme-choice discussions)

  47. Identifier Usage Model

  48. Naming Authority, Identifier Curator, Data Owner and Identifier User • Naming Authority (NA) • Guards integrity of identifier namespace & bindings • Maintains identifier to data-object’s endpoint mapping • Conceptually equivalent to caDSR… • Identifier Curator/Administrator • Understands semantics/access of data owner’s objects • Trusted by NA to administer binding for certain identifiers • Administers identifier to data-object’s endpoint binding • Data Owner • Provides access to data-objects through “endpoint-references” • Identifier User/Consumer • Trusts an NA for certain identifier bindings • Uses 2-step resolution to obtain data-object (identifier => endpoint => data-object) • (In-)Directly trusts Data Owner for data-object integrity
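
The two-step resolution (identifier => endpoint => data-object) can be sketched as follows, in Python. The classes `NamingAuthority` and `DataOwner`, the identifier value after the `bigid:` prefix, and the endpoint URL are hypothetical; they only illustrate the division of roles on this slide, not the framework's actual API.

```python
class NamingAuthority:
    """Keeps the identifier -> endpoint bindings (step 1 of resolution)."""

    def __init__(self):
        self._bindings = {}

    def bind(self, identifier, endpoint):
        # In the framework this would be done by a trusted curator.
        self._bindings[identifier] = endpoint

    def resolve(self, identifier):
        return self._bindings[identifier]


class DataOwner:
    """Serves data-objects by endpoint reference (step 2 of resolution)."""

    def __init__(self, objects_by_endpoint):
        self._objects = objects_by_endpoint

    def fetch(self, endpoint):
        return self._objects[endpoint]


def resolve_object(identifier, authority, owner):
    endpoint = authority.resolve(identifier)  # identifier => endpoint
    return owner.fetch(endpoint)              # endpoint => data-object


authority = NamingAuthority()
authority.bind("bigid:example-0001", "https://example.org/data/genes/42")
owner = DataOwner({"https://example.org/data/genes/42": {"symbol": "BRCA1"}})

gene = resolve_object("bigid:example-0001", authority, owner)

# Equality of data-objects reduces to comparing identifier strings.
same_object = "bigid:example-0001" == "bigid:example-0001"
```

Keeping the binding table separate from the data lets a data owner move an object to a new endpoint without changing the identifier that consumers hold.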

  49. Grid Identifier Framework - Client Side • Simple API abstracts Framework Services

  50. Grid Identifier Framework - Server Side • Simple API & Integration
