An Introduction to Track 4: SOA and Metadata (Semantics). 2nd SOA for E-Government Conference 30-31 October 2006. Chuck Mosher Senior Enterprise Architect cmosher@metamatrix.com. Agenda. The drivers for data (& metadata) integration Metadata in an SOA
An Introduction to Track 4: SOA and Metadata (Semantics) 2nd SOA for E-Government Conference 30-31 October 2006 Chuck Mosher Senior Enterprise Architect cmosher@metamatrix.com
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
Acknowledgements • Dave McComb*, Semantic Arts • Atif Kureishy*, Booz | Allen | Hamilton • John Salasin*, NIST • Jeff Pollock, Oracle • Brand Niemann, EPA • Andy Evans, Revelytix * Track 4 Speaker, 2:45-4:15 pm tomorrow
Data Interoperability Lies At The Very Core of DoD Transformation One of the three enablers which drives domain-wide visibility: “… is a standard enterprise data architecture — the foundation for effective and rapid data transfer and the fundamental building block to enable a common logistical picture.” Army Lt. Gen. Claude Christianson “If you look at all the trends in the IT arena over the past 30 to 40 years, we’ve moved into an environment where we’ve got faster networks, more powerful processors, but it really comes down to the data” Michael Todd, DOD CIO office
Dr. Linton Wells, as quoted in September’s NDIA Magazine, “…data compatibility may be an issue. Enabling digital interaction with nontraditional partners may require middleware or other programs that convert data from totally different formats …”
Problem Scope • Incompatible data meanings are the largest, most expensive, and time-consuming portion of IT visibility and IT interoperability projects: • Gartner… Forrester… NIST… • IDC… CIO Magazine… • The classic “n-squared” problem of interfaces is even more severe at the data layer: • Data-to-data interfaces outnumber “pipes” • Tightly-coupled is brittle, and requires code • Information growth is accelerating – FAST! • 2002-2005 – more new data than all of history • 5 exabytes of new digital data created in 2002 – enough for 0.5 million new Libraries of Congress Jeff Pollock – 2004 White House Conference on Semantic Technology
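The “n-squared” point can be made concrete with a little arithmetic. The sketch below counts directed point-to-point translations among n systems and compares that with mapping each system once to and once from a shared model. The function names are illustrative, and the shared-model alternative is an assumption used for contrast, not a figure from the slide.

```python
def point_to_point(n: int) -> int:
    """Directed translations needed when every system maps to every other system."""
    return n * (n - 1)


def via_shared_model(n: int) -> int:
    """Translations needed when each system maps to and from one shared model."""
    return 2 * n


# Print a small comparison table for a few system counts.
for n in (5, 20, 100):
    print(f"{n:>4} systems: {point_to_point(n):>5} pairwise vs {via_shared_model(n):>4} shared")
```

With 100 systems the pairwise count is 9,900 translations against 200 for the shared model, which is the gap the slide is pointing at.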
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
Why Does SOA Need Metadata? • An architectural style enabling loose-coupling • Cornerstone of E-Government reengineering • Web Services and their related standards (SOAP, WSDL, UDDI) provide an implementation framework for several key features of SOA • BUT: Web Service technologies do not provide all the requirements for Dynamic USE of Discoverable Services • Discovery – Yes – UDDI/ebXML • Use – No – requires service consumers and providers to agree on a pre-defined standard interface for the service
SOA is Easy, It’s Metadata That’s Hard • SOA focuses on the interoperability between application interfaces & protocols • Data (and service) meaning, integrity, and transformation have to be addressed elsewhere • This information is found in the metadata • SOA makes getting control over the metadata critical to success • Or you will end up with SOA silos!
Metadata Is Everywhere • Many of the problems & issues around SOA implementations & governance boil down to getting a solid handle on all of the types & forms of metadata involved • Diagram labels: Integration, Syntactic, Semantic, Application, Process, Accessibility, Visibility, Discoverability, Management, Governance, Auditing, Lineage, Quality, Compliance, Change Management, Impact Analysis, Performance
What Are Semantic Conflicts? • Data Type: Different primitive or abstract types for same information • Labeling: Synonyms/antonyms have different text labels • Aggregation: Different conceptions about the relationships among concepts in similar data sets • Structure, Cardinality: Collections or constraints have been modeled differently for same information • Generalization: Different abstractions are used to model same domain • Value Representation: Different choices are made about what concepts are made explicit • Impedance Mismatch: Fundamentally different data representations are used • Naming: Synonyms/antonyms exist in same/similar concept instance values • Scaling and Unit: Different units of measures with incompatible scales • Confounding: Similar concepts with different definitions • Domain: Fundamental incompatibilities in underlying domains • Integrity: Disparity among the integrity constraints Jeff Pollock – 2004 White House Conference on Semantic Technology
Metadata Management Maturity • Level 1: Inventory of information assets • Necessary 1st step – what data do we have • Typically stored in repositories, registries, spreadsheets, implicit in data itself (relational DB’s) • Level 2: Impact analysis • Develop domain vocabularies and data models • Discover or create relationships between system artifacts • Level 3: Metadata-driven integration • Design-time metadata repository + run-time integration • Example of Model-Driven Architecture • Level 4: Semantic Web • Dynamic, machine-based inferencing at the concept level
Data Evolution Timeline • Age of Programs (1945-1970): Program-Data • Age of Proprietary Data (1970-1994): Text, Office Docs, Databases (proprietary schema) • Age of Open Data (1994-2000): HTML, XML (open schema) • Age of Open Metadata (2000-2003): Namespaces, Taxonomies, RDF • Age of Semantic Models (2003-): Ontologies & Inference • Milestones along the timeline: GIGO/minis/micros, www / Netscape, Web services, OWL • Procedural Programming: “Data is less important than code” • Object-Oriented Programming: “Data is as important as code” • Model-Driven Programming: “Data is more important than code” Michael Daconta, Creating Relevance and Reuse with Targeted Semantics, XML 2004 Conference Keynote, November 16, 2004.
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
Information Challenges Communities of Interest Agency Challenges • 100’s/1000’s of data sources • 100’s/1000’s of applications • Multiple access points/modes for apps • Understanding relationships/semantics • Data consistency • Data reuse – bridging data silos • Support for Web Services & SQL • Control & manageability, compliance • Security & auditing ? Information Resources Program Challenges • Multiple sources • Different interfaces/drivers • Different physical structures • Different semantics • Single interface to data desired • Real-time access to data • Performance • Maintainability as data changes • Maintainability as apps change Mission Challenges • Time-to-deploy • Agility - Responsiveness to change • Automation – Reduce cost of new development and operations • ROI of enterprise information
Information Virtualization Communities of Interest Information Virtualization Layer Information Resources
Information Virtualization Information Virtualization Layer Unification of different concepts across systems Unified Semantic Layer Single-query access to heterogeneous systems Data Federation Layer Data Access/Connectivity Layer Uniform, standardized access to any system Enterprise Data Sources
Metadata-Based Data Service • Decouple data sources from application • Data implementation shielded from application • Semantic/Format Mediation • Standard vocabulary • Single access point • Web Service/XML • SQL • Federation • Single source or multi-source • Scalability • Security, performance XML/SOAP SQL Bridge the Gap Data Service SQL SQL API Call Master Data Agency Application Operational Data Store
FEA DRM View on Data Services DRM Version 2 Data Access Services • Context Awareness Services • Structural Awareness Services • Transactional Services • Data Query Services • Content Search and Discovery Services • Retrieval Services • Subscription Services • Notification Services Service Types include: • Metadata / Data • Structured / Unstructured • Read / Write • Push / Pull
Modeling Information Services for SOA SOAP ODBC JDBC <sale/> <value/> </ sale > <WSDL> (contract) <WSDL> (contract) <WSDL> (contract) Designing data services Exposed Data Services Reusable, Integrated Data Objects Enterprise Information Sources (EIS) Information Consumers Web Services,Business Processes services warehouses EAI, Data warehouses databases Logistics Packaged Apps spreadsheets xml Custom Apps geo-spatial Reporting, Analytics Intelligence rich media …
Data Service Abstraction Layers • Transformations from one or more sources • Transformations defined with: • Joins/unions • Criteria • Functions • Elements mapped to dictionary • Business definitions captured
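A minimal sketch of what such a transformation might look like, written in plain Python rather than any particular data-services product. The field names echo the data dictionary slide in this deck; the rows, the `LOC-` key format, and the function names are invented for illustration.

```python
# Hypothetical physical sources; the rows are invented for illustration.
depots = [
    {"Depot_Number": 7, "bldg_type": "WH"},
    {"Depot_Number": 9, "bldg_type": "HQ"},
]
inventory = [
    {"SITENUM": "007", "item": "tires", "qty": 40},
    {"SITENUM": "009", "item": "radios", "qty": 0},
    {"SITENUM": "007", "item": "fuel", "qty": 12},
]


def to_location_id(value) -> str:
    """Function: normalize a source-specific key to an assumed dictionary format."""
    return f"LOC-{int(value):04d}"


def stocked_items_view() -> list:
    """Virtual view: join both sources on location and keep rows with stock on hand."""
    types = {to_location_id(d["Depot_Number"]): d["bldg_type"] for d in depots}
    rows = []
    for rec in inventory:
        loc = to_location_id(rec["SITENUM"])
        if rec["qty"] > 0 and loc in types:  # criteria plus join condition
            rows.append({
                "Location_ID": loc,            # element mapped to the dictionary
                "Location_Type": types[loc],
                "item": rec["item"],
                "qty": rec["qty"],
            })
    return rows


for row in stocked_items_view():
    print(row)
```

The consumer only sees `Location_ID` and `Location_Type`; the source-specific names `Depot_Number`, `SITENUM`, and `bldg_type` stay behind the view.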
Data Service Layer in SOA Client Process & Applications App App App App App App Business Process Services Business Services Message Services (ESB) Data Service Data Service Data Service Data Service Data Service Data Services Layer Data Sources
Data Services Approaches Data Services for Multiple Purposes: • Simplified access to value-added (tagged) data in real-time • Value-added (tagged) data materialized & staged • Phased-in migration from legacy to new • Managed archiving via classification, retention tags • Enhanced search via consistent content tags Agile Information Services Model-Driven Integration Layer Logical Data Model Logical Data Model T Org, Person, Image, Location T Organization, Customer, Imagery, Location Materialized Logical Model Materialized Logical Model Data, Content Sources Data, Content Sources Enriched Data/Content Store
Leveraging COI Data Dictionaries Location_ID Location_Type bldg_type bldg_id Depot_Number SITENUM Facility_ID Business Intelligence Applications Search Applications Web Services ODBC/JDBC JDBC SOAP Application views of information: • Relational, XML XML Document <a> … <b> </b> </a> T T T C2, Logistics, Intelligence, … Logical Data Model: • Agency or COI-specific • Rationalize, harmonize, mediate T T T Authoritative Sources: • Mapped to logical Multiple Internal/External Information Sources
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
Beyond Mere Metadata • Vocabularies/lexicons, Domain Models, Taxonomies, Ontologies • All are means of beginning to define the context and scope of the domain of interest • All specify artifacts in some way • The word “Semantics” often means that the relationships between artifacts are also specified
Semantics = Meaning = Relationships • Humans (and therefore our machines) only ever understand anything in so far as it is related to other things • ID
Semantics = Meaning = Relationships • Humans (and therefore our machines) only ever understand anything in so far as it is related to other things • VA, NY, ID, MD
Semantics = Meaning = Relationships • Humans (and therefore our machines) only ever understand anything in so far as it is related to other things • SUPEREGO, EGO, ID, ANALYSIS
Semantics = Meaning = Relationships • Humans (and therefore our machines) only ever understand anything in so far as it is related to other things • LICENSE, CARD, ID, BADGE
Data Dictionary -> Vocabulary • The data alone does not have sufficient context • Using metadata is not enough - you must be able to leverage domain concepts and terminologies • Example problem – potentially similar data elements, but dissimilar constructs/datatypes/descriptions • How do we relate common constructs with uncommon datatypes? • Solution requires that vocabulary relate those constructs across models with transformation relationships, logic • Define business use/semantics of similar information • Datatypes describe a set of values • Defines the technical constraints on values • Enables integrating information, as datatypes can be referenced by any models (relational, XML, object, …)
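One way to picture a vocabulary that relates similar elements with dissimilar datatypes is sketched below. Everything here is an assumption made for illustration: the term “Facility Identifier”, the `F` plus five digits format, and the two transformations are invented, and only the element names `bldg_id` and `Facility_ID` come from the deck.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Datatype:
    """A named set of values, described by a technical constraint."""
    name: str
    pattern: str

    def accepts(self, value: str) -> bool:
        return re.fullmatch(self.pattern, value) is not None


@dataclass(frozen=True)
class Mapping:
    """Relates one source element to a vocabulary term through a transformation."""
    source_element: str
    term: str
    transform: Callable[[object], str]


# Hypothetical vocabulary term and datatype.
VOCABULARY = {"Facility Identifier": Datatype("FacilityCode", r"F\d{5}")}

# Hypothetical mappings: an integer key and a free-text key share one term.
MAPPINGS = [
    Mapping("bldg_id", "Facility Identifier", lambda v: f"F{int(v):05d}"),
    Mapping("Facility_ID", "Facility Identifier", lambda v: str(v).strip().upper()),
]


def to_vocabulary(element: str, value) -> tuple:
    """Translate a source value into (term, canonical value), enforcing the datatype."""
    for mapping in MAPPINGS:
        if mapping.source_element == element:
            canonical = mapping.transform(value)
            if not VOCABULARY[mapping.term].accepts(canonical):
                raise ValueError(f"{canonical!r} violates {mapping.term}")
            return mapping.term, canonical
    raise KeyError(element)


print(to_vocabulary("bldg_id", 42))
print(to_vocabulary("Facility_ID", " f00042 "))
```

Both calls arrive at the same term and the same canonical value, so the two sources can be compared or joined without either one changing its own representation.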
Benefits of Building a Vocabulary • Develop reusable information models and schemas • Capture business and technology requirements in a single vocabulary • Capture institutional knowledge • Enables semantic mining techniques for deeper data discovery and information sharing • Accelerate interoperability, web services and SOA development and deployment • Establish and maintain a common relationship across data sources • Establish and maintain compliance with industry exchange models • Reduce IT expenses by leveraging data in its native source • Reduce IT expenses associated with building and maintaining partner integration • Improved information sharing directly enhances decision making
Example Vocabulary Development Process MDA DS COI Pilot - John Shea PEO C4I, PMW180 ISR/IO NMCI Auto Generate XSD - XML Develop UML Use-Case Class Relationship Diagram Determine Pilot Demonstration Vocabulary Handbook UNCLASSIFIED
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
“Ideal” Semantics • Formal definition of meaning • Unambiguous • Machine process-able • Decidable • Automated classification • Membership based on properties • Inference • Can increase what you know based on classification
Ontologies • An ontology is an explicit formal specification of the terms in a domain and the relationships between them • Others (vocabularies, domain models, taxonomies) are special cases • Formal conceptual model • W3C standard (OWL/RDF) implementation • Concepts, definitions, properties, relationships • Machines can draw inferences from the properties and relationships captured in the model
Ontologies • Ontologies bring rigorous definitions of meaning to (meta)data • More abstraction from lower levels of detail • Key to loose-coupling • With OWL/RDF, part of the W3C Semantic Web vision
RDF • Resource Description Framework • A mechanism to make assertions about things • In the form of a triple: subject -> predicate -> object • Resource (URI) -> Property (URI) -> Resource (URI or literal) • URIs establish a unique namespace; they do not have to be addressable
RDF Examples • Airport123 -> name -> “ORD” • Airport123 -> locatedIn -> “Chicago, IL” • Business345 and Airport123 are linked by closestTo
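The airport example can be written down as plain subject, predicate, object tuples. This is a sketch in standard Python, not an RDF library: the `http://example.org/` namespace is hypothetical, and the direction of the `closestTo` assertion is an assumption, since the flattened slide does not show which way the arrow points.

```python
EX = "http://example.org/"  # hypothetical namespace; it need not be resolvable

# Each assertion is a (subject, predicate, object) triple.
triples = {
    (EX + "Airport123", EX + "name", "ORD"),
    (EX + "Airport123", EX + "locatedIn", "Chicago, IL"),
    (EX + "Business345", EX + "closestTo", EX + "Airport123"),  # direction assumed
}


def match(s=None, p=None, o=None) -> list:
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything asserted about the airport as a subject.
for triple in match(s=EX + "Airport123"):
    print(triple)
```

Objects can be either another resource (the airport URI) or a literal (the string "ORD"), which is the distinction the RDF slide draws.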
OWL • OWL extends RDF by allowing us to create and make assertions about classes of things • Mammal: has Hair • Feline: is a Mammal, has Retractable Claws
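The kind of inference class-level assertions make possible can be illustrated with a few dictionaries. This is a plain-Python sketch of the idea only; it does not implement OWL semantics, and the individual `Tom` is invented for the example.

```python
# Class-level assertions taken from the slide.
SUBCLASS_OF = {"Feline": "Mammal"}
CLASS_PROPERTIES = {
    "Mammal": {"hasHair"},
    "Feline": {"hasRetractableClaws"},
}

# Hypothetical individual, asserted only to be a Feline.
INSTANCE_OF = {"Tom": "Feline"}


def ancestors(cls: str) -> list:
    """Return the class followed by each of its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def inferred_properties(individual: str) -> set:
    """Collect the properties of the individual's class and all of its superclasses."""
    props = set()
    for cls in ancestors(INSTANCE_OF[individual]):
        props |= CLASS_PROPERTIES.get(cls, set())
    return props


print(sorted(inferred_properties("Tom")))
```

Nothing states directly that Tom has hair; that follows from Tom being a Feline and every Feline being a Mammal.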
Semantic Mapping Challenge Location_ID Location_Type bldg_type bldg_id Depot_Number SITENUM Facility_ID Business Intelligence Applications Search Applications Web Services ODBC/JDBC JDBC SOAP Application views of information: • Relational, XML XML Document <a> … <b> </b> </a> T T T C2, Logistics, Intelligence, … Logical Data Model: • Agency or COI-specific • Rationalize, harmonize, mediate T T T Authoritative Sources: • Mapped to logical Multiple Internal/External Information Sources
Contextualize (Interpret) ArticleAmount Amount Article Synonym Creation Sum Type-of Assets Automated term tokenization Automated semantic linking using the default knowledge-base contained within MatchIT
Semantic Matching (Mediate) • With relationships pre-established within the knowledge-base… • Identify the Target and the Source(s) and run the match. ArticleAmount Automatically linked by a specific % distance ProductShares
Facilitate Decision Making (Mediate) Target element for matching Automatically calculated semantic distance between terms Helps facilitate rapid decision making Source candidate for matching
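A rough sketch of the tokenize, link, and score steps described on the last three slides. It is not how MatchIT works internally: the knowledge base entries, the scoring rule, and the function names are all assumptions made to show the general shape of the technique.

```python
import re

# Hypothetical knowledge base of related term pairs; invented for this sketch.
RELATED = {
    frozenset({"article", "product"}),
    frozenset({"amount", "sum"}),
    frozenset({"amount", "shares"}),
}


def tokenize(term: str) -> list:
    """Split a camelCase or snake_case element name into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", term)
    return [p.lower() for p in parts]


def related(a: str, b: str) -> bool:
    """Two tokens are related if identical or paired in the knowledge base."""
    return a == b or frozenset({a, b}) in RELATED


def match_score(target: str, source: str) -> float:
    """Fraction of target tokens that have an equal or related token in the source."""
    target_tokens = tokenize(target)
    source_tokens = tokenize(source)
    if not target_tokens:
        return 0.0
    hits = sum(any(related(t, s) for s in source_tokens) for t in target_tokens)
    return hits / len(target_tokens)


# Rank hypothetical source candidates for one target element.
for candidate in ("ProductShares", "ProductName", "Facility_ID"):
    print(candidate, match_score("ArticleAmount", candidate))
```

A score like this only ranks candidates; as the slide says, the purpose is to help a person decide quickly, not to make the mapping decision automatically.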
Integration Driven By Semantics Ontology Models (e.g. OWL, RDF) XML XML XML Relate information in different domains/models Search within and across domains for related information Enterprise Model (UML) Model & Relate information within any domain Data Models (Relational, XML) Physical Sources
Ontology-Driven Integration Example equivalence equivalence equivalence equivalence Logical Views Ontology Physical Sources Transportation T Land T 4 Wheel 2 Wheel T Bus Truck Car T Cargo Truck Fuel Truck
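The example can be sketched as a small hierarchy plus equivalence links from source-specific codes to ontology concepts. The parent/child arrangement below is one reading of the flattened diagram, and the source names, codes, and record identifiers are invented.

```python
# Assumed reading of the slide's hierarchy (child -> parent).
PARENT = {
    "Land": "Transportation",
    "4 Wheel": "Land",
    "2 Wheel": "Land",
    "Bus": "4 Wheel",
    "Truck": "4 Wheel",
    "Car": "4 Wheel",
    "Cargo Truck": "Truck",
    "Fuel Truck": "Truck",
}

# Hypothetical equivalences: (source, local code) -> ontology concept.
EQUIVALENT = {
    ("motor_pool", "CGO"): "Cargo Truck",
    ("motor_pool", "SED"): "Car",
    ("fuel_depot", "tanker"): "Fuel Truck",
}

# Hypothetical physical records: (source, local code, record id).
RECORDS = [
    ("motor_pool", "CGO", "MP-101"),
    ("motor_pool", "SED", "MP-102"),
    ("fuel_depot", "tanker", "FD-7"),
]


def is_a(concept, ancestor: str) -> bool:
    """True if the concept equals the ancestor or sits anywhere below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def find(concept: str) -> list:
    """Return record ids from every source whose local code maps under the concept."""
    return [
        record_id
        for source, code, record_id in RECORDS
        if is_a(EQUIVALENT.get((source, code)), concept)
    ]


print(find("Truck"))    # records from both sources
print(find("4 Wheel"))  # widens to include the car
```

Neither source uses the word “Truck”, yet a query at that level reaches both, because each local code was declared equivalent to a concept in the shared hierarchy.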
Agenda • The drivers for data (& metadata) integration • Metadata in an SOA • Data services: using active metadata to drive data integration • Beyond metadata: dictionaries, vocabularies, domain models, ontologies (semantics) • Why ontologies? • Overview of Track 4 Presentations • Q & A
Track 4 Talks Tomorrow: 2:45-4:15pm • Predictive Metrics To Guide SOA-Based System Development • John Salasin, NIST • Integrating SOA and Ontologies for Information Sharing • Atif Kureishy, BAH • SOA & Semantics • Dave McComb, Semantic Arts
Predictive Metrics To Guide SOA Development John Salasin, NIST • Will propose a set of metrics (vocabulary) to characterize SOA-based systems • These metrics can be assessed at different points in the development lifecycle • Early stage (concept development) • Architecture/Construction (system characteristics) • Operations (robustness, performance, usage, governance) • Evolution (extensibility, change management) • Analysis can lead to ongoing refinement at every stage • Quantitative, incremental Verification & Validation