Explore the challenges of semantic silos in enterprise data integration and learn how to balance vocabularies for efficient management. Discover the roadmap to reach the "Semantic Sweet Spot" and optimize your data architecture with a Semantic Wiki. Address the consequences of vertical vs. horizontal thinking, and understand the importance of vocabulary management in overcoming semantic barriers. Join us to delve into the world of eclectic intelligence and enhance your data exploitation strategies.
Vocabulary Management Eclectic Intelligence Exploitation Balancing Vocabularies for Enterprise Data Integration Stalking the Semantic Sweet Spot Kevin S. Lynch, Ontologist, CIA Marguerite Ardito, Principal Information Architect, Information Exchange, Inc. April 18, 2007
Agenda • Part 1: The Problem • Vertical vs. horizontal orientation • Semantic silos • Vastness of the vocabulary problem • Part 2: Solution Roadmap • Enterprise Data Architecture • Zeroing in on one “Semantic Sweet Spot” • Semantic Wiki for vocabulary management • Progress and plans
The Problem Getting to the Root of the matter
Gnarly Roots Gnarly Information • Too much or too little, irrelevant, not authoritative, out of date • Format is unusable, not trustable, no lineage, no certainty measures • Security rules are implemented inconsistently Gnarly Systems • Rigid & ossified. Maintenance eats up more resources than it would take to change things • Repetitious, boutique solutions to common problems • Multiple applications of NLP and entity extraction aiming at common concepts • “Cheap” storage solutions are not cheap
How did we get here? Root Cause and Effect 1. Wrong orientation & approach (Vertical vs. horizontal mindset) 2. Inconsistent, unmanaged, siloed semantics 3. Consequences of 1 and 2 are everywhere and entrenched under millions of lines of code "We cannot solve today's problems using the mindset that created them." Albert Einstein
The Vertical Mindset • Humans (and cultures and organizations and termites) are Vertical by nature • Vertical thinking is territorial thinking • My project, My objectives, My rewards, My sphere of control, My information • Sharing and Reusing Information violates the “WIFM” Principle • Agencies (even Congress) have an ingrained vertical Project-Centric Culture • “People build stovepipes because that’s the way the $$ comes.” Sally Robinson, 1986 Vertical thinking limits perspective.
Vertical Vs. Horizontal Mismatch • Information is Horizontal by nature • Same fact or information can be employed, viewed and reused multiple ways • Many different ways of combining facts to reach new conclusions • Information spans projects, beyond scope and control of one project Horizontal thinking expands perspective.
Vertical thinking steals time from mission • Mission Objectives are “enterprise level” • Vertical thinking creates: Dissonance, Discord, Dissension, Dis-Integration and Disharmony • Optimizing locally is sub-optimal for the enterprise and the mission • Creates a cultural barrier against information sharing and reuse • This presentation mostly addresses technical barriers, not cultural Horizontal thinking optimizes the mission
Semantic Silos are a consequence of Vertical Thinking The Real World (Problem Space): Person? Event? Organization? Location? Intention? Semantic Silo (Solution Space): Concepts and thought patterns; Business usage, queries, reports, policy, rules; Application terminology, rules, and logic; Data storage, schema, tagging, index, relationships • Semantics are embedded in IT artifacts (often due to technology or engineering limitations) • Semantic Silos become entrenched in the culture
Semantic Silos Proliferate Silo 1 (Org 1 App 1) Silo 2 (Org 2 App 2) • Each new solution develops its own semantics • Users can’t communicate across organizations and projects • Data is disparate, tightly coupled to application code – can’t be shared or reused • Consequent communication barriers disrupt the Mission
Semantic Silos Steal Time from Mission Real World: Person? Event? Organization? Location? Intention? Disparate Vocabularies • Terminology and the resulting disruption become self-perpetuating and fossilized over time due to indecipherable and inextricable dependencies • Semantic silos isolate analysts and other users from the problems they are trying to solve as well as from each other • Silos persist even when technology limitations are removed
The Results of Semantic Silos • A self-perpetuating Tower of Babble (even within an enterprise)
Revisiting the Root Cause • Semantics (business meaning) is embedded in our programming code rather than in an explicit, machine-processable knowledge representation • Users are dominated by entrenched, application-specific transactions rather than human-level concepts (People, Places, Events) • To reverse this we would have to change orientation and invest as much $ and effort in information as in the rest of IT (which is about as likely as changing human nature) The Antidote to Semantic Silos is an Enterprise Data Layer
Vocabulary Management is Step 1 • Promise of Vocabulary Management • Decipher hidden meanings • Untangle hidden dependencies • Enable explicit unambiguous representation • Enable machine processing of knowledge representation • Ultimately decouple dependencies • Stop the proliferation of Semantic Silos Vocabulary Management is the foundation of the Enterprise Data Layer
Vocabulary Management • Eureka! • Finally recognized as a vital task • We have identified and begun work with a semantic wiki to manage our vocabulary • So…. Let’s get started…. This should be simple, right?
Core Vocabulary Vocabulary Management – a Simple Data Gathering Exercise? Glossaries Data Dictionary Enterprise Conceptual and Logical Data Models Type Definitions File Plans Formal Taxonomies and Classification Schemes Project Databases & Models
The Elephant in the Room (in fact, it’s a herd) Core Vocabulary Access Control Interoperability Policy & Business Rules Hidden code impact Hidden Mission Implications Undocumented Usage Context Sensitivity Semantic Conflict Working Definitions (e.g. Accounting Aggregation)
It’s not just one vocabulary “Ontology and the Semantic Mapping Problem” “… this is an issue that affects everything in information technology that must confront semantics problems – the problem of representing meaning for systems, applications, databases, and document collections. You must always consider mappings between (your target semantics) and … your base representation … if it’s not formal, it’s probably hard-coded in procedural code and that means it’s really a problem.” The Semantic Web, Daconta, Obrst, Smith
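The quoted point about formal versus hard-coded mappings can be illustrated with a small sketch. It assumes two hypothetical silos whose local terms are mapped to a shared core vocabulary held as data rather than buried in procedural code. The silo names, local terms, and helper functions are all invented for illustration and do not come from the presentation.

```python
from typing import Dict, List, Optional

# Hypothetical mapping tables: silo-specific term -> core vocabulary term.
# Every silo name and local term here is an illustrative assumption.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "silo1": {"subject": "Person", "org_unit": "Organization", "site": "Location"},
    "silo2": {"individual": "Person", "company": "Organization", "incident": "Event"},
}


def to_core(silo: str, term: str) -> Optional[str]:
    """Return the core term for a silo-specific term, or None if unmapped."""
    return MAPPINGS.get(silo, {}).get(term)


def translate(source: str, term: str, target: str) -> List[str]:
    """Translate a term from one silo to another by way of the core vocabulary."""
    core = to_core(source, term)
    if core is None:
        return []
    return sorted(t for t, c in MAPPINGS.get(target, {}).items() if c == core)


print(translate("silo1", "subject", "silo2"))  # ['individual']
print(translate("silo1", "site", "silo2"))     # [] because silo2 has no Location term
```

Because the mappings are explicit data, a gap such as the missing Location term in the second silo shows up as an empty result instead of a silent mismatch inside application code.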
Core Vocabulary It’s not just about gathering Business Semantics, Mappings, Process Improvement, Semantics for future adaptive systems Redefinition Clarification Alignment/Mapping to IT Business Rule Restatement Analysis Synthesis Integration
Vocabulary Management Semantics are everywhere! • We can’t tackle the whole elephant • We can’t manage all enterprise vocabulary • Managing enterprise semantics requires horizontal thinking in a vertical world. • Need to isolate the highest value domain for semantic clarity that can lead to an implementable solution with near term ROI Need to find the Semantic Sweet Spot
Part 2: Solution Roadmap - or - How the Enterprise Data Architecture helps get a grip on enterprise semantics
Taking on complexity • Think Big: Envision the solution • Plan: Blueprint & Roadmap • Start: (in a focused area) • Envisioning the solution • Point A: current state • Point B: desired end state Think Big Focus In
Point A: Consequence of Vertical thinking An Endless Cycle of Local stovepipe process & Single-Purpose Data Stovepipe process happens again …and again … and yet again….
Point B: Desired End State Enterprise Data Layer enables quality information on demand in the form needed Information Withdrawal Data Deposit Enterprise Data Layer
Enterprise Data Layer enables a Common Intelligence Landscape • A stable frame of reference for our data assets that describes the business of the enterprise • Transforms raw data into information and information into intelligence to accelerate and improve decision making • Strengthens our ability to query based on specific facts in addition to our ongoing efforts to improve document-based text searching • Resolves to a single source of truth for reference data and critical master objects “So that’s where we want to go… the tricky part is how do we get there.” General Hayden
Enterprise Data Architecture • EDA is the EDL Blueprint • Articulate the Vision • Clearly define required components • EDA is the EDL Roadmap • Lay out a plan for change • Focus and prioritize activities
EDA Components All work together – No component works independently
Benefits of EDA components Today’s Focus
Enterprise Data Model Provides Foundation • Conceptual Model Ensures Semantic alignment • Explicit agreement on relationships and meaning • Conceptual – very abstract • Enterprise Logical Models are more concrete views to align with projects • Together they form the foundation for data quality, sharing, interoperability and trust
Master Data Management provides Focus • “Identify most significant, widely used ‘core objects’ and manage continuously at enterprise level” • Person, Organization • Place • Thing (Digital or Physical Object) • Event • Concept (Objective, Topic) • Consider cost of error – some things you just can’t afford to get wrong • Mars lander, Chinese Embassy • FOCUS and Prioritize – we can’t control it all • 20% of data holds 80% of Value • Assign Permanent Unique ID (GUIDE) • Rationalize and resolve • Defined governance, accountability, traceability
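As a rough illustration of "assign a permanent unique ID" and "rationalize and resolve", the sketch below keeps one master record per core object and hands back the same identifier whenever the same object is seen again. It uses a random UUID from the Python standard library as a stand-in for the identifier scheme named on the slide, which is an assumption. The `MasterEntity` and `MasterRegistry` names and the simple name-normalization rule are also hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MasterEntity:
    entity_type: str  # one of the core objects, e.g. "Person" or "Organization"
    name: str
    # Stand-in for a permanent unique ID; a random UUID is an assumption here.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))


class MasterRegistry:
    """Hypothetical registry that resolves name variants to one master record."""

    def __init__(self):
        self._by_key = {}

    @staticmethod
    def _key(entity_type, name):
        # Naive rationalization: ignore case and repeated whitespace.
        return (entity_type.lower(), " ".join(name.lower().split()))

    def resolve(self, entity_type, name):
        """Return the existing master record, creating it on first sight."""
        key = self._key(entity_type, name)
        if key not in self._by_key:
            self._by_key[key] = MasterEntity(entity_type, name)
        return self._by_key[key]


registry = MasterRegistry()
first = registry.resolve("Organization", "Acme  Corp")
second = registry.resolve("organization", "acme corp")
print(first.guid == second.guid)  # True: both variants resolve to one master record
```

Real entity resolution needs far more than case and whitespace folding; the point of the sketch is only that the identifier is assigned once and reused.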
Conceptual ECDM Drill down for Master Entities Org Person Resource Location Physical Master Data Management spans disciplines • Managed Models: • Enterprise Conceptual Data Model lays out basic semantics and relationships • Mappings provide linkage to source and usage • Managed Data: • Managed and integrated at detailed instance level across the life cycle • Master Data is where data, context, schema, meaning, usage, rules, lineage, certainty and versioning all come together
Content Integration Increases Automation • Increased structure • Integrate extracted information as enterprise assets • Increase level of automation • Reduce repetitive use of unstructured content • Reduce reliance on Search • Improved quality and integrity • Increase referential integrity • No uncontrolled redundancy • Single logical instance of master data • Resolve instances, correlate occurrences with master and related data • Enrich entities with metadata for lineage, certainty and change (versioning)
Content Management/Entity Integration Strategy • New ideas to exploit unstructured content • Master Entity Integration Pipeline • Smart (Semantically-enabled) Master Entity Hub • Visibility and control of Master Entities • Across the enterprise • Across the life cycle • “Master Steward” • Powerful new role combines strategic, tactical and operational Master Data Governance • Roles span traditional boundaries and demand powerful tools
Master Entity Integration Pipeline Enterprise Data Svcs Weakly Structured Content Management Abstraction Layer Enterprise Boundary • Web site • Bulk Text • Email • XML wrapped content • XML Tagged Content • Schema-ctl XML • Formatted File • DB Xfer • Transaction Entity Identification and Information Extraction Data Cleansing and Entity Resolution Smart Master Hub • Master Dashboard • Entity Assembly Mgmt (structure, cleanse, merge, split, dedupe, link) • Certainty/Lineage Mgmt Master Hub • Person • Org • Location • Event • Thing Data Management Enterprise Repository Strongly Structured Enterprise Data Layer External Local/Project Enterprise/Shared
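The cleansing, entity resolution, and certainty/lineage stages of the pipeline might be sketched as follows. This is a minimal sketch under stated assumptions: the `Mention` and `MasterRecord` types, the normalization rule, and the choice to keep the highest extractor certainty are all hypothetical, not the presenters' design.

```python
from dataclasses import dataclass, field


@dataclass
class Mention:
    source: str       # where the mention was extracted from, e.g. "email"
    name: str         # raw extracted name
    certainty: float  # hypothetical extractor confidence in [0, 1]


@dataclass
class MasterRecord:
    name: str
    lineage: list = field(default_factory=list)  # sources that contributed
    certainty: float = 0.0


def cleanse(name):
    """Normalize case and whitespace so trivially different spellings match."""
    return " ".join(name.strip().lower().split())


def resolve(mentions):
    """Merge mentions into master records, tracking lineage and certainty."""
    hub = {}
    for mention in mentions:
        key = cleanse(mention.name)
        record = hub.setdefault(key, MasterRecord(name=key))
        record.lineage.append(mention.source)
        record.certainty = max(record.certainty, mention.certainty)
    return hub


hub = resolve([
    Mention("email", "John  Smith", 0.6),
    Mention("web site", "john smith", 0.9),
    Mention("bulk text", "Jane Doe", 0.7),
])
print(hub["john smith"].lineage)    # ['email', 'web site']
print(hub["john smith"].certainty)  # 0.9
```

Keeping the lineage list on the master record is what lets a steward later split a bad merge, since each contributing source remains visible.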
Vocabulary Semantic Management Hub Semantic Master Entity Hub is one Semantic Sweet Spot • Empowers the roles of Master Entity stewards • Demands total end-to-end, top-to-bottom semantic consistency • Breaks down walls and blurs boundaries • Class and instance (semantic models and data base sources) • Different Knowledge representations • Span phases and disciplines: • Business analysis, • Design and development • Runtime operations • Governance, policy and rules about the data
Semantic Technology • “Semantic Technologies are technologies that enable the meaning of data, process and service to be explicitly represented and machine processable.” • Two different interpretations of this. • Vertical Project View: A separate ontology representation • Stand-alone ontology for query and reasoning • Has value - but creates another stovepipe • Horizontal Enterprise View: Foundation for Semantically Aware Architecture • Smarter architecture and increased automation • Explicit, machine processable, interoperable semantics • Smarter data • More consistent, trustable information throughout the enterprise • Data that carries its own lineage, certainty factors, and interpretation • Smarter behavior • Increased automation by semantically aware applications • Vocabulary alignment enables more consistent, more rule-, data- and event-driven software throughout the architecture
Knowledge Representation Stack Knowledge Representation Security, Trust Each layer builds on the layer below to encode meaning in an explicit and processable way (i.e. create a model that computers can use). Reasoning/Rules/Proof Information Interoperability Information Integration/Integrity Semantically Aware Architecture Machine Interpretation of Semantics Description Logic Common Logic OWL Exposed Models, Views, Services, Semantics RDF RDF Schema EII Model Mapping Relational Schema, SQL XML Schema Explicit Structure APIs, EDI, Interchange Fmts XML Syntax, Transmission Current Architecture “Digital Dial Tone” Global Addressing, GUIDE HTTP, URI, Unicode Adapted from Berners-Lee, Hayes, Lynch
Progress and Plans • Vocabulary Management is first step to any semantic technology solution • Resolve terminology conflicts • Define relationships • Collaborate across varied stakeholders • Revelytix Knoodl Semantic Wiki • Approach: • Select and bound pilot domain • Enter base terms and relationships • Enter Enterprise Data Model including data dictionary • Engage community in vocabulary review • Validate Vocabulary • Define governance rules based on defined terms
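One way to seed the "resolve terminology conflicts" step is to load existing glossaries and flag terms that carry more than one definition. The sketch below is an assumption about how that could look, not a description of the wiki tool: the glossary names, terms, and definitions are invented, and matching on lowercased term text is a deliberately naive rule.

```python
from collections import defaultdict

# Hypothetical source glossaries: source name -> {term: definition}.
GLOSSARIES = {
    "finance_glossary": {
        "Agent": "A person authorized to approve payments.",
        "Location": "A named geographic place.",
    },
    "hr_data_dictionary": {
        "agent": "An employee acting on behalf of the organization.",
        "Location": "A named geographic place.",
    },
}


def find_conflicts(glossaries):
    """Return terms that have more than one distinct definition across sources."""
    definitions = defaultdict(dict)  # term -> {definition: [sources]}
    for source, entries in glossaries.items():
        for term, definition in entries.items():
            key = term.strip().lower()
            definitions[key].setdefault(definition.strip(), []).append(source)
    return {term: defs for term, defs in definitions.items() if len(defs) > 1}


for term, defs in find_conflicts(GLOSSARIES).items():
    print(term)
    for definition, sources in defs.items():
        print("  ", sources, "->", definition)
```

The output is a review list for the community, not a resolution: deciding which definition wins, or whether the two are really separate concepts, remains a human task.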
Generated OWL
<?xml version="1.0" encoding="UTF-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
         xmlns="http://www.EDA.com/2007/Party#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:daml="http://www.daml.org/2001/03/daml+oil#"
         xml:base="http://www.EDA.com/2007/Party">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="Corporation">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="Organization"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Person">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="Individual"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:about="#Organization">
    <rdfs:subClassOf>
      <owl:Class rdf:ID="Party"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Personna">
    <rdfs:subClassOf>
      <owl:Class rdf:about="#Individual"/>
    </rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Government_Agency">
    <rdfs:subClassOf rdf:resource="#Organization"/>
  </owl:Class>
  <owl:Class rdf:ID="Country_Government">
    <rdfs:subClassOf rdf:resource="#Organization"/>
  </owl:Class>
  <owl:Class rdf:about="#Individual">
    <rdfs:subClassOf rdf:resource="#Party"/>
  </owl:Class>
</rdf:RDF>
Once the OWL is captured, let’s put it to work. How many ways can we exploit and leverage results?
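One simple way to start putting the captured OWL to work is to read the class hierarchy back out of it. The sketch below uses only Python's standard `xml.etree.ElementTree` module on a shortened excerpt of the OWL above (three of the classes, with the XML declaration and unused namespaces dropped). The helper names `local_name` and `subclass_edges` are illustrative assumptions, and a dedicated RDF library would be the sturdier choice for anything beyond this simple RDF/XML shape.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Shortened excerpt of the generated OWL shown on the slide.
OWL_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:ID="Corporation">
    <rdfs:subClassOf><owl:Class rdf:ID="Organization"/></rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:about="#Organization">
    <rdfs:subClassOf><owl:Class rdf:ID="Party"/></rdfs:subClassOf>
  </owl:Class>
  <owl:Class rdf:ID="Government_Agency">
    <rdfs:subClassOf rdf:resource="#Organization"/>
  </owl:Class>
</rdf:RDF>"""


def local_name(element):
    """Class name from rdf:ID, rdf:about or rdf:resource, without a leading '#'."""
    for attr in ("ID", "about", "resource"):
        value = element.get(f"{{{RDF}}}{attr}")
        if value:
            return value.lstrip("#")
    return None


def subclass_edges(xml_text):
    """Return the set of (child, parent) pairs declared with rdfs:subClassOf."""
    root = ET.fromstring(xml_text)
    edges = set()
    for cls in root.findall(f"{{{OWL}}}Class"):  # top-level class elements only
        child = local_name(cls)
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            nested = sub.find(f"{{{OWL}}}Class")
            parent = local_name(nested if nested is not None else sub)
            if child and parent:
                edges.add((child, parent))
    return edges


for child, parent in sorted(subclass_edges(OWL_XML)):
    print(f"{child} is a kind of {parent}")
```

The function handles both forms used in the generated file: a parent given as a nested `owl:Class` element and a parent given as an `rdf:resource` reference.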
Semantic Wiki • Upside • Collaborative • managed by community • shared evolution and dissemination • Enables Review Cycle • Standards-based - Generates OWL • Cross-links to documentation • Rigorous definitions support Business Rules • Can be populated with Master Data for querying • Foundation for strategic applications • Lays the foundation for Ontology-based Information Management • Results are reusable for many purposes • Mechanics are simple • Downside • Balancing semantics is HARD WORK, no matter how friendly the tools • Needs more visualization and reporting • Needs active integration with other components
Next Steps • Formalize requirements for vocabulary repository • Complete initial Vocabulary Definition • Complete Enterprise Model • Populate with types and static taxonomy data • Develop a set of competency questions • Validate Vocabulary • Formalize mappings to data sources • Validate approach • Assess Knoodl adequacy as primary vocabulary repository • Investigate alternatives or augmentations • Determine means of keeping data models, rules and glossaries synchronized with vocabulary • Investigate and implement ways to exploit the OWL
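Competency questions can be treated as small executable checks against the vocabulary. The sketch below hand-copies the subclass relationships from the Party ontology in the generated OWL and answers two example questions by walking the hierarchy. The questions themselves and the helper names are hypothetical; a real validation would more likely query the ontology with a reasoner or SPARQL.

```python
# Subclass relationships copied from the Party ontology in the generated OWL.
SUBCLASS_OF = {
    "Corporation": "Organization",
    "Government_Agency": "Organization",
    "Country_Government": "Organization",
    "Organization": "Party",
    "Person": "Individual",
    "Personna": "Individual",
    "Individual": "Party",
}


def ancestors(cls):
    """Return every superclass of cls, nearest first."""
    found = []
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in found:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def is_a(cls, candidate):
    """True if cls is the candidate class or one of its subclasses."""
    return cls == candidate or candidate in ancestors(cls)


# Hypothetical competency questions: (question, check, expected answer).
COMPETENCY_QUESTIONS = [
    ("Is every Corporation a Party?", lambda: is_a("Corporation", "Party"), True),
    ("Is a Person an Organization?", lambda: is_a("Person", "Organization"), False),
]

for question, check, expected in COMPETENCY_QUESTIONS:
    status = "ok" if check() == expected else "VOCABULARY GAP"
    print(f"{status}: {question}")
```

Running the questions after every vocabulary change gives reviewers a quick signal that an edit has not broken an agreed meaning.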
Kevin S. Lynch Ontologist CIA kevinsl@ucia.gov Marguerite Ardito Principal Information Architect Information Exchange, Inc marguea@ucia.gov marguerite@iexco.com