11:00 Self-Introductions
11:15 Report on ontology-based data integration work in DCGS-A
--- Goals and methodology
--- Practical experience and results so far
--- Risks and problems
12:15 Lunch (Brown Bag Lunch?)
13:00 Discussion of NSA work and goals (to be expanded)
--- ontology alignment / management
--- OMaaS (Object Management as a Service)
14:00-15:30 Discussion of how collaborative ontology effort (e.g. between I2WD and NSA) will work in practice
--- how to ensure consistency (architectural and content)
--- how to address governance
--- business development and work flow
DSC Ontology Work
Goals and methodology
Practical experience and results so far
Challenges
02/05/13
Goal: To realize Horizontal Integration (HI) of intelligence data
HI =Def. the ability to exploit multiple data sources as if they were one
• Problem: the data coming onstream are out of our control
• Any strategy for HI must be agile, in the sense that it can be quickly extended to new zones of emerging data according to need
• Ontology can provide the needed agility and (through an incremental approach) comprehensiveness
The Business Case
• Huge resources are wasted as multiple different agencies create lexicons, glossaries, data models, messaging and exchange standards with the same or closely overlapping coverage.
• Additional resources are wasted in creating mappings between these artifacts, and in maintaining them in light of new needs and challenges. These mappings always fail.
• A sensible solution must incorporate an evolutionary process which will ensure that artifacts used to manage data converge over time.
Methodology: Create an agile strategy for building ontologies within a Shared Semantic Resource (SSR) and apply and extend these ontologies to annotate new source data as they come onstream
• Strategy pioneered in biomedical and other scientific fields: leaves data as they are, and incrementally tags data sources with terms from a growing, consistent, non-redundant set of ontologies
• Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
• How to manage collaboration? This is what this meeting is about
Current State
2010-2011
• Architectural implementation (DRIF) to create the Dataspace (a cloud of intelligence data) = lossless representation of sources with their native semantics
• Initiated Semantic Enhancement (SE) = using ontologies to annotate these native semantics
• Demonstration of use of SE to index and query the content of the Dataspace
2012
• Created methodology and architecture for ontology development
• Initiated Shared Semantic Resource (SSR); created suite of prototype ontologies enabling SE also outside the Dataspace
• CUBRC: demonstrated ability to leverage SE for analytics
• Priming potential users of SE (in DoD CIO, NGA, JIEDDO, TRADOC …)
2013-
• MilPortal
• Event Reporting Application
• Initial negotiations with other agencies and groups
Ontologies We Have (Feb. 2013)
• Physical Artifact, including Infrastructure, Facility, Vehicle, Weapon
• Information Artifact, including Report, Image, Map
• Event, including Military, Criminal, Economic, Political, Religious, Social
• Human Physical Characteristics
• Agent, including Person, Organization, Social Network
• Geospatial
• Time
MilPortal http://milportal.ncor.buffalo.edu
Next step slide from Margaret Storey
Work plans 1: For DSC Cloud (on-going)
1. Perform needs analysis: review the DCGS-A Logical Data Model and the schemas of other DCGS-A data sets; in each case, examine content and establish what terms are needed to ensure sufficient coverage for SE.
2. Where these terms already exist within the SSR, check that definitions exist and that these definitions are adequate.
3. Where the terms do not exist within the SSR, create new terms (or new ontologies), with appropriate definitions, as necessary to fill gaps.
4. Use the terms in 2. and 3. to annotate the corresponding entries in the data models to effect horizontal integration.
5. With each new expansion in scope of DCGS-A data sets, iterate the above as needed.
In addition, we are documenting the methodology as described here, and engaging in dissemination and training.
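The needs-analysis loop in this work plan can be sketched roughly as follows. This is a minimal illustration, not the project's tooling: the SSR is modelled as a plain dictionary, the term and field names (`Checkpoint`, `rpt_name`, `veh_type`) are invented placeholders, and "adequate definition" is reduced to "non-empty", whereas in practice adequacy is a human judgment.

```python
# Illustrative sketch only: the SSR contents and term names are invented
# placeholders, not actual SSR or DCGS-A content.

def needs_analysis(needed_terms, ssr):
    """Sort the terms needed for SE into covered, weakly defined, and missing."""
    covered, needs_definition, missing = [], [], []
    for term in needed_terms:
        if term not in ssr:
            missing.append(term)           # gap: a new term must be created
        elif not ssr[term].strip():
            needs_definition.append(term)  # term exists but lacks a definition
        else:
            covered.append(term)
    return covered, needs_definition, missing


def annotate(field_to_term, ssr):
    """Keep only the annotations whose target term is present in the SSR."""
    return {field: term for field, term in field_to_term.items() if term in ssr}


# Hypothetical SSR: term -> definition (empty string = no definition yet).
ssr = {"Report": "placeholder definition", "Vehicle": ""}
needed = ["Report", "Vehicle", "Checkpoint"]

covered, needs_definition, missing = needs_analysis(needed, ssr)
print(covered, needs_definition, missing)

# Once the gaps are filled, data-model fields can be annotated with SSR terms.
ssr["Vehicle"] = "placeholder definition"
ssr["Checkpoint"] = "placeholder definition"
annotations = annotate({"rpt_name": "Report", "veh_type": "Vehicle"}, ssr)
print(annotations)
```

Re-running `needs_analysis` whenever a new data set comes into scope corresponds to the "iterate as needed" step.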
Work plans 2: For interagency collaboration
Step 1: Initiating interagency collaboration in the service of horizontal integration of intelligence data
• Identify candidate teams / agencies
• Establish collaboration with one or more specific teams
  • Formulate and ratify agreements as appropriate
  • Create work plan and identify funding needs
  • Perform risk assessment
Step 2: Establishment of the interagency ontology development process
Examples of types of work to be performed would include:
• Create governance infrastructure
• Establish needed technological support
• Implement workflow
• Conduct training in methodology where needed
• Explore opportunities for interagency HI of data
• Begin application of the SE strategy to relevant agency data models
• Disseminate results in a form which will allow improved systems to perform enhanced analytics exploiting semantic interoperability
The SSR methodology and governance are neutral as to how SSR annotations are used
• Currently, SE of the DSC provides integration only through improved indexing/search capability
• On the CUBRC project we have much more, including ontology-based reasoning
• In the future, SE of the DSC will be applied
  • to multiple models such as the LDM
  • to unstructured text
• See the various methodology documents provided so far
Event Reporting Application (EvR)
• Aggregates content from various report generating applications (CPOF, TIGR, TIGR with MAPHT extension)
• The underlying data model contains nearly 500 terms (e.g. ReportName, EventName, DeclassificationEvent)
• The semantics of the data model seems similar to that of a relational model
EvR Data Model as Relational Database
• Hierarchies of events and units are referenced by EventType and UnitEchelon columns, but this alone provides no capability for traversing these hierarchies
• A report is related to both a report name and a report unit by the inclusion of appropriately named columns, but what is the difference between these relationships?
• Our approach shows how to express the difference, e.g. between report-to-event and event-to-unit relationships
Enhancing the vocabulary of the EvR Data Model
• Semantic enhancement (SE) amounts to associating a database field with a whole knowledge system, enabling machines to process data …
  • “vertically”, e.g. by traversing the echelon command hierarchy, and
  • “horizontally”, e.g. by following specified relations between units and events.
• SE separates semantics from structure, reducing maintenance costs, as source databases no longer have to be modified each time we improve our understanding of reality
• The ontology tags move with the data …
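As a rough sketch of what "vertical" and "horizontal" processing could look like once the data are linked to an ontology: the unit names, event names, and relation instances below are invented, and plain Python dictionaries and lists stand in for whatever store actually holds the annotations. Only the relation names part_of and participates_in are taken from the EvR graph.

```python
# Hypothetical data: the units, events, and relation instances are invented.
PART_OF = {"Company A": "Battalion 1", "Battalion 1": "Brigade X"}
PARTICIPATES_IN = [("Company A", "Event 17"), ("Battalion 1", "Event 42")]


def chain_of_command(unit):
    """'Vertical' traversal: follow part_of links up the echelon hierarchy."""
    chain = [unit]
    while chain[-1] in PART_OF:
        chain.append(PART_OF[chain[-1]])
    return chain


def events_involving(unit):
    """'Horizontal' step combined with the hierarchy: events in which the
    unit itself, or any unit subordinate to it, participates."""
    return [event for participant, event in PARTICIPATES_IN
            if unit in chain_of_command(participant)]


print(chain_of_command("Company A"))
print(events_involving("Brigade X"))
```

Neither query requires any change to the source tables; the hierarchy and the relations live in the annotations.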
Enhancing the vocabulary of the EvR Data Model: Progress to Date
• Two techniques of SE are available
  • Partial enhancement (now being used in the DSC): associating ontology label terms with EvR terms
  • Full enhancement (not yet implemented in the DSC): aligning terms from the EvR to assertions using terms and relations from SSR ontologies
• At present the ontologies
  • provide 70% coverage for partial enhancement
  • provide 38% coverage for full enhancement
• We plan to extend the ontologies to raise coverage of the EvR to 90% and 60% respectively by end of February
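One plausible way to compute such coverage figures, assuming coverage means the fraction of EvR terms that have an ontology label (partial) or at least one ontology assertion (full). The slides do not state how the percentages were actually calculated, and the labels and assertions in this toy mapping are invented, so the numbers it prints are unrelated to the 70% and 38% reported above.

```python
# Hypothetical mapping: the labels and assertions are invented for
# illustration; only the EvR term names are taken from the slides.
evr_terms = {
    "ReportName":            {"label": "Report Name", "assertions": ["designates Report"]},
    "EventName":             {"label": "Event Name",  "assertions": ["designates Event"]},
    "EventType":             {"label": "Event Type",  "assertions": []},
    "UnitEchelon":           {"label": "Echelon",     "assertions": []},
    "DeclassificationEvent": {"label": None,          "assertions": []},
}


def coverage(terms, key):
    """Fraction of terms whose entry under `key` is non-empty."""
    enhanced = sum(1 for entry in terms.values() if entry[key])
    return enhanced / len(terms)


partial = coverage(evr_terms, "label")       # terms with an ontology label
full = coverage(evr_terms, "assertions")     # terms with ontology assertions
print(f"partial: {partial:.0%}, full: {full:.0%}")
```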
Enhancing the vocabulary of the EvR Data Model
[Table; only the column headers survive: Event Reporting Term | Full Enhancement via Ontology Assertion(s) | Partial Enhancement via Ontology Label]
Enhanced EvR Data as a Graph
[Figure: graph of the enhanced EvR data. Node labels: Event ID, Event Type, Unit Identification Code, Report ID, Event, Report, Unit, Act of Reporting, Report Name, Command Unit, Unit Name, Report Unit, Service. Relation labels: is_subclass_of, designates, part_of, describes, participates_in, has_output, controls, affiliated_with, agent_in.]
The graph shows how full annotation provides the semantics missing from the EvR relational model described above
[Figure: the same graph as on the previous slide]
• Relationships between
  • event type and echelon
  • report name and reporting unit
are made distinct and machine processable
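To illustrate how named relations make such relationships distinct and machine processable, the graph can be approximated as a list of subject-relation-object triples. The relation names come from the figure, but the figure's layout did not survive, so which nodes each edge connects is an assumption made for this sketch.

```python
# Assumed reading of the figure: relation names are from the slide, but the
# endpoints of each edge are guesses, since the graph layout was lost.
TRIPLES = [
    ("Report", "describes", "Event"),
    ("Unit", "participates_in", "Event"),
    ("Unit", "agent_in", "Act of Reporting"),
    ("Act of Reporting", "has_output", "Report"),
    ("Report ID", "designates", "Report"),
    ("Event ID", "designates", "Event"),
]


def relations_between(subject, obj):
    """Names of the relations that link subject directly to obj."""
    return [p for s, p, o in TRIPLES if s == subject and o == obj]


def two_hop(subject, obj):
    """Paths subject -p1-> middle -p2-> obj, as (p1, middle, p2) tuples."""
    return [(p1, mid, p2)
            for s, p1, mid in TRIPLES if s == subject
            for s2, p2, o in TRIPLES if s2 == mid and o == obj]


print(relations_between("Report", "Event"))
print(relations_between("Unit", "Event"))
print(two_hop("Unit", "Report"))
```

In a relational table both links would be bare columns; here the report-to-event link and the unit-to-event link carry different relation names, and the reporting unit is reached through an explicit act of reporting.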
Challenges
• Challenges to Horizontal Integration in general
  • Too many lexicons
  • The scope of the domain: signal, sensor, image, … intelligence about the whole world
• Our solution
  • Incremental extraction and sanitization, and creation of content
  • Distributed collaborative development
  • Strong methodology
• Challenges for our solution
  • Governance and management of ontology development to ensure consistent evolution
  • Lack of expertise
    • For ontology development and management
    • For annotation
  • Success will breed failure
We have these precursor ontologies
• The prototype ontologies in the existing SE (Malyuta-Salmen) helped indexing
• The ontologies we now have are much better and more than these prototypes
• How should we implement them re Event Reporting Ontology, global graph, etc. …?
• Next steps with I2WD
Our ideas are being heard
• 2 classes of collaborator: observers and partners
• Companies such as DataTactics are realizing that, in case of success, this provides huge potential business benefits
• (DT invited our team to talk at the Ontology Summit they have called on Feb. 12, devoted to the development of shared semantics)
Peter Morosoff and Bill Mandrick
• How to create a strategy?