Grand Ontology Strategy Barry Smith http://ontology.buffalo.edu/smith
DSC Cloud part of Army’s Distributed Common Ground System (DCGS-A) Semantic Enhancement Strategy Make data retrievable, and support analytics, by using Ontology Modules to tag data
Agenda Day 1 • 9:00 Grand Ontology Strategy • 10:45 Military intelligence: an overview of the domain from a data integration perspective • 12:15 Lunch • 13:00 Interactive session: building sample ontologies • Human Physical Property Ontology • Geospatial Ontology • Target Ontology • Event Ontology • Video Ontology • INTEL Product Ontology
Agenda Day 2 • 9:00 Survey of some existing approaches to ontology-based military/intelligence informatics • 10:45 Break • 11:00 Distinction of Relations between Ontologies and Data Models • 11:30 A strategy to ensure consistency of ontology development across multiple domains • Rules for coordination of ontology development • Creating a system of orthogonals • Potential partners: establishing a division of labor • Establishing the scope of a suite of Interoperable Military Intelligence Ontologies • The role of Joint Doctrine
External Participants • Albert Baker: Army Emerging Web Technologies (semantic solution to global force data management) • Bill Barnhill: Army Data Management Implementation for Army SEC, in support of Army CIO/G-6 • Dan Carey: OUSD Personnel & Readiness Information Management: HRM Domain Ontology development • Cliff Joslyn: DoE • Kevin Gupton: Naval Sea Systems Command (NAVSEA), Modeling & Simulation Information Management • Richard Lee: Digital Integrated Air Defense System /DSB (METS PMO) • Peter Morosoff: Electronic Mapping Systems, Inc.; military doctrine SME
The roots of Semantic Technology 1. Make your data available in a standard way on the Web 2. Use controlled vocabularies (‘ontologies’) to capture common meanings, in ways understandable to both humans and computers – Web Ontology Language (OWL) 3. Build links among the datasets to create a ‘web of data’
Controlled vocabularies for tagging (‘annotating’) data • Hardware changes rapidly • Organizations rapidly forming and disbanding • Data is exploding But • Meanings of common words change slowly • Use web architecture to annotate exploding data stores using ontologies to capture these common meanings in a stable way • Separate enhanced data from software
The hope underlying the strategy of Semantic Enhancement Build ontologies using formal languages (OWL, or something better) to enhance the different bodies of exploding data in a consistent fashion that would enable these data to be more easily • retrievable • integratable • analyzed • reasoned over
The problem: the more Semantic Technology is successful, the more it fails. The original idea was to break down silos via common controlled vocabularies for the tagging of data. The very success of this idea leads to the creation of ever new controlled vocabularies – semantic silos – as ever more ontologies are created in ad hoc ways. The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization. Multiplying (meta)data registries and ontology repositories are creating semantic cemeteries, where data goes home to die.
Some of the reasons for this effect • Low incentives for reuse of existing ontologies • Each organization wants its own ontology (“We have been describing our data in this way for 30 years, we are not going to change now”) • Poor licensing regime, poor standards, poor training
Why should you care? • when there are many ad hoc systems, average quality will be low • constant need for ad hoc repair through manual effort • DoD alone spends $6 billion per annum on this problem • regulatory agencies are recognizing the need for common controlled vocabularies
Some people think that the problem of multiplying ontologies can be solved by mappings between ontologies. Annotation = tagging data with ontologies. Mapping between ontologies = like a warehouse with multiple inventories = a waste of resources. (With thanks to Ron Rudnicki, IARPA AIRS (Actionable Information Retrieval System) project.)
Ontologies are not Sufficient for Interoperable Data [Figure: Count of Articles per Year Returned by Search on Google Scholar, 1997 to 2011. Two series: “Ontology” in Article Title and “Ontology Mapping” in Article Title.] CUBRC - Proprietary
Some people think we can rely on luck: The Infinite Monkey (Fortuitous Interoperability) Strategy
A better solution: find out what makes ontologies stable and useful, and create an evolutionary process whereby good ontologies will thrive and bad ontologies will die. Being a good ontology means not only being good; it also means being aggressively used in annotations.
How to do it right? • how to create an incremental, evolutionary process, where what is good survives? • how to bring about ontology death? A success story from biology
To find out what makes ontologies stable and useful, look at the world’s most successful ontology = the Gene Ontology (GO). What makes GO successful? • built by SMEs (constant feedback loop) • coherent architecture • improves over time through application of best practices learned through use and simple feedback from users to developers
New biology data MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
to this? MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
answer: through annotation of data with terms from the GO controlled vocabulary, for example: • sphingolipid transporter activity • Holliday junction helicase complex
Why is GO successful? • built by bench biologists • multi-species, multi-disciplinary, open source • compare use of kilograms, meters, seconds in formulating experimental results • natural language and logical definitions for all terms • initially low-tech to ensure aggressive use and testing
If controlled vocabularies are to serve to remove silos: • they have to be respected by many owners of data as resources that ensure accurate description of their data – GO is maintained not by computer scientists but by biologists • they have to be willingly used in annotations by many owners of data • they have to be maintained by persons who are trained in common principles of ontology maintenance
Success of GO measured by the fact that • it has created a community consensus • it has implemented a web of feedback loops where users of the GO can easily report errors and gaps • it has identified and applied principles for successful ontology management
GO is limited in its scope. It covers only generic biological entities of three sorts: • cellular components • molecular functions • biological processes. It does not cover diseases, symptoms, disease biomarkers, protein interactions, experimental processes …
Thus it was necessary to extend the GO methodology to other domains of biology and medicine, and to provide and test rules for such extension
OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
How to recreate the success of the GO in other areas • create a portal for sharing of information about existing controlled vocabularies, needs and institutions operating in a given area • create a library of ontologies in this area • create a consortium of developers of these ontologies who agree to pool their efforts to create a single set of non-overlapping ontology modules • one ontology for each sub-area
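The requirement that the consortium produce a single set of non-overlapping modules can be checked mechanically. The following Python sketch is illustrative only: the module names and term lists are hypothetical, and matching terms by lower-cased label is an assumed simplification (a real check would compare definitions, not just labels).

```python
from collections import defaultdict


def find_overlaps(modules):
    """Return {term: [module names]} for terms claimed by more than one module."""
    owners = defaultdict(set)
    for module_name, terms in modules.items():
        for term in terms:
            owners[term.strip().lower()].add(module_name)
    return {term: sorted(names) for term, names in owners.items() if len(names) > 1}


# Hypothetical modules and terms, invented for illustration.
modules = {
    "geospatial": ["region", "route", "facility"],
    "target": ["facility", "vehicle"],
    "event": ["meeting", "attack"],
}

overlaps = find_overlaps(modules)
print(overlaps)  # {'facility': ['geospatial', 'target']}
```

A non-empty result marks the terms for which the developers of adjacent modules would need to agree on a single owner.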
NextGen Ontology Portal • Two-Tiered Registry: NextGen Ontology – consists of vetted ontologies; Ontology Library – open to the wider community • Ontology Metadata: ontology owner, domain, and location • Ontology Search*: support ontology discovery
Redundant efforts defeat the purposes of ontology-based data integration • The ontologies are of lower quality • Multiple ontologies lead to siloing of data • Multiple ontologies block user commitment
The General Strategy • (Applied to Army intelligence, but with a view to generalizing to other military agencies) • Identify all Army intelligence ontology projects • Lock leaders of these projects in a room, with government personnel • Pool information • Thrash out a strategy for creating a single non-redundant suite of interoperable ontologies, with a division of labor, a division of responsibility
Some rules to serve as starting point for negotiations in the room • Developers commit to collaborating with developers of ontologies in adjacent domains • Developers commit to ensuring that, for each domain, there is convergence on a single ontology. See http://obofoundry.org
Fixing Intel: integrate information collected by: • Civil Affairs Officers • PRTs • Atmospherics Teams • Afghan Liaison Officers • Female Engagement Teams • Non-Governmental Organizations • Development Organizations • United Nations Officials • Psychological Operations Teams • Human Terrain Teams • Infantry Battalions … www.militaryontology.com
Basic principles of ontology development • for formulating definitions • of modularity • of user feedback for error correction and gap identification • for ensuring compatibility between modules • for using ontologies to annotate legacy data • for using ontologies to create new data • for developing user-specific views
Principle of two-part definitions • Each ontology term A will have a unique immediate parent B within the asserted hierarchy • The definition of A will state what it is about certain Bs which makes them As. Thus it will be of the form: A =def. a B which Cs
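A minimal Python sketch of the two-part definition pattern, under stated assumptions: the `Term` class and the three example terms are hypothetical, invented only to show how a single asserted parent plus a differentia ("which Cs") yields a definition of the form A =def. a B which Cs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    label: str
    parent: Optional["Term"]  # exactly one asserted parent; None only for the root
    differentia: str = ""     # the "which Cs" part of the definition

    def definition(self) -> str:
        """Render the definition in the form: A =def. a B which Cs."""
        if self.parent is None:
            return f"{self.label} (root term, no two-part definition)"
        return f"{self.label} =def. a {self.parent.label} which {self.differentia}"

    def ancestors(self):
        """Walk the single-parent chain up to the root."""
        chain, node = [], self.parent
        while node is not None:
            chain.append(node.label)
            node = node.parent
        return chain


# Hypothetical example terms, not taken from any published ontology.
root = Term("material entity", None)
vehicle = Term("vehicle", root, "is designed to transport people or cargo")
truck = Term("truck", vehicle, "is motorized and built mainly to carry cargo by road")

print(truck.definition())
print(truck.ancestors())
```

Because each term stores exactly one parent, the asserted hierarchy is a monohierarchy by construction, and every definition can be unpacked step by step up to the root.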
Types of SE Ontologies 1. Upper-level = BFO, plus small extensions of BFO covering terms used in almost all lower-level ontologies such as person, group, datum, meeting, … 2. Low-level ontologies = small ontologies for single domains, for example: • ontologies of qualities such as hair color, eye color (close to flat lists) • ontology of INTEL disciplines • ontology of INTEL products
Types of SE Ontologies 3. Mid-Level Ontologies of two sorts • reference ontologies, created through downward population from BFO, to cover broad domains comprehending multiple lower-level ontologies, for example: geospatial information artifact military operation • application ontologies created for specific purposes by merging components from other ontologies, or by introducing data-source specific terms
The Semantic Enhancement Approach • Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused • Create ontologies incrementally • Reuse existing ontology resources • Use these ontologies incrementally in annotating heterogeneous data • Annotating = arm's-length approach; the data and data-models themselves remain as they are
The Semantic Enhancement Approach • Annotations can be associated with metadata concerning provenance (GO Evidence Codes) • Annotations in common ontologies allow data to be shared across different communities • The common architecture and logical structure of the ontologies bring benefits in • querying • search • analytics • reasoning
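The arm's-length idea can be sketched in Python as follows. This is an illustration under assumptions, not the DSC Cloud implementation: the two data stores, their fields, the term labels and the evidence strings are all hypothetical. The point shown is that annotations live beside the data, carry provenance, and allow one query to reach records in stores with different native schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str     # which data store the record lives in
    record_id: str  # key of the record inside that store
    term: str       # ontology term used as the tag
    evidence: str   # provenance note, in the spirit of GO evidence codes


# Two hypothetical sources with different native schemas; neither is modified.
reports = {"r1": {"txt": "convoy seen near bridge"}}
sensor_log = {"s9": {"type_code": 17, "lat": 34.5, "lon": 69.2}}

annotations = [
    Annotation("reports", "r1", "vehicle convoy", "analyst assertion"),
    Annotation("sensor_log", "s9", "vehicle convoy", "automated classifier"),
    Annotation("sensor_log", "s9", "geospatial location", "derived from coordinates"),
]


def find(term, annotations):
    """Return (source, record_id) pairs tagged with the given ontology term."""
    return sorted((a.source, a.record_id) for a in annotations if a.term == term)


print(find("vehicle convoy", annotations))
# [('reports', 'r1'), ('sensor_log', 's9')]
```

Adding a further annotation never requires touching `reports` or `sensor_log`, which is what keeps the enhanced data separate from the underlying data models.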
Benefits of Modularity • Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains • Automatic consistency of the results of the distributed efforts – no room for contradiction • Additivity of annotations even when multiple independently developed ontologies are used • Lessons learned in developing and using one module can be used by the developers and users of later modules
Benefits of Modularity • Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability • Increased value and portability of training in any given module • Incentivization of those responsible for individual modules
Benefits of Modularity • All of those involved can more easily inspect and criticize the results of others’ work • Creates a collaborative environment for ontology development • Serves as a platform for innovations which can be easily propagated throughout the whole system • Developing and using ontologies in a consistent fashion brings a number of network effects – the value of existing annotations increases as new annotations are added
Dealing with vocabulary conflicts across COIs The goal is: one agreed, authoritative representation for each domain To achieve agreement we need: • coordinating board, change management • border treaty negotiations • community-specific views of the terminology (using exact synonyms)
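One way to picture community-specific views built on exact synonyms is the Python sketch below. The term record, the ID `T0001`, and the community keys `coi_a` and `coi_b` are hypothetical; the assumption illustrated is that each domain keeps one authoritative term while every COI may go on using its own name for it.

```python
# Hypothetical term record: one authoritative ID and label, plus exact synonyms per COI.
TERMS = {
    "T0001": {
        "label": "unmanned aircraft",
        "exact_synonyms": {"coi_a": "UAV", "coi_b": "drone"},
    },
}


def resolve(name):
    """Map a preferred label or any exact synonym to the authoritative term ID."""
    wanted = name.strip().lower()
    for term_id, term in TERMS.items():
        names = [term["label"], *term["exact_synonyms"].values()]
        if wanted in (n.lower() for n in names):
            return term_id
    return None


def view(term_id, community):
    """Return the community's own name for a term, else the authoritative label."""
    term = TERMS[term_id]
    return term["exact_synonyms"].get(community, term["label"])


print(resolve("Drone"))        # T0001
print(view("T0001", "coi_a"))  # UAV
```

Data tagged by either community resolves to the same term ID, so the vocabularies differ only at the display level.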
Governance • Common governance (coordinating editors, change board) • Common training • Robust versioning • Common architecture • Strategy of downward population • How much can we embed governance into software?
Logical standards can be only part of the solution. OWL … bring benefits primarily on the side of syntax (language). What we need are standards on the semantics (content) side (via top-level ontologies), including standards for • top-level ontologies • common relations (part_of …) • relation of lower-level ontologies to each other and to the higher levels
BFO, DOLCE, SUMO • All exist in FOL and OWL versions • All have been tested in use • BFO: very small, truly domain-neutral • DOLCE: largely extends BFO, but built to support ‘linguistic and cognitive engineering’ • SUMO: has its own tiny mathematics, tiny physics, tiny biology (‘body-covering’, ‘fruit-Or-vegetable’), … • A special case: Cyc: allows inconsistent microtheories (so: chaos); has received a lot of funding, but does not perform well in use