Benchmarking for Data Stewardship in Asset Management. Brand L. Niemann (US EPA), Co-Chair, Semantic Interoperability Community of Practice (SICoP) Best Practices Committee (BPC), Federal CIO Council Transportation Research Board, 86th Annual Meeting
Benchmarking for Data Stewardship in Asset Management Brand L. Niemann (US EPA), Co-Chair, Semantic Interoperability Community of Practice (SICoP) Best Practices Committee (BPC), Federal CIO Council Transportation Research Board, 86th Annual Meeting Workshop, Sunday, January 21, 2007, 1:30 p.m.-5:00 p.m., Hilton Hotel, Washington, DC
Workshop Abstract • Jack R. Stickel, Alaska Department of Transportation and Public Facilities, presiding • Sponsored by Statewide Transportation Data and Information Systems Committee; Information Systems and Technology Committee; and Transportation Asset Management Committee • Effective management of enterprise data is the cornerstone for any optimized decision making related to asset management. During the 2006 TRB Annual Meeting, the workshop Data Stewardship: Meeting the Challenges of Enterprise Data Management demonstrated the technical elements of data stewardship and helped to identify the major challenges. The current workshop will build on that work and focus on the best practices in managing data stewardship. The workshop includes benchmarking best practices from other industries and a synthesis of initiatives and findings from other industries related to data stewardship. Relating the lessons learned, pitfalls, and best practices that can be implemented into the transportation domain will be explored.
Introduction • Email invitation to Workshop did not contain a URL: • Used Google for TRB and found the top two were: • Transportation Research Board Homepage • http://www.trb.org/ • TRB 86th Annual Meeting • http://www.trb.org/meeting/ • Searched at TRB 86th Annual Meeting for this workshop and found the top two in Google were: • http://www.trb.org/am/ip/assembly_detail.asp?id=7920&e=186 • http://www.trb.org/meeting/workshop.pdf So far I am doing great thanks to Google!
Introduction • Subject Areas: TRB Annual Meeting Interactive Program, http://www.trb.org/am/ip/ • Aviation Bituminous Materials Concrete Materials Construction Data and Information Systems Design Environment and Energy Environmental Regulation Ferries Freight Systems Geology and Earth Materials International Activities Legal Resources Maintenance Management and Leadership Marine Operations Pavement Management Pedestrians and Cycles Public Transportation and Ferries Rail Research and Education Roadway Pavement Preservation Safety Security Social, Economic and Cultural Issues Soil Mechanics Structures Systems Planning, Policy, and Process Taxation and Finance Transportation Institutions, Finance and Workforce-Meeting the Needs of the 21st Century (spotlight) Transportation Policy Travel Analysis Methods Trucking Users
Introduction • Looked at TRB Publications: • http://www.trb.org/TRB/publications/Publications.asp • Found TRB Information Databases at the bottom of the Web page: • Transportation Research Information Services (TRIS) • http://tris.trb.org/about/ • TRIS Online (free version): • http://ntl.bts.gov/tris
Introduction • TRB Publications: • Search more than 1,500 TRB full-text electronic publications in nine general subject categories: planning, administration, and environment; design; materials, construction, and maintenance; operations and safety; aviation; public transit; rail; freight transportation (multimodal); and marine transportation.
Introduction • Transportation Research Information Services (TRIS): • The world's largest and most comprehensive bibliographic resource on transportation information. • Contains over 600,000 records of published and ongoing research. • Since 2000, most of the TRIS database has been available free on the Web as TRIS Online through the Web site of the Bureau of Transportation Statistics' National Transportation Library. • TRIS is indexed with a standardized vocabulary from the Transportation Research Thesaurus (TRT).
Introduction • The Transportation Research Thesaurus (TRT): • The Transportation Research Thesaurus (TRT) was initially developed under NCHRP Project 20-32 to provide a tool to improve the indexing and retrieval of transportation information. The thesaurus covers all modes and aspects of transportation. A primary use will be to provide a common and consistent language between producers and users of the Transportation Research Information Services (TRIS) Database. The thesaurus is currently also used as an indexing tool for federal, state and university collections. • The TRT Web site allows users to access terminology through Alphabetical, Hierarchical, Keyword In Context or Keyword Out of Context displays. The full display shows relationship of terms including SN (Scope Notes), UF (used for terms), BT (Broader Term), NT (Narrower Term), and RT (Related Term). • Through the use of the TRT's standardized vocabulary, the transportation community's access to information has been improved. • http://ntlsearch.bts.gov/tris/trt.do
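The relationship types listed on this slide (SN, UF, BT, NT, RT) can be pictured as fields on a term record. The following is a minimal sketch only: the class name, the `full_display` helper, and the sample entry are hypothetical illustrations and are not taken from the TRT itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThesaurusTerm:
    """One thesaurus entry carrying the relationship types named on the slide."""
    name: str
    scope_note: str = ""                                # SN (Scope Note)
    used_for: List[str] = field(default_factory=list)   # UF (used for terms)
    broader: List[str] = field(default_factory=list)    # BT (Broader Term)
    narrower: List[str] = field(default_factory=list)   # NT (Narrower Term)
    related: List[str] = field(default_factory=list)    # RT (Related Term)


def full_display(term: ThesaurusTerm) -> str:
    """Render a term with one labeled line per relationship."""
    lines = [term.name]
    if term.scope_note:
        lines.append(f"  SN {term.scope_note}")
    labeled = (("UF", term.used_for), ("BT", term.broader),
               ("NT", term.narrower), ("RT", term.related))
    for label, values in labeled:
        for value in values:
            lines.append(f"  {label} {value}")
    return "\n".join(lines)


# Hypothetical entry, invented for illustration (not copied from the TRT).
term = ThesaurusTerm(
    name="Pavements",
    used_for=["Road surfaces"],
    broader=["Facilities"],
    narrower=["Flexible pavements"],
    related=["Pavement management"],
)
print(full_display(term))
```

Keeping each relationship as its own list is one simple way to support the Alphabetical and Hierarchical displays from the same records.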
Introduction • Top Terms: • Transportation • Transportation operations • Management and organization • Communication and control • Planning and design • Construction and maintenance • Testing • Safety and security • Environment • Economic and social factors • Persons and personal characteristics • Organizations • Facilities • Vehicles and equipment • Materials • Physical phenomena • Disciplines • Mathematics • Areas and regions • Time • Information organization
Introduction • Also see the Bureau of Transportation Statistics for the data: • The Transportation Statistics Annual Report (TSAR), a Congressionally mandated publication, provides a data overview of U.S. transportation issues. Each TSAR has two essential components: a review of the state of transportation statistics with recommendations for improvements and a presentation of the data. • TSAR, first published in 1994, has been redesigned every few years. The most recent format was introduced with the TSAR published in October 2003. In this TSAR all transportation data and analysis are captured in one Indicators chapter with 15 topics. Most of these topics were specified in the Intermodal Surface Transportation Efficiency Act of 1991, which originally authorized the Bureau of Transportation Statistics (BTS). • http://www.bts.gov/publications/transportation_statistics_annual_report/2005/
Introduction • Transportation Statistics Annual Report (TSAR) topics: • Section 1: Traffic Flows • Section 2: Condition of the Transportation System • Section 3: Accidents • Section 4: Variables Influencing Traveling Behavior • Section 5: Travel Times • Section 6: Availability of Mass Transit and Number of Passengers Served • Section 7: Travel Costs of Intracity Commuting and Intercity Trips • Section 8: Productivity in the Transportation Sector • Section 9: Transportation and Economic Growth • Section 10: Government Transportation Finance • Section 11: Transportation-Related Variables that Influence Global Competitiveness • Section 12: Frequency of Vehicle and Transportation Facility Repairs • Section 13: Vehicle Weights • Section 14: Transportation Energy • Section 15: Collateral Damage to the Human and Natural Environment
Introduction • So in summary, the classifications are: • 35 Conference Subject Areas: TRB Home Page • 9 General Subject Categories: TRB Publications • 21 Top Terms: Transportation Research Thesaurus • 15 Topics: Transportation Statistics Annual Report (TSAR) • 183 Glossary Terms: Transportation Statistics Annual Report (TSAR) • 159 (260) Tables: Transportation Statistics Annual Report (TSAR) • ???? Data Elements: Transportation Statistics Annual Report (TSAR)
My Purpose • So the Introduction experience shows I had to do more work to get a broader context for the detailed data and information. • Build on the previous work (1) and focus on the best practices in managing data stewardship (2): • (1) 2006 TRB Annual Meeting Workshop on Data Stewardship: Meeting the Challenges of Enterprise Data Management. • (2) Implemented in the transportation domain: • Build an ontology of the Workshop Abstracts; and • Make a major transportation document a DRM 2.0-compliant node in a trusted reference knowledge network. • Federal Enterprise Architecture Data Reference Model 2.0: • See http://www.whitehouse.gov/omb/egov/a-5-drm.html • Owen Ambur, XML CoP Co-Chair Emeritus: In my view, that is highly ironic, since the purpose of the DRM is to facilitate the sharing of data but agency DRMs themselves may be in nonstandard, nonshareable, "abstract" formats.
The FEA Data Reference Model 2.0 DRM 1.0 SICoP All Three Ontologies Source: Expanding E-Government, Improved Service Delivery for the American People Using Information Technology, December 2005, pp. 2-3. http://www.whitehouse.gov/omb/budintegration/expanding_egov_2005.pdf
DRM 2.0 Implementation Metamodel Note: This TRB/BTS pilot makes these links visible and searchable! • Definitions: • Metamodel: Precise definitions of constructs and rules needed for abstraction, generalization, and semantic models. • Model: Relationships between the data and its metadata - W3C. • Metadata: Data about the data for: Discovery, Integration, and Execution. • Data: Structured e.g. Table, Semi-Structured e.g. Email, and Unstructured e.g. Paragraph. Source: Professor Andreas Tolk, 2005.
SICoP Knowledge Reference Model • Figure labels: Ontology, TRT • The point of this graph is that increasing metadata (from glossaries to ontologies) is highly correlated with increasing search capability (from discovery to reasoning).
Concept Map of DRM 2.0 Is_a Recall Slide 14 and see next slide for explanation.
Concept Map of DRM 2.0 • Essentially a Data Model of a Data Model! • PDF Version for Use in a Document, SVG Version for Use on the Web, XML Version for Structure, OWL Version for Semantic Relationships, and Simple Text Version. • Source: Brand Niemann, Jr., Informal Communication, October 28, 2006, as part of the October 11, 2006, Birds of a Feather Meeting on National Information Sharing Standards at the Fifth Semantic Interoperability for E-Government Conference, October 10-11, 2006. • http://colab.cim3.net/file/work/SICoP/2006-10-10/NatilStandards_10_11_2006.doc • See Concept Maps Home Page at http://cmap.ihmc.us/
Brief Tutorial • How to Build An Ontology: • Based on Professor Barry Smith’s Tutorial (Video): • http://ontology.buffalo.edu/smith/articles/ontologies.htm • How to Implement DRM 2.0: • My work to lead the OMB/CIOC DRM 2.0 Implementation Through Iteration and Testing: • http://colab.cim3.net/cgi-bin/wiki.pl?DRMImplementationThroughIterationandTestingPilotProjects • http://web-services.gov/DRMITIT10172005.doc
How to Build An Ontology • Ontologies (technical): • Standardized classification systems which enable data from different sources to be combined. • To help us navigate through oceans of data. • Intelligible to human beings, computationally useful, and capable of being glued together. • Ontology (science): • The empirical study of how to build humanly useful and computationally tractable representations of entities and of the relations between them. • National Center for Ontological Research (NCOR). • Ontology (philosophy) • The theory of being.
How to Build An Ontology http://en.wikipedia.org/wiki/Ontology_%28computer_science%29
How to Build An Ontology In this ontology diagram, the Ford Bronco object might have the following attribute: Successor: Ford Explorer that tells us that the Explorer is the model that replaced the Bronco. In this partial diagram of an ontology, Vehicle subsumes Car which has a partition into the two classes: 2-Wheel Drive and 4-Wheel Drive.
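The diagram described on this slide can be sketched as two small lookup tables: one for class subsumption and one for objects with their attributes. This is an illustration only; the table names and the `classes_of` helper are hypothetical, and placing the Ford Bronco under 4-Wheel Drive is an assumption, since the slide text does not say which class it belongs to.

```python
# Class subsumption from the slide: Vehicle subsumes Car, and Car is
# partitioned into 2-Wheel Drive and 4-Wheel Drive.
SUBCLASS_OF = {
    "Car": "Vehicle",
    "2-Wheel Drive": "Car",
    "4-Wheel Drive": "Car",
}

# Objects and their attributes. The "class" value is an assumption made
# for this sketch; the Successor attribute comes from the slide.
OBJECTS = {
    "Ford Bronco": {"class": "4-Wheel Drive", "Successor": "Ford Explorer"},
}


def classes_of(obj_name):
    """Return the object's class followed by every class that subsumes it."""
    cls = OBJECTS[obj_name]["class"]
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


print(classes_of("Ford Bronco"))
print(OBJECTS["Ford Bronco"]["Successor"])
```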
How to Build An Ontology • Rules and Standards: • Ontologies must be intelligible both to humans (for annotation and curation) and to machines (for reasoning and error checking). The lack of rules for classification leads to human error and blocks automatic reasoning and error checking. • Intuitive rules facilitate training of curators and annotators. • Common rules allow alignment with other ontologies. • Otherwise: Large databases that are unusable without a controlled vocabulary and annotations.
How to Build An Ontology • A window on reality: A good cartoon – legend, labels, etc. • Words are the backbone for ontology development: • For computers and humans in other domains. • Catalog versus Inventory: • The former is the ontology and the latter is the database. • A representation of “universals”: • In science, the periodic table is the best ontology.
How to Build An Ontology • Ontology (Knowledge Representation) • Universals (types) (e.g. Catalog) • Instances (database) (e.g. Parts Inventory)
How to Build An Ontology • Problem of ensuring sensible cooperation in a massively interdisciplinary community: • Concept • Type • Instance • Model • Representation • Data • Agree upon definitions.
How to Build An Ontology • Work with Subject Matter Experts to create an initial top level classification of your domain = ~50 most common types of entities corresponding to universals in reality. • Arrange these terms into a formal is_a hierarchy according to this Universality principle: • A is_a B = every instance of A is an instance of B • Fill in missing terms to give a complete hierarchy and annotate your data: • Leave to Subject Matter Experts to populate the lower levels of the hierarchy. • Avoid abbreviations and acronyms. • Terms should always have the same meaning on every occasion of use. • They should refer to the same universals.
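The Universality principle on this slide (A is_a B means every instance of A is an instance of B) can be sketched in a few lines. The terms and instance identifiers below are hypothetical transportation examples invented for illustration, and the function treats every term as is_a itself, which is a simplifying choice of this sketch.

```python
# Hypothetical is_a hierarchy: each term maps to its single parent term.
IS_A = {
    "Bridge": "Structure",
    "Structure": "Facility",
    "Pavement": "Facility",
}

# Hypothetical data annotations: each instance is tagged with its most
# specific type only.
INSTANCES = {
    "bridge-001": "Bridge",
    "pavement-017": "Pavement",
}


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following is_a links."""
    seen = set()
    current = term
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)  # guards against an accidental cycle in IS_A
        current = IS_A.get(current)
    return False


def instances_of(term):
    """Every instance of a subtype is also an instance of the broader term."""
    return sorted(name for name, kind in INSTANCES.items() if is_a(kind, term))


print(is_a("Bridge", "Facility"))
print(instances_of("Facility"))
print(instances_of("Structure"))
```

Because instances are annotated only at the most specific level, a query on a broad term still finds them by walking up the hierarchy.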
How to Build An Ontology • Supply definitions wherever possible. • Each term should have at most one definition. • Avoid circular definitions: • The term defined should not appear in its own definition. • A definition should use terms which are easier to understand than the term defined. • Use Aristotelian definitions: • An A is a B which C’s. A is lower than B and C is the specification that makes Bs into As. • Do not seek to define everything. • Some terms and some relations are primitive – they cannot be defined (e.g. identity)
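Two of the rules above lend themselves to an automated check: each term has at most one definition, and a term does not appear in its own definition. The sketch below is a hypothetical checker with made-up sample definitions; it only catches direct circularity, not definitions that are circular through a chain of other terms.

```python
import re
from collections import Counter


def check_definitions(pairs):
    """Check (term, definition) pairs against two of the slide's rules.

    Returns a list of human-readable problem descriptions.
    """
    problems = []

    # Rule: each term should have at most one definition.
    counts = Counter(term for term, _ in pairs)
    for term, count in counts.items():
        if count > 1:
            problems.append(f"{term}: {count} definitions (at most one allowed)")

    # Rule: the term defined should not appear in its own definition.
    for term, definition in pairs:
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, definition, flags=re.IGNORECASE):
            problems.append(f"{term}: term appears in its own definition")

    return problems


# Hypothetical definitions, written in the Aristotelian "A is a B which C's" form.
sample = [
    ("bridge", "a structure which carries a route over an obstacle"),
    ("culvert", "a culvert is a drain under a road"),
    ("culvert", "a structure which channels water under a road"),
]
for problem in check_definitions(sample):
    print(problem)
```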
How to Build An Ontology • Ontologies should include only those relational assertions which hold universally: • Not true: Pneumococcal bacterium causes pneumonia (many just lie around and don’t cause pneumonia) • Order is often important: • We can assert: adult transformation_of child • But not: child transforms_into adult
How to Build An Ontology • Top Level Class Hierarchy: • Root • Then two kinds of entities: • Occurrents (processes, events, happenings,..) • Aka Perdurants (have temporal parts, unfold themselves in successive phases, exist only in their phases) • Your life is an occurrent. (4-dimensional) • Continuants (objects, qualities, states,…) • Aka Endurants (have continuous existence in time, preserve their identity through change, exist in toto whenever they exist at all) • You are a continuant. (3-dimensional)
How to Build An Ontology • Top-Level Ontology Example - Basic Formal Ontology (BFO) : • Two Basic Categories: • Continuant • Independent Continuant • Example: Cell component • Dependent Continuant (function) • Example: Molecular function • Occurrent (always dependent on one or more independent continuants) (functioning) • Example: Biological process • Then Instances….. See http://www.ifomis.uni-saarland.de/bfo/home.php
How to Build An Ontology • BFO Encoding: Building Ontologies with TopBraid Composer, http://www.topquadrant.com • Classes shown: entity, continuant, dependent_continuant, independent_continuant, occurrent, processual_context, processual_entity, spatialtemporal_region, temporal_region
How to Build An Ontology • The OBO Foundry is a collaborative experiment, involving a group of ontology developers who have agreed in advance to the adoption of a growing set of principles specifying best practices in ontology development. These principles are designed to foster interoperability of ontologies within the broader OBO framework, and also to ensure a gradual improvement of quality and formal rigor in ontologies, in ways designed to meet the increasing needs of data and information integration in the biomedical domain. The goal is to develop a set of interoperable gold standard reference ontologies for all major domains of biomedical research. • http://obofoundry.org/ • OBO Relation Ontology: An ontology of core relations • http://obo.sourceforge.net/relationship/
How to Build An Ontology • 1. The ontology is open and available to be used by all without any constraint other than (1) its origin must be acknowledged and (2) it is not to be altered and subsequently redistributed under the original name or with the same identifiers. • 2. The ontology is in, or can be expressed in, a common formal language. A provisional list of languages supported by OBO is provided at http://obo.sf.net/ • 3. The ontology possesses a unique identifier space within OBO. • 4. The ontology provider has procedures for identifying distinct successive versions. • 5. The ontology has a clearly specified and clearly delineated content. • 6. The ontology includes textual definitions for all terms. • 7. The ontology uses relations which are unambiguously defined following the pattern of definitions laid down in the OBO Relation Ontology. • 8. The ontology is well-documented. • 9. The ontology has a plurality of independent users. • 10. The ontologies in the OBO Foundry will be developed in a collaborative effort.
OBO Relation Ontology • http://obo.sourceforge.net/relationship/ • Relations: • is_a • part_of • integral_part_of • proper_part_of • located_in • contained_in • adjacent_to • transformation_of • derives_from • preceded_by • has_participant • has_agent • instance_of
How to Build An Ontology • Workshop Abstracts: • Unstructured to semi-structured: • Title • Number • Date • Time • Location • Presiding • Sponsor (hopefully a reason for this) • Abstract • Information • Use search results for Conference Subject Areas to build initial ontology (catalog) • Provide Semantic Links: • Not providing semantics in the links is one of the main navigational problems of the World Wide Web: It is not until one opens the destination page of a link that one finds out that its content is not of interest.
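Moving a workshop abstract from unstructured to semi-structured text can be sketched as filling a fixed set of fields and leaving the rest empty. The field names come from the list above; the "Label: value" input layout and the `structure_abstract` function are assumptions made for this illustration, and the sample values are the ones given on the opening slides.

```python
# Field names taken from the slide's list of workshop abstract elements.
FIELDS = ["Title", "Number", "Date", "Time", "Location",
          "Presiding", "Sponsor", "Abstract"]


def structure_abstract(text):
    """Turn 'Label: value' lines into a record; unknown fields stay None."""
    record = {name: None for name in FIELDS}
    for line in text.splitlines():
        # Split on the first colon only, so times such as 1:30 stay intact.
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in record:
            record[label] = value.strip()
    return record


# Assumed input layout; the values are those stated for this workshop.
raw = """
Date: Sunday, January 21, 2007
Time: 1:30 p.m.-5:00 p.m.
Location: Hilton Hotel, Washington, DC
Presiding: Jack R. Stickel, Alaska Department of Transportation and Public Facilities
"""

record = structure_abstract(raw)
missing = [name for name, value in record.items() if value is None]
print(record["Time"])
print(missing)
```

Listing the fields that remain empty makes it easy to see how far a given abstract is from fully fielded search.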
How to Build An Ontology Supports Both Free Text and Fielded Searches.
How to Build An Ontology Search Just This Node for the Word “Aviation” and Show the Context (5 Words).
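A search that shows each hit with its surrounding words can be sketched as a small keyword-in-context function. This is not the tool shown on the slide; the function name is hypothetical, and reading "5 Words" as five words on each side of the hit is an assumption. The sample text is the list of general subject categories quoted earlier from TRB Publications.

```python
def keyword_in_context(text, keyword, width=5):
    """Return each occurrence of keyword with up to `width` words either side."""
    words = text.split()
    hits = []
    for i, word in enumerate(words):
        # Compare case-insensitively and ignore attached punctuation.
        if word.strip(".,;:()\"'").lower() == keyword.lower():
            left = words[max(0, i - width):i]
            right = words[i + 1:i + 1 + width]
            hits.append(" ".join(left + [word] + right))
    return hits


text = ("planning, administration, and environment; design; materials, "
        "construction, and maintenance; operations and safety; aviation; "
        "public transit; rail; freight transportation (multimodal); "
        "and marine transportation.")

hits = keyword_in_context(text, "Aviation")
for hit in hits:
    print(hit)
```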
How to Build An Ontology Search for countries with an inflation rate greater than 10 percent.
How to Build An Ontology Semantic Links from the Catalog. Recall Slides 25 and 30.
How to Implement DRM 2.0 DRM 2.0 Compliant Node of the TSAR 2005.
How to Implement DRM 2.0 Copy and Paste Data Table to Excel Because of XML Markup.
How to Implement DRM 2.0 Taxonomy based on two Congressional Acts.
How to Implement DRM 2.0 Data Assets Associated with Taxonomy.
Coordination with Other Activities • NCOR-NIST-Ontolog Forum: • Ontology Measurement and Evaluation: • The OBO Foundry - A Gold Standard Approach to Ontology Evaluation, December 21, 2006 • Database and Ontology: • Model-Driven Architecture and Semantic Web Technologies, January 4, 2007. • Ontology Driven Applications: • Top 5 Applications, October 10, 2006. • Ontologizing the Ontolog Body of Knowledge: • Ontolog Taxo Thesaurus, April 20, 2006. • Upper Ontologies: • Upper Ontology Summit, March 15, 2006. http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage
Conclusions and Recommendations • Multiple classifications of multiple sources require moving up the interoperability scale (recall slide 16) and repurposing content as DRM 2.0-compliant nodes in a distributed network. • Build and use multiple ontologies: • An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable across a wide range of domain ontologies. It contains a core glossary in whose terms objects in a set of domains can be described. There are several standardized upper ontologies available for use, including Dublin Core, SUMO, etc. • http://en.wikipedia.org/wiki/Ontology_%28computer_science%29