Preliminary Description of the Environmental Data Challenge for DoD M&S. Briefing by: Virginia T. Dobey SAIC/SETA Support to DMSO Environmental Representation Domain Lead (703) 824-3411 or (703) 963-8512 Virginia.Dobey.CTR@dmso.mil. Level I. Level II. Level III. Level IV. Level V.
Preliminary Description of the Environmental Data Challenge for DoD M&S
Briefing by: Virginia T. Dobey, SAIC/SETA Support to DMSO, Environmental Representation Domain Lead, (703) 824-3411 or (703) 963-8512, Virginia.Dobey.CTR@dmso.mil
DMSO Task: Environmental Representations (diagram: Level I, Level II, Level III, Level IV, Level V)
• Space conditions
• Atmospheric conditions
• Terrain conditions
• Ocean conditions
• Impact of platforms, weapons, sensors, and their actions on space, atmosphere, terrain, and ocean conditions
• Effects of space, atmosphere, terrain, and ocean conditions on platforms, weapons, and sensors
In M&S, a complete and accurate environmental representation must include not only the environmental conditions but also their effects on system C&P, as well as feedback of system activity on the environment. This, in turn, requires environmental data that can be FUSED with other data sources.
The Emerging GIG Data Environment (Task, Post, Process and Use - TPPU)
• Producer: tags and posts data
  • Describes content using metadata
  • Posts metadata in catalogs and data in shared space
• Consumer: can find and pull data based on metadata tags
  • Searches metadata catalogs to find data
  • Analyzes metadata to determine context of data found
  • Pulls selected data based on understanding of metadata
• Developer: posts to and uses metadata registries to structure data and document formats for reuse and interoperability
• Infrastructure shown in the diagram: Ubiquitous Global Network; Security Services (e.g., PKI, SAML); Application Services (e.g., Web); Metadata Catalogs; Metadata Registries; Shared Data Space; Enterprise & Community Web Sites
• Actual data posted to shared data spaces
• Location of data posted in metadata catalogs
• Data standards posted in metadata registries
GIG Policy: The TPPU Paradigm (diagram obtained from: http://ges.dod.mil/about/tppu.htm)
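The post / search / pull flow in the diagram can be sketched in a few lines. This is a minimal illustration only: `MetadataCatalog`, `SharedDataSpace`, the tag names, and the locations are hypothetical in-memory stand-ins, not GIG services or interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical in-memory stand-in for a metadata catalog."""
    entries: list = field(default_factory=list)

    def post(self, tags, location):
        # The catalog holds tags plus the LOCATION of the data, not the data.
        self.entries.append({"tags": set(tags), "location": location})

    def search(self, *wanted):
        # An entry matches when it carries every requested tag.
        wanted = set(wanted)
        return [e for e in self.entries if wanted <= e["tags"]]


@dataclass
class SharedDataSpace:
    """Hypothetical in-memory stand-in for a shared data space."""
    store: dict = field(default_factory=dict)

    def post(self, location, payload):
        self.store[location] = payload

    def pull(self, location):
        return self.store[location]


def producer_post(catalog, space, location, payload, tags):
    """Producer: post the data, then describe it with metadata tags."""
    space.post(location, payload)
    catalog.post(tags, location)


def consumer_find_and_pull(catalog, space, *wanted):
    """Consumer: search the catalog, then pull the data the metadata points to."""
    return [space.pull(e["location"]) for e in catalog.search(*wanted)]


catalog = MetadataCatalog()
space = SharedDataSpace()
producer_post(catalog, space, "space/obs-001",
              {"wind_speed_mps": 7.5}, ["atmosphere", "observation"])
producer_post(catalog, space, "space/grid-001",
              {"elevation_m": [10, 12]}, ["terrain", "gridded"])

found = consumer_find_and_pull(catalog, space, "atmosphere")
print(found)
```

The sketch only covers discovery. The consumer still receives data in the producer's own structure, which is where the interoperability challenge begins.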
What are Warfighter Issues? Shifting Paradigms
• The adoption of a Net-Centric Data Enterprise
• It’s not just a producer / user world anymore… (now EVERYONE’s a producer!)
• Consumers want access to data / information / knowledge immediately
• Consumers want input into how the data is manipulated/filtered
• Moving from a … Collector / Product focus: Task, Process, Exploit and Disseminate
  • Reliance on “Factory”
  • Resource intensive data download
  • One (producer) to many (consumers)
  • Bandwidth utilization / availability - not a consideration
• To a ... Analyst / Data focus: Task, Post, Process and Use (share)
  • Moving to “many-to-many” topology
  • Smart “data ordering” agents
  • Sharing of information
  • Immediate access to Through-the-Sensor data
  • Bandwidth - critical to warfighters
GIG: Increasing the Interoperability Challenge
• Everyone is a potential producer
• Multiple legacy environmental data sources and user systems exist
• Significant investment in existing production and user hardware and software
• Data in multiple (often system-specific) formats need updating
• Few data resources are reliably compatible, even those produced by the Government
• example: OAML (product-specific formats)
• “Power to the Edge” concept empowers user to identify other sources of required data
• No requirement for common data syntax/semantics
• Increases the challenge of data fusion
GIG: Assumptions in Assessing Environmental Data Interoperability
• Traditional data producers will continue to provide data in producer-specific and product-specific formats following existing production guidelines, since those products and formats meet the general needs of most customers (users). Formats will continue to leverage producer standards such as the Joint METOC Conceptual Data Model and the Feature and Attribute Coding Catalog. Tailoring data to user requirements will remain a user responsibility.
• Users will need a data mediation capability that can access not only these traditional data sources but also non-traditional and often unknown data sources, such as commercial products (sometimes having proprietary formats) and streaming data from in-situ sensors (anticipated development using future technology), which can be identified and obtained over the GIG.
Barriers to Data Interoperability
• Data sources, models, and operational systems developed independently of each other
• Simulations not traditionally designed to interface with operational systems (and sometimes with each other!)
• Tailored (both in format and in content) datasets that are optimized for a specific system support only specific uses
• Result: syntactically and semantically different forms of data representation are in use
Developing Interoperable Data
• “A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data…An implementation of a given data model is a physical realization on a real machine of the components of the abstract machine that together constitute that model…the familiar distinction between logical and physical…” [emphasis in the original] C.J. Date [1]
• “Logical Data Model: A model of data that represents the inherent structure of that data and is independent of the individual applications of the data and also of the software or hardware mechanisms which are employed in representing and using the data.” DoD 8320.1-M [2]
• “Normalization leads to an exact definition of entities and data attributes, with a clear identification of homonyms (the same name used to represent different data) and synonyms (different names used to represent the same data). It promotes a clearer understanding and knowledge of what each data entity and data attribute means.” C. Finkelstein [3]
[1] Colleague of E.F. Codd, originator and developer of relational database theory
[2] DoD authority on information engineering
[3] “Originator and main architect of the Information Engineering methodology”
Normalization Challenges
• Users are familiar with non-normalized physical data elements. Tendency is to call these “logical” and stop there.
• In any large data model, normalization is difficult. It is often ignored (benign neglect).
• Complete data models incorporate business rules (how the entities relate to each other).
  • May not be needed for an implementation-independent model used to develop a data dictionary (of interoperable concepts), but…
Achieving Data Interoperability: The Three-Schema Architecture
• External schema: user application views
• Conceptual schema: normalized logical data model serves as conceptual design “bridge” from the external schema to and from the internal schema
• Internal schema
• Converting user-specific data requirements into conceptual “building blocks” for data integration
• Logical data model building blocks are the basis for application data structures
• Also facilitates ingest of other source data
The Three-Schema Architecture Applied to Environmental Data
• User (production) applications: CBRN, weather effects, terrain trafficability, …
• Normalized logical data model serves as conceptual “bridge”
• Fusion of normalized data internal to the system
• Producer product formats (Prod 1, Prod 2, …, Prod N): METOC producer-specific formats, NGA product formats, JMCDM, FACC, …
• Allows for ingest of other source data
• Implementation-independent “middle layer” can be placed at the producer interface, user interface or somewhere in between
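A minimal sketch of the "bridge" idea, assuming two made-up producer record layouts and one made-up user view. The field names `TEMP_F`, `airTempKelvin`, `air_temperature_k`, and `temp_c` are hypothetical and do not come from JMCDM, FACC, or any real product format.

```python
# Producer side: each hypothetical producer format gets ONE adapter
# into the shared conceptual element "air_temperature_k".
def from_producer_a(record):
    """Producer A (hypothetical) reports temperature in degrees Fahrenheit."""
    kelvin = (record["TEMP_F"] - 32.0) * 5.0 / 9.0 + 273.15
    return {"air_temperature_k": kelvin}


def from_producer_b(record):
    """Producer B (hypothetical) already reports kelvin under another name."""
    return {"air_temperature_k": record["airTempKelvin"]}


# User side: each application view is derived from the conceptual element,
# never directly from a producer format.
def weather_effects_view(concept):
    """Hypothetical user application view in degrees Celsius."""
    return {"temp_c": round(concept["air_temperature_k"] - 273.15, 1)}


view_a = weather_effects_view(from_producer_a({"TEMP_F": 68.0}))
view_b = weather_effects_view(from_producer_b({"airTempKelvin": 293.15}))
print(view_a, view_b)
```

Because both adapters land on the same conceptual element, the user view does not need to know which producer supplied the value.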
Creating a Reusable Implementation-Independent Middle Layer
Such an architectural layer must:
• Be independent of source products
• Be independent of optimized system implementation
• Provide for the FULL SPECIFICATION of all source product data as well as all system data requirements
• Be developed as an implementation-independent (LOGICAL) relational data model, as required by the DoDAF OV-7 product view
A Reusable Middle Layer for Environmental Data
• Requires standardized terms in all environmental domains: leverage existing International/DoD standards
• Requires a concise, well-organized, non-redundant data structure
  • Must extend from a normalized logical data model
• Requires highly granular, independent data elements: “atomic” level concepts
  • To support the many formats required by users (precise rendering of translations to and from the hub)
A Concise Non-Redundant Data Structure
• Must address format as well as content
• Format
  • Must handle the large number of required data representation formats while preserving consistency of data (the “fair fight” across the federation)
• Content
  • Must be based on atomic data elements from a normalized logical data model (support for data fusion)
Challenge: The Many Formats of M&S Data
(Figure: sample data formats connected to an Interchange Hub)
• Controlled Image Base (raster)
• DTED (gridded)
• Foundation Feature Data (vector)
• Vector topology
• Geometry
• Nested, gridded data
• 1, 2, and 3-D point observation data
• Tabular data, e.g., Surface Backscatter Strength as a Function of Angle of Incidence (15 to 90 degrees) and EM Band (microwave, L-Band, S-Band, X-Band, V-Band)
• Feature labels shown in the figure: Trees, Lake
And More Formats: Algorithmic/Model Support and Output Data
The Final Additions to the Set of M&S Formats
• Compact Terrain Data Base (proprietary)
• DTED (product)
• E&S GDF (proprietary)
• E&S S1000 (proprietary)
• GeoTIFF
• Gridded raster
• MultiGen (proprietary)
• Shapefile (proprietary)
• Terrex DART, Terra Vista (proprietary)
• Vector Product Format (product)
“Atomic” Level Concepts
To facilitate precise rendering of translations to and from the hub
• Producers use their own coding systems, each of which captures specific desired information, some of which may be captured by others, and some of which may be unique. Almost always each producer carries information not available from other sources.
• Extracting information “embedded” in definitions through explicit statement of atomic attributes assists in adding attributes without overwriting the object
The Value of Atomic-Level Attributes: An Example
• Entity: Bridge over river. Decomposed: Bridge + located over water body = river
• Entity: Suspension bridge. Decomposed: Bridge + bridge type = suspension
• Entity: Bridge for two-way traffic. Decomposed: Bridge + traffic carried = vehicular + number of traffic directions = 2
Results in: Bridge + located over water body = river + bridge type = suspension + traffic carried = vehicular + number of traffic directions = 2
(each of these attributes can be changed/updated as new information is acquired)
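The bridge example can be expressed as a merge of attribute sets. This is a sketch under assumptions: the `fuse` helper and the snake_case attribute names are illustrative only and are not taken from any coding catalog.

```python
def fuse(entity_type, *attribute_sets):
    """Combine atomic attributes from several sources into one object.

    Each source contributes only the attributes it knows about, so a new
    source adds or updates single attributes without replacing the object.
    """
    fused = {"entity": entity_type}
    for attrs in attribute_sets:
        fused.update(attrs)
    return fused


# Three producer entities, decomposed into atomic attributes.
bridge_over_river = {"located_over_water_body": "river"}
suspension_bridge = {"bridge_type": "suspension"}
two_way_bridge = {"traffic_carried": "vehicular",
                  "number_of_traffic_directions": 2}

bridge = fuse("bridge", bridge_over_river, suspension_bridge, two_way_bridge)
print(bridge)

# New information arrives: only the affected attribute changes.
updated = fuse("bridge", bridge, {"number_of_traffic_directions": 1})
print(updated["number_of_traffic_directions"], updated["bridge_type"])
```

Had the three sources been kept as opaque entity codes ("bridge over river", "suspension bridge"), the later one would have had to overwrite the earlier one rather than extend it.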
“Complete and Accurate”: Does That Mean Data Fusion?
• Is the COP affected by METOC conditions? If so, can those effects be reflected in actual changes to the COP on the user system? This can be handled internally to the system without requiring data fusion capability.
• Does the user need to derive useful or critical information from the interaction of METOC/terrain data and information in the COP and provide it to other systems? The answer to this question determines whether data fusion is required by the user.
• Will the warfighter integrate environmental data into operational problems or will he use them as map or other overlays? The answer to this question determines whether data fusion is required by the user and allowed by the producer.
• Does the user need to have the ability to update METOC conditions and effects as reported by data from other (e.g., intel, foreign forces, etc.) battlefield sources? The answer to this question determines whether data fusion is required by the user.
Summary: The Challenge of Data Fusion (Business and Technical)
• What is the total set of requirements? There are many processes and products involved (some of which, as in ArcInfo/ArcView terrain products, may be proprietary), but the exchange mechanism must be independent of these.
• While we may know all of the currently available sources, will there ever be new ones available to the warfighter?
• Different views of the environment: air, land, sea, space
• Spatial location and orientation (coordinate system and datum)
• Lack of underlying environmental framework: no integrated reference model available
• Representation (how the concept will be depicted on the user’s system: a visual object? 2D or 3D? A data point? Background data for algorithm use?)
• Naming/semantics
• Existing data models are conceptual, future models which are non-integrated and don’t address current data repositories and data interchange requirements
THE TRADITIONAL SOLUTION: Direct Mapping RESULT: A BIAS AGAINST TRANSLATION SOFTWARE
A GIG-Oriented Solution: The Interoperable “Middle Layer”
(Diagram: Data Producers 1, 2, 3, …, n connect through a COMMON INTERCHANGE HUB to Data Consumer Applications A, B, C, …, Z)
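A back-of-the-envelope comparison of the two approaches, assuming one one-way translator per producer-to-consumer pair for direct mapping and one translator per endpoint for the hub. The function names are illustrative, and the figures are simple arithmetic, not measured costs.

```python
def direct_mapping_translators(producers, consumers):
    """Direct mapping: every producer format is mapped to every consumer."""
    return producers * consumers


def hub_translators(producers, consumers):
    """Hub: each producer maps into the hub, each consumer maps out of it."""
    return producers + consumers


for n in (3, 10, 50):
    print(n, direct_mapping_translators(n, n), hub_translators(n, n))
# prints:
# 3 9 6
# 10 100 20
# 50 2500 100
```

Under these assumptions, adding one new source to the hub costs one new translator, while adding it to a direct-mapping federation costs one per consumer.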
The Result of Improper Data Fusion What works for one system… creates unusual behaviors in another…
Why Not Let the Producers Handle it All?
ASD(C3I) PDM-85 directed DMA (NGA’s predecessor) to STOP producing system-specific formats. Without some means of creating interoperable, reusable data, billions of dollars of DoD investment in simulation and other systems would have been lost.
(Diagram labels: Project 2851 Standard Simulator Database Interchange Format (SIF); SIMNET Database Interchange Specification (SDIS); ASD(C3I) PDM-85; GLOBAL INFORMATION GRID; ?)
SEDRIS: How it works
• Identify representation structure of original data object (point, vector, raster, etc.; geometry, topology, grid, pixel, etc.) (this is the data format)
• Separate attribution of the object (what it is, characteristics of what it is) from its representation (this is the data content)
• Determine georeferencing of the object (this is the location of each object in its original spatial reference frame: UTM, MGRS, WGS-84, any local inertial or celestial reference datum, etc.)
• Overlay representation on SEDRIS Data Representation Model, convert attribution to EDCS codes, and decompose georeferencing using Spatial Reference Model
• Reassemble objects from multiple sources using the SEDRIS Transmittal Format to integrate/fuse data (more than just the simple overlay that is used in C4I, M&S systems now)
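The separation of format, content, and location described above can be sketched as follows. This is NOT the SEDRIS API: `HubObject`, `to_hub`, `reassemble`, the source record layout, and the `SHARED_CODES` table are all hypothetical, and the code strings are placeholders rather than actual EDCS codes. No coordinate conversion is attempted; the sketch only records each object's stated reference frame.

```python
from dataclasses import dataclass

# Hypothetical lookup from producer-specific labels to shared codes.
# Placeholder strings only, not actual EDCS codes.
SHARED_CODES = {"BRDG": "BRIDGE", "bridge_span": "BRIDGE", "LK": "LAKE"}


@dataclass(frozen=True)
class HubObject:
    representation: str  # data format: "point", "vector", "raster", ...
    attribution: str     # data content: what the object is (shared code)
    frame: str           # spatial reference frame as stated by the source
    coordinates: tuple   # location in that frame, left unconverted here


def to_hub(source_object):
    """Split one source object into representation, attribution, location."""
    return HubObject(
        representation=source_object["geometry_kind"],
        attribution=SHARED_CODES[source_object["label"]],
        frame=source_object["frame"],
        coordinates=tuple(source_object["coords"]),
    )


def reassemble(*sources):
    """Group objects from several sources by their shared attribution code."""
    merged = {}
    for source in sources:
        for obj in source:
            hub = to_hub(obj)
            merged.setdefault(hub.attribution, []).append(hub)
    return merged


producer_1 = [{"label": "BRDG", "geometry_kind": "vector",
               "frame": "UTM", "coords": [500000, 4649776]}]
producer_2 = [{"label": "bridge_span", "geometry_kind": "point",
               "frame": "WGS-84", "coords": [42.0, -87.0]},
              {"label": "LK", "geometry_kind": "raster",
               "frame": "WGS-84", "coords": [42.1, -87.2]}]

fused = reassemble(producer_1, producer_2)
print(sorted(fused), len(fused["BRIDGE"]))
```

Two producers that label and draw a bridge differently end up under one shared code, which is the precondition for fusing them rather than merely overlaying them.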
So Why Keep SEDRIS?
• SEDRIS is user-oriented. It opens up and reconciles data from multiple producers for multiple users.
• SEDRIS is like any other standard for interoperability:
  • It “costs” resources to implement in any single system. It is not useful for a standalone system.
  • It saves significant resources when used in more than one system.
• Assessment: “It is not in industry’s best interest to use SEDRIS. It is absolutely essential that the Government keep SEDRIS alive.”
Formal Definitions of the Normal Forms (1 of 2)
• 1st Normal Form (1NF)
  • Def: A table (relation) is in 1NF if
    1. There are no duplicated rows in the table.
    2. Each cell is single-valued (i.e., there are no repeating groups or arrays).
    3. Entries in a column (attribute, field) are of the same kind.
  • Note: The order of the rows is immaterial; the order of the columns is immaterial.
  • Note: The requirement that there be no duplicated rows in the table means that the table has a key (although the key might be made up of more than one column, even, possibly, of all the columns).
• 2nd Normal Form (2NF)
  • Def: A table is in 2NF if it is in 1NF and if all non-key attributes are dependent on all of the key.
  • Note: Since a partial dependency occurs when a non-key attribute is dependent on only a part of the (composite) key, the definition of 2NF is sometimes phrased as, "A table is in 2NF if it is in 1NF and if it has no partial dependencies."
• 3rd Normal Form (3NF)
  • Def: A table is in 3NF if it is in 2NF and if it has no transitive dependencies.
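A small sketch of how a partial dependency (a 2NF violation) shows up in data. The table, column names, and helper functions are hypothetical. Note the limitation: checking sample rows can only show that a dependency is consistent with the data, not that it holds by design.

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    """True if each combination of determinant values maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """Non-key attributes determined by a proper subset of a composite key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if holds(rows, subset, [attr]):
                    found.append((subset, attr))
    return found


# Hypothetical observation table with composite key (station_id, obs_time).
rows = [
    {"station_id": "S1", "obs_time": "0600", "temp_c": 11.0, "station_name": "Alpha"},
    {"station_id": "S1", "obs_time": "1200", "temp_c": 17.5, "station_name": "Alpha"},
    {"station_id": "S2", "obs_time": "0600", "temp_c": 9.0, "station_name": "Bravo"},
    {"station_id": "S2", "obs_time": "1200", "temp_c": 15.0, "station_name": "Bravo"},
]

violations = partial_dependencies(
    rows, ["station_id", "obs_time"], ["temp_c", "station_name"])
print(violations)
```

Here `station_name` depends on `station_id` alone, so moving it to a separate station table would bring the observation table into 2NF.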
Formal Definitions of the Normal Forms (2 of 2)
• Boyce-Codd Normal Form (BCNF)
  • Def: A table is in BCNF if it is in 3NF and if every determinant is a candidate key.
• 4th Normal Form (4NF)
  • Def: A table is in 4NF if it is in BCNF and if it has no multi-valued dependencies.
• 5th Normal Form (5NF)
  • Def: A table is in 5NF, also called "Projection-Join Normal Form" (PJNF), if it is in 4NF and if every join dependency in the table is a consequence of the candidate keys of the table.
• Domain-Key Normal Form (DKNF)
  • Def: A table is in DKNF if every constraint on the table is a logical consequence of the definition of keys and domains.
• Source: DATABASE-MANAGEMENT PRINCIPLES AND APPLICATIONS, Dr. Ronald E. Wyllys, The University of Texas at Austin, Austin, Texas, 78712-1276, http://www.gslis.utexas.edu/~l384k11w/normover.html
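The BCNF condition can be checked mechanically with the textbook attribute-closure algorithm. The sketch below uses the common working formulation (the determinant of every nontrivial functional dependency must be a superkey), and the attribute names and dependencies are a hypothetical example, not drawn from any DoD data model.

```python
def closure(attributes, fds):
    """All attributes functionally determined by `attributes` under `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attributes, fds):
    """Nontrivial dependencies whose determinant is not a superkey."""
    everything = set(all_attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != everything
    ]


# Hypothetical table: which instrument measures a parameter at a station.
attributes = ("station", "parameter", "instrument")
fds = [
    (("station", "parameter"), ("instrument",)),  # one instrument per pair
    (("instrument",), ("parameter",)),            # each instrument measures one parameter
]

print(closure(("station", "parameter"), fds) == set(attributes))
print(bcnf_violations(attributes, fds))
```

The table satisfies 3NF (every attribute is part of some candidate key) but not BCNF, because `instrument` determines `parameter` without being a key.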