Distributed Data Analysis & Dissemination System (D-DADS)
Special Interest Group on Data Integration
June 2000
Overview
• Environmental data are collected by multiple, disparate data providers, such as individual EMPACT projects.
• Each data provider presents its data in its own format, making it difficult to find, access, read, and integrate the data.
• Standardized formats and data dissemination systems are required to make distributed data sets accessible and to integrate them.
• This proposal presents a distributed data analysis and delivery system that gives users access to data from multiple sources.
The Data Flow Process: From Raw Data to Refined Knowledge
• Primary data are gathered from providers of sensory data
• Data are integrated, filtered, aggregated and fused into secondary data
• Reports are prepared for delivering environmental knowledge to the public
Data Flow Resistances
The data flow process is hampered by a number of resistances:
• The user does not know what data are available
• The available data are poorly described (metadata)
• There is a lack of QA/QC information
• The data come in various formats, requiring hand-crafted code to read and manipulate them
These resistances can be overcome through a distributed system that catalogs and standardizes the data, allowing easy access for data manipulation and analysis.
Interoperability
One requirement for an effective distributed environmental data system is interoperability, defined as:
• “the ability to freely exchange all kinds of spatial information about the Earth and about objects and phenomena on, above, and below the Earth’s surface; and to cooperatively, over networks, run software capable of manipulating such information.” (Buehler & McKee, 1996)
• Such a system has two key elements:
  - Exchange of meaningful information
  - Cooperative and distributed data management
Distributed Data Analysis & Dissemination System: D-DADS
• Specifications:
  - Uses standardized forms of data, metadata and access protocols
  - Supports distributed data archives, each run by its own provider
  - Provides tools for data exploration, analysis and presentation
• Features:
  - Data are organized as multidimensional data cubes
  - Dimensional data cubes are distributed but shared
  - Analysis is supported by built-in and user functions
  - Supports other data types, such as images, GIS data layers, etc.
The D-DADS Components
• Data Providers supply primary data to the system through SQL or other data servers.
• Standardized Description & Format populate and describe the data cubes and other data types using standard metadata.
• Data Access and Manipulation tools provide a unified interface to the data cubes and GIS data layers for accessing and processing (filtering, aggregating, fusing) data and for integrating data into virtual data cubes.
• Users are the analysts who access D-DADS and produce knowledge from the data.
The multidimensional data access and manipulation component of D-DADS can be implemented using OLAP.
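The integration of provider data into a virtual data cube can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the provider lists, the function names `build_virtual_cube` and `select`, and all values are hypothetical and are not part of the D-DADS proposal. Averaging overlapping cells is one possible fusion rule among many.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from two providers, already converted to a shared
# (location, time, parameter, value) form. All values are made up.
PROVIDER_A = [
    ("St. Louis", "2000-06-01T12", "visual_range_km", 20.0),
    ("St. Louis", "2000-06-01T13", "visual_range_km", 18.0),
]
PROVIDER_B = [
    ("Boston", "2000-06-01T12", "visual_range_km", 30.0),
    ("St. Louis", "2000-06-01T12", "visual_range_km", 22.0),
]


def build_virtual_cube(*providers):
    """Fuse provider records into one cube keyed by (location, time, parameter).

    Observations that fall into the same cell are averaged (an assumed rule).
    """
    cells = defaultdict(list)
    for records in providers:
        for location, time, parameter, value in records:
            cells[(location, time, parameter)].append(value)
    return {key: mean(values) for key, values in cells.items()}


def select(cube, location=None, time=None, parameter=None):
    """Filter the cube on any combination of dimensions (None means 'all')."""
    wanted = (location, time, parameter)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


cube = build_virtual_cube(PROVIDER_A, PROVIDER_B)
st_louis = select(cube, location="St. Louis")
```

The user sees a single cube and never needs to know which provider supplied a given cell, which is the point of the unified interface.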
On-line Analytical Processing: OLAP
• A multidimensional data model, making it easy to select, navigate, integrate and explore the data.
• An analytical query language, providing the power to filter, aggregate and merge data as well as explore complex data relationships.
• The ability to create calculated variables from expressions based on other variables in the database.
• Pre-calculation of frequently queried aggregated values, e.g. monthly averages, which enables fast response times to ad hoc queries.
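Two of the OLAP capabilities listed above, pre-calculated aggregates and calculated variables, can be illustrated with a minimal Python sketch. It does not use a real OLAP engine; the observations and the names `precompute_monthly_means` and `calculated` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (ISO date-hour, visual range in km).
OBSERVATIONS = [
    ("2000-05-30T12", 10.0),
    ("2000-05-31T12", 20.0),
    ("2000-06-01T12", 30.0),
    ("2000-06-02T12", 50.0),
]


def precompute_monthly_means(observations):
    """Aggregate once, ahead of time, so later queries are simple lookups."""
    by_month = defaultdict(list)
    for timestamp, value in observations:
        by_month[timestamp[:7]].append(value)  # "YYYY-MM" prefix
    return {month: mean(values) for month, values in by_month.items()}


def calculated(expression, observations):
    """Derive a new variable by applying an expression to a stored one."""
    return [(timestamp, expression(value)) for timestamp, value in observations]


# Pre-calculated aggregate: answering "monthly average" is a dict lookup.
MONTHLY_MEAN = precompute_monthly_means(OBSERVATIONS)

# Calculated variable: visual range in miles derived from kilometres.
visual_range_miles = calculated(lambda km: km / 1.609344, OBSERVATIONS)
```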
Fast Analysis of Shared Multidimensional Information (FASMI) (Pendse, N., “The OLAP Report”)
An OLAP system is characterized as:
• being Fast: the system is designed to deliver relevant data to users quickly and efficiently; suitable for ‘real-time’ analysis
• facilitating Analysis: users can extract not only “raw” data but also data that they “calculate” on the fly
• being Shared: the data and its access are distributed
• being Multidimensional: the key feature; the system provides a multidimensional view of the data
• exchanging Information: the ability to disseminate large quantities of various forms of data and information
Multi-Dimensional Data Cubes
• Multi-dimensional data models use inherent relationships in data to populate multidimensional matrices called data cubes.
• A cube's data can be queried using any combination of dimensions.
• Hierarchical data structures are created by aggregating the data along successively larger ranges of a given dimension, e.g. the time dimension can contain the aggregates year, season, month and day.
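Aggregating along the time hierarchy can be sketched as follows. This is a minimal illustration, assuming a cube stored as a dictionary keyed by (location, date); the cell values and the `roll_up` helper are hypothetical, and the season level is left out for brevity.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cube cells: (location, "YYYY-MM-DD") -> daily mean value.
CUBE = {
    ("St. Louis", "1999-12-31"): 10.0,
    ("St. Louis", "2000-01-01"): 20.0,
    ("St. Louis", "2000-01-02"): 40.0,
    ("Boston", "2000-01-01"): 60.0,
}

# Each level of the time hierarchy keeps a different length of date prefix.
TIME_LEVELS = {"year": 4, "month": 7, "day": 10}


def roll_up(cube, level):
    """Aggregate the cube along the time dimension to the requested level."""
    width = TIME_LEVELS[level]
    groups = defaultdict(list)
    for (location, date), value in cube.items():
        groups[(location, date[:width])].append(value)
    return {key: mean(values) for key, values in groups.items()}


by_month = roll_up(CUBE, "month")
by_year = roll_up(CUBE, "year")
```

Because the location dimension is untouched by the roll-up, the aggregated cube can still be queried by any combination of the remaining dimensions.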
User Interaction with D-DADS
[Diagram labels: Distributed Database; Data View (Table, Map, Time Chart, etc.); Query; XML data]
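The exchange of XML data between the distributed database and a data view could look roughly like the sketch below. The XML layout, including the `dataset` and `obs` element names and their attributes, is invented for illustration; the slides do not specify a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML response to a query; element and attribute names are
# assumptions, not a D-DADS format.
RESPONSE = """
<dataset parameter="visual_range" units="km">
  <obs location="St. Louis" time="2000-06-01T12:00">20.0</obs>
  <obs location="Boston" time="2000-06-01T12:00">30.0</obs>
</dataset>
"""


def parse_response(xml_text):
    """Turn the XML payload into dataset metadata plus a list of rows."""
    root = ET.fromstring(xml_text.strip())
    rows = [
        {
            "location": obs.get("location"),
            "time": obs.get("time"),
            "value": float(obs.text),
        }
        for obs in root.iter("obs")
    ]
    return dict(root.attrib), rows


def table_view(metadata, rows):
    """Render the rows as a tab-separated table, one of several possible views."""
    header = f"location\ttime\t{metadata['parameter']} ({metadata['units']})"
    lines = [f"{r['location']}\t{r['time']}\t{r['value']}" for r in rows]
    return "\n".join([header] + lines)


metadata, rows = parse_response(RESPONSE)
print(table_view(metadata, rows))
```

The same parsed rows could feed a map or time chart view instead of a table, since the views depend only on the parsed structure and not on the provider.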
Example Application: Visibility D-DADS
Visibility observations (extinction coefficient) are an indicator of air quality and serve as an important data set in the public’s understanding of air quality.
A visibility D-DADS will consist of multiple forms of visibility data, such as visual range observations and digital images from web cameras.
Potential visibility data providers include:
- EMPACT projects and their hourly visual range data
- The IMPROVE database
- CAPITA, a warehouse for global surface observation data available every six hours
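Visual range and extinction coefficient are commonly related through the Koschmieder relation, visual range = 3.912 / extinction coefficient, where 3.912 is -ln(0.02) for a 2% contrast threshold. The slides do not say which conversion a visibility D-DADS would use, so the sketch below is an assumption that shows how visual range data from different providers might be put on a common footing. The function names are hypothetical.

```python
import math

# Koschmieder constant for a 2% contrast threshold: -ln(0.02), about 3.912.
KOSCHMIEDER = -math.log(0.02)


def extinction_per_km(visual_range_km):
    """Convert visual range (km) to extinction coefficient (1/km)."""
    if visual_range_km <= 0:
        raise ValueError("visual range must be positive")
    return KOSCHMIEDER / visual_range_km


def visual_range_km(extinction):
    """Inverse conversion: extinction coefficient (1/km) to visual range (km)."""
    if extinction <= 0:
        raise ValueError("extinction coefficient must be positive")
    return KOSCHMIEDER / extinction


b_ext = extinction_per_km(20.0)  # a 20 km visual range
```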
Possible Node in Geography Network
National Geographic and ESRI are establishing a geography network consisting of distributed spatial databases. Some EMPACT projects are participating as nodes in the initial start-up phase.
The visibility distributed data and analysis system could link to and become another node in the geography network, making use of the geography network’s spatial viewers.
Other views, such as a time view, could be linked with the spatial viewer to take advantage of the multidimensional visibility data cubes.
Example Viewer
[Figure panels: Map View, Variable View, Time View, WebCam View]
The views are linked so that making a change in one view, such as selecting a different location in the map view, updates the other views.
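One common way to link views is an observer pattern: the views share a single selection, and any change to it is pushed to every registered view. The sketch below assumes that design; the `Selection` and `View` classes are hypothetical and do not describe the actual viewer.

```python
class Selection:
    """Shared selection state that notifies every registered view on change."""

    def __init__(self, **state):
        self._state = dict(state)
        self._views = []

    def register(self, view):
        self._views.append(view)
        view.refresh(self._state)

    def update(self, **changes):
        self._state.update(changes)
        for view in self._views:
            view.refresh(self._state)


class View:
    """A view that holds some dimensions fixed and displays along the rest."""

    def __init__(self, name, fixed_dimensions):
        self.name = name
        self.fixed_dimensions = fixed_dimensions
        self.showing = {}

    def refresh(self, state):
        # Record the slice of the shared selection this view depends on.
        self.showing = {d: state[d] for d in self.fixed_dimensions}


selection = Selection(
    location="St. Louis", time="2000-06-01T12", variable="visual_range"
)
map_view = View("map", ["time", "variable"])        # varies over location
time_view = View("time", ["location", "variable"])  # varies over time
selection.register(map_view)
selection.register(time_view)

# Selecting a different location (e.g. by clicking the map) updates the time view.
selection.update(location="Boston")
```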
Summary
• In the past, data analysis has been hampered by data flow resistances. Fortunately, the tools and framework to overcome these resistances now exist, including:
  - World Wide Web
  - XML
  - OLAP
  - ArcIMS
  - Metadata standards
• It appears timely to consider a distributed environmental data analysis and dissemination system.