Data for Environmental Modeling (D4EM): Automated Data Retrieval and Processing in Support of Integrated Environmental Modeling
Motivation • Most environmental models are not aware of data sources or data preprocessing requirements • The model user is responsible for: • Determining data requirements • Locating data sources • Retrieving data • Processing data (including geo-processing) per model requirements • Logging metadata (data gathering and processing operations)
Motivation • The problem is amplified when feeding an integrated modeling system rather than a single model. • The cost of retrieving and processing data is high.
3MRA contains 17 linked science models. The project included data collection for 201 sites at a cost of more than $1 million. The cost of adding a new site to the database remains a challenge.
Data access and pre-processing issues for Integrated Environmental Modeling System • Heterogeneous data sources • USGS, EPA, NOAA, … • Heterogeneous data types • jpeg, text, GeoTIFF, xml, netCDF, raster, vector, … • Multiple geographic projections • UTM, Geographic, … • Multiple access protocols • SOAP, WMS, OPeNDAP, ftp, HTTP screen scraping, …
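As a small illustration of the projection issue listed above, the sketch below picks a UTM target for a point given in geographic (longitude/latitude) coordinates. It uses the standard 6-degree UTM zone rule and the WGS 84 / UTM EPSG numbering (326xx north, 327xx south). The function names are hypothetical, the Norway and Svalbard zone exceptions are ignored, and nothing here is taken from D4EM itself.

```python
import math


def utm_zone(lon):
    """Return the standard UTM zone number (1-60) for a longitude in degrees."""
    # Zones are 6 degrees wide, starting at 180 degrees west.
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1


def wgs84_utm_epsg(lon, lat):
    """Return the EPSG code of the WGS 84 / UTM zone containing the point."""
    # 326xx codes are northern-hemisphere zones, 327xx southern.
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)


# A point near Washington, DC falls in zone 18 north.
print(utm_zone(-77.03), wgs84_utm_epsg(-77.03, 38.9))
```

A preprocessing step could use a code like this to decide which projection to request from a reprojection library before overlaying datasets.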
Data access and pre-processing issues for Integrated Environmental Modeling System • Inconsistency in data retrieval and pre-processing operations across models may introduce uncertainty in an integrated modeling application. • Model comparisons are difficult without consistent data sources and pre-processing algorithms.
SOLUTION • Automate data retrieval from national and local data sources • Automate preparing model input files from retrieved data • Automate metadata compilation and logging
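The three automation goals above can be sketched as a tiny pipeline in which every retrieval and preparation step writes a metadata record. This is a minimal sketch under stated assumptions: `OperationLog`, `retrieve`, and `prepare_input` are hypothetical names, the "download" returns made-up values instead of contacting a data source, and none of it reflects D4EM's actual interface.

```python
import json
from datetime import datetime, timezone


class OperationLog:
    """Collects one metadata record per retrieval or processing step."""

    def __init__(self):
        self.records = []

    def record(self, operation, **details):
        # Store what was done, when, and with which parameters.
        self.records.append({
            "operation": operation,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "details": details,
        })

    def to_json(self):
        # Serialize the log so it can be archived with the model inputs.
        return json.dumps(self.records, indent=2)


def retrieve(source, region, log):
    # Placeholder for a real download; the values are made up.
    log.record("retrieve", source=source, region=region)
    return [1.0, 2.0, 3.0]


def prepare_input(values, log):
    # Placeholder formatting step standing in for model input preparation.
    log.record("prepare_input", n_values=len(values))
    return "\n".join(f"{v:.2f}" for v in values)


log = OperationLog()
values = retrieve("example-source", "example-region", log)
model_input = prepare_input(values, log)
print(log.to_json())
```

Passing the log explicitly to each step keeps the record of operations in one place, which is what makes later quality assurance possible.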
Existing Example • BASINS: • Downloads and pre-processes input data for models • Open source • Not an integrated modeling system • Not a component-oriented architecture
Requirements • Develop a software library that can be incorporated into models and modeling frameworks • Open source • Include open source geo-processing • Include statistical processing • Extensible architecture to allow easy addition of new data sources • Transparent (logging of all operations) to allow quality assurance
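One common way to meet the extensibility requirement above is a registry of data adapters, so that a new source is added by writing one class rather than editing the core library. The sketch below is illustrative only: `DataAdapter`, `register`, `ExampleGaugeAdapter`, and `fetch_from` are hypothetical names, and the returned record is made up.

```python
from abc import ABC, abstractmethod

# Maps a source name to the adapter class that handles it.
ADAPTERS = {}


def register(name):
    """Class decorator that adds an adapter to the registry under `name`."""
    def decorator(cls):
        ADAPTERS[name] = cls
        return cls
    return decorator


class DataAdapter(ABC):
    """Common interface every data source adapter must implement."""

    @abstractmethod
    def fetch(self, region):
        """Return a list of records for the given region."""


@register("example_gauges")
class ExampleGaugeAdapter(DataAdapter):
    def fetch(self, region):
        # A real adapter would query a remote service here.
        return [{"site": "A", "region": region, "flow": 12.5}]


def fetch_from(name, region):
    # Look up the adapter by name and fail clearly if it is unknown.
    try:
        adapter_cls = ADAPTERS[name]
    except KeyError:
        raise ValueError(f"no adapter registered for {name!r}") from None
    return adapter_cls().fetch(region)


print(fetch_from("example_gauges", "example-region"))
```

Callers depend only on the `fetch` interface, so adding another source means defining and registering one more subclass.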
Data Sources: NLCD, NHD Plus, BASINS, STORET, NWIS, <…>, Custom
Application Interfaces: BASINS, APES/SDM, Workflow, Batch
Data Access Layer: MRLC Data Adapter, NHDPlus Data Adapter, BASINS Data Adapter, STORET Data Adapter, NWIS Data Adapter, <….> Data Adapter
Support Libraries: Graphing, Compression, Downloading, Local Cache
Data Processing Layer
• Geoprocessing: Reprojection, Clipping, Overlays, Merging
• Stream network & watershed processing: Define minimum stream length, Combine stream segments, Delineation, < …. >
• Statistical processing: Averaging, Interpolation
Data Store: Raster files, Databases, Metadata, Vector files, ASCII/XML files
Model Interface Layer: SWAT Data Adapter, HSPF Data Adapter, WASP Data Adapter, 3MRA Data Adapter, <…> Data Adapter
Models: SWAT, HSPF, WASP, 3MRA, <….> Model
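To make the flow through the layers above concrete, the following sketch passes a few made-up point records through a stand-in access layer, a processing layer (bounding-box clipping and averaging), and a stand-in model interface layer that formats a text input. All function names, the record layout, and the output format are assumptions chosen for illustration; real geoprocessing would rely on a GIS library rather than a coordinate comparison.

```python
from statistics import mean


def access_layer():
    # Stand-in for a source data adapter; the records are made up.
    return [
        {"site": "A", "lon": -84.4, "lat": 33.7, "value": 2.0},
        {"site": "B", "lon": -84.1, "lat": 33.9, "value": 4.0},
        {"site": "C", "lon": -90.0, "lat": 40.0, "value": 9.0},
    ]


def clip(records, bbox):
    # Processing layer: keep only points inside the bounding box.
    min_lon, min_lat, max_lon, max_lat = bbox
    return [
        r for r in records
        if min_lon <= r["lon"] <= max_lon and min_lat <= r["lat"] <= max_lat
    ]


def average(records):
    # Processing layer: simple statistical summary of the clipped values.
    return mean(r["value"] for r in records)


def model_interface_layer(records, avg):
    # Stand-in for a model data adapter: write a plain-text input block.
    lines = [f"{r['site']},{r['value']:.1f}" for r in records]
    lines.append(f"MEAN,{avg:.1f}")
    return "\n".join(lines)


bbox = (-85.0, 33.0, -84.0, 34.0)  # (min_lon, min_lat, max_lon, max_lat)
clipped = clip(access_layer(), bbox)
text = model_interface_layer(clipped, average(clipped))
print(text)
```

Because each layer only consumes the output of the previous one, a different source or a different target model can be swapped in without touching the processing step.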
DotSpatial • Lightweight GIS • Open source • Extensible/plug-in architecture • Geo-processing separated from visualization • Can be incorporated as a component into a model, or a model can be housed inside a DotSpatial application
Data sources range from remotely hosted national databases to state or regional databases to local application-specific databases. • Web maps • OpenStreetMap • Google • Bing • NWIS • STORET • NLCD (1992, 2001, 2006) • BASINS • Land use/land cover • Urbanized areas • Populated place locations • Reach File version 1 (RF1) • Elevation (DEM) • National Elevation Dataset (NED) • Major roads • USGS HUC boundaries • Accounting unit • Cataloging unit • Dam sites • EPA regional boundaries • State boundaries • County boundaries • Federal and Indian lands • Ecoregions • STATSGO • NHD Plus • National Data Buoy Center • Planned • Soils (SSURGO)
Future Work (Potential) • Add more data sources • Add more models • Data Mining / Data Discovery / Business Intelligence • Multi-dimensional data analysis • Identify trends/patterns in input and output data • Characterize uncertainty in input data
D4EM Software Download d4em.webdev.ord.epa.gov (intranet, coming soon) http://code.google.com/p/d4em/