A FRAMEWORK FOR GEOSPATIAL MODELING FROM SPARSE FIELD MEASUREMENTS USING IMAGE PROCESSING AND MACHINE LEARNING • Peter Bajcsy (1), Chulyun Kim (1), Jihua Wang (2) and Yu-Feng Lin (2) • (1) National Center for Supercomputing Applications (NCSA) • (2) Illinois State Water Survey (ISWS) • University of Illinois at Urbana-Champaign (UIUC)
Outline • Introduction • Problems Addressed by Spatial Pattern To Learn (SP2Learn) • SP2Learn Architecture and Functionality Overview • Running SP2Learn • Summary
General Problem • Compute a set of geo-spatially dense accurate predictions of variables • given a set of direct geo-spatially sparse point measurements and • auxiliary variables with implicit relationships with respect to the predicted variable • Motivation: • minimize cost of taking direct point measurements • maximize accuracy of predictions and • automate discovering relationships among direct field measurements and indirect variables
Formulation • Input: sets of geo-spatially sparse variables {Vi{pij}} & dense auxiliary variables & a priori tacit knowledge of experts • Output: geo-spatially dense (raster) {Ok} • Unknown: selection of methods & workflow of operations/methods & parameters of methods & relationships of auxiliary variables w.r.t. Ok & quantitative metric of output goodness [Figure labels: point measurements p1j and p2j; V1 & V2; interpolations; mathematical models; auxiliary variables & tacit knowledge; output O1]
Applied Problem • Recharge and Discharge Rate Prediction [Figure labels: water table elevation, bedrock elevation, discharged, recharged]
Interdisciplinary Objectives • Ground Water (Hydrologic Science) View: • Evaluation of Alternative Conceptual (implicit relationships) and Mathematical Models (explicit relationships) • Accurate Prediction of Groundwater Recharge and Discharge Rates from Limited Number of Field Measurements • Computer Science View: • Computer-Assisted Learning to Assess Alternative Conceptual and Mathematical Models • Optimization of Prediction Models From a Set of Geo-Spatially Sparse Point Measurements [Figure label: DIALOG]
State-of-the-Art Results • Limited Spatial Resolution and Accuracy [Figure panels: uniform grid, 80 m x 80 m; minimum grid, 805 m x 805 m. Labels: recharge zone, discharge zone, recharge, discharge, noisy pattern or weak R/D]
Existing Software for Groundwater and Surface Water Modeling • MODFLOW is a three-dimensional finite-difference ground-water model • http://water.usgs.gov/nrp/gwsoftware/modflow2005/modflow2005.html - freeware (2005) • PEST is software for model calibration, parameter estimation and predictive uncertainty analysis • http://www.sspa.com/pest/ - freeware (2007); University of Queensland, Australia • Precipitation-Runoff Modeling System (PRMS) is a deterministic, distributed-parameter modeling system developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow, sediment yields, and general basin hydrology • http://water.usgs.gov/software/prms.html - freeware (1996); USGS • Deep Percolation Model (DPM) facilitates estimation of ground-water recharge across a wide range of climatic, landscape, land-use and land-cover conditions • http://pubs.usgs.gov/sir/2006/5318/; USGS
Related Work • Singh A. et al. “Expert-Driven ‘Perceptive’ Models for Reducing User Fatigue in an Interactive Hydrologic Model Calibration Framework” [Figure: conductivity (K) and hydraulic heads (H) for the hypothetical aquifer]
Motivation • Ground Water (Hydrologic) Science: • Currently, no single method can estimate R/D rates and patterns for all practical applications. • Therefore, cross-analyzing results from several estimation methods together with related field information is likely to be superior to using a single estimation method. • Computer Science: • It is currently impossible • (a) to replace an expert who holds a lot of tacit domain knowledge with computer algorithms or • (b) for an expert to learn new I/O relationships from a plethora of possible variables and an extremely large space of processing methods and their parameters • Thus, assisting experts to discover, evaluate and validate new relationships in an iterative way will likely enable • (a) better understanding of the underlying phenomena, and • (b) more automated and cost-efficient predictions
Our Approach • Data-Driven Analyses to Test Alternative Models, and to Search the Space of Processing Operations and Their Parameters • Interpolation methods • Mathematical models • Image processing algorithms • Machine learning algorithms • Scalability of algorithms with large size data • Computer-Assisted Comparisons and Evaluations of Multiple Models and Sub-Optimal Solutions • Model/Solution Representation • Closed Loop (Iterative) Workflows • Human Computer Interfaces • Overall Approach: An Exploration Framework for a Class of Alternative Models/Hypotheses and Optimal Solutions
SP2Learn Problem Formulation • Given a set of geo-spatially sparse field measurements and auxiliary variables, derive an accurate, spatially dense R/D rate map by • (a) using a physics-based model • (b) incorporating boundary conditions and • (c) exploring auxiliary variables that represent prior knowledge about R/D patterns but are missing from the physics-based model
Challenges • (1) How to Recognize a ‘Meaningful’ Pattern in the Predicted Map? • (2) How to Quantify the Goodness of the Pattern? • Approach: • (1a) Recognize patterns by applying multiple image enhancement and segmentation techniques to R/D rate predictions • (1b) Introduce a relationship between the R/D pattern and auxiliary (a priori reference) information • (2a) Define goodness w.r.t. reference information using the expert’s selection of ‘meaningful’ relationships • (2b) Define goodness w.r.t. reference information using the complexity of machine learning
Using Physics-Based Model • R/D Rate Prediction • Ground water flux = hydraulic conductivity * cell area * gradient of water table elevation (head) over cell distance [Figure labels: field measurements, water table elevation, hydraulic conductivity, bedrock elevation, incoming water, outgoing water, discharged, recharged]
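The flux formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not SP2Learn code: the function names, the example numbers, and the water-balance sign convention (positive = recharge, negative = discharge) are all assumptions made for the sketch.

```python
def darcy_flux(conductivity, cell_area, head_from, head_to, cell_distance):
    """Flux between two neighboring cells, following the slide's formula:
    hydraulic conductivity * cell area * head gradient over cell distance.
    A positive value means water moves from the first cell to the second."""
    gradient = (head_from - head_to) / cell_distance
    return conductivity * cell_area * gradient


def net_rd_rate(incoming_flux, outgoing_flux, plan_area):
    """Net R/D rate of one cell from a simple water balance.
    Assumed sign convention: positive = recharge, negative = discharge."""
    return (outgoing_flux - incoming_flux) / plan_area


# Hypothetical numbers, for illustration only.
flux_in = darcy_flux(2.0, 100.0, head_from=204.0, head_to=200.0, cell_distance=32.0)
flux_out = darcy_flux(2.0, 100.0, head_from=200.0, head_to=198.0, cell_distance=32.0)
rate = net_rd_rate(flux_in, flux_out, plan_area=1024.0)
print(flux_in, flux_out, rate)
```

In this example more water enters the cell laterally than leaves it, so under the assumed convention the cell is a discharge cell.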
Incorporating Spatial Boundary Conditions • BC: R/D rate predictions should have smooth transitions, and recharge & discharge regions (contiguous pixels) should be clearly delineated • Approach: Apply Image Restoration and De-noising Techniques • Moving average based low pass filter • TVL (Total Variation regularized L1-norm function) based filter • Morphological operation based filter • Using multiple techniques multiple times [Figure labels: discharged, recharged]
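As an illustration of the first technique, here is a minimal pure-Python sketch of a moving-average low pass filter over a raster stored as a list of rows. The function name and the border handling (windows are truncated at the raster edge) are assumptions for this sketch; SP2Learn's own implementation may differ.

```python
def moving_average(grid, kernel_rows, kernel_cols):
    """Replace each cell with the mean of the kernel-sized window around it.
    Windows are clipped at the raster border (an assumption of this sketch)."""
    rows, cols = len(grid), len(grid[0])
    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0 = max(0, r - kernel_rows // 2)
            r1 = min(rows, r - kernel_rows // 2 + kernel_rows)
            c0 = max(0, c - kernel_cols // 2)
            c1 = min(cols, c - kernel_cols // 2 + kernel_cols)
            window = [grid[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            smoothed[r][c] = sum(window) / len(window)
    return smoothed


# Hypothetical 3x3 R/D map with a single noisy spike in the middle.
noisy = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
smooth = moving_average(noisy, 3, 3)
print(smooth)
```

The spike is spread over its neighbors, which is the smoothing effect the boundary condition asks for.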
Exploring Auxiliary Variables Driving R/D Patterns • Prior Tacit Knowledge about R/D and Auxiliary Variables • Soil Type: P(R or D area | Soil = Clay) ~ low • Proximity to River: P(R or D area | River is close) ~ high • Slope: P(R or D area | Slope = high) ~ low [Figure panels: moving average; normalization + TVL]
From Auxiliary Variables To Knowledge and Accurate R/D [Workflow diagram steps: Load R/D Map, Load Variables, Integrate Maps, Define ROI, Create Decision Tree, Apply Rules]
SP2Learn Output • A set of rules that define relationships between the predicted (R/D rate) variable and auxiliary variables • Modified (more accurate) predictions according to the user-selected rules that define relationships between predicted and auxiliary variables • Sensitivity analysis results with respect to • Methods (interpolations, image enhancement, …) • Models • Parameters
Example Results [Figure label: ROI] • <RULE ID=138 NUM_OF_CASES=3975 SUPPORT=32.65%> • <IF>Elevation is not in {330-344} AND • Soil type is in {Rm=Roscommon muck} AND • Proximity to water body is not {near_water} AND • Slope is in {0-0.9} </IF> • <THEN>R/D rate is -0.004,-0.002</THEN>
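To make the rule above concrete, the following Python sketch encodes its four conditions and checks them against individual raster cells. The dictionary layout, the field names, the inclusive interval bounds, and the sample cells are all hypothetical; only the condition values come from the rule shown on the slide.

```python
# Rule 138 from the slide, encoded in a hypothetical dictionary layout.
rule_138 = {
    "elevation_not_in": (330, 344),
    "soil_type": "Rm",                 # Rm = Roscommon muck
    "proximity_not": "near_water",
    "slope_in": (0.0, 0.9),
    "rd_rate": (-0.004, -0.002),       # conclusion of the rule
}


def rule_applies(rule, cell):
    """Return True when a cell satisfies every condition of the rule.
    Interval bounds are treated as inclusive (an assumption)."""
    elev_lo, elev_hi = rule["elevation_not_in"]
    slope_lo, slope_hi = rule["slope_in"]
    return (
        not (elev_lo <= cell["elevation"] <= elev_hi)
        and cell["soil_type"] == rule["soil_type"]
        and cell["proximity"] != rule["proximity_not"]
        and slope_lo <= cell["slope"] <= slope_hi
    )


# Two made-up cells: the first satisfies the rule, the second does not.
cell_a = {"elevation": 350, "soil_type": "Rm", "proximity": "far_from_water", "slope": 0.4}
cell_b = {"elevation": 335, "soil_type": "Rm", "proximity": "far_from_water", "slope": 0.4}
print(rule_applies(rule_138, cell_a), rule_applies(rule_138, cell_b))
```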
SP2Learn Functionality Overview • Load Raster Step • Integration Step • Create Mask Step • Attribute Selection Step • Rules Step • Apply Rule Step
Software and Test Data Download • Download web page of Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/
Input Data to SP2Learn • Raster files (maps) • Predicted R/D rate models • Auxiliary variables • For mask creation • Tables with geo-points • Vector files with boundaries • Raster files of categorical or continuous variables
Image Processing • Filtering Methods • Low pass (moving average) filters • Morphological filters • TVL1 (Total Variation regularized L1 function) • Using multiple techniques multiple times • Parameters • Kernel size (row dimension, column dimension)
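The morphological filters listed above can be sketched with a rectangular kernel as follows. This is an illustrative pure-Python version under stated assumptions: odd kernel sizes, windows clipped at the raster border, and hypothetical function names. Opening is erosion followed by dilation; closing is dilation followed by erosion.

```python
def _window_filter(grid, kernel_rows, kernel_cols, pick):
    """Apply `pick` (min or max) over a rectangular window around each cell.
    Assumes odd kernel sizes; windows are clipped at the raster border."""
    rows, cols = len(grid), len(grid[0])
    half_r, half_c = kernel_rows // 2, kernel_cols // 2
    return [
        [
            pick(
                grid[i][j]
                for i in range(max(0, r - half_r), min(rows, r + half_r + 1))
                for j in range(max(0, c - half_c), min(cols, c + half_c + 1))
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def erode(grid, kernel_rows, kernel_cols):
    return _window_filter(grid, kernel_rows, kernel_cols, min)


def dilate(grid, kernel_rows, kernel_cols):
    return _window_filter(grid, kernel_rows, kernel_cols, max)


def opening(grid, kernel_rows, kernel_cols):
    """Erosion then dilation: removes small bright specks."""
    return dilate(erode(grid, kernel_rows, kernel_cols), kernel_rows, kernel_cols)


def closing(grid, kernel_rows, kernel_cols):
    """Dilation then erosion: fills small dark holes."""
    return erode(dilate(grid, kernel_rows, kernel_cols), kernel_rows, kernel_cols)


# Hypothetical 5x5 maps: one isolated speck, one isolated hole.
speck = [[1 if (r, c) == (2, 2) else 0 for c in range(5)] for r in range(5)]
hole = [[0 if (r, c) == (2, 2) else 1 for c in range(5)] for r in range(5)]
opened = opening(speck, 3, 3)
closed = closing(hole, 3, 3)
```

A larger kernel removes larger specks and holes, which is why the kernel size (row dimension, column dimension) is exposed as the parameter.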
Example Input Maps [Figure panels: Low Pass Filter, Morphological Closing, Morphological Opening; each shown with Kernel = (5,5) and Kernel = (10,10)]
Example Auxiliary Maps • Slope • DEM • Soil • River Stream
Loading Files • Load R/D rate models (maps) • Load auxiliary maps to explore alternative models • Proximity to water • Soil type • Slope • …
Mosaic Maps • Large spatial coverage – a set of tiles • Out-of-core representation
Viewing Images • Right mouse click • Image information • Zoom • Check boxes • Pseudo-color • Auto-fit images
Registration • Integration of all maps (raster images) to a common projection and spatial resolution [Figure panels: before “Convert”; after “Convert”]
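The resolution part of this step can be illustrated with a nearest-neighbor resampling sketch in Python. It assumes the maps already cover the same extent in the same projection (reprojection is not shown), and the function name is hypothetical. Nearest-neighbor is chosen here because it keeps categorical values, such as soil types, unchanged.

```python
def resample_nearest(grid, target_rows, target_cols):
    """Resample a raster to a new grid size by nearest-neighbor lookup.
    Each target cell takes the value of the source cell under its center."""
    src_rows, src_cols = len(grid), len(grid[0])
    return [
        [
            grid[min(src_rows - 1, int((r + 0.5) * src_rows / target_rows))]
                [min(src_cols - 1, int((c + 0.5) * src_cols / target_cols))]
            for c in range(target_cols)
        ]
        for r in range(target_rows)
    ]


# Hypothetical coarse 2x2 categorical map brought to a 4x4 grid.
coarse = [[1, 2],
          [3, 4]]
fine = resample_nearest(coarse, 4, 4)
print(fine)
```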
Create Mask [Figure: screenshot with regions marked A, B and C; captions: Mask Parameters, Mask Operations, Visualization Panel]
User Defined Mask Creation • Set Parameter: User defined • Mouse click-and-drag selection of region • Click Paint and Show • Click Apply
Label Editor • Assign categorical labels to colors
Attribute Selection • Output: Predicted Variable • Input: Auxiliary Variables • Check-boxes • Show Table • Prune Tree
Decision Tree Based Modeling • Tree structure can be represented as a set of rules [Figure: example decision tree with test nodes “Soil Type is {sand}?” and “Distance from river ≤ 100 ft?”, yes/no branches, leaves labeled Recharge or Discharge, and example cases A, E and J]
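The statement that a tree can be represented as a set of rules can be sketched as a walk over every root-to-leaf path. The nested-dictionary layout below is hypothetical, and because the figure layout is lost, the assignment of Recharge and Discharge to particular branches is an assumption made only for illustration.

```python
# Hypothetical tree using the two tests named on the slide.
tree = {
    "test": "Soil type is sand",
    "yes": {
        "test": "Distance from river <= 100 ft",
        "yes": "Discharge",
        "no": "Recharge",
    },
    "no": "Discharge",
}


def tree_to_rules(node, conditions=()):
    """Collect one (conditions, class) rule per root-to-leaf path."""
    if isinstance(node, str):  # a leaf holds the class label
        return [(list(conditions), node)]
    rules = []
    rules += tree_to_rules(node["yes"], conditions + (node["test"],))
    rules += tree_to_rules(node["no"], conditions + ("NOT " + node["test"],))
    return rules


rules = tree_to_rules(tree)
for conditions, label in rules:
    print("IF " + " AND ".join(conditions) + " THEN " + label)
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves.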
Rules from Decision Tree • Num: Node number in the decision tree • Support (%): Among all cases that satisfy the rule's conditions, the percentage that have the rule's class (conclusion) • # of cases: The number of cases that satisfy the conditions • Class: Conclusion of the rule • Conditions: Conditions of the rule • MDL Score: Minimum description length (MDL) score of the decision tree. The lower the score, the better the tree
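The "# of cases" and "Support (%)" columns can be computed as in the following Python sketch, which follows the definitions on this slide. The function name, the case layout, and the sample data are hypothetical; the MDL score is not shown because the slide does not give its formula.

```python
def rule_statistics(cases, conditions, rule_class):
    """Return (# of cases, support %) for a rule.
    # of cases: cases satisfying all conditions.
    Support: percentage of those cases whose class equals the rule's class."""
    matching = [case for case in cases if all(cond(case) for cond in conditions)]
    if not matching:
        return 0, 0.0
    same_class = sum(1 for case in matching if case["class"] == rule_class)
    return len(matching), 100.0 * same_class / len(matching)


# Made-up cases, for illustration only.
cases = [
    {"soil": "sand", "river_ft": 50, "class": "Discharge"},
    {"soil": "sand", "river_ft": 80, "class": "Discharge"},
    {"soil": "sand", "river_ft": 90, "class": "Recharge"},
    {"soil": "sand", "river_ft": 20, "class": "Discharge"},
    {"soil": "clay", "river_ft": 30, "class": "Recharge"},
]
conditions = [
    lambda case: case["soil"] == "sand",
    lambda case: case["river_ft"] <= 100,
]
num_cases, support = rule_statistics(cases, conditions, "Discharge")
print(num_cases, support)
```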
Show Decision Tree • Show Tree option
Export Rules • XML format • Export Rules option
Apply Rules • Visualization of • Modified output variable • Changed pixels • Magnitude of changes (differences)
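The two derived visualizations, changed pixels and magnitude of changes, amount to a cell-by-cell comparison of the map before and after rule application. A minimal Python sketch follows; the function name and the sample maps are hypothetical.

```python
def change_maps(before, after):
    """Compare two rasters of equal size cell by cell.
    Returns (changed, magnitude): a boolean map of changed pixels and
    a map holding the absolute difference for every pixel."""
    magnitude = [
        [abs(a - b) for b, a in zip(row_before, row_after)]
        for row_before, row_after in zip(before, after)
    ]
    changed = [[m != 0 for m in row] for row in magnitude]
    return changed, magnitude


# Made-up R/D maps before and after applying a rule.
before = [[0.5, -0.25],
          [0.0, 1.0]]
after = [[0.5, -0.5],
         [0.0, 1.0]]
changed, magnitude = change_maps(before, after)
print(changed)
print(magnitude)
```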
Summary • Novel Frameworks and Methodologies for Exploratory Data-Driven Modeling and Scientific Discoveries • Problems addressed in the prototype SP2Learn solution: • Prediction accuracy improvement by a combination of mathematical models and data-driven (knowledge based) models, supervised and unsupervised iterative model optimization • Better Data Utilization!
Extra Information • A stack of informatics and cyber-infrastructure software is open source • Other software of potential interest: • GeoLearn is an exploratory framework for extracting information and knowledge from remote sensing imagery • CyberIntegrator to support creation of exploratory workflows, reuse of workflows, remote server execution, data and process provenance tracking and analysis, streaming data support • Image Provenance to Learn (IP2Learn) to support decision processes based on visual inspection of images • Load Estimation (work in progress) to support optimal sampling of sediment loads using several sediment-discharge rating curves, bias correction factors and Monte Carlo simulations to predict confidence limits • Download web page of Image Spatial Data Analysis group at NCSA: http://isda.ncsa.uiuc.edu/download/
Acknowledgement • Funding Agencies: • NASA, NARA, NSF, NIH, NAVY, DARPA, ONR, NCSA Industrial Partners, NCSA Internal, COM UIUC, State of Illinois • Full Time Employees: • Peter Bajcsy, Rob Kooper, Sang-Chul Lee, Luigi Marini • Students: • Shadi Ashnai, Melvin Casares, Miles Johnson, Chulyun Kim, Qi Li, Tim Nee, Arlex Torres, Ryo Kondo, Henrik Lomotan, James Rapp • Collaborators: • College of Applied Health Sciences UIUC, Kinesiology Dept. UIUC, CEE UIUC, CS UIUC, GISLIS UIUC • UIC, UC Berkeley, Univ. of Texas at Austin, Univ. of Iowa • ISWS, NARA, Nielsen, State Farm • Instituto Tecnológico de Costa Rica, UNESCO-IHE Netherlands
Thank you! • Questions: • Peter Bajcsy pbajcsy@ncsa.uiuc.edu • Need More Details • Publications: http://isda.ncsa.uiuc.edu