Integration and Processing of Environmental Data: ADMIRE Technology
Ondrej Habala
CRISIS Seminar, 18.10.2011
ITMS 26240220060
Goals
• Accelerate access to and increase the benefits from data exploitation;
• Deliver consistent and easy-to-use technology for extracting information and knowledge;
• Cope with complexity, distribution, change and heterogeneity of services, data, and processes, through an abstract view of data mining and integration; and
• Provide power to users and developers of data mining and integration processes.
ADMIRE Gateways USMT
DISPEL: Data Intensive Systems Process-Engineering Language
• Designed for data-intensive distributed systems
• Connection point between complex application requests and complex enactment systems
• Benefit: method development, engineering and evolution of supported practices can take place independently in each world
• Describes enactment requests for streaming-data workflow processes
• "Process-engineering time": the process is transformed and optimized in preparation for the enactment period
DISPEL: Simple Example
    String sql1 = "SELECT * FROM some_table";
    String sql2 = "SELECT * FROM table2";
    String resource = "128.18.128.255";
    SQLQuery query = new SQLQuery;
    |- sql1, sql2 -| => query.expression;
    |- resource -| => query.resource;
    Tee tee = new Tee;
    query.result => tee.connectInput;
The slide annotates two constructs: creating streams of literals (the |- ... -| expressions) and creating connections (the => operator).
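For readers who do not know DISPEL, the streaming idea in the example can be loosely mimicked with Python generators. This is only an analogy, not DISPEL semantics: `stream_literal` and `sql_query` are hypothetical stand-ins, no query is actually executed, and pairing the single resource value with both expressions is an assumption made for illustration.

```python
from itertools import tee


def stream_literal(*values):
    """Yield each value in turn, loosely mimicking a stream literal |- a, b -|."""
    yield from values


def sql_query(expressions, resources):
    """Hypothetical stand-in for SQLQuery; it only describes what would run."""
    resource = next(resources)  # assumption: one resource serves all expressions
    for expression in expressions:
        yield f"result of {expression!r} on {resource}"


sql1 = "SELECT * FROM some_table"
sql2 = "SELECT * FROM table2"
resource = "128.18.128.255"

# Connect the two literal streams to the query, then split the result stream.
results = sql_query(stream_literal(sql1, sql2), stream_literal(resource))
branch_a, branch_b = tee(results)  # two consumers see the same items

first_copy = list(branch_a)
second_copy = list(branch_b)
```

Each branch returned by `itertools.tee` yields the same sequence of items, which is the role the Tee element plays in the slide's example.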
Application Studies: Deployment of ADMIRE Technology in the Environmental Domain
Flood Application: Data sets used in hydrological scenarios
FSKD 2010, Yantai, China, August 10-12
Orava scenario
Legend
• Green area: Orava (part of northern Slovakia)
• Blue: Orava reservoir and local rivers
• Red dots: hydrological measurement stations
Notes
• We are interested only in hydrological stations below the Orava reservoir
• In our tests we will use hydrological station 5830 (Tvrdosin)
ORAVA: data mining concept
• Targets: water level and temperature at a station below the reservoir
• Predictors: rainfall amount (reservoir and station), air temperature (reservoir and station), reservoir discharge, reservoir temperature
• Diagram legend: targets of data mining; given in a schedule; predicted by a meteo model
ORAVA: data integration
• Integration of data from
  • GRIB files
  • Reservoirs
• Inputs
  • Time period of experiment
  • Reservoir ID
  • List of hydro stations
  • Geo coordinates
ORAVA Scenario: Integrated and preprocessed data
[Figure: two panels, "Integrated raw data" and "Integrated preprocessed data", each plotted against time in hours]
Implementation Notes
• We needed to write custom activities for certain data extraction tasks
• Data integration was the most complex part of the scenario in terms of workflow design
• Data integration was quite easy to write and modify in DISPEL once we had all the PEs in place
• We used a composite PE to extract different types of quantities from meteorological GRIB files
Radar Scenario Very short-term rainfall prediction from weather radar data
Radar Scenario: Description
• Very short-term rainfall prediction from weather radar data
• Movement of areas with higher air moisture content, and thus also higher precipitation potential
• Network of synoptic stations in Slovakia
  • 27 stations in Slovakia
  • Used data from the years 2007 and 2008
  • Available variables: rainfall, humidity, radar reflectivity, atmospheric pressure and temperature values for each hour
Radar Scenario: Main predictors and target variables
Overview of the main predictors and target variables in the Radar scenario. Green cells are predicted by the meteo model. Blue cells come from the model based on motion vectors. Yellow cells are the final target of data mining.
Radar Scenario: Attributes of the model
• Isotonic regression model
• 10-fold cross-validation
• Hydro-meteorological performance
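As background on the isotonic regression named above, the following is a minimal sketch of the pool-adjacent-violators algorithm, a standard way to fit a non-decreasing sequence in the least-squares sense. It is not the implementation used in the project, which the slides do not specify. The function name `isotonic_fit` is hypothetical, and the sketch assumes equal weights and that the target values are already ordered by the predictor.

```python
def isotonic_fit(y):
    """Return the non-decreasing sequence closest to y in least squares."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        # Every point in a block takes the block mean.
        fitted.extend([total / count] * count)
    return fitted


# Made-up values, used only to show the pooling behaviour.
example = isotonic_fit([1, 3, 2, 4, 3, 5])
```

In the made-up example, the pairs (3, 2) and (4, 3) violate monotonicity, so each pair is replaced by its mean.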
RADAR model
• Other tested models
  • Neural networks, SMOreg, linear regression, ...
  • Reached correlation coefficients between 0.35 and 0.42
  • Validation: 10-fold cross-validation
• Problems in model creation:
  • The process is significantly stochastic
  • Some input variables/parameters (humidity) depend in turn on the output, rainfall
  • The meteorological process is very sensitive
  • The reflection matrix represents the quantity of water in the atmosphere, not the exact rainfall rate in a specified area, as opposed to data from synoptic stations
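To make the reported metric concrete, here is a minimal sketch of how a cross-validated correlation coefficient can be computed. It is an illustration under assumptions, not the project's evaluation code: the helper names are hypothetical, a least-squares line stands in for the actual models, folds are contiguous rather than randomized, and the data are synthetic.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def cross_validated_correlation(xs, ys, k=10):
    """Predict every point from a model trained without its fold, then correlate."""
    predictions = [0.0] * len(xs)
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope, intercept = fit_line(train_x, train_y)
        for i in test_idx:
            predictions[i] = slope * xs[i] + intercept
    return pearson(predictions, ys)


# Synthetic, perfectly linear data: the correlation should be close to 1.
xs = [float(x) for x in range(50)]
ys = [2.0 * x + 1.0 for x in xs]
r = cross_validated_correlation(xs, ys, k=10)
```

On real radar data the held-out predictions would be far noisier, which is how values such as 0.35 to 0.42 arise.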
Radar Scenario
[Figure: two panels, "Training" and "Forecast"]
SVP Scenario Forecast of reservoir inflow based on temperature, precipitation and snow cover
SVP Scenario: Structure of data
• Two steps of prediction:
  1. Copy the previous values of snow quantity and inflow volume:
     P(t) = S(t-1)
     I(t) = F(t-1)
  2. Apply the trained models (the snow model first, then the inflow model):
     S(t) = f(P(t), R(t), E(t))
     F(t) = h(I(t), S(t), E(t), R(t))
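The two-step scheme can be sketched as a simple loop. This is a sketch under stated assumptions: `run_forecast` and the two toy models are hypothetical, R and E are treated as external input series because the slides do not define them, and feeding each predicted value back into the next time step is an assumption about how the scheme is iterated.

```python
def run_forecast(s_prev, f_prev, r_series, e_series, snow_model, inflow_model):
    """Iterate the two prediction steps over the external input series."""
    snow, inflow = [], []
    for r_t, e_t in zip(r_series, e_series):
        # Step 1: copy the previous values.
        p_t = s_prev  # P(t) = S(t-1)
        i_t = f_prev  # I(t) = F(t-1)
        # Step 2: apply the snow model first, then the inflow model.
        s_t = snow_model(p_t, r_t, e_t)          # S(t) = f(P(t), R(t), E(t))
        f_t = inflow_model(i_t, s_t, e_t, r_t)   # F(t) = h(I(t), S(t), E(t), R(t))
        snow.append(s_t)
        inflow.append(f_t)
        s_prev, f_prev = s_t, f_t  # assumption: predictions feed the next step
    return snow, inflow


# Toy stand-ins for the trained models; the formulas are made up.
def toy_snow_model(p, r, e):
    return p + r - e


def toy_inflow_model(i, s, e, r):
    return 0.5 * i + 0.1 * s + r


snow, inflow = run_forecast(
    s_prev=10, f_prev=4,
    r_series=[1, 0], e_series=[2, 1],
    snow_model=toy_snow_model, inflow_model=toy_inflow_model,
)
```

The ordering matters: the inflow model takes S(t) as an input, so the snow model has to run first within each time step.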
SVP Scenario: Models & Attributes
• 10-fold cross-validation, 8760 records; models for inflow prediction
• N-fold cross-validation, 8760 records; decision tree model M5P
ADMIRE Tools
• Registry client GUI
• Process designer
• SKSA
• Gateway Process Manager
• DMI Model Visualizer
Registry client GUI
• Read-only access to the ADMIRE Registry
  • List PEs and view their properties
  • Search and sort PEs
• Write access to the Registry is done via DISPEL documents
Process Designer
• Manage your DMI project (files, directories, project structure)
• Select elements from the Registry
• View the canonical (DISPEL) representation of your DMI process in real time
• View the properties of your chosen elements
• Edit your DMI process graphically
Semantic Knowledge Sharing Assistant
• Context the user works in: several reservoirs, one settlement
• Knowledge that may be useful in this context, previously entered by other users
• Provides access to existing users' knowledge, sorting and selecting it automatically according to the user's current working context
Gateway Process Manager
• Keep track of running processes
• Stop, pause or cancel a process
• View the process's DISPEL source
• Access the process's results (if available) in several ways, raw or visualized
DMI Model Visualizer: For data mining experts
• Visualization of data mining models
• Reads a Weka classifier object
• Produces a PMML description of the model
• Shows the PMML as a graphical tree