Long Term Data Base & Data Navigator II. ESPON SEMINAR 14-15 November 2006 Dipoli Conference Centre, Espoo, Finland Joël Boulier , Claude Grasland Laboratoire Géographie-cités (Paris) – HYPERCARTE Research Group Marc Guerrien, Nicolas Lambert
Long Term Data Base & Data Navigator II • ESPON SEMINAR, 14-15 November 2006, Dipoli Conference Centre, Espoo, Finland • Joël Boulier, Claude Grasland: Laboratoire Géographie-cités (Paris) – HYPERCARTE Research Group • Marc Guerrien, Nicolas Lambert: CNRS UMS RIATE (Paris) – HYPERCARTE Research Group • Jérôme Gensel, Bogdan Moisuc, Marlène Villanova-Oliver: Laboratoire LSR-IMAG (Grenoble) – HYPERCARTE Research Group
Context • ESPON 3.2 Programme (European Commission 2002-2006) • Long Term (ESPON) DataBase • Provide scenario builders with quantitative inputs on selected topics at regional level for the years 1980, 2000 and 2020 • Try to establish a sustainable framework for the ESPON Database in the future ESPON II, taking into account the various problems encountered (missing values, changing territorial units, etc.) • HYPERCARTE Research Group • 2 Research Labs in Geography: Géographie-Cités and UMS RIATE • 2 Research Labs in Computer Science: LSR-IMAG and ID-IMAG • Goals: Advanced Methods and Tools for Spatial Analysis • Involved in ESPON 3.1, 3.4.3 and 3.2 • Software • Available: HyperAtlas (application: ESPON HyperAtlas, ESPON 3.1) • Soon to come: HyperAdmin, HyperSmooth, LTDB
LTDB: Objectives • Objective 1: • Provide a framework for long-term storage of thematic and geometric data for the territorial units composing a given area, at different levels • This implies tackling several issues: • Evolutivity → rely on a flexible schema • Data quality → keep track of the quality of the data • Usability → make it usable by people other than its designers, possibly as a shared resource • Objective 2: • Provide a framework for a reliable estimation of missing indicator values • To fill in informational gaps • To simulate past and/or future hypothetical situations • This implies designing several components: • A set of generalized estimation methods • A set of generalized estimation strategies • A mechanism for evaluating the quality of the estimated data
ESTI…mate • Postulate: All statistical information managed by the LTDB can be described according to four dimensions E, S, T and I • (E)space: the spatial unit to which the statistical information is attached • (S)ource: the statistical institute which has produced the information • (T)ime: a period or an instant which dates the information • (I)ndicator: a thematic definition of the variable • … And then come more general problems • Instability of the administrative structures • The name and/or the boundary of E may have changed at some point… • W. Germany + E. Germany → Germany • Czechoslovakia → Czech Republic + Slovakia • Côtes du Nord → Côtes d'Armor • Isère 1960 ≠ Isère 2006 and Rhône 1960 ≠ Rhône 2006 • Heterogeneity of the sources • The source S does not provide any value for the given (E,T,I) • Missing values • At time T, no value for E and I, whatever S • … How to Cope with Reality?
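The postulate above can be read as a lookup key with four parts. A minimal sketch in Python, assuming hypothetical names (`EstiKey`, `cube`, `lookup`) and placeholder values that are not real statistics; it is only an illustration of the four dimensions, not the LTDB implementation:

```python
from typing import NamedTuple, Optional


class EstiKey(NamedTuple):
    """Hypothetical key: one cell of the E x S x T x I hypercube."""
    espace: str     # (E) spatial unit
    source: str     # (S) producing institute
    time: int       # (T) year
    indicator: str  # (I) thematic variable


# Placeholder values for illustration only, not real statistics.
cube = {
    EstiKey("unit_A", "source_1", 1980, "population"): 100.0,
    EstiKey("unit_A", "source_2", 2000, "population"): 120.0,
}


def lookup(e: str, t: int, i: str, s: Optional[str] = None) -> list:
    """Return all known values for (E, T, I), optionally restricted to one source."""
    return [
        value
        for key, value in cube.items()
        if key.espace == e and key.time == t and key.indicator == i
        and (s is None or key.source == s)
    ]


# Heterogeneity of the sources: source_1 has no value for (unit_A, 2000, population).
print(lookup("unit_A", 2000, "population", s="source_1"))
# Missing value: no source at all has a value for 2020.
print(lookup("unit_A", 2020, "population"))
```

Both calls return an empty list, which corresponds to the two kinds of holes named on the slide.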
LTDB: General Architecture • [Diagram] Modules: Application Management Module, Estimation Module, Data Management Module • Components: Indicator Formulae Knowledge Base, Geographic Ontology, Indicator Ontology, Estimation Strategies Knowledge Base, Method Hierarchy, Spatio-Temporal Database • Legend: hierarchy of concepts, rule-based expert system, database
LTDB: Architecture Components • Geographic Ontology: a gazetteer containing names of geographic entities and some relations between them • Indicator Ontology: a classification hierarchy of the themes and indicators with some relations between them (aggregation, broader term, etc.) • Indicator Formulae Knowledge Base: a set of mathematical rules for calculating new indicators using existing ones • Method Hierarchy: a classification of estimation methods • Estimation Strategy Knowledge Base: a set of rules allowing the system to choose the most appropriate estimation method in a given situation • Spatio-Temporal Database: a relational database containing the whole set of geographic entities with their known indicator values
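To illustrate the idea of an Indicator Formulae Knowledge Base, here is a minimal Python sketch of one rule that derives a new indicator from stored ones. The rule table `FORMULAE`, the function `derive` and the indicator names are hypothetical, and the slides plan this component in AROM rather than Python:

```python
# Hypothetical rule set: each derived indicator lists its inputs and a formula.
FORMULAE = {
    "density": (("population", "area_km2"), lambda pop, area: pop / area),
}


def derive(known: dict, indicator: str):
    """Return the indicator value, computing it from a rule if it is not stored.

    Returns None when no rule applies or an input is missing.
    Assumes the rules contain no cycles.
    """
    if indicator in known:
        return known[indicator]
    if indicator not in FORMULAE:
        return None
    inputs, formula = FORMULAE[indicator]
    args = [derive(known, name) for name in inputs]
    if any(arg is None for arg in args):
        return None
    return formula(*args)


# Placeholder values for illustration only.
print(derive({"population": 1000.0, "area_km2": 50.0}, "density"))
```

When an input is missing, `derive` returns `None`, which is where an estimation method would have to take over.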
Estimation Methods • E, S, T and I define a hypercube of information… with holes (missing values) • We need ESTImation methods • To fill in missing values of the past • To predict future values • So far, (simple) ESTImation methods have been proposed • Estimation methods based on one dimension: E, S, T or I • Estimation methods based on two (or more) dimensions: ES, SI, ET… • The Method Hierarchy and the Estimation Strategy Knowledge Base will be designed to extend this set of methods
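As an illustration of a one-dimension method based on T, the sketch below fills a hole by drawing a straight line through the two nearest known years. Linear interpolation is an assumption chosen for the example; the slides do not name the methods actually proposed, and `estimate_in_time` and the values are hypothetical:

```python
def estimate_in_time(series: dict, year: int) -> float:
    """Estimate the value at `year` from a {year: value} series.

    Uses a straight line through the two known years closest to `year`,
    so it interpolates inside the known range and extrapolates outside it.
    """
    if year in series:
        return series[year]
    if len(series) < 2:
        raise ValueError("need at least two known years")
    # Pick the two known years nearest to the target, in chronological order.
    nearest = sorted(series, key=lambda known: abs(known - year))[:2]
    y0, y1 = sorted(nearest)
    slope = (series[y1] - series[y0]) / (y1 - y0)
    return series[y0] + slope * (year - y0)


# Placeholder values for illustration only.
population = {1980: 100.0, 2000: 120.0}
print(estimate_in_time(population, 1990))  # fills a hole in the past
print(estimate_in_time(population, 2020))  # predicts a future value
```

A two-dimension method (for example ET) would additionally borrow information from neighbouring or enclosing spatial units.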
LTDB: First and Future Developments • A first prototype has been developed • Implementation of the database schema in the open source PostgreSQL DBMS • Data acquisition mechanisms in Java • The LTDB framework imports and exports data files in various formats (Excel, DBF, ...)
LTDB: First and Future Developments • Short term • Estimation methods hierarchy using AROM (an Object-Based Knowledge Representation System) • Indicator formulae knowledge base in AROM • Estimation strategy knowledge base with AROMTasks • Test and validation through an incremental approach: start with an example with a small set of indicators and some estimation methods, adding more indicators later • Mid term • LTDB, as a research project of the HYPERCARTE Research Group, will be integrated into the HyperAtlas and HyperAdmin software • In the case of the ESPON HyperAtlas, this will allow the visualization (by simply moving a cursor on a time line, for instance) of the evolution of the ratio of two indicators through past, present, but also future time • Long term • In order to perform simulations that validate different scenarios, LTDB will integrate estimation methods relying on different parameters which convey the tendencies, hypotheses and assumptions corresponding to these scenarios
Data Navigator II • General objective: produce a handbook on data acquisition and harmonization, with a focus on the themes investigated by the ESPON Program • Applied research • This project can be seen as an application and a validation test for the LTDB structure through European Databases (ESPON DB,…)
Work in Progress • Three work packages have been defined • WP1: Use and practices of data collection in ESPON I • Short survey on practices of some TPGs (IGEAT) • Problem of national data collections (TIGRIS) • WP2: Choice of data model for ESPON II • Practical example of data integration between environmental and socio economic data (Géographie-cités & LSR-IMAG) • Choice of the best solution for data modeling and data integration (Géographie-cités & LSR-IMAG) • WP3: Handbook for data collection • Practical rules for harmonization (time and space, thematic harmonization) • Practical rules for the use of national sources • Recommendations for ESPON II: one or two databases
Integration of environmental and socio economic data • Question: How many m² of forest are accessible for a European citizen? • [Maps: NUTS23_99, CLC00_forest]
Integration of environmental and socio economic data Forest area per inhabitant in 2000
Integration of environmental and socio economic data Potential of forest area per inhabitant within a 10 km radius in 2000
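The two indicators mapped on these slides can be sketched as follows. This is a minimal Python illustration under strong assumptions: units are reduced to centroids in a planar coordinate system in km, the neighbourhood is a uniform disc, and all names and values are placeholders. The slides do not say how the potential was actually computed:

```python
from math import hypot

# Hypothetical units: (x_km, y_km, forest_m2, inhabitants). Placeholder values.
units = {
    "A": (0.0, 0.0, 0.0, 1000),
    "B": (6.0, 0.0, 500000.0, 1000),
    "C": (30.0, 0.0, 900000.0, 100),
}


def forest_per_inhabitant(name: str) -> float:
    """Forest area of the unit divided by its own population."""
    _, _, forest, inhabitants = units[name]
    return forest / inhabitants


def potential_forest_per_inhabitant(name: str, radius_km: float = 10.0) -> float:
    """Forest area within the radius divided by population within the radius."""
    x, y, _, _ = units[name]
    near = [u for u in units.values() if hypot(u[0] - x, u[1] - y) <= radius_km]
    return sum(u[2] for u in near) / sum(u[3] for u in near)


print(forest_per_inhabitant("A"))            # no forest inside unit A itself
print(potential_forest_per_inhabitant("A"))  # forest of neighbour B is reachable
```

The example shows why the two maps differ: a unit without any forest of its own can still have forest accessible within 10 km.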
How far do we get? • The ESPON Database in its current structure is a repository for a huge set of European indicators • The Long Term Database relies on a structured schema designed for the import of different kinds of indicators (different sources, different grids, different census times,…) • Two different approaches (from philosophical & technical points of view) • During the development of LTDB, we found that extracting and importing indicators from a data source is not a trivial task (ESPON Database included)
Coupling LTDB and ESPON Database? • The acquisition process (values from the ESPON Database) has to be automated • A ‘wrapper’ dedicated to the ESPON Database will enable the import of incomplete sets of indicators from the ESPON Database into LTDB • This wrapper exploits a meta description of the schema of the imported source of data (in this case the ESPON Database) • To update the LTDB with ESPON data, a complete description of the structure of the ESPON Database is needed • Cooperation between the authors of the LTDB and ESPON Databases will be beneficial for both tools in the future…
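A wrapper driven by a meta description could look like the sketch below. It is a hedged Python illustration only: the `META` dictionary, the column names, the missing-value markers and the sample rows are all hypothetical and do not describe the real ESPON Database schema, and the prototype itself uses Java for data acquisition:

```python
import csv
import io

# Hypothetical meta description of one source table: which column holds the
# spatial unit, and which columns hold which (indicator, year) pair.
META = {
    "source": "ESPON DB",
    "unit_column": "nuts_code",
    "indicator_columns": {"pop_2000": ("population", 2000)},
    "missing_markers": {"", "NA"},
}


def wrap(text: str, meta: dict) -> list:
    """Turn delimited source text into (E, S, T, I, value) records.

    Missing values are kept as None, so incomplete indicator sets can
    still be imported and estimated later.
    """
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        for column, (indicator, year) in meta["indicator_columns"].items():
            raw = row.get(column)
            missing = raw is None or raw.strip() in meta["missing_markers"]
            value = None if missing else float(raw)
            records.append(
                (row[meta["unit_column"]], meta["source"], year, indicator, value)
            )
    return records


# Placeholder rows for illustration only.
sample = "nuts_code,pop_2000\nXX01,1000\nXX02,NA\n"
for record in wrap(sample, META):
    print(record)
```

Keeping the source layout in the meta description rather than in the code is what would let the same wrapper logic serve other sources with a different description.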
Long Term Data Base & Data Navigator • Thank You for Your Attention… Questions? • Joël Boulier, Claude Grasland {joel.boulier, claude.grasland}@parisgeo.cnrs.fr • Marc Guerrien, Nicolas Lambert {marc.guerrien, nicolas.lambert}@ums-riate.org • Jérôme Gensel, Bogdan Moisuc, Marlène Villanova-Oliver {jerome.gensel, bogdan.moisuc, marlene.villanova}@imag.fr
LTDB Schema • [Diagram annotations] • Temporal name • Temporal object with an internal identifier • Proximity/similarity measure matrix • Temporal value of some indicator for some GU • Indicators • Temporal code system • Reliability of the source • Temporal spatial representation • Temporal splitting and merging of GUs • Composition relation between GUs, depends on the hierarchy • Code depending on a code system • Database or process genealogy of a value • Temporal hierarchical organization of GUs • Providing organism
Open Questions • Evolution of the conceptual model of the LTDB? • Terminology (semantic units, spatial inclusion or aggregation, …) • Partial space inclusion: what does it mean? • Data import from different sources? • Data from different sources will be imported • Needs an interface to facilitate the import of heterogeneous data (tools developed for HyperAdmin could serve…) • Sources? • ESPON database (BBR) • Corine Land Cover • United Nations • …