LIEGE 2010 ESPON Meeting

This presentation provides an introduction to the ESPON Data Base, including how to query and extract data and metadata sets. Demos are included to showcase the new ESPON Data Base Query Interface and the process of uploading data and metadata sets. The manual checking phase and classification of themes are also covered, along with instructions on how to register into the ESPON Data Base.

bettyj

Presentation Transcript


  1. LIEGE 2010 ESPON Meeting How to Use and Feed the ESPON Data Base?

  2. Outline • Introduction • How to Query the ESPON Data Base (… and Extract Data and Meta Data Sets)? • Demo (part one): The new ESPON Data Base Query Interface • What’s in a Data Set? What’s in a Metadata Set? • Metadata: some feedback • Demo (part two): How to Upload a Data Set and a Metadata Set? • The Manual Checking Phase and Classification of Themes • Demo (part three): How to Register into the ESPON Data Base? • Conclusion

  4. Context • The ESPON Data Base is a Web-based application designed and developed by partners of the ESPON Data Base 2013 Project • The first phase of this Priority 3 ESPON Project runs from mid-2008 to February 2011 • « The goal of this project is to develop and manage a geo-referenced information system, taking into account the ESPON themes of applied research, their aims and geography to be covered. It will include a comprehensive database to be used within the ESPON 2013 Programme and an additional one to be published on the ESPON website. » (TPG_guidance_Scientific_Platform_and_Tools, May 2010)

  5. ESPON Data Base 2013 Project in a Nutshell

  6. ESPON Data Base 2013 Project in a Nutshell • 12 Challenges: the core of the project • 1. Collection of Basic Regional Data • 2. Harmonization of Time Series • 3. World / Regional Data • 4. Regional / Local Data • 5. Social / Environmental Data • 6. Urban Data • 7., 8., and 9. ESPON Data Base Application • 10. Spatial Analysis for Quality Control • 11. Enlargement to Neighborhood • 12. Individual Data and Surveys • Data and Metadata • Metadata are probably more important than Data • Methods • Technical Reports that provide clear solutions and identify shortcomings and dead-ends

  7. ESPON Data Base 2013 Project in a Nutshell • Applications • OLAP Program for NUTS to GRID Conversion • Specific Program of Text Mining for the elaboration of the ESPON Thesaurus • Code in R language for outlier detection • ESPON Data Base Application • « The ESPON database combines the data from all projects: raw data, indicators and typologies. The TPG work related to any of the concepts measured by the indicators and/or typologies should make use of the ESPON database in order to ensure that results between TPGs are comparable. It also allows easier reproduction of results as the data is available in the ESPON database and can therefore be used by everyone. » (TPG_guidance_Scientific_Platform_and_Tools, May 2010)

  8. ESPON Data Base 2013 Application • ESPON Data Base 2013 • A repository gathering different indicators • made available for ESPON Projects • provided by ESPON Projects • A Web interface on top of this repository, accessible through the ESPON Web site, that allows users • to download data (and metadata) sets • to upload data (and metadata) sets • About the ESPON Data Base content • See the dedicated Interactive Workshop Session 5

  9. ESPON Data Base 2013 Application • History of the ESPON Data Base 2013 • First version: presented during the Malmö Seminar and online in November 2009 • Some Data and Metadata sets • Data and Metadata set formats as Excel files • A simple Query Interface • Second version: presented during the Alcala Seminar and online in June 2010 • More Data and Metadata sets • A metadata editor built with Geonetwork • A more elaborate Query Interface

  10. ESPON Data Base 2013 Application • Third (and final) version of the ESPON DB 2013 developed during the first phase of the ESPON DB 2013 Project • presented today and online at the end of December 2010 • improvements until the end of February 2011 • What’s new in this version? • More Data and Metadata sets • A login/password management interface • A back-office interface for administration • An upload interface that guides users through entering data and metadata sets • A new Metadata editor • A new, more advanced Query (Download) Interface

  15. Advantages and requirements of an online DB • Allows sharing data among a large community • Allows online data exploration and discovery through a user-friendly interface • Requires respecting some syntactic rules (computers are not good at detecting ALL human mistakes) • wrong units, wrong indicators, indicators without values, etc. • Requires respecting some semantic rules (violations are VERY difficult to detect automatically) in order to avoid: • ambiguity (different entities of the real world appear as one in the database) • duplication (one entity of the real world corresponds to several entities in the database)

  16. Problems before the metadata editor V2 • (1) Syntactic issues • Badly formatted dates (e.g. “06/2009”) • Badly formatted indicator values (e.g. “ 634.7”, with a leading space) • Badly formatted Booleans (a mix of “TRUE”, “FAUX”, “YES”) • Modified names for metadata fields • Alien text that is neither data nor metadata (leftover comments or forgotten copy-paste results), visible or HIDDEN • E.g. in a data file with 120+ columns, 2 hidden columns in the middle containing territorial unit names
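The formatting problems listed on this slide are the kind that can be caught mechanically. The following is a minimal sketch of such checks, not the ESPON Data Base implementation: the accepted formats (year-month-day dates, "." as decimal separator, lowercase true/false) and the field names `date`, `value` and `flag` are assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumed conventions (not specified in the slides): year-month-day dates,
# "." as decimal separator, and lowercase true/false for Booleans.
ALLOWED_BOOLEANS = {"true", "false"}
NUMBER_PATTERN = re.compile(r"-?\d+(\.\d+)?")


def is_valid_date(text):
    """Accept dates written as year-month-day; "06/2009" is rejected."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def is_valid_number(text):
    """Reject values with stray characters, such as the leading space in " 634.7"."""
    return NUMBER_PATTERN.fullmatch(text) is not None


def is_valid_boolean(text):
    """Reject anything outside the agreed vocabulary ("TRUE", "FAUX", "YES", ...)."""
    return text in ALLOWED_BOOLEANS


def check_row(row):
    """Return the names of the faulty fields so they can be highlighted."""
    # Hypothetical field names, chosen for this example only.
    checks = {"date": is_valid_date, "value": is_valid_number, "flag": is_valid_boolean}
    return [field for field, check in checks.items() if not check(row[field])]
```

Returning the list of faulty field names, rather than a single pass/fail result, makes it possible to point the user at exactly what needs correcting.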

  17. Problems before the metadata editor V2 • Syntactic issues • End of paragraph symbols (carriage return) in names • Adding other metadata/data items not required by the profile • Changing the order of metadata fields or changing the order of the data columns • These are usually not very difficult to find and correct (MANUALLY), but correcting them is time consuming… • They sometimes intersect with other software bugs and spawn new types of errors…

  18. Problems before the metadata editor V2 • Syntactic issues • Lack of correspondence between the dataset and the metadata file: indicator code and label code
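A mismatch between the indicator codes used in the dataset and those declared in the metadata file can be found with a simple set comparison. The sketch below assumes both sides are available as plain lists of code strings; the function name and the returned keys are hypothetical.

```python
def compare_indicator_codes(dataset_columns, metadata_codes):
    """Report indicator codes present on one side but not the other."""
    dataset = set(dataset_columns)
    metadata = set(metadata_codes)
    return {
        # Columns in the data file with no description in the metadata file.
        "missing_metadata": sorted(dataset - metadata),
        # Indicators described in the metadata file with no data column.
        "missing_data": sorted(metadata - dataset),
    }
```

An empty list on both sides means the two files agree on the set of indicators; it says nothing about whether the labels themselves are correct.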

  19. Problems before the metadata editor V2 • (2) Semantic issues • Using various unofficial names for projects, organizations and territorial unit codes complicates cross-identification between multiple files or multiple deliveries and results in duplicated objects in the database • Incorrect descriptions for measure units • “Inhabitants per km2”, “MIO euro” • “number of employees” instead of “employees” • “%”, “index” or “ratio” instead of “none” (these belong to the indicator methodology)

  20. Problems before the metadata editor V2 • (2) Semantic issues • Hasty copy/paste between indicator name and description (no new information) • Decreasing precision in the metadata as we advance in the dataset • No more methodology description • Indicator name and description become the same • Description is too short to understand what the indicator is about • Wrong indicator values

  21. Problems before the metadata editor V2 • (2) Semantic issues • Ambiguity and duplication can easily decrease the value of a big database • Ambiguity: E.g. same name for different indicators • Duplication: • E. g. different name for the same indicator (“Total population”, “Average annual total population”, “Absolute population”, “Population male and female”,… or worse “Espon_Project_XX_A01”)
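One way to surface duplication is to map every delivered label to a single agreed code and group the labels by that code. This is only a sketch under stated assumptions: the synonym table is hypothetical, it reuses the labels quoted on the slide, and the code `pop_t` is assumed for illustration.

```python
from collections import defaultdict

# Hypothetical synonym table: the labels quoted on the slide, all mapped
# to one assumed indicator code.
KNOWN_LABELS = {
    "total population": "pop_t",
    "average annual total population": "pop_t",
    "absolute population": "pop_t",
    "population male and female": "pop_t",
}


def normalize(label):
    """Lowercase the label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def group_by_code(labels):
    """Group delivered labels by indicator code; unrecognized labels go to UNKNOWN."""
    groups = defaultdict(list)
    for label in labels:
        code = KNOWN_LABELS.get(normalize(label), "UNKNOWN")
        groups[code].append(label)
    return dict(groups)
```

Any code that collects more than one label points to a likely duplicate, and anything under `UNKNOWN` (such as “Espon_Project_XX_A01”) needs a manual decision.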

  22. Problems before the metadata editor V2 • (2) Semantic issues • Confusion between the indicator methodology and the estimation methodology • Indicator methodology: the general part • Population is a count variable, GDP per capita is the ratio GDP/pop_t, etc. • Important for new or complex indicators (typologies, indexes, etc.) • Value methodology: the specific part • Which methods of estimation or correction were applied (interpolation, adjustment to higher NUTS levels, etc.) and their approximate reliability

  23. Solutions • For syntactic errors: • Automatic error checking at upload time • Assistance for the user (highlighting the faulty fields) • For semantic errors: • Changing metadata filling from an “editing algorithm with a lot of copy-paste” into an “editing algorithm with a lot of browse and pick” • Some metadata values that are already known can be automatically filled (like contact coordinates) • Already existing indicators are classed (by our UL colleagues) into themes and are (unambiguously and uniquely) coded based on methodology – indicator ontology • Already known geographic objects are stored in a spatial ontology and “alien” units may be rejected • Outliers may be detected
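As an illustration of the last point, outliers can be flagged with a standard interquartile-range rule. This is a generic sketch in Python, not the R code developed by the project, whose method is not described in these slides; the 1.5 multiplier and the unit codes in the usage below are assumptions.

```python
import statistics


def find_outliers(values_by_unit, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR].

    values_by_unit maps a territorial unit code to an indicator value.
    The multiplier k=1.5 is the conventional default, assumed here.
    """
    values = list(values_by_unit.values())
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {unit: v for unit, v in values_by_unit.items() if v < low or v > high}
```

A flagged value is not necessarily wrong; it is a candidate for the manual checking phase.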

  26. Suggestions for improving (meta)data quality • (1) A data delivery containing hundreds of indicators is difficult to fill in and to use • Rank the data in your delivery by importance and interest • Provide data/metadata files titled “10_best_indicators_Project_X” • Provide further data/metadata files titled “Database_Project_X” for the other indicators

  27. Suggestions for improving (meta)data quality • (2) Importance of the methodology field of the indicator • Integrating an indicator into the ESPON DB is worthwhile if it can be re-used for other purposes • This matters in particular for complex indicators (typologies, outputs of models), where users need to understand what is behind the calculation • Avoid mentioning ONLY “cf. Final Report of XXX Project for further information”, which is of little use in an online database

  28. Suggestions for improving (meta)data quality • (2) Importance of the methodology field of the indicator • Example: the Labour Market typology (ESPON 2006 Database). What is behind the typology? • [Figure: ESPON 2006 Database metadata compared with ESPON 2013 Database metadata (filled in thanks to the project report)]

  29. Suggestions for improving (meta)data quality • (3) The dataset has to cover at least the entire ESPON Area (EU27+4) and, if possible, the Candidate Countries • (4) For indicators described in the NUTS delineation, try to provide the information at the different NUTS levels (NUTS0, NUTS1, NUTS2 and, if possible, NUTS3) • (5) Systematically mention in the dataset the NUTS version of the territorial units • [Figure: example of metadata file, annotated 3, 4, 5]

  30. Suggestions for improving (meta)data quality • (6) For territorial units which do not belong to the NUTS nomenclature, provide the ESPON Database Project with a precise nomenclature with names and, if possible, shapefiles to locate the territorial units • [Figure: shapefile and dataset describing the UMZ nomenclature, version 2000 (provided by ESPON Database Project)]

  31. Suggestions for improving metadata quality • (7) If you estimate missing values for basic indicators, mention it in the label and explain the methodology used for filling the gaps in your dataset • [Figure: label for estimated data in Bulgaria (ESPON 2013 Database basic indicators)]
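Suggestion (7) amounts to keeping an "estimated" flag next to every value that was filled in. The sketch below uses linear interpolation between the nearest known years as one possible gap-filling method; the slides mention interpolation only as an example, and the function name and tuple layout are assumptions.

```python
def fill_gaps(series):
    """Fill interior gaps by linear interpolation and flag the estimates.

    series: list of (year, value or None), sorted by year.
    Returns a list of (year, value, estimated) tuples. Gaps at the start
    or end of the series are left empty (no extrapolation).
    """
    known = [(y, v) for y, v in series if v is not None]
    result = []
    for year, value in series:
        if value is not None:
            result.append((year, value, False))
            continue
        before = [(y, v) for y, v in known if y < year]
        after = [(y, v) for y, v in known if y > year]
        if not before or not after:
            result.append((year, None, False))
            continue
        (y0, v0), (y1, v1) = before[-1], after[0]
        estimate = v0 + (v1 - v0) * (year - y0) / (y1 - y0)
        result.append((year, estimate, True))
    return result
```

The flag can then drive the label of the value, while the method itself is described once in the metadata.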

  32. ESPON DB operational indicator ontology • Purpose: • To obtain unique and unambiguous codes for the indicators in the database (database problem) • To classify indicators, easing data discovery and exploration (user problem) • Idea for the ESPON thesaurus: • Merging several classifications (themes, subthemes) • Producing a synthesis of these classifications • One indicator can belong to one or more themes and subthemes • The description of the indicator is further enriched by adding keywords

  33. ESPON DB operational indicator ontology • Idea for coding scheme: • Use the methodology of the indicator as a basis for creating an abbreviated code • Leave aside everything that doesn’t relate strictly to the indicator (like spatial, temporal or resolution descriptors) • An indicator code is composed of several parts • Base indicator part (GDP, pop) • Restrictions, derivations, methods of calculation (m, av, ch) • Level of measurement (density ratio, count, etc.)
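The three-part coding scheme can be sketched as a small data structure. The slide names the parts but not the syntax, so the underscore separator, the class name and the example values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndicatorCode:
    """An indicator code built from the three parts listed on the slide."""
    base: str               # base indicator part, e.g. "GDP" or "pop"
    modifiers: tuple = ()   # restrictions, derivations, methods, e.g. ("m", "av")
    level: str = ""         # level of measurement, e.g. "count"

    def render(self):
        """Join the non-empty parts with "_" (separator assumed, not from the slides)."""
        parts = [self.base, *self.modifiers]
        if self.level:
            parts.append(self.level)
        return "_".join(parts)
```

Because the code is derived only from the methodology, two deliveries that describe the same indicator produce the same code regardless of their spatial or temporal coverage.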

  36. Agenda • 31st Dec. 2010 • The new ESPON DB 2013 Web Application on line • Test and Survey • 28th Feb. 2011 • Final Report of the ESPON DB 2013 Project • End of the First Phase of the ESPON DB 2013 Project (closure of the scientific activities)

  37. LIEGE 2010 ESPON Meeting Thank You for Your Attention
