Data Management in the GLOWA Volta Project. Data Management and Application of GIS and Remote Sensing in Natural Resources Management Training Workshop. Wednesday, December 12 – Friday, December 14, 2007 DGRE, Ouagadougou, Burkina Faso.
Data Management in the GLOWA Volta Project. Antonio Rogmann, Center for Development Research (ZEFc), University of Bonn
Data Management: Content • Data Management: Problems, Solutions and Challenges • Data Management Workflow • Data Management Infrastructure: Concept, Components and Interfaces
Data Management: Problems. A survey of GLOWA Volta partner institutions and stakeholders was carried out at the Partners' Capacity Needs Assessment Workshop (31.05.-01.06.2007, Accra, Ghana) in order to understand • the relationships between the institutions in terms of data exchange and data flows related to water management • the data environment: software and models in use, data storage and access facilities, hardware • a defined set of problems in managing (and getting access to) data. These findings are the precondition for • adjusting the data management system of the GLOWA Volta Project to the requirements of the partners • offering the partners solutions that increase the quality of data management
Consequences of lack of data management. Institutions participating in the survey: • Coalition of NGOs in Water and Sanitation (CONIWAS) • Kwame Nkrumah University of Science and Technology, Kumasi (KNUST) • Soil Research Institute, Council for Scientific and Industrial Research (SRI) • Water Research Institute, Council for Scientific and Industrial Research (WRI) • Hydrological Service Department (HSD) • Water Resources Commission (WRC) (2 participants) • Hydrological Service Department (HSD) • Ghana Irrigation Development Authority (GIDA) • Water Research Institute, Council for Scientific and Industrial Research (WRI) • Ghana Water Company Ltd, Head Office (GWCL) • Dept. of Agriculture Economy & Agriculture Business, College of Agric. and Consumer Service • Environmental Protection Agency (EPA) • Centre for Environmental Impacts Analysis (CEIA) • Volta Basin Development Foundation (VBDF) • Training, Research Network for Development (TREND) • UDS: Faculty of Integrated Development Studies • Savannah Agricultural Research Institute (SARI) • Volta River Authority (VRA)
Data Management: Problems. Results • Lack of information about data (data about data, i.e. metadata). Based on the survey questionnaire; number of participants = 19.
Data Management: Problems. Results • Data are documented mainly in internal digital catalogues (e.g. Excel tables) or on paper, or are not documented at all • A web-based, searchable metadatabase is the exception. Based on the survey questionnaire; number of participants = 19.
Data Management: Problems. Results • Data transfer is laborious and time-consuming • Sending data by e-mail causes problems because of data volumes and transfer times. Based on the organizations represented among the questionnaire participants; multiple choice, N = 19.
Data Management: Problems [Diagram: a data user looking for data held by institutions, service departments and organizations] Common questions encountered when searching for data: • Which data exist that can serve my research / decision / information requirements? • Where are the data available? • How can I get the data with little effort? • What are the formats of the data? Are they compatible with my applications / models? • What are the data characteristics (e.g. time steps, units ...)? • Who owns the data? Are there costs?
Solution: Data Management. To solve these problems the GVP would like to offer you: • a centrally hosted database which provides access to the GVP datastock and the option to extend the datastock with your own data • a centrally hosted metadatabase giving information about the data needed and references to data providers • a geoportal informing about projects related to water management in the Volta Basin and about their data, in a spatial visualization [Diagram labels: data user, institution, service department, organization (hoster), web geoportal, map server, data server, GVP data, metadata]
Project data: what can GVP provide? • Hydrological data: water discharge, groundwater (time series) ... • Climatological data: precipitation, temperature, air humidity, evapotranspiration, heat flux (time series and forecasts) ... • Water use data: agricultural (irrigation) / domestic / industrial (hydropower) / reservoirs ... • Land use / land cover data: agriculture, urbanization, soil, geology, vegetation ... • Topographic / infrastructure / administrative (basic) data: river networks, lakes, elevation, roads, settlements, electricity, boundaries ... • Socio-economic data: demography, census data, economic activities (markets), surveys ... Available in several formats: vector / raster data (remote sensing), tables, documents, model-specific formats ...
Solution: Data Management. Data management is the overall framework in which data access facilities are embedded. Data management in an organization is based on a variety of methods for • data description (metadata) • data organization • data quality assurance • data access and distribution • security
Solution: Data Management. In practice, data management in an organization is based on • standards: global standards, e.g. for metadata, resource identification, formats ...; internal standards agreed by consensus inside the organization, e.g. database models, file naming, data policy ... • workflows / process steps / responsibilities • technology: hardware, software, interfaces ... (the data infrastructure)
Data Management: Metadata. Metadata standards: • several standards have been developed by standardization organizations, e.g. the Federal Geographic Data Committee's (FGDC) standard and ISO 19115 for geodata, published by the International Organization for Standardization (ISO) • they consist of a range of elements / fields to describe resources (data, software, services) • some metadata standards comprise several hundred elements
Data Management: Metadata. Metadata standard in the GVP: Dublin Core (DCMES) • core of 15 elements, extended by some special elements for geodata • all elements except title and identifier are optional • the element descriptions are easy to understand • every kind of resource (data, software, model, ...) can be described. Searchable metadata elements include • "Subject": the topic, categorized using keywords, key phrases or classification codes • "Publisher": an entity (institution, person) responsible for making the resource available • "Format": the file format, physical medium or dimensions of the resource
Data Management: Metadata. Creating metadata • Metadata should be stored in a metadatabase hosted in a central place that provides web-based access and search interfaces to data and resource descriptions • Metadata can be created in two ways: online, by direct entry into the central metadatabase using an internet browser (JavaScript, PHP); or offline, using an internet browser and a JavaScript application that stores each metadata set locally, close to the described object, in an XML file • If metadata XML files have been created offline, a metadata harvester can collect the local files and insert them automatically into the metadatabase on a server; the metadata files can then be uploaded to the central metadatabase
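As an illustration of the offline route, the following Python sketch writes one metadata set as XML using a few Dublin Core elements. It is only a sketch under stated assumptions: the `metadata` wrapper element, the function name and the sample values are hypothetical, and the GVP input mask may produce a different file layout. Only the Dublin Core namespace URI and the obligatory elements (title, identifier) are taken as given.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_metadata_xml(fields):
    """Return an XML string with one dc:<name> element per entry in `fields`.

    `title` and `identifier` are treated as obligatory, as stated on the slide.
    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    for required in ("title", "identifier"):
        if not fields.get(required):
            raise ValueError(f"missing obligatory element: {required}")
    root = ET.Element("metadata")
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values for illustration only.
record = {
    "title": "hyd_waterlevel_ghana-kaburi_020101-020630_v1",
    "identifier": "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630-v1.0.xls.cd",
    "subject": "water level; hydrology",
    "format": "xls",
}
xml_text = build_metadata_xml(record)
```

The resulting string could be saved next to the described data file, so that a harvester or an administrator can pick it up later.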
Data Management: Metadata. Metadata input mask* for • creating metadata sets as XML files • entering metadata into the metadatabase (web / LAN) [Screenshot labels: obligatory metadata elements, metadata elements, input field, "opens URN mask"] * Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, as part of a dissertation
Data Management: Metadata. Metadata input mask [Screenshot labels: metadata elements, insert button]
Data Management: Internal Description. Internal file description for structured data (e.g. measurements): • data file headers give information about content, units in use, instrumentation, quality of values, location, ... • in addition to the metadata, they provide the data user / recipient with important information • multiple similar files / data sets can be described in the first sheet (e.g. of an Excel file), in the first file of a file set (referenced by the others), or in a separate text file stored close to the files
Data Management: Naming Convention. Basic data categories • qualitative data: data which are rich in detail and description, usually in a textual or narrative format, e.g. case studies, document reviews, ... • quantitative data: numerical data, measured on either the ratio or the interval scale of measurement, e.g. temperature, water level, ... Naming of data (recommended particularly for quantitative data) should reflect discipline, topic, site, time frame and version. Example: hyd_waterlevel_ghana-kaburi_020101-020630_v1.xls • Changing current naming systems is not necessary, but what is necessary is ...
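The naming convention can be sketched as a small Python helper. The function name and parameters are hypothetical, and reading the time frame in the slide example as YYMMDD (1 January 2002 to 30 June 2002) is an assumption, since the slide does not state the date format.

```python
from datetime import date


def build_file_name(discipline, topic, site, start, end, version, extension):
    """Compose <discipline>_<topic>_<site>_<start>-<end>_v<version>.<extension>.

    Dates are written as YYMMDD, which is an assumed reading of the slide example.
    """
    period = f"{start:%y%m%d}-{end:%y%m%d}"
    return f"{discipline}_{topic}_{site}_{period}_v{version}.{extension}"


name = build_file_name(
    "hyd", "waterlevel", "ghana-kaburi",
    date(2002, 1, 1), date(2002, 6, 30), 1, "xls",
)
```

With these inputs the helper reproduces the example name from the slide.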
Data Management: Identification. Identification of resources (data, documents, maps): ... a unique identifier for each resource as a central metadata element • GVP uses the Uniform Resource Name (URN), a quasi-standard for identifying resources in information systems (example: the International Standard Book Number, ISBN) • a URN can be used as the name of a resource (e.g. as file name) • it has to follow a standardized syntax • URNs in the GVP are generated easily using a resource name generator (internet browser)
Data Management: Identification. URN standardized syntax: "urn:"<NID>":"<NSS> • NID = Namespace Identifier, representing an organisation, project, network or person • in the GVP: urn:x-gvp:uid:<NSS>, where urn = uniform resource name, x = experimental (not officially registered), gvp = GLOWA Volta Project, uid = user identification
Data Management: Identification. URN standardized syntax: "urn:"<NID>":"<NSS> • NSS = Namespace Specific String, encoding information about the "type", "use" and "storage medium" of the resource / data • urn:<NID>:<resType>-<resSubType>.<sTitel>.v<verNr>.<for>.<med> • <resType> = type of resource, e.g. dataset, document, software • <resSubType> = subtype of resource, e.g. primary / secondary data, model input • <sTitel> = short title, name • <verNr> = version number • <for> = format • <med> = medium on which the resource / data file is stored
Data Management: Identification. Example: urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630-v1.0.xls.cd • gvp = GLOWA Volta Project • HD12 = institution and editor, e.g. HD = "Hydro Service", 12 = person xy • ds = dataset • pd = primary data • waterlevel_gh-kab_020101-020630 = short title, e.g. an abbreviation of "hyd_waterlevel_ghana-kaburi_020101-020630" • v1.0 = version of the dataset, e.g. raw data in first version (not yet checked) • xls = MS Excel • cd = stored on CD
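A minimal Python sketch of how such a URN could be assembled is given here. It is not the Resource Name Generator itself; the function and parameter names are hypothetical. The sketch follows the syntax `urn:<NID>:<resType>-<resSubType>.<sTitel>.v<verNr>.<for>.<med>`, which places a dot before the version, whereas the slide example writes a hyphen there, so the output differs from the example in that one character.

```python
def build_urn(uid, res_type, res_subtype, short_title, version, fmt, medium):
    """Assemble a GVP-style URN from its parts.

    NID is taken as "x-gvp:<uid>"; the NSS follows the documented pattern
    <resType>-<resSubType>.<sTitel>.v<verNr>.<for>.<med>.
    """
    parts = [uid, res_type, res_subtype, short_title, version, fmt, medium]
    for part in parts:
        # Empty parts or embedded whitespace would make the name ambiguous.
        if not part or any(ch.isspace() for ch in part):
            raise ValueError(f"invalid URN component: {part!r}")
    nid = f"x-gvp:{uid}"
    nss = f"{res_type}-{res_subtype}.{short_title}.v{version}.{fmt}.{medium}"
    return f"urn:{nid}:{nss}"


urn = build_urn("HD12", "ds", "pd", "waterlevel_gh-kab_020101-020630",
                "1.0", "xls", "cd")
```

Checking a new URN against the ones already registered in the central metadatabase, as the generator slides describe, would be a separate step that this sketch does not cover.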
Data Management: Identification. Creating URNs using the Resource Name Generator* • creates URNs using a simple JavaScript application running in an internet browser • currently exists as a prototype * Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, as part of a dissertation
Data Management: Identification. Resource Name Generator • integrates special codes for resource types • within a network that shares data, the resource types have to be identified before the URN syntax is modeled ... • ... and then integrated into the script [Screenshot labels: resource type, resource sub-type]
Data Management: Identification. Resource Name Generator* • version, format and storage medium are selectable • copy and paste the URN into the name of the dataset (if required) and enter it into the metadata • to avoid duplicates, individual URNs are checked against the central metadatabase in which the data will be registered and described [Screenshot labels: format, version number, URN, storage medium] * Developed as a prototype by Dr. Marcel Endejan, Deputy Executive Officer, GWSP, as part of a dissertation
Data Management: Formats • data can be stored in "proprietary" or in "non-proprietary" formats • a proprietary format encodes data in such a way that the file can only be opened with the software which was used to generate the data • non-proprietary formats can be used by a wide range of applications (mostly via import functions) and platforms, increasingly so in future • data have to be stored for a long period of time, and it is not certain which programs will be used in future • interoperability between different software applications has to be maintained as long as possible
Data Management: Formats • internationally certified standards such as the ISO standard "Open Document Format for Office Applications" (ODF), "HTML", "XML" or the OGC's "GML" (Geography Markup Language, Open Geospatial Consortium) • some formats are de facto standards (like MS Excel) because the proprietary programs are used by many users • processing software widely used by the members of a data exchange framework imposes requirements on input formats
Data Management: Formats. Conclusion: use non-proprietary exchange formats as far as possible and consider the format requirements of the software in use. Examples (proprietary format: non-proprietary alternatives) • Microsoft Word (.doc): Rich Text Format (.rtf), Open Document Text (.odt) • MS Excel (.xls): Comma-Separated Values (.csv), Extensible Markup Language (.xml) • ESRI shape: Geography Markup Language (GML). Recommendation: use open office software like OpenOffice.org • similar in functionality to Microsoft Office (incl. Excel, Access, etc.) • its format has been an ISO standard since 2006 (ODF, ISO/IEC 26300) • no costs
Data Management: Security • safeguards against unauthorized access to, and misuse of, data and resources • use of computing security facilities such as authentication and Access Control Lists (ACL) • secure access channels such as Secure Shell (SSH) technology
Data Management: Access Control • data may have been costly to create, may not be in the public domain, may not yet be published, ... • data access control is based on an agreement on data access rules within a (scientific) community of data producers, data users and data providers. This means: • who (users, user groups) is allowed to use (get) which data under which constraints (owner rights, payment) • how to organize the authentication process schematically: user groups with graduated access rights • how to implement the authentication process on a technical level
Data Management: Quality. Quality assurance for data • data quality means the state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use • quality assurance uses computing facilities, within the comprehensive view provided by data management as the overarching concept • software-based methods linked with specific scientific disciplines have to be transparent and comprehensible, and should be declared (recommended) within a scientific or administrative network • the level of quality must be described within the data file, within the metadata, ...
Data Management: Quality. Quality assurance for data in the GVP is done by the scientists within their own discipline and on their own responsibility: • tests using diagrams to see whether data are consistent • comparisons with other data sources • routine recalibration of instruments • programmed limit checks • basic statistics
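Two of the listed checks, programmed limit checks and basic statistics, can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used by the GVP scientists: the function names, the plausibility limits and the sample values are hypothetical, and the missing-value code -9999.9 is taken from the file-header template of the workflow manual.

```python
from statistics import mean

MISSING = -9999.9  # missing-value code as used in the file-header template


def find_gaps(values, missing=MISSING):
    """Return the positions at which the missing-value code occurs."""
    return [i for i, v in enumerate(values) if v == missing]


def limit_check(values, lower, upper, missing=MISSING):
    """Return the positions of valid values outside the range [lower, upper]."""
    return [i for i, v in enumerate(values)
            if v != missing and not (lower <= v <= upper)]


def basic_statistics(values, missing=MISSING):
    """Return count, minimum, maximum and mean of the non-missing values."""
    valid = [v for v in values if v != missing]
    return {"n": len(valid), "min": min(valid), "max": max(valid),
            "mean": mean(valid)}


# Hypothetical water levels in metres, with one gap and one implausible value.
levels = [1.20, 1.25, -9999.9, 1.31, 9.80, 1.28]
gaps = find_gaps(levels)
outliers = limit_check(levels, lower=0.0, upper=5.0)
stats = basic_statistics(levels)
```

Flagged positions would then be reviewed by the responsible scientist and the outcome documented in the metadata or the table header rather than silently corrected.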
Data Management: Challenges. Getting benefits from data management requires the effort of all participants. DM needs firm agreements with regard to • the standards selected • the data users and their capabilities in accessing and using data • the technical environment, such as software (interfaces), network protocols, etc. • personal and / or institutional responsibilities within a data management workflow: data production, quality control, naming and identifying, description, transfer to data host, delivery from data host. DM requires the willingness to invest time and to keep to the standards!
Data Management Workflow • The next slides are part of a digital GVP data management workflow manual and documentation • It will be completed and published at the beginning of 2008 • It is the background for the next training sessions on web-based data management and (geo)database administration • The workflow manual will be offered in a similar design but in different formats (PDF, HTML), so that it can be delivered or published on the web • It serves as good practice in the GVP, but has to be extended to meet further requirements on the system from the stakeholders' side after the GVP
Data Management Workflow [Diagram: overview of the linked workflow steps 1 to 7, including the transfer]
Data Management Workflow Steps. Step 1: Data Collection. Processes: survey; data logger download; surveying & mapping. Location: field; site. Processor: scientist; planner; data collector. Software / Interfaces: file explorer; download interface; GPS tracking; data processing software; hardcopies. Hardware: data logger; laptop; GPS; ... Recommendations (in note form): • Take notes in a log book about the measurement device (name, manufacturer, serial number), the date when the data were collected, the name of the person who collected the data in the field, what has been done (maintenance, calibration), and particularities (could anything special be observed?) • For GPS measurements and mappings, choose the appropriate coordinate system for the spatial working area: for Ghana, WGS 1984 projected in UTM (zones 30/31N); for Burkina Faso, zones 30/31P
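To see which UTM zone a site falls into, the standard zone arithmetic can be sketched in Python as follows. This is a simplified sketch: it ignores the special zone exceptions around Norway and Svalbard, the function name is hypothetical, and the city coordinates are approximate values used only for illustration.

```python
# Latitude band letters in 8-degree steps starting at 80 degrees south;
# I and O are not used, and band X extends to 84 degrees north.
BANDS = "CDEFGHJKLMNPQRSTUVWX"


def utm_zone(lon, lat):
    """Return the UTM zone number plus latitude band letter, e.g. '30P'."""
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("coordinates outside the UTM range")
    zone = min(int((lon + 180.0) // 6) + 1, 60)
    band = BANDS[min(int((lat + 80.0) // 8), len(BANDS) - 1)]
    return f"{zone}{band}"


accra = utm_zone(-0.19, 5.60)         # approximate position of Accra
ouagadougou = utm_zone(-1.52, 12.37)  # approximate position of Ouagadougou
east_of_meridian = utm_zone(0.47, 6.60)  # a point east of the Greenwich meridian
```

Sites west of the Greenwich meridian fall into zone 30 and sites east of it into zone 31, which is why both zones are listed for each country.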
Data Management Workflow Steps. Step 2: Quality Control. Processes: searching for gaps, outliers, file damage; deleting data errors; filling gaps; documenting. Location: field; site; office. Processor: scientist; data collector. Software / Interfaces: statistical methods (algorithms); data processing programs (e.g. HYDAT). Hardware: laptop; PC; workstation. Recommendations (in note form): • Data quality assurance: example methods (Julia, Uli) • Documentation: which uncertainties remain; what was done for quality control; which specific algorithms and software were used; note it in the metadata and in the table headers
Data Management Workflow Steps. Step 3: Naming, URN. Processes: designing an appropriate name syntax; naming of resources; creating URNs. Location: office. Processor: scientists; planners; database administrator. Software / Interfaces: file explorer; Internet Explorer; HTML, JavaScript. Hardware: laptop; PC; workstation. To consider (in note form): • the data name should reflect the topic of the content, the spatial and temporal coverage, and the status of processing (version) • for local data sharing (e.g. an office with a network): find an agreement about the file name syntax (GVP standard?) • identify resource / data types in order to define a URN syntax (GVP standard?) • assign a Uniform Resource Name: use the Resource Name Generator; store the URN within the data sets; store the URN in the local data catalogue; store the URN when creating metadata
Data Management Workflow Steps. Step 4: Organization of Data. Processes: designing an appropriate storage structure (directories) on the file system. Location: office. Processor: scientists; planners; network administrator. Software / Interfaces: file explorer / manager. Hardware: laptop; PC; LAN (server). Recommendations (in note form): the directory structure • is especially important when data or resources are shared within an office community, within small Local Area Networks (LAN) or within a peer-to-peer network • can be designed around the data processing framework (models etc.), the project structure (subprojects, project hierarchy), or the spatial, temporal or thematic content of the datastock (e.g. by regions, themes ...) • should be mirrored on local drives by all participants of the network, adjusted to personal focal points of work • makes it easier to find resources
Data Management Workflow Steps. Step 4: Organization of Data. Processes: insert information about data into a data dictionary. Location: field; site; office. Processor: scientist; planner. Software / Interfaces: Excel; OpenOffice Calc. Hardware: laptop; PC; workstation. Recommendations (in note form): the data catalogue • is a small table file registering one's own data, scripts, etc. on local drives • provides an overview and saves time • should contain at least these elements: Uniform Resource Name (URN), title / name, short description, format, storage location (path) [Figure: example from GVP]
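A local data catalogue with the five minimum elements could be kept as a plain CSV table, which both Excel and OpenOffice Calc can open. The Python sketch below illustrates this; the column names, the function name and the sample entry (including its storage path) are hypothetical.

```python
import csv
import io

# Minimum catalogue elements named on the slide, as CSV column names.
CATALOGUE_FIELDS = ["urn", "title", "description", "format", "path"]


def write_catalogue(entries, handle):
    """Write catalogue entries as CSV, rejecting entries with empty fields."""
    writer = csv.DictWriter(handle, fieldnames=CATALOGUE_FIELDS)
    writer.writeheader()
    for entry in entries:
        empty = [f for f in CATALOGUE_FIELDS if not entry.get(f)]
        if empty:
            raise ValueError(f"catalogue entry lacks: {', '.join(empty)}")
        writer.writerow(entry)


entries = [{
    "urn": "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630-v1.0.xls.cd",
    "title": "hyd_waterlevel_ghana-kaburi_020101-020630_v1",
    "description": "Water level time series, first half of 2002",
    "format": "xls",
    "path": "data/hydrology/waterlevel/",  # hypothetical storage location
}]

buffer = io.StringIO()  # stands in for a file opened with newline=""
write_catalogue(entries, buffer)
catalogue_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the example self-contained; in practice the same function would be given a file handle on the local drive.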
Data Management Workflow Steps. Step 4: Organization of Data. Processes: insert dataset information directly into the file or close to it. Location: field; site; office. Processor: scientist; planner. Software / Interfaces: processing software; file explorer. Hardware: laptop; PC; workstation. Recommendations (in note form): • table header with details on: Uniform Resource Name ["urn:"<NID>":"<NSS>]; Data provided by [surname, first name, e-mail address, institution]; Location [name of location, UTM coordinates (X, Y)]; Elevation [m above sea level]; Measuring design [description of applied methods]; Measurement executer [name (project, institution)]; Measuring period [YYYYMMDD - YYYYMMDD, time steps (d/h/s, minutes)]; Missing values [-9999.9]; Quality [description of quality assurance methods]; Notes [remark] • table header with a description of the parameters in use: explain the meanings of abbreviations / codes; declare the units used for the parameters if they are not self-explanatory; use information from the data collection log book. Note: on the slide, metadata elements are marked in red font; if a metadata file has just been created, these are not necessary in the table header.
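The header template can be turned into a small Python helper that writes the fields in a fixed order in front of the data rows. This is a sketch under assumptions: the `# ` comment prefix, the function name and the sample values are hypothetical, and the GVP example header may be laid out differently.

```python
# Header fields in the order given on the slide.
HEADER_FIELDS = [
    "Uniform Resource Name", "Data provided by", "Location", "Elevation",
    "Measuring design", "Measurement executer", "Measuring period",
    "Missing values", "Quality", "Notes",
]


def format_header(info, prefix="# "):
    """Return one header line per field; fields absent from `info` stay empty."""
    return "".join(f"{prefix}{field}: {info.get(field, '')}\n"
                   for field in HEADER_FIELDS)


# Sample values for illustration only.
header = format_header({
    "Uniform Resource Name":
        "urn:x-gvp:HD12:ds-pd.waterlevel_gh-kab_020101-020630-v1.0.xls.cd",
    "Measuring period": "20020101 - 20020630, time step: d",
    "Missing values": "-9999.9",
})
```

Keeping the field order fixed makes headers from different files easy to compare and to read back by a script.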
Data Management Workflow Steps. Data file header: example [Figure: example of a data file header]
Data Management Workflow Steps. Step 5: Description, Create Metadata. Processes: description of data / resources following the metadata standard. Location: office. Processor: data producer. Software / Interfaces: internet browser; HTML, JavaScript. Hardware: laptop; PC; workstation. To do (in note form): • at the latest when data are going to be published, they should be described by entering metadata • use the internet browser interface for entering metadata • try to fill in as many elements as possible • the accurate use of keywords in the element "subject and keywords" is very important • most queries to the metadata address "subject and keywords" as well as "spatial coverage"
Data Management Workflow Steps. Step 5: Create Metadata. Processes: description of data / resources following the metadata standard. Location: office. Processor: data producer. Software / Interfaces: internet browser; HTML, JavaScript. Hardware: PC; workstation. To consider (in note form): • Metadata: do not forget to give access information about the data / resource, i.e. the current location (where the resource can be retrieved), the access modalities (costs, user rights, technical way of retrieving, etc.) and, if the data are not transmitted to the central host, a local contact person • Metadata storage files: if direct input to the metadatabase is not possible (no internet connection), the XML metadata files are to be sent to the administrator of the central metadatabase, e.g. on CD by postal service • Data and metadata: metadata only have to be created if the resources are to be used further by others
Data Management Workflow Steps. Step 6: (Preparing to) Transfer. Processes: decision making on the publishing of data, access constraints (users) and transmission to the central database. Location: collective institution; local offices. Processor: data user framework; database administrators. Software / Interfaces and Hardware: (none listed). To do (in note form): make a decision • whether datasets or resources (e.g. software, models) should be shared • who (persons, institutions, partners) should have access to the data • whether there should be a payment for data sets • where the accessible datasets should be stored: locally or on a central server • who is the responsible person controlling the transmission to the central database. This person in charge has to check whether the resources / datasets meet the data management standard of the community, and particularly whether the data have proper metadata including a clear definition of use rights (provide the database administrator with a list of potential user groups)
Data Management Workflow Steps. Step 7: Transfer. Processes: formatting; upload to central database. Location: local office; central database host. Processor: data producer; central database administrator. Software / Interfaces: data processing software; HTML, JavaScript; SSH (e.g. WinSCP). Hardware: PC; server. To do (in note form): • Preparing the transfer: reformat the data sets, if required; inform the central database administrator which datasets are going to be uploaded to the central database and why, and whether the metadata are entered directly into the metadatabase using the web interface or the metadata files are transmitted together with the datasets • Doing the transfer: upload the data to a "transfer" directory on the main server; use upload software based on FTP (File Transfer Protocol) or SFTP (SSH File Transfer Protocol) if the facilities are available; GVP uses SFTP for transferring data to the data server; if upload is not possible because of a slow internet connection, send the data by postal service on CD / DVD
GVP-Data Infrastructure [Architecture diagram; recoverable labels: intranet zone and internet zone; GLOWA Volta homepage with metadata interface (XML/XSL) and request to download; web server (VM) with Apache, PHP, Mapbender (incl. PostgreSQL), JavaScript, MapServer (CGI), Catalog-Manager (incl. phpMyAdmin; PHP, DOM), Tomcat (JSP / Java portal); data server (+RAID) with file system (Samba), MySQL / Postgres (Meta-DB, Portal-DB), ESRI geodatabase; connections ADODB, JDBC (:1521), SMB (local / offline); files described by Meta.dc.xml; clients: Java-based client (COBIDS), ESRI ArcGIS clients]
GVP-Data Infrastructure. Don't be alarmed, that is the technical side; let's look at it from the user's side.
GVP-Data Infrastructure. Harvesting the fruits: • a centrally hosted database giving access to the GVP datastock, with the option to extend the datastock with your own data • a centrally hosted metadatabase giving answers about the data needed and references to data providers • a geoportal informing about projects related to water management in the Volta Basin and about their data, in a spatial visualization [Diagram labels: data user, institution, service department, organization (hoster), web geoportal, map server, data server, GVP data, metadata]