Information Management System PAK (Data Warehouse)

Information Management System PAK (Data Warehouse). R. Rollenbeck, Universität Marburg. The purpose of unified data collection: the problem. Research data are lost after projects end; research data cannot be used by third parties; integrative analyses are impaired.

  1. Information Management System PAK (Data Warehouse). R. Rollenbeck, Universität Marburg

  2. The purpose of unified data collection: the problem
     • Research data are lost after projects end
     • Research data cannot be used by third parties
     • Integrative analyses are impaired
     • Quality of data cannot be assessed
     Basic introduction PAKdw – Rütger Rollenbeck

  3. The purpose of unified data collection: the solution
     • Implementation of standardized data formats
     • Clear and concise documentation and publication of data formats
     • Definition of common data properties
     • Documentation of collection and processing methods

  4. How do we do it? Implementation of standardized data formats: hierarchical structure in accordance with EML
     EML: Ecological Metadata Language (http://knb.ecoinformatics.org/software/eml/)
     • standardized language to describe ecological data sets (metadata)
     • XML-based schema
     • Inspired by the "Long Term Ecological Research" (LTER) community and developed by the "Knowledge Network for Biocomplexity"
     [Diagram: levels of metadata standards]
     • global (www): Dublin Core, core elements: ID, technical data, description of content, persons and rights, life cycle
     • ecological community: EML, basic modules: metadata, citation, descriptions
     • project: PAKdw, a relational database that applies the EML framework to a research-unit-specific extent
     • PAKdw is compliant with EML; EML is compliant with Dublin Core
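To make the "XML-based schema" point concrete, here is a minimal Python sketch of an EML-style metadata record. It is only an illustration: the element names are borrowed from EML as commonly documented, the namespace and required attributes are omitted, the output is not validated against the schema, and the title, surname, and attribute names are hypothetical and do not come from PAKdw.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, creator_surname, attribute_names):
    """Build a small EML-style metadata tree (illustrative, not schema-validated)."""
    root = ET.Element("eml")  # real EML uses a namespaced root; omitted here
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Who created the data set
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname

    # One table entity with one <attribute> per variable (column)
    table = ET.SubElement(dataset, "dataTable")
    attribute_list = ET.SubElement(table, "attributeList")
    for attribute_name in attribute_names:
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = attribute_name
    return root


def to_xml(root):
    """Serialize the tree to a text string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example values
    record = build_metadata("Climate station data", "Rollenbeck", ["air_temp", "precip"])
    print(to_xml(record))
```

The hierarchy (dataset, then entity, then attributes) mirrors the structure that the following slides describe.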

  5. How do we do it? And what is behind... Clear and concise documentation and publication of data formats:

  6. How do we do it? Definition of common data properties:
     resource
     - shortname
     - pubdate
     - creator
     - intellectual rights
     - access
     - keywords
     - distribution
     - additional info
     - ...
     dataset
     - purpose
     - contact
     - 1...n entities
     - [changeHistory]
     - ...
     entity: dataTable, spatialRaster/[Vector], otherEntity
     - each linked to a method (sampling/processing)
     - 1...n attributes (variables)
     - ...
     attribute
     - name
     - description
     - storageType
     - numeric
     - nonnumeric
     - measurement scale
     values: table with columns ID, A, B (1 123 NULL, 2 123, 3 12, 4, 5 321)

  7. How do we do it? Documentation of collection and processing methods:
     party
     - organization - ...
     - person - ...
     project
     - title
     - description
     - personnel
     - ...
     coverage
     - temporal
     - spatial
     method
     - 1...n method steps
     - measurement description
     - instrumentation
     - citation link

  8. EML in the PAKdw: When? Where? Who? How? What?
     party
     - organization - ...
     - person - ...
     project
     - title
     - description
     - personnel
     - ...
     coverage
     - temporal
     - spatial
     method
     - 1...n method steps
     - measurement description
     - instrumentation
     - citation link
     resource
     - shortname
     - pubdate
     - creator
     - intellectual rights
     - access
     - keywords
     - distribution
     - additional info
     - ...
     citation
     - author
     - citation type (each with specific fields): article, thesis, chapter, book, presentation (poster / oral)
     dataset
     - purpose
     - contact
     - 1...n entities
     - [changeHistory]
     - ...
     entity (each with specific fields according to its type): dataTable, spatialRaster/[Vector], otherEntity
     - each linked to a method (sampling/processing)
     - 1...n attributes (variables)
     - ...
     attribute
     - name
     - description
     - storageType
     - numeric
     - nonnumeric
     - measurement scale
     values: table with columns ID, A, B (1 123 NULL, 2 123, 3 12, 4, 5 321)
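The hierarchy on this slide (a dataset holds 1...n entities, each entity is linked to a method and holds 1...n attributes) can be sketched as plain Python data classes. This is an illustration of the relationships only, not the actual PAKdw database schema: all class and field names are assumptions derived from the slide labels.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attribute:
    """One variable (column) of an entity."""
    name: str
    description: str
    storage_type: str        # e.g. "float" or "string" (hypothetical values)
    measurement_scale: str   # e.g. "numeric" or "nonnumeric"


@dataclass
class Method:
    """How the data were sampled or processed."""
    steps: List[str]
    instrumentation: str = ""


@dataclass
class Entity:
    """A data object; each entity is linked to exactly one method."""
    name: str
    entity_type: str  # "dataTable", "spatialRaster", "spatialVector" or "otherEntity"
    method: Method
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class Dataset:
    """A dataset with its resource-level fields and 1...n entities."""
    shortname: str
    creator: str
    purpose: str
    contact: str
    entities: List[Entity] = field(default_factory=list)

    def attribute_names(self) -> List[str]:
        """Flatten the attribute names of all entities, in order."""
        return [a.name for e in self.entities for a in e.attributes]
```

In a relational database the same structure would typically become one table per class with foreign keys from attribute to entity and from entity to dataset and method.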

  9. Achievements: Data and Information Management
     ~5 million individual data records, ~550 registered users
     Information management system PAKdw: technology
     Project management and coordination, Data management and description (EML), User management
     • upload interface
     • maintenance interface
     • access interface
     • search modules
     • visualisation module
     • basic analysis module
     • public online presence
     • news system
     • travel accounting
     • mailing list generator
     • station booking
     Project structure, Rights management, Publications, Data sets
     Lotz et al. (2012)

  10. The new webpage of PAK

  11. Old and new information

  12. PIs' tasks: complete the project information!

  13. Live Demos
      • Edit project information
      • Edit staff
      • Review data and publications
      • Data upload procedure
      • Station booking

  14. Live Demo
      • Data upload procedure
      Current data flow from upload to use

  15. The data creator's tasks
      • Data collection
        - Standard procedures & documentation
        - Checklists
        - Quality assessment
        - Secure & rapid transport
        - Backups!
      • Format conversion
        - Column header = attribute name
        - List separator = comma (,)
        - Quality flags if appropriate
        - Automated conversion recommended
      • Upload
        - Review attribute tree
        - Supply comprehensive metadata
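Since automated conversion is recommended, the format rules above (column header equals attribute name, comma as list separator, optional quality flags) could be scripted roughly as follows. This is a sketch under assumptions: the attribute names, the `_qf` suffix for a quality-flag column, and both helper functions are hypothetical and not part of the PAKdw upload interface.

```python
import csv
import io


def to_upload_csv(records, attribute_names):
    """Write records as comma-separated text whose header row is the attribute names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=attribute_names, delimiter=",", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        # Missing values become empty cells instead of raising an error
        writer.writerow({name: record.get(name, "") for name in attribute_names})
    return buffer.getvalue()


def header_matches(csv_text, attribute_names):
    """Check that the first row of the file equals the expected attribute names."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=",")
    header = next(reader, [])
    return header == list(attribute_names)


if __name__ == "__main__":
    # Hypothetical attributes; "air_temp_qf" stands for a quality-flag column
    names = ["datetime", "air_temp", "air_temp_qf"]
    rows = [{"datetime": "2014-02-01 12:00", "air_temp": 15.2, "air_temp_qf": 0}]
    text = to_upload_csv(rows, names)
    print(text)
    print(header_matches(text, names))
```

Checking the header before upload catches the most common mismatch, a column name that does not correspond to a registered attribute.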

  16. Current & upcoming events:
      • Database workshop
        - Feb 2014 in Marburg
        - October 2014 in Ecuador
      • Helpdesk permanently available via email
      • Public access to old FOR816 & 402 data
        - Platform data remain internal!
        - Data not yet analysed remain internal
        - Data user agreement: old type or template from the Exploratories?
      • Booking tools for Cajas and Laipuna
        - Independent systems but interlinked
        - Booking tool for car use in planning phase
      • The new plot system:
        - Citation of design group desired
        - Unified naming system for plots

  17. Plot naming convention (proposal)
      Format: Area + natural/disturbed + number _ Project _ Activity + number
      • Cajas NP: CAnx
      • Bosque Mazán: BMnx
      • Podocarpus NP and San Francisco reserve: SFnx, SFdx
      • Laipuna reserve: LAnx, LAdx
      Example: SFn1_C06_BP_1
      • SFn1: San Francisco, natural, plot 1
      • C06: project number
      • BP_1: Bird Pollination 1
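The proposed convention is regular enough to be checked automatically. The following Python sketch parses a plot name into its parts. It rests on assumptions read off the single example SFn1_C06_BP_1: the project number is one letter followed by digits, the activity code is uppercase letters, and the activity number follows a second underscore. The function and field names are hypothetical.

```python
import re
from typing import NamedTuple

# Area codes listed in the proposal
AREAS = {
    "CA": "Cajas NP",
    "BM": "Bosque Mazán",
    "SF": "Podocarpus NP and San Francisco reserve",
    "LA": "Laipuna reserve",
}

# Assumed pattern: <area><n|d><plot>_<project>_<activity>_<number>
PLOT_NAME = re.compile(
    r"^(?P<area>[A-Z]{2})(?P<state>[nd])(?P<plot>\d+)"
    r"_(?P<project>[A-Z]\d+)"
    r"_(?P<activity>[A-Z]+)_(?P<number>\d+)$"
)


class PlotName(NamedTuple):
    area: str
    disturbed: bool
    plot: int
    project: str
    activity: str
    activity_number: int


def parse_plot_name(name: str) -> PlotName:
    """Split a plot name into its components or raise ValueError."""
    match = PLOT_NAME.match(name)
    if match is None or match["area"] not in AREAS:
        raise ValueError(f"not a valid plot name: {name!r}")
    return PlotName(
        area=match["area"],
        disturbed=match["state"] == "d",
        plot=int(match["plot"]),
        project=match["project"],
        activity=match["activity"],
        activity_number=int(match["number"]),
    )


if __name__ == "__main__":
    print(parse_plot_name("SFn1_C06_BP_1"))
```

A check like this could run at upload time so that plot names stay consistent across projects.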

  18. Data user agreement: the main points
      • Property rights
      • Data use agreement is signed when entering the database
      • Metadata are freely available; the data owner may be contacted
      • Data are restricted to researchers; open access after the end of the program
      • If a researcher uses data, the owner must be asked, particularly before publication
      • Forwarding data to third parties is prohibited
      • Of course, the DB manager cannot control this; compliance is voluntary

  19. Current & upcoming events:
      • Ecuadorian PIs gain access to the data warehouse
        - Patricio Crespo Sánchez, UC
        - Carlos Espinosa, UTPL
        - Elizabeth Guzman, UTPL
        - David Siddons, UDA
        - Juan Pablo Suarez, UTPL
      Will be handled internally as a subproject with a subset of access rights

  20. Current & upcoming events:
      • Setup & implementation of the Ecuadorian database system
      • Universidad del Azuay:
        - Database manager David Siddons, implementation executed by Diego Pacheco
        - Hardware available, software setup under development
        - I will assist the setup in Cuenca
      • Universidad Técnica Particular de Loja:
        - Nelson Piedra will adapt the web interface for Ecuadorian needs
        - Juan Pablo Suarez as supervisor
        - I will assist the setup in Loja

  21. Projected Data Flow (C12, UDA, UTPL)
      [Diagram] Joint metadata standard (EML) and exchange protocols; based on FOR816DW technology
      • Users: platform scientists, stakeholders (e.g. MAE), access via browser
      • C12 web interface: data query & analysis tools
      • Metadata portal for platform-wide data (UTPL): metadata search & data transfer; joint attributes (ontology)
      • Databases: DFG DB (C12, U Marburg), SENESCYT DB (UDA), FORAGUA via GPL, NCI DB, Gestion DB, ETAPA DB
      • Data providers: DFG program, SENESCYT program, FORAGUA, NCI, Gestion, ETAPA (university partners and non-university cooperation partners)
      • Data standards and upload

  22. Current & upcoming events:
      • Automatic online monitoring of climate stations
        - Hardware test currently running in Marburg
        - Software under development
        - Test installation in Ecuador in March
        - Successive installation in the following month by an Ecuadorian technician from the UTPL
      Central server (Loja) calls the stations every five minutes via the cellphone network
      Climate stations with solar-powered GSM modem
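The five-minute polling cycle of the central server can be sketched in Python as follows. The monitoring software is described as still under development, so this is not its implementation: the `fetch` callable that would dial a station's GSM modem, the station identifiers, and the choice to align polls to five-minute marks of the clock are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=5)


def next_poll(now, interval=POLL_INTERVAL):
    """Return the next poll time strictly after `now`, aligned to the interval since midnight."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    completed_steps = (now - midnight) // interval  # whole intervals already elapsed
    return midnight + (completed_steps + 1) * interval


def poll_stations(station_ids, fetch):
    """Call fetch(station_id) for every station; collect readings and failed stations.

    `fetch` is a hypothetical callable that contacts one station and
    raises OSError (e.g. ConnectionError) when the line is down.
    """
    readings, failed = {}, []
    for station_id in station_ids:
        try:
            readings[station_id] = fetch(station_id)
        except OSError:
            failed.append(station_id)  # retry on the next cycle
    return readings, failed
```

Keeping a list of failed stations per cycle makes outages visible quickly, which is the point of online monitoring over periodic manual read-out.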

  23. Current & upcoming events:
      • Contract with Thorsten Peters:
        - Maintains climate stations
        - Collects and checks raw data
      • Integration of climate stations of FORAGUA and Gestion Zamora starting in Q3 of 2014
      • Integration of climate stations of ETAPA and Universidad de Cuenca starting in Q4 of 2014
      • Continued support of RadarnetSur. Main goal: calibrated real-time precipitation maps for all of South Ecuador

  24. Thanks for your attention

