
Data Management Plans: A good idea, but not sufficient



Presentation Transcript


  1. Data Management Plans: A good idea, but not sufficient

  2. Outline
     • Why are Data Management Plans good but insufficient?
     • From Data to Process Management Plans
     • How to capture process & context?
     • Summary

  3. Sustainable (e-)Science
     • Data is key enabler in science
       • Basis for evaluation and verification
       • Basis for re-use
       • Basis for meta-studies
       • Safeguarding investment made in data
     • Need to preserve and curate the data
     • Preservation: keeping data usable over time, fighting mostly technical & semantic obsolescence
     • How to avoid data being lost after projects end?

  4. Sustainable (e-)Science
     • Data Management Plans as integral part of research proposals
     • Need recognized by researchers, funding bodies, …
     • Focus on
       • Data descriptions
       • Declarations of activities to ensure long-term availability of data
     • Data Management Plans are good, but not sufficient!
     • https://dmp.cdlib.org/
     • https://data.uni-bielefeld.de/de/data-management-plan
     • https://dmponline.dcc.ac.uk/

  5. Data Management Plans
     • Short, free-form text, requiring human interpretation
     • Declarations of intent
     • Not enforceable, hardly verifiable
     • (Burden remains with researchers / institutions, who need to become data management experts)
     • Focus solely on data, ignoring the process: pre-processing, processing, analysis
     • This limits availability of data & results, verification of results, re-use and re-purposing
     • http://rci.ucsd.edu/_files/DMP%20Example%20Cosman.pdf
     • http://deepblue.lib.umich.edu/bitstream/handle/2027.42/86586/CoE_DMP_template_v1.pdf?sequence=1

  6. From Data to Processes Excursion: Scientific Processes

  7. From Data to Processes
     • Rhythm Pattern Feature Set
       • extracts numeric descriptors from audio
       • basically 2 Fourier Transforms
       • some psycho-acoustic modelling
       • some filters (Gaussian, gradient) to make features more robust
     • Used for
       • music genre classification
       • clustering of music by similarity
       • retrieval
     • Implemented first in Matlab, then in Java
       • both publicly available on website
       • same same but different...

  8. From Data to Processes
     Excursion: Scientific Processes
     [Figure: panels set1_freq440Hz_Am11.0Hz, set1_freq440Hz_Am12.0Hz, set1_freq440Hz_Am05.5Hz; Java vs. Matlab]

  9. From Data to Processes
     Excursion: Scientific Processes
     • Bug?
     • Psychoacoustic transformation tables?
     • Forgetting a transformation?
     • Different implementation of filters?
     • Limited accuracy of calculation?
     • Difference in FFT implementation?
     • ...?
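
One way to start narrowing down questions like these is to compare the feature vectors from the two implementations element by element. The following is a minimal sketch, assuming both implementations can export their features as plain lists of floats; the function name `compare_features`, the tolerances and the sample values are illustrative assumptions, not part of the original presentation.

```python
import math


def compare_features(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where two feature vectors differ beyond the tolerance."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    return [
        i
        for i, (x, y) in enumerate(zip(a, b))
        if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
    ]


# Made-up values standing in for the output of two implementations.
matlab_features = [0.125, 3.5, 7.25, 0.0]
java_features = [0.125, 3.5000001, 7.75, 0.0]

mismatches = compare_features(matlab_features, java_features)
print(mismatches)
```

Differences within the tolerance point towards limited numerical accuracy, while large deviations at specific indices suggest a missing or differently implemented processing step.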

  10. From Data to Processes http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0038234

  11. From Data to Processes
      To sum up:
      • Data
        • is the fuel for scientific processes
        • is the result of scientific processes
      • Curation of data thus needs to consider these processes
      • Data Management Plans
        • are data centric
        • put too little focus on the processes associated with data
        • are written by humans for humans

  12. Outline
      • Why are Data Management Plans insufficient?
      • From Data to Process Management Plans
      • How to capture process & context?
      • Summary

  13. Process Management Plans
      • Process Management Plans (PMPs)
      • Go beyond data to cover research process: ideas, steps, tools, documentation, results, …
      • Data is only one (important) element, commonly actually a result of a research (pre-)process
      • Ensure re-executability, re-usability
      • Must be machine-actionable & verifiable
      • Basis for preservation and re-use of research
      • Similar to “research objects”, “executable papers”, …

  14. Process Management Plans
      Need to
      • Establish models for representing such process management plans (PMPs)
        • Must be machine-readable and machine-actionable
      • Identify “minimum set” of information
      • Devise means to automate (most of) the activity in creating and maintaining those PMPs
      • Establish them to replace (enhance / subsume / …) Data Management Plans

  15. Process Management Plans
      Structure of PMPs (following concept of DMPs):
      • Overview and context
      • Description of processes and their implementation: Process description | Process implementation | Data used and produced by process
      • Preservation: Preservation history | Long term storage and funding
      • Sharing and reuse: Sharing | Reuse | Verification | Legal aspects
      • Monitoring and external dependencies
      • Adherence and review
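
To illustrate what "machine-readable and machine-actionable" could mean for this structure, here is a minimal sketch that stores a PMP as JSON and checks it for missing sections. The section names follow the slide; the key spellings, the `missing_entries` helper and the sample draft are assumptions made for this example, not a published PMP schema.

```python
import json

# Section names follow the PMP structure on the slide; key spelling is an assumption.
REQUIRED_SECTIONS = {
    "overview_and_context": [],
    "processes": [
        "process_description",
        "process_implementation",
        "data_used_and_produced",
    ],
    "preservation": ["preservation_history", "long_term_storage_and_funding"],
    "sharing_and_reuse": ["sharing", "reuse", "verification", "legal_aspects"],
    "monitoring_and_external_dependencies": [],
    "adherence_and_review": [],
}


def missing_entries(pmp):
    """List the sections and subsections that a PMP document lacks."""
    missing = []
    for section, subsections in REQUIRED_SECTIONS.items():
        if section not in pmp:
            missing.append(section)
            continue
        for sub in subsections:
            if sub not in pmp[section]:
                missing.append(f"{section}.{sub}")
    return missing


# Hypothetical, incomplete draft of a PMP.
draft = json.loads("""
{
  "overview_and_context": "Music genre classification experiment",
  "processes": {
    "process_description": "Extract features, train classifier",
    "process_implementation": "Java feature extractor",
    "data_used_and_produced": ["MP3 collection", "genre labels"]
  },
  "preservation": {"preservation_history": []},
  "sharing_and_reuse": {"sharing": "public website", "reuse": "open", "verification": "re-execution check"}
}
""")

gaps = missing_entries(draft)
print(gaps)
```

Unlike a free-form DMP text, a plan in this form can be checked automatically each time it changes.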

  16. Outline
      • Why are Data Management Plans insufficient?
      • From Data to Process Management Plans
      • How to capture process & context?
      • Summary

  17. Process Capture
      • Need to establish what forms part of a process:
        • analyzing process documentation
        • establishing context of process, relationships between elements
        • monitoring of process activities
      • Capture and describe this in a context model

  18. Architectural Concepts
      • Based on Enterprise Architecture Framework (Zachman), taxonomies (e.g. PREMIS), …
      • DIO: Domain-Independent Ontology
      • DSO: Domain-Specific Ontologies (legal, sensor, multimedia codecs, …)

  19. Process Capture
      Example: Music Classification Process
      • Input: music (e.g. MP3 format)
      • Input: training data, i.e. music with genre labels
      • Output: classification of music, e.g. into genres
      • Intermediate steps
        • extract numeric description (features) from music
        • combine features with ground truth into specific file format, …
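
The inputs, outputs and intermediate steps listed above can be written down in a form that a program can check. This is a minimal sketch, assuming a simple linear workflow; the `Step` class, the `unresolved_inputs` helper and the step and data names are hypothetical and only mirror the slide's example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    inputs: tuple
    outputs: tuple


def unresolved_inputs(external_inputs, steps):
    """Return (step, input) pairs not provided externally or by an earlier step."""
    available = set(external_inputs)
    problems = []
    for step in steps:
        for item in step.inputs:
            if item not in available:
                problems.append((step.name, item))
        available.update(step.outputs)
    return problems


# Step and data names are illustrative, following the music classification example.
workflow = [
    Step("extract_features", ("music",), ("features",)),
    Step("combine_with_ground_truth", ("features", "genre_labels"), ("training_file",)),
    Step("classify", ("training_file",), ("genre_classification",)),
]

complete = unresolved_inputs({"music", "genre_labels"}, workflow)
incomplete = unresolved_inputs({"music"}, workflow)
print(complete, incomplete)
```

A check like this flags a process description that silently depends on data nobody recorded, here the genre labels in the second call.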

  20. Process Capture Taverna …………….

  21. Process Capture Software setup can be automatically detected in OS with software packages (e.g. Linux); allows detection of licenses, dependencies
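
As a rough illustration of automatic detection, package managers keep metadata in a form that is easy to parse. The sketch below reads `Field: value` stanzas in the style of the Debian package status format and extracts dependency names. The sample text and the package name `feature-extractor` are made up, and license information is not covered, since it is typically stored elsewhere.

```python
def parse_stanzas(text):
    """Parse 'Field: value' stanzas separated by blank lines into dicts.

    Continuation lines (starting with whitespace) are skipped in this sketch.
    """
    packages = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep and not line.startswith((" ", "\t")):
                fields[key.strip()] = value.strip()
        if fields:
            packages.append(fields)
    return packages


def dependency_names(fields):
    """Return package names from a Depends field, flattening '|' alternatives."""
    names = []
    for entry in fields.get("Depends", "").split(","):
        for alternative in entry.split("|"):
            tokens = alternative.split()
            if tokens:
                names.append(tokens[0])
    return names


# Made-up sample in the style of a package status file.
SAMPLE = """\
Package: feature-extractor
Version: 1.0-2
Depends: openjdk-17-jre | java-runtime, libfftw3-double3 (>= 3.3)

Package: libfftw3-double3
Version: 3.3.10-1
"""

setup = {p["Package"]: dependency_names(p) for p in parse_stanzas(SAMPLE)}
print(setup)
```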

  22. Process Capture

  23. Process Capture
      Example: Music Classification Workflow

  24. Business Application Technology

  25. Process Re-deployment
      • Preservation and Re-deployment
      • “Encapsulate” as complex “research objects” (RO)
      • Re-deployment beyond original environment
      • Format migration of elements of ROs
      • Cross-compilation of code
      • Emulation-as-a-Service, virtual machines, …

  26. Process Re-deployment
      • Verification, Validation & Data
      • Verify correctness of re-execution
        • validation and verification framework
        • process instance data
        • points of capture
        • metrics
      • Data and data citation
        • Identifying subsets of data in large and dynamic databases
        • Timestamping and versioning of data
        • Assigning PID (DOI, …) to time-stamped query
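
The idea of citing a data subset through a time-stamped query, and of verifying a later re-execution, can be sketched as follows. This is an illustration under assumptions, not the presenters' implementation: the record layout, the helper names and the `local_id` are invented for this example, and a real PID such as a DOI would have to come from a registration service.

```python
import hashlib
import json
from datetime import datetime, timezone


def result_hash(rows):
    """Hash a result set independently of row order."""
    canonical = json.dumps(sorted(rows), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_query_record(query, rows, executed_at=None):
    """Store the query text, its execution time and a hash of its result set."""
    executed_at = executed_at or datetime.now(timezone.utc)
    record = {
        "query": " ".join(query.split()),  # normalise whitespace
        "executed_at": executed_at.isoformat(),
        "result_sha256": result_hash(rows),
    }
    # Local stand-in for a PID; a real DOI would be issued by a registration agency.
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode("utf-8")
    ).hexdigest()
    record["local_id"] = "query-" + digest[:12]
    return record


def verify_reexecution(record, rows):
    """Check that a re-executed query returned the same result set."""
    return record["result_sha256"] == result_hash(rows)


# Made-up query and rows for illustration.
rows = [["track-2", "rock"], ["track-1", "jazz"]]
record = make_query_record(
    "SELECT id, genre  FROM tracks",
    rows,
    executed_at=datetime(2013, 1, 1, tzinfo=timezone.utc),
)
print(record["local_id"], verify_reexecution(record, list(reversed(rows))))
```

Hashing the sorted rows means that a re-execution returning the same subset in a different order still verifies, while an added or changed row does not.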

  27. Sustainable (e-)Science
      How to get there?
      • Research infrastructure support
        • Versioning systems
        • Logging (“virtual lab-book”)
        • Virtual machines / pre-configured virtual labs for research
        • Data citation support for large, dynamic databases
      • R&D in process preservation, re-deployment & verification
        • Evolving research environments, code migration, …
        • Verification of process re-execution
        • Financial impact, business models

  28. Summary
      • Need to move beyond concept of data
      • Need to move beyond the focus on description
      • Process Management Plans (PMPs) extending DMPs
      • Process capture, preservation & verification
      • Capture “all” elements of a research process
      • Machine-readable and -actionable
      • Data and process re-use as basis for data driven science

  29. Thank you! http://www.ifs.tuwien.ac.at/imp
