
Report for Wed.


Presentation Transcript


  1. Report for Wed.
     Question 1: In your view/experience, what parts of the data life cycle, data citation, and data integration implementations/applications or frameworks are well established (or not) in your disciplines, and what are the common gaps?

  2. Understanding Value Streams
     • Single, continuous stream?
       • Collect, Process, Archive, Discover, Access, Use
     • Two distinct streams?
       • “Original intent”: Collect, Process, Archive, Use
       • Secondary use: Discover, Access, Use
       • Key differences: funding organization, different community
       • BUT: can anything be done in the original-intent stream to facilitate secondary use?
     • Spiral model?
       • Collect, Process, Archive, Discover, Access, Use (= process further)
       • Process further, Archive, Discover, Access, Use, …

  3. What Part of the Framework Is Working: Standards
     • Working: some standards are useful and widely used
       • “Self-describing” formats: SEED, HDF, netCDF
       • The Climate and Forecast (CF) conventions (see the sketch below)
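     To make the “self-describing” idea concrete, here is a minimal sketch (ours, not part of the original report) that writes a small netCDF file following the CF conventions, using the netCDF4 Python library. The file name, variable, and data values are invented for illustration.

         # Minimal sketch: a CF-convention netCDF file written with the
         # netCDF4 library. Names and values below are illustrative only.
         import numpy as np
         from netCDF4 import Dataset

         ds = Dataset("example_temperature.nc", "w", format="NETCDF4")
         ds.Conventions = "CF-1.8"          # declare the convention in use
         ds.title = "Example near-surface air temperature series"

         ds.createDimension("time", None)   # unlimited time axis

         time = ds.createVariable("time", "f8", ("time",))
         time.units = "hours since 2000-01-01 00:00:00"   # CF time encoding
         time.calendar = "standard"

         temp = ds.createVariable("temperature", "f4", ("time",))
         temp.units = "K"                           # CF requires units
         temp.standard_name = "air_temperature"     # controlled CF name
         temp.long_name = "Near-surface air temperature"

         time[:] = np.arange(24)                    # 24 hourly steps
         temp[:] = 273.15 + 5.0 * np.random.rand(24)

         ds.close()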

  4. Gaps in Standards
     • Some standards are underused
       • E.g., ISO metadata: cost, learning curve
       • Need to consider the human factor
       • Tool availability
     • Interdisciplinary standards are problematic
       • Discontinuities between disciplines in standards use
       • The Observations and Measurements (O&M) model may help here for some disciplines
     • Need standards to support the scientific workflow
       • E.g., when to add metadata and what kind of metadata (one common practice is sketched below)
     • Standards churn (standards changing too fast/often)
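     As one concrete answer to “when to add metadata,” here is a minimal sketch of a practice the CF conventions recommend: appending a timestamped entry to a netCDF file’s global history attribute at every processing step. The example is ours, not the report’s; the file path and step description are placeholders.

         # Minimal sketch: record a processing step in a netCDF file's
         # global "history" attribute (a CF/NUG convention). The path
         # and description below are placeholders.
         from datetime import datetime, timezone
         from netCDF4 import Dataset

         def append_history(path, description):
             """Prepend a timestamped entry to the file's history attribute."""
             stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
             entry = f"{stamp}: {description}"
             with Dataset(path, "a") as ds:         # open for in-place update
                 old = getattr(ds, "history", "")
                 ds.history = entry + ("\n" + old if old else "")

         # Example: note a hypothetical regridding step.
         append_history("example_temperature.nc", "regridded to a 1x1 degree grid")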

  5. The Human Factor in Data Lifecycle Management
     • Incentives
       • Sticks: funding, publication requirements
       • Carrots: wider use of data, citations

  6. The Problem with Citations…
     • Human and process problems
       • Citations are not being used where they should be
       • Digital data citations are not accepted by some citation indices
       • Data are not often peer reviewed, and are therefore of uncertain quality and citability
     • Technical problems
       • Agreement on, and widespread use of, data identifiers (see the sketch below)
       • Citation granularity (dataset vs. files vs. columns in files)
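     One established building block for data identifiers is DOI content negotiation, which DataCite and Crossref both support: a client can ask doi.org for a ready-formatted citation instead of scraping landing pages. The minimal sketch below shows the idea; the DOI is a placeholder, not a real dataset.

         # Minimal sketch: fetch a formatted citation for a data DOI via
         # DOI content negotiation. The DOI below is a placeholder.
         import requests

         def cite(doi, style="apa"):
             resp = requests.get(
                 f"https://doi.org/{doi}",
                 headers={"Accept": f"text/x-bibliography; style={style}"},
                 timeout=30,
             )
             resp.raise_for_status()        # surface resolution failures
             return resp.text.strip()

         print(cite("10.1234/example-dataset"))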

  7. Metadata Capture
     • We need to capture more metadata at the point of data origin
       • Ideally, built into the collection mechanism
       • Also, following standards
     • Exemplars:
       • EXIF standard for cameras (see the sketch below)
       • ArcCatalog
       • SEED format from seismometers
       • EarthChem
     • We need to capture more metadata at later processing steps (beyond basic provenance)
     • Gap: handling provenance granularity
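     As an illustration of metadata captured automatically at the point of origin, here is a minimal sketch that reads the EXIF tags a camera embeds in an image file, using the Pillow library. The image path is a placeholder.

         # Minimal sketch: read camera-written EXIF metadata with Pillow.
         # The image path is a placeholder.
         from PIL import Image
         from PIL.ExifTags import TAGS

         with Image.open("photo.jpg") as img:
             exif = img.getexif()                   # tags the camera recorded
             for tag_id, value in exif.items():
                 name = TAGS.get(tag_id, tag_id)    # numeric tag -> readable name
                 print(f"{name}: {value}")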

  8. Where/How to Implement Robust Data Management Practices
     • Federal data centers
       • NOAA data centers, NASA DAACs
     • Federally funded research centers
       • NCAR
     • University consortia
       • IRIS DMC
     • Libraries (could collaborate more with data centers)
     • Collaborations between scientists and data managers
       • The Argonne “catalysts” are an example of helping scientists leverage computing facilities; apply the same idea to data management
     • Professional societies
     • Individual universities
       • U. of Oklahoma Climate Services Center(?)
     Key gap: a robust business model for long-term persistence of data archives

  9. Some Proposals to Involve More People in the Data Lifecycle…
     • Teach students about data management and require them to make data and metadata available as part of their thesis
       • Partnership with university libraries would be key
     • Involve four-year colleges more (not just graduate programs)
     • Provide a mechanism for people other than the data provider to add annotations to data
     • Provide more education on data management to practicing scientists

  10. Unresolved Questions
      • Model output: should it be treated like data, or as something else?
      • What to do about identifiers and locators for data?
      • The discussion assumed the web to be an integral part of the lifecycle. Is this good or bad, considering the overall low reliability of information on the web?
      • Establishing trust in data is clearly important

  11. Comments/Questions
      • Ted:
        • We need to stop talking about how hard metadata is, or people will believe it
        • It is hard to make generalized tools
          • Maybe make more domain-specific tools?
        • Did you discuss metrics?
          • JG: maybe use the SEI CMM model
