Report for Wed.
Question 1: In your view/experience, what parts of data life cycle, data citation and data integration implementations/applications or frameworks are well established (or not) in your disciplines, and what are the common gaps?
Understanding Value Streams
• Single, continuous stream?
  • Collect, Process, Archive, Discover, Access, Use
• Two distinct streams?
  • “Original Intent”: Collect, Process, Archive, Use
  • Secondary Use: Discover, Access, Use
  • Key differences: funding org, different community
  • BUT: can anything be done in the original intent stream to facilitate the secondary use?
• Spiral Model?
  • Collect, Process, Archive, Discover, Access, Use (= Process further)
  • Process further, Archive, Discover, Access, Use, …
What Part of the Framework Is Working: Standards
• WORKING
  • Some standards are useful and widely used
    • “Self-describing” formats: SEED, HDF, netCDF
    • Climate and Forecast (CF) conventions
Gaps in Standards
• Some standards are underused
  • E.g., ISO metadata: cost, learning curve
  • Need to consider the human factor
  • Tool availability
• Interdisciplinary standards are problematic
  • Discontinuities between disciplines in standards use
  • The Observations and Measurements model may help here for some disciplines
• Need standards to support the scientific workflow
  • E.g., when to add metadata and what kind of metadata
• Standards churn (changing too fast/often)
The Human Factor in Data Lifecycle Management
• Incentives
  • Sticks: funding, publication requirements
  • Carrots: wider use of data, citations
The problem with citations…
• Human and process problems
  • Citations are not being used where they should be
  • Digital data citations are not accepted in some citation indices
  • Data are not often peer-reviewed, and are therefore of uncertain quality and citability
• Technical problems
  • Agreement on, and widespread use of, data identifiers
  • Citation granularity (dataset vs. files vs. columns in files)
Metadata Capture
• We need to capture more metadata at the point of data origin
  • Ideally, built into the collection mechanism
  • Also, following standards
  • Exemplars:
    • EXIF standard for cameras
    • ArcCatalog
    • SEED format from seismometers
    • EarthChem
• We need to capture more metadata at later processing steps (beyond basic provenance)
• Gap: handling provenance granularity
Where/how to implement robust data management practices
• Federal data centers
  • NOAA data centers, NASA DAACs
• Federally Funded Research Centers
  • NCAR
• University Consortia
  • IRIS DMC
• Libraries (could collaborate more with data centers)
• Collaborations between scientists and data managers
  • Argonne “catalysts” example of helping scientists leverage computing facilities: apply the same approach to data management
• Professional Societies
• Individual Universities
  • U. of Oklahoma Climate Services Center(?)
Key Gap: Robust Business Model for Long-term Persistence of Data Archive
Some Proposals to Involve More People in the Data Lifecycle…
• Teach students about data management and require them to make data and metadata available as part of their thesis
  • Partnership with university libraries would be key
  • Involve 4-yr colleges more (not just graduate programs)
• Provide a mechanism for people other than the data provider to add annotations to data
• Provide more education on data management to practicing scientists
Unresolved Questions
• Model output: treat like data or something else?
• What to do about identifiers and locators for data?
• Discussion assumed the web to be an integral part of the lifecycle. Is this good or bad, considering the overall low reliability of info on the web?
  • Establishing trust for data is clearly important
Comments/Questions
• Ted:
  • Need to stop talking about how hard metadata is, or people will believe it
  • Hard to make generalized tools
  • Maybe make more domain-specific tools?
  • Did you discuss metrics?
  • JG: maybe use the SEI CMM (Capability Maturity Model)