Citations Top to Bottom
http://etherpad.ooici.org/geodata-fir3
The Fir Group – Breakout 3
Kerstin Lehnert, John Graybeal, Dmitri Mozzherin, Vivian Hutchison, Giri Palanisamy, Eric Wolf, Ron Weaver, Jan Peters, Walt Snyder, Mary Marlino, Cheryl Morris, Benjamin D Branch, Steve Tessler, Lisa Raymond, Jeanine Aquino, Scott Jensen, Percy Donaghay, Dave Folker, Sze-Ling Celine Chan, Doug Walker
Why we cite - Reasons for creating a citation for a dataset or data
• Give credit to the creator (Credit)
• Allow humans to know about the data and machines to find the source (Use)
• Know the provenance of the data (History)
• Give rigor and reproducibility to analysis (Rigor)
• Allow specificity and exactness (possibly down to a single item)
Why we cite, caveat - Citation and metadata records come from the data source (History)
• Both must come from the data source
• Citation – the source can give the most detailed and appropriate description, including the persistence of the data
• Metadata – the source understands the data and can describe them well at any granularity. The source can also record what the user did to discover/download the data.
Why we cite – Rigor/Reproduce (Rigor)
• The scientific method requires that we can replicate results and reproduce an experiment to get the same data and/or result
• Can the data source reliably reproduce and/or recover the same result based on the same search/request?
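The reproducibility question on this slide can be sketched as a simple check: issue the same request more than once and compare fingerprints of the results. This is a minimal illustration, not something the group proposed. The `fetch` callable and the two stand-in sources are hypothetical, and SHA-256 is just one possible choice of fingerprint.

```python
import hashlib
from typing import Callable


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying the exact bytes returned."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(fetch: Callable[[str], bytes], query: str, runs: int = 2) -> bool:
    """Issue the same query `runs` times and report whether all results are byte-identical.

    `fetch` is a hypothetical stand-in for whatever client talks to the data source.
    """
    digests = {fingerprint(fetch(query)) for _ in range(runs)}
    return len(digests) == 1


# Stand-in sources, for illustration only.
def static_source(query: str) -> bytes:
    """Always returns the same bytes for the same query."""
    return f"result for {query}".encode()


_calls = {"n": 0}


def variable_source(query: str) -> bytes:
    """Returns slightly different bytes on every call."""
    _calls["n"] += 1
    return f"result for {query} (revision {_calls['n']})".encode()
```

A byte-level comparison is deliberately strict: a source that returns the same values in a different order would count as not reproducible here, which may or may not be the right threshold for a given community.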
Data sources are really variable! (Credit, History, Rigor)
• Persistence is a defining factor – persistence means that the data, or some version of them, can be found in perpetuity (?)
  1. Persistent and static or tightly versioned – the same query or request produces exactly the same result
  2. Persistent but variable – changes and versions are not tracked, but the basic dataset/data type is available. The same query produces similar results, but possibly with differences
  3. Not persistent/streaming – data and data sources come and go and are valuable while they are there
• THESE ALL PRODUCE IMPORTANT RESULTS!
Persistent and static or infrequently versioned data (Rigor)
• Citation is easy and rigorous (although we still have to define it)
• Metadata are stable
• The user gets the same result
• The source maintains the whole record
How about the other 99% of data sources? (Rigor?!)
• What is appropriate for these data sources? The community recognizes that working with them is an appropriate scientific activity that yields reliable and important results.
• Move toward persistent and stable
• Create a SNAPSHOT
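The three kinds of data source and the recommendation on this slide can be written down as a small lookup. This is only a sketch of the slides' reasoning. The names `Persistence` and `citation_strategy` and the wording of the returned strings are assumptions made for illustration.

```python
from enum import Enum


class Persistence(Enum):
    """The three kinds of data source described in the slides."""
    STATIC = "persistent and static or tightly versioned"
    VARIABLE = "persistent but variable"
    STREAMING = "not persistent / streaming"


def citation_strategy(kind: Persistence) -> str:
    """Map a source's persistence class to the approach the slides suggest."""
    if kind is Persistence.STATIC:
        # The same request yields the same result, so the source itself can be cited.
        return "cite the source directly"
    # Variable and streaming sources cannot guarantee the same result later.
    return "create a snapshot and cite it"
```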
What is a SNAPSHOT?
• It is what was downloaded
• The user of the data is the instigator
• The source(s) provide the citation and metadata
• It is not appropriate for persistent and static sources
• It provides the rigor for analysis but not extraction
• It must be made immutable because the source is not
• It must be persistent somewhere (library, source, other)
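One way to picture the properties listed above is a frozen record that stores what was requested, the citation supplied by the source, and a digest of the downloaded bytes. This is a minimal sketch under stated assumptions: the `Snapshot` fields, the function names, and the use of SHA-256 are illustrative choices, not a format defined by the group. Persisting the record and the data somewhere (library, source, other) is outside what the sketch covers.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Snapshot:
    source: str        # where the data came from
    request: str       # the query or request the user issued
    citation: str      # citation text supplied by the source
    retrieved_at: str  # ISO 8601 UTC timestamp of the download
    sha256: str        # digest of exactly what was downloaded


def make_snapshot(data: bytes, source: str, request: str, citation: str) -> Snapshot:
    """Record what the user downloaded, together with the source's citation."""
    return Snapshot(
        source=source,
        request=request,
        citation=citation,
        retrieved_at=datetime.now(timezone.utc).isoformat(),
        sha256=hashlib.sha256(data).hexdigest(),
    )


def matches(snapshot: Snapshot, data: bytes) -> bool:
    """Check that `data` is byte-for-byte what the snapshot recorded."""
    return hashlib.sha256(data).hexdigest() == snapshot.sha256
```

The digest lets a later analyst confirm they are working from the same bytes. It does not let anyone re-extract those bytes from a changing source, which is consistent with the slide's point that a snapshot gives rigor for analysis but not extraction.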
How to cite something - USE
• Human interaction
  • Assess the source for quality and create trust
  • Know the author, source, time, version - someone will figure out how to format/specify, or the source will give the information
• Machine
  • Where is the source, and is it a snapshot?
  • Resolve to something humans can use (mostly)
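The items a human and a machine each need can be gathered into one record, as a rough sketch. The field names, the rendering format, and the example values below are all hypothetical. The slides leave the actual formatting to someone else or to the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    author: str        # who created the data
    source: str        # name of the dataset or data source
    accessed: str      # time of access
    version: str       # version of the data, if the source tracks one
    locator: str       # where a machine can find the source or snapshot
    is_snapshot: bool  # True if the locator points at a snapshot

    def for_humans(self) -> str:
        """Render the record as a single readable line (format is illustrative)."""
        kind = "snapshot" if self.is_snapshot else "source"
        return (
            f"{self.author}. {self.source}, version {self.version}. "
            f"Accessed {self.accessed}. [{kind}] {self.locator}"
        )


# Hypothetical example values.
example = DataCitation(
    author="A. Researcher",
    source="Example Dataset",
    accessed="2011-01-01",
    version="1.0",
    locator="https://example.org/data/1",
    is_snapshot=True,
)
```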
Use, History, and Rigor seem to be OK. What about Credit?
• The highest level seems to be tractable
• Credit should be given to original sources, contributors, compilers, collectors
• "I did the work, give me some credit."
• HOW?
  • In a meaningful way (ISI)