The Dryad Data Repository: Metadata Workflows and Processes

Learn about Dryad, a curated general-purpose repository enabling data discoverability, reusability, and citability through metadata workflows and processes. Explore examples, costs, governance, sustainability, and more.

Presentation Transcript


  1. The Dryad Data Repository: Metadata Workflows and Processes 2nd Data Management Workshop November 28th – 29th 2014 University of Cologne, Germany Jane Greenberg Professor, College of Computing & Informatics (CCI) Director, Metadata Research Center <MRC> Erin Clary, Dryad Curator, CCI/MRC

  2. 1. Dryad 2. Workflow/examples 3. Data about Dryad 4. Metadata R&D 5. Concluding remarks

  3. Dryad…a curated general-purpose repository…makes data discoverable, freely reusable, and citable. “…enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies.” (http://datadryad.org/) Not this 

  4. http://datadryad.org/

  5. JDAP Author Curator

  6. Pre-populated metadata field

  7. Elsevier’s ScienceDirect: EXAMPLE: Dryad Unmack, et al., Phylogeny and biogeography…Molecular Phylogenetics and Evolution http://dx.doi.org/10.1016/j.ympev.2012.12.019

  8. Elsevier’s ScienceDirect: EXAMPLE: Dryad Unmack, et al., Phylogeny and biogeography…Molecular Phylogenetics and Evolution http://dx.doi.org/10.1016/j.ympev.2012.12.019

  9. Data downloads → reuse → citation • Downloaded 10,678 times • Observations motivating the study of metadata capital • Metadata generation costs money • Metadata reuse is a BIG part of Dryad’s workflow • Metadata reuse via OAI • Metadata reuse via data sharing, reuse, and repurposing

  10. Greenberg J, Swauger S, Feinstein EM (2013) Data from: Metadata capital in a data repository. Proceedings of the International Conference on Dublin Core and Metadata Applications http://dx.doi.org/10.5061/dryad.8c1p6

  11. workflows statistics • Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals • X >10GB = $15,$10+

  12. http://wiki.datadryad.org/Sample_Dryad_Content#Examples_by_file_type

  13. Interoperability
     Technology
     • DSpace
     • DOIs via CDL/DataCite
     • CC0 (<m>+ data)
     Integration with specialized repositories and databases
     • Federated searching with TreeBASE and KNB LTER
     • TreeBASE submission (OAI-PMH)
     • GenBank (currently in development)
     Governance: “non-profit status, 12 member Board of Directors”
     • Sets policy, goals
     • science, journals, societies, OCLC, MS
     • 2006 Dryad development – NESCent +<MRC>
     • Stakeholders: journals, publishers and scientific societies, and researchers.
     • 2009-2012: Interim Board
     $ PAYMENT-Sept. 1, 2014
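
A minimal sketch of how an OAI-PMH harvest request, like the one behind the TreeBASE integration mentioned on this slide, might be assembled. The `verb`, `metadataPrefix`, `set`, `from`, and `resumptionToken` arguments come from the OAI-PMH 2.0 protocol. The endpoint URL and the function names are assumptions made for illustration and are not Dryad's actual address or code.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; not Dryad's real OAI-PMH address.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    oai_dc (unqualified Dublin Core) is the one metadata format every
    OAI-PMH repository is required to support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional selective harvesting by set
    if from_date:
        params["from"] = from_date    # optional lower bound on the record datestamp
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build the follow-up request for the next page of a long result list.

    resumptionToken is an exclusive argument: it is sent with the verb only.
    """
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


first_page = list_records_url(BASE_URL, from_date="2014-09-01")
```

A harvester would fetch `first_page`, parse the returned XML, and keep calling `resume_url` for as long as the response carries a non-empty resumption token.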

  14. Sustainability: Plan Comparison

  15. More on growth and sustainability Membership: http://datadryad.org/pages/membershipOverview Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing Journal integration: http://datadryad.org/pages/journalIntegration

  16. …about interoperability The metadata hook

  17. Metadata research & development
     • Curation workflow - cognitive walkthroughs
     • Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010)
     • Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010)
     • Instantiation - multi-method study (comprehensions assessment) (Greenberg, RDAP, 2010, UNAM 2012)
     • Name-authority control - exploratory study (Haven, 2009, INLS 720)
     • KO/metadata community practices - Concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM)
     • Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST)
     • Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib)
     • Metadata theory – deductive analysis (Greenberg, 2009)

  18. Dryad DCAP, ver. 3.0 • bibo (The Bibliographic Ontology) • dcterms (Dublin Core terms) • dryad (Dryad) • DwC (Darwin Core) • Vision • Simple: automatic metadata gen; heterogeneous datasets *Data-package centric • Interoperable: harvesting, cross-system searching • Semantic Web compatible: sustainable; supporting machine processing Singapore Framework Greenberg, et al, 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
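
As a rough illustration of what an application profile that mixes namespaces looks like in practice, the sketch below builds one small XML record drawing on three of the vocabularies the slide lists. The namespace URIs are the published ones for DCMI Terms, Darwin Core, and the Bibliographic Ontology. The `record` wrapper element, the `build_record` helper, and all field values are placeholders invented for this example and are not taken from the Dryad DCAP itself.

```python
import xml.etree.ElementTree as ET

# Published namespace URIs for three of the vocabularies named on the slide.
NS = {
    "dcterms": "http://purl.org/dc/terms/",
    "dwc": "http://rs.tdwg.org/dwc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
}


def build_record(fields):
    """Build an XML element whose children come from several namespaces.

    `fields` is a list of ((prefix, local_name), value) pairs.
    The wrapper element name "record" is a placeholder, not a Dryad convention.
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)  # keep readable prefixes on output
    root = ET.Element("record")
    for (prefix, name), value in fields:
        child = ET.SubElement(root, f"{{{NS[prefix]}}}{name}")
        child.text = value
    return root


# Placeholder values, invented for illustration only.
record = build_record([
    (("dcterms", "title"), "Example data package"),
    (("dcterms", "spatial"), "Example region"),
    (("dwc", "scientificName"), "Genus species"),
    (("bibo", "doi"), "10.5061/dryad.example"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Borrowing terms from existing vocabularies rather than minting new ones is what lets a harvester that only understands `dcterms` still read part of the record.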

  19. Helping Interdisciplinary Vocabulary Engineering (HIVE)

  20. Amy DATA publication

  21. Package metadata harvested from email Contr. 101 (gr. 99%, bl. 1%) Subj. 177 (gr. 97%, rd. 2%, bl. 1%)

  22. Modified Capital-sigma notation Cost / value Reuse 

  23. Author/Submitter | Curator • 100 metadata instantiations • 8 of 12 metadata properties had reuse @ 50% or greater • 5 of 8 confirmed reuse at • 80% or higher. • Basic bib. vs. complex
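
One possible way to operationalize the per-property reuse percentages reported on this slide is sketched below: for each metadata property, count how often the value supplied at submission survives curation unchanged. This definition of "reuse", the record layout, the function names, and the toy records are all assumptions made for illustration. They are not the study's actual method or data.

```python
def reuse_rates(records, properties):
    """Share of records, per property, where the submitted value was kept as-is.

    Each record is a dict with "submitted" and "curated" property maps.
    Properties never supplied at submission are left out of the result.
    """
    rates = {}
    for prop in properties:
        compared = [
            (rec["submitted"][prop], rec["curated"].get(prop))
            for rec in records
            if prop in rec["submitted"]
        ]
        if not compared:
            continue  # nothing was submitted for this property, so no rate
        reused = sum(1 for before, after in compared if before == after)
        rates[prop] = reused / len(compared)
    return rates


def properties_at_or_above(rates, threshold):
    """Names of properties whose reuse rate meets the threshold, sorted."""
    return sorted(p for p, r in rates.items() if r >= threshold)


# Toy records, invented for illustration only (not data from the study).
toy_records = [
    {"submitted": {"title": "A", "subject": "fish"},
     "curated": {"title": "A", "subject": "Fishes"}},
    {"submitted": {"title": "B", "subject": "birds"},
     "curated": {"title": "B", "subject": "birds"}},
]
rates = reuse_rates(toy_records, ["title", "subject"])
```

Applied to real instantiations, `properties_at_or_above(rates, 0.5)` and `properties_at_or_above(rates, 0.8)` would produce the kind of counts the slide summarizes as "8 of 12" and "5 of 8".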

  24. Author Dcterms.spatial Subject  DwC.ScientificName

  25. Conclusion…other Valuation Approaches
     • Market cap of Facebook per user: $40 – $300
     • Revenues per record per user: $4 – $7 per year
     • Facebook
     • Experian
     • Market prices of personal data:
     • $0.50 for street address
     • $2.00 for date of birth
     • $8 for social security number
     • $3 for driver’s license number
     • $35 for military record
     SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development Publishing, 2013.

  26. Concluding comments • Success story • Contribution, have to start somewhere… • Good timing, the right discipline • Confirmed use, reuse • Machine capabilities • An educative commons, intellectually engaging

  27. http://wiki.datadryad.org/Sample_Dryad_Content

  28. Acknowledgments
     Dryad Consortium Board, journal partners, and data authors
     NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI)
     Drexel/UNC <Metadata Research Center>: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swauger, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary
     U British Columbia: Michael Whitlock
     NCSU Digital Libraries: Kristin Antelman
     HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts
     Yale/TreeBASE: Youjun Guo, Bill Piel
     DataONE: Rebecca Koskela, Bill Michener, Dave Vieglais, and many others
     British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole
     Oxford University: David Shotton

  29. http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki http://code.google.com/p/dryad dryad-users@nescent.org Facebook: Dryad Twitter: @datadryad http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/ Metadata Research Center: http://cci.drexel.edu/mrc
