
Metadata Quality and Capital

Learn about efficient and effective ways to generate metadata, the importance of metadata ownership, and the benefits of metadata reuse in the Dryad repository. The talk also covers return on investment (ROI) for metadata, the Research Data Alliance (RDA), and Dryad's integration and interoperability features.



Presentation Transcript


  1. Metadata Quality and Capital. Disseminators and Service Providers, November 20, 2014. Jane Greenberg, Professor, College of Computing & Informatics; Director, Metadata Research Center

  2. Your data is only as good as your metadata. Metadata is a first-class object.

  3. Cyberinfrastructure motivation: What is the most efficient and effective way to generate metadata? Metadata ownership… Toothbrush. W, W, W, W, H?

  4. The topic… • Good enough is not bad (DRYAD) • ROI – return on investment (CAPITAL) • RDA – Research Data Alliance (COMMUNITY)… time permitting

  5. Dryad…a curated general-purpose repository…makes data discoverable, freely reusable, and citable. “…enables scientists to validate published findings, explore new analysis methodologies, repurpose data for research questions unanticipated by the original authors, and perform synthetic studies.” (http://datadryad.org/) Not this 

  6. JDAP Author Curator

  7. Pre-populated metadata field

  8. Data downloads → reuse → citation. Downloaded 10,678 times. Observations motivating the study of metadata capital: • Metadata generation costs money • Metadata reuse is a BIG part of Dryad’s workflow • Metadata reuse via OAI • Metadata reuse via data sharing, reuse, and repurposing

  9. workflows statistics • Journals (80+…PLOS): http://datadryad.org/pages/integratedJournals • X >10GB = $15,$10+

  10. Interoperability • Technology: DSpace; DOIs via CDL/DataCite; CC0 (<m>+ data) • Integration with specialized repositories and databases: federated searching with TreeBASE and KNB LTER; TreeBASE submission (OAI-PMH); GenBank (currently in development) • Governance: “non-profit status, 12 member Board of Directors”; sets policy, goals; science, journals, societies, OCLC, MS • 2006: Dryad development – NESCent +<MRC> • Stakeholders: journals, publishers and scientific societies, and researchers • 2009-2012: Interim Board • $ PAYMENT – Sept. 1, 2014

  11. …about interoperability The metadata hook

  12. Dryad DCAP, ver. 3.0 • bibo (The Bibliographic Ontology) • dcterms (Dublin Core terms) • dryad (Dryad) • DwC (Darwin Core) • Vision: simple (automatic metadata generation; heterogeneous datasets; data-package centric); interoperable (harvesting, cross-system searching); Semantic Web compatible (sustainable; supporting machine processing) • Singapore Framework • Greenberg et al., 2009, Metadata Best Practice for a Scientific Data Repository, JLM, DOI:10.1080/19386380903405090.
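Harvesting of the kind named on this slide is commonly done over OAI-PMH, which the earlier slides mention for TreeBASE submission. As a minimal sketch, assuming a hypothetical endpoint URL (the transcript does not give Dryad's) and a hypothetical helper name, the following builds a standard OAI-PMH `ListRecords` request. It only constructs the URL and makes no network calls.

```python
from urllib.parse import urlencode

# Hypothetical endpoint for illustration only; it is not taken from the slides.
BASE_URL = "https://repository.example.org/oai/request"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


first_page = list_records_url(BASE_URL)
next_page = list_records_url(BASE_URL, resumption_token="abc")
```

Here `oai_dc` is the unqualified Dublin Core format that every OAI-PMH repository must support; a richer application profile would be requested with a different `metadataPrefix`, if the repository exposes one.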

  13. Metadata research & development • Curation workflow - cognitive walkthroughs • Dryad metadata scheme development - crosswalk analyses (Dube, et al, 2007; Carrier, et al, 2007; White et al., 2008, Greenberg, et al, 2010; Greenberg 2009; 2010) • Metadata reuse - content analysis (Greenberg, IDCC Research Summit, 2010) • Instantiation - multi-method study (comprehension assessment) (Greenberg, RDAP, 2010, UNAM 2012) • Name-authority control - exploratory study (Haven, 2009, INLS 720) • KO/metadata community practices - concurrent triangulation mixed methods (survey + simulation experiment) (White, 2010, ASIST, 2010 JLM) • Metadata functions - quantitative categorical analysis (Willis, Greenberg, and White, 2010, CODATA, 2012, JASIST) • Vocabulary needs (HIVE) – mapping study (Greenberg, 2009, CCQ; Scherle, 2010, Code4Lib) • Metadata theory – deductive analysis (Greenberg, 2009)

  14. Interoperability slope Semantic ontologies Researcher names Agency/ institution

  15. Package metadata harvested from email Contr. 101 (gr. 99%, bl. 1%) Subj. 177 (gr. 97%, rd. 2%, bl. 1%)

  16. The leap - capital to metadata capital • An economic concept (Weber, 1905; Smith, 1776) • Business and operations (net gains or losses) • Finances, goods and services, and public needs • Intellectual capital, social capital • A tangible result, value increase • Metadata as an asset, a product • Reuse of good-quality metadata increases the value of the initial investment • Poor quality may reduce metadata capital? • Metadata reuse prevalence • Cooperative cataloging, CIP, ISBD, MARC, FRBR, LCC, VIAF, OAI-PMH, CrossRef, PubMed, Zotero, BibTeX, DataCite, linked data/Semantic Web, PIDs, etc.

  17. Modified Capital-sigma notation Cost / value Reuse 
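The notation on this slide is not reproduced in the transcript, so the speaker's actual formula is unknown. Purely as an illustrative assumption, one simple reading of "cost / value" summed over "reuse" is that an initial generation cost is offset by the value gained at each reuse. The function names and the numbers below are hypothetical.

```python
import math


def metadata_capital(initial_cost, reuse_values):
    """Net value of one metadata element under the assumed model:
    the sum of the value gained at each reuse, minus the one-time
    cost of generating the metadata."""
    return sum(reuse_values) - initial_cost


def break_even_reuses(initial_cost, value_per_reuse):
    """Smallest number of equal-value reuses that recovers the cost."""
    return math.ceil(initial_cost / value_per_reuse)


# Hypothetical figures: 10 units to create, 2 units saved per reuse.
net = metadata_capital(10.0, [2.0] * 8)
needed = break_even_reuses(10.0, 2.0)
```

Under this assumed model, poor-quality metadata would show up as low or negative per-reuse values, which matches the earlier point that poor quality may reduce metadata capital.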

  18. Author/Submitter | Curator • 100 metadata instantiations • 8 of 12 metadata properties had reuse @ 50% or greater • 5 of 8 confirmed reuse at 80% or higher • Basic bib. vs. complex
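The slide does not say how reuse was measured. As a sketch under the assumption that a property counts as reused when the author-supplied value survives unchanged in the curated record, per-property reuse rates and the threshold counts could be computed as follows. The records and function names are hypothetical, not data from the study.

```python
from collections import defaultdict


def reuse_rates(records):
    """records: list of (author_metadata, curated_metadata) dict pairs.

    Returns, for each property found in the curated records, the share
    of records where the author's value was kept unchanged.
    """
    reused = defaultdict(int)
    present = defaultdict(int)
    for author, curated in records:
        for prop, value in curated.items():
            present[prop] += 1
            if author.get(prop) == value:
                reused[prop] += 1
    return {prop: reused[prop] / present[prop] for prop in present}


def properties_at_or_above(rates, threshold):
    """Names of properties whose reuse rate meets the threshold."""
    return sorted(p for p, r in rates.items() if r >= threshold)


# Two made-up records: the title is kept both times, the subject once.
sample = [
    ({"dcterms.title": "A", "dcterms.subject": "frogs"},
     {"dcterms.title": "A", "dcterms.subject": "Anura"}),
    ({"dcterms.title": "B", "dcterms.subject": "oak"},
     {"dcterms.title": "B", "dcterms.subject": "oak"}),
]
rates = reuse_rates(sample)
high = properties_at_or_above(rates, 0.8)
half = properties_at_or_above(rates, 0.5)
```

Exact string equality is the strictest possible definition of reuse; a real analysis might normalise case or whitespace first, which would raise the rates.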

  19. Author Dcterms.spatial Subject  DwC.ScientificName

  20. Modified Capital-sigma notation for linked data • P = determined by the number of terms in an ontology, labor hours to generate, integrate, etc. • Cost / value • Reuse of linked data concept/URI

  21. Helping Interdisciplinary Vocabulary Engineering (HIVE) • CV cost, interoperability, and usability constraints • Linked Open Vocabulary initiative, to support inter/transdisciplinary… • SKOS (a little dumb) • AMG + machine learning approach for integrating discipline terminologies

  22. Amy; Amy’s data. Meet Amy Zanne. She is a botanist. Like every good scientist, she publishes, and she deposits data in Dryad.

  23. What about successive growth rate tied to a concept? A concept can be • in ~ vernacular to canonical • fall by the wayside, less popular • out (deprecated)

  24. Conclusion… other valuation approaches • Market cap of Facebook per user: $40 – $300 • Revenues per record per user: $4 – $7 per year • Facebook • Experian • Market prices of personal data: • $0.50 for street address • $2.00 for date of birth • $8 for social security number • $3 for driver’s license number • $35 for military record SOURCE: OECD. Exploring the Economics of Personal Data: A Survey of Methodologies for Measuring Monetary Value. OECD Digital Economy Papers. Organisation for Economic Co-operation and Development (OECD) Publishing, 2013.

  25. Concluding remarks • Interest… traction • Limitations: bad data, cost/value • We should care about cost • Metadata capital can contextualize • Generic formula for further research

  26. Metadata Standards Directory Working Group…. Jane Greenberg, Alex Ball, Keith Jeffery, Rebecca Koskela

  27. Goals and workplan - DCC Disciplinary Directory: http://www.dcc.ac.uk/resources/metadata-standards “…develop a collaborative, open directory of metadata standards applicable to scientific data” Stakeholders: Researchers, data managers, data scientists, tool developers, repositories, agencies, societies (RDA’s growing community)

  28. Acknowledgments • Dryad Consortium Board, journal partners, and data authors • NESCent: Laura Wendell (Executive Director), Hilmar Lapp, Heather Piwowar, Peggy Schaeffer, Ryan Scherle, Todd Vision (PI) • Drexel/UNC Metadata Research Center: Jose R. Pérez-Agüera, Sarah Carrier, Elena Feinstein, Lina Huang, Robert Losee, Hollie White, Craig Willis, Jane Smith, Shea Swuager, Liz Turner, Christine Mayo, Adrian Ogletree, Erin Clary • U British Columbia: Michael Whitlock • NCSU Digital Libraries: Kristin Antelman • HIVE: Library of Congress, USGS, and The Getty Research Institute; and workshop hosts • Yale/TreeBASE: Youjun Guo, Bill Piel • DataONE: Rebecca Koskela, Bill Michener, Dave Veiglais, and many others • British Library: Lee-Ann Coleman, Adam Farquhar, Brian Hole • Oxford University: David Shotton

  29. http://datadryad.org http://blog.datadryad.org http://datadryad.org/wiki http://code.google.com/p/dryad dryad-users@nescent.org Facebook: Dryad Twitter: @datadryad http://ils.unc.edu/mrc/hive/ http://code.google.com/p/hive-mrc/ Metadata Research Center: http://cci.drexel.edu/mrc

  30. Sustainability: Plan Comparison

  31. More on growth and sustainability • Membership: http://datadryad.org/pages/membershipOverview • Pricing and sponsorship of deposits: http://datadryad.org/pages/pricing • Journal integration: http://datadryad.org/pages/journalIntegration
