D4: SKOS and HIVE - Enhancing the Creation, Design and Flow of Information • Speakers: Hollie White, Jane Greenberg • Coordinator: Alan Keely
Overview • HIVE—Helping Interdisciplinary Vocabulary Engineering • Motivation—Dryad repository • HIVE—Goals, status, and design • A scenario • HIVE for Law Library, repositories, etc. • Challenges • Technical and social • Conclusion and questions
HIVE model • Automatic metadata generation (AMG) approach for integrating discipline CVs • Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment) 15/09/2014
• Survey of 400 evolutionary biologists: 48% based on other data; 78% data not deposited • Evolutionary biologists use published data more frequently than they deposit it themselves! • Ecology, Paleontology, Physiology, Systematics, Genomics, Population genetics…
Partner Journals • American Society of Naturalists: American Naturalist • Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc. • European Society for Evolutionary Biology: Journal of Evolutionary Biology • Society for Integrative and Comparative Biology: Integrative and Comparative Biology • Society for Molecular Biology and Evolution: Molecular Biology and Evolution • Society for the Study of Evolution: Evolution • Society for Systematic Biology: Systematic Biology • Commercial journals: Molecular Ecology, Molecular Phylogenetics and Evolution
Vocabulary needs for Dryad • Vocabulary analysis: 600 keywords from Dryad partner journals • Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies) • Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype… • Results, 431 topical terms, exact matches: NBII Thesaurus, 25%; MeSH, 18% • Results, 531 terms (research method and taxon): LCSH, 22% exact matches, 25% partial • Conclusion: Need multiple vocabularies
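The kind of exact/partial match count reported on this slide can be sketched in a few lines. This is only an illustration, not the procedure used in the Dryad analysis: the function names, the sample keywords, and the definition of a "partial" match (sharing at least one word with some vocabulary label) are all assumptions.

```python
def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def match_rates(keywords, vocabulary_labels):
    """Return the percentage of keywords matching a vocabulary exactly or partially.

    Assumption: a partial match means the keyword shares at least one word
    with some label; the original study may have defined it differently.
    """
    labels = {normalize(label) for label in vocabulary_labels}
    label_words = {word for label in labels for word in label.split()}

    exact = partial = 0
    for keyword in keywords:
        key = normalize(keyword)
        if key in labels:
            exact += 1
        elif set(key.split()) & label_words:
            partial += 1

    total = len(keywords)
    if total == 0:
        return {"exact": 0.0, "partial": 0.0}
    return {"exact": 100.0 * exact / total, "partial": 100.0 * partial / total}


# Hypothetical sample data, not taken from the Dryad keyword set.
sample_keywords = ["Wood pulp", "pulp", "gene flow", "sawdust"]
sample_labels = ["Wood pulp", "Sawdust", "Paper"]
print(match_rates(sample_keywords, sample_labels))
```

Running the same function once per vocabulary gives a side-by-side comparison of coverage, which is the kind of evidence behind the "need multiple vocabularies" conclusion.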
HIVE... as a solution • Address CV (controlled vocabulary) cost, interoperability, and usability constraints • COST: Expensive to create, maintain, and use • INTEROPERABILITY: Developed in silos (structurally and intellectually) • USABILITY: Interface design and functionality limitations have been well documented
Relevance to the law library community? • Orphaned data (more of a Dryad issue) • More important, interdisciplinary needs • COST (create, maintain, and use) • INTEROPERABILITY • USABILITY
HIVE Goals • Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organization System (SKOS) • Provide efficient, affordable, interoperable, and user-friendly access to multiple vocabularies during metadata creation activities • A model that can be replicated -> model and service • Three phases of HIVE: 1. Building HIVE: vocabulary development, server preparation, Primate Life Histories Working Group, Wood Anatomy and Wood Density Working Group 2. Sharing HIVE: continuing education (empowering information professionals) 3. Evaluating HIVE: examining HIVE in Dryad
HIVE Partners • Vocabulary Partners: Library of Congress: LCSH; the Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names); United States Geological Survey (USGS): NBII Thesaurus; Agrovoc Thesaurus • Advisory Board: Jim Balhoff, NESCent; Libby Dechman, LCSH; Mike Frame, USGS; Alistair Miles, Ok; William Moen, University of North Texas; Eva Méndez Rodríguez, University Carlos III of Madrid; Joseph Shubitowski, Getty Research Institute; Ed Summers, LCSH; Barbara Tillett, Library of Congress; Kathy Wisser, Simmons; Lisa Zolly, USGS • Workshop hosts: Columbia Univ.; Univ. of California, San Diego; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain
HIVE Construction • HIVE stores millions of concepts from different vocabularies and makes them available on the Web via simple HTTP • Vocabularies are imported into HIVE in SKOS/RDF format • HIVE is divided into two modules: • HIVE Core: SKOS/RDF storage and management (Sesame/Elmo); SMART HIVE, automatic metadata extraction and topic detection (KEA++ and Maui); concept retrieval (Lucene and MG4J) • HIVE Web: Web user interface (GWT, Google Web Toolkit); machine-oriented interface (SOAP and REST)
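To make the storage-plus-retrieval idea concrete, here is a minimal in-memory sketch of a store that holds concepts from several vocabularies and answers label searches across all of them. It is a stand-in for illustration only: HIVE itself uses Sesame/Elmo and Lucene/MG4J, and the class names, the "DEMO" vocabulary, and the example.org URI below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    uri: str
    pref_label: str
    alt_labels: tuple = ()


class ConceptStore:
    """Toy multi-vocabulary store with a word-level inverted index."""

    def __init__(self):
        self._by_vocab = defaultdict(dict)  # vocabulary name -> {uri: Concept}
        self._index = defaultdict(set)      # word -> {(vocabulary name, uri)}

    def add(self, vocab, concept):
        self._by_vocab[vocab][concept.uri] = concept
        for label in (concept.pref_label, *concept.alt_labels):
            for word in label.lower().split():
                # Strip parentheses so "Pulp (wood)" is indexed under "wood".
                self._index[word.strip("()")].add((vocab, concept.uri))

    def search(self, query):
        """Return (vocabulary, preferred label) pairs matching every query word."""
        words = query.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted((v, self._by_vocab[v][uri].pref_label) for v, uri in hits)


store = ConceptStore()
store.add("NBII", Concept("http://thesaurus.nbii.gov/nbii#Wood-pulp",
                          "Wood pulp", ("Pulp (wood)",)))
# Hypothetical second vocabulary, used only to show cross-vocabulary search.
store.add("DEMO", Concept("http://example.org/demo#wood-anatomy", "Wood anatomy"))

print(store.search("wood"))       # matches concepts in both vocabularies
print(store.search("wood pulp"))  # matches only the NBII concept
```

A real deployment would put a store like this behind the REST or SOAP interface so that clients never need to know which vocabulary a concept came from.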
SKOS
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Wood-pulp">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>Wood pulp</skos:prefLabel>
    <skos:altLabel>Pulp (wood)</skos:altLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Wood"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper-industry-wastes"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Pulp-mills"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Sawdust"/>
    <skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#"/>
    <skos:scopeNote>LSC Life Sciences</skos:scopeNote>
  </rdf:Description>
</rdf:RDF>
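A SKOS record like this one can be read with nothing more than the Python standard library, provided the XML declares the rdf and skos namespaces and uses straight quotes. The sketch below embeds a shortened, well-formed copy of the Wood pulp record; the function name and the dictionary layout are assumptions made for illustration, and a production importer would more likely use a full RDF library.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the Wood pulp record, with namespace declarations added.
SKOS_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Wood-pulp">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>Wood pulp</skos:prefLabel>
    <skos:altLabel>Pulp (wood)</skos:altLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Wood"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Paper"/>
    <skos:related rdf:resource="http://thesaurus.nbii.gov/nbii#Sawdust"/>
  </rdf:Description>
</rdf:RDF>"""


def parse_concepts(xml_text):
    """Extract labels and relations from each rdf:Description element."""
    root = ET.fromstring(xml_text)
    concepts = []
    for desc in root.findall(f"{{{RDF}}}Description"):

        def texts(tag):
            # Literal values such as skos:prefLabel.
            return [e.text for e in desc.findall(f"{{{SKOS}}}{tag}")]

        def links(tag):
            # URI references such as skos:broader.
            return [e.get(f"{{{RDF}}}resource") for e in desc.findall(f"{{{SKOS}}}{tag}")]

        pref = texts("prefLabel")
        concepts.append({
            "uri": desc.get(f"{{{RDF}}}about"),
            "prefLabel": pref[0] if pref else None,
            "altLabels": texts("altLabel"),
            "broader": links("broader"),
            "related": links("related"),
        })
    return concepts


for concept in parse_concepts(SKOS_XML):
    print(concept["prefLabel"], "->", concept["broader"])
```

The broader and related links are what let an interface move from "Wood pulp" up to "Wood" or sideways to "Paper" during metadata creation.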
Meet Amy • Amy Zanne is a botanist. • Like every good scientist, she publishes. • She deposits data in Dryad.
Law library/data repositories • http://www.law.harvard.edu/library/research/databases/major.html • http://www.digitalcurrent.com/legal_webhosting.aspx
Challenges • Building vs. doing/analysis • Source for HIVE generation, beyond abstracts • Combining many vocabularies during the indexing/term-matching phase is difficult, time-consuming, and inefficient • NLP and machine learning offer promise • Interoperability = dumbing down ontologies • Proof of concept: illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry) • General large-team logistics, and having people from multiple disciplines (also the ++)
Conclusion • Vocabularies will enrich Dryad data description, and assist with access, use, reuse, etc. • Nothing novel, but infrastructure is supportive, finally… • Dryad and HIVE are real-world applications using Semantic Web technology
Links • HIVE: http://ils.unc.edu/mrc/hive/ • Metadata Research Center (MRC): http://www.ils.unc.edu/mrc/ • Dryad: http://datadryad.org/ • National Evolutionary Synthesis Center (NESCent): http://www.nescent.org/index.php