FISHNet Project
Dr Mike Haft, FBA
Dr Mark Hedges, KCL
FISHNet Requirements Gathering
• Interviewed freshwater scientists from:
  • Sheffield University
  • King’s College London
  • Queen Mary University London
  • The Environment Agency
  • Centre for Ecology and Hydrology
  • Birmingham University
• Interviews synthesised into a Use Case report
• The report acts as a wish list of features for such a system and informs the direction of future development
Specific Challenges
• We don’t want to share our data as we already have a data management system
• We’re interested in sharing but only if we have complete control of how it is used by others
• We don’t know who owns the IPR on our data
• Ownership of some datasets is shared between two or more institutions
• If I share my data, others may “scoop” my research, or reuse data without crediting me
Specific Challenges 2
• I’m not interested in sharing data, it’s mine.
• I’m interested in sharing data so long as I have to expend no effort whatsoever
• My data is in notebooks and not digital
• My data is digital but held on 3.5” disk somewhere in a drawer.
• My data can be shared, but you’ll have to make sense of it without my help
Approach taken by FISHNet
• FISHNet has no sticks to compel researchers to add data, only carrots.
• If we cannot force researchers to add data, how do we make it worth their while?
  • DOIs for datasets
  • Data curation/preservation services
  • Low barriers to deposit
  • Also, ease of access to related information (e.g. bibliographic info, images etc.)
FISHNet Information Object Types
• FBA Library Catalogue Items (Journal Articles, Monographs and Book Sections)
• E-Prints (still being finalised)
• Grey literature
• Images (including digitized Fritsch Sheets)
• Datasets
Traffic Light System for Data
• Simple categorisation:
  • Red = Catalogue of metadata entries for datasets
  • Orange = Data reviewed for reusability and assigned a DOI
  • Green = Published and freely downloadable by anybody (subject to appropriate license)
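The categorisation above can be sketched as a small enumeration. This is only an illustration: the names `Category` and `access_route` are hypothetical, and the access wording is paraphrased from the slides rather than taken from any FISHNet code.

```python
from enum import Enum


class Category(Enum):
    """The three traffic-light categories named on the slide."""
    RED = "red"        # catalogue entry: metadata about the dataset
    ORANGE = "orange"  # reviewed for reusability and assigned a DOI
    GREEN = "green"    # published and freely downloadable under a license


def access_route(category: Category) -> str:
    """Describe how a user would obtain the data (illustrative wording)."""
    if category is Category.GREEN:
        return "download directly, subject to the license"
    # For red and orange data, access stays under the owner's control.
    return "request from the dataset's contact person"
```

For example, `access_route(Category.ORANGE)` returns the same route as for red data, reflecting that a DOI alone does not make the data openly downloadable.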
Red Category Data
• Basic metadata about a dataset (29 fields)
  • Including: IPR owner, contact person
• Depositing the dataset itself is optional: institutions not wishing to deposit copies of their data in an external repository can simply list their holdings (e.g. CEH, EA)
• If a dataset is deposited, it receives only minimal QA and only bitstream preservation
• Users can request the data by contacting the relevant person
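A red-category entry could be modelled roughly as below. This is a sketch under stated assumptions: only the IPR owner and contact person are named in the slides, so the `title` field is hypothetical, and using a SHA-256 checksum is one common way to support bitstream preservation, not something the slides specify.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class RedRecord:
    # Only these two fields are named on the slide; the other 27 are not listed.
    ipr_owner: str
    contact_person: str
    title: str = ""               # hypothetical extra field
    sha256: Optional[str] = None  # stays None when holdings are listed but not deposited


def deposit(record: RedRecord, content: bytes) -> RedRecord:
    """Record a checksum of the deposited bytes so they can be verified later."""
    record.sha256 = hashlib.sha256(content).hexdigest()
    return record


def is_intact(record: RedRecord, content: bytes) -> bool:
    """Fixity check: do the stored bytes still match the recorded checksum?"""
    if record.sha256 is None:
        return False
    return hashlib.sha256(content).hexdigest() == record.sha256
```

Keeping the checksum optional mirrors the point that an institution can list a holding without handing over the data itself.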
Orange Category Data
• Repository staff ensure that the dataset meets appropriate reusability criteria and conforms to appropriate standards
• If criteria/standards are met, a DOI is assigned to the dataset via BL/DataCite.
• Potential for significantly increasing number of citations received for a particular piece of work.
• Addresses issues of academic credit for data
• Access is still determined by the owner, thus sidestepping issues of IPR & licensing
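To show how a DOI supports citation and credit, the sketch below turns a DOI into a resolvable link and a citation string. The function names, the citation layout and every value in the example are assumptions for illustration; `10.5555` is a placeholder prefix, not one assigned to FISHNet.

```python
def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI via the public doi.org proxy."""
    return "https://doi.org/" + doi


def cite(creator: str, year: int, title: str, publisher: str, doi: str) -> str:
    """Format a dataset citation; the layout is an illustrative choice."""
    return f"{creator} ({year}). {title}. {publisher}. {doi_url(doi)}"


# All values below are made up for illustration.
example = cite("Example, A.", 2011, "Example lake survey",
               "Example Repository", "10.5555/example-1")
```

Note that resolving the DOI would lead to the dataset's landing page, not necessarily to the data, since access remains with the owner.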
Green Category Data
• Data is published under an appropriate Creative Commons license and made freely available for download under those license terms.
• Datasets may also be fully indexed for searching (not just metadata).
• Data may be transformed into linked data (addressed by the sibling FISH.Link project):
  • Data is converted to RDF and added to a triple-store, thus making it possible to query via SPARQL
  • Data is published as linked data using a URI schema and database
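The conversion of tabular data to RDF can be sketched as follows, emitting N-Triples lines with only the standard library. The `example.org` URI scheme, the column names and the sample row are all hypothetical and are not taken from FISH.Link; the sketch also assumes column names are already safe to use inside a URI.

```python
BASE = "http://example.org/fishnet/"  # hypothetical URI scheme


def escape_literal(value: str) -> str:
    """Escape the characters that must be escaped inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def row_to_ntriples(row_id: str, row: dict) -> list:
    """Turn one tabular row into N-Triples lines, one triple per column."""
    subject = f"<{BASE}observation/{row_id}>"
    return [
        f'{subject} <{BASE}property/{column}> "{escape_literal(str(value))}" .'
        for column, value in row.items()
    ]


# Made-up sample row for illustration.
triples = row_to_ntriples("1", {"site": "Example Lake", "temperature_c": 11.5})

# Once such triples are loaded into a triple store, a SPARQL query along
# these lines could list every observation with its site:
QUERY = """
SELECT ?obs ?site WHERE {
  ?obs <http://example.org/fishnet/property/site> ?site .
}
"""
```

A real conversion would more likely use typed literals and shared vocabularies instead of plain strings and ad hoc property URIs; those are left out here to keep the sketch short.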
Issues (to be addressed)
• Compound data objects:
  • To date, our datasets have been single files
  • How should we represent compound/aggregated datasets that consist of multiple related files, e.g. by using OAI-ORE?
• Dynamic/changing datasets:
  • A user takes an existing dataset and modifies it by adding new data, annotating it etc. Who does the dataset then belong to, and how do you track that?
  • Datasets that are regularly updated/added to (e.g. a feed from a sensor)
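As one possible reading of the OAI-ORE option mentioned above, a compound dataset can be described by a resource map that points to an aggregation, which in turn lists its files. The sketch below builds those statements as plain tuples; the dataset URI and file names are hypothetical, and a complete resource map would carry further metadata that is omitted here.

```python
ORE = "http://www.openarchives.org/ore/terms/"
BASE = "http://example.org/fishnet/dataset/42"  # hypothetical identifier


def resource_map(files: list) -> list:
    """Return (subject, predicate, object) triples for a minimal ORE resource map."""
    rem = BASE + "/rem"          # the resource map document
    agg = BASE + "/aggregation"  # the compound dataset as a whole
    triples = [(rem, ORE + "describes", agg)]
    # One ore:aggregates statement per file in the compound dataset.
    triples += [(agg, ORE + "aggregates", f"{BASE}/files/{name}") for name in files]
    return triples


parts = resource_map(["counts.csv", "sites.csv", "readme.txt"])
```

This addresses only the compound-object question; it does not settle ownership or versioning of modified and regularly updated datasets.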