(Linked) Data Curation challenges Kevin Ashley Director, Digital Curation Centre www.dcc.ac.uk Kevin.ashley@ed.ac.uk Reusable with attribution: CC-BY The DCC is supported by Jisc
Acknowledgements • John Wilkins & Cameron Neylon • Ideas, images, slides, inspiration Kevin Ashley – CC-BY
Data views and processes
• Administration
• Discovery
• Work-level description
• Discipline-level interpretation
Administrative view
Data produced by the department of linguistics
Data from projects funded by NERC
Discovery view
Data about reproductive behaviour in freshwater fish
Work-level description
Data is variable
• Not always textual
• Not always tabular
• Not always fixed
• Not always clearly authored – think of archival provenance
• Not always associated with publication
95% of research results are never published
http://www.flickr.com/photos/sethw/113073189/
If a million postdocs repeat a million experiments…
http://flickr.com/photos/heymans/480396810/
And 25% of those don’t work…
http://flickr.com/photos/cliche/120070310/
…how much taxpayers’ money is that?
http://flickr.com/photos/luismimunoznajar/2093185804/
“I need that data now!!! I don’t care how messy it is – I can fix it!”
“I’ve wasted too much of my life fixing other people’s bad data. I’m not interested until you’ve cleaned it up and documented it. Besides, I have other things to think about.”
Grandfather’s axe
When is my dataset a new dataset?
coconinoco@flickr.com CC-BY-NC-SA
Authorship
• Reference data – cell-level provenance versus single author data table
• ‘Cleaned’ data – can pass through many hands
• Synthesis…
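The contrast between cell-level provenance and a single-author data table can be sketched in a few lines. This is only an illustration: the `Cell` type, the contributor labels and the values are all hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One value plus the party who supplied or last corrected it."""
    value: float
    contributor: str  # hypothetical contributor label


# Single-author table: one attribution covers every value.
single_author_table = {
    "author": "A. Researcher",  # hypothetical name
    "rows": [[1.0, 2.0], [3.0, 4.0]],
}

# Reference data: each cell carries its own attribution, so 'cleaned'
# values that passed through other hands remain traceable.
reference_table = [
    [Cell(1.0, "lab-a"), Cell(2.0, "lab-b")],
    [Cell(3.0, "lab-a"), Cell(4.1, "curator-c")],
]


def contributors(table):
    """Return the sorted set of contributors found across all cells."""
    return sorted({cell.contributor for row in table for cell in row})
```

Calling `contributors(reference_table)` lists everyone who touched the table, which a single `author` field cannot express.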
Potential wins
• Provenance of machine-gathered data – linking observations to instrument descriptions
• Linking data in multiple places
• Data and publications and plans
• Robust assertions about data versioning
• Association of data with institutions
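As a rough sketch of what some of these wins might look like in practice, the snippet below stores a handful of subject-predicate-object statements and follows them from an observation to its instrument description, and from a dataset back through its earlier versions. Every identifier, predicate name and value here is a hypothetical placeholder; a real deployment would presumably use an RDF store and an established vocabulary.

```python
# Hypothetical statements as (subject, predicate, object) triples.
TRIPLES = [
    ("obs:42", "measuredBy", "inst:thermo-7"),
    ("obs:42", "partOf", "dataset:river-temps-v2"),
    ("inst:thermo-7", "description", "Hypothetical temperature probe"),
    ("inst:thermo-7", "calibratedOn", "2013-01-15"),
    ("dataset:river-temps-v2", "revisionOf", "dataset:river-temps-v1"),
    ("dataset:river-temps-v2", "heldBy", "org:example-university"),
]


def objects(triples, subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instrument_facts(triples, observation):
    """Map each instrument behind an observation to what is known about it."""
    facts = {}
    for inst in objects(triples, observation, "measuredBy"):
        facts[inst] = {p: o for s, p, o in triples if s == inst}
    return facts


def version_chain(triples, dataset):
    """Follow revisionOf links back to the earliest known version."""
    chain = [dataset]
    while True:
        previous = objects(triples, chain[-1], "revisionOf")
        if not previous or previous[0] in chain:  # stop at the root or on a cycle
            return chain
        chain.append(previous[0])
```

For example, `version_chain(TRIPLES, "dataset:river-temps-v2")` walks from the current dataset to its predecessor, and `objects(TRIPLES, "dataset:river-temps-v2", "heldBy")` gives the associated institution.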
networks of people…
More wins
• Assertions at table and variable group level
• Linking that crosses disciplinary boundaries:
    • Biochemistry and neuroscience
    • Naval history, economics and climate science
• Linking that crosses research and administrative boundaries
IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases
After John Wilbanks
SameAs:
• Tylenol
• N-acetyl-p-aminophenol
• Acetaminophen
• Paracetamol
• N-(4-hydroxyphenyl)ethanamide
• N-(4-hydroxyphenyl)acetamide
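One way to read this slide is that pairwise "same as" assertions, made by different people in different places, can be merged so that all six names resolve to one thing. The sketch below does this with a small union-find structure. The particular pairings are assumptions made for illustration; only the six names come from the slide.

```python
from collections import defaultdict

# Hypothetical pairwise sameAs assertions between the names on the slide.
SAME_AS = [
    ("Tylenol", "Acetaminophen"),
    ("Acetaminophen", "Paracetamol"),
    ("Paracetamol", "N-(4-hydroxyphenyl)acetamide"),
    ("N-(4-hydroxyphenyl)acetamide", "N-(4-hydroxyphenyl)ethanamide"),
    ("N-acetyl-p-aminophenol", "Paracetamol"),
]


def equivalence_classes(pairs):
    """Group names into sets connected by any chain of sameAs assertions."""
    parent = {}

    def find(name):
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for name in list(parent):
        groups[find(name)].add(name)
    return list(groups.values())


def same_thing(pairs, a, b):
    """True if a and b fall in the same equivalence class."""
    return any(a in group and b in group for group in equivalence_classes(pairs))
```

With these assumed pairings, `same_thing(SAME_AS, "Tylenol", "N-acetyl-p-aminophenol")` holds even though no one asserted that pair directly, which is the point of linking. Treating a brand name as strictly identical to a compound is a simplification that a real vocabulary might model more carefully.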
“I never had an idea that couldn’t be improved by sharing it with as many people as possible…”
Bill Hooker (2006) http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html
Challenge? Opportunity
• Linked data can improve administration of research and research data
• The real potential is in improving research quality and efficiency
• The same actors can’t do both
• The actions don’t need to be in lock-step