This article explores the role of libraries in the data decade and provides seven steps for libraries to take in order to effectively deal with the challenges and opportunities presented by data management.
Acting as Advocate? Seven steps for libraries in the data decade. Dr Liz Lyon, Director, UKOLN, University of Bath, UK; Associate Director, UK Digital Curation Centre. IATUL Conference, Purdue University, June 2010. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0
http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009 • Open Science at Web-Scale • Scale, Complexity, Predictive Potential • Continuum of Openness • Citizen Science • Credentials, Incentives, Rewards • Institutional Readiness & Response • Data Informatics Capacity & Capability • Consultation: Write-To-Reply • Keynote Presentations: eResearch Australasia, Nov 2009; CNI, Baltimore, April 2010 • http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html
Data scale: Human Genome printed http://www.flickr.com/photos/johnjobby/2252981353/sizes/l/
...data logistic challenges.... • Large-scale data storage that is: • Cost-effective (rent on-demand) • Secure (privacy and IPR) • Robust and resilient • Low entry barrier / ease-of-use • Has data-handling / transfer / analysis capability • Move sequencing out of genome centres • “....analyse an entire human genome in a single day sitting with a laptop at your local Starbucks.” ...cloud services
Library Actions • Provide Briefings on Cloud Data Services (in partnership with local IT Services?)
Workflows, Models, Tools • Sage Bionetworks genomics workflow
An Idealised Scientific Research Data Lifecycle Model (diagram) • Key: Research Activity • Research Admin Activity • Archive Activity • Publication Activity • Information Flow
State-of-the-Art Report: Models & Tools (Alex Ball, June 2010) • Data Lifecycles • Data Policies (UK) incl. DMP • Standards & tools • Data Asset Framework (DAF) • DANS Seal of Approval • Preservation metadata • Archive management tools • Cost / benefit tools
Library Actions • Provide Briefings on Cloud Data Services (in partnership with local IT Services?) • Build usable Data Management Tools working in partnership with researchers
Keeping Research Data Safe 2 Report (April 2010) • Benefits Taxonomy: Summary
Library Actions • Provide Briefings on Cloud Data Services (in partnership with local IT Services?) • Build usable Data Management Tools working in partnership with researchers • Develop Data Sustainability Strategies and articulate the cost-benefits
Ethics, Privacy, Culture • “You have zero privacy anyway. Get over it.” Scott McNealy, CEO, Sun Microsystems, 1999
Post-genome decade • Human genomes: >24 published & almost 200 unpublished
“P4 medicine: Predictive, Personalised, Preventive, Participatory.” Leroy Hood, Institute for Systems Biology • “...medicine is going to become an information science...” • Image from Scientific American
P4 medicine • Each patient’s genome sequenced • Your genome is the basis of your medical record • New method to anonymise medical records for genomics research at Vanderbilt Univ (April 2010) • New predictive models of health and disease • Personalised treatments focus on preventative therapies • Genome-scale network biology • Genomic data as a commodity
“While many researchers are positive about sharing data in principle, they are almost universally reluctant in practice. ... using these data to publish results before anyone else is the primary way of gaining prestige in nearly all disciplines.” (INCREMENTAL Project)
Sage Bionetworks: Integrative genomics • Open data in the Sage Commons repository • Human and mouse: clinical and genetics data • Develop predictive models of disease: liver / breast / colon cancer, diabetes, obesity • Crowd-sourced effort: global scope • Stephen Friend
Participatory medicine : share data & empower the patient... Sage Congress San Francisco April 2010
Library Actions • Provide Briefings on Cloud Data Services (in partnership with local IT Services?) • Build usable Data Management Tools working in partnership with researchers • Develop Data Sustainability Strategies and articulate the cost-benefits • Publish Case Studies on Open Science to show benefits of universal data sharing
Library Actions • Provide Briefings on Cloud Data Services (in partnership with local IT Services?) • Build usable Data Management Tools working in partnership with researchers • Develop Data Sustainability Strategies and articulate the cost-benefits • Publish Case Studies on Open Science to show benefits of universal data sharing • Present at University Ethics Committee to highlight open data issues for faculty
Citizen scientist • Professional scientists: training; standards and ethics; peer-review; organisational support • Enthusiastic amateurs: local (natural history, environ.); global (astronomy); self-supporting
Citizen Science : validated in the professional press
Library Actions • Raise awareness of Citizen Science opportunities & guidelines for good practice
Data Publication and Attribution http://www.flickr.com/photos/digitalfemme57/3271063366/
What are we citing? • Journal Article • Workflow • Visualisation • Model • Data • Annotation • Concept • Attribution granularity: Macro to Micro / Nano
How to cite large-scale predictive network models? • Multiple data sources • Linked data approach • Visualise : Cytoscape • Workflow : Taverna • Provenance issues
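One way to picture the citation question on this slide is a record that can be cited at two granularities: the model as a whole (macro) and each contributing data source (micro). The following is a minimal sketch only, not a method from the talk. The class names, field names, and all example values (creators, titles, example.org identifiers) are hypothetical placeholders; only the tool names Taverna and Cytoscape come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataSource:
    """One contributing dataset (hypothetical structure)."""
    creator: str
    title: str
    year: int
    identifier: str  # e.g. a persistent URI; placeholder values below


@dataclass
class ModelCitation:
    """A citable record for a model derived from several data sources."""
    creator: str
    title: str
    year: int
    identifier: str
    tools: List[str] = field(default_factory=list)      # simple provenance note
    sources: List[DataSource] = field(default_factory=list)

    def macro(self) -> str:
        """Cite the model as a single object (coarse granularity)."""
        return f"{self.creator} ({self.year}). {self.title}. {self.identifier}"

    def micro(self) -> List[str]:
        """Cite each contributing source separately (fine granularity)."""
        return [
            f"{s.creator} ({s.year}). {s.title}. {s.identifier}"
            for s in self.sources
        ]


# Placeholder example data, invented purely for illustration.
model = ModelCitation(
    creator="Example Lab",
    title="Predictive network model of disease",
    year=2010,
    identifier="https://example.org/model/1",
    tools=["Taverna", "Cytoscape"],
    sources=[
        DataSource("Group A", "Clinical dataset", 2009, "https://example.org/data/a"),
        DataSource("Group B", "Genetics dataset", 2010, "https://example.org/data/b"),
    ],
)

print(model.macro())
for line in model.micro():
    print("  derived from:", line)
```

Keeping the sources inside the model record means the same object can answer both "what is the model?" and "whose data went into it?", which is the attribution-granularity issue the slide raises.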
Library Actions • Raise awareness of Citizen Science opportunities & guidelines for good practice • Promote Data Citation and Attribution to embed in publication practice and influence funder policy
Take homes... • Briefings on Cloud Data Services • Build usable Data Management Tools • Develop Data Sustainability Strategies • Publish Case Studies on Open Science • Present at University Ethics Committee • Raise awareness of Citizen Science • Promote Data Citation and Attribution ...Acting as Advocate
Thank you… Chicago Mart Plaza, 6-8 December 2010