Digital Curation or Digital Data? The impact of Services and Federation Phil Lord Newcastle University
Take Home Messages • Curation is important for the CARMEN project and neuroinformatics • To enable repeatability and rerunability, curation of services is as important as curation of data • To enable federation and autonomy, data release, license and other policies need to be represented so that they can be computed over.
Research Challenge • Worldwide, >100,000 neuroscientists (~5,000 in the UK) are generating vast amounts of data • Principal experimental data formats: molecular (genomic/proteomic); neurophysiological (time-series electrical measures of activity); anatomical (spatial); behavioural • Neuroinformatics concerns how these data are handled and integrated, including the application of computational modelling • Understanding the brain may be the greatest informatics challenge of the 21st century
Need for Cooperation • The OECD Neuroinformatics Working Group identified the need to work cooperatively in order to achieve major advances • Cooperation will permit: development of common processes; best value from data, including long term curation; ‘mega-analysis’ of large data sets; integration of data sets across different scales and different approaches; interdisciplinary research
CARMEN – Focus on Neural Activity • Resolving the ‘neural code’ from the timing of action potential activity • Raw voltage signal data collected by patch-clamp and single- and multi-electrode array recording • Novel optical recording, particularly of the activity dynamics of large networks [Figure labels: neurone 1, neurone 2, neurone 3]
CARMEN is a new UK research council funded e-Science Pilot Project in neuroinformatics. • To create a grid-enabled, real time ‘virtual laboratory’ environment for neurophysiological data • To develop an extensible ‘toolkit’ for data extraction, analysis and modelling • To provide a repository for archiving, sharing, integration and discovery of data • To achieve wide community and commercial engagement in developing and using CARMEN • CARMEN is a 4-year project: if it is to last longer, it must become financially self-sufficient. • See http://www.carmen.org.uk
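Dynamic Service Deployment - Dynasoar [Figure: a Client sends a request (req) to the WSP on a Web Server and receives a response (res); in step 2 (service fetch & deploy), services such as s2 and s5 are fetched from the Service Repository (SR) and deployed on Compute Machines (node 1, node 2, … node n). Other labels: C, R, CAIRN.]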
Distribution and Federation Initially, we plan to have two CAIRNs
What about digital curation? Courtesy of Wikipedia
CARMEN’s perspective • We wish to store data, store its provenance, store its usage. • We need release policies, we need retention policies, we need to understand ownership
What do we get from this? • Replicability: one scientist should be able to repeat another’s experiment, under equivalent conditions, at a different time. • Rerunability: a scientist should be able to apply an equivalent technique under new circumstances. • The addition of services into this mix complicates the issue. [Figure labels: Replicability, Rerunability, Old Data, New Data]
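To make the distinction concrete, one could record which data set and which service version each analysis run used, and then compare two runs. The following is a minimal sketch under stated assumptions: the `Run` record, its fields and the `classify` labels are hypothetical names chosen for illustration and do not come from CARMEN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Hypothetical provenance record for one analysis run."""
    data_id: str          # identifier of the input data set
    service_name: str     # name of the analysis service
    service_version: str  # version of the service that was deployed


def classify(original: Run, repeat: Run) -> str:
    """Label a repeat run relative to the original run."""
    same_data = original.data_id == repeat.data_id
    same_service = (
        original.service_name == repeat.service_name
        and original.service_version == repeat.service_version
    )
    if same_data and same_service:
        return "replication"  # old data, old service
    if same_service:
        return "rerun on new data"  # new data, old service
    if same_data:
        return "old data, changed service"
    return "new data, changed service"


original = Run("recording-001", "spike-sorter", "1.0")
print(classify(original, Run("recording-001", "spike-sorter", "1.0")))
print(classify(original, Run("recording-002", "spike-sorter", "1.0")))
print(classify(original, Run("recording-001", "spike-sorter", "2.0")))
```

The last two labels are where services complicate matters: once the service has changed, the record alone cannot say whether the change is a comparable one.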
[Figure: Replicability and Rerunability set against Old Data, New Data, Old Services and New Services. Roles shown: Eager Neuroscientist, Neuroscientist comparing to existing work, Tool Builder, Error-Prone Neuroscientist. Questions raised: Has the state of the world advanced since previously? Has the world changed in a comparable way? Has the service changed in a comparable way? Is the specification of what happened actually right?]
So, what is the problem? • I would like to rerun this experiment and release the results. Can I? • Is the new data available? • Is the new data public? • Does the license allow derived results? • Who owns the derived results? • data license • software license
So, what’s the problem? • Can I compare how new data would have changed the results? • Is that data available? (New and Old) • Is that data public? (New and Old) etc… • Is it embargoed – will it become public later? • Do the licenses allow derived results? • Who owns the derived results? • The licenses may conflict
Policy Issues • One of the main purposes of the CAIRN is to hide the distribution. • What if the CAIRNs have different release policies? What if they have different licenses? • We cannot inflict these differences on the user. • Therefore, we must be able to compute over policies • We must be able to represent justifications back to the users
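Computing over policies can be sketched as follows: each CAIRN attaches a machine-readable release policy to its data, and a single function returns both a decision and the justification to show the user. This is an illustrative sketch only; `ReleasePolicy`, its fields, and the CAIRN names are assumptions and not part of CARMEN.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ReleasePolicy:
    """Hypothetical release policy for a data set held at one CAIRN."""
    cairn: str                     # which CAIRN holds the data
    public: bool                   # whether the data may ever be released
    embargo_until: Optional[date]  # None means there is no embargo


def can_release(policy: ReleasePolicy, today: date) -> Tuple[bool, str]:
    """Decide for one policy and return a justification string."""
    if not policy.public:
        return False, f"{policy.cairn}: data is private"
    if policy.embargo_until is not None and today < policy.embargo_until:
        until = policy.embargo_until.isoformat()
        return False, f"{policy.cairn}: embargoed until {until}"
    return True, f"{policy.cairn}: released"


def can_release_all(
    policies: List[ReleasePolicy], today: date
) -> Tuple[bool, List[str]]:
    """A result derived from several data sets needs every policy to agree."""
    decisions = [can_release(p, today) for p in policies]
    allowed = all(ok for ok, _ in decisions)
    reasons = [why for _, why in decisions]
    return allowed, reasons


policies = [
    ReleasePolicy("cairn-a", public=True, embargo_until=None),
    ReleasePolicy("cairn-b", public=True, embargo_until=date(2030, 1, 1)),
]
allowed, reasons = can_release_all(policies, date(2029, 6, 1))
print(allowed)
for reason in reasons:
    print(reason)
```

The user sees one answer plus the reasons behind it, without needing to know which CAIRN held which data set.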
An Example: Licensing • Computationally amenable licenses are available • Take, for example, Creative Commons
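As an illustration of why Creative Commons licenses are computationally amenable, their codes are built from a small set of elements (BY, NC, ND, SA) that a program can reason over. The sketch below uses a deliberately simplified model of how those elements combine; the function names and the combination rules as coded here are assumptions for illustration, not a complete statement of the licenses.

```python
from typing import FrozenSet, List, Optional, Tuple

ORDER = ("BY", "NC", "ND", "SA")  # canonical element order in CC codes


def elements(code: str) -> FrozenSet[str]:
    """Split a code such as 'BY-NC-SA' into its elements."""
    return frozenset(code.upper().split("-"))


def to_code(elems: FrozenSet[str]) -> str:
    """Join elements back into a code in canonical order."""
    return "-".join(e for e in ORDER if e in elems)


def derived_license(codes: List[str]) -> Tuple[Optional[str], str]:
    """Return (license for derived results or None, justification)."""
    sets = [elements(c) for c in codes]
    for code, elems in zip(codes, sets):
        if "ND" in elems:  # NoDerivatives blocks derived results outright
            return None, f"{code} does not allow derived results"
    share_alike = {s for s in sets if "SA" in s}
    if len(share_alike) > 1:
        return None, "inputs carry different share-alike licenses"
    if share_alike:
        # ShareAlike fixes the output license; every other input must
        # impose no condition that this license lacks.
        target = next(iter(share_alike))
        for code, elems in zip(codes, sets):
            if not elems <= target:
                return None, f"{code} conflicts with {to_code(target)}"
        return to_code(target), "share-alike input fixes the license"
    combined = frozenset().union(*sets)
    return to_code(combined), "union of the input conditions"


print(derived_license(["BY", "BY-NC"]))
print(derived_license(["BY-NC", "BY-SA"]))
print(derived_license(["BY-ND", "BY"]))
```

Returning the justification alongside the decision means a conflict between licenses can be reported back to the user rather than silently refused.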
Take Home Messages • Curation is important for the CARMEN project and neuroinformatics • To enable repeatability and rerunability, curation of services is as important as curation of data • To enable federation and autonomy, data release, license and other policies need to be represented so that they can be computed over.
Acknowledgements Professor Colin Ingram, Professor Jim Austin, Professor Leslie Smith, Professor Paul Watson, Dr. Stuart Baker, Professor Roman Borisyuk, Dr. Stephen Eglen, Professor Jianfeng Feng, Dr. Kevin Gurney, Dr. Tom Jackson, Dr. Marcus Kaiser, Dr. Phillip Lord, Dr. Paul Overton, Dr. Stefano Panzeri, Dr. Rodrigo Quian Quiroga, Dr. Simon Schultz, Dr. Evelyne Sernagor, Dr. V. Anne Smith, Dr. Tom Smulders, Professor Miles Whittington, Christoph Echtermeyer, Martyn Fletcher, Frank Gibson, Mark Jessop, Dr. Bojian Liang, Juan Martinez-Gomez, Dr. Chris Mountford, Agah Ogungboye, Georgios Pitsilis, Dr. Daniel Swan. The University of Sheffield, University of St Andrews