Creating Research Objects that contain collections of data, papers and research workflows. David De Roure. “Web as carrier pigeon”. BioEssays, 26(1):99–105, January 2004. http://research.microsoft.com/en-us/collaboration/fourthparadigm/ Big Data Big Compute. The Future!
Creating Research Objects that contain collections of data, papers and research workflows
David De Roure
BioEssays, 26(1):99–105, January 2004 http://research.microsoft.com/en-us/collaboration/fourthparadigm/
[Figure: quadrant diagram. Axes: Compute & Data Complexity; Social Complexity. Labels: Big Data, Big Compute; The Future!; Social networking; Conventional computation]
Outline
• The myExperiment experiment
• Workflow Forever
• Science fiction about science facts
E. Science laboris
• Data Analysis Pipelines
• Workflows are the new rock and roll
• Machinery for coordinating the execution of services and linking together resources
• Repetitive and mundane boring stuff made easier
Reuse, Recycling, Repurposing
• Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle.
• Paul meets Jo. Jo is investigating Whipworm in the mouse.
• Jo reuses one of Paul’s workflows without change.
• Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.
• Previously, a manual two-year study by Jo had failed to do this.
Kepler, Triana, BPEL, Trident, Meandre, Taverna, Galaxy
too passé! Not Facebook mySpace for scientists! too open!
A probe into researcher behaviour
• Open source (BSD) Ruby on Rails app
• REST and SPARQL interfaces, supports Linked Data
• Influenced BioCatalogue, MethodBox and SysMO-SEEK
• “Facebook for Scientists”... but different to Facebook!
• A repository of research methods
• A community social network of people and things
• A Social Virtual Research Environment
myExperiment currently has 309 groups, 2553 workflows, 651 files and 264 packs - see wiki.myexperiment.org
Paul’s Research Object
[Figure: Paul’s Pack. Items: Workflow 16, QTL, Results, Logs, Metadata, Slides, Paper, Common pathways, Workflow 13, Results. Link labels: produces, Included in, Published in, Feeds into]
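One way to read the pack diagram is as a set of typed links between items. The sketch below is a minimal illustration only: the class `ResearchObject` and its methods are hypothetical names, and the specific links are assumed for the example, since the arrows of the original figure cannot be recovered from the slide text.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchObject:
    """A named aggregation of items joined by typed links (hypothetical model)."""
    title: str
    links: list = field(default_factory=list)  # (subject, relation, object) tuples

    def add(self, subject, relation, obj):
        self.links.append((subject, relation, obj))

    def items(self):
        """All items mentioned in any link, in first-seen order."""
        seen = []
        for subject, _, obj in self.links:
            for item in (subject, obj):
                if item not in seen:
                    seen.append(item)
        return seen

    def related(self, subject, relation):
        """Objects reachable from `subject` through `relation`."""
        return [o for s, r, o in self.links if s == subject and r == relation]


# ASSUMPTION: these links are illustrative guesses, not read off the figure.
ro = ResearchObject("Paul's Pack")
ro.add("Workflow 16", "produces", "Results")
ro.add("Results", "feeds into", "Workflow 13")
ro.add("Workflow 13", "produces", "Common pathways")
ro.add("Results", "published in", "Paper")

if __name__ == "__main__":
    print(ro.items())
    print(ro.related("Workflow 16", "produces"))
```

Keeping the links as plain triples mirrors the Linked Data style that myExperiment exposes, so the same structure could later be serialised as RDF.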
method data
# Workflows whose current version uses a WSDL service, with the service URI
SELECT ?wf ?uri
WHERE {
  ?wf mebase:has-current-version ?v .
  ?v mecomp:executes-dataflow ?d .
  ?d mecomp:has-component ?c .
  ?c rdf:type mecomp:WSDLProcessor .
  ?c mecomp:processor-uri ?uri .
}

# Packs and the items they aggregate
SELECT ?pack ?contrib
WHERE {
  ?pack rdf:type mepack:Pack .
  ?pack ore:aggregates ?contrib .
}
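To make the second query concrete, here is a minimal sketch of what it computes, using a toy in-memory list of triples instead of a real SPARQL endpoint. All identifiers in `TRIPLES` (such as `pack:1` or `ex:Workflow`) are made up for illustration, and the function `packs_and_contents` is a hypothetical helper, not part of myExperiment.

```python
# Toy triples; every identifier here is invented for the example.
TRIPLES = [
    ("pack:1", "rdf:type", "mepack:Pack"),
    ("pack:1", "ore:aggregates", "workflow:16"),
    ("pack:1", "ore:aggregates", "file:qtl"),
    ("workflow:16", "rdf:type", "ex:Workflow"),
    ("pack:2", "rdf:type", "mepack:Pack"),
    ("pack:2", "ore:aggregates", "workflow:13"),
]


def packs_and_contents(triples):
    """Mimic: SELECT ?pack ?contrib WHERE {
                 ?pack rdf:type mepack:Pack .
                 ?pack ore:aggregates ?contrib . }
    """
    # First pattern: bind ?pack to every subject typed as a Pack.
    packs = {s for s, p, o in triples if p == "rdf:type" and o == "mepack:Pack"}
    # Second pattern: join on ?pack and bind ?contrib to each aggregated item.
    return sorted(
        (s, o) for s, p, o in triples if p == "ore:aggregates" and s in packs
    )


if __name__ == "__main__":
    for pack, contrib in packs_and_contents(TRIPLES):
        print(pack, contrib)
```

The two set operations correspond to the two triple patterns in the query: the shared variable `?pack` is what joins them.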
Pack analysis
• Paper - source for a paper
• Tutorial - tutorial material
• Data - collection of data files
• Derived data - results of workflow
• Benchmark - benchmarking data
• Supplementary - stuff associated with a paper
• Noise - tests, tryouts, rubbish
• Oddity - none of the above

• Workflow - pack contains a number of workflows
• Presentation - encapsulation of a single presentation
• Collection - a number of things (workflows/presentations/papers)
• Heterogeneous - where the workflows do not appear to have a clear common purpose
• Homogeneous - workflows appear to be designed to work together
Analysis by Sean Bechhofer
The R dimensions
• Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years.
• Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable.
• Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results.
• Respectful. Explicit representations of the provenance, lineage and flow of intellectual property.
• Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes.
• Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object.
• Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later.
• Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed.
“Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.scilogs.com/eresearch/
Research Object: Title and basic facts
Simple status indicators: Last execution, Stability, Decay, Annotations
Abstract (250 chars max.)
Key resources inside: 20 items in this RO, including 3 big workflows and a small pack
Aggregated resources (20)
Collapsed tabs: Evolution, Users’ opinion, Resources diagram
Popularity: Reused by 4 users, Cited by 3 users, Liked by 13 users
Q. Are we locking into the paper process?
• Publish then filter: put everything out there, then see what sticks
• Web-Particle duality: versioning, conservation, preservation
Research Record
[Figure: three panels relating paper, software, workflow (wf) and machine. Panel labels: repeat; REPRODUCE; REPRODUCE OR REPEAT?]
blogs.scilogs.com/eresearch/
openresearchsoftware.metajnl.com www.scfbm.org
The Executable Thesis
[Figure labels: new data, executable thesis, PhD Student, new results]
Autonomic Curation
• Notifications and automatic re-runs
• Self-repair
• New research? New computer science?
• Machines are users too
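The idea of notifications and automatic re-runs can be sketched in a few lines. This is an assumption-laden illustration, not a description of any Wf4Ever or myExperiment component: the function `rerun_if_inputs_changed`, its parameters, and the toy averaging "workflow" are all hypothetical.

```python
def rerun_if_inputs_changed(workflow, inputs, last_inputs, last_result, notify):
    """Re-run `workflow` only when the inputs differ from the previous run.

    `workflow` is any callable taking the inputs; `notify` is any callable
    taking a message string. Both are placeholders for real infrastructure.
    """
    if inputs == last_inputs:
        return last_result           # nothing new: keep the recorded result
    result = workflow(inputs)        # automatic re-run on new data
    if result != last_result:
        notify(f"result changed from {last_result} to {result}")
    return result


def average(values):
    """Toy stand-in for a research workflow."""
    return sum(values) / len(values)


messages = []
first = rerun_if_inputs_changed(average, [1, 2, 3], None, None, messages.append)
second = rerun_if_inputs_changed(average, [1, 2, 3, 6], [1, 2, 3], first, messages.append)
third = rerun_if_inputs_changed(average, [1, 2, 3, 6], [1, 2, 3, 6], second, messages.append)

if __name__ == "__main__":
    print(first, second, third)
    print(messages)
```

Passing `notify` as a callable means the recipient can be a person or another program, which matches the slide's point that machines are users too.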
Knowledge Infrastructures
Knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds.
Rethinking knowledge now that the facts aren't the facts, experts are everywhere, and the smartest person in the room is the room.
Discussion
• Automation versus assistance
• Letting humans get on with what they’re best at
• Role of narrative and visualisation
• The last mile to the brain
• Data quality and uncertainty
• Data wrangling is a significant task today
• Provenance, peer-to-peer review?
• Responsible Innovation
• Who owns the intellectual property?
• Who is responsible for damage?
• Enabling or preventing a paradigm shift?
• Encoding a research paradigm in the infrastructure?
david.deroure@oerc.ox.ac.uk www.oerc.ox.ac.uk/people/dder blogs.scilogs.com/eresearch @dder http://www.myexperiment.org/packs/329
Links
• myExperiment project wiki http://wiki.myexperiment.org/
• Workflow Forever project (Wf4Ever) http://www.wf4ever-project.org/
• Future of Research Communication (FORCE11) http://force11.org/
• Fourth Paradigm http://research.microsoft.com/en-us/collaboration/fourthparadigm/
Jun Zhao, Jose Manuel Gomez-Perez, Khalid Belhajjame, Graham Klyne, Esteban Garcia-Cuesta, Aleix Garrido, Kristina Hettne, Marco Roos, David De Roure, Carole Goble, "Why Workflows Break - Understanding and Combating Decay in Taverna Workflows", accepted for eScience 2012, Chicago, October 2012.
Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia-Cuesta, Jose Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure and Carole A. Goble, "Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse", SePublica 2012 at ESWC 2012, Greece, May 2012.
Carole A. Goble, David De Roure and Sean Bechhofer, "Accelerating scientists' knowledge turns". In press for publication in Lecture Notes in Computer Science.
Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems.
De Roure, D., Goble, C. and Stevens, R. (2009) "The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows". Future Generation Computer Systems 25, pp. 561-567. doi:10.1016/j.future.2008.06.010
Goble, C.A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., and De Roure, D., "myExperiment: a repository and social network for the sharing of bioinformatics workflows", Nucl. Acids Res., 2010.