David De Roure

Creating Research Objects that contain collections of data, papers and research workflows. David De Roure. “Web as carrier pigeon”. BioEssays, 26(1):99–105, January 2004. http://research.microsoft.com/en-us/collaboration/fourthparadigm/. Big Data, Big Compute. The Future!


Presentation Transcript


  1. Creating Research Objects that contain collections of data, papers and research workflows David De Roure

  2. “Web as carrier pigeon”

  3. BioEssays, 26(1):99–105, January 2004 http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  4. Big Data Big Compute The Future! Compute & Data Complexity Social networking Conventional computation Social Complexity

  5. http://force11.org/

  6. Outline • The myExperiment experiment • Workflow Forever • Science fiction about science facts

  7. E. Science laboris • Data Analysis Pipelines • Workflows are the new rock and roll • Machinery for coordinating the execution of services and linking together resources • Repetitive and mundane boring stuff made easier

  8. Reuse, Recycling, Repurposing • Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle • Paul meets Jo. Jo is investigating Whipworm in mouse. • Jo reuses one of Paul’s workflows without change. • Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite. • Previously a manual two year study by Jo had failed to do this.

  9. Kepler Triana BPEL Trident Meandre Taverna Galaxy

  10. too passé! Not Facebook mySpace for scientists! too open!

  11. A probe into researcher behaviour • Open source (BSD) Ruby on Rails app • REST and SPARQL interfaces, supports Linked Data • Influenced BioCatalogue, MethodBox and SysMO-SEEK • “Facebook for Scientists”...but different to Facebook! • A repository of research methods • A community social network of people and things • A Social Virtual Research Environment myExperiment currently has 309 groups, 2553 workflows, 651 files and 264 packs - see wiki.myexperiment.org

  12. http://www.myexperiment.org/

  13. Paul’s Research Object (diagram) • Paul’s Pack contains Workflow 16, QTL, Results, Logs, Metadata, Slides, Paper, Common pathways, Workflow 13 and Results • Items are linked by the relations “produces”, “Included in”, “Published in” and “Feeds into”
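A pack like the one on this slide can be read as a named aggregation of items plus typed links between them. The following is a minimal Python sketch of that reading, not myExperiment's actual data model: the `Pack` class and its methods are hypothetical, and because the diagram was flattened in this transcript, the pairing of relations with items is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Pack:
    """Hypothetical pack: a named aggregation of items plus typed links."""
    title: str
    items: set = field(default_factory=set)
    links: list = field(default_factory=list)  # (subject, relation, object) tuples

    def add(self, *names):
        """Aggregate one or more items into the pack."""
        self.items.update(names)

    def relate(self, subject, relation, obj):
        """Record a typed link; both ends must already be in the pack."""
        for name in (subject, obj):
            if name not in self.items:
                raise KeyError(f"{name!r} is not in pack {self.title!r}")
        self.links.append((subject, relation, obj))

    def related(self, subject, relation):
        """Return the objects linked from `subject` by `relation`, in insertion order."""
        return [o for s, r, o in self.links if s == subject and r == relation]


pack = Pack("Paul's Pack")
pack.add("Workflow 16", "QTL", "Results", "Logs", "Paper", "Slides")

# Assumed pairings: the slide names the relations but the transcript
# does not preserve which items each relation connects.
pack.relate("Workflow 16", "produces", "Results")
pack.relate("Workflow 16", "produces", "Logs")
pack.relate("Results", "published in", "Paper")

print(pack.related("Workflow 16", "produces"))
```

Keeping the links inside the pack, rather than on the items, means the same workflow or data file can sit in several packs with different relationships in each.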

  14. method data

  15. SELECT ?wf ?uri WHERE { ?wf mebase:has-current-version ?v . ?v mecomp:executes-dataflow ?d . ?d mecomp:has-component ?c . ?c rdf:type mecomp:WSDLProcessor . ?c mecomp:processor-uri ?uri . } SELECT ?pack ?contrib WHERE { ?pack rdf:type mepack:Pack . ?pack ore:aggregates ?contrib . }
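Both queries on this slide are basic graph patterns: each line is a triple pattern, and variables shared between lines join them. The Python sketch below shows that join over a handful of in-memory triples. It is a toy matcher, not the myExperiment SPARQL endpoint. The prefixed predicate names come from the slide; every identifier in `TRIPLES` (including `mecomp:LocalProcessor` and the example.org URI) is invented for illustration.

```python
# Hypothetical data: predicates follow the slide, all identifiers are invented.
TRIPLES = [
    ("wf:16", "mebase:has-current-version", "v:16.2"),
    ("v:16.2", "mecomp:executes-dataflow", "d:1"),
    ("d:1", "mecomp:has-component", "c:1"),
    ("d:1", "mecomp:has-component", "c:2"),
    ("c:1", "rdf:type", "mecomp:WSDLProcessor"),
    ("c:1", "mecomp:processor-uri", "http://example.org/service?wsdl"),
    ("c:2", "rdf:type", "mecomp:LocalProcessor"),
    ("pack:1", "rdf:type", "mepack:Pack"),
    ("pack:1", "ore:aggregates", "wf:16"),
    ("pack:1", "ore:aggregates", "file:results"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


def select(variables, patterns, triples):
    """Project the matched bindings onto the requested variables."""
    return [tuple(b[v] for v in variables) for b in match(patterns, triples)]


# First query: workflows whose current version uses a WSDL service.
WSDL_QUERY = [
    ("?wf", "mebase:has-current-version", "?v"),
    ("?v", "mecomp:executes-dataflow", "?d"),
    ("?d", "mecomp:has-component", "?c"),
    ("?c", "rdf:type", "mecomp:WSDLProcessor"),
    ("?c", "mecomp:processor-uri", "?uri"),
]

# Second query: packs and the things they aggregate.
PACK_QUERY = [
    ("?pack", "rdf:type", "mepack:Pack"),
    ("?pack", "ore:aggregates", "?contrib"),
]

wsdl_rows = select(["?wf", "?uri"], WSDL_QUERY, TRIPLES)
pack_rows = select(["?pack", "?contrib"], PACK_QUERY, TRIPLES)
print(wsdl_rows)
print(pack_rows)
```

The first query walks from a workflow to its current version, dataflow and components, keeping only WSDL processors; the second lists what each pack aggregates.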

  16. Pack analysis • Paper - source for a paper • Tutorial - tutorial material • Data - collection of data files • Derived data - results of workflow • Benchmark - benchmarking data • Supplementary - stuff associated with a paper • Noise - tests, tryouts, rubbish • Oddity - none of the above Analysis by Sean Bechhofer • Workflow - pack contains a number of workflows • Presentation - encapsulation of a single presentation • Collection - a number of things (workflows/presentations/papers) • Heterogeneous - where the workflows do not appear to have a clear common purpose • Homogeneous - workflows appear to be designed to work together

  17. The R dimensions • Replayable. Studies might involve single investigations that happen in milliseconds or protracted processes that take years. • Referenceable. If research objects are to augment or replace traditional publication methods, then they must be referenceable or citeable. • Revealable. Third parties must be able to audit the steps performed in the research in order to be convinced of the validity of results. • Respectful. Explicit representations of the provenance, lineage and flow of intellectual property. • Reusable. The key tenet of Research Objects is to support the sharing and reuse of data, methods and processes. • Repurposeable. Reuse may also involve the reuse of constituent parts of the Research Object. • Repeatable. There should be sufficient information in a Research Object to be able to repeat the study, perhaps years later. • Reproducible. A third party can start with the same inputs and methods and see if a prior result can be confirmed. “Replacing the Paper: The Twelve Rs of the e-Research Record” on http://blogs.scilogs.com/eresearch/

  18. http://wf4ever.github.com/ro-primer/

  19. http://www.wf4ever-project.org/

  20. Research Object: Title and basic facts • Simple status indicators: Last execution, Stability, Decay, Annotations • Abstract (250 chars max.) • Key resources inside: 20 items in this RO, including 3 big workflows and a small pack • Aggregated resources (20) • Collapsed tabs: Evolution, Users’ opinion, Resources diagram • Popularity: Reused by 4 users, Cited by 3 users, Liked by 13 users

  21. Q. Are we locking into the paper process? Publish then filter – put everything out there, then see what sticks Web-Particle duality – versioning, conservation, preservation

  22. Research Record (diagram): REPRODUCE OR REPEAT? • Panels stack paper, workflow, software and machine layers and label the transitions between them “repeat” or “REPRODUCE” • blogs.scilogs.com/eresearch/

  23. openresearchsoftware.metajnl.com www.scfbm.org

  24. The Executable Thesis new data executable thesis PhD Student new results

  25. Notifications and automatic re-runs Self-repair Autonomic Curation New research? New computer science? Machines are users too

  26. Luna De Ferrari

  27. Knowledge Infrastructures Knowledge infrastructures comprise robust networks of people, artifacts, and institutions that generate, share, and maintain specific knowledge about the human and natural worlds Rethinking knowledge now that the facts aren't the facts, experts are everywhere, and the smartest person in the room is the room

  28. Discussion • Automation versus assistance • Letting humans get on with what they’re best at • Role of narrative and visualisation • The last mile to the brain • Data quality and uncertainty • Data wrangling is a significant task today • Provenance, peer-to-peer review? • Responsible Innovation • Who owns the intellectual property? • Who is responsible for damage? • Enabling or preventing a paradigm shift? • Encoding a research paradigm in the infrastructure?

  29. david.deroure@oerc.ox.ac.uk www.oerc.ox.ac.uk/people/dder blogs.scilogs.com/eresearch @dder http://www.myexperiment.org/packs/329

  30. Links • myExperiment project wiki http://wiki.myexperiment.org/ • Workflow Forever project (Wf4Ever) http://www.wf4ever-project.org/ • Future of Research Communication (FORCE11) http://force11.org/ • Fourth Paradigm http://research.microsoft.com/en-us/collaboration/fourthparadigm/

  31. • Jun Zhao, Jose Manuel Gomez-Perez, Khalid Belhajjame, Graham Klyne, Esteban Garcia-Cuesta, Aleix Garrido, Kristina Hettne, Marco Roos, David De Roure, Carole Goble, "Why Workflows Break - Understanding and Combating Decay in Taverna Workflows", accepted for eScience 2012, Chicago, October 2012. • Khalid Belhajjame, Oscar Corcho, Daniel Garijo, Jun Zhao, Paolo Missier, David Newman, Raul Palma, Sean Bechhofer, Esteban Garcia-Cuesta, Jose Manuel Gomez-Perez, Graham Klyne, Kevin Page, Marco Roos, Jose Enrique Ruiz, Stian Soiland-Reyes, Lourdes Verdes-Montenegro, David De Roure and Carole A. Goble, "Workflow-Centric Research Objects: First Class Citizens in Scholarly Discourse", SePublica 2012 at ESWC 2012, Greece, May 2012. • Carole A. Goble, David De Roure and Sean Bechhofer, "Accelerating scientists’ knowledge turns". In press for publication in Lecture Notes in Computer Science. • Sean Bechhofer, Iain Buchan, David De Roure, Paolo Missier, John Ainsworth, Jiten Bhagat, Philip Couch, Don Cruickshank, Mark Delderfield, Ian Dunlop, Matthew Gamble, Danius Michaelides, Stuart Owen, David Newman, Shoaib Sufi, Carole Goble, "Why linked data is not enough for scientists", Future Generation Computer Systems. • De Roure, D., Goble, C. and Stevens, R. (2009) The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future Generation Computer Systems 25, pp. 561-567. doi:10.1016/j.future.2008.06.010 • Goble, C.A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., and De Roure, D.: myExperiment: a repository and social network for the sharing of bioinformatics workflows, Nucl. Acids Res., 2010.
