
Privacy issues in integrating R environment in scientific workflows

Dr. Zhiming Zhao, University of Amsterdam, Virtual Laboratory for e-Science.

Presentation Transcript


  1. Privacy issues in integrating R environment in scientific workflows. Dr. Zhiming Zhao, University of Amsterdam, Virtual Laboratory for e-Science. Privacy issues in integrating Legacy Experiment Environment to Scientific Workflows. Zhiming Zhao, Dmitry A. Vasunin, Adianto Wibisono, Adam Belloum, Cees de Laat, Pieter Adriaans, Bob Hertzberger

  2. Outline • Scientific experiments and R • Problem description • Optional solutions • Experimental results • Summarizing discussion • Future work

  3. Scientific experiments and support systems [Figure. Prototype (on small data scale): Define goal; Prototype the algorithm; Computing (Test with small data); Vis./Int. (Validation); Refine; Refine. Experiment (on full data scale): Finding & Dissemination; Apply to full size data; Data analysis.] In such scenarios: • Existing experiment environments, such as R, are widely used by domain scientists • Human-in-the-loop computing is important for testing and validating prototypes • Scientific workflows are used to manage different processes and the experiment lifecycle

  4. R and workflow support in VL-e • R provides rich functionality for statistics and visualisation, and has been used as an important experimental environment in the bio-sciences. • R needs scientific workflow support: accessing different e-Science resources; being coordinated with the other components in a large-scale experiment • E-Science workflows in certain domains also need R: reusing the advanced results from legacy systems; supporting experiments developed on legacy systems • Workflow support in VL-e: four systems are recommended; Taverna, Kepler and VLAM support R; a generic solution is under construction

  5. R in scientific workflows: current solutions [Figure labels: User Desktop; Wf system; Socket; Remote node; S; Remote R Env.; WS; W; L; Local R Env.] Three types of solutions • Local: local installation of R, driven through the command line interface of R. Simple configuration; performance bottleneck • Web Service: SOAP is used to pass R scripts and objects. Standard interface, distributed computing; high latency • TCP Socket: socket interface (RServe). Distributed computing, maintains state; poor security
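The "Local" option above can be illustrated with a minimal Python sketch. It assumes a local R installation that provides the standard `Rscript` front end on the PATH; the helper names `build_rscript_command` and `run_r_expression` are hypothetical and are not taken from Taverna, Kepler or VLAM.

```python
import shutil
import subprocess


def build_rscript_command(expression):
    """Return the argument list that evaluates one R expression via Rscript."""
    # "Rscript -e <expr>" evaluates the expression and exits.
    return ["Rscript", "-e", expression]


def run_r_expression(expression, timeout=60):
    """Run an R expression in a local R process and return its standard output.

    Assumption: Rscript is installed locally and reachable through PATH.
    """
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found; a local R installation is assumed")
    completed = subprocess.run(
        build_rscript_command(expression),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as text
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=timeout,
    )
    return completed.stdout
```

A workflow task could call, for example, `run_r_expression("cat(mean(c(1, 2, 3)))")`. Each call starts a fresh R process, so no state is kept between calls and all computation stays on the user's machine, which is consistent with the "simple configuration, performance bottleneck" trade-off listed on the slide.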

  6. Typical scenario of RServe and requirements on privacy [Figure labels: WF1; WF2; R; Display.] Different levels of privacy issues • Data level: intermediate results are not to be seen by other users • Communication level (graphical display): remote X display and interaction between multiple users

  7. Problem description and desired solution • Problem description • Most legacy experiment environments do not have strong security management • Workflow systems provide integration without considering security issues • Remote environments are required to be deployed securely • Desired solution • Use existing technologies • Provide solutions to privacy issues at the workflow level, preferably in a transparent way

  8. Experiments • Review optional solutions • Investigate the overhead of security enhancement on the workflow execution
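One simple way to quantify such overhead is to time the same workflow step with and without the security enhancement and report the relative difference. The Python sketch below only illustrates that idea; the function names are hypothetical, and the slides do not state how the authors actually measured overhead.

```python
import time


def mean_runtime(task, repeats=5):
    """Return the mean wall-clock time (seconds) of calling task() `repeats` times."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    for _ in range(repeats):
        task()
    return (time.perf_counter() - start) / repeats


def relative_overhead(baseline_seconds, secured_seconds):
    """Return the extra time of the secured run as a fraction of the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return (secured_seconds - baseline_seconds) / baseline_seconds
```

For instance, a baseline of 2.0 s and a secured run of 2.5 s give `relative_overhead(2.0, 2.5) == 0.25`, that is, 25% overhead.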

  9. Different configurations and their level of security

  10. An experiment: Taverna, RServe and security tunnel • Experiment: adding security enhancement in Taverna; protecting the data channels between Taverna and RServe • Overhead: setting up security tunnels; runtime data transfer
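The slide does not say which tunnelling technology was used. As one possible illustration, the Python sketch below builds an OpenSSH local port-forwarding command that would carry the RServe connection over an encrypted channel. SSH itself, the host name, the user name and the helper name `build_ssh_tunnel_command` are all assumptions; 6311 is the default RServe port.

```python
RSERVE_DEFAULT_PORT = 6311  # default TCP port used by RServe


def build_ssh_tunnel_command(remote_host, user,
                             local_port=RSERVE_DEFAULT_PORT,
                             remote_port=RSERVE_DEFAULT_PORT):
    """Return an ssh argument list that forwards a local port to remote RServe.

    -N : do not run a remote command, only forward ports
    -L : forward local_port on this machine to 127.0.0.1:remote_port,
         as seen from the remote host
    """
    forward = f"{local_port}:127.0.0.1:{remote_port}"
    return ["ssh", "-N", "-L", forward, f"{user}@{remote_host}"]
```

With such a tunnel running (for example started through `subprocess.Popen`), the workflow engine would connect to `localhost:6311` instead of the remote node, and RServe on the remote node could be restricted to local connections only. Starting the tunnel corresponds to the set-up overhead named on the slide, and the encryption of traffic to the runtime data-transfer overhead.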

  11. Summarizing discussion • Integrating existing experiment environments with workflow systems is important for rapid prototyping • Privacy protection is demanded by both users and the e-Science infrastructure, and can be viewed as a generic issue when integrating a user-interaction-enabled legacy component into a workflow • Privacy protection can be achieved to a certain level by customizing the workflow execution • Enhancing the security of workflow execution does not necessarily incur a high execution penalty

  12. Future work • In the VL-e project, we are developing a bus-style generic solution for different workflow systems • Data privacy will be taken into account when realizing the interoperability between different workflow systems

  13. Activities • Int’l workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in the context of ICCS (2006, Reading University; 2007, Beijing, China). • Proceedings are published in LNCS, Springer Verlag. • A special issue will be published in the Scientific Programming Journal. • http://staff.science.uva.nl/~zhiming/iccs-wses • Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in the context of the IEEE e-Science and Grid Computing conference in Amsterdam, December 2006. • Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California) • BPEL, Dr. Dieter König (IBM Research Germany Development Laboratory) • Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis) • Taverna, Prof. Peter Rice (European Bioinformatics Institute) • WS and Semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies) • Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University) • http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm
