Privacy issues in integrating R environment in scientific workflows. Dr. Zhiming Zhao University of Amsterdam Virtual Laboratory for e-Science.
Privacy issues in integrating Legacy Experiment Environment to Scientific Workflows. Zhiming Zhao, Dmitry A. Vasunin, Adianto Wibisono, Adam Belloum, Cees de Laat, Pieter Adriaans, Bob Hertzberger
Outline • Scientific experiments and R • Problem description • Candidate solutions • Experimental results • Summarizing discussion • Future work
Scientific experiments and support systems [Figure: experiment lifecycle. Prototype (small data scale): define goal, prototype the algorithm, computing (test with small data), visualisation/interaction (validation), refine. Experiment (full data scale): apply to full-size data, data analysis, findings and dissemination.] In such scenarios: • Existing experiment environments, such as R, are widely used by domain scientists • Human-in-the-loop computing is important for testing and validating prototypes • Scientific workflows are used to manage the different processes and the experiment lifecycle
R and workflow support in VL-e • R provides rich functionality for statistics and data visualisation, and has been used as an important experimental environment in the biosciences. • R needs scientific workflow support for • accessing different e-Science resources • coordination with the other components of a large-scale experiment • E-Science workflows in certain domains also need R, to • reuse advanced results from legacy systems • support experiments developed on legacy systems • Workflow support in VL-e • Four systems are recommended • Taverna, Kepler and VLAM support R • A generic solution is under construction
R in scientific workflows: current solutions [Figure: a workflow system on the user desktop connects to a local R environment (L), and to remote R environments through a Web Service (W) or a socket (S) on a remote node.] Three types of solutions • Local (L): a local installation of R, accessed through R's command-line interface • Simple configuration • Performance bottleneck • Web Service (W): SOAP is used to pass R scripts and objects • Standard interface, distributed computing • High latency • TCP socket (S): a socket interface (RServe) • Distributed computing • Maintains state • Poor security
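The Local option amounts to the workflow engine starting R as a child process. A minimal sketch in Python, assuming R's `Rscript` front end is installed and on the PATH; the helper names `build_rscript_command` and `run_local_r` are hypothetical and not part of Taverna, Kepler or VLAM. Because every call starts a fresh R process, the sketch also shows where the performance bottleneck of this option comes from.

```python
import shutil
import subprocess


def build_rscript_command(expr: str, rscript: str = "Rscript") -> list[str]:
    """Return the argv that evaluates one R expression in a fresh local R process."""
    # --vanilla skips site/user profiles and does not restore or save a workspace,
    # so no state from an earlier run is picked up or left behind.
    return [rscript, "--vanilla", "-e", expr]


def run_local_r(expr: str, timeout: float = 60.0) -> str:
    """Evaluate `expr` with the local R installation and return its standard output."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        raise FileNotFoundError("Rscript not found on PATH")
    # A new R process per call: simple to configure, but start-up cost is paid every time.
    result = subprocess.run(
        build_rscript_command(expr, rscript),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError if R reports an error
    )
    return result.stdout
```

On a machine with R installed, `run_local_r("cat(mean(c(1, 2, 3)))")` should return the string `"2"`.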
Different levels of privacy issues • Data level: intermediate results must not be visible to other users • Communication level (graphical display): remote X display and interaction among multiple users [Figure: typical RServe scenario and its privacy requirements, with two workflows (WF1, WF2), an R environment and a display.]
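The data-level requirement can be illustrated with a toy model of a shared R server. The sketch below is hypothetical Python and does not describe how RServe actually manages its sessions: with a single shared workspace, WF2 can read WF1's intermediate results, whereas per-session workspaces keyed by an unguessable token keep them apart.

```python
import secrets


class SharedEnv:
    """Toy model: one workspace shared by every client (no isolation)."""

    def __init__(self) -> None:
        self.workspace: dict[str, object] = {}

    def assign(self, name: str, value: object) -> None:
        self.workspace[name] = value

    def get(self, name: str) -> object:
        return self.workspace.get(name)


class IsolatedEnv:
    """Toy model: one private workspace per session token."""

    def __init__(self) -> None:
        self._workspaces: dict[str, dict[str, object]] = {}

    def open_session(self) -> str:
        token = secrets.token_hex(16)  # unguessable session identifier
        self._workspaces[token] = {}
        return token

    def assign(self, token: str, name: str, value: object) -> None:
        self._workspaces[token][name] = value  # KeyError for unknown tokens

    def get(self, token: str, name: str) -> object:
        return self._workspaces[token].get(name)
```

In the shared model, anything WF1 assigns is readable by WF2 under the same name; in the isolated model, WF2's session simply does not contain WF1's variables.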
Problem description and desired solution • Problem description • Most legacy experiment environments lack strong security management • Workflow systems integrate such environments without considering security issues • Remote deployments of these environments must be secure • Desired solution • Use existing technologies • Address privacy issues at the workflow level, preferably transparently
Experiments • Review candidate solutions • Investigate the overhead that security enhancements add to workflow execution
An experiment: Taverna, RServe and a security tunnel • Add security enhancements to Taverna • Protect the data channels between Taverna and RServe • Overhead • Setting up the security tunnels • Runtime data transfer
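The slides do not say how the tunnel is built. A minimal sketch, assuming an SSH local port forward in front of RServe on Rserve's default port 6311: the workflow connects to 127.0.0.1, and traffic travels encrypted to the remote host. The user and host names are hypothetical, and timing how long the forwarded port takes to become reachable is only one rough way to estimate the tunnel set-up overhead.

```python
import socket
import time

RSERVE_PORT = 6311  # Rserve's default TCP port


def ssh_tunnel_command(user: str, remote_host: str,
                       local_port: int = RSERVE_PORT,
                       remote_port: int = RSERVE_PORT) -> list[str]:
    """Build an OpenSSH command forwarding 127.0.0.1:local_port through an
    encrypted SSH connection to port remote_port on the remote host itself."""
    return [
        "ssh",
        "-N",                              # no remote command, forwarding only
        "-o", "ExitOnForwardFailure=yes",  # exit if the local port cannot be bound
        "-L", f"127.0.0.1:{local_port}:localhost:{remote_port}",
        f"{user}@{remote_host}",
    ]


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 10.0) -> float:
    """Poll until host:port accepts TCP connections and return the seconds waited.
    Called right after launching the tunnel, this approximates its set-up time."""
    start = time.perf_counter()
    while True:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return time.perf_counter() - start
        except OSError:
            if time.perf_counter() - start > timeout:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(0.05)
```

A workflow step could start the tunnel with `subprocess.Popen(ssh_tunnel_command("alice", "r.example.org"))`, call `wait_for_port(RSERVE_PORT)`, and then point its RServe client at 127.0.0.1; the per-transfer encryption cost is the runtime part of the overhead.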
Summarizing discussion • Integrating existing experiment environments with workflow systems is important for rapid prototyping • Privacy protection is demanded by both users and the e-Science infrastructure, and can be viewed as a generic issue when integrating a legacy component that supports user interaction into a workflow • Privacy protection can be achieved to a certain level by customizing the workflow execution • Enhancing workflow execution with security does not necessarily impose a high performance penalty
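Why the penalty can stay modest can be sketched with a simple cost model. The symbols are hypothetical and no measurements from the experiment are implied: $T_c$ is computation time, $n$ the number of data transfers, $t_d$ the time per plain transfer, $T_s$ the one-off tunnel set-up time, and $\delta$ the extra cost per secured transfer.

```latex
T_{\text{plain}} = T_c + n\,t_d, \qquad
T_{\text{secure}} = T_c + T_s + n\,(t_d + \delta)

\frac{T_{\text{secure}} - T_{\text{plain}}}{T_{\text{plain}}}
  = \frac{T_s + n\,\delta}{T_c + n\,t_d}
```

The set-up cost $T_s$ is paid once and is amortized as $n$ grows, so the relative overhead tends towards $\delta / t_d$; when computation $T_c$ dominates, it is smaller still.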
Future work • In the VL-e project, we are developing a bus-style generic solution for different workflow systems • Taking data privacy into account when realising interoperability between different workflow systems
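The slides do not describe the bus design. As a purely hypothetical sketch of what taking data privacy into account could mean for a bus connecting workflow systems, each topic carries an access list, and only the workflows on that list may publish or subscribe. The class name, topic names and workflow identifiers are invented for illustration.

```python
from collections import defaultdict
from typing import Callable


class PrivacyAwareBus:
    """Hypothetical in-process bus: workflow systems exchange data via topics,
    and only workflows on a topic's access list may publish or subscribe."""

    def __init__(self) -> None:
        self._acl: dict[str, set[str]] = {}
        self._subscribers: dict[str, list[Callable[[object], None]]] = defaultdict(list)

    def create_topic(self, topic: str, allowed: set[str]) -> None:
        self._acl[topic] = set(allowed)

    def _check(self, topic: str, workflow_id: str) -> None:
        # Unknown topics have an empty access list, so every request is refused.
        if workflow_id not in self._acl.get(topic, set()):
            raise PermissionError(f"{workflow_id} may not access {topic}")

    def subscribe(self, topic: str, workflow_id: str,
                  handler: Callable[[object], None]) -> None:
        self._check(topic, workflow_id)
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, workflow_id: str, message: object) -> None:
        self._check(topic, workflow_id)
        for handler in self._subscribers[topic]:
            handler(message)
```

Keeping the access check inside the bus, rather than in each workflow system, is one way the protection could stay transparent to the connected systems.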
Activities • International workshop on “Workflow systems in e-Science”, organized by Zhiming Zhao and Adam Belloum, in conjunction with ICCS 2006 (University of Reading) and ICCS 2007 (Beijing, China). • The proceedings are published in LNCS, Springer-Verlag. • A special issue will be published in the Scientific Programming journal. • http://staff.science.uva.nl/~zhiming/iccs-wses • Workshop on “Scientific workflows and industrial workflow standards in e-Science”, organized by Adam Belloum and Zhiming Zhao, in conjunction with the IEEE e-Science and Grid Computing conference, Amsterdam, December 2006. • Pegasus, Dr. Ewa Deelman (Department of Computer Science, University of Southern California) • BPEL, Dr. Dieter König (IBM Germany Research and Development Laboratory) • Kepler, Dr. Bertram Ludäscher (Department of Computer Science, University of California, Davis) • Taverna, Prof. Peter Rice (European Bioinformatics Institute) • WS and semantic issues, Dr. Steve Ross-Talbot (CEO and co-founder of Pi4 Technologies) • Triana, Dr. Ian J. Taylor (Department of Computer Science, Cardiff University) • http://staff.science.uva.nl/~adam/workshop/VL-e-workshop.htm