SPRINT: A Simple Parallel R INTerface
Overview
• What is SPRINT
• How is SPRINT different from other parallel R packages
• Biological example: post-genomic data analysis
• Code comparison
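SPRINT: Simple Parallel R INTerface (www.r-sprint.org)
• An R package that gives existing R scripts access to parallel, MPI-based implementations of selected R functions, so they can run on HPC systems with minimal changes
• “SPRINT: A new parallel framework for R”, J. Hill et al., BMC Bioinformatics, Dec 2008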
Issues of existing parallel R packages
• Difficult to program
  • They require the scientist to also be a parallel programmer
• Require substantial changes to existing scripts
• Cannot be used to solve some problems
  • No data dependencies allowed between parallel tasks
Biological example
• Data: a matrix of expression measurements with genes in rows and samples in columns
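The layout matters when deciding which correlations to compute. A minimal base-R sketch, using a small simulated matrix (`expr` and its sizes are hypothetical): `cor()` correlates the columns of its argument, so with genes in rows a gene-gene correlation matrix needs a transpose.

```r
# Hypothetical toy data: 6 genes (rows) x 4 samples (columns).
# Values are simulated, not real expression measurements.
set.seed(1)
n_genes <- 6
n_samples <- 4
expr <- matrix(rnorm(n_genes * n_samples), nrow = n_genes, ncol = n_samples,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("sample", seq_len(n_samples))))

# cor() correlates the COLUMNS of a matrix, so:
sample_cor <- cor(expr)      # 4 x 4: sample-sample correlations
gene_cor   <- cor(t(expr))   # 6 x 6: gene-gene correlations

print(dim(sample_cor))
print(dim(gene_cor))
```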
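Biological example
• Problem: using all or many genes will either crash R or be very slow (R memory allocation limits, number of computations)
  • Data limitations (correlations): the correlation matrix grows with the square of the number of genes
  • Work load limitations (permutations): the computation grows with the number of genes times the number of permutations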
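The two limitations can be put in numbers with back-of-the-envelope arithmetic. This sketch assumes a dense matrix of 8-byte doubles and a permutation test that recomputes one statistic per gene per permutation; the helper names and the gene and permutation counts are hypothetical illustrations, not figures from the SPRINT benchmarks. In R versions before 3.0.0 a single vector could also hold at most 2^31 - 1 elements, which caps a full correlation matrix at 46,340 genes.

```r
# Hypothetical helper: bytes needed for a dense n x n matrix of doubles.
cor_matrix_bytes <- function(n_genes) {
  n_genes^2 * 8
}

# Hypothetical helper: statistics computed when every gene is re-tested
# for each of n_perm permutations of the sample labels.
perm_statistics <- function(n_genes, n_perm) {
  n_genes * n_perm
}

# Largest square matrix that fits in a pre-R-3.0.0 vector (2^31 - 1 elements).
max_genes_short_vectors <- floor(sqrt(2^31 - 1))

cat(sprintf("30,000 genes: %.1f GB for the correlation matrix\n",
            cor_matrix_bytes(30000) / 1e9))
cat(sprintf("30,000 genes x 150,000 permutations: %.1e statistics\n",
            perm_statistics(30000, 150000)))
cat("Max genes with 2^31 - 1 element vectors:", max_genes_short_vectors, "\n")
```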
Workarounds and solution
• Workarounds:
  • Remove as many genes as possible before applying the algorithm. This can be an arbitrary process and may remove relevant data.
  • Perform multiple executions and post-process the data. This can become a very painful procedure.
• Solution: make parallelisation of R code accessible to bioinformaticians/statisticians through a library of expert-coded solutions, written once and then easy for every end user to call.
[Diagram: big post-genomic data → SPRINT / HPC / R → biological results]
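The second workaround, splitting the job into several runs and stitching the results together, might look like this for a gene-gene correlation matrix. The function name, toy data and block size are hypothetical; each (row block, column block) pair is an independent job, and the index bookkeeping is exactly the kind of post-processing that becomes painful at scale.

```r
# Hypothetical helper: gene-gene correlation computed block by block.
# Genes are in rows, so correlations are between rows of `expr`.
blockwise_gene_cor <- function(expr, block_size) {
  n <- nrow(expr)
  starts <- seq(1, n, by = block_size)
  blocks <- lapply(starts, function(s) s:min(s + block_size - 1, n))
  result <- matrix(NA_real_, n, n)
  for (bi in blocks) {
    for (bj in blocks) {
      # Each (bi, bj) pair could be a separate run whose output is saved
      # and later copied into the right place of the full matrix.
      result[bi, bj] <- cor(t(expr[bi, , drop = FALSE]),
                            t(expr[bj, , drop = FALSE]))
    }
  }
  result
}

set.seed(42)
expr <- matrix(rnorm(10 * 5), nrow = 10)   # hypothetical: 10 genes x 5 samples
full <- cor(t(expr))
pieces <- blockwise_gene_cor(expr, block_size = 3)
print(all.equal(full, pieces))
```

Because a correlation matrix is symmetric, only the blocks on or above the diagonal really need computing; filling the rest by transposition is one more piece of bookkeeping this workaround pushes onto the user.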
Benchmarks (256 processes)
[Benchmark charts: data limitations (correlations); work load limitations (permutations)]
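Correlation code comparison

Serial R:

    # read the expression data and compute Pearson correlations with cor()
    edata <- read.table("largedata.dat")
    pearsonpairwise <- cor(edata)
    write.table(pearsonpairwise, "Correlations.txt")
    quit(save="no")

SPRINT:

    library("sprint")
    edata <- read.table("largedata.dat")
    # parallel Pearson correlation; the result is an ff (file-backed) object
    ff_handle <- pcor(edata)
    # shut down SPRINT's MPI environment before quitting
    pterminate()
    quit(save="no")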
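Pearson correlation is essentially one large matrix product, which is why it lends itself to parallel evaluation. The base-R sketch below (hypothetical helper names and toy data; not SPRINT's own implementation) standardises each column, computes `crossprod(z) / (n - 1)`, and shows that the same result can be assembled from column blocks computed independently, as separate processes could do.

```r
# Hypothetical helper: Pearson correlation between the columns of x,
# written as a matrix product of standardised columns (same as cor(x)).
pearson_via_crossprod <- function(x) {
  z <- scale(x)                    # centre each column, divide by its sd
  crossprod(z) / (nrow(x) - 1)     # t(z) %*% z / (n - 1)
}

# Hypothetical helper: only the result columns listed in `cols`.
# Each call needs all of z but only its own slice of columns,
# so different processes could compute different column blocks.
pearson_columns <- function(z, cols, n) {
  crossprod(z, z[, cols, drop = FALSE]) / (n - 1)
}

set.seed(7)
x <- matrix(rnorm(8 * 5), nrow = 8)   # hypothetical: 8 observations x 5 variables
r_full <- pearson_via_crossprod(x)

z <- scale(x)
r_split <- cbind(pearson_columns(z, 1:2, nrow(x)),
                 pearson_columns(z, 3:5, nrow(x)))
print(all.equal(r_full, r_split, check.attributes = FALSE))
```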
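Permutation testing code comparison

Serial R:

    # mt.maxT and the golub data set come from the Bioconductor multtest package
    library("multtest")
    data(golub)
    smallgd <- golub[1:100,]    # first 100 genes
    classlabel <- golub.cl
    resT <- mt.maxT(smallgd, classlabel, test="t", side="abs")
    quit(save="no")

SPRINT:

    library("sprint")
    library("multtest")         # still needed for the golub data set
    data(golub)
    smallgd <- golub[1:100,]
    classlabel <- golub.cl
    # pmaxT: parallel counterpart of mt.maxT, called with the same arguments
    resT <- pmaxT(smallgd, classlabel, test="t", side="abs")
    pterminate()
    quit(save="no")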
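The permutation workload is easiest to see in a small base-R version of a max-T procedure. This sketch implements the single-step maxT adjustment with Welch t-statistics, which is a simplification: `mt.maxT` computes the step-down variant. Function names and toy data are hypothetical. Each permutation of the class labels is independent of the others, so the `B` permutations can be divided among processes, which is the kind of workload the permutation benchmark targets.

```r
# Hypothetical helper: Welch t-statistic for every row (gene) of x,
# comparing samples labelled 0 with samples labelled 1.
welch_t <- function(x, cl) {
  a <- x[, cl == 0, drop = FALSE]
  b <- x[, cl == 1, drop = FALSE]
  va <- apply(a, 1, var)
  vb <- apply(b, 1, var)
  (rowMeans(a) - rowMeans(b)) / sqrt(va / ncol(a) + vb / ncol(b))
}

# Hypothetical helper: single-step maxT adjusted p-values.
# Each of the B permutations is independent of the others.
single_step_maxT <- function(x, cl, B = 1000) {
  obs <- abs(welch_t(x, cl))
  # Null distribution of the largest |t| over all genes.
  max_null <- replicate(B, max(abs(welch_t(x, sample(cl)))))
  # Adjusted p-value for gene i: fraction of permutation maxima >= |t_i|.
  vapply(obs, function(t_i) mean(max_null >= t_i), numeric(1))
}

set.seed(123)
x <- matrix(rnorm(20 * 10), nrow = 20)   # toy data: 20 genes x 10 samples
cl <- rep(0:1, each = 5)                 # hypothetical class labels
x[1, cl == 1] <- x[1, cl == 1] + 5       # make gene 1 differentially expressed
p_adj <- single_step_maxT(x, cl, B = 200)
print(round(p_adj[1:5], 3))
```

To spread this over P processes, each could generate about B / P permutation maxima and the vectors could be concatenated before the final p-value step; the processes then need independent random-number streams.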
SPRINT
• Website: http://www.r-sprint.org/
• Source code can be downloaded from the website
• Soon also in the CRAN repository
• Mailing list: sprint@lists.ed.ac.uk
• Contact email: sprint@ed.ac.uk
Acknowledgements
EPCC Team:
• Terry Sloan
• Michal Piotrowski
• Savvas Petrou
• Bartek Dobrzelecki
• Jon Hill
• Florian Scharinger
DPM Team:
• Peter Ghazal
• Thorsten Forster
• Muriel Mewissen
Numerical Algorithms Group
This work is supported by the Wellcome Trust and the NAG dCSE Support service.