340 likes | 559 Views
Advancing, integrating and deploying efficient statistical computing to high-throughput scientific applications. Nagiza F. Samatova (ORNL) Srikanth Yoginath (ORNL) Guruprasad Kora (ORNL) Chongle Pan (UTK/ORNL). SDM AHM @ NC State October 5-7, 2005.
E N D
Advancing, integrating and deploying efficient statistical computing to high-throughput scientific applications • Nagiza F. Samatova (ORNL) • Srikanth Yoginath (ORNL) • Guruprasad Kora (ORNL) • Chongle Pan (UTK/ORNL) SDM AHM @ NC State October 5-7, 2005 Contact: Nagiza Samatova, samatovan@ornl.gov
Outline • About Parallel R (pR) • Motivation • About R and its parallelization efforts • Task and data parallelism with Parallel R (pR) • Initial/Future Deployment across Applications • GIS data analysis with GRASS and pR • Clustered Climate Regimes using pR • Quantitative Proteomics in Biology using pR • High Energy Physics (HEP) with ROOT, R and pR • Towards SDM Components Integration • Kepler scenario with pR • CCA scenario with pR • Web Services scenario with pR • Future Extensions • Generic Parallel Agent for ease of pR extension • Task Parallel Engine optimized for data-intensive analyses
Tera-(Flop & Byte) Analyses Could Be Routine for Scientific Applications But… Hits 1Tflop/sec Algorithmic Complexity: Calculate meansO(n) Calculate FFTO(n log(n)) Calculate PCAO(r• c) Hierarchical clust. O(n2) • Climate • Now: 20-40TB per simulated year • 5 yrs: 100TB/yr 5-10PB/yr • Astrophysics • Now and 5 yrs: Can soak up anything! • Fusion • Now: 100Mbytes/15min • 5 yrs: 1000Mbytes/2 min
> library(mva) > pca <- prcomp(data) > summary(pca) > … > dyn.load( “foo.so”) > .C( “foobar” ) > dyn.unload( “foo.so” ) snow API Statistical Computing with R • About R (http://www.r-project.org/): • R is an Open Source (GPL), most widely used programming environment for statistical analysis and graphics; similar to S. • Provides good support for both users and developers. • Highly extensible via dynamically loadable add-on packages. • Originally developed by Robert Gentleman and Ross Ihaka. > library (rpvm) > .PVM.start.pvmd () > .PVM.addhosts (...) > .PVM.config () Towards Enabling Parallel Computing in R: • Rmpi(Hao Yu): R interface to LAM-MPI. • rpvm (Na Li and Tony Rossini): R interface to PVM; requires knowledge of parallel programming. • snow (Luke Tierney): general API on top of message passing routines to provide high-level (parallel apply) commands; mostly demonstrated for embarrassingly parallel applications .
Motivation behind Parallel R (pR) • Ideal Programming Requirements: • Be able to use existing high level (i.e. R) code • Require minimal extra efforts for parallelizing • Have Identical/similar (presumably easy-to-use) interface to R’s • Be able to test codes in sequential settings • Provide efficient and scalable (in terms of problem size and number of processors) performance
Task-parallel analyses: Data-parallel analyses: • Likelihood Maximization. • Re-sampling schemes: Bootstrap, Jackknife, etc. • Animations • Markov Chain Monte Carlo (MCMC). • Multiple chains. • Simulated Tempering: running parallel chains at different “temperature“ to improve mixing. • k-means clustering • Principal Component Analysis (PCA) • Hierarchical (model-based) clustering • Distance matrix, histogram, etc. computations Providing Task and Data Parallelism in pR
Parallel R (pR) Distribution http://www.ASPECT-SDM.org/Parallel-R • Releases History: • pR enables both data and task parallelism (includes task-pR and RScaLAPACK) (version 1.8.1) • RScaLAPACK provides R interface to ScaLAPACK with its scalability in terms of problem size and number of processors using data parallelism (release 0.5.1) • task-pR achieves parallelism by performing out-of-order execution of tasks. With its intelligent scheduling mechanism it attains significant gain in execution times (release 0.2.7) • pMatrix provides a parallel platform to perform major matrix operations in parallel using ScaLAPACK and PBLAS Level II & III routines • Also: Available for download from R’s CRAN web site (www.R-Project.org) with 37 mirror sites in 20 countries
About GRASS (grass.itc.it) • GRASS (Geographic Resources Analysis Support System) is a raster/vector GIS, image processing system, and graphics production. • GRASS contains over 350 programs and tools to render maps and images on monitor and paper; manipulate raster, vector, and sites data; process multi spectral image data; create, manage, and store spatial data. • It is Free (Libre) Software/Open Source released under GNU GPL.
Subtract background noise from data • Generate Covariance Chromatogram • Apply Savitzky-Golay Smoother • Calculate cut-off for search • Find Window with Max. SN ratio • …..
Ratio Calculations with Parallel R Serial Version: ::::::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); for (i in 1:length(chroList)) { currResult [i] Rratio(filename=chroList[i]); } for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::: ::::::: chroList<-list.files(pattern="*.chro"); cat ("Chro", "samSN", "refSN", "PPCSN", "HR", "PCA", "PCASN", file="Pratio-Peptide.txt"); PE ( for (i in 1:length(chroList)) { currResult [i] Pratio(filename=chroList[i]); } ) for (i in 1:length(chroList)) { cat (chroList[i], currResult$samSN, … file="Pratio-Peptide.txt"); } ::::::::::::: Parallel Version:
R in High Energy Physics (HEP, Root) (Adam Lyon,Fermi Lab) modpollux=nls(speed ~ alpha*(1-alpha*beta/(alpha*beta+mb)), data= client[pollux,], start=c(alpha=2.0, beta=0.50), trace=T) library(lattice) d=read.table("data.dat",head=T) w=data[data$event=="OpenFile",] w$min = w$dur/60.0 bwPlot( fromStation ~ min | station, data=w, subset=(min<60), xlab= "Minutes",main="Wait Time for …" )
R in High Energy Physics (HEP, Root) • In HEP, we typically run successive skims to reduce the data size (601 TB down to 100s of Meg or a few Gig) • R seems to want to hold everything in memory • If data is small enough, bring it into R • If can reduce data to something R can hold, bring that subset of data into R -- have the full power of R • If can't even do above, then have some R apparatus to read in data one row at a time and update an R object (e.g. histograms) [But you don't get the full power of R] • Rprof significantly sped up fit (replace ^) • Lessons learned: • We explore using R, a statistical analysis package from the statistics community, in an HEP enviornment • R has already proven useful for analyzing monitoring and benchmarking data • We have ideas on how R can be used to read large datasets • We've done some "proof of principle" studies of physics analysis with R • As we learn more about R, we expect to be more surprised at its capabilities (Adam Lyon,Fermi Lab)
no interest from R community in non-I/O functions of Root In order of work required : 1) R and Root remain separate-- use the more appropriate tool for the task. Use text files to communicate between the two if necessary. 2) Root loads R's math and low level statistical libraries as shared objects Minimalist approach for some functionality Some access to the math and statistics C code functions from R These C functions take basic C types, so no translation necessary But: no upper level functions written in the R language available 3) R and Root remain separate, but: R package to read Root Trees directly into R data frames. Still use best tool for particular task Now easier to get HEP data into R 4) Allow calling of selected high level R functions from within Root Root runs the R interpreter translation is necessary R functions: understand Root objects Root: understand R return objects Expose only some R functions may reduce amount of translation Options for R and Root Interfacing (Adam Lyon,Fermi Lab)
5) R prompt from the Root prompt R needs seamless knowledge of objects in current Root session At end of R session, new R variables translated into Root objects Root runs the R interpreter Translation for all types of Root variables into R and all types of R variables returned to Root. A major undertaking 6) Root prompt from within R Harder than 5: R is C but Root is C++ I don't see much interest in this Things get interesting starting at 3) I have a version 0.0.1 prototypefor reading Root trees into R. Required for all options above 3. I’ll try to work on this as time permits Both Root and R interface to Python Translate with Python as intermediary? Not sure if that's performant enough More Advanced Integration Options (Adam Lyon,Fermi Lab)
KEPLER/R/pR Integration Kepler has a mathematical operations actor RExpression to virtually access/plot any function/graphs from R/pR. RScript “Xread.table(DataFileName) library(RScaLAPACK) asva.prcomp(X) summary(a) plot(a$loadings) … process=runtime.exec(Command); System.out.println(process) … RExpressionactor by Dan Higgins, Kepler/SEEK
CCA Common Component Architecture CCA-based Component Integration • Might be used if there is a need: • To effectively develop/deploy/reuse parallel R (pR). • To expose the power of pR and R to the outside world. • To easily access task and data parallel modules of pR by applications written in different languages. • To integrate pR modules with other technologies of SDM center (e.g. SciRUN2). • To provide a usage model that hides implementation details. • To have plug-n-play flexibility in deployment.
R Library algorithms.sidl driver.sidl Babel Compiler Driver_SVD_Impl SVDFunction_Impl libsvd.so Babel Runtime Componentizing Singular Value Decomposition function of pR/R Lots of THANKS to Jim Kohl & Wael Elwasif for their help!!! Four Easy Steps: • Define component’s public interface in SIDL. • Generate native language stub code using Babel • Add in function and driver implementation. • Compile and link with Babel and R libraries runtime libraries.
1. Create SVD’s interface in SIDL package algorithms version 1.0 { class SVDFunction implements function.FunctionPort, gov.cca.Component { void rsvd( in int iSize ); // gov.cca.Component methods: void setServices(in gov.cca.Services servicesHandle) throws gov.cca.CCAException; } } algorithms.sidl driver.sidl package drivers version 1.0 { class SVDDriver implements gov.cca.ports.GoPort, gov.cca.Component { int go(); void setServices(in gov.cca.Services services) throws gov.cca.CCAException; } }
... /* SVDFunction - rsvd() */ /* postscript() */ PROTECT(fun = Rf_findFun(Rf_install("postscript"), R_GlobalEnv)); eval(e, R_GlobalEnv); ... /* svd(rnorm(iSize),1,1) */ fun = Rf_findFun(Rf_install("svd"), R_GlobalEnv); ... /* svd( 1:10, 1, 1 )*/ ret = eval(e, R_GlobalEnv); ... fun = Rf_findFun(Rf_install("plot"), R_GlobalEnv); ... /* plot( svd$u )*/ eval(e, R_GlobalEnv); Babel Compiler babel –s C algorithms.sidl babel –s C driver.sidl No R scripts – Direct C Efficiency! 2-3. Generate Native Language Stub Code using Babel & Add Implementations … /* Driver_SVD_Impl */ struct algorithms_SVD__data *psvd; /* Access private data structure */ pd = algorithms_SVD__get_data(self); … algorithms_SVDPort function; /* Get the function pointer. */ … iReturn = function->rsvd(i); …
Ccaffeine GoPort SVDPort SVDPort Driver SVDFunction Ready for Deployment of R’s SVD!
Climate Fusion Biology Custom or Native R Interface Custom or Native R Interface Custom or Native R Interface Web Service based Integration of pR Parallel-R [pR] Service Task-Parallel job Task-pR Internet Parallel-solve RScaLAPACK Other parallel libraries through Generic PA Parallel-kmeans Might be used if there is a need: • To provide easy access to HP parallel statistical computations. • To make pR a loosely coupled component that can be dynamically composed. • To initiate & control parallel analyses from a less powerful machine. Request- Response ports defined by WSDL
a b c c c Enabling Parallel R as a Web Service using Apache-AxisC++: SVD Example R Library SVD.wsdl Java Compiler SVDService_impl libsvdServer.so libsvdClient.so RSVDInterface_Impl Axis Runtime Library Three Major Stages: • Create SVD’s interface in WSDL • Develop and deploy the SVD Service Object (libsvdServer.so). • Develop the SVD Client Interface (libSVDClient.so).
1. Create SVD’s interface in WSDL <!-- Message types: define input and output types --> - <wsdl:message name="GetSVDRequest"> <wsdl:part name=“inputVectort" type="xsd:double" /> </wsdl:message> - <wsdl:message name="GetSVDResponse"> <wsdl:part name="SVDOut" type="xsd:double" /> </wsdl:message> <!-- Port type: define a SOAP operations --> - <wsdl:portType name="SVDService"> - <wsdl:operation name="GetSVD" parameterOrder=“inputVector"> <wsdl:input message="local:GetSVDRequest" name="GetSVDRequest" /> <wsdl:output message="local:GetSVDResponse" name="GetSVDResponse" /> </wsdl:operation> </wsdl:portType> <!-- Binding: Bind a location in this service to an operation --> - <wsdl:binding name="SVDSoapBinding" type="local:SVDService"> <wsdlsoap:binding style="rpc" transport="http://schemas.xmlsoap.org/soap /http" /> <wsdl:operation name="GetSVD"> <wsdlsoap:operation soapAction="SVDService#GetSVD" /> - <wsdl:input name="GetSVDRequest"> <wsdlsoap:body encodingStyle="http://schemas.xmlsoap.org/soap/encoding/" namespace="http://localhost/axis/SVD" use="encoded" /></wsdl:input> - <wsdl:output name="GetSVDResponse"> <wsdlsoap:body encodingStyle="http://schemas.xmlsoap.org/soap/ encoding/" namespace="http://localhost/axis/SVD" use="encoded" /> </wsdl:output></wsdl:operation></wsdl:binding> <!-- Service name: define the name and URL of this service --> - <wsdl:service name="SVDService"> - <wsdl:port binding="local:SVDSoapBinding" name="SVDService"> <wsdlsoap:address location="http://localhost/axis/SVDService" /> </wsdl:port> </wsdl:service> </wsdl:definitions>
AxisServiceException.cpp AxisServiceException.h a b c c SVDService.h 2. Develop the SVD Service Object SVD.wsdl java org.apache.axis.wsdl.wsdl2ws.WSDL2Ws -lc++ -sserver –oserver ./SVD.wsdl Server related cpp and corresponding header files • Comments: • WSDL file is processed to create cpp files. • The cpp file functions are modified to compute R svd on the input data. • A shared library libsvdServer.so is built using gcc. • This library is used by Apache-axis to service the user’s request. SVDService.cpp … xsd__double SVDService: :GetSVD(xsd__double Value[]) { ……. /* svd(rnorm(iSize),1,1) */ fun = Rf_findFun(Rf_install("svd"), R_GlobalEnv); ... /* svd( 1:10, 1, 1 )*/ ret = eval(e, R_GlobalEnv); ... return (svd$d) } libsvdServer.so gcc –shared –o libsvdServer.so *.cpp -lR -L$AXISCPP_DEPLOY/lib
a b c c 2. Develop the SVD Client Interface SVD.wsdl java org.apache.axis.wsdl.wsdl2ws.WSDL2Ws -lc++ -sclient –oclient ./SVD.wsdl Server related cpp and corresponding header files • Comments: • WSDL file is processed to create cpp files • A RSVDInterface.cpp file is created to get the input and service detail from R and request for required service. • A shared library libsvdClient.so is built using gcc. • This library is loaded into R environment. • The service is requested through R interface. RSVDInterface.cpp …. SEXP RSVDInterface(SEXP InputVector) { SVDService srv; SEXP retval; int size; …. PROTECT(allocVector(REALSXP,size); …. retVal =srv.GetSVD(InputVector) ; UNPROTECT(1); return retVal; } SVDService.cpp SVDService.h libsvdClient.so gcc –shared –o libsvdClient.so *.cpp -lR -L$AXISCPP_DEPLOY/lib
APACHE Web Server 80 R-Client Environment pR Service > dyn.load(libSVDClient.so) > RSVDInterface(x) libSVDClient.so InputVector Axis-WebService Module Results Results libSVDServer.so 1 2 3 4 N-tier Architecture for pR-Service and R-Client Interaction • pR Service: • <service name=“SVDService" provider="CPP:RPC" description="Simple SVD Service"> • <parameter name="allowedMethods" value="GetSVD "/> • <parameter name="className" value="/usr/local/axiscpp_deploy/lib/libServersvd.so" /> • </service> • Above XML statements are added in axis “server.wsdd” file. • It specifies axis to deploy a service called SVDService and accept calls for method GetSVD • R Client: • Meanwhile the R user • Loads the libSVDClient.so into his environment • Calls the higher-level R-function to access the SVDService.
Third Party Parallel Codes R Environment Parallel Agent RScaLAPACK ScaLAPACK pMatrix Matrix Robject pAlok Parallel k-means Alok’s Data Mining C/Fortran MPI • Define R function parameters & returns • Map R functions to defined function interfaces • Define the function interfaces • Set parallel environment limits for your functions • Define data distribution function (Optional) • Convert your MPI/PVM routine(s) into a set of functions. • Create a shared library of your functions. • Place it in a predefined location. Extensibility of Parallel R (pR)
Generic Parallel Agent for Ease of Extension Goal: To provide a framework to easily add different parallel MPI algorithms to the R-environment. • Features: • It is an interface that allows third party MPI routines to be called from within ParallelR environment. • Users can readily plug-in and use their MPI codes along with ParallelR provided routines. • It is MPI implementation independent. • No knowledge of R or ParallelR is required to get MPI codes included in to ParallelR. • A simple XML schema for generating interface definitions. • Different classes of MPI routines can be included in to the ParallelR environment. • Status: • Completed the design phase
Task Parallel Engine Optimized for Data-Intensive Analyses Tasks Goal: To provide a generic task-parallel engine that would be capable of optimizied tasks scheduling, monitoring, and resource mapping. • Jointly with Dr. Xiaosong Ma (NC State) and her student, Jain Li • Current Progress: • Research phase, semi-complete design document
Parallelize independent tasks – task parallelism Task itself can be parallelized – data parallelism Overall goal – Given a system, minimize parallel R’s execution time Parallelization and Scheduling Task 1 Task 2 Task 3 Task 4 Task 5
worker 1 collect comm assign comm collect worker 2 master assign comm . . . assign collect worker n Problems and Challenges • To detect independent tasks • Data dependency analysis (NP-hard) • To schedule independent tasks • Granularity • Load balance • Data locality • To perform resource allocation • Data locality • Minimize communications or I/O Scheduling & Resource allocation problem (NP-hard): Given a sequence of tasks t1, t2, …, tn, each task has data size di and requires cicomputation time and may require more than one processor. There is network latency lito transfer data size di and overhead oi to initialize the transfer. Our objective is to minimize total execution time of tasks sequence while preserving dependence relations and subject to P processors available.
Integrated Data & Task Parallelism • To parallelize a single task, use third-party package (e.g., SUIF, Polaris) • Task scheduling subject to limited resources, e.g., worker processors • Need to estimate task runtime • Performance database • Runtime performance monitoring