Programming a service Cloud
Rosa M. Badia, Jorge Ejarque, Daniele Lezzi, Raul Sirvent, Enric Tejedor
Grid Computing and Clusters Group, Barcelona Supercomputing Center
Cloud Futures Workshop, Redmond, WA, 8-9 April 2010

Outline
• StarSs programming model
• COMPSs framework
• EMOTIVE Cloud
• COMPSs towards SOA and Clouds
• ServiceSs
• Conclusions

Star Superscalar Programming Model

Sequential application:

    ...
    for (i=0; i<N; i++){
        T1 (data1, data2);
        T2 (data4, data5);
        T3 (data2, data5, data6);
        T4 (data7, data8);
        T5 (data6, data8, data9);
    }
    ...

• Task selection + parameters direction (input, output, inout)
• Task graph creation based on data precedence
• Scheduling, data transfer, task execution on parallel resources (cluster, grid): Resource 1, Resource 2, Resource 3, ..., Resource N
• Synchronization, results transfer

[Figure: task graph with nodes T10, T20, T30, T40, T50, T11, T21, T31, T41, T51, T12, …]

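The "task graph creation based on data precedence" step can be illustrated with a minimal sketch. It is not the StarSs runtime: the class and method names are hypothetical, and since the slide does not give the parameter directions of T1 to T5, the sketch assumes that the last parameter of each task is an output and the others are inputs. Only read-after-write dependencies are tracked; a real runtime also has to deal with write-after-read and write-after-write hazards.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch (hypothetical names): derive task dependencies from parameter directions. */
class TaskGraphSketch {
    enum Direction { IN, OUT, INOUT }

    /** One task parameter: a data name plus how the task uses it. */
    static final class Param {
        final String data;
        final Direction dir;
        Param(String data, Direction dir) { this.data = data; this.dir = dir; }
    }

    /** ASSUMPTION: the last parameter is OUT, all the others are IN. */
    static List<Param> task(String... data) {
        List<Param> params = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            params.add(new Param(data[i], i == data.length - 1 ? Direction.OUT : Direction.IN));
        }
        return params;
    }

    /** For each task index, the earlier tasks whose output it reads. */
    static List<SortedSet<Integer>> dependencies(List<List<Param>> tasks) {
        Map<String, Integer> lastWriter = new HashMap<>();
        List<SortedSet<Integer>> deps = new ArrayList<>();
        for (int t = 0; t < tasks.size(); t++) {
            SortedSet<Integer> d = new TreeSet<>();
            // A task that reads a datum depends on the last task that wrote it.
            for (Param p : tasks.get(t)) {
                if (p.dir != Direction.OUT && lastWriter.containsKey(p.data)) {
                    d.add(lastWriter.get(p.data));
                }
            }
            // Register this task's writes only after its reads were resolved.
            for (Param p : tasks.get(t)) {
                if (p.dir != Direction.IN) {
                    lastWriter.put(p.data, t);
                }
            }
            deps.add(d);
        }
        return deps;
    }

    /** One iteration of the loop from the slide: T1..T5 get indices 0..4. */
    static List<List<Param>> oneIteration() {
        List<List<Param>> tasks = new ArrayList<>();
        tasks.add(task("data1", "data2"));
        tasks.add(task("data4", "data5"));
        tasks.add(task("data2", "data5", "data6"));
        tasks.add(task("data7", "data8"));
        tasks.add(task("data6", "data8", "data9"));
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(dependencies(oneIteration()));
    }
}
```

Under the stated assumption, T3 waits for T1 and T2, and T5 waits for T3 and T4, while T1, T2 and T4 are free to run in parallel.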
StarSs programming model
• GRIDSs, COMPSs
  • Tailored for Grids or clusters
  • Data dependence analysis based on files
  • C/C++, Java
• SMPSs
  • Tailored for SMPs or homogeneous multicores
  • Altix, JS21 nodes, Power5, Intel-Core2
  • C or Fortran
• CellSs / GPUSs
  • Tailored for Cell/B.E. processor / for GPUs
  • C or Fortran
• NestedSs
  • Hybrid approach that combines SMPSs and CellSs

COMPSs
• Componentised runtime
  • Each component in charge of a functionality
• Base technologies:
  • Java as programming language
  • ProActive:
    • Reference implementation of the GCM model
    • Used to build the components
  • JavaGAT:
    • API that provides uniform access to different kinds of Grid middleware
    • Used for job submission and file transfer

COMPSs Programming model – Application + interface

Java interface:

    public interface SumItf {

        @ClassName("example.Sum")               // Implementation
        @MethodConstraints(OSType = "Linux")    // Task constraints
        void genRandom(
            @ParamMetadata(type = Type.FILE, direction = Direction.OUT)    // Parameter metadata
            String f
        );

        @ClassName("example.Sum")
        ...
    }

Java application:

    initialize(f1);
    for (int i = 0; i < 2; i++) {
        genRandom(f2);
        add(f1, f2);
    }
    print(f2);

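To show how a runtime could read the task metadata declared in such an interface, here is a minimal sketch based on plain Java reflection. The annotation types, enums and the `describe` helper are local stand-ins written for this example (assumptions); they are not the COMPSs classes, whose actual definitions are not shown on the slides.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

/** Sketch with stand-in annotations (NOT the real COMPSs types). */
class AnnotationSketch {
    enum Type { FILE, STRING, INT }
    enum Direction { IN, OUT, INOUT }

    // Stand-in for the parameter annotation used on the slide.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.PARAMETER)
    @interface ParamMetadata { Type type(); Direction direction(); }

    // Stand-in for the annotation naming the implementing class.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ClassName { String value(); }

    interface SumItf {
        @ClassName("example.Sum")
        void genRandom(@ParamMetadata(type = Type.FILE, direction = Direction.OUT) String f);
    }

    /** Lists "class.method paramN TYPE DIRECTION" for every annotated parameter. */
    static List<String> describe(Class<?> itf) {
        List<String> out = new ArrayList<>();
        for (Method m : itf.getDeclaredMethods()) {
            ClassName cn = m.getAnnotation(ClassName.class);
            if (cn == null) {
                continue; // not declared as a task
            }
            Annotation[][] perParam = m.getParameterAnnotations();
            for (int i = 0; i < perParam.length; i++) {
                for (Annotation a : perParam[i]) {
                    if (a instanceof ParamMetadata) {
                        ParamMetadata pm = (ParamMetadata) a;
                        out.add(cn.value() + "." + m.getName() + " param" + i
                                + " " + pm.type() + " " + pm.direction());
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe(SumItf.class));
    }
}
```

The annotations need `RetentionPolicy.RUNTIME`; with the default retention they would not be visible through reflection.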
Custom Java Class Loader
• Java: the Java app code and the annotated interface are input to the Custom Loader, which uses Javassist and inserts calls to the COMPSs runtime
• C/C++: the C/C++ app code and the interface are input to the Stubs Generator, which inserts calls to the COMPSs runtime through JNI

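The idea of replacing direct method calls with calls into a runtime can be sketched with the JDK alone. Note the swapped-in technique: this example uses `java.lang.reflect.Proxy`, which only intercepts calls made through an interface, whereas the loader described above rewrites the application bytecode with Javassist. All names here are hypothetical, and the "runtime" is just a list that records the submitted tasks.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: intercept task calls with a dynamic proxy (stand-in for bytecode rewriting). */
class InterceptionSketch {
    /** Hypothetical task interface, modelled on the sample application. */
    interface SumTasks {
        void genRandom(String f);
        void add(String f1, String f2);
    }

    /** Returns a proxy that logs each call instead of executing it. */
    static SumTasks intercepted(List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            // A real runtime would create a task and analyse its dependencies here.
            log.add(method.getName() + Arrays.toString(args));
            return null; // all intercepted methods return void
        };
        return (SumTasks) Proxy.newProxyInstance(
                SumTasks.class.getClassLoader(),
                new Class<?>[] { SumTasks.class },
                handler);
    }

    /** Runs the loop of the sample application against the proxy. */
    static List<String> runSample() {
        List<String> log = new ArrayList<>();
        SumTasks tasks = intercepted(log);
        for (int i = 0; i < 2; i++) {
            tasks.genRandom("f2");
            tasks.add("f1", "f2");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runSample());
    }
}
```

The application code stays sequential; only the object it calls into changes, which is the property the custom loader provides without requiring an interface-typed receiver.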
Runtime behavior

Java code:

    initialize(f1);
    for (int i = 0; i < 2; i++) {
        genRandom(f2);
        add(f1, f2);
    }
    print(f2);

[Figure labels: Annotated interface, Custom Loader, Javassist; task graph with tasks T1, T2, T3, T4; Grids, Clusters, Files]

HMMPfam: sample COMPSs application
• HMMER: set of tools for protein sequence analysis
  • Based on statistical Hidden Markov Models (HMMs)
  • hmmpfam: tool to compare a sequence against a database of HMMs (protein families)
    • Computationally intensive
    • Embarrassingly parallel
• HMMPfam: Java application that uses hmmpfam
  • Query sequences / database segmentation
  • Programmed in a totally sequential fashion
  • Selection of remote methods using a separate Java interface
  • hmmpfam computation, merging of results

HMMPfam – Annotated interface

    public interface HMMPfamItf {

        @ClassName("worker.hmmer.HMMPfamImpl")
        void hmmpfam(
            @ParamMetadata(type = Type.STRING, direction = Direction.IN)
            String hmmpfamBin,
            @ParamMetadata(type = Type.STRING, direction = Direction.IN)
            String commandLineArgs,
            @ParamMetadata(type = Type.FILE, direction = Direction.IN)
            String seqFile,
            @ParamMetadata(type = Type.FILE, direction = Direction.IN)
            String dbFile,
            @ParamMetadata(type = Type.FILE, direction = Direction.OUT)
            String resultFile
        );

        @ClassName("worker.hmmer.HMMPfamImpl")
        void mergeSameSeq(
            @ParamMetadata(type = Type.FILE, direction = Direction.INOUT)
            String resultFile1,
            @ParamMetadata(type = Type.FILE, direction = Direction.IN)
            String resultFile2,
            @ParamMetadata(type = Type.INT, direction = Direction.IN)
            int aLimit
        );

        @ClassName("worker.hmmer.HMMPfamImpl")
        void mergeSameDB(
            @ParamMetadata(type = Type.FILE, direction = Direction.INOUT)
            String resultFile1,
            @ParamMetadata(type = Type.FILE, direction = Direction.IN)
            String resultFile2
        );
    }

HMMPfam – Main program

    public static void main(String args[]) throws Exception {
        // Segment the query sequences file, the database file or both (done sequentially)
        split(fSeq, fDB, seqFrags, dbFrags);

        // Launch hmmpfam for each pair of seq - db fragments
        for (String dbFrag : dbFrags) {
            for (String seqFrag : seqFrags) {
                HMMPfamImpl.hmmpfam(hmmpfamBin, finalArgs, seqFrag, dbFrag, output);
                seqNum++;
            }
            dbNum++;
        }

        while (outputs.size() > 1) {
            ListIterator<String> li = outputs.listIterator();
            while (li.hasNext()) {
                String firstOutput = li.next();
                String secondOutput = li.hasNext() ? li.next() : null;
                if (secondOutput == null)
                    break;
                if (sameSeqFragment(firstOutput, secondOutput))
                    // Merge output fragments of different db fragments (must take care when merging)
                    HMMPfamImpl.mergeSameSeq(firstOutput, secondOutput, clArgs.getALimit());
                else if (sameDBFragment(firstOutput, secondOutput))
                    // Merge output fragments of different sequence fragments (basically appending one to another)
                    HMMPfamImpl.mergeSameDB(firstOutput, secondOutput);
                else
                    // Avoid merging two output fragments of different sequence and db fragments
                    li.previous();
            }
        }
    }

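The merge phase of the main program combines partial outputs pairwise until a single result remains. The slide does not show how merged entries leave the `outputs` list, so the following is only a sketch of the general reduction pattern, under two loudly stated assumptions: the `merge` function is a placeholder that joins names (it does not call hmmpfam code), and the distinction between `mergeSameSeq` and `mergeSameDB` is ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch: pairwise reduction of partial results (placeholder merge, hypothetical names). */
class PairwiseMergeSketch {
    /** Placeholder for a real merge task; it only records what was combined. */
    static String merge(String first, String second) {
        return first + "+" + second;
    }

    /**
     * Merges neighbouring entries in rounds until one entry is left.
     * The list is modified in place; the number of rounds is returned.
     */
    static int reduce(List<String> outputs) {
        int rounds = 0;
        while (outputs.size() > 1) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i < outputs.size(); i += 2) {
                if (i + 1 < outputs.size()) {
                    next.add(merge(outputs.get(i), outputs.get(i + 1)));
                } else {
                    next.add(outputs.get(i)); // odd element waits for the next round
                }
            }
            outputs.clear();
            outputs.addAll(next);
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        List<String> outputs = new ArrayList<>(Arrays.asList("a", "b", "c", "d", "e"));
        int rounds = reduce(outputs);
        System.out.println(rounds + " rounds -> " + outputs);
    }
}
```

The merges inside one round touch disjoint pairs, so they are independent of each other; this is what lets a task-based runtime run them in parallel even though the loop is written sequentially.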
HMMPfam – Tasks

    public static void hmmpfam(String hmmpfamBin, String commandLineArgs, String seqFile, String dbFile, String resultFile) throws Exception {
        String cmd = hmmpfamBin + " " + commandLineArgs + " " + dbFile + " " + seqFile;

        // Execute command line
        Process hmmpfamProc = Runtime.getRuntime().exec(cmd);

        // Check the proper finalization of the process
        int exitValue = hmmpfamProc.waitFor();
        if (exitValue != 0) {
            throw new Exception("Exit value for hmmpfam is " + exitValue);
        }
    }

    public static void mergeSameSeq(String resultFile1, String resultFile2, int aLimit) throws Exception {
        ...
    }

HMMPfam – EBI runs
• The European Bioinformatics Institute used HMMPfam in production runs
  • ELIXIR project, tests on the MareNostrum supercomputer
• 7,500,000 protein sequences, divided into 150 files with 50,000 sequences each
• TIGRFAM database, containing 3418 models (HMMs)
• 150 jobs submitted (i.e. COMPSs-HMMPfam executions), one for each input sequences file
  • Approximately 12 hours of execution time per job
  • 64 worker processors per job + 4 processors for the master

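The figures above can be cross-checked with a little arithmetic. The sketch below only combines numbers from the slide; the processor-hours total is a derived estimate (the 12 hours per job is approximate), not a value reported in the talk, and the class and method names are made up for the example.

```java
/** Back-of-the-envelope totals derived from the figures on the slide. */
class EbiRunArithmetic {
    /** Number of sequences across all input files. */
    static long totalSequences(int files, int sequencesPerFile) {
        return (long) files * sequencesPerFile;
    }

    /** Estimated processor-hours: jobs x hours per job x processors per job. */
    static long processorHours(int jobs, int hoursPerJob, int workers, int masterProcessors) {
        return (long) jobs * hoursPerJob * (workers + masterProcessors);
    }

    public static void main(String[] args) {
        System.out.println(totalSequences(150, 50_000));      // 7500000
        System.out.println(processorHours(150, 12, 64, 4));   // 122400
    }
}
```

150 files of 50,000 sequences do add up to the 7,500,000 sequences quoted, and the whole campaign comes to roughly 122,400 processor-hours.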
EMOTIVE CLOUD
• EMOTIVE CLOUD – Barcelona Elastic Management of Tasks for Virtualized Environments in the CLOUD
• An open-source software infrastructure for implementing 'cloud computing' on clusters (v1.0 recently released)
• An open-source collaborative software development project dedicated to providing an extensible, standards-based platform that addresses a broad range of needs in resource management development

EMOTIVE Cloud
• EMOTIVE architecture: three different layers
  • Scheduler
    • Selects where to execute a task
  • Virtualized Resource Management and Monitoring
    • VM lifecycle management
      • Creation of VMs
      • VM monitoring
      • VM destruction
    • Data management
    • Migration
    • Checkpointing
  • Data infrastructure
    • Distributed file system

SERA scheduler: SRLM and ERA
• SRLM
  • Receives customer requests: job execution
  • Negotiates the allocation with the resource agents
    • Selects the resources that match the job requests
    • Receives scheduling proposals from ERA for the selected resources
    • Decides which proposal is the best
  • Manages the execution lifecycle
    • Monitors the execution, recovers in case of failure, tries to improve the execution
• ERA
  • Produces scheduling proposals
    • Finds schedules for the job requests using the semantic information of the resource descriptions and the provider rules
  • Interacts with the different resources
    • Reserves resources
    • Creates VMs for the execution
    • Submits the jobs to the selected resources

[Figure labels: SRLM, ERA, Semantic Scheduler, Resource Manager, Resources]

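The step where SRLM "decides which proposal is the best" can be sketched as a selection over the proposals returned by the resource agents. The slides do not state the decision criterion, so this example assumes, purely for illustration, that the best proposal is the one with the earliest finish time. The `Proposal` class and its fields are hypothetical and not part of SERA.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Sketch: choosing among scheduling proposals (criterion and types are assumptions). */
class ProposalSelectionSketch {
    /** Hypothetical scheduling proposal made by a resource agent. */
    static final class Proposal {
        final String resource;
        final long startTime; // when the resource could start the job
        final long duration;  // estimated execution time on that resource

        Proposal(String resource, long startTime, long duration) {
            this.resource = resource;
            this.startTime = startTime;
            this.duration = duration;
        }

        long finishTime() {
            return startTime + duration;
        }
    }

    /** ASSUMED criterion: the proposal that finishes first wins. */
    static Optional<Proposal> best(List<Proposal> proposals) {
        return proposals.stream().min(Comparator.comparingLong(Proposal::finishTime));
    }

    public static void main(String[] args) {
        List<Proposal> proposals = Arrays.asList(
                new Proposal("A", 0, 10),
                new Proposal("B", 2, 5),
                new Proposal("C", 1, 9));
        best(proposals).ifPresent(p -> System.out.println("Selected resource " + p.resource));
    }
}
```

Returning an `Optional` makes the "no resource matched the request" case explicit; a different criterion (cost, provider rules) would only change the comparator.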
Integration in a Service-Oriented and Cloud infrastructure
• Goal: moving the COMPSs runtime from the client side to a server SOA platform
• Characteristics of this environment:
  • Execution of application tasks offered as services
  • N applications can be served simultaneously
  • Several COMPSs can be deployed, to serve the tasks from one or more applications
  • Resource provisioning brought by a Cloud

COMPSs and EMOTIVE Cloud – Step 1
• Existing pool of EMOTIVE VMs
• COMPSs executes tasks on these VMs

[Figure: VM1, VM2, ..., VMn]

COMPSs and EMOTIVE Cloud – Step 2
• The Task Scheduler requests a pool of VMs from SERA
• COMPSs executes tasks on these VMs
• COMPSs requests the creation of more or “bigger” VMs (memory, CPU, etc.)

[Figure: VM1, VM2, ..., VMn, VMn+1]

ServiceSs envisioned architecture

[Figure labels: User Side (App); COMPSs Application Side (WS Containers hosting Java Apps, each using the COMPSs API); External WS; Runtime Manager; COMPSs runtime instance 1 ... COMPSs runtime instance N; Cloud Scheduler; Cloud; Worker VM 1 ... Worker VM M]

Conclusions
• COMPSs is a platform-unaware programming model that simplifies the development of applications in distributed environments
  • Transparent data management, task execution
  • Parallelization at task level
  • Independent of platform: clusters, grids, clouds
• COMPSs evolution on top of SERA and EMOTIVE Cloud will enable the execution on federated clouds
  • SERA is already able to submit jobs to EC2
• Further evolution of COMPSs towards ServiceSs to enable the composition of services
  • Graphical IDE to help deployment of services and development of applications
  • Evolved runtime to support new features

www.bsc.es/grid • www.emotivecloud.net