COMP Superscalar Tutorial
Javier Álvarez, Rosa M. Badia, Jorge Ejarque, Daniele Lezzi, Francesc Lordan, Roger Rafanell, Raül Sirvent, Enric Tejedor
{FirstName.LastName}@bsc.es
Life Sciences Tutorial, April 16, 2012
Tutorial Outline
Introduction to COMP Superscalar (COMPSs), 11:00 - 12:15
• Overview → Raül
• Programming model → Enric
  • Steps
  • Data types (Files, Objects)
  • Task types (Methods, Services)
  • First example – Simple app
-Short Break-
Demos, 12:30 - 14:00
• Demos
  • Regular app, Method tasks – HMMPfam app, Eclipse → Daniele
  • Service app, Service tasks – Gene Detection app, IDE → Jorge
-Lunch Break-
Tutorial Outline
Hands-on, 15:00 - 17:00
• Hands-on: Discrete app (code provided, modify interface)
  • App description, example of task selection → Javi
  • Configuration, compilation and execution → Roger/Daniele
  • Local VM (basic steps)
  • Monitoring and debugging → Francesc
• Extra examples: BLAST → Francesc/Roger
• Feedback → Raül
Introduction to COMPSs
• Overview
  • Objective
  • Data dependencies
  • Task graph generation
  • Runtime features
• Programming model
  • Steps
  • Data types (Files, Objects)
  • Task types (Methods, Services)
  • First example – Simple app
• Configuration, compilation and execution (Simple)
COMPSs Objective
• Reduce the development complexity of Grid/Cluster/Cloud applications to the minimum
• Writing an application for a distributed computing infrastructure can be as easy as writing a sequential application
• Target applications: composed of tasks, most of them repetitive
  • Task granularity at the level of simulations or programs
  • Data: files, objects, arrays, primitive types
COMPSs Overview – Data dependencies
    for (int i = 0; i < MAXITER; i++) {
        newBWd = GenerateRandom();
        subst (referenceCFG, newBWd, newCFG);
        dimemas (newCFG, traceFile, DimemasOUT);
        post (newBWd, DimemasOUT, FinalOUT);
        if (i % 3 == 0) Display(FinalOUT);
    }
Input/output data
COMPSs Overview – Task graph generation
[Figure: task graph generated from the loop, with Subst, DIMEMAS, EXTRACT and Display task nodes, executed on the Grid]
COMPSs Overview - Runtime features
• Supported features:
  • Data dependency analysis
  • Data renaming
  • Data transfer
  • Task scheduling
• In progress:
  • Shared disks management
  • Checkpointing
  • Resource management
  • Results collection
  • Fault tolerance
Programming Model - Main Idea
Sequential code:
    ...
    for (i=0; i<N; i++){
        T1 (data1, data2);
        T2 (data4, data5);
        T3 (data2, data5, data6);
        T4 (data7, data8);
        T5 (data6, data8, data9);
    }
    ...
(a) Task selection + parameter direction (input, output, inout)
(b) Task graph creation based on data dependencies
(c) Scheduling, data transfer, task execution
(d) Task completion, synchronization
[Figure: task instances T10, T20, T30, T40, T50, T11, T21, T31, T41, T51, T12, … distributed over parallel resources (Resource 1, Resource 2, …, Resource N)]
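Step (b) can be illustrated with a small sketch. This is not the COMPSs runtime code: the class `TaskGraphSketch` and its `Task` type are hypothetical, and the slide does not state parameter directions, so the sketch assumes that the last parameter of each task is its output and the others are inputs. It only tracks read-after-write dependencies within one iteration by remembering the last writer of each datum.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TaskGraphSketch {

    /** One task call: its name, the data it reads and the data it writes. */
    static final class Task {
        final String name;
        final List<String> reads;
        final List<String> writes;

        Task(String name, List<String> reads, List<String> writes) {
            this.name = name;
            this.reads = reads;
            this.writes = writes;
        }
    }

    /** Returns "producer -> consumer" edges, in the order the tasks are submitted. */
    static List<String> buildEdges(List<Task> tasks) {
        Map<String, String> lastWriter = new HashMap<>();
        List<String> edges = new ArrayList<>();
        for (Task task : tasks) {
            // A task depends on whichever earlier task last wrote each of its inputs.
            for (String datum : task.reads) {
                String producer = lastWriter.get(datum);
                if (producer != null) {
                    edges.add(producer + " -> " + task.name);
                }
            }
            // Record this task as the newest writer of its outputs.
            for (String datum : task.writes) {
                lastWriter.put(datum, task.name);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // ASSUMPTION: last parameter is an output, all others are inputs.
        List<Task> iteration = List.of(
            new Task("T1", List.of("data1"), List.of("data2")),
            new Task("T2", List.of("data4"), List.of("data5")),
            new Task("T3", List.of("data2", "data5"), List.of("data6")),
            new Task("T4", List.of("data7"), List.of("data8")),
            new Task("T5", List.of("data6", "data8"), List.of("data9")));
        buildEdges(iteration).forEach(System.out::println);
    }
}
```

Under these assumed directions the sketch reports T1 -> T3, T2 -> T3, T3 -> T5 and T4 -> T5, so T1, T2 and T4 have no predecessors and could start in parallel. Write-after-read and write-after-write conflicts are deliberately ignored here.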
Programming model - Steps
2 basic steps:
• Selecting the tasks
  • Regular Java methods
  • External services: SOAP WS operations
• Writing the application
  • Programmed as sequential code
  • No API
  • Automatic substitution of task calls / synchronization
Programming model – Task Selection Interface
    public interface SampleItf {

        @Method(declaringClass = "servicess.Example")
        void processReply(
            @Parameter(direction = INOUT)
            Reply r
        );

        @Service(namespace = "http://servicess.es/example", name = "SampleService", port = "SamplePort")
        Reply runQuery(
            @Parameter(direction = IN)
            Query q
        );
    }
Programming model – Main program
    public class App {
        public static void main(String[] args) {
            Query query = new Query(…);
            Reply reply = runQuery(query);   // Service task call
            processReply(reply);             // Method task call
            reply.printToLog();              // Synchronization
        }
    }
[Figure: task graph with the nodes runQuery and processReply]
Programming model – Service Main program
    public class ServiceApp {
        @Orchestration
        public static void sampleComposite() {
            Query query = new Query(…);
            Reply reply = runQuery(query);
            processReply(reply);
            reply.printToLog();
        }
    }
[Figure: task graph with the nodes runQuery and processReply]
Programming model – Sample application
Main program:
    public static void main(String[] args) {
        String counter1 = args[0], counter2 = args[1], counter3 = args[2];
        initializeCounters(counter1, counter2, counter3);
        for (int i = 0; i < 3; i++) {
            increment(counter1);
            increment(counter2);
            increment(counter3);
        }
    }
Subroutine:
    public static void increment(String counterFile) {
        int value = readCounter(counterFile);
        value++;
        writeCounter(counterFile, value);
    }
Programming model – Sample app (interface)
Task selection interface:
    public interface SimpleItf {

        @Method(declaringClass = "SimpleImpl")          // Implementation
        void increment(
            @Parameter(type = FILE, direction = INOUT)  // Parameter metadata
            String counterFile
        );
    }
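The sample application calls `readCounter` and `writeCounter` from `increment` but does not show them. A minimal sketch of what the declaring class could look like follows. The storage format (the counter value as decimal text in the file) and the bodies of the two helpers are assumptions made for illustration, not the tutorial's actual implementation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SimpleImpl {

    // ASSUMPTION: the counter file holds a single integer written as text.
    static int readCounter(String counterFile) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(counterFile));
            return Integer.parseInt(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void writeCounter(String counterFile, int value) {
        try {
            Files.write(Paths.get(counterFile),
                        Integer.toString(value).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The method selected as a task: reads the file, adds one, writes it back (INOUT). */
    public static void increment(String counterFile) {
        int value = readCounter(counterFile);
        value++;
        writeCounter(counterFile, value);
    }

    public static void main(String[] args) throws IOException {
        // Sequential run on a temporary file, mirroring three loop iterations.
        Path counter = Files.createTempFile("counter", ".txt");
        writeCounter(counter.toString(), 0);
        for (int i = 0; i < 3; i++) {
            increment(counter.toString());
        }
        System.out.println(readCounter(counter.toString())); // prints 3
        Files.delete(counter);
    }
}
```

Because `increment` both reads and rewrites the same file, declaring the parameter as `type = FILE, direction = INOUT` matches what the method body does.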
Programming model – Final app code
Main program of the application: NO CHANGES!
    public static void main(String[] args) {
        String counter1 = args[0], counter2 = args[1], counter3 = args[2];
        initializeCounters(counter1, counter2, counter3);
        for (int i = 0; i < 3; i++) {
            increment(counter1);
            increment(counter2);
            increment(counter3);
        }
    }
Programming model – Task graph
Main loop:
    for (int i = 0; i < 3; i++) {
        increment(counter1);
        increment(counter2);
        increment(counter3);
    }
[Figure: task graph with one column per counter (counter1, counter2, counter3) and one row of increment tasks per iteration (1st, 2nd, 3rd)]
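The shape of this graph follows from the INOUT direction of `counterFile`: two increments of the same counter must run in order, while increments of different counters are independent. The sketch below mimics that behaviour with plain `java.util.concurrent`. It is not how COMPSs is implemented; the class `CounterChainsSketch` is hypothetical, and counters are kept in memory instead of files to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

class CounterChainsSketch {

    /** Last submitted task per counter; chaining onto it models the INOUT dependency. */
    private final Map<String, CompletableFuture<Integer>> lastTask = new LinkedHashMap<>();

    void initialize(String counter, int value) {
        lastTask.put(counter, CompletableFuture.completedFuture(value));
    }

    /** Submits an increment that starts only after the previous task on the same counter. */
    void increment(String counter) {
        lastTask.computeIfPresent(counter, (name, previous) -> previous.thenApplyAsync(v -> v + 1));
    }

    /** Synchronization point: blocks until every task on this counter has finished. */
    int get(String counter) {
        return lastTask.get(counter).join();
    }

    public static void main(String[] args) {
        CounterChainsSketch app = new CounterChainsSketch();
        String[] counters = {"counter1", "counter2", "counter3"};
        for (String c : counters) {
            app.initialize(c, 0);
        }
        // Same loop structure as the sample application: 9 tasks, 3 independent chains.
        for (int i = 0; i < 3; i++) {
            for (String c : counters) {
                app.increment(c);
            }
        }
        for (String c : counters) {
            System.out.println(c + " = " + app.get(c)); // each counter ends at 3
        }
    }
}
```

The main loop returns immediately after submitting the tasks; the values are only awaited in `get`, which plays the role of the synchronization the runtime inserts when the main program accesses task results.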
Demos
• Regular application + Method tasks
  • HMMPfam application
  • Development with Eclipse
• Service application + Service tasks
  • Gene Detection application
  • Development with IDE
Demos: HMMPfam
Application: HMMER suite (hmmpfam)
• hmmpfam is part of the HMMER suite, a set of tools for protein sequence analysis
• Reads a sequences file and compares each sequence in it against a database of HMMs
• HMM (Hidden Markov Model): statistical model that represents a protein family
• Goal: create an efficient hmmpfam service
• Starting point: sequential version of the hmmpfam tool
• With the COMPSs programming model, hmmpfam becomes parallel
  • Phase 1: Split both the input sequences and the database
  • Phase 2: Process the fragments in parallel (speed up execution)
  • Phase 3: Reduction of results
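The three phases can be sketched in a few lines of plain Java. This is only an illustration of the split / process / reduce pattern: `matchCount` is a hypothetical stand-in for hmmpfam (it counts substring matches and has nothing to do with HMM scoring), the fragment counts are arbitrary, and parallel streams are used in place of COMPSs tasks.

```java
import java.util.ArrayList;
import java.util.List;

class SplitProcessReduceSketch {

    /** Phase 1: cut a list into at most `parts` contiguous fragments (parts >= 1). */
    static <T> List<List<T>> split(List<T> items, int parts) {
        List<List<T>> fragments = new ArrayList<>();
        int size = (items.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < items.size(); start += size) {
            fragments.add(items.subList(start, Math.min(start + size, items.size())));
        }
        return fragments;
    }

    /** HYPOTHETICAL stand-in for hmmpfam: counts (sequence, model) pairs where the model text occurs. */
    static long matchCount(List<String> sequences, List<String> models) {
        return sequences.stream()
                .mapToLong(s -> models.stream().filter(s::contains).count())
                .sum();
    }

    static long run(List<String> sequences, List<String> models, int seqParts, int dbParts) {
        List<List<String>> seqFragments = split(sequences, seqParts);
        List<List<String>> dbFragments = split(models, dbParts);
        return seqFragments.parallelStream()
                // Phase 2: every (sequence fragment, database fragment) pair is one unit of work.
                .flatMap(sf -> dbFragments.stream().map(df -> matchCount(sf, df)))
                // Phase 3: reduce the partial results into one.
                .mapToLong(Long::longValue)
                .sum();
    }

    public static void main(String[] args) {
        List<String> sequences = List.of("ACDE", "CDEF", "EFGH");
        List<String> models = List.of("CD", "EF", "GH", "XY");
        System.out.println(run(sequences, models, 2, 2)); // prints 5
    }
}
```

Because each pair of fragments is processed independently, the reduced result does not depend on how many fragments are used; only the amount of available parallelism changes.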
Demos: Gene Detection Service
• Gene Detection algorithm designed by the Life Science team
• The original code was written in Perl and used BIO MOBY services
• Combines services with computations
• Example showing different capabilities of COMPSs
• Publish the composition as a Service
Demos: Gene Detection
• Present the IDE (already in development)
  • Easy implementation of services with COMPSs
• Show how to implement a composition of services
  • Stateless and stateful invocations of services
  • Implementation of Part B
• Show how to implement a composition of static and object methods with files and objects as parameters
  • Implementation of Part D
• Show how to publish the composition as a Service
Hands-on
• Hands-on: Discrete application
  • App description, example of task selection
  • Configuration, compilation and execution
  • Local VM (basic steps)
  • Monitoring and debugging
• Extra examples: BLAST
• Feedback
Discrete: Overview
• Given 100 protein structures, find the best combination of FVDW, FSOLV and EPS (i.e. the best configuration) for Discrete.
• Given a configuration, we run Discrete for each structure and evaluate the performance of that configuration.
• S configurations mean S · 100 simulations (a lot!).
Discrete: Application Workflow (1)
[Figure: workflow with the elements structure file N, PDBtoDISCRETE.pl, receptor N, ligand N, DMDSetup, topology N and coordinates N]
Done once for each structure file (N = 1..100)
Discrete: Application Workflow (2)
Parameters: FVDW = i, FSOLV = j, EPS = k, with 0 < i ≤ 2, 0 < j ≤ 7, 0 < k ≤ 5
[Figure: workflow with the elements topology N, coordinates N, discrete, trajectory N (i, j, k), energy N (i, j, k), readsnap, promig3col, average N (i, j, k), evaluation, score (i, j, k) and coefficient (i, j, k)]
• 100 discrete executions for each (i, j, k) combination, producing average 1 (i, j, k) … average 100 (i, j, k)
• coefficient (i, j, k): lower = better
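The overall search can be sketched as a parameter sweep. Several things here are assumptions rather than facts from the slides: i, j and k are treated as integer step indices, so the ranges above would give 2 · 7 · 5 = 70 configurations and 7000 simulations; the coefficient of a configuration is taken to be the mean score over all structures; and `Simulation` is a hypothetical stand-in for running Discrete and scoring the result.

```java
class DiscreteSweepSketch {

    /** HYPOTHETICAL stand-in for one Discrete run on one structure plus its scoring. */
    interface Simulation {
        double score(int structure, int fvdw, int fsolv, int eps);
    }

    /** Returns {i, j, k} of the configuration with the lowest coefficient (lower = better). */
    static int[] bestConfiguration(int structures, int maxFvdw, int maxFsolv, int maxEps,
                                   Simulation sim) {
        int[] best = null;
        double bestCoefficient = Double.POSITIVE_INFINITY;
        for (int i = 1; i <= maxFvdw; i++) {
            for (int j = 1; j <= maxFsolv; j++) {
                for (int k = 1; k <= maxEps; k++) {
                    // One simulation per structure for this configuration.
                    double sum = 0.0;
                    for (int n = 1; n <= structures; n++) {
                        sum += sim.score(n, i, j, k);
                    }
                    // ASSUMPTION: the evaluation step is a plain mean over the structures.
                    double coefficient = sum / structures;
                    if (coefficient < bestCoefficient) {
                        bestCoefficient = coefficient;
                        best = new int[] {i, j, k};
                    }
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy score with a known minimum at (1, 4, 2); it ignores the structure index.
        Simulation toy = (n, i, j, k) -> Math.abs(i - 1) + Math.abs(j - 4) + Math.abs(k - 2);
        int[] best = bestConfiguration(100, 2, 7, 5, toy);
        System.out.println("simulations: " + (2 * 7 * 5 * 100));
        System.out.println("best (i, j, k): " + best[0] + ", " + best[1] + ", " + best[2]);
    }
}
```

Every call to `score` is independent of the others, which is why the simulations are natural candidates for task selection; only the final comparison of coefficients needs all results.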
Discrete: Task selection and invocation
• Complete the application code
  • Select the task
  • Perform the task call
Discrete: Configuration, compilation and execution
• Compilation
    cd /home/user/workspace/discrete/src/discrete
    javac *.java
• Execution
    cp /home/user/workspace/discrete/jar/discrete.jar /home/user
    export CLASSPATH=$CLASSPATH:/home/user/discrete.jar
    runcompss discrete.Discrete true /home/user/workspace/discrete/binary /sharedDisk/Discrete/data /sharedDisk/Discrete/1B6C /tmp /sharedDisk/Discrete/scores
Discrete: Monitoring
• The COMPSs runtime provides some information at execution time so the user can follow the progress of the application
  • Current graph: monitor.dot
      gen_currentgraph.sh ~/monitor.dot
  • Stats of the application run: monitor.xml
    • number of tasks
    • resource usage
    • execution time of each core
• The monitoring frequency can be configured
  • 20 seconds for this tutorial
Discrete: Debugging
• COMPSs can be run in debug mode, in which it shows more information about the execution and helps detect possible problems
  • Activated for this tutorial
• The user can check the execution of their application by reading:
  • The output/errors of the main application
    • On the standard output/error of the launch script (console)
  • The output/error of task number N
    • $WORKING_DIR/jobN.out / $WORKING_DIR/jobN.err
  • Messages from the runtime components of COMPSs
    • $HOME/compss.log
• The user can verify the correct structure of the parallel application with a complete application graph generated post-mortem
    gen_graph.sh $HOME/APP_NAME.dot
Extra examples
• BLAST
[Figure: BLAST workflow with the elements Sequences, Reference db, Split, three Blast tasks, Assembly and Output]
    runcompss blast.Blast true /home/user/workspace/blast/binary/blastall /sharedDisk/Blast/databases/swissprot/swissprot /sharedDisk/Blast/sequences/sargasso_test.fasta 4 /tmp/ /home/user/IT/blast.Blast/out.txt -v 10 -b 10 -e 1e-10
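The Split step of the workflow can be pictured as cutting the FASTA input into fragments that are then searched independently. The sketch below is one possible way to do it and is not taken from the BLAST example's code: the class `FastaSplitSketch`, the round-robin distribution and the reading of the argument `4` as the number of fragments are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class FastaSplitSketch {

    /**
     * Distributes FASTA records round-robin over `fragments` lists.
     * A record is a header line starting with '>' plus the sequence lines that follow it.
     */
    static List<List<String>> split(List<String> lines, int fragments) {
        List<List<String>> out = new ArrayList<>();
        for (int f = 0; f < fragments; f++) {
            out.add(new ArrayList<>());
        }
        int record = -1; // index of the record the current line belongs to
        for (String line : lines) {
            if (line.startsWith(">")) {
                record++;
            }
            if (record < 0) {
                continue; // ignore anything before the first header
            }
            out.get(record % fragments).add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fasta = List.of(
            ">seq1", "ACGT", "TTGA",
            ">seq2", "GGCC",
            ">seq3", "ATAT");
        List<List<String>> parts = split(fasta, 2);
        for (int f = 0; f < parts.size(); f++) {
            System.out.println("fragment " + f + ": " + parts.get(f));
        }
    }
}
```

Keeping whole records together matters more than balancing line counts, since a sequence cut in the middle would no longer be a valid query for the Blast tasks.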
Feedback
• Comments from the attendees