CAS@home
Wenjing Wu (wuwj@ihep.ac.cn)
Computer Center, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing
BOINC workshop 2013 @Grenoble
outline
• CAS@home project
• Applications:
  • Lammps: molecular dynamics simulation
  • treeThreader: protein structure prediction
• Remote Job Submission
CAS@HOME
• The first and only volunteer computing project in mainland China
• Launched in June 2010, hosted by the Computer Center of IHEP, CAS
• Supports scientific computing for the Chinese Academy of Sciences and other research institutes
• Hosts multiple applications from various research fields, including nanotechnology, bioinformatics and physics
CAS@home status
• Since its launch in June 2010:
  • 23K active hosts
  • 1.3 TFLOPS (real-time computing power)
  • 10K active users, 1/3 of them Chinese
• Peak: 1M validated CPU hours per month
• 7M CPU hours since Nov 2012
• Hosting 3 applications: Lammps, treeThreader, Aevol
• Other ongoing applications: BOSS (VBoxwrapper-based)
Application 1: Lammps
• Software for molecular dynamics simulation, widely used by scientists from various research fields
• Restartable; written in C++ by an international group; can be compiled on both Windows and Linux with some effort
• Input/output: 3 mandatory input files (<10 MB) / 1 compressed output file (hundreds of MB)
• Running time: 0.5 to 800 hours, depending on a random number that determines the number of simulation steps
Problems
• Results are numerical and can differ between hosts, for two reasons:
  • floating-point calculation differs across platforms
  • checkpointing loses precision, because values are printed to a text file
• Solutions:
  • Homogeneous Redundancy, or Homogeneous Application Version
• Running problems:
  • Some long jobs (~hundreds of hours) crash midway and earn no credit
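The checkpoint issue can be reproduced in a few lines: printing a double to text with fewer than 17 significant digits and reading it back does not, in general, return the same value, so a job restarted from a text checkpoint resumes from slightly different numbers than a job that ran straight through. In a molecular dynamics run such last-bit differences can grow over many steps. A minimal Python sketch; the 6-digit format is an arbitrary illustrative choice, not the format Lammps actually writes.

```python
def checkpoint_roundtrip(x: float, digits: int) -> float:
    """Print x to text with `digits` significant digits and read it back,
    as a text-based checkpoint followed by a restart would."""
    return float(f"{x:.{digits}g}")

x = 0.1 + 0.2                        # 0.30000000000000004 as an IEEE 754 double

short = checkpoint_roundtrip(x, 6)   # text "0.3"
full = checkpoint_roundtrip(x, 17)   # 17 significant digits round-trip any double

print(short == x)   # False: the restarted job resumes from a different value
print(full == x)    # True
```

Writing checkpoints with full precision (or in binary) removes this source of discrepancy; the platform-dependent floating-point differences remain, which is what Homogeneous Redundancy addresses.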
Application 2: treeThreader
• For protein structure prediction
• Written in C by local scientists; compiles easily on both Windows and Linux; restartable
• Computing task: compare a protein sequence file against all existing protein templates
• Input files: configuration files, a protein sequence file, ~50K protein templates (about 4 GB)
• Output files: one text file per template file
• It needs about 42GFLOPS/hour to compare one sequence file against all templates
Computing task (diagram): 1 host compares a protein sequence against Protein Template 1, 2, 3, …, 50,000. Each comparison takes 6 s, so the whole task takes about 84 hours on a single core.
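The single-core figure follows from the per-comparison time, taking 50,000 templates at 6 s each:

$$
50\,000 \times 6\ \text{s} = 300\,000\ \text{s} = \frac{300\,000}{3\,600}\ \text{h} \approx 83.3\ \text{h}
$$

which is in line with the "about 84 hours" quoted on the slide.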
Running it on BOINC (diagram)
• The templates are split into 32 sub-packages of 1,500 templates each, stored as sticky files on volunteer hosts A1, A2, …, An: Sub Package 1 = Templates 1–1500, Sub Package 2 = Templates 1501–3000, …, Sub Package 32 = Templates 46501–48000 (a host such as An may hold several, e.g. Sub Packages 14, 15, 16)
• Locality scheduling: a job goes to where the data is
• Each comparison takes 6 s, so each sub-package takes 9000 s on a host, and the whole task for a protein sequence finishes in about 9000 s (2.5 hours)
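The split and the "job goes to where the data is" idea can be sketched in Python. This is a toy model, not BOINC's locality scheduler: the package size (1,500), package count (32) and 6 s per comparison come from the slides, while the `Host` class, the `pkg_N` file names and the fallback rule (pick the host holding the fewest packages) are hypothetical.

```python
from dataclasses import dataclass, field

TEMPLATES_PER_PACKAGE = 1500   # from the slide
NUM_PACKAGES = 32              # from the slide
SECONDS_PER_COMPARISON = 6     # from the slide

def package_ranges(num_packages=NUM_PACKAGES, size=TEMPLATES_PER_PACKAGE):
    """Map package number -> (first, last) template index, 1-based, inclusive."""
    return {p: ((p - 1) * size + 1, p * size) for p in range(1, num_packages + 1)}

@dataclass
class Host:  # hypothetical model of a volunteer host
    name: str
    sticky_files: set = field(default_factory=set)  # packages already on its disk

def pick_host(package: str, hosts: list) -> Host:
    """Prefer a host that already holds `package` (no download needed).
    Otherwise pick the host holding the fewest packages; it downloads the
    package, which then stays on disk as a sticky file for later jobs."""
    for host in hosts:
        if package in host.sticky_files:
            return host
    chosen = min(hosts, key=lambda h: len(h.sticky_files))
    chosen.sticky_files.add(package)
    return chosen

ranges = package_ranges()
print(ranges[1], ranges[2], ranges[32])   # (1, 1500) (1501, 3000) (46501, 48000)
print(TEMPLATES_PER_PACKAGE * SECONDS_PER_COMPARISON, "s per package")  # 9000 s per package

hosts = [Host("A1", {"pkg_1"}), Host("A2", {"pkg_2"}),
         Host("An", {"pkg_14", "pkg_15", "pkg_16"})]
print(pick_host("pkg_15", hosts).name)    # An
```

Because the ~4 GB of templates stays on the hosts, each new sequence only ships the small sequence and configuration files; a real scheduler would also weigh host load and availability.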
Problems
• Long-tail batches
  • A front-end server submits the batches and does the pre- and post-processing of each sequence, so it can only track a limited number of active batches (batches in progress) in parallel: 300
  • A whole batch is delayed by its slowest job
  • Once the limit is reached, no new batches are submitted to the BOINC server while some batches are still “in progress” (waiting for their slowest jobs)
  • As a result, many hosts end up “starving” (left without work)
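The long-tail effect follows from two facts on the slide: a batch completes only when its slowest job does, and the front end keeps at most 300 batches in progress. A small Python sketch with made-up job durations (hypothetical numbers chosen only for illustration) shows how a single straggler holds a batch slot long after the other 31 jobs have finished.

```python
MAX_ACTIVE_BATCHES = 300   # front-end limit quoted on the slide

def batch_completion_time(job_hours):
    """A batch is done only when every job is done, i.e. at its slowest job."""
    return max(job_hours)

def can_submit(active_batches: int) -> bool:
    """The front end submits a new batch only while below its limit."""
    return active_batches < MAX_ACTIVE_BATCHES

# Hypothetical batch: 31 jobs finish in 2.5 h, one straggler takes 48 h.
jobs = [2.5] * 31 + [48.0]
done = batch_completion_time(jobs)
all_but_slowest = sorted(jobs)[-2]        # when every other job has finished

print(done)                               # 48.0
print(done - all_but_slowest)             # 45.5 hours the slot waits on one job
print(can_submit(299), can_submit(300))   # True False
```

When all 300 slots are held this way, `can_submit` stays false and hosts run out of work even though most of the jobs are finished.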
Remote Job Submission
• CAS@home hosts multiple applications
• Each application has multiple users
• Application users have no privileges to submit jobs on the CAS@home server directly
• This requires remote job submission, which allows authenticated and authorized users to submit jobs from remote machines
• Basic remote job submission functions: batch submit / check status / retire / abort / download results
• BOINC provides a fairly rich set of APIs for remote batch operations (a batch is a set of jobs based on the same input files), but each application still needs its own server-side CGI code and client-side code for remote job submission
  • Some operations (batch retire/abort/status check) are generic and can use the BOINC API directly
  • Other operations, such as batch submission and results downloading, are application-specific and need to be customized
• Extra functions such as “test running” and “estimate running time” can be added
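The split between generic and application-specific operations maps naturally onto a base class with a few abstract methods. This is a hypothetical client-side sketch in Python: the class and method names are illustrative and are not BOINC's remote job submission API, and batches are kept in memory where a real client would send authenticated HTTP requests to the project server.

```python
from abc import ABC, abstractmethod

class RemoteBatchClient(ABC):
    """Generic batch operations live in the base class; submission and result
    download are left to each application (Lammps, treeThreader, ...)."""

    def __init__(self):
        self.batches = {}   # batch_id -> status (in-memory stand-in for the server)
        self._next_id = 1

    # --- generic operations: identical for every application ---
    def status(self, batch_id: int) -> str:
        return self.batches[batch_id]

    def abort(self, batch_id: int) -> None:
        self.batches[batch_id] = "aborted"

    def retire(self, batch_id: int) -> None:
        self.batches[batch_id] = "retired"

    def _register(self) -> int:
        batch_id = self._next_id
        self._next_id += 1
        self.batches[batch_id] = "in progress"
        return batch_id

    # --- application-specific operations ---
    @abstractmethod
    def submit(self, input_files: list) -> int: ...

    @abstractmethod
    def download_results(self, batch_id: int) -> str: ...

class TreeThreaderClient(RemoteBatchClient):
    def submit(self, input_files):
        # Hypothetical: one sequence -> one batch; the jobs are created server-side.
        return self._register()

    def download_results(self, batch_id):
        # Hypothetical URL pattern; the real server returns its own download URL.
        return f"https://example.org/results/batch_{batch_id}.tar.gz"

client = TreeThreaderClient()
bid = client.submit(["MySEQ.tar.gz"])
print(client.status(bid))   # in progress
client.abort(bid)
print(client.status(bid))   # aborted
```

A Lammps client would subclass the same base and only implement `submit` and `download_results`.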
Lammps Job Submission
• Jobs are created in batches
• A batch = 1 set of input files + different parameter-value pairs for each job
• A batch comprises hundreds to thousands of jobs
• Remote job submission: batches are submitted through a web portal by authenticated and authorized users
• Authenticated and authorized users can operate on their batches through the web portal (retire, abort, check status, download results)
Batch A: (input file 1, input file 2)
  Job 1: Ka1=Va1 Kb1=Vb1
  Job 2: Ka2=Va2 Kb2=Vb2
  …
  Job N: KaN=VaN KbN=VbN
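The batch layout (one shared set of input files, one set of parameter-value pairs per job) can be modelled as a small data structure. A Python sketch; the class names, the example parameters (`temp`, `seed`) and the idea of passing each job's pairs as LAMMPS `-var name value` command-line switches are assumptions for illustration, not the CAS@home implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    params: dict   # e.g. {"Ka": "Va1", "Kb": "Vb1"}

    def command_line(self) -> list:
        """Render the pairs as LAMMPS '-var name value' switches (assumed mechanism)."""
        args = []
        for key, value in self.params.items():
            args += ["-var", key, str(value)]
        return args

@dataclass
class Batch:
    input_files: list                          # shared by every job in the batch
    jobs: list = field(default_factory=list)

    @classmethod
    def sweep(cls, input_files, key, values, fixed=None):
        """Build one job per value of `key`, keeping the `fixed` pairs constant."""
        fixed = fixed or {}
        return cls(input_files, [Job({**fixed, key: v}) for v in values])

batch = Batch.sweep(["in.file1", "in.file2"], "temp", [300, 350, 400], fixed={"seed": 42})
print(len(batch.jobs))                 # 3
print(batch.jobs[0].command_line())    # ['-var', 'seed', '42', '-var', 'temp', '300']
```

Keeping the input files at batch level means they are uploaded once and shared, while each job carries only its few parameter values.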
LAMMPS on CAS@home (diagram)
• CAS User Interface → File Sandbox Service: manage the file sandbox
• CAS User Interface → LAMMPS CGI: test a job, submit a batch, check batch status, get output
• A batch is a list of jobs: Job1: Para List, Value List1; Job2: Para List, Value List2; …; JobN: Para List, Value ListN
Web Portal (diagram; every component talks to the LAMMPS CGI on the CAS@home server over HTTP)
• Sandbox: the user's input files (File1, File2)
• Job Tester: tests a job with the chosen input files (syntax check, GFLOPS and output size estimation)
• Batch Creator: once the job passes the test, the user submits a batch, whose jobs run on volunteer hosts
• Batch Monitor and Job Monitor: track the batches and their jobs
• Operations on a batch: abort/retire a batch, download results (zipped)
BOINC Sandbox
• A file cannot be uploaded twice
• Files used by a running batch cannot be deleted
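Both sandbox rules can be captured in a few lines. A hypothetical in-memory Python model (class and method names are made up; the real sandbox stores files on the server). It reads "a file cannot be uploaded twice" as rejecting a second upload under the same name, which is an assumption; a real sandbox might compare file contents instead.

```python
class Sandbox:
    """Toy per-user file sandbox enforcing the two rules from the slide."""

    def __init__(self):
        self.files = {}    # file name -> content
        self.in_use = {}   # file name -> ids of running batches that use it

    def upload(self, name: str, content: bytes) -> None:
        if name in self.files:
            raise FileExistsError(f"{name} is already in the sandbox")
        self.files[name] = content

    def attach(self, name: str, batch_id: int) -> None:
        """Record that a running batch uses this file."""
        self.in_use.setdefault(name, set()).add(batch_id)

    def batch_finished(self, batch_id: int) -> None:
        for batch_ids in self.in_use.values():
            batch_ids.discard(batch_id)

    def delete(self, name: str) -> None:
        if self.in_use.get(name):
            raise PermissionError(f"{name} is used by a running batch")
        del self.files[name]
        self.in_use.pop(name, None)

box = Sandbox()
box.upload("in.file1", b"units lj\n")
box.attach("in.file1", batch_id=7)
try:
    box.delete("in.file1")
except PermissionError as err:
    print(err)                 # in.file1 is used by a running batch
box.batch_finished(7)
box.delete("in.file1")         # allowed once batch 7 is no longer running
```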
Lammps Job Testing (Lammps-specific)
• Test the job on the server
• Then submit the batch
Batch Monitoring
• The admin can see the status of all batches
• Batch status: In process, Completed, Aborted, Retired
Admin view: all batches
Job Status
• Input files associated with this job
• Results can be downloaded individually
Batch Operations
• Abort an unfinished batch
• Download the results of a work unit
• Download the results of this batch
• Retire a batch
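The four batch statuses (In process, Completed, Aborted, Retired) and the abort/retire operations can be read as a small state machine. The transition rules below are an assumption inferred from the slides (abort applies only to unfinished batches; retire is allowed here only for completed or aborted batches), not a description of BOINC's internal logic.

```python
from enum import Enum

class BatchStatus(Enum):
    IN_PROCESS = "In process"
    COMPLETED = "Completed"
    ABORTED = "Aborted"
    RETIRED = "Retired"

# Assumed rules: operation -> (allowed source states, resulting state)
TRANSITIONS = {
    "finish": ({BatchStatus.IN_PROCESS}, BatchStatus.COMPLETED),
    "abort":  ({BatchStatus.IN_PROCESS}, BatchStatus.ABORTED),
    "retire": ({BatchStatus.COMPLETED, BatchStatus.ABORTED}, BatchStatus.RETIRED),
}

def transition(status: BatchStatus, operation: str) -> BatchStatus:
    """Return the new status, or raise if the operation is not allowed."""
    sources, target = TRANSITIONS[operation]
    if status not in sources:
        raise ValueError(f"cannot {operation} a batch that is {status.value}")
    return target

s = BatchStatus.IN_PROCESS
s = transition(s, "abort")
print(s.value)    # Aborted
s = transition(s, "retire")
print(s.value)    # Retired
```

Centralising the rules in one table lets the web portal grey out buttons (for example, Abort on a completed batch) with the same logic the server enforces.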
TreeThreader job submission
• Jobs are created in batches: 1 protein sequence corresponds to 1 batch (32 jobs)
• Remote job submission:
  • Client side: a set of PHP APIs allows authenticated and authorized users to submit batches and operate on them remotely (check status, retire, abort, get output)
  • Server side:
    • Generic operations such as batch abort/retire/status check are already included in the BOINC code
    • Operations such as batch submission and results downloading are application-specific and are implemented in a server-side CGI program
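Because the template packages already sit on the server (and on volunteer hosts as sticky files), creating a treeThreader batch amounts to pairing the uploaded sequence with each of the 32 packages. A Python sketch; the job-name pattern, the package file names, the `tt.conf` configuration name and the dictionary layout are all hypothetical.

```python
import itertools

NUM_PACKAGES = 32                 # 1 sequence -> 1 batch of 32 jobs (from the slide)
_batch_ids = itertools.count(1)   # stand-in for IDs assigned by the server

def create_batch(sequence_file: str, config_file: str) -> dict:
    """Create one job per template package; every job shares the sequence and config."""
    batch_id = next(_batch_ids)
    jobs = [
        {
            "name": f"tt_b{batch_id}_p{p}",
            # The last input is the sticky template package already on the server.
            "inputs": [sequence_file, config_file, f"templates_pkg_{p}"],
        }
        for p in range(1, NUM_PACKAGES + 1)
    ]
    return {"batch_id": batch_id, "jobs": jobs}

batch = create_batch("MySEQ.tar.gz", "tt.conf")
print(batch["batch_id"], len(batch["jobs"]))   # 1 32
print(batch["jobs"][-1]["inputs"][-1])         # templates_pkg_32
```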
TreeThreader Job Submission CGI
• Batch submission
  • Takes the sequence and configuration files uploaded by the client
  • Creates a batch of jobs from these input files and all the template files already stored on the server side
  • Returns a batch ID
• Batch result downloading
  • Uncompresses all output files of the batch
  • Puts the uncompressed output files into one directory and compresses it
  • Returns the download URL of the batch result file
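The result-downloading step (unpack every job's output, gather the files in one directory, repack) can be sketched with Python's standard `tarfile` module. Assumptions, all hypothetical: each job's output is a `.tar.gz` archive, all archives sit in one directory, output file names are unique across jobs, and the function returns the merged archive's path, from which the CGI would build the download URL.

```python
import io
import tarfile
import tempfile
from pathlib import Path

def merge_batch_results(output_dir: str, batch_id: int) -> Path:
    """Unpack every job archive (*.tar.gz) in output_dir into one directory,
    then compress that directory into a single batch archive."""
    out = Path(output_dir)
    merged_dir = out / f"batch_{batch_id}"
    merged_dir.mkdir(exist_ok=True)
    for archive in sorted(out.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            for member in tar.getmembers():
                if not member.isfile():
                    continue
                # Flatten paths so all outputs land in the same directory
                # (assumes output file names are unique across jobs).
                target = merged_dir / Path(member.name).name
                target.write_bytes(tar.extractfile(member).read())
    merged = out / f"batch_{batch_id}_results.tgz"  # .tgz so the glob above skips it
    with tarfile.open(merged, "w:gz") as tar:
        tar.add(merged_dir, arcname=merged_dir.name)
    return merged

# Usage: two fake job archives, each holding one output text file.
with tempfile.TemporaryDirectory() as tmp:
    for job in (1, 2):
        payload = f"result for template {job}\n".encode()
        info = tarfile.TarInfo(name=f"out/template_{job}.txt")
        info.size = len(payload)
        with tarfile.open(Path(tmp) / f"job{job}.tar.gz", "w:gz") as tar:
            tar.addfile(info, io.BytesIO(payload))
    merged = merge_batch_results(tmp, batch_id=1)
    with tarfile.open(merged, "r:gz") as tar:
        names = sorted(tar.getnames())
print(names)
```

Reading each member's bytes and writing only its base name, rather than extracting archive paths as-is, keeps every file inside the merged directory.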
TreeThreader Job Submission (diagram)
• ICT Web Services use the API to submit a sequence, check status and get output (merged results)
• The TreeThreader CGI on CAS@home runs the sequence against Template packages P1, P2, P3, P4, …, P32
Thoughts on a more generic job submission interface
• The server side still requires application-specific functions to create batches, merge results, and test and estimate jobs
• On the client side, the job submission and results downloading functions can be generalized
• An XML file on the client side can describe the input files and their types
<jobdesc>
  <file_info>
    <number>0</number>
    <type>upload</type> <!-- file needs to be uploaded to the BOINC server -->
  </file_info>
  <file_info>
    <number>1</number>
    <type>online</type> <!-- file already stored on the BOINC server -->
  </file_info>
  <file_ref>
    <file_number>0</file_number>
    <open_name>MySEQ.tar.gz</open_name>
  </file_ref>
  <file_ref>
    <file_number>1</file_number>
    <open_name>Templates</open_name>
  </file_ref>
</jobdesc>
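A generic client could read such a description to decide which files it must upload and which are already on the server. A sketch with Python's standard `xml.etree.ElementTree`; it assumes the element names of the description (`file_info`, `number`, `type`, `file_ref`, `file_number`, `open_name`), with `file_info` written with an underscore because an XML element name cannot contain a space. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

JOBDESC = """
<jobdesc>
  <file_info><number>0</number><type>upload</type></file_info>
  <file_info><number>1</number><type>online</type></file_info>
  <file_ref><file_number>0</file_number><open_name>MySEQ.tar.gz</open_name></file_ref>
  <file_ref><file_number>1</file_number><open_name>Templates</open_name></file_ref>
</jobdesc>
"""

def files_to_upload(xml_text: str) -> list:
    """Return the open names of the files whose type is 'upload'."""
    root = ET.fromstring(xml_text.strip())
    # file number -> type ("upload" or "online"); int() tolerates " 0 " padding
    types = {
        int(fi.findtext("number")): fi.findtext("type").strip()
        for fi in root.findall("file_info")
    }
    return [
        ref.findtext("open_name").strip()
        for ref in root.findall("file_ref")
        if types[int(ref.findtext("file_number"))] == "upload"
    ]

print(files_to_upload(JOBDESC))   # ['MySEQ.tar.gz']
```

Files of type `online` (here the template set) are referenced by name only, so the same client code can serve both Lammps and treeThreader with different descriptions.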
The End!