GridBLAST: High-Throughput BLAST on the Grid
Arun Krishnan
Projects Leader, HPC
BII, Singapore
arun@bii.a-star.edu.sg
http://www.bii.a-star.edu.sg/~arun
http://gridblast.bii.a-star.edu.sg
Agenda
• GridBLAST
    • Architecture
    • Results
    • Demo
• Other Projects
    • inGRD: Grid Resource Discovery
    • GridX: Meta-Scheduler for the Grid
    • Miscellaneous
Grid SPMD Architecture
[Figure: the head (initiating) node runs GridBLAST on top of Globus middleware. Remote scripts, executables, databases, and input files go out to the remote Grid nodes; results come back from the local node and the remote nodes.]
Mini-Grid
[Figure: Node 1, Node 2, and Node 3 on Network A and Network B, joined by routers over a backbone.]
GridBLAST Main Script Flow Sheet
• Split queries into multiple files
• Static scheduling of queries
• Tar & zip executables, databases & query files
• Spawn separate threads of execution for each remote node
• In each thread, spawn jobs using globusrun
• Gather results & cleanup
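The first steps of the flow sheet (split, schedule statically, one thread per node) can be sketched as follows. This is a minimal illustration, not the GridBLAST script itself: the round-robin policy, the function names, and the `submit` callable are all assumptions. `submit` is a hypothetical stand-in for whatever launches the job on a node (a globusrun call in the real system).

```python
import threading


def split_fasta(text):
    """Split FASTA text into one string per query record."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        if line.strip():
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def static_schedule(records, nodes):
    """Fix the query-to-node assignment before any job starts.

    Round-robin is an assumed policy, chosen only for illustration.
    """
    plan = {node: [] for node in nodes}
    for i, record in enumerate(records):
        plan[nodes[i % len(nodes)]].append(record)
    return plan


def run_on_nodes(plan, submit):
    """Run submit(node, queries) in one thread per node and gather results."""
    results = {}
    lock = threading.Lock()

    def worker(node, queries):
        output = submit(node, queries)  # hypothetical job launcher
        with lock:
            results[node] = output

    threads = [threading.Thread(target=worker, args=item) for item in plan.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every node before gathering
    return results
```

Keeping `submit` as a parameter lets the scheduling logic be exercised locally without any Grid middleware installed.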
Remote Node Script File
• Open GASS server
• Copy files, executables, databases from initiating node
• Untar & unzip BLAST executables, databases & query files
• Single processor?
    • Yes: spawn blastall jobs; repeat until queries are done
    • No: spawn scatter server; spawn scatter clients using local job managers; work-queue scheduler distributes queries until queries are done
• Tar results file and copy to initiating node
• Cleanup temp directories/files and exit
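The work-queue branch of this flow can be illustrated with a small sketch. It assumes a simple pull model in which each client takes the next pending query as soon as it is free; the actual scatter server and clients are not described in the slides, so the names below and the `run_query` callable (a stand-in for one blastall invocation) are hypothetical.

```python
import queue
import threading


def work_queue_run(queries, n_clients, run_query):
    """Distribute queries to clients through a shared work queue."""
    pending = queue.Queue()
    for q in queries:
        pending.put(q)

    results = {}
    lock = threading.Lock()

    def client():
        while True:
            try:
                q = pending.get_nowait()
            except queue.Empty:
                return  # "Queries done?" -> yes, this client exits
            output = run_query(q)  # hypothetical stand-in for a BLAST run
            with lock:
                results[q] = output

    clients = [threading.Thread(target=client) for _ in range(n_clients)]
    for c in clients:
        c.start()
    for c in clients:
        c.join()
    return results
```

Unlike the static schedule on the initiating node, faster clients here simply end up processing more queries, which is the usual reason to use a work queue inside a cluster.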
Bound on communication time
• A bound on the communication time corresponding to the maximum parallel execution time for an SPMD-type grid application is given by: [equation missing from the extracted slide]
[Plot: T_comm_Max, Bound, and Speedup = (TG/TL); normalized bound/comm. time vs. problem size (# of queries)]
• The minmax problem can be formulated as: [equation missing from the extracted slide]
[Plot: TC_Max_Prop, Bound_Prop, Tc_Max_Minmax, Bound_Minmax; normalized bound/comm. time vs. problem size (# of queries)]
Minmax Scheduler
[Plot: speedup for two different schemes (SpeedupMinmax, Speedup_Prop); Speedup = (TG/TL) vs. problem size (# of queries)]
[Plot: query distribution across nodes for two different schemes (Prop: Node1; Prop: Node2, Node3; Minmax: Node1; Minmax: Node2; Minmax: Node3); queries/node vs. problem size (# of queries)]
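To make the contrast between the two schemes concrete, here is a small sketch. The slides' actual minmax formulation did not survive extraction, so the cost model below is an assumption: each node that receives work pays a fixed communication overhead plus a per-query compute time. The proportional scheme splits by node speed only; the minmax scheme greedily minimizes the latest finish time. All names and numbers are hypothetical.

```python
def finish_time(count, per_query, comm):
    """Assumed cost model: fixed transfer overhead plus count * per-query time."""
    return comm + count * per_query if count else 0.0


def makespan(counts, per_query, comm):
    """Time at which the slowest node finishes."""
    return max(finish_time(c, t, k) for c, t, k in zip(counts, per_query, comm))


def proportional(n_queries, per_query):
    """Split queries in proportion to node speed (1 / per-query time)."""
    speeds = [1.0 / t for t in per_query]
    exact = [n_queries * s / sum(speeds) for s in speeds]
    counts = [int(x) for x in exact]
    # Hand out queries lost to rounding, largest fractional part first.
    by_remainder = sorted(range(len(exact)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[: n_queries - sum(counts)]:
        counts[i] += 1
    return counts


def minmax(n_queries, per_query, comm):
    """Greedy: give each query to the node that would finish earliest with it."""
    counts = [0] * len(per_query)
    for _ in range(n_queries):
        best = min(range(len(counts)),
                   key=lambda i: finish_time(counts[i] + 1, per_query[i], comm[i]))
        counts[best] += 1
    return counts


if __name__ == "__main__":
    per_query = [1.0, 2.0, 2.0]  # hypothetical seconds per query on each node
    comm = [0.0, 5.0, 5.0]       # hypothetical transfer overhead per node
    for name, counts in [("prop", proportional(10, per_query)),
                         ("minmax", minmax(10, per_query, comm))]:
        print(name, counts, makespan(counts, per_query, comm))
```

With these made-up numbers the proportional split finishes at 11 time units while the minmax split finishes at 8, because the minmax scheme accounts for the transfer overhead of the remote nodes when deciding how many queries to send them.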
Why inGRD?
• The information that MDS can provide is inconsistent: it depends on how Grid administrators have configured Globus GIIS/GRIS.
• inGRD does not require further installation of sensors on every compute node within a Grid node. It makes use of readily available resource information collected by the job managers.
• Pre-formatted data on Grid nodes enables faster request, collection, and processing of large amounts of data.
inGRD overview
• inGRD sensors are installed on Grid nodes to collect available resource information from their compute cluster.
• inGRD client applications facilitate the submission of requests and the collection of responses from the inGRD-enabled Grid nodes.
• Results are represented as a single XML document.
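As an illustration of the last point, the sketch below merges per-node responses into one XML document. The inGRD schema is not given in the slides, so every element and attribute name here (`gridResources`, `gridNode`, `cluster`, `freeCpus`) is hypothetical.

```python
import xml.etree.ElementTree as ET


def merge_responses(responses):
    """Combine per-node XML fragments into a single XML document.

    `responses` maps a Grid node name to the XML text it returned.
    Element names are illustrative, not the real inGRD schema.
    """
    root = ET.Element("gridResources")
    for node_name, xml_text in responses.items():
        node_el = ET.SubElement(root, "gridNode", name=node_name)
        node_el.append(ET.fromstring(xml_text))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(merge_responses({
        "nodeA": "<cluster><freeCpus>4</freeCpus></cluster>",
        "nodeB": "<cluster><freeCpus>8</freeCpus></cluster>",
    }))
```

A single merged document lets a client query resources across all nodes with one parse instead of handling each response separately.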
GridX: Metascheduler for the Grid
• Metascheduler for scheduling jobs in a grid framework
• Will provide a user-friendly interface for grid users to submit jobs
• Provides Grid resource information by interfacing with inGRD, NWS, and Ganglia
• Provides basic grid requirements: job submission, monitoring, cancellation, file transfer, etc.
• Advanced features include accounting, load balancing, and static and dynamic scheduling strategies
Other Projects
• GridGene Project:
    • High-throughput, grid-enabled version of two different gene-finding applications, GRAIL and GeneWise
• Project with GIS:
    • Parallelization of mass spectrometry code for analysis of proteomics data