GridQTL: Using the NGS to map genes through a web portal • Jean-Alain Grunchec • University of Edinburgh
Plan • GridQTL Team and users • Introduction to the GridQTL project • Description of computing infrastructure and software behind the scenes • Short demonstration of the Grid-enabled service
GridQTL Team • * Sara Knott • + Ian White • × Jules Hernandez-Sanchez • # Jean-Alain Grunchec • # Kashif Saleem • * Chris Haley • * Dirk-Jan de Koning • × Wenhua Wei • × Burak Karacaoeren • × Susan Rowe • * Jano van Hemert • # John Allen • Legend: # Computer scientist, + Mathematician, × Biologist
GridQTL Users • 443 registered external users in 44 countries • 211 have used the core services and 67 have used LDLA
Plan • GridQTL Team and users • Purpose of the GridQTL project • Description of computing infrastructure and software behind the scenes • Short demonstration of the Grid-enabled service
QTL mapping • Aim: To detect and locate genes (QTL) having an effect on a quantitative trait • Quantitative trait – a trait with a continuous measurement (size, weight, concentration) • QTL (quantitative trait locus) – a gene or DNA segment having an effect on a quantitative trait
Rationale for QTL analysis • To understand genetic variation by dissecting complex traits • fundamental knowledge of gene actions and interactions • applications in agriculture • applications in medicine
History: QTL Express • Web portal to map QTL in experimental populations • Based on Java servlets, running on a dedicated pool of 6 computers • 100+ users • Rising computational demand significantly degraded the quality of service: 6 computers are not enough!
Most recent developments in genetics • New models are computationally intensive (100s of CPU hours per analysis) • Potential for models that can be applied to complex pedigrees (real-life populations, e.g. LDLA) • Potential for more complex genetic models (multiple QTLs, e.g. epistasis) • Now feasible: 100,000s of marker genotypes per individual; 10,000s of phenotypes; 1000s of individuals • Current approaches may be inadequate to analyse the resulting large data sets • High-throughput analyses: 10,000s of CPU hours per analysis
Plan • GridQTL Team and users • Introduction to the GridQTL project • Description of computing infrastructure • Short demonstration of the Grid-enabled service
Increasing the computational capacity available to GridQTL • 2006: Condor pool of 10 computers • 2007: NGS-1 (500 CPUs) • 2008: ECDF + NGS-2 (more than 2600 CPUs)
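To relate these capacity figures to the per-analysis cost quoted earlier (10,000s of CPU hours for a high-throughput analysis), a back-of-the-envelope estimate may help. The sketch below assumes perfect parallel scaling and no queuing delay, which real Grid jobs will not achieve, so the results are lower bounds only. The function name `ideal_wall_clock_hours` is made up for illustration.

```python
def ideal_wall_clock_hours(cpu_hours: float, cpus: int) -> float:
    """Lower bound on elapsed hours if the work splits evenly over all CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return cpu_hours / cpus


# CPU counts taken from the slides. 2600 is itself a lower bound
# ("> 2600 CPUs"), and 10,000 CPU hours is the low end of the
# "10,000s CPU hours per analysis" figure.
CAPACITY = {
    "QTL Express pool": 6,
    "2006 Condor pool": 10,
    "2007 NGS-1": 500,
    "2008 ECDF + NGS-2": 2600,
}

if __name__ == "__main__":
    for name, cpus in CAPACITY.items():
        hours = ideal_wall_clock_hours(10_000, cpus)
        print(f"{name:>18}: {hours:8.1f} h ({hours / 24:.1f} days)")
```

Even under these ideal assumptions, a 10,000 CPU-hour analysis would occupy 6 computers for about 69 days, against under 4 hours on 2600 CPUs.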
Grid Infrastructure (diagram) • Server • Local resources in Edinburgh: PETRA (Condor), 10 CPUs, running Linux (FC, RHAT, SUSE), Solaris and IRIX; ECDF, 1456 CPUs • National Grid Service: RAL (256 CPUs), Manchester (256 CPUs), Leeds (256 CPUs), Oxford (256 CPUs), Westminster (128 CPUs), Cardiff, Lancaster
Software description (diagram) • Browser: JSR-168 portlet (JSP, HTML) and AJAX (JavaScript / servlet) • Server running GridSphere / Tomcat, with the SWARM meta-scheduler • Compute back ends: NGS/ECDF and the Condor pool • Connection labels: Globus, gsissh, ssh
How do we use the NGS? • Our users log on to the website and are identified by their unique user name. • They run queries by clicking buttons in the web interface. • These buttons run Java functions that call Globus Toolkit routines. • The NGS authorises these routines by recognising an NGS portal certificate, which identifies the web server and its administrator. • The administrator has to put an accounting system in place for usage audits.
Job submission • Example command:
globus-job-submit \
  -env JAVA_HOME=/usr/local/Cluster-Apps/java/jdk1.6.0_01 \
  -env host_name=ngs.wmin.ac.uk \
  ngs.wmin.ac.uk/jobmanager-pbs \
  -stderr /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.err.0000004696.4702.527.txt \
  -stdout /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.out.0000004696.4702.527.txt \
  -maxtime 18 -np 2 -host-count 1 \
  -dir /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere \
  -x "&(jobtype=single)(minMemory=1100)" \
  -l /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/LDLA_GridSphere.sh \
  "4;4702;527;0000004696;achatzipli;0;server.cap.ed.ac.uk;qtlportlets/public/01573ec42120bf1301218d475e67021b/;"
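The command is long because every job carries its own environment, output paths and resource limits. As an illustration only, the sketch below shows how a portal-side helper might assemble such an argument list from a few job parameters. It is written in Python for brevity, whereas the portal itself uses Java. `build_submit_command` and its parameters are hypothetical names, the job argument is a placeholder, and the flags and their order are copied from the slide rather than checked against the Globus documentation. Nothing is submitted; the command is only built and printed.

```python
import shlex


def build_submit_command(contact, work_dir, script, job_arg, env, job_tag,
                         maxtime=18, np=2, host_count=1,
                         rsl="&(jobtype=single)(minMemory=1100)"):
    """Return the argv list for one job, mirroring the slide's flag order."""
    argv = ["globus-job-submit"]
    for key, value in env.items():
        argv += ["-env", f"{key}={value}"]
    argv.append(contact)
    # Per-job stderr/stdout files are named after the job tag, as on the slide.
    argv += ["-stderr", f"{work_dir}/.err.{job_tag}.txt"]
    argv += ["-stdout", f"{work_dir}/.out.{job_tag}.txt"]
    argv += ["-maxtime", str(maxtime), "-np", str(np),
             "-host-count", str(host_count)]
    argv += ["-dir", work_dir, "-x", rsl]
    argv += ["-l", f"{work_dir}/{script}", job_arg]
    return argv


if __name__ == "__main__":
    command = build_submit_command(
        contact="ngs.wmin.ac.uk/jobmanager-pbs",
        work_dir="/home/ngs/ngs0739/production_new_modules/LDLA_GridSphere",
        script="LDLA_GridSphere.sh",
        job_arg="PLACEHOLDER;JOB;DESCRIPTION;",  # hypothetical value
        env={"JAVA_HOME": "/usr/local/Cluster-Apps/java/jdk1.6.0_01",
             "host_name": "ngs.wmin.ac.uk"},
        job_tag="0000004696.4702.527",
    )
    # shlex.join quotes the RSL string and the job argument for the shell.
    print(shlex.join(command))
```

Building the command as a list, instead of concatenating a string, keeps arguments containing `&`, `;` or parentheses intact without manual quoting.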
Profiling the application • Profiling software: gprof • Your own script:
#!/bin/csh
./myapplication.sh &
./profiler.csh $! &
wait
• Memory: ps -o vsize,comm,user • Disk usage can also be monitored (some NGS clusters have large temporary storage facilities) • Computational load: uptime
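The slide does not show the contents of `profiler.csh`. As a rough idea of what such a sampler could do, the sketch below starts a command, polls its virtual memory size with `ps -o vsize=` until it exits, and reports the peak. It is a Python stand-in under stated assumptions, not the authors' script: `sample_vsize_kb` and `profile` are hypothetical names, and the sampling interval is arbitrary.

```python
import subprocess
import sys
import time


def sample_vsize_kb(pid):
    """Return the process's virtual memory size as reported by ps, or None."""
    try:
        result = subprocess.run(
            ["ps", "-o", "vsize=", "-p", str(pid)],
            capture_output=True, text=True,
        )
        text = result.stdout.strip()
        return int(text) if text else None
    except (FileNotFoundError, ValueError):
        # ps is missing or printed something unexpected: skip this sample.
        return None


def profile(command, interval=1.0):
    """Run command, sampling vsize until it exits; return (exit_code, peak)."""
    process = subprocess.Popen(command)
    peak = 0
    while process.poll() is None:
        size = sample_vsize_kb(process.pid)
        if size is not None:
            peak = max(peak, size)
        time.sleep(interval)
    return process.returncode, peak


if __name__ == "__main__":
    # Stand-in workload; replace with the real application command.
    code, peak = profile([sys.executable, "-c", "import time; time.sleep(2)"],
                         interval=0.5)
    print(f"exit code {code}, peak vsize {peak} (units as reported by ps)")
```

A peak measured this way can inform the memory request made at submission time, bearing in mind that short spikes between samples are missed.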
Failures happen! • Output failures: data corrupted during file transfer or on the nodes, jobs running out of memory, etc. • Duration failures: jobs terminated because they ran longer than the reserved duration • Submission failures: network failure during job submission • Server failures: partly handled
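The slides do not say how GridQTL reacts to each failure type. One common approach, shown here purely as an assumed policy, is to resubmit a bounded number of times and to reserve more time after a duration failure. In this sketch `Failure`, `run_with_retries` and the `submit` callable are all hypothetical, and the default of 18 simply echoes the `-maxtime 18` value in the example command, whose unit the slide does not state.

```python
from enum import Enum


class Failure(Enum):
    OUTPUT = "output"          # corrupted or missing results
    DURATION = "duration"      # job exceeded its reserved time
    SUBMISSION = "submission"  # network failed while submitting


def run_with_retries(submit, max_attempts=3, maxtime=18):
    """Call submit(maxtime) until it succeeds or attempts run out.

    submit is a hypothetical callable that returns None on success or a
    Failure member. Returns (succeeded, attempts_used, final_maxtime).
    """
    for attempt in range(1, max_attempts + 1):
        failure = submit(maxtime)
        if failure is None:
            return True, attempt, maxtime
        if failure is Failure.DURATION:
            # Assumed policy: reserve twice as long on the next attempt.
            maxtime *= 2
    return False, max_attempts, maxtime


if __name__ == "__main__":
    # Simulated job: one network failure, one timeout, then success.
    outcomes = iter([Failure.SUBMISSION, Failure.DURATION, None])
    print(run_with_retries(lambda maxtime: next(outcomes)))
```

Capping the number of attempts matters on a shared Grid, since a job that fails deterministically would otherwise consume queue slots indefinitely.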
Plan • GridQTL Team and users • Introduction to the GridQTL project • Description of computing infrastructure • Short demonstration of the Grid-enabled service
Linkage Disequilibrium Linkage Analysis (LDLA) • Good for complex pedigrees • For instance, a population of feral sheep from St Kilda • Or others, even plants (diploids) • In general, good for populations that would be too expensive (or unethical) to breed for experimental purposes • Usage: 3,447 analyses run (198,142 jobs on the Grid) • 11,005 hours of CPU time • 900 hours of user time
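The usage totals become easier to interpret as per-analysis and per-job averages. The short calculation below derives them from the three figures on the slide; these are plain averages and say nothing about how job sizes are actually distributed. The variable names are illustrative.

```python
# Totals as stated on the slide.
ANALYSES = 3_447
JOBS = 198_142
CPU_HOURS = 11_005

jobs_per_analysis = JOBS / ANALYSES
cpu_hours_per_analysis = CPU_HOURS / ANALYSES
cpu_seconds_per_job = CPU_HOURS * 3600 / JOBS

if __name__ == "__main__":
    print(f"jobs per analysis:      {jobs_per_analysis:.1f}")
    print(f"CPU hours per analysis: {cpu_hours_per_analysis:.2f}")
    print(f"CPU seconds per job:    {cpu_seconds_per_job:.0f}")
```

On average, then, an LDLA analysis was split into roughly 57 Grid jobs of about 200 CPU seconds each, for a little over 3 CPU hours per analysis.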
Thank you! • J. Hernández-Sánchez, J.A. Grunchec and S. Knott. A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25(11): 1377-1383 (2009). • J.A. Grunchec, J. Hernández-Sánchez and S. Knott. SWARM: A meta-scheduler to minimize job queuing duration in a Grid portal. Accepted by the International Conference of Cluster and Grid Computing Systems, Oslo, Norway, July 2009. • G. Seaton, J. Hernandez, J.A. Grunchec, I. White, J. Allen, D.J. de Koning, W. Wei, D. Berry, C. Haley and S. Knott. GridQTL: a Grid portal for QTL mapping of compute intensive datasets. 8th World Congress on Genetics Applied to Livestock Production, Belo Horizonte, MG, Brazil, August 13-18, 2006. • Portal: http://cleopatra.cap.ed.ac.uk/gridsphere/gridsphere • http://gridqt1.cap.ed.ac.uk:8080/gridsphere/gridsphere • Email: jgrunche@staffmail.ed.ac.uk