
GridQTL: Using the NGS to map genes through a web portal

Jean-Alain Grunchec, University of Edinburgh. Plan: GridQTL team and users; introduction to the GridQTL project; description of the computing infrastructure and the software behind the scenes; short demonstration of the Grid-enabled service.


Presentation Transcript


  1. GridQTL: Using the NGS to map genes through a web portal • Jean-Alain Grunchec, University of Edinburgh

  2. Plan • GridQTL team and users • Introduction to the GridQTL project • Description of the computing infrastructure and the software behind the scenes • Short demonstration of the Grid-enabled service

  3. GridQTL Team • * Sara Knott • + Ian White • × Jules Hernandez-Sanchez • # Jean-Alain Grunchec • # Kashif Saleem • * Chris Haley • * Dirk-Jan de Koning • × Wenhua Wei • × Burak Karacaoeren • × Susan Rowe • * Jano van Hemert • # John Allen • Legend: # computer scientist, + mathematician, × biologist

  4. GridQTL Users • 443 registered external users in 44 countries • 211 have used the core services and 67 have used the LDLA service

  5. Plan • GridQTL team and users • Purpose of the GridQTL project • Description of the computing infrastructure and the software behind the scenes • Short demonstration of the Grid-enabled service

  6. QTL mapping • Aim: To detect and locate genes (QTL) having an effect on a quantitative trait • Quantitative trait – a trait with a continuous measurement (size, weight, concentration) • QTL (quantitative trait locus) – a gene or DNA segment having an effect on a quantitative trait

  7. Rationale for QTL analysis • To understand genetic variation by dissecting complex traits • fundamental knowledge of gene actions and interactions • applications in agriculture • applications in medicine

  8. History: QTL Express • Web portal to map QTL in experimental populations • Based on Java servlets, using a dedicated pool of 6 computers • 100+ users • Rising computational demand significantly degraded the quality of service: 6 computers are not enough!

  9. Most recent developments in genetics • New models are computationally demanding (100s of CPU hours per analysis) • Potential for models that can be applied to complex pedigrees (real-life populations, e.g. LDLA) • Potential for more complex genetic models (multiple QTLs, e.g. epistasis) • Now feasible: 100,000s of marker genotypes per individual, 10,000s of phenotypes, 1000s of individuals • Current approaches may be inadequate to analyse the resulting large data sets • High-throughput analyses: 10,000s of CPU hours per analysis

  10. Plan • GridQTL team and users • Introduction to the GridQTL project • Description of the computing infrastructure • Short demonstration of the Grid-enabled service

  11. Increasing the computational capacity available to GridQTL • 2006: Condor pool of 10 computers • 2007: NGS-1 (500 CPUs) • 2008: ECDF + NGS-2 (more than 2600 CPUs)

  12. Grid infrastructure (diagram) • Local resources in Edinburgh: server PETRA (Condor), 10 CPUs: Linux (FC, RHAT, SUSE), Solaris, IRIX; ECDF, 1456 CPUs • National Grid Service: RAL, 256 CPUs; Manchester, 256 CPUs; Leeds, 256 CPUs; Oxford, 256 CPUs; Westminster, 128 CPUs; Cardiff; Lancaster

  13. Software description (diagram) • Browser: JSR-168 portlet, JSP, HTML, JavaScript, AJAX (JavaScript / servlet) • Server running GridSphere / Tomcat, with the SWARM meta-scheduler • Compute resources: NGS/ECDF, Condor pool • Connection labels: globus, gsissh, ssh

  14. How do we use the NGS? • Our users log on to the website and are identified by their unique user name. • They run queries by clicking buttons on the web interface. • These buttons run Java functions that call Globus Toolkit routines. • The NGS authorizes these routines to run by recognising an NGS portal certificate, which identifies the web server and its administrator. • The administrator has to put an accounting system in place for usage audits.

  15. Job submission • Example command:
      globus-job-submit \
        -env JAVA_HOME=/usr/local/Cluster-Apps/java/jdk1.6.0_01 \
        -env host_name=ngs.wmin.ac.uk \
        ngs.wmin.ac.uk/jobmanager-pbs \
        -stderr /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.err.0000004696.4702.527.txt \
        -stdout /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/.out.0000004696.4702.527.txt \
        -maxtime 18 -np 2 -host-count 1 \
        -dir /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere \
        -x "&(jobtype=single)(minMemory=1100)" \
        -l /home/ngs/ngs0739/production_new_modules/LDLA_GridSphere/LDLA_GridSphere.sh \
        "4;4702;527;0000004696;achatzipli;0;server.cap.ed.ac.uk;qtlportlets/public/01573ec42120bf1301218d475e67021b/;"
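The command on slide 15 is long mainly because every job carries its own output paths, working directory and time limit. A minimal sketch of how a portal-side script might assemble such a command from a few values is given here. The function name `build_submit_cmd`, its argument order and the host and paths in the usage line are hypothetical; only the flags are taken from the command on the slide, and only a subset of them is used. The sketch prints the command instead of running it, so it does not need a Globus installation.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): prints a globus-job-submit
# command line assembled from its arguments. Nothing is submitted.
# Usage: build_submit_cmd HOST WORKDIR JOB_ID MAXTIME SCRIPT
build_submit_cmd() {
    host=$1 workdir=$2 job_id=$3 maxtime=$4 script=$5
    # Collect the words of the command as positional parameters.
    set -- globus-job-submit \
        "$host/jobmanager-pbs" \
        -stderr "$workdir/.err.$job_id.txt" \
        -stdout "$workdir/.out.$job_id.txt" \
        -maxtime "$maxtime" \
        -dir "$workdir" \
        -l "$script"
    # "$*" joins the words with single spaces.
    printf '%s\n' "$*"
}

# Example with made-up values.
build_submit_cmd ngs.example.org /tmp/work 0001 18 /tmp/work/run.sh
```

Printing the command first makes it easy to log exactly what was submitted for each job, which is useful for the usage audit mentioned on slide 14.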

  16. Profiling the application • Profiling software: gprof • Your own script:
      #!/bin/csh
      ./myapplication.sh &
      ./profiler.csh $! &
      wait
      • Memory: ps -o vsize,comm,user • Disk usage can also be monitored (some NGS clusters have large temporary storage facilities) • Computational load: uptime
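The launch pattern on slide 16 starts the application in the background, hands its process ID to a profiler, and waits for both. The slides do not show the profiler itself, so the following is only a sketch of what one could look like, written in POSIX sh rather than csh. The function name `sample_memory`, the sampling interval, the log location and the log format (epoch seconds followed by the virtual size reported by `ps`) are all assumptions.

```shell
#!/bin/sh
# Hypothetical profiler (not from the slides): appends "timestamp vsz" lines
# to LOGFILE every INTERVAL seconds until process PID no longer exists.
# Usage: sample_memory PID INTERVAL LOGFILE
sample_memory() {
    pid=$1 interval=$2 log=$3
    : > "$log"                                   # start with an empty log
    while kill -0 "$pid" 2>/dev/null; do         # true while the PID exists
        # "vsz=" asks ps for the virtual size column without a header line.
        vsz=$(ps -o vsz= -p "$pid" 2>/dev/null | tr -d ' ')
        [ -n "$vsz" ] && printf '%s %s\n' "$(date +%s)" "$vsz" >> "$log"
        sleep "$interval"
    done
}

# Same pattern as on the slide: start the job, profile it, wait for both.
profile_log=${TMPDIR:-/tmp}/profile_demo.log     # assumed log location
sleep 1 &                                        # stand-in for ./myapplication.sh
sample_memory "$!" 1 "$profile_log" &
wait
```

Sampling from outside the process keeps the application unchanged, at the cost of missing short memory peaks that fall between two samples.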

  17. Failures happen! • Output failures: data corrupted during file transfer or on the nodes, out-of-memory errors, etc. • Duration failures: jobs terminated because they exceed the reserved duration • Submission failures: network failure during job submission • Server failures: partly handled
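Of the failure classes on slide 17, submission failures caused by a transient network problem are the simplest to handle: the same submission can be tried again. The slides do not say how GridQTL does this, so the wrapper below is only an illustration. The function name `retry`, the number of attempts and the delay are assumptions. Output and duration failures need different handling (for example checking result files, or resubmitting with a longer reserved duration) and are not covered here.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the slides): runs COMMAND up to MAX_TRIES
# times, sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt fails.
# Usage: retry MAX_TRIES DELAY COMMAND [ARGS...]
retry() {
    max_tries=$1 delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max_tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_tries failed" >&2
        attempt=$((attempt + 1))
        # Pause only if another attempt will follow.
        if [ "$attempt" -le "$max_tries" ]; then
            sleep "$delay"
        fi
    done
    return 1
}
```

A call such as `retry 3 30` followed by the submission command would make three attempts, 30 seconds apart. Blind retries are only safe when a failed attempt cannot have queued the job anyway; otherwise the same job may run twice.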

  18. Plan • GridQTL team and users • Introduction to the GridQTL project • Description of the computing infrastructure • Short demonstration of the Grid-enabled service

  19. Linkage Disequilibrium and Linkage Analysis (LDLA) • Good for complex pedigrees • For instance, a population of feral sheep from St Kilda • Or others, even plants (diploids) • Well suited to populations that would be too expensive (or unethical) to breed for experimental purposes • 3,447 analyses run (198,142 jobs on the Grid) • 11,005 hours of CPU time • 900 hours of user time
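The totals on slide 19 can be turned into rough averages, which give a feel for how finely each analysis is split into Grid jobs. The sketch below does that arithmetic with `awk`. The function name `usage_summary` is hypothetical, and the results are plain averages over all analyses, not figures reported in the slides.

```shell
#!/bin/sh
# Hypothetical helper (not from the slides): derives averages from totals.
# Usage: usage_summary ANALYSES JOBS CPU_HOURS
usage_summary() {
    awk -v analyses="$1" -v jobs="$2" -v cpu_hours="$3" 'BEGIN {
        printf "jobs per analysis: %.1f\n", jobs / analyses
        printf "CPU hours per analysis: %.1f\n", cpu_hours / analyses
        printf "CPU minutes per job: %.1f\n", cpu_hours * 60 / jobs
    }'
}

# Totals quoted on slide 19.
usage_summary 3447 198142 11005
```

With the slide's totals this prints about 57.5 jobs per analysis, 3.2 CPU hours per analysis and 3.3 CPU minutes per job, so a typical analysis is split into many short jobs.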

  20. Thank you! • J. Hernández-Sánchez, J.A. Grunchec and S. Knott. A web application to perform linkage disequilibrium and linkage analyses on a computational grid. Bioinformatics 25(11): 1377-1383 (2009). • J.A. Grunchec, J. Hernández-Sánchez and S. Knott. SWARM: A meta-scheduler to minimize job queuing duration in a Grid portal. Accepted by the International Conference of Cluster and Grid Computing Systems, Oslo, Norway, July 2009. • G. Seaton, J. Hernandez, J.A. Grunchec, I. White, J. Allen, D.J. DeKoning, Wenhua Wei, D. Berry, C. Haley, S. Knott. GridQTL: a Grid portal for QTL mapping of compute intensive datasets. 8th World Congress on Genetics Applied to Livestock Production, August 13-18, 2006, Belo Horizonte, MG, Brazil. • Portal: http://cleopatra.cap.ed.ac.uk/gridsphere/gridsphere • http://gridqt1.cap.ed.ac.uk:8080/gridsphere/gridsphere • Email: jgrunche@staffmail.ed.ac.uk
