Porting of Bio-Informatic Tools for Plant Virology on a Computational Grid
Gaetano Lanzalone (1,3), Alessandro Lombardo (1,2), Annamaria Muoio (1), Marcello Iacono-Manno (1), Roberto Barbera (1,4)
(1) INFN Sezione di Catania and Consorzio COMETA, Catania, IT
(2) Dipartimento di Scienze e Tecnologie Fitosanitarie, Catania, IT
(3) INFN LNS, Catania, IT
(4) Dipartimento di Fisica, Università di Catania, IT
Catania, June 2008
Outline
• 1 Introduction to the biological problem
• 2 TriGrid and COMETA projects
• 3 Solution of the problem with GENIUS
• 4 Results and conclusions
Biological problem: testbeds
• CMV (Cucumber mosaic virus)
• TYLCV (Tomato yellow leaf curl virus)
• TYLCSV (Tomato yellow leaf curl Sardinia virus)
• TSWV (Tomato spotted wilt virus)
• CTV (Citrus tristeza virus)
Symptoms:
• Rapid decline and death of citrus grafted on bitter orange (Citrus aurantium L.)
• Stem pitting, reduced yields, poor fruit quality
• Yellowing of seedlings and leaves
• Low growth rate
Vectors: aphids (Toxoptera citricida, Aphis gossypii)
CTV Geographic distribution: Algeria, American Samoa, Antigua and Barbuda, Argentina, Australia, Belize, Bermuda, Bolivia, Brazil, Brunei Darussalam, Cameroon, the Central African Republic, Chad, China, Colombia, Costa Rica, Cyprus, the Dominican Republic, Ecuador, Egypt, El Salvador, Ethiopia, Fiji, French Polynesia, Gabon, Ghana, Guyana, India, Indonesia, Iran, Israel, Italy, Jamaica, Japan, Kenya, Korea Republic, Malaysia, Mauritius, Morocco, Mozambique, Nepal, Netherlands Antilles, New Caledonia, New Zealand, Nicaragua, Nigeria, Pakistan, Panama, Paraguay, Peru, the Philippines, Portugal, Puerto Rico, Saudi Arabia, Spain, Sri Lanka, Suriname, Taiwan, Tanzania, Thailand, Trinidad and Tobago, Turkey, the USA, Uganda, Uruguay, Venezuela, Vietnam, Zaire, Zambia, Western Samoa, the former Yugoslavia, Zimbabwe.
CTV (Citrus Tristeza Virus)
• Particle dimensions: 2000 nm x 11 nm
• Genome: positive-sense single-stranded RNA, 19.3 kb
• Genome organization: 12 open reading frames + 2 untranslated terminal regions
• Proteins produced: at least 19
• Complete genomes in GenBank: 9
[Figure: zoom to the nucleotide level]
Location of the recombination events
TOPALi v2.0 (BioSS, Biomathematics & Statistics Scotland). Methods:
• DSS (Difference of Sums of Squares; McGuire and Wright, 2000)
• PDM (Probabilistic Divergence Measures; Husmeier and Wright, 2001)
• HMM (Hidden Markov Model; Husmeier and McGuire, 2003)
Time for a PDM analysis of the CTV alignment on a user PC (3.2 GHz): 44.2 h!
The Sicilian Grid in one slide
• 1500+ CPUs
• 250+ TBytes
• ~15,000,000 € in 3 years!
• ~300 FTEs! (2/3 newly hired staff)
Objectives of an e-Infrastructure in Sicily
• Create a Virtual Laboratory in Sicily, for both scientific and industrial applications, built on top of a Grid infrastructure
• Connect the Sicilian e-Infrastructure to those already existing in Italy, Europe and the rest of the world, improving scientific collaboration and increasing the “competitiveness” of e-Science and e-Industry “made in Sicily”
• Disseminate the “Grid paradigm” through the organization of dedicated events and training courses
• Trigger and foster the creation of spin-offs in the ICT area in order to reduce the “brain drain” of brilliant young people to other parts of Italy and beyond
The TriGrid e-Infrastructure
• 288 cores AMD Opteron 280
• 400 GB of memory
• LSF 6.1 HPC everywhere
• Infiniband-1X at INAF-OACT and CECUM for HPC apps
• 57 TB of raw disk storage, FC-2-SATA
• Distributed/parallel GPFS filesystem
Layout of large sites
[Figure: Site 1 (expansion w.r.t. TriGrid), Site 3, Site 6]
Padova, V Workshop INFN Grid, 18.12.2006
Computing, Networking, and Storage (2/3)
• 8 IBM BladeCenter H enclosures
• 84 IBM LS21 “blades”
• 336 cores AMD Opteron 2218 rev. F
• 772 GB of RAM (2 GB/core)
• 0.55 MSpecInt2000, 0.66 MSpecFP2000
• More than 6 kSpec(Int/FP)Rate
• ~48.8 mW/SpecInt2000 at full load!
• Gigabit Ethernet service network
• CISCO Topspin Infiniband-4X additional low-latency network for HPC applications
• LSF 6.1 HPC included!
Computing, Networking, and Storage (3/3)
• 4 IBM DS4200 Storage Systems (sites 1, 2, 3, and 6)
• FC-2-SATA technology
• 136 disks of 500 GB each: 68 TB of raw storage in total
• Expandability up to 0.45 PB
• GPFS distributed/parallel file system included!
Command-line interface
• Expert user: long time before starting …
Web interface
• Non-expert user: immediate start …
We need:
• A friendly user interface: EnginFrame (GENIUS)
A web portal: why and how?
• It can be accessed from everywhere and by “everything” (desktop, laptop, PDA, WAP phone).
• It can keep the same user interface across several back-ends (grid “dialects”, command-line UIs).
• It must be “secure” at all levels:
  1) secure web transactions,
  2) secure user authentication,
  3) trustworthy at VO level.
• All available Grid services must be incorporated in a logical way, just “one mouse click away”.
• Its layout must be easily understandable and user friendly.
EnginFrame in brief
• Standards-based Grid portal: Java, Tomcat, XML/XSL, GridML
• Solves back-end integration problems
• Visual rendering for most Grid objects (jobs, job arrays, hosts, etc.)
• Support for multiple Grid technologies: Globus, LSF, SGE, LoadLeveler*, PBS*, even OS!
• Authentication delegation
• Data management: upload/download + remote file browsing
• Integration with interactive applications, tools, …
EnginFrame workflow
[Figure: EnginFrame workflow. Components: Clients (standard web browser), Web Server, EnginFrame Server, EnginFrame Agents, Application Servers, Interactive applications, Grid / Compute Farm. Flows: browsing request, service request, XML output, HTML rendering.]
Porting of Bio-Informatic Tools for Plant Virology
Applications: ClustalW, TOPALi, SplitsTree and Knetfold.
• ClustalW, a program for multiple sequence alignment, runs on the Grid as an MPI job.
• TOPALi and SplitsTree run as interactive jobs on the Grid.
• Knetfold runs as a parametric job.
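The "parametric job" mode mentioned for Knetfold can be sketched as a small generator of a job description (JDL) file. This is only an illustration under stated assumptions: the attribute names (`JobType`, `Parameters`, `ParameterStart`, `ParameterStep`) and the `_PARAM_` placeholder are assumed to follow gLite-style JDL conventions and should be checked against the middleware documentation; the wrapper script `knetfold.sh` and the file naming scheme are hypothetical and do not come from the slides.

```python
def make_parametric_jdl(executable, n_jobs, input_pattern="seq__PARAM_.fasta"):
    """Return the text of a JDL file describing n_jobs parametric sub-jobs.

    Every occurrence of _PARAM_ is expected to be replaced by the
    middleware with the sub-job index (0, 1, ..., n_jobs - 1).
    Attribute names are assumptions based on gLite-style JDL.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        'JobType = "Parametric";',
        f'Executable = "{executable}";',
        f'Arguments = "{input_pattern}";',
        f"Parameters = {n_jobs};",
        "ParameterStart = 0;",
        "ParameterStep = 1;",
        'StdOutput = "out__PARAM_.txt";',
        'StdError = "err__PARAM_.txt";',
        f'InputSandbox = {{"{executable}", "{input_pattern}"}};',
        'OutputSandbox = {"out__PARAM_.txt", "err__PARAM_.txt"};',
    ]
    return "[\n" + "\n".join("  " + line for line in lines) + "\n]\n"


if __name__ == "__main__":
    # Hypothetical example: one sub-job per input sequence file.
    print(make_parametric_jdl("knetfold.sh", 9))
```

Generating the description from a function keeps the number of sub-jobs and the file naming in one place, which is the part a portal would typically fill in on behalf of the user.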
[Figure labels: JDL, XML, developer, user]
TOPALi input
TOPALi output
SplitsTree input.aln output.jpg
SplitsTree algorithm bootstrapping
Sequence alignment time with ClustalW
Elapsed time of the ClustalW-MPI alignment of 9 complete Citrus Tristeza Virus genomes as a function of the number of processors.
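A plot of elapsed time against the number of processors is usually read in terms of speed-up, S(p) = T(1) / T(p), and parallel efficiency, E(p) = S(p) / p. The sketch below computes both. The timings in it are hypothetical placeholders chosen only to make the script runnable; they are not the measurements shown on the slide.

```python
def speedup_table(timings):
    """Compute speed-up and efficiency from elapsed times.

    timings: dict mapping number of processors -> elapsed time
             (any unit); it must contain an entry for 1 processor.
    Returns a list of (processors, speedup, efficiency) tuples,
    sorted by number of processors.
    """
    if 1 not in timings:
        raise ValueError("a single-processor timing is required")
    t1 = timings[1]
    rows = []
    for p in sorted(timings):
        speedup = t1 / timings[p]
        rows.append((p, speedup, speedup / p))
    return rows


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, NOT the measured values.
    hypothetical = {1: 1000.0, 2: 520.0, 4: 280.0, 8: 160.0}
    for p, s, e in speedup_table(hypothetical):
        print(f"{p:2d} processors: speed-up {s:5.2f}, efficiency {e:4.2f}")
```

An efficiency that falls well below 1 as processors are added indicates that communication or serial parts of the alignment are starting to dominate.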
Time comparison for TOPALi
Comparison of TOPALi2 analysis times on CTV sequences (DSS method) on different computational architectures.
Summary and Conclusions
• TriGrid VL and PI2S2 are the first Grid projects in Italy at a true regional level. One year after its start, the TriGrid e-Infrastructure is a reality and a large portfolio of applications is about to be deployed on it. The PI2S2 infrastructure has also been available for six months.
• The process speed-up, together with the integration of the whole phylogenetic analysis into a coherent and easy-to-use framework, will lead to remarkable progress in such investigations.
(The father of “ubiquitous computing”) A citation … Telephone, Light bulb, Telegraph, Radio, TV, Computer, Network, PC, Web, … (in the same order as they were invented)
This way you can copy files from or to a remote server; you can even copy files from one remote server to another without passing through your PC.
• Usage
  scp [[user@]from-host:]source-file [[user@]to-host:][destination-file]
• Description of options
  • from-host: the name or IP address of the host where the source file is. It can be omitted if the from-host is the host where you are issuing the command.
  • user: for the from-host, the user who has the right to access the file and directory to be copied; for the to-host, the user who has the right to write there.
  • source-file: the file or files to be copied to the destination host. It can be a directory, but in that case you need to specify the -r option to copy the contents of the directory.
  • destination-file: the name the copied file will take on the to-host. If none is given, all copied files keep their names.
• Example
  scp *.txt user@remote.server.com:/home/user/
  This copies all files with the .txt extension to the directory /home/user on the remote.server.com host.
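When the same copy has to be scripted, the scp command line described above can be assembled programmatically. The following is a minimal sketch, assuming `scp` is installed and on the PATH; the helper names `build_scp_command` and `copy_txt_files` are hypothetical. Note that without a shell the `*.txt` wildcard is not expanded automatically, so the sketch expands it with `glob`.

```python
import glob
import subprocess


def build_scp_command(sources, destination, recursive=False):
    """Build the argument list for an scp call (nothing is executed).

    sources:     list of local paths or [user@]host:path strings
    destination: local path or [user@]host:path string
    recursive:   add -r so that directories are copied with their contents
    """
    if not sources:
        raise ValueError("at least one source file is required")
    cmd = ["scp"]
    if recursive:
        cmd.append("-r")
    cmd.extend(sources)
    cmd.append(destination)
    return cmd


def copy_txt_files(destination, pattern="*.txt", dry_run=True):
    """Copy all local files matching pattern to destination."""
    sources = sorted(glob.glob(pattern))  # expand the wildcard ourselves
    if not sources:
        print("nothing to copy")
        return None
    cmd = build_scp_command(sources, destination)
    if dry_run:
        print(" ".join(cmd))  # only show what would be run
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Dry run: prints the command instead of contacting the server.
    copy_txt_files("user@remote.server.com:/home/user/")
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.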
Any questions? Thank you very much for your kind attention!
This work makes use of results produced by the PI2S2 Project managed by the Consorzio COMETA, a project co-funded by the Italian Ministry of University and Research (MIUR) within the Piano Operativo Nazionale “Ricerca Scientifica, Sviluppo Tecnologico, Alta Formazione” (PON 2000-2006). More information is available at http://www.pi2s2.it and http://www.consorzio-cometa.it