
A computer (Windows, Linux and Mac are welcome).

Cadaques: A cluster for our bioinformatic needs. Txema Heredia 1,2, Ángel Carreño 1,2, Carles Perarnau 3, Carlos Morcillo-Suarez 1,2, and Arcadi Navarro 1,2,4





Presentation Transcript


  1. Cadaques: A cluster for our bioinformatic needs
Txema Heredia 1,2, Ángel Carreño 1,2, Carles Perarnau 3, Carlos Morcillo-Suarez 1,2, and Arcadi Navarro 1,2,4
1 IBE, Institut de Biologia Evolutiva (UPF-CSIC), Barcelona, Spain.
2 Instituto Nacional de Bioinformática, Spain.
3 Unitat de Suport Tecnològic a Projectes de Recerca, UPF, Barcelona, Spain.
4 Institució Catalana de Recerca i Estudis Avançats (ICREA), Catalonia, Spain.
arcadi.navarro@upf.edu

INTRODUCTION
• The amount of biological data available has grown so much that the “friendly” analyses you used to run have become monsters lasting several weeks, which your desktop or laptop can no longer handle.
• The number of analyses needed is expected to keep increasing, so we need a way to run them properly.
• That solution is CADAQUES, our cluster system.
• It is widely used. Total time consumed so far: 540,349 hours, i.e. 61.7 computing years in only 2.3 real years!

TECHNICAL SPECIFICATIONS
• 11x IBM XM 21 blades (same technology as Marenostrum), resulting in:
  • 16x Intel Xeon E5345 quad-core CPUs @ 2.33 GHz, which allow up to 64 single-core jobs to run at once
  • 192 GB of memory (4 blades with 32 GB and 4 blades with 16 GB)
  • 30 TB of disk
• Sun Grid Engine queue system

OPEN TO EVERYONE
• The cluster is open to any member of the IBE.
• If you are interested in using it, send an email to txema.heredia@upf.edu and I will create an account for you.
• The scientific director of the cluster is Arcadi Navarro, so pester him if you are in a hurry.

QUEUE SYSTEM
• To run a job on the cluster, you submit it to the queue system instead of running it directly. This allows the cluster to distribute jobs fairly and efficiently.
• Fairness. Instead of scheduling jobs “first in, first out”, the queue system distributes the job slots among all users, preventing a single user from monopolizing them.
• High allocation system. The queue system tries to allocate jobs according to the cluster resources available at a given moment. This lets less demanding jobs slip in between bigger ones, which decreases your waiting time and makes more effective use of the cluster's resources.

WHAT DO I NEED?
• A computer (Windows, Linux and Mac are welcome).
• An ssh client (PuTTY for Windows; Linux and Mac have one built in).
• An internet connection.
• Some Linux usage skills. Don’t panic! It’s easy.

SOFTWARE
• Operating system: Linux CentOS 5.0 (RedHat), Rocks Cluster Distribution
• The following bioinformatics software is currently installed:
  • Haplotype Estimation & Analysis: Fast Phase, Phase, Haploview, LDhat
  • Phylogenetics: MrBayes, Paml41
  • Whole Genome Association Analysis: Plink
  • Population Genetics: Clumpp, Structure, Cosi, Simcoal2, ihs, Sweep, Xpehh
  • Sequence Analysis & Manipulation: Hmmer, Staden, TrimAl, GBlocks
  • Microbiology: Dotur, Mothur, S-libshuff, Sons, Treeclimber
  • Sequence Assembly: Caftools, Gap5, Mira
  • Sequence Alignment: Blast, T-coffee
• Programming languages available in the cluster: R, Perl, Bioperl, Python, C, Java, Php, MPI, Open MP
• … but anything can be installed on demand. Feel free to ask!

[Figure: Time gained by using the cluster]

DATA
• 30 TB of disk storage.
• MySQL server to store your databases.
• Currently hosting a series of public databases: HapMap, UCSC Genome Browser, Ensembl
• Samba server, so you can access your data easily and use it as a remote backup system.
• Web repository.
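
To make the fairness idea of the queue system more concrete, here is a minimal toy sketch in Python. It is not how Sun Grid Engine is implemented or configured on Cadaques; it only contrasts plain "first in, first out" ordering with a simple round-robin across users. The function names (`fifo_order`, `fair_order`), the user names and the job names are all hypothetical and chosen for illustration.

```python
from collections import deque


def fifo_order(jobs):
    """Return jobs in plain submission order (first in, first out)."""
    return list(jobs)


def fair_order(jobs):
    """Interleave jobs round-robin across users.

    Toy model only (an assumption, not the real scheduler's policy):
    each user keeps their own submission order, and users take turns.
    """
    per_user = {}
    for user, name in jobs:
        per_user.setdefault(user, deque()).append((user, name))

    order = []
    while per_user:
        # Visit users in the order of their first submission.
        for user in list(per_user):
            order.append(per_user[user].popleft())
            if not per_user[user]:
                del per_user[user]
    return order


# Hypothetical queue: one user submits three jobs before anyone else.
jobs = [
    ("alice", "a1"),
    ("alice", "a2"),
    ("alice", "a3"),
    ("bob", "b1"),
    ("carol", "c1"),
]

print("FIFO:", [name for _, name in fifo_order(jobs)])
print("Fair:", [name for _, name in fair_order(jobs)])
```

With plain FIFO, the second and third users wait behind all three jobs of the first user; with the round-robin ordering, each user gets a job scheduled before anyone gets a second one. The real queue system's policy is set by the cluster administrators and may weigh users and resources differently.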
