Public Bioinformatics Services from the EBI Rodrigo Lopez
Main Priorities • Providing access to comprehensive information resources in bioinformatics. • Database searching • Homology searching • Sequence analysis • 3D Structural analysis • MicroArrays • High availability
Goals • Provide support (support@ebi.ac.uk) and assist with training (2can project). • Integrate the activities of various groups at the EBI. • In particular at the level of the web services. • Infrastructure planning and sharing of all External Services hardware and human resources.
Data Resources • DNA sequences • Ontologies • Protein sequences • Patent abstracts (*) • Genomes • Functional patterns • Literature • Proteomes • Gene expression • Metabolic pathways • Protein structures • (*) Soon
Homology Searches • Three classical applications: • Fasta, WU-Blast 2.0 and NCBI Blast 2.x • Advanced and very sensitive Smith-Waterman (S&W) protein homology searches available with: • MPsrch and Scanps • Identification of protein function: • InterProScan, FingerPrintScan, ppsearch... • 3D structure comparison using DALI/SSM/PQS/Ligand, etc.
WU-Blast2 • Noteworthy new improvement: • Sensitivity control • Higher sensitivity means slower runs. • SWALL: 1 sec (900K+ sequences). • EMBL: 20 secs (taxdiv).
Genomes & Proteomes • Similarity and homology searches are available using Fasta (more than 100 archaeal, bacterial and eukaryotic genomes and proteomes to date). • Specialised Blast servers for parasite genomes and vector screening (EVEC). • Fasta server for SNP scanning (HGVBASE). • Ensembl Blast servers for metazoan genomes and proteomes.
Database Searching • Main service is based on SRS 6.x. • 150+ public databases are available. • More than 50 million records are searchable. • Bi-directional and multi-step links allow queries at the following levels: • A > B, A > B > C, A > C, etc…
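The multi-step links on this slide (A > B, A > B > C, A > C) can be read as paths through a graph of databanks. The following is a minimal sketch of that idea only, not of how SRS works internally: the databank names A, B and C come from the slide, while `build_links`, `link_paths` and the `max_steps` limit are hypothetical names introduced for illustration.

```python
from collections import deque


def build_links(pairs):
    """Store each link once and make it usable in both directions."""
    links = {}
    for left, right in pairs:
        links.setdefault(left, set()).add(right)
        links.setdefault(right, set()).add(left)
    return links


def link_paths(links, source, target, max_steps=3):
    """Return every loop-free chain of links from source to target, shortest first."""
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == target:
            found.append(path)
            continue
        if len(path) - 1 >= max_steps:
            continue  # do not extend chains beyond the step limit
        for nxt in sorted(links.get(last, ())):
            if nxt not in path:  # never revisit a databank in the same chain
                queue.append(path + [nxt])
    return found


# Three databanks, all linked to each other (assumed for the example).
links = build_links([("A", "B"), ("B", "C"), ("A", "C")])
for path in link_paths(links, "A", "C"):
    print(" > ".join(path))
```

With these three links the sketch lists the direct route A > C first and the two-step route A > B > C second; because links are stored in both directions, the same call works from C back to A.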
Analysis Tools • Range from single nucleotide and protein sequence tools to MSA and phylogenetic tools (ClustalW and AMAS). • Gene prediction (GeneMark) • Function/Pattern identification (InterProScan, CpG analysis, Radar, PRATT...) • Large scale analysis of genomes (GeneQuiz) • Large scale analysis of proteomes (HPI, PA) • Bioinformatic application workbenches: w2h and AppLab for GCG and EMBOSS • http://www.ebi.ac.uk/Tools/
ClustalW improvements: • Faster • Larger sets • Better tree views
2D & 3D Structural Analysis • Comparison of protein structures in 2D (DALI/SSM) • Fold classification (DSSP, HSSP, FSSP) • Quaternary structure comparisons (PQS) • 3D sequence alignment services (3Dseq)
MicroArrays • ArrayExpress • MiameExpress • Expression Profiler
EBI services targets • Provide access to data and tools that can support: • Description of disease aetiology. • Risk assessment. • Identification of drug targets. • Disease prevention. • Molecular definition of disease.
Impact on human medicine • Cystic fibrosis. • Huntington’s disease. • Myotonic dystrophy. • Cancer. • Alzheimer’s. • Malaria.
EBI services in biology • Describe populations and interactions. • Molecular Ecology. • Genetic variation. • Population management. • Genomics and Biodiversity.
High Availability • Jobs run in a highly heterogeneous computer environment: • Large SMP servers (SGI and HP/Compaq) • Linux-based IBM PC farms • ca. 260 CPUs (minimum dual-CPU host) • Effective load sharing and balancing using Platform’s LSF. • Updating and rollover are achieved using ultra-efficient tools developed in-house.
[Architecture diagram. Labels: BigIP; Request Brokers; Static Content; CGI, Servlets, JSP, WS (SOAP); LSF]
ESEBI (Spanish for ‘it’s EBI’) • Is in effect a transparent GRID for providing free computational services to the community. • Highly adaptable and exportable. • High degree of maintainability and reliability (24/7 solution).
…allows fast job requeueing… …and rescue…
…permits job relocation… …and job resource (re)allocation…
Typical ESEBI LSF host characteristics • ‘See all/Share all FS’: • SAN and NFS caches. • Local storage (ca. 100 GB/host) for outage resolution. • Hosts can be removed from or added to a farm for maintenance without affecting the service. • Hosts can be added to or removed from LSF queues as demand rises or falls.
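The idea that hosts can leave or join a farm without interrupting the service can be illustrated with a toy scheduler. This is a minimal sketch under stated assumptions: it does not use or imitate the LSF API, and the `Farm` class, its methods, and the host and job names are all hypothetical. A job running on a host that is withdrawn goes back to the front of the pending queue and is dispatched again as soon as a host is free.

```python
from collections import deque


class Farm:
    """Toy model of a queue whose hosts can come and go; not LSF itself."""

    def __init__(self):
        self.hosts = {}         # host name -> job running there, or None if idle
        self.pending = deque()  # jobs waiting for a free host

    def add_host(self, name):
        """Bring a host into the farm and give it work if any is waiting."""
        self.hosts[name] = None
        self._dispatch()

    def remove_host(self, name):
        """Withdraw a host; its running job is requeued ahead of waiting jobs."""
        job = self.hosts.pop(name)
        if job is not None:
            self.pending.appendleft(job)
        self._dispatch()

    def submit(self, job):
        """Queue a job and try to start it immediately."""
        self.pending.append(job)
        self._dispatch()

    def _dispatch(self):
        """Hand pending jobs to idle hosts, in host-name order."""
        for name in sorted(self.hosts):
            if not self.pending:
                break
            if self.hosts[name] is None:
                self.hosts[name] = self.pending.popleft()


farm = Farm()
farm.add_host("node1")
farm.add_host("node2")
for job in ("blast-1", "fasta-2", "srs-3"):
    farm.submit(job)

farm.remove_host("node1")  # maintenance: blast-1 is requeued, not lost
farm.add_host("node3")     # extra capacity: blast-1 restarts on node3
print(farm.hosts, list(farm.pending))
```

In this run no job is dropped: the job from the withdrawn host is the first to restart when `node3` joins, and `srs-3` stays queued until a host becomes idle.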
EBI Network news • Network upgrade • from two redundant 34 Mb/sec links (Hinxton - London) • to 1 Gb/sec (Hinxton - Cambridge) • Acquisition of more SMP servers and expansion of the current production and External Services HP/Compaq cluster as well as the IBM PC farms.
ESEBI utilisation • 4 million page hits a month (excluding images). • More than 50K requests on the SRS servers per day. • Close to 1 million job requests per month. • Current traffic averages 5 Mb/sec.
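To put these figures on a common scale, they can be converted to average rates per second and compared with the link capacities quoted on the network slide. The sketch below is back-of-the-envelope arithmetic only. It assumes a 30-day month, treats "Mb" and "Gb" as megabits and gigabits, and uses the slide's round numbers as exact values; real load is bursty, so peaks will be well above these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY  # assumption: 30-day month

# Round figures taken from the slides.
page_hits_per_month = 4_000_000
srs_requests_per_day = 50_000
job_requests_per_month = 1_000_000
average_traffic_mbit = 5
old_link_mbit = 34     # one of the two redundant links
new_link_mbit = 1_000  # 1 Gb/sec

rates = {
    "page hits/sec": page_hits_per_month / SECONDS_PER_MONTH,
    "SRS requests/sec": srs_requests_per_day / SECONDS_PER_DAY,
    "job requests/sec": job_requests_per_month / SECONDS_PER_MONTH,
}
link_use = {
    "share of one 34 Mb/sec link": average_traffic_mbit / old_link_mbit,
    "share of the 1 Gb/sec link": average_traffic_mbit / new_link_mbit,
}

for label, value in rates.items():
    print(f"{label}: {value:.2f}")
for label, value in link_use.items():
    print(f"{label}: {value:.1%}")
```

On these assumptions the averages are on the order of one request per second or less, and average traffic drops from roughly a seventh of a single 34 Mb/sec link to well under 1% of the upgraded link.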
New Technologies: ESEBI future • Web Services using SOAP/CORBA/RMI/LDAP • SOAPLab, WSE, WSRS, etc. • Provide programmatic access to biological data. • Provide programmatic access to bioinformatic applications as well.
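Programmatic access over SOAP means that a client sends an XML envelope naming an operation and its parameters instead of filling in a web form. The following is a minimal sketch of building such a request with the Python standard library. Only the SOAP 1.1 envelope namespace is standard; the service namespace `urn:example:sequence-search`, the operation `runSearch` and its parameter names are hypothetical and do not describe any actual EBI service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:sequence-search"             # hypothetical service namespace


def build_envelope(operation, params, service_ns=SERVICE_NS):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical request: run a Fasta search of a short protein against a databank.
xml_text = build_envelope(
    "runSearch",
    {"program": "fasta", "database": "swall", "sequence": "MKTAYIAKQR"},
)
print(xml_text)
```

A real client would post this text to the service endpoint and parse the returned envelope; toolkits normally generate both steps from the service's WSDL description, so hand-built envelopes like this are mainly useful for seeing what travels over the wire.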
The group • 3 Web designers: Stephen Robinson, Asif Kibria, Gulam Patel • 4 Application developers: Ville Silventoinen, Sharmila Pillai, Emmanuel Quevillon, Adam Lowe • 3 Support (HelpDesk): Karen Duggan, Rob Harper, Tamara Kulikova • SRS Managers: Nicola Harte (SRS) • Group Leader: Rodrigo Lopez