Bioinformatics Data and the Grid: The GeneGrid Data Manager
Noel Kelly
GeneGrid Architecture
[Architecture diagram. Components shown:]
• Core: GeneGrid Environment, GeneGrid Portal, GeneGrid Workflow Manager Service, GeneGrid Process Manager Service
• Registries: GeneGrid Data Manager Registry, GeneGrid Application Management Registry
• GeneGrid databases: GeneGrid Workflow Status, GeneGrid Workflow Definition, GeneGrid Input & Results Parameters
• Service instances: GDM (GeneGrid Data Manager) Service, GAM (GeneGrid Application Manager) Service, several of each
• Sites: BeSC, iGAP, EBI, SDSC
• Applications: Blast, mpiBlast, TMHMM, SignalP
• Biological databases: EMBL Database, SwissProt Database
GeneGrid Data Manager Objectives
• Integrate specialised public biological data into the Grid
• Integrate proprietary data into the Grid
• Access and storage of user input parameters
• Experiment tracking
• Access and storage of experiment results
GeneGrid Databases
• GeneGrid Workflow Definition Database: Xindice 1.0 collection
• GeneGrid Workflow Status Database: Xindice 1.0 collection
• GeneGrid Results & Input Parameter Database: file system
Biological Databases
• EMBL Bank: structured file
• SwissProt: structured file
• TrEMBL: structured file
• TrEMBL_new: structured file
• GenBank: structured file
• DDBJ: structured file
• ENSEMBL: MySQL
• Fusion Proprietary: Oracle
• Amtec Proprietary: T.B.C.
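The mix of storage types above means a data manager has to pick an access route per database. A minimal sketch of that dispatch follows. It assumes the slide lists storage types and database names in the same order, and the names `STORAGE`, `ACCESS_ROUTE` and `access_route` are hypothetical; nothing here is taken from the GeneGrid code base.

```python
from typing import Dict, Optional

# A few entries from the slide; the storage type of each is an assumption
# based on the order in which the slide lists types and names.
STORAGE: Dict[str, Optional[str]] = {
    "SwissProt": "structured file",
    "ENSEMBL": "mysql",
    "Fusion Proprietary": "oracle",
    "Amtec Proprietary": None,  # storage still T.B.C. on the slide
}

# Hypothetical mapping from storage type to the kind of client needed.
ACCESS_ROUTE: Dict[str, str] = {
    "structured file": "flat-file parser",
    "mysql": "SQL client",
    "oracle": "SQL client",
}


def access_route(database: str) -> str:
    """Return the kind of client needed to read the named database."""
    if database not in STORAGE:
        raise KeyError(f"unknown database: {database}")
    storage = STORAGE[database]
    if storage is None:
        raise LookupError(f"storage for {database} is not decided yet")
    return ACCESS_ROUTE[storage]
```

Keeping the storage type in a table rather than in code means a database can move from flat files to a relational store without changing the callers.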
Public Biological Data Integration
[Diagram. Labels: SwissProt; GeneGrid Data Manager Service; Perl scripts using BioPerl modules; JDBC driver]
Public Biological Data Integration
[Diagram. Labels: SwissProt; GeneGrid Data Manager Service; BeSC; Perl script; Record; EBI]
Fusion Antibodies Commercial Use Case
[Workflow diagram. Labels in reading order:]
• Fasta File: MQNSHSGVNQLGGVFVNGRPLPDSTRQKIVELAHSGARPCDISRILQVSNGCVSKILGRY…………
• BlastP → Blast Format
• SwissProt Query, Blast Formatter → Accession Numbers, Multiple Fasta Records
• TMHMM → Multiple TMHMM Format
• SignalP Eliminator → Fasta Records, Multiple SignalP Format
• Bl2Seq Eliminator → Fasta Records
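The use case above is a chain in which each stage consumes the records produced by the previous one. The sketch below shows only that chaining pattern. It is an illustration under assumptions: the real stages (BlastP, TMHMM, SignalP, Bl2Seq) wrap external tools that are not modelled here, and what the "Eliminator" steps discard is not stated on the slide, so the filter predicate is left to the caller.

```python
from typing import Callable, List, Sequence

# A stage takes a list of records and returns a new list of records.
Stage = Callable[[List[str]], List[str]]


def run_workflow(records: List[str], stages: Sequence[Stage]) -> List[str]:
    """Feed the records through each stage in order."""
    for stage in stages:
        records = stage(records)
    return records


def make_eliminator(keep: Callable[[str], bool]) -> Stage:
    """Build a stage that drops every record for which keep() is false.

    The predicate is hypothetical; the slide does not say which records
    the Eliminator steps remove.
    """
    def eliminate(records: List[str]) -> List[str]:
        return [record for record in records if keep(record)]
    return eliminate
```

For example, `run_workflow(records, [make_eliminator(lambda r: len(r) > 1)])` would drop single-character records; in a real deployment each stage would instead call out to the corresponding service.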
Fusion Use Case – GDM Perspective
[Workflow diagram. Stages: BlastP; SwissProt Query; Blast Formatter; TMHMM; SignalP Eliminator; Bl2Seq Eliminator]
Querying SwissProt
[Diagram. Labels:]
• Inputs: Task Params, Accession Numbers (Multiple Accession Numbers)
• Services: GeneGrid Data Manager Service (for SwissProt), GeneGrid Data Manager Service (for GRIP)
• Data stores: SwissProt, GRIP
• Output: Fasta Record
Fusion Use Case – GeneGrid Perspective
[Workflow diagram. Stages: BlastP; SwissProt Query; Blast Formatter; TMHMM; SignalP Eliminator; Bl2Seq Eliminator]
Executing Bioinformatics Applications
[Diagram. Labels:]
• Inputs: Task Params, Multiple Fasta Records, Input File
• Services: GeneGrid Application Manager Service (for SignalP), GeneGrid Data Manager Service (for GRIP)
• Data store: GRIP
• Output: Result File
GeneGrid Landmarks
• One year into a two-year project
• Successfully integrated a number of bioinformatics applications
• Successfully integrated a number of bioinformatics data sets
• A number of papers accepted at computing and bioinformatics conferences
• International collaboration with the EOL project (SDSC)
GeneGrid at All Hands
• A Practical Workflow Implementation for a Grid Based Virtual Bioinformatics Laboratory
  • Session 4.4, Thur 2nd Sep, 14:10 – 15:50
• Bioinformatics Application Integration and Management in GeneGrid: Experiments and Experiences
  • Session 6.4, Fri 3rd Sep, 11:05 – 13:10
GeneGrid Demonstrations
• Tuesday, 1st September
  • 18:15 – 20:15
• Thursday, 2nd September
  • 10:00 – 11:30
  • 17:30 – 19:30
• Friday, 3rd September
  • 13:00 – 14:30