Overview IST-2001-38344

Cells are a collection of protein nanomachines

A biological challenge
• To build models of protein complexes & understand the function of each component, based upon available evidence.
• However, to build evidence for each protein interaction, a biologist must find, integrate, compare & then validate the results from a number of separate resources.

[Figure: Genomics & Proteomics. Labels: DNA; DNA ‘chips’; HTP sequencing; SNP; gene prediction; expression; proteomics; domain analysis; folding; modelling; synchrotron; protein structures.]

[Figure: Genomics & Proteomics. Labels: Interaction Space; Expression Space; Literature Space.]

The need for computerised information systems
• New high-throughput (HTP) methods produce orders of magnitude more data than before:
  • More than is interpretable manually.
  • Data are stored in a (semi-)structured format.
• Much knowledge is in literature & patents:
  • 13,000,000 abstracts in MEDLINE.
  • Knowledge is stored in an unstructured format.
• Solution: computerised information systems:
  • Enable data mining & visualisation of integrated resources, with text analysis.

Components of bioGrid
• Gene expression:
  • ExpressionSpace:
    • Clustering of microarray data.
    • May require large memory.
• Protein interaction:
  • PSIMAP:
    • Predicts interactions between protein domains.
    • Can be pre-computed, as the data are relatively unchanging.
• Literature:
  • GoPubMed-D:
    • Organises a corpus of documents into the GO ontology.
    • Lexical analysis requires lengthy computation.
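The literature component can be illustrated with a toy example. The sketch below is not GoPubMed-D's algorithm; it only shows the general idea of filing documents under ontology terms by lexical matching, then propagating each hit to the term's ancestors. The function name `classify`, the three-term ontology, and the documents are all hypothetical, and a single parent per term is assumed for brevity (the real GO is a graph in which a term may have several parents).

```python
import re

def classify(documents, ontology_parents):
    """File each document under every matched term and all of its ancestors.

    documents: {doc_id: abstract text}
    ontology_parents: {term label (lower case): parent label, or None for the root}
    """
    index = {term: set() for term in ontology_parents}
    for doc_id, text in documents.items():
        lowered = text.lower()
        for term in ontology_parents:
            # Whole-word lexical match of the term label in the abstract.
            if re.search(r"\b" + re.escape(term) + r"\b", lowered):
                node = term
                while node is not None:  # walk up to the root
                    index[node].add(doc_id)
                    node = ontology_parents[node]
    return index

# Toy ontology and corpus (hypothetical, for illustration only).
toy_ontology = {
    "cellular component": None,
    "organelle": "cellular component",
    "mitochondrion": "organelle",
}
toy_documents = {
    "doc1": "Microarray analysis of the mitochondrion.",
    "doc2": "An unrelated abstract.",
}
index = classify(toy_documents, toy_ontology)
```

Matching every term against every abstract is the step that grows with corpus size, which is consistent with the note that lexical analysis requires lengthy computation.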

bioGrid: An integrated platform for gene expression data, protein interaction data, and literature
• Expression Space: Space Explorer.
• Interaction Space: PSIMAP.
• Literature Space: Classification Server.

Workflow for use case - Part I
• Search literature for papers about the experimental system studied:
  • Microarray & mitochondria.
• Upload the gene expression data set.
• Cluster the gene expression data set.
• Identify a cluster that contains genes of interest, e.g. energy production.
• Examine the expression profiles of the genes in the cluster.
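The clustering and cluster-selection steps above can be sketched as follows. The slides do not say which clustering method ExpressionSpace uses, so this example assumes a simple single-linkage grouping on Pearson correlation; the threshold, the function names, and the gene names and profiles are hypothetical.

```python
from itertools import combinations
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile correlates with nothing
    return cov / (vx * vy) ** 0.5

def cluster_profiles(profiles, threshold=0.9):
    """Group genes linked by a chain of correlations >= threshold (union-find)."""
    parent = {gene: gene for gene in profiles}

    def find(gene):
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]  # path halving
            gene = parent[gene]
        return gene

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for gene in profiles:
        clusters.setdefault(find(gene), set()).add(gene)
    return list(clusters.values())

def clusters_containing(clusters, genes_of_interest):
    """Keep only the clusters that contain at least one gene of interest."""
    wanted = set(genes_of_interest)
    return [c for c in clusters if c & wanted]

# Hypothetical expression profiles over four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.1],
}
clusters = cluster_profiles(profiles)
selected = clusters_containing(clusters, ["geneA"])
```

Comparing all gene pairs is quadratic in the number of genes, which is one reason a real microarray data set may require large memory.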

Workflow for use case - Part II
• Calculate an induced PSIMAP graph for the genes in the expression cluster.
• Explore PSIMAP graph & nodes.
• For pairs of genes predicted to interact:
  • Search literature for papers citing both genes.
  • Classify literature to assess possible function or metabolic processes of genes.
• Assimilate evidence for components of a protein complex.
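A minimal sketch of the induced-graph and co-citation steps, under stated assumptions: the domain interaction map is given as a list of domain pairs, each gene is mapped to the domains of its protein, and two genes are linked when any of their domains are predicted to interact. That linking rule, the function names, and all identifiers in the toy data are hypothetical and are not taken from PSIMAP itself.

```python
from itertools import combinations

def induced_gene_graph(domain_edges, gene_domains, cluster):
    """Return gene pairs in `cluster` whose domains are predicted to interact."""
    interacting = {frozenset(edge) for edge in domain_edges}
    edges = set()
    for g1, g2 in combinations(sorted(cluster), 2):
        if any(
            frozenset((d1, d2)) in interacting
            for d1 in gene_domains.get(g1, ())
            for d2 in gene_domains.get(g2, ())
        ):
            edges.add((g1, g2))
    return edges

def co_cited(papers, gene_a, gene_b):
    """Return ids of papers that mention both genes."""
    return sorted(
        paper_id
        for paper_id, genes in papers.items()
        if gene_a in genes and gene_b in genes
    )

# Toy data (hypothetical).
domain_edges = [("domX", "domY"), ("domY", "domZ")]
gene_domains = {
    "geneA": ["domX"],
    "geneB": ["domY"],
    "geneC": ["domZ"],
    "geneD": ["domW"],
}
cluster = {"geneA", "geneB", "geneD"}
papers = {"paper1": {"geneA", "geneB"}, "paper2": {"geneA"}}

graph = induced_gene_graph(domain_edges, gene_domains, cluster)
evidence = {pair: co_cited(papers, *pair) for pair in graph}
```

Restricting the pre-computed map to the genes of one cluster keeps the graph small enough to explore interactively, and each remaining edge becomes a literature query.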

Distributed technology implementation
• Globus, Unicore, Legion, … are geared towards computational complexity, not semantic complexity.
• bioGrid’s approach:
  • Agent-based approach.
  • Integration of rules, reasoning, and messaging in a Java environment.
  • Use of a meta-model.
• Advantages:
  • Easy to maintain, easy to use, includes code distribution, architecture independent, geared towards farms of local and remote machines.

Prova-AA
• Extensions to Prova for rule-based agent scripting.
• Prova-AA introduces:
  • Messaging (local, JMS, and JADE).
  • Reaction rules.
  • Context-dependent inline reactions for asynchronous messaging.
  • Embedding of Prova agents in Java and web applications.
• Advantages:
  • Cooperating agents vs. GRID RPC.
  • Ease of development and maintenance.
  • Platform independence and portability.
  • High-level specification of communication protocols.
  • Native syntax integration with Java.
  • Low-cost creation of distributed workflows and ad-hoc networks of computation nodes.
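The idea of reaction rules over asynchronous messages can be illustrated in plain Python. This is not Prova syntax and does not use JMS or JADE; the `Agent` class, its methods, and the message kinds are hypothetical, and a local in-process queue stands in for the transport.

```python
import queue

class Agent:
    """Toy agent: an inbox plus reaction rules keyed by message kind."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.rules = {}      # message kind -> list of reactions
        self.received = []   # log of (sender name, kind, payload)

    def on(self, kind, reaction):
        """Register a reaction rule for messages of the given kind."""
        self.rules.setdefault(kind, []).append(reaction)

    def send(self, target, kind, payload):
        """Queue a message for another agent without waiting for a reply."""
        target.inbox.put((self, kind, payload))

    def process_inbox(self):
        """Fire the matching reaction rules for every queued message."""
        while True:
            try:
                sender, kind, payload = self.inbox.get_nowait()
            except queue.Empty:
                return
            self.received.append((sender.name, kind, payload))
            for reaction in self.rules.get(kind, []):
                reaction(self, sender, payload)

worker = Agent("worker")
client = Agent("client")

# Reaction rule: on a "request", reply with an "inform" carrying the result.
worker.on("request", lambda me, sender, genes: me.send(sender, "inform", sorted(genes)))

client.send(worker, "request", ["geneB", "geneA"])
worker.process_inbox()
client.process_inbox()
```

The sender never blocks on the reply, which is the contrast with a remote procedure call that the slide draws: the client simply reacts when the "inform" message arrives.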

Proposed Architecture of integrated platform
