Overview IST-2001-38344
Cells are a collection of protein nanomachines.
A biological challenge
• To build models of protein complexes & understand the function of each component, based upon available evidence.
• However, to build evidence for each protein interaction, a biologist must find, integrate, compare & then validate the results from a number of separate resources.
[Figure: Genomics & Proteomics. Labels: DNA ‘chips’, modelling, HTP sequencing, SNP, gene prediction, proteomics, domain analysis, synchrotron, expression, folding, protein structures, DNA.]
[Figure: Genomics & Proteomics. Labels: Interaction Space, Expression Space, Literature Space.]
The need for computerised information systems
• New HTP methods produce orders of magnitude more data than before:
  • More than is interpretable manually.
  • Data are stored in a (semi-)structured format.
• Much knowledge is in literature & patents:
  • 13,000,000 abstracts in MEDLINE.
  • Knowledge is stored in an unstructured format.
• Solution: computerised information systems:
  • Enable data mining & visualisation of integrated resources, with text analysis.
Components of bioGrid
• Gene expression:
  • ExpressionSpace:
    • Clustering of microarray data.
    • May require large memory.
• Protein interaction:
  • PSIMAP:
    • Predict interactions between protein domains.
    • May pre-compute as relatively unchanging.
• Literature:
  • GoPubMed-D:
    • Organises corpus of documents into the GO ontology.
    • Lexical analysis requires lengthy compute.
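To make the literature component more concrete, here is a minimal sketch of organising documents under ontology terms by plain phrase matching. It is only an illustration of the idea, not the lexical analysis GoPubMed-D performs; the term ids, labels, and abstracts are hypothetical placeholders.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def classify(abstracts, terms):
    """Map each term id to the ids of abstracts whose text contains the
    term label as a run of consecutive tokens (simple phrase match)."""
    result = {term_id: [] for term_id in terms}
    for doc_id, text in abstracts.items():
        normalized = " " + " ".join(tokenize(text)) + " "
        for term_id, label in terms.items():
            phrase = " " + " ".join(tokenize(label)) + " "
            if phrase in normalized:
                result[term_id].append(doc_id)
    return result

# Hypothetical ontology terms (placeholder ids, not real GO accessions).
terms = {
    "TERM:1": "mitochondrion",
    "TERM:2": "oxidative phosphorylation",
}

# Hypothetical abstracts.
abstracts = {
    "d1": "Microarray analysis of the mitochondrion under stress.",
    "d2": "Oxidative phosphorylation in the yeast mitochondrion.",
    "d3": "Protein folding kinetics.",
}

classified = classify(abstracts, terms)
```

A real system would also need synonyms, word variants, and the ontology hierarchy, which is presumably part of why the lexical analysis is compute-intensive.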
bioGrid: An integrated platform for gene expression data, protein interaction data, and literature
[Figure: three linked views. Expression Space: Space Explorer; Interaction Space: PSIMAP; Literature Space: Classification Server.]
Workflow for use case - Part I
• Search literature for papers about the experimental system studied:
  • Microarray & mitochondria.
• Upload the gene expression data set.
• Cluster the gene expression data set.
• Identify a cluster that contains genes of interest, e.g. energy production.
• Examine the expression profiles of the genes in the cluster.
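The clustering step can be sketched as follows. The slides do not say which algorithm Space Explorer uses, so this uses a simple greedy, correlation-based grouping as a stand-in; the gene names, expression values, and the 0.9 threshold are assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return cov / (sx * sy)

def cluster_profiles(profiles, threshold=0.9):
    """Greedy clustering: a gene joins the first cluster whose seed profile
    correlates with it at or above `threshold`; otherwise it seeds a new one."""
    clusters = []  # list of (seed_profile, [gene names])
    for gene, profile in profiles.items():
        for seed, members in clusters:
            if pearson(seed, profile) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append((profile, [gene]))
    return [members for _, members in clusters]

# Hypothetical toy data: gene -> expression level at four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.1, 3.9, 2.0],
}

clusters = cluster_profiles(profiles)
```

Here the two rising profiles end up in one cluster and the two falling profiles in another; a cluster of interest would then be inspected profile by profile.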
Workflow for use case - Part II
• Calculate an induced PSIMAP graph for the genes in the expression cluster.
• Explore PSIMAP graph & nodes.
• For pairs of genes predicted to interact:
  • Search literature for papers citing both genes.
  • Classify literature to assess possible function or metabolic processes of genes.
• Assimilate evidence for components of a protein complex.
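The first and third steps above can be sketched as an induced-subgraph computation followed by a co-citation lookup. This is a minimal illustration under assumed data structures (an edge set for predicted interactions and a paper-to-genes index), not the bioGrid implementation; all gene and paper names are hypothetical.

```python
def induced_subgraph(edges, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return sorted((a, b) for a, b in edges if a in keep and b in keep)

def papers_citing_both(papers, a, b):
    """Return ids of papers whose gene set mentions both a and b."""
    return sorted(pid for pid, genes in papers.items()
                  if a in genes and b in genes)

# Hypothetical predicted interactions (gene names stand in for graph nodes).
interactions = {
    ("geneA", "geneB"),
    ("geneB", "geneC"),
    ("geneC", "geneD"),
}

# Hypothetical expression cluster picked in Part I of the workflow.
cluster = ["geneA", "geneB", "geneD"]

# Hypothetical literature index: paper id -> genes mentioned.
papers = {
    "paper1": {"geneA", "geneB"},
    "paper2": {"geneA"},
    "paper3": {"geneA", "geneB", "geneD"},
}

subgraph = induced_subgraph(interactions, cluster)
evidence = {pair: papers_citing_both(papers, *pair) for pair in subgraph}
```

Only the geneA/geneB edge survives, because geneC is not in the cluster, and the lookup returns the two papers that mention both genes as candidate evidence.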
Distributed technology implementation
• Globus, Unicore, Legion, …
  • Are geared towards computational complexity, not semantic complexity.
• BioGrid’s approach:
  • Agent-based approach.
  • Integration of rules, reasoning, and messaging in a Java environment.
  • Using a meta-model.
• Advantage:
  • Easy to maintain, easy to use, includes code distribution, architecture independent, geared towards farms of local and remote machines.
Prova-AA
• Extensions to Prova for rule-based agent scripting.
• Prova-AA introduces:
  • Messaging (local, JMS, and JADE).
  • Reaction rules.
  • Context-dependent inline reactions for asynchronous messaging.
  • Embedding of Prova agents in Java and Web apps.
• Advantages:
  • Cooperating agents vs. GRID RPC.
  • Ease of development and maintenance.
  • Platform independence and portability.
  • High level specification of communication protocols.
  • Native syntax integration with Java.
  • Low-cost creation of distributed workflows and ad hoc networks of computation nodes.
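The idea of agents that cooperate through messages and reaction rules can be sketched in a few lines. This is a conceptual illustration in Python, not Prova-AA syntax or its API; the `Agent` class, the message types, and the handlers are all hypothetical.

```python
import queue

class Agent:
    """Toy agent: an inbox plus reaction rules keyed by message type."""

    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()
        self.rules = {}  # message type -> handler(agent, sender, payload)
        self.log = []

    def on(self, msg_type, handler):
        """Register a reaction rule for one message type."""
        self.rules[msg_type] = handler

    def send(self, other, msg_type, payload):
        """Asynchronous send: enqueue the message and return immediately."""
        other.inbox.put((self, msg_type, payload))

    def run_once(self):
        """Drain the inbox, firing the matching rule for each message.
        Messages with no matching rule are dropped. Returns the number handled."""
        handled = 0
        while not self.inbox.empty():
            sender, msg_type, payload = self.inbox.get()
            handler = self.rules.get(msg_type)
            if handler:
                handler(self, sender, payload)
                handled += 1
        return handled

coordinator = Agent("coordinator")
worker = Agent("worker")

# The worker reacts to a task by replying with a result (here: sorted input).
worker.on("task", lambda me, sender, payload: me.send(sender, "result", sorted(payload)))
# The coordinator reacts to a result by recording it.
coordinator.on("result", lambda me, sender, payload: me.log.append(payload))

coordinator.send(worker, "task", ["geneB", "geneA"])
worker.run_once()
coordinator.run_once()
```

Unlike an RPC call, the coordinator does not block on the worker; it simply reacts whenever a result message arrives, which is the style of interaction the slide contrasts with GRID RPC.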