BioGrid
Drowning in information...
• Biology has changed dramatically from an information-light to an information-intensive area
• The much-publicised Human Genome Project is only the tip of the iceberg
• >500 tools online
• >8000 new abstracts per month
BioGrid
• Provide access to multiple, heterogeneous and geographically distributed information sources
• Perform active searches for relevant information in non-local domains (includes retrieving, analysing, manipulating, and integrating information)
Heureka!
BioGrid Objectives
• Objectives:
  • Information and knowledge grid allowing knowledge discovery and access to multiple types of structured and unstructured data, including gene expression and protein interaction data
• Business objectives:
  • Grid for next-generation classification research infrastructure for large proteomics and genomics databases
  • Efficient transactional enterprise collaboration
  • Faster time to market for biotech innovation
Example
A scientist is interested in a gene, e.g. NOX4
• Search PubMed for articles
  • Too many hits
  • Gene also known under different names
• Analyse gene expression data
  • Which genes behave similarly to NOX4?
  • Function of NOX4?
• Analyse protein interactions
  • Which interactions and processes does expression of NOX4 trigger?
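The question "which genes behave similarly to NOX4?" can be made concrete as a ranking of genes by how closely their expression profiles follow that of NOX4. The sketch below assumes Pearson correlation as the similarity measure (the slides do not name one). Every gene name other than NOX4 and every expression value is made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denom

def most_similar(target, profiles, top=3):
    """Rank all other genes by correlation with the target gene."""
    scores = [(gene, pearson(profiles[target], values))
              for gene, values in profiles.items() if gene != target]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]

# Hypothetical expression levels over five conditions (illustrative only).
profiles = {
    "NOX4":   [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_A": [2.0, 4.1, 5.9, 8.0, 10.1],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 5.0, 4.0],
}
ranking = most_similar("NOX4", profiles)
```

Here `ranking` lists the genes from most to least correlated with NOX4. On real data the pairwise comparison has to run against thousands of genes, which is one reason the analysis becomes expensive.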
Challenges
• Semantic complexity
  • Computer does not “understand” data
  • DBs and systems cannot inter-operate
• Computational complexity
  • Generating a protein interaction map takes ca. 7 days
  • Analysing large sets of gene expression data can take up to an hour
  • Analysis of large text bodies is complex
BioGrid Vision
[Diagram labels: BioGrid; Expression data; Characterisation of target sequence; Sequences; Metabolic pathway data; Interaction data; Scientific literature]
The Grid Approach
[Diagram labels: BioGrid Server; BioGrid Client (×3); Space Explorer; PSIMAP; Literature Classification Server]
• Semantic Web
  • Global and local ontologies to capture meta-data and facilitate semantic inter-operability
• Grid technology
  • Transparent access to distributed resources
• Agent technology
  • Personal information agent collecting and presenting relevant information on behalf of its user
Classification Server
• Finding and processing relevant scientific literature
Results of PubMed
[Callouts on the first result: Author, Title, Journal, Year]
• Lorenz P, Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation. Biol Chem. 2001 Apr;382(4):637-44.
• Fredericks WJ. An engineered PAX3-KRAB transcriptional repressor inhibits the malignant phenotype of alveolar rhabdomyosarcoma cells harboring the endogenous PAX3-FKHR oncogene. Mol Cell Biol. 2000 Jul;20(14):5019-31.
• ...
However, to a machine things look different!
Solution: tag data (XML)
Results of PubMed
• <author>Lorenz P</author><title>Transcriptional repression mediated by the KRAB domain of the human C2H2 zinc finger protein Kox1/ZNF10 does not require histone deacetylation.</title><journal>Biol Chem</journal><year>2001</year>
• ...
However, to a machine things look different!
Solution: use ontologies (Semantic Web)
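The XML tagging step from the preceding slides can be sketched with Python's standard `xml.etree.ElementTree` module. The `<record>` wrapper element is an assumption, added only because an XML document needs a single root. The field names are the ones used on the slide.

```python
import xml.etree.ElementTree as ET

# One record tagged as on the slide. The <record> root is an assumption
# made for this sketch; the slide shows only the four inner elements.
record_xml = (
    "<record>"
    "<author>Lorenz P</author>"
    "<title>Transcriptional repression mediated by the KRAB domain of the "
    "human C2H2 zinc finger protein Kox1/ZNF10 does not require histone "
    "deacetylation.</title>"
    "<journal>Biol Chem</journal>"
    "<year>2001</year>"
    "</record>"
)

def parse_record(xml_text):
    """Return a dict mapping each child tag name to its text content."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

record = parse_record(record_xml)
```

The parser can now separate the fields, but `author` is still only a string to it. A second source that tags the same field with a different name would not match, which is the gap that ontologies are meant to close.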
Semantic Web
• DAML+OIL is an XML-based language for specifying ontologies
• Annotations of data refer to a global ontology (where appropriate), hence a joint understanding of data is possible
• Ongoing efforts in bioinformatics: e.g. the Gene Ontology
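The idea that annotations refer to a global ontology can be illustrated with a plain lookup table. This is a minimal sketch and not DAML+OIL: the global terms, the two sources and their local field names are all hypothetical.

```python
# Hypothetical global ontology terms shared by all sources.
GLOBAL_TERMS = {"Author", "Title", "Journal", "Year"}

# Hypothetical per-source mappings from local field names to global terms.
LOCAL_TO_GLOBAL = {
    "source_a": {"author": "Author", "title": "Title",
                 "journal": "Journal", "year": "Year"},
    "source_b": {"creator": "Author", "heading": "Title",
                 "periodical": "Journal", "published": "Year"},
}

# Every mapping must point at a term that exists in the global ontology.
assert all(set(m.values()) <= GLOBAL_TERMS for m in LOCAL_TO_GLOBAL.values())

def to_global(source, record):
    """Rewrite a record's local field names as global ontology terms."""
    mapping = LOCAL_TO_GLOBAL[source]
    unknown = set(record) - set(mapping)
    if unknown:
        raise KeyError(f"no ontology mapping for fields: {sorted(unknown)}")
    return {mapping[field]: value for field, value in record.items()}

a = to_global("source_a", {"author": "Lorenz P", "year": "2001"})
b = to_global("source_b", {"creator": "Lorenz P", "published": "2001"})
```

After the mapping, `a` and `b` are identical, so records from both sources can be compared or merged. A real ontology language additionally expresses relations between terms, which a flat dictionary cannot.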
Classification Server
• Scientific objectives:
  • Effective concept recognition
  • Pattern matching
  • Intelligent data sourcing agents and tagging technology
  • Automated categorisation in a biotechnology domain
  • Metadata hierarchy
  • Functional interoperability methodology design
  • Domain knowledge mapping
  • Implementing a logical domain ontology
  • Integration of agent, classification logic and visualisation technology
Space Explorer
• … is a general-purpose visualisation tool facilitating interactive exploration of large data sets
• … deals with multivariate and proximity data
• … provides
  • principal component analysis
  • multi-dimensional scaling (principal co-ordinate analysis, spring embedding)
  • clustering
• … provides
  • dendrograms
  • 2D and 3D (using VRML) scatter plots
  • graphs and colour maps
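The link between clustering and dendrograms in the list above can be sketched with single-linkage agglomerative clustering, where each recorded merge corresponds to one join in the dendrogram. The slides do not say which clustering method Space Explorer uses, so the choice of single linkage, the function name and the sample points are assumptions.

```python
from math import dist  # available from Python 3.8

def single_linkage(points):
    """Agglomerative clustering with single linkage.

    Returns the merges in order as (members_a, members_b, distance),
    where members are tuples of indices into `points`.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[p], points[q])
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical 2D data: two tight pairs that lie far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = single_linkage(points)
```

The merge distances give the heights at which branches join in the dendrogram. This naive version recomputes all pairwise distances in every round, which is acceptable for a sketch but not for large data sets.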
Protein Interaction: PSIMAP
Based on 3D structure, PSIMAP determines interactions of proteins
Structure of the map is of great importance for understanding biological processes
Generation and analysis of the map are computationally expensive
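To give a feel for why a structure-based interaction map is costly to generate, the sketch below tests two domains for interaction by counting atom pairs that lie close together. The rule itself, the function names and the default values for `cutoff` and `min_contacts` are assumptions for illustration; the slides do not state the criterion PSIMAP applies.

```python
from math import dist  # available from Python 3.8

def count_contacts(domain_a, domain_b, cutoff=5.0):
    """Count atom pairs (one atom from each domain) within `cutoff`."""
    return sum(1 for a in domain_a for b in domain_b if dist(a, b) <= cutoff)

def interacts(domain_a, domain_b, cutoff=5.0, min_contacts=5):
    """Hypothetical rule: interaction means at least `min_contacts` close pairs."""
    return count_contacts(domain_a, domain_b, cutoff) >= min_contacts

# Made-up 3D coordinates: two nearby rows of atoms and one distant row.
domain_a = [(float(x), 0.0, 0.0) for x in range(3)]
domain_b = [(float(x), 3.0, 0.0) for x in range(3)]
domain_far = [(float(x), 50.0, 0.0) for x in range(3)]

near = interacts(domain_a, domain_b)
far = interacts(domain_a, domain_far)
```

Each test compares every atom of one domain with every atom of the other, and a full map repeats this for many domain pairs, so the cost grows quickly with the number and size of structures.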
Partners
No. | Organisation (abbreviation) | Country | RTD role in the project
1 | University of Groningen (RUG) | NL | User, bioinformatics on drug discovery
2 | ZooRobotics (ZRO) | NL | Co-ordinator, supplier of GRID Classification Server, exploitation management
3 | City University London (CIT) | UK | Supplier of intelligent agents and Space Explorer
4 | University of Cyprus (UCY) | EL | Supplier of GRID knowledge engineering
5 | Medical Research Centre (MRC) | UK | Supplier of PSIMAP; user, bioinformatics on food and nutrition
Work packages
WP0 Management
WP1 Source domain analysis
WP2 Hierarchy creation, metadata model development
WP3 Classification logic integration
WP4 Agent implementation
WP5 Visualisation implementation
WP6 Measurement and evaluation
WP7 Dissemination and exploitation
BioGrid
Mission: Distributed computational biology platform for fast pharmaceutical research
• Expression Space: Space Explorer
• Interaction Space: PSIMAP
• Pathway Space:
• Literature Space: Classification Server