FunCat™, a controlled vocabulary encompassing the biology of prokaryotes, plants and animals from cellular to systemic level • Dr. Dieter Maier • Manchester Ontologies Workshop 23/24.3.02 • Biomax Informatics AG, Lochhamer Str. 11, 82152 Martinsried, Germany
Outline • Objectives • Structure • Content • Development • Use
Objectives • Automatic data management • No prior knowledge of vocabulary required • Group genes by functional categories • Extensible • Organism independent • Compatible with other ontologies
Disclaimer • What the FunCat is not: • A tool for the complete description of function at the single-gene level
Structure • Organized hierarchically • Related functions grouped on different levels • Internally consistent • => Provides a data warehouse • - overview of the available selection • - progress from general to specific • - infer from specific to general
Hierarchical structure • Figure: excerpt of the category tree with the nodes Transcription, rRNA-transcription, rRNA-processing, tRNA-transcription, mRNA-transcription, mRNA-processing, 5´-end processing
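The two directions named on the Structure slide (progress from general to specific, infer from specific to general) can be sketched with a small nested dictionary. This is a minimal illustration, not the FunCat implementation: the node names come from the slide, but the exact nesting shown here and the helper names `paths`, `ancestors` and `descendants` are assumptions.

```python
# Assumed nesting of the nodes named on the slide; illustrative only.
TREE = {
    "Transcription": {
        "rRNA-transcription": {"rRNA-processing": {}},
        "tRNA-transcription": {},
        "mRNA-transcription": {"mRNA-processing": {"5'-end processing": {}}},
    }
}


def paths(tree, prefix=()):
    """Yield every category as a tuple path, ordered from general to specific."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)


def ancestors(category, tree=TREE):
    """Infer from specific to general: the broader categories a node implies."""
    for path in paths(tree):
        if path[-1] == category:
            return list(path[:-1])
    raise KeyError(category)


def descendants(category, tree=TREE):
    """Progress from general to specific: all categories below a node."""
    return [path[-1] for path in paths(tree) if category in path[:-1]]
```

With this layout, a gene annotated with the most specific node is implicitly a member of every ancestor, which is what allows grouping on different levels.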
Content • Covers cellular processes, systemic physiology, development and anatomy • From prokaryotes to humans • 25 main categories with ~1500 subcategories • Categories are independent of organism • Genes can belong to multiple categories
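Because genes can belong to multiple categories, the annotation is a many-to-many relation, and grouping genes by function amounts to inverting it. A minimal sketch, assuming invented gene identifiers and assignments (`ANNOTATIONS` and `group_by_category` are hypothetical names, not part of the FunCat):

```python
from collections import defaultdict

# Hypothetical annotations: gene identifiers and assignments are invented.
ANNOTATIONS = {
    "geneA": {"Metabolism", "Energy"},
    "geneB": {"Transcription"},
    "geneC": {"Metabolism", "Cellular transport"},
}


def group_by_category(annotations):
    """Invert gene -> categories into category -> genes."""
    groups = defaultdict(set)
    for gene, categories in annotations.items():
        for category in categories:
            groups[category].add(gene)
    return dict(groups)
```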
Localisation: 256 • Subcellular localisation: 63 • Cell type localisation: 69 • Tissue localisation: 41 • Organ localisation: 91
Molecular function: 122 • Enzymatic activity => EC ~4400 • Protein activity regulation: 23 • Protein with binding function / cofactor requirement: 49 • Transport facilitation: 49
Biological process: 1061 • Metabolism: 247 • Energy: 60 • Cell cycle and DNA processing: 54 • Transcription: 31 • Protein synthesis (Translation): 11 • Protein fate (folding, modification, destination): 25 • Cellular transport: 32 • Cellular communication: 47 • Cell rescue, defense and virulence: 50 • Regulation / interaction with cellular environment: 45 • Cell fate: 54 • Systemic regulation / interaction with environment: 89 • Development (systemic): 51 • Transposable elements, viral and plasmid proteins: 8 • Control of cellular organisation: 57 • Cell type differentiation: 69 • Tissue differentiation: 40 • Organ differentiation: 91
Development • Historical • Pathways • Thesaurus • Complex relations
Structural development • Proven flexibility – easy to extend • Stable overall structure • Compatible with other ontologies such as • Enzyme Catalogue • Gene Ontology • EcoCyc
Development in numbers • S. cerevisiae (1996): 16 main categories, depth 4, 182 total • Plants (A. thaliana) and prokaryotes (1998): 20 main categories, depth 6, 528 total • Animals (human) (2001): 25 main categories, depth 6, 1448 total
Integrating pathways into processes • Hierarchical structure allows: • Unambiguous attribution • Test for completeness • Test for consistency
Integrating additional information • Create a dynamic ontology from existing ontologies, keywords and linguistic extraction of descriptors from the literature • Semiautomatic mapping of the dynamic ontology to the FunCat
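The slide does not define the three checks, so the sketch below is one possible reading: attribution is unambiguous if each pathway is assigned to exactly one category, complete if no known pathway is left unassigned, and consistent if every assignment points at a category that exists. The function name `check_pathways` and the sample identifiers are hypothetical.

```python
from collections import Counter


def check_pathways(categories, assignment, pathways):
    """Report violations of the three checks (interpretation assumed, see text).

    categories: set of valid category names
    assignment: list of (pathway, category) pairs
    pathways:   iterable of all known pathways
    """
    counts = Counter(pathway for pathway, _ in assignment)
    return {
        # completeness: known pathways with no category at all
        "unassigned": sorted(set(pathways) - set(counts)),
        # unambiguous attribution: pathways assigned more than once
        "ambiguous": sorted(p for p, n in counts.items() if n > 1),
        # consistency: assignments to categories that do not exist
        "unknown_category": sorted({c for _, c in assignment if c not in categories}),
    }
```

An empty list under every key would mean the assignment passes all three checks under this reading.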
Enabling complex relations • Intensify multidimensionality • Enable if ... then ... relations
Use • Manual annotation • Automatic annotation • Data mining
Manual annotation • Multidimensional (four dimensions) • Stepwise
Manual annotation • 17 manually annotated genomes (5 eukaryotes, 12 prokaryotes) • Eukaryotes: H. sapiens, A. thaliana, S. cerevisiae, N. crassa; proprietary: A. niger • Prokaryotes: B. subtilis, T. acidophilum, Listeria, 6 public prokaryotes in progress; proprietary: C. glutamicum, C. pneumoniae, 1 undisclosed • Used for annotation of transcriptomes
Automatic annotation • Sequence similarity to manually annotated proteins (distinguishing experimentally verified from similarity-associated function): • H. sapiens • A. thaliana • S. cerevisiae • B. subtilis • T. acidophilum
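A minimal sketch of similarity-based transfer, assuming the similarity search itself has already been run elsewhere and delivers (reference protein, identity) pairs. The reference set `MANUAL`, the function `transfer` and the 0.4 identity cutoff are all illustrative assumptions; the slide gives no threshold. Each transferred category keeps the evidence type of the reference annotation, so experimentally verified and similarity-associated functions stay distinguishable.

```python
# Hypothetical reference set: protein -> [(category, evidence)], where evidence
# is "experimental" or "similarity". Identifiers are invented.
MANUAL = {
    "refP1": [("Metabolism", "experimental")],
    "refP2": [("Transcription", "similarity")],
}


def transfer(hits, manual=MANUAL, min_identity=0.4):
    """Transfer categories from manually annotated proteins to a query protein.

    hits: list of (reference_id, identity) pairs from a similarity search.
    min_identity: arbitrary illustrative cutoff, not taken from the source.
    """
    transferred = []
    for ref_id, identity in hits:
        if identity < min_identity:
            continue  # too dissimilar to justify a transfer
        for category, evidence in manual.get(ref_id, []):
            transferred.append(
                {"category": category, "via": ref_id, "reference_evidence": evidence}
            )
    return transferred
```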
PEDANT Genome Database • Currently more than 170 genomes (600 000 ORFs) • Figure: overview of the three domains (Bacteria, Archaea, Eucarya) with representative groups
Data mining • Retrieval • Visualisation • Mining • Integration
Queries using the FunCat: group level • Looking for groups of genes:
Single molecule level • Retrieving protein entries:
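The two query levels can be sketched over a small in-memory table. This is an assumption-laden illustration, not the actual query interface: entries and identifiers are invented, categories are stored as paths from general to specific, and `genes_in_group` and `protein_entry` are hypothetical helper names.

```python
# Hypothetical entries: protein -> list of category paths (general to specific).
ENTRIES = {
    "protA": [("Transcription", "mRNA-transcription")],
    "protB": [("Transcription", "rRNA-transcription"), ("Metabolism",)],
    "protC": [("Metabolism",)],
}


def genes_in_group(category_path, entries=ENTRIES):
    """Group level: all proteins in a category or any of its subcategories."""
    n = len(category_path)
    return sorted(
        protein
        for protein, category_paths in entries.items()
        if any(path[:n] == category_path for path in category_paths)
    )


def protein_entry(protein_id, entries=ENTRIES):
    """Single-molecule level: the categories recorded for one protein."""
    return entries[protein_id]
```

Matching on the path prefix is what lets a query for a general category also return proteins annotated only at a more specific level.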
The human FunCat • Figure labels: Translation, Cell cycle, Transcription, Protein fate, Energy, Intracellular transport, Metabolism, Signalling, Unclassified, Defense, Cell physiology
Comparing genomes • Sequence similarity • "Functional homology" • Identification of organism-specific functions
Comparing H. sapiens and B. subtilis • Figure labels: Protein fate, Cellular communication, Interaction with cellular environment, Metabolism
Integrative analysis • Figure labels: Protein expression data, Protein-protein interaction data, Gene expression data, Functional catalogue
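Since the categories are organism independent, two annotated genomes can be compared category by category. The sketch below is a hedged illustration with invented data: `category_fractions` gives the share of genes per category, and `specific_categories` lists categories used in one genome but absent from the other, as a rough stand-in for organism-specific functions. Both function names are hypothetical.

```python
def category_fractions(annotations):
    """Fraction of a genome's genes annotated with each category.

    annotations: gene -> set of categories. Fractions can sum to more than 1
    because a gene may belong to several categories.
    """
    total = len(annotations)
    counts = {}
    for categories in annotations.values():
        for category in categories:
            counts[category] = counts.get(category, 0) + 1
    return {category: n / total for category, n in counts.items()}


def specific_categories(genome_a, genome_b):
    """Categories used in genome_a that never occur in genome_b."""
    used_a = set().union(*genome_a.values())
    used_b = set().union(*genome_b.values())
    return used_a - used_b
```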
Limitations • Co-expression is no proof of functional association. • Integrate evidence from multiple sources.
Integration with annotation • Analyse gene expression data using integration with annotation catalogues. • Functional catalogue • Phenotypes • Interaction
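One common way to combine expression data with a functional catalogue is to ask whether a functional category is over-represented in a cluster of co-expressed genes. The slides do not name a statistic; the hypergeometric test below is a swapped-in, widely used choice, and `enrichment_p` is a hypothetical helper. In line with the Limitations slide, a small p-value is a hint to be combined with other evidence, not proof of functional association.

```python
from math import comb


def enrichment_p(cluster, category_genes, all_genes):
    """Hypergeometric P(X >= k): chance of seeing at least k category members
    in a cluster of this size drawn at random from all_genes.

    All three arguments are sets of gene identifiers.
    """
    population = len(all_genes)
    in_category = len(category_genes & all_genes)
    drawn = len(cluster)
    observed = len(cluster & category_genes)
    tail = sum(
        comb(in_category, i) * comb(population - in_category, drawn - i)
        for i in range(observed, min(in_category, drawn) + 1)
    )
    return tail / comb(population, drawn)
```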
FunCat • Tool to structure information • Tool to connect information
Thank you!