
Making Sense of Life Sciences Data

Making Sense of Life Sciences Data. Nigel Martin. 21st May 2008. Life Sciences Informatics. The development and use of computational methods for the acquisition, management, analysis and interpretation


Presentation Transcript


  1. Making Sense of Life Sciences Data Nigel Martin 21st May 2008

  2. Life Sciences Informatics • The development and use of computational methods for the acquisition, management, analysis and interpretation of biological and medical information to determine biological functions and mechanisms as well as their applications in user communities • This biological and medical information is encoded in the vast amounts of data now generated in the life sciences, e.g. DNA data

  3. Life Sciences Informatics … C A G C C T

  4. Life Sciences Informatics … C A G C C T Homo sapiens

  5. Gene expression • [Diagram; text labels only: Genome (made of DNA), RNA, Protein, Biological Processes, FUNCTION; A gene, Permanent copy, Temporary copy, Product, Job]
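The flow on this slide, from the genome's permanent copy through a temporary RNA copy to a protein product, can be sketched in a few lines of Python. This is a minimal illustration and not code from the presentation: the function names, the choice of reading frame, and the use of the C A G C C T fragment from the earlier slides as coding-strand input are all assumptions, and the codon table holds only the two codons the example needs.

```python
# Toy model of gene expression: DNA (permanent copy) -> RNA (temporary copy) -> protein (product).
# Partial codon table (assumption: only the codons used below; a full table has 64 entries).
CODON_TABLE = {"CAG": "Gln", "CCU": "Pro"}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA matching a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> list[str]:
    """Translate mRNA codon by codon; codons missing from the table become '???'."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE.get(codon, "???") for codon in codons]


mrna = transcribe("CAGCCT")
protein = translate(mrna)
print(mrna, protein)
```

Real sequence analysis would also have to handle start and stop codons, both strands and all reading frames, which this sketch deliberately ignores.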

  6. Life Sciences Data is Complex • The primary data of DNA and protein sequences are held in large repositories such as the EMBL Nucleotide Sequence Database • The latest release contains 114,475,051 sequences comprising 215,540,553,360 nucleotides • But life sciences data comprises much besides sequence data…
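As a quick sense of scale, the two totals quoted on this slide imply a mean entry length of a little under 1,900 nucleotides. The arithmetic below uses only the slide's figures; the variable names are illustrative.

```python
# Totals quoted on the slide for the EMBL Nucleotide Sequence Database release.
num_sequences = 114_475_051
num_nucleotides = 215_540_553_360

# Mean sequence length implied by the two totals.
mean_length = num_nucleotides / num_sequences
print(f"Mean sequence length: {mean_length:.0f} nucleotides")
```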

  7. Life Sciences Data is Complex • e.g. CATH protein structure classification

  8. Life Sciences Data is Complex • e.g. herpesvirus evolutionary tree

  9. Life Sciences Data is Complex • e.g. KEGG metabolic pathway

  10. Life Sciences Data is Complex • e.g. PubMed medical abstract • Toxicol Appl Pharmacol. 2004 Dec 1;201(2):178-85. • cDNA microarray analysis of rat alveolar epithelial cells following exposure to organic extract of diesel exhaust particles. • Koike E, Hirano S, Furuyama A, Kobayashi T. • Particulate Matter (PM2.5) and Diesel Exhaust Particles (DEP) Research Project, National Institute for Environmental Studies, Tsukuba, Ibaraki, 305-8506, Japan. • Diesel exhaust particles (DEP) induce pulmonary diseases including asthma and chronic bronchitis. Comprehensive evaluation is required to know the mechanisms underlying the effects of air pollutants including DEP on lung diseases. Using a cDNA microarray, we examined changes in gene expression in SV40T2 cells, a rat alveolar type II epithelial cell line, following exposure to an organic extract of DEP. We identified candidate sensitive genes that were up- or down-regulated in response to DEP. The cDNA microarray analysis revealed that a 6-h exposure to the DEP extract (30 µg/ml) increased (>2-fold) the expression of 51 genes associated with drug metabolism, antioxidation, cell cycle/proliferation/apoptosis, coagulation/fibrinolysis, and expressed sequence tags (ESTs), and decreased (<0.5-fold) that of 20 genes. In the present study, heme oxygenase (HO)-1, an antioxidative enzyme, showed the maximum increase in gene expression; and type II transglutaminase (TGM-2), a regulator of coagulation, showed the most prominent decrease among the genes. We confirmed the change in the HO-1 protein level by Western blot analysis and that in the enzyme activity of TGM-2. The organic extract of DEP increased the expression of HO-1 protein and decreased the enzyme activity of TGM-2. Furthermore, these effects of DEP on either HO-1 or TGM-2 were reduced by N-acetyl-l-cysteine (NAC), thus suggesting that oxidative stress caused by this organic fraction of DEP may have induced these cellular responses. Therefore, an increase in HO-1 and a decrease in TGM-2 might be good markers of the biological response to organic compounds of airborne particulate substances. • PMID: 15541757 [PubMed - in process]

  11. Life Sciences Data is Complex • e.g. Gene Ontology http://www.geneontology.org/
      • GO:0008150 : biological_process ( 109503 )
      • GO:0005575 : cellular_component ( 98453 )
      • GO:0003674 : molecular_function ( 108120 )
      • GO:0016209 : antioxidant activity ( 478 )
      • GO:0005488 : binding ( 31317 )
      • GO:0003824 : catalytic activity ( 35260 )
      • GO:0030188 : chaperone regulator activity ( 14 )
      • GO:0030234 : enzyme regulator activity ( 2087 )
      • GO:0005554 : molecular_function unknown ( 29597 )
      • GO:0003774 : motor activity ( 522 )
      • GO:0045735 : nutrient reservoir activity ( 36 )
      • GO:0004871 : signal transducer activity ( 8356 )
      • GO:0005198 : structural molecule activity ( 3428 )
      • GO:0030528 : transcription regulator activity ( 8552 )
      • GO:0017163 : negative regulator of basal transcription activity ( 15 )
      • GO:0003701 : RNA polymerase I transcription factor activity ( 31 )
      • GO:0003702 : RNA polymerase II transcription factor activity ( 982 )
      • GO:0003709 : RNA polymerase III transcription factor activity ( 41 )
      • GO:0030401 : transcription antiterminator activity ( 16 )
      • GO:0003712 : transcription cofactor activity ( 731 )
      • GO:0003700 : transcription factor activity ( 5510 )
      • GO:0016986 : transcription initiation factor activity ( 82 )
      • GO:0016988 : transcription initiation factor antagonist activity ( 9 )
      • GO:0003715 : transcription termination factor activity ( 38 )
      • GO:0016563 : transcriptional activator activity ( 499 )
      • GO:0003711 : transcriptional elongation regulator activity ( 97 )
      • GO:0016564 : transcriptional repressor activity ( 507 )
      • GO:0000156 : two-component response regulator activity ( 394 )
      • GO:0045182 : translation regulator activity ( 687 )
      • GO:0005215 : transporter activity ( 9054 )
      • GO:0030533 : triplet codon-amino acid adaptor activity ( 555 )
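Each entry on this slide follows the same pattern: a GO identifier, a term name, and a number in parentheses (presumably a count of annotated items, though the slide does not say). A minimal sketch of turning such a listing into structured records is shown below; the regular expression and the function name are assumptions made for illustration, and the sample text is three entries copied from the slide.

```python
import re

# One listing entry: GO identifier, term name, then a count in parentheses.
GO_ENTRY = re.compile(r"(GO:\d{7})\s*:\s*(.+?)\s*\(\s*(\d+)\s*\)")


def parse_go_listing(text: str) -> list[tuple[str, str, int]]:
    """Extract (identifier, name, count) triples from a flattened GO listing."""
    return [(go_id, name, int(count)) for go_id, name, count in GO_ENTRY.findall(text)]


# Three entries copied from the slide, in the same bullet-separated form.
sample = (
    "• GO:0003674 : molecular_function ( 108120 ) "
    "• GO:0016209 : antioxidant activity ( 478 ) "
    "• GO:0000156 : two-component response regulator activity ( 394 )"
)
terms = parse_go_listing(sample)

# List the terms from the largest count to the smallest.
for go_id, name, count in sorted(terms, key=lambda term: term[2], reverse=True):
    print(f"{go_id}  {count:>7}  {name}")
```

The flattened listing does not preserve the parent-child structure of the ontology, so this sketch recovers only the flat records.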

  12. Life Sciences Informatics in Birkbeck Comp Sci Example Research Areas: • Evolutionary analysis: reconstruction of evolutionary events from genomic and related data • Integration of life sciences data: data and knowledge management techniques to support the integration, analysis, mining and visualisation of life sciences data • Medical informatics: data integration, semantic modelling, fuzzy inferencing and data mining techniques to support virtual integration of medical records • For full details of topics, people, projects, publications… • http://www.dcs.bbk.ac.uk/research/bioinf

  13. Evolutionary Analysis • Annotating evolutionary trees • Mathematical models and algorithms addressing problems such as: • Given an evolutionary species tree and a set of trees built on the same extant species according to similarity between individual gene families, find a mapping of the individual gene trees onto the species tree exhibiting gene duplications and losses to account for the differences • Given an evolutionary species tree and patterns of presence/absence of genes in the extant species, compute evolutionary scenarios of gene gain, horizontal transfer and loss events to account for the patterns
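One textbook way to approach the first problem is the least-common-ancestor (LCA) mapping: each gene-tree node is mapped to the smallest species-tree clade containing all species below it, and a node is counted as a duplication when it maps to the same clade as one of its children. The sketch below illustrates that general technique only; it is not the models or algorithms referred to on the slide. The nested-tuple tree encoding, the function names and the three-species example are assumptions, and the code counts duplications without inferring losses or horizontal transfers.

```python
def clades(tree):
    """Return the leaf set of every node in a nested-tuple tree (leaves are strings).

    The last element of the returned list is the leaf set of `tree` itself.
    """
    if isinstance(tree, str):
        return [frozenset([tree])]
    result, leaves = [], frozenset()
    for child in tree:
        sub = clades(child)
        result.extend(sub)
        leaves |= sub[-1]
    result.append(leaves)
    return result


def lca(species_clades, species):
    """Smallest species-tree clade that contains every species in the set."""
    return min((c for c in species_clades if species <= c), key=len)


def count_duplications(gene_tree, species_clades):
    """Return (species below this gene-tree node, duplication nodes at or below it)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0
    child_sets, dups = [], 0
    for child in gene_tree:
        species, child_dups = count_duplications(child, species_clades)
        child_sets.append(species)
        dups += child_dups
    here = frozenset().union(*child_sets)
    mapped = lca(species_clades, here)
    # Duplication: the node maps to the same species-tree clade as one of its children.
    if any(lca(species_clades, s) == mapped for s in child_sets):
        dups += 1
    return here, dups


# Hypothetical example: gene-tree leaves are labelled with the species they come from.
species_tree = (("A", "B"), "C")
gene_tree = (("A", "B"), ("A", "C"))
species_clades = clades(species_tree)
_, dups = count_duplications(gene_tree, species_clades)
print("duplications:", dups)
```

In this example species A appears in both halves of the gene tree, so the gene-tree root is flagged as a duplication, while a gene tree identical to the species tree yields none.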

  14. Evolutionary Analysis • Applied to the analysis of evolutionary gains and losses of functions in herpesvirus genomes • [Figure labels: Reconstructed history of HPF161; Host–virus interaction]

  15. Integration of Life Sciences Data • Integrating transcriptomics and structural data to reveal protein functions: BioMap • A data warehouse to support analysis and mining integrating data including microarray gene expression data, protein structure data, CATH structural classification data, functional data including Gene Ontology, KEGG (Gene, Orthology, Genome, Pathway…) • Creation of a pilot Grid for proteomics resources: ISpider • An integrated platform of proteomics resources supporting techniques for distributed querying, workflows and data analysis tasks in a Grid • Research approach based on semantic mapping services using the techniques developed in the AutoMed project http://www.doc.ic.ac.uk/automed/
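The warehouse idea on this slide, bringing expression data and functional annotation together so they can be queried as one resource, can be illustrated with a toy in-memory database. This is only a sketch under stated assumptions and does not reflect the BioMap or ISpider schemas: the table layout, the gene identifiers and the fold-change values are invented for the example, while the GO identifiers are taken from the Gene Ontology slide and the fold-change thresholds echo those in the PubMed abstract quoted earlier.

```python
import sqlite3

# In-memory database standing in for a warehouse (hypothetical two-table schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE expression (gene_id TEXT, fold_change REAL);
    CREATE TABLE go_annotation (gene_id TEXT, go_id TEXT);
""")

# Invented toy data: gene identifiers and fold changes are not real measurements.
conn.executemany(
    "INSERT INTO expression VALUES (?, ?)",
    [("geneA", 2.4), ("geneB", 0.4), ("geneC", 1.0)],
)
conn.executemany(
    "INSERT INTO go_annotation VALUES (?, ?)",
    [("geneA", "GO:0016209"), ("geneB", "GO:0003824"), ("geneC", "GO:0005488")],
)

# Integrated query: annotated genes whose expression changed more than 2-fold either way.
rows = conn.execute("""
    SELECT e.gene_id, e.fold_change, a.go_id
    FROM expression AS e
    JOIN go_annotation AS a ON a.gene_id = e.gene_id
    WHERE e.fold_change > 2.0 OR e.fold_change < 0.5
    ORDER BY e.gene_id
""").fetchall()
print(rows)
```

A warehouse materialises the integrated data in one place, as here; the Grid-based approach described on the same slide instead queries distributed resources where they live.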

  16. Integrated Proteomics Informatics Platform - Architecture • [Architecture diagram; text labels only] • ISPIDER Proteomics Clients: 2D Gel Visualisation Client, Vanilla Query Client, PPI Validation + Analysis Client, Protein ID Client, + Phosph. Extensions, + Aspergil. Extensions • ISPIDER Proteomics Grid Infrastructure: Proteome Request Handler, Proteomic Ontologies/Vocabularies, Source Selection Services, Instance Ident/Mapping Services, Data Cleaning Services • Existing E-Science Infrastructure: myGrid Ontology Services, myGrid DQP, myGrid Workflows, AutoMed, DAS • Public Proteomic Resources (ISPIDER Resources and Existing Resources, accessed through web services): PRIDE, PEDRo, GS, PS, PF, TR, FA, PPI, Phos, PID • Work packages shown: WP1, WP2, WP3, WP4, WP5, WP6 • KEY: WS = Web services, GS = Genome sequence, TR = transcriptomic data, PS = protein structure, PF = protein family, FA = functional annotation, PPI = protein-protein interaction data, WP = Work Package

  17. Medical Informatics • ASsociation Studies assisted by Inference and Semantic Technologies – ASSIST • 10 E.U. partners: U.K., Greece, Belgium, Germany, Spain • The main objectives of ASSIST are to: • Allow researchers to combine phenotypic and genotypic data • Unify multiple patient records repositories • Automate the process of evaluating medical hypotheses • Provide an inference engine capable of statistically evaluating medical data • Offer expressive, graphical tools for medical researchers to post their queries.

  18. Medical Informatics • ASSIST query processing builds on AutoMed technology with integrated ontology and inference rules capabilities

  19. Making Sense of Life Sciences Data • Some areas of on-going and future research • automated reasoning using ontologies and wider domain knowledge • evolutionary reconstruction exploiting domain knowledge • analysis and mining of heterogeneous distributed resources • metrics for data integration quality • The overarching motivation is the potential to make scientific discoveries that can improve quality of life

  20. Some Collaborators • Funding • Further Information • http://www.dcs.bbk.ac.uk/research/bioinf
