Evidence networks for the analysis of biological systems. Rainer Breitling, IBLS – Molecular Plant Science group, Bioinformatics Research Centre, University of Glasgow, Scotland, UK
Background: datasets and evidence networks in post-genomic biology
Genomics • fully sequenced genomes (1995–2004): 18 archaea, 163 bacteria, 3 protozoa, 24 yeast species and fungi, 2 plants (Arabidopsis, rice), 2 insects (flies, honey bee), 2 worms (C. elegans, C. briggsae), 3 fish (fugu, puffer, zebrafish), chicken, cow, dog, mouse, rat, chimp, human • result: lots of “lists” of genes
Transcriptomics • microarrays measure gene expression levels (mRNA concentrations) • relative or absolute values • in organisms, tissues, cells • produce gene lists (e.g., which genes are up-regulated by a disease, by drug treatment, in a certain tissue)
Proteomics • 2D gels, liquid chromatography, and mass spectrometry measure protein concentrations • in tissues, cells, organelles • detect chemical modifications and processing of proteins • produce lists of protein variants that differ between conditions
Metabolomics • chromatography and mass spectrometry measure metabolite concentrations • in tissues, cells, body fluids, cell culture medium • produce lists of affected metabolites
Evidence networks • relate items (genes, proteins, metabolites) that “have something to do with each other” • the relationship is based on objective evidence • represented as bipartite graphs • two classes of nodes: items and evidence • allow automated analysis of results • intuitive visualization and links to the literature
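The bipartite structure described on this slide can be illustrated with a short Python sketch. It is not the author's implementation: the evidence labels and gene names are hypothetical, and the projection simply links two items whenever they share at least one evidence node.

```python
from itertools import combinations


def project_evidence_network(evidence_to_items):
    """Project a bipartite item/evidence graph onto the items.

    evidence_to_items maps each evidence node (e.g. an annotation term,
    a pathway or a publication) to the items it connects.
    Returns an adjacency dict: item -> set of items sharing evidence.
    """
    adjacency = {}
    for items in evidence_to_items.values():
        members = sorted(set(items))
        for item in members:
            # keep items that share evidence with nobody as isolated nodes
            adjacency.setdefault(item, set())
        for a, b in combinations(members, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical evidence nodes and genes, for illustration only
evidence = {
    "annotation:respiratory_chain": ["geneA", "geneB", "geneC"],
    "pathway:TCA_cycle": ["geneC", "geneD"],
    "paper:1": ["geneE"],
}
network = project_evidence_network(evidence)
print(sorted(network["geneC"]))  # ['geneA', 'geneB', 'geneD']
```

This sketch discards which evidence node produced each link; a fuller version would keep that information so that every edge can be traced back to its source, for example a publication.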
Types of evidence networks • Relationship can be based on • physical neighborhood • phyletic pattern similarity • expressional correlation • biophysical similarity • chemical transformation • functional co-operation • literature co-citations
Types of evidence networks • example of phyletic pattern similarity: NtpA, NtpB, NtpD and NtpI [C] (H+-ATPase subunits A, B, D and I) share the phyletic pattern aompkzy--d-l-----------it- (one position per genome, in the order A O M P K Z Y Q V D R L B C E F G H S N U J X I T W; a dash marks absence)
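The H+-ATPase pattern above can be used to illustrate how phyletic-pattern links might be computed. This is a minimal sketch under the assumption that two genes are linked when their presence/absence profiles differ in at most `max_mismatches` genomes (a hypothetical parameter); `geneX` is an invented gene whose pattern differs from the Ntp pattern in one genome.

```python
from itertools import combinations


def presence(pattern):
    """Convert a phyletic pattern string to presence/absence flags.

    One character per genome; '-' marks absence, any letter marks presence.
    """
    return tuple(ch != "-" for ch in pattern)


def phyletic_distance(p, q):
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("patterns must cover the same genomes")
    return sum(a != b for a, b in zip(presence(p), presence(q)))


def phyletic_edges(patterns, max_mismatches=0):
    """Link genes whose patterns differ in at most max_mismatches genomes."""
    return {
        (a, b)
        for a, b in combinations(sorted(patterns), 2)
        if phyletic_distance(patterns[a], patterns[b]) <= max_mismatches
    }


ntp = "aompkzy--d-l-----------it-"  # pattern shared by NtpA, NtpB, NtpD, NtpI
patterns = {"NtpA": ntp, "NtpB": ntp, "NtpD": ntp, "NtpI": ntp,
            "geneX": "aompkzy--d-l-----------i--"}  # hypothetical gene

exact = phyletic_edges(patterns)                   # identical patterns only
near = phyletic_edges(patterns, max_mismatches=1)  # tolerate one mismatch
print(len(exact), len(near))  # 6 10
```

Requiring identical patterns gives a clique of the four Ntp subunits; allowing one mismatch also pulls in the hypothetical `geneX`, which shows how the threshold trades specificity for coverage.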
What is the big picture? Graph-based iterative Group Analysis (GiGA) for the automated interpretation of biological datasets: lists + graphs = understanding
iterative Group Analysis (iGA) • iGA uses the simple hypergeometric distribution to obtain p-values (Breitling et al., BMC Bioinformatics, 2004, 5:34)
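A minimal sketch of how iGA might apply the hypergeometric distribution to one predefined group of genes (e.g. a functional class). It assumes, rather than takes from the paper, that for each i the top i members of the group are tested with the upper-tail probability of seeing at least i group members among the top r_i ranks, and that the smallest of these p-values is reported. The function names are hypothetical.

```python
from math import comb


def hypergeom_tail(n_total, n_group, n_drawn, k):
    """P(X >= k) for X ~ Hypergeometric(N=n_total, K=n_group, n=n_drawn)."""
    upper = min(n_group, n_drawn)
    hits = sum(comb(n_group, i) * comb(n_total - n_group, n_drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(n_total, n_drawn)


def iga(member_ranks, n_total):
    """Iterative group analysis for one predefined group of genes.

    member_ranks: 1-based ranks (1 = most changed) of the group's members
    among n_total genes. For each i, the top i members are tested: how
    likely is it to see at least i group members in the top r_i ranks?
    Returns (smallest p-value, number of members giving it).
    """
    ranks = sorted(member_ranks)
    best_p, best_i = 1.0, 0
    for i, r in enumerate(ranks, start=1):
        p = hypergeom_tail(n_total, len(ranks), r, i)
        if p < best_p:
            best_p, best_i = p, i
    return best_p, best_i


print(iga([1, 2, 3], 8))  # best p = 1/56 (about 0.018), reached with all 3 members
print(iga([1, 7, 8], 8))  # (0.375, 1)
```

The iteration is what makes the method "iterative": a group is not judged by all of its members at once, but by the most significant prefix of its rank-ordered members.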
Graph-based iGA (GiGA) • Breitling et al., BMC Bioinformatics, 2004, 5:100
Graph-based iGA, step 1: build the network
Graph-based iGA, step 2: assign ranks to genes
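Step 2 can be sketched as sorting genes by a change score. This assumes the score is something like a log ratio with rank 1 going to the strongest up-regulation; the gene names, values and the alphabetical tie-break are hypothetical choices for the example.

```python
def assign_ranks(changes):
    """Rank genes by change: rank 1 goes to the largest value.

    changes: dict gene -> change score (e.g. a log ratio). Ties are broken
    alphabetically so that the ranking is deterministic.
    """
    ordered = sorted(changes, key=lambda gene: (-changes[gene], gene))
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}


# Hypothetical log ratios; up-regulation ranked first
log_ratios = {"geneA": 0.4, "geneB": 2.1, "geneC": -1.3, "geneD": 1.0}
print(assign_ranks(log_ratios))
# {'geneB': 1, 'geneD': 2, 'geneA': 3, 'geneC': 4}
```

To look for down-regulated regions instead, the same function can be applied to the negated scores, e.g. `assign_ranks({g: -v for g, v in log_ratios.items()})`.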
Graph-based iGA, step 3: find local minima (figure: p = 1/8 = 0.125, p = 6/8 = 0.75, p = 2/8 = 0.25)
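A local minimum is read here as a gene whose rank is better than that of every neighbour, and a single gene of rank r among N genes is assumed to get p = r/N, which reproduces the fractions on the slide. The 8-gene network below is hypothetical, chosen only so that its three minima have ranks 1, 2 and 6.

```python
def build_adjacency(edges):
    """Undirected adjacency dict from a list of (gene, gene) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def local_minima(adjacency, rank):
    """Genes whose rank is better (lower) than that of every neighbour."""
    minima = [gene for gene, neighbours in adjacency.items()
              if all(rank[gene] < rank[n] for n in neighbours)]
    return sorted(minima, key=rank.get)


# Hypothetical 8-gene network; gene gi has rank i
edges = [("g1", "g3"), ("g1", "g4"), ("g3", "g4"), ("g1", "g5"),
         ("g2", "g5"), ("g2", "g7"), ("g6", "g7"), ("g6", "g8")]
adjacency = build_adjacency(edges)
rank = {f"g{i}": i for i in range(1, 9)}
n_genes = len(rank)

for gene in local_minima(adjacency, rank):
    # single-gene p-value: chance that a random gene ranks this high or higher
    print(gene, rank[gene] / n_genes)
# g1 0.125
# g2 0.25
# g6 0.75
```

Because the adjacency is built from an edge list, genes without any edge do not appear in it; a full implementation would have to decide whether such isolated genes count as minima.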
Graph-based iGA, step 4: extend the subgraph from each minimum (figure: p = 0.014, p = 0.018, p = 0.125, p = 1)
Graph-based iGA, step 5: select the p-value minimum (figure: p = 0.014, p = 0.018, p = 0.125, p = 1)
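Steps 3 to 5 can be combined into one sketch. It assumes that a subgraph of k genes whose worst rank is r_max gets the hypergeometric probability that k random genes all rank within the top r_max, C(r_max, k)/C(N, k). With N = 8 this gives 1/8, 2/8 and 6/8 for single genes, and 1/56 ≈ 0.018 and 1/70 ≈ 0.014 for the top 3 and top 4 genes, values that also appear on the slides; this agreement suggests, but does not prove, that this is the formula used. A further assumption: the sketch keeps adding the best-ranked neighbour until the whole connected component is absorbed, whereas the published method may stop earlier. The network is the same hypothetical one as above.

```python
from math import comb


def subgraph_p(n_genes, k, r_max):
    """Probability that k randomly chosen genes (out of n_genes) all have
    rank <= r_max: C(r_max, k) / C(n_genes, k), a hypergeometric probability."""
    return comb(r_max, k) / comb(n_genes, k)


def extend_from(seed, adjacency, rank):
    """Step 4: grow a subgraph from a local minimum, always adding the
    best-ranked neighbour of the current subgraph; record p at each size."""
    n_genes = len(rank)
    members = {seed}
    trace = [(subgraph_p(n_genes, 1, rank[seed]), [seed])]
    while True:
        frontier = {n for m in members for n in adjacency[m]} - members
        if not frontier:
            break  # the whole connected component has been absorbed
        members.add(min(frontier, key=rank.get))
        r_max = max(rank[m] for m in members)
        trace.append((subgraph_p(n_genes, len(members), r_max),
                      sorted(members, key=rank.get)))
    return trace


def giga(adjacency, rank):
    """Steps 3-5: for each local minimum, keep the extension step with the
    smallest p-value; return all results, most significant first."""
    seeds = [g for g in adjacency
             if all(rank[g] < rank[n] for n in adjacency[g])]
    best = [min(extend_from(s, adjacency, rank), key=lambda step: step[0])
            for s in sorted(seeds, key=rank.get)]
    return sorted(best, key=lambda step: step[0])


# Hypothetical 8-gene network; gene gi has rank i
edges = [("g1", "g3"), ("g1", "g4"), ("g3", "g4"), ("g1", "g5"),
         ("g2", "g5"), ("g2", "g7"), ("g6", "g7"), ("g6", "g8")]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)
rank = {f"g{i}": i for i in range(1, 9)}

p, members = giga(adjacency, rank)[0]
print(members, round(p, 4))  # ['g1', 'g2', 'g3', 'g4', 'g5'] 0.0179
```

Selecting the minimum along each extension trace (step 5) is what lets the subgraph size adapt to the data instead of being fixed in advance.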
Advantages of GiGA • fast, unbiased and comprehensive analysis • assignment of statistical significance values to the interpretation • detection of significant changes even when the data are too noisy to reliably identify individual changed genes • statistically meaningful interpretation even without replicate experiments • detection of patterns even for small absolute changes • flexible use of annotations + intuitive visualization
Example 1: microarrays. Gene expression changes during the yeast diauxic shift
Yeast diauxic shift study: DeRisi et al. (1997) Science 278: 680–686
small ribosomal subunit • large ribosomal subunit • nucleolar rRNA processing • translational elongation
respiratory chain complex II • glyoxylate cycle • citrate (TCA) cycle • oxidative phosphorylation (complex V) • respiratory chain complex III
respiratory chain complex IV
Example 2: metabolomics. Changes in the metabolic profiles of drug-treated trypanosomes
GiGA applied to metabolomics data • challenge: no annotation available • solution: build an evidence network based on hypothetical reactions between observed masses (i.e., on characteristic mass differences)
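The mass-difference idea can be sketched as follows. The transformation list and the 0.002 Da tolerance are assumptions for this example, not the set used in the study. The masses are theoretical monoisotopic masses of choline (103.0997), phosphocholine (183.0660) and glycerylphosphorylcholine (257.1028, the root of the trees shown later), plus one hypothetical unrelated signal at 150.0; none are measured values.

```python
from itertools import combinations

# Illustrative transformations: monoisotopic mass differences in Da.
# This list is an assumption for the example, not the set used in the study.
TRANSFORMATIONS = {
    "CH2 (methylation)": 14.01565,
    "O (hydroxylation)": 15.99491,
    "C3H6O2 (glycerol minus water)": 74.03678,
    "HPO3 (phosphorylation)": 79.96633,
}


def mass_difference_edges(masses, transformations=TRANSFORMATIONS, tol=0.002):
    """Link two observed masses when their difference matches a known
    transformation within tol (Da). Returns (lighter, heavier, name) triples."""
    edges = []
    for light, heavy in combinations(sorted(masses), 2):
        delta = heavy - light
        for name, shift in transformations.items():
            if abs(delta - shift) <= tol:
                edges.append((light, heavy, name))
    return edges


# Theoretical masses of choline, phosphocholine and glycerylphosphorylcholine,
# plus one hypothetical unrelated signal at 150.0
observed = [257.1028, 103.0997, 183.0660, 150.0]
for light, heavy, name in mass_difference_edges(observed):
    print(light, heavy, name)
# 103.0997 183.066 HPO3 (phosphorylation)
# 183.066 257.1028 C3H6O2 (glycerol minus water)
```

The resulting edges play the role that shared annotations play for genes: they define the evidence network on which GiGA can then search for significant subgraphs. The tolerance has to match the mass accuracy of the instrument; too loose a value creates spurious links.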
Metabolite tree of mass 257.1028 (glycerylphosphorylcholine), 6 generations
Metabolite tree of mass 257.1028, 4 generations
Metabolite tree of mass 257.1028, 2 generations
Metabolite tree of mass 257.1028: colors indicate changes of metabolite signals after 60 min of pentamidine treatment compared with untreated samples (red = down, green = up)
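The slides show the same tree cut at 6, 4 and 2 generations. Reading a "generation" as one mass-difference step away from the root (an assumption), such a tree can be built by breadth-first search over the mass-difference network. The chain of nodes below is hypothetical; `m4` and `m5` are invented placeholders.

```python
from collections import deque


def metabolite_tree(adjacency, root, generations):
    """Breadth-first expansion of a mass-difference network from root.

    Returns {mass: generation}; each node is attached at the generation in
    which it is first reached, so the result describes a tree.
    """
    generation = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if generation[node] == generations:
            continue  # do not expand beyond the requested depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in generation:
                generation[neighbour] = generation[node] + 1
                queue.append(neighbour)
    return generation


# Hypothetical network: a chain 257.1028 - 183.0660 - 103.0997 - m4 - m5
adjacency = {
    "257.1028": ["183.0660"],
    "183.0660": ["257.1028", "103.0997"],
    "103.0997": ["183.0660", "m4"],
    "m4": ["103.0997", "m5"],
    "m5": ["m4"],
}
print(metabolite_tree(adjacency, "257.1028", 2))
# {'257.1028': 0, '183.0660': 1, '103.0997': 2}
```

Increasing `generations` grows the tree outward from the root, which matches how the tree on the slides widens from 2 to 6 generations.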
Choline tree found by GiGA (most significant subgraph, p < 10⁻¹³) extracted from
Summary • post-genomic technologies produce “lists” • neighborhood relationships yield “evidence networks” (graphs) • lists + graphs = biological insights • GiGA graph analysis highlights and connects relevant areas in the “evidence network”
Acknowledgements • Pawel Herzyk – Sir Henry Wellcome Functional Genomics Facility • Anna Amtmann & Patrick Armengaud – IBLS Molecular Plant Science group • Mike Barrett – IBLS Parasitology Research group • FGF academic users: Wilhelmina Behan, Simone Boldt, Anna Casburn-Jones, Gillian Douce, Paul Everest, Michael Farthing, Heather Johnston, Walter Kolch, Peter O'Shaughnessy, Susan Pyne, Rosemary Smith, Hawys Williams
Contact Rainer Breitling Bioinformatics Research Centre Davidson Building A416 University of Glasgow, Scotland, UK R.Breitling@bio.gla.ac.uk http://www.brc.dcs.gla.ac.uk/~rb106x