
Graph mining in bioinformatics

Laur Tooming. Graphs are often used in bioinformatics to describe processes in the cell: vertices are genes or proteins, and the meaning of an edge depends on the type of graph (for example, protein-protein interaction or gene regulation).


Presentation Transcript


  1. Graph mining in bioinformatics (Laur Tooming)

  2. Graphs in biology
     • Graphs are often used in bioinformatics to describe processes in the cell.
     • Vertices are genes or proteins.
     • The meaning of an edge depends on the type of the graph:
       • protein-protein interaction
       • gene regulation

  3. What we’re looking for
     • We want to find sets of genes that have a biological meaning.
     • Idea: find graph-theoretically relevant sets of vertices and check whether they are also biologically meaningful.
     • Simple example: connected components.
     • A more advanced idea: graph clustering, i.e. finding subgraphs that have a high edge density.
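The two ideas on this slide can be sketched in a few lines of plain Python. This is only an illustration: the function names `connected_components` and `edge_density` and the toy gene names are assumptions, not part of the talk, and edge density is taken here to mean the fraction of possible undirected edges that are present.

```python
from collections import deque


def connected_components(vertices, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    component.add(w)
                    queue.append(w)
        components.append(component)
    return components


def edge_density(vertex_set, edges):
    """Fraction of the possible undirected edges inside `vertex_set` that exist."""
    n = len(vertex_set)
    if n < 2:
        return 0.0
    inside = {frozenset((u, v)) for u, v in edges
              if u in vertex_set and v in vertex_set and u != v}
    return len(inside) / (n * (n - 1) / 2)


# Hypothetical toy graph: the gene names are placeholders, not real data.
vertices = ["g1", "g2", "g3", "g4", "g5"]
edges = [("g1", "g2"), ("g2", "g3"), ("g1", "g3"), ("g4", "g5")]
components = connected_components(vertices, edges)
dense = edge_density({"g1", "g2", "g3"}, edges)
```

A candidate gene set with density close to 1 is almost a clique, which is the kind of subgraph the clustering methods on the following slides try to find.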

  4. Markov Cluster Algorithm (MCL)
     Source: Stijn van Dongen, Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, May 2000. http://micans.org/mcl/
     • If a graph has cluster structure, random walks tend to remain inside a cluster for a long time.
     • The graph is modelled as a stochastic matrix: the entries in each column sum to 1.
     • a_ij is the probability that a random walk leaving vertex j goes to vertex i on the next step.
     • A bigger edge weight means a greater probability of choosing that edge.
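As a rough illustration of the stochastic matrix described on this slide, the sketch below builds a column-stochastic matrix from a list of weighted undirected edges. The function name `column_stochastic`, the integer vertex labels, and the choice to leave the column of an isolated vertex as zeros are all assumptions made for this example.

```python
def column_stochastic(n, weighted_edges):
    """Build an n x n column-stochastic matrix from undirected weighted edges.

    Entry [i][j] is the probability that a random walk leaving vertex j
    moves to vertex i on the next step.
    """
    matrix = [[0.0] * n for _ in range(n)]
    for u, v, weight in weighted_edges:
        matrix[u][v] += weight
        matrix[v][u] += weight
    for j in range(n):
        column_sum = sum(matrix[i][j] for i in range(n))
        if column_sum == 0:
            continue  # isolated vertex: column stays zero in this sketch
        for i in range(n):
            matrix[i][j] /= column_sum
    return matrix


# Hypothetical graph: vertex 0 is linked to 1 (weight 3) and to 2 (weight 1).
A = column_stochastic(3, [(0, 1, 3.0), (0, 2, 1.0)])
# A walk leaving vertex 0 picks the heavier edge three times as often.
```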

  5. Markov Cluster Algorithm (MCL)
     • Two procedures, expansion and inflation, are applied alternately.
     • Expansion: matrix squaring
       • considers longer random walks
     • Inflation: raising each entry to some power greater than 1, then rescaling the columns so that the matrix remains stochastic
       • weakens weak edges and strengthens strong ones
     • The process converges to a steady state.
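The alternation of expansion and inflation can be sketched as follows. This is a minimal pure-Python illustration, not the reference implementation from micans.org: the function names, the default inflation power of 2, the convergence tolerance, the added self-loops, and the way clusters are read off the final matrix are all assumptions of this sketch.

```python
def normalize_columns(matrix):
    """Rescale every column in place so that its entries sum to 1."""
    n = len(matrix)
    for j in range(n):
        total = sum(matrix[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                matrix[i][j] /= total
    return matrix


def expand(matrix):
    """Expansion: square the matrix (two steps of the random walk)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise each entry to `power`, then rescale the columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def mcl(adjacency, power=2.0, max_iter=100, tol=1e-9):
    """Return a set of clusters (frozensets of vertex indices)."""
    n = len(adjacency)
    # Assumption: self-loops are added before normalising; the slides do not say so.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), power)
        change = max(abs(nxt[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    # Every row that still carries mass defines a cluster: the columns it attracts.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected, unweighted graph."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


# Hypothetical example: two triangles joined by the single edge (2, 3).
bridged = adjacency_matrix(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(mcl(bridged))
```

A larger inflation power generally yields more, smaller clusters, so it is the main parameter to experiment with.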

  6. Markov Cluster Algorithm (MCL)
     Images from http://micans.org/mcl/ani/mcl-animation.html

  7. Betweenness centrality clustering
     • An edge between different clusters lies on many shortest paths from one cluster to another.
     • An edge inside a cluster lies on fewer shortest paths, because there are more alternative paths inside a cluster.
     • Betweenness centrality of an edge: the number of shortest paths in the graph that contain that edge.
     • Remove the edges with the highest centrality from the graph to obtain a clustering.
     • Optimisations:
       • instead of all shortest paths, pick a sample of vertices and calculate shortest paths from them
       • remove several edges at once
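The procedure on this slide resembles the Girvan-Newman approach. The sketch below is one possible reading of it, under stated assumptions: the graph is unweighted, undirected and has no self-loops; when several shortest paths join the same pair of vertices, each contributes a fraction (the usual Brandes-style accounting, which the slide does not specify); and the helper names are invented for this example. The sampling and batch-removal optimisations are left out.

```python
from collections import deque


def edge_betweenness(adjacency):
    """Edge betweenness for an unweighted, undirected graph (dict of neighbour sets)."""
    score = {frozenset((u, v)): 0.0 for u in adjacency for v in adjacency[u]}
    for source in adjacency:
        # Breadth-first search that counts shortest paths (sigma) to every vertex.
        dist = {source: 0}
        sigma = {v: 0 for v in adjacency}
        sigma[source] = 1
        preds = {v: [] for v in adjacency}
        order = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        # Walk back from the farthest vertices, pushing path counts onto edges.
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Each unordered pair of endpoints was counted once from either end.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adjacency):
    """Connected components as a list of vertex sets."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        group, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen.add(w)
                    group.add(w)
                    queue.append(w)
        result.append(group)
    return result


def split_once(adjacency):
    """Remove highest-betweenness edges until the graph gains a component."""
    graph = {u: set(vs) for u, vs in adjacency.items()}
    start = len(components(graph))
    while len(components(graph)) == start:
        scores = edge_betweenness(graph)
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        graph[u].discard(v)
        graph[v].discard(u)
    return components(graph)


# Hypothetical example: two triangles joined by the edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(graph)
clusters = split_once(graph)
```

In this toy graph every shortest path between the two triangles has to cross the edge (2, 3), so it receives the highest score and is removed first.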

  8. GraphWeb
     • Web interface for analysing biological graphs
     • Simple syntax for entering graphs
       • multiple datasets
       • directed edges
       • edge weights
     • Visualising graphs with GraphViz
     • Finding biological meaning with g:Profiler
     Example input:
       ds1: A > B 10
       ds2: A > B 4
       ds1: B C 5
       ds2: C > D 12
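To make the input syntax concrete, here is a small parser for lines shaped like the four shown on the slide. It is a hypothetical sketch and not GraphWeb's own code: the grammar (dataset label, colon, source, optional `>` marking a directed edge, target, optional weight) is inferred from that example alone, and the default weight of 1.0 for a missing weight is an assumption.

```python
def parse_edge_line(line):
    """Parse one line such as 'ds1: A > B 10' into a dictionary."""
    label, _, rest = line.partition(":")
    tokens = rest.split()
    directed = ">" in tokens
    tokens = [t for t in tokens if t != ">"]
    source, target = tokens[0], tokens[1]
    # Assumption: an edge without an explicit weight gets weight 1.0.
    weight = float(tokens[2]) if len(tokens) > 2 else 1.0
    return {
        "dataset": label.strip(),
        "source": source,
        "target": target,
        "directed": directed,
        "weight": weight,
    }


lines = ["ds1: A > B 10", "ds2: A > B 4", "ds1: B C 5", "ds2: C > D 12"]
parsed = [parse_edge_line(line) for line in lines]
```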

  9. Combining several datasets
     • Whether or not there is an edge between two vertices is determined in biological experiments, which may sometimes give false results.
     • For a given graph, different sources may give different information. Some sources may be more trustworthy than others.
     • We would like to combine different sources and assess the trustworthiness of each edge in the resulting graph.
     • Edge weight in the summary graph: a sum over datasets
       • w(e, G) = Σ_i w(e, G_i) · w(G_i), where w(e, G_i) is the weight of edge e in dataset G_i and w(G_i) is the weight given to dataset G_i
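The summary weight is a weighted sum, which a short sketch makes explicit. The edge weights below reuse the example input from the GraphWeb slide, while the dataset weights 0.5 and 0.25 are made up for illustration. The function name `combine_datasets` is also an assumption, and edge direction is ignored here for simplicity.

```python
def combine_datasets(datasets, dataset_weights):
    """Compute w(e, G) = sum over i of w(e, G_i) * w(G_i) for every edge e."""
    combined = {}
    for name, edges in datasets.items():
        for edge, weight in edges.items():
            combined[edge] = combined.get(edge, 0.0) + weight * dataset_weights[name]
    return combined


datasets = {
    "ds1": {("A", "B"): 10.0, ("B", "C"): 5.0},
    "ds2": {("A", "B"): 4.0, ("C", "D"): 12.0},
}
# Hypothetical trust values: ds1 is treated as twice as reliable as ds2.
dataset_weights = {"ds1": 0.5, "ds2": 0.25}
summary = combine_datasets(datasets, dataset_weights)
```

An edge that is missing from a dataset simply contributes nothing to the sum, so edges confirmed by several trusted sources end up with the highest summary weight.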

  10. Combining several datasets

  11. The end
