ANALYSIS OF GENETIC NETWORKS USING ATTRIBUTED GRAPH MATCHING
BACKGROUND • Completion of sequencing projects • Need for functional discovery • Emerging area of study: Large scale genomic analysis • Similarity of living systems
GENETIC NETWORKS • Modelling genetic networks • Interaction of genes and proteins • Relationship between topology and function
MOTIVATION • Common biological processes • Comparison of networks • Discovering missing interactions • Discovering missing genes
GRAPH MATCHING • Figure: graphs G1 and G2 with MGE and MPN gene nodes (e.g., mge236, mpn133) • Search-based algorithm • Pruning techniques
ROADMAP • Scale-Free Networks • Modelling Genetic Networks • Graph Matching • Algorithm • Results
COMPLEX NETWORKS • Small-world model • WWW • Human acquaintances network • Citation networks • Biological networks
SMALL-WORLD • Features: • Characteristic path length • Clustering coefficient • Sparseness
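The three small-world features can be computed directly from an adjacency list. The sketch below is illustrative and is not taken from the original work. It assumes an undirected graph stored as a dict mapping each node to a set of neighbours. Path length is averaged only over connected pairs, and nodes with fewer than two neighbours are given a clustering coefficient of 0. Both of these are conventions chosen for the sketch.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def characteristic_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def clustering_coefficient(adj):
    """Average local clustering: fraction of each node's neighbour pairs that are linked."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)  # convention: too few neighbours to form a triangle
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)


def density(adj):
    """Sparseness measure: fraction of all possible undirected edges that are present."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


# Toy graph: a triangle a-b-c with a tail c-d.
example = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(round(characteristic_path_length(example), 3))  # 1.333
print(round(clustering_coefficient(example), 3))      # 0.583
print(round(density(example), 3))                     # 0.667
```

Small-world graphs combine a short path length, which is typical of random graphs, with a high clustering coefficient, which is typical of regular lattices, while staying sparse.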
SMALL-WORLD • Somewhere in between regular & random graphs
SMALL-WORLD • Highly clustered • Short diameter
SCALE-FREE NETWORKS • Complex networks: biological, social, WWW, power grid, citation, etc. • Power-law connectivity: P(k) ∝ k^(-a), where P(k) is the probability that a node has k links • Hubs and authorities
SCALE-FREE NETWORKS • Application: testing for scale-free behavior in • Yeast • Helicobacter pylori • Mycoplasma pneumoniae • Mycoplasma genitalium • Linear log-log plot • Slope = -a
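Taking logarithms of the power law explains why scale-free behavior shows up as a straight line. The normalization constant C is introduced here only for the derivation.

```latex
P(k) = C\,k^{-a}
\quad\Longrightarrow\quad
\log P(k) = \log C - a \log k
```

On log-log axes, therefore, the points lie on a line with slope -a and intercept log C. The exponent a is the magnitude of that slope.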
SCALE-FREE NETWORKS • Slope is calculated by the least-squares method
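The slope estimate can be sketched as an ordinary least-squares line fitted through the points (log k, log P(k)). This is a minimal version that assumes the input is a plain list of node degrees. Nodes of degree 0 are dropped because log 0 is undefined. Least-squares fits on log-log plots are simple but can be biased, so treat this as a mirror of the slide rather than a robust estimator.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Return {k: P(k)} over all observed degrees k >= 1."""
    counts = Counter(d for d in degrees if d > 0)  # drop isolated nodes: log(0) is undefined
    n = sum(counts.values())
    return {k: c / n for k, c in sorted(counts.items())}


def power_law_exponent(degrees):
    """Estimate a in P(k) ~ k^-a from the least-squares slope of log P(k) vs log k."""
    dist = degree_distribution(degrees)
    if len(dist) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return -slope  # the fitted slope is -a


# Degrees that follow P(k) proportional to k^-2 exactly: 16 nodes of degree 1, 4 of degree 2, 1 of degree 4.
degrees = [1] * 16 + [2] * 4 + [4]
print(round(power_law_exponent(degrees), 6))  # 2.0
```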
TOPOLOGY & FUNCTIONALITY • Small diameter • Ease of information dissemination • Ease of recovery after disturbance • Cliquishness • Alternate paths can be found • Heterogeneity • Random node removal does not significantly affect the network • Hubs are vulnerable to targeted attack
BIOLOGICAL ASPECTS • Multifunctionality • Grouped into functional units • Stability • Reason: Most of the interactions are between hubs and authorities
TYPES OF GENETIC NETWORKS • Categorized by data sources • Metabolic pathways • Gene expression arrays • Protein interactions • Gene interactions
INTERACTION MAPS • High level perspective • Nodes: Genes or proteins • Edges: Presence of an interaction • Data sources • Two-hybrid analysis • Fusion analysis • Chromosomal proximity • Phylogenetic analysis
PROBLEM DEFINITION • Attributed Relational Graph (ARG): G = {V, E, X} • V = {v1, v2, …, vn}: nodes • E = {e1, e2, …, em}: edges • X = {x1, x2, …, xn}: node attributes
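Here is a minimal sketch of an ARG as a data structure. It is hypothetical and is not the authors' implementation. Several choices are assumptions: edges are undirected and stored as adjacency sets, each attribute x_i is a list of floats attached to node v_i, and the class and method names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ARG:
    """Attributed relational graph G = {V, E, X} (hypothetical helper class)."""
    attrs: dict = field(default_factory=dict)  # X: node id -> attribute vector (also defines V)
    adj: dict = field(default_factory=dict)    # E: node id -> set of neighbour ids (undirected)

    def add_node(self, v, x):
        self.attrs[v] = list(x)
        self.adj.setdefault(v, set())

    def add_edge(self, u, v):
        if u not in self.attrs or v not in self.attrs:
            raise KeyError("both endpoints must be added as nodes first")
        self.adj[u].add(v)
        self.adj[v].add(u)

    @property
    def nodes(self):
        return list(self.attrs)

    @property
    def edges(self):
        # Each undirected edge appears once as an unordered pair.
        return {frozenset((u, v)) for u in self.adj for v in self.adj[u]}

    def degree(self, v):
        return len(self.adj[v])


g = ARG()
g.add_node("mge236", [0.1, 0.2])
g.add_node("mge336", [0.3, 0.1])
g.add_edge("mge236", "mge336")
print(len(g.edges), g.degree("mge236"))  # 1 1
```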
INEXACT SUBGRAPH MATCHING • Allows for: • Mismatching attribute values • Missing nodes • Missing links • Also called error-correcting subgraph isomorphism • NP-complete
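An error-correcting match can be scored by charging for each of the three kinds of error listed above. The sketch below shows one such cost. The slides do not give the actual cost function, so several things here are assumptions: Euclidean attribute distance, unit penalties, and the decision not to charge edges that touch a missing node, since the node penalty already covers them. The mapping sends each G1 node to a G2 node, or to None when the node is missing from G2.

```python
import math


def matching_cost(attrs1, adj1, attrs2, adj2, mapping, node_penalty=1.0, edge_penalty=1.0):
    """Error-correcting cost of mapping G1 nodes to G2 nodes (None = node missing in G2)."""
    cost = 0.0
    for u, v in mapping.items():
        if v is None:
            cost += node_penalty                    # missing node
        else:
            cost += math.dist(attrs1[u], attrs2[v])  # mismatching attribute values
    for u in adj1:
        for w in adj1[u]:
            if str(u) < str(w):  # visit each undirected G1 edge once
                mu, mw = mapping.get(u), mapping.get(w)
                # Missing link: both endpoints matched, but their images are not adjacent in G2.
                if mu is not None and mw is not None and mw not in adj2[mu]:
                    cost += edge_penalty
    return cost
```

For example, suppose G1 is a star centred on "a" with leaves "b" and "c". G2 has only the nodes "x" and "y" and no edges. The mapping {"a": "x", "b": "y", "c": None} then costs 1 for the missing node plus 1 for the missing link a-b, giving 2.0 in total.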
SEARCH TECHNIQUES • Cost function • Pruning (Structure Constraints) • Backtracking
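The three ingredients above (cost function, pruning, backtracking) combine naturally into a depth-first branch-and-bound search. This is a minimal sketch of that general technique, not the authors' algorithm. The cost terms are assumptions: Euclidean attribute distance and unit penalties for missing nodes and links. The node order is supplied by the caller. A branch is pruned as soon as its partial cost reaches the best complete cost found so far. The worst case is still exponential, which is consistent with the problem being NP-complete.

```python
import math


def backtracking_match(attrs1, adj1, attrs2, adj2, order, node_penalty=1.0, edge_penalty=1.0):
    """Branch-and-bound over G1 nodes in `order`; returns (best_cost, best_mapping)."""
    best = [math.inf, None]
    mapping, used = {}, set()

    def step_cost(u, v):
        # Cost added by assigning u -> v, given the nodes assigned so far.
        if v is None:
            return node_penalty  # u is treated as missing from G2
        c = math.dist(attrs1[u], attrs2[v])
        for w in adj1[u]:
            mw = mapping.get(w)
            if mw is not None and mw not in adj2[v]:
                c += edge_penalty  # link u-w has no counterpart in G2
        return c

    def search(i, cost):
        if cost >= best[0]:
            return  # prune: this branch cannot beat the best complete mapping
        if i == len(order):
            best[0], best[1] = cost, dict(mapping)
            return
        u = order[i]
        for v in [w for w in attrs2 if w not in used] + [None]:
            c = step_cost(u, v)
            mapping[u] = v
            if v is not None:
                used.add(v)
            search(i + 1, cost + c)
            del mapping[u]  # backtrack
            used.discard(v)

    search(0, 0.0)
    return best[0], best[1]
```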
ATTRIBUTE MATCHING • Amino acid sequence composition • Array of 20: percentage of each amino acid • Amino acids grouped into classes: array of 6 • Amino acid triples grouped by class: array of 216 (6 × 6 × 6) • Example sequence: MKVLNKNEL
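The three composition attributes can be computed as follows. This sketch rests on loudly stated assumptions. The slides do not list the six amino-acid classes, so the grouping in `CLASSES` is hypothetical. Triples are also assumed to be read as overlapping windows. Each vector is reported as percentages, which matches the slide's "percentage of each aa".

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

# Hypothetical 6-class grouping (an assumption; the original classes are not given).
CLASSES = {
    **dict.fromkeys("AVLIMC", 0),  # aliphatic / hydrophobic
    **dict.fromkeys("FWYH", 1),    # aromatic
    **dict.fromkeys("STNQ", 2),    # polar, uncharged
    **dict.fromkeys("KR", 3),      # positively charged
    **dict.fromkeys("DE", 4),      # negatively charged
    **dict.fromkeys("GP", 5),      # conformationally special
}


def aa_composition(seq):
    """Array of 20: percentage of each amino acid (non-standard letters ignored)."""
    residues = [c for c in seq.upper() if c in AMINO_ACIDS]
    n = len(residues)
    return [100.0 * residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def class_composition(seq):
    """Array of 6: percentage of residues in each (hypothetical) class."""
    classes = [CLASSES[c] for c in seq.upper() if c in CLASSES]
    n = len(classes)
    return [100.0 * classes.count(k) / n if n else 0.0 for k in range(6)]


def triple_composition(seq):
    """Array of 216 = 6*6*6: percentage of each class triple (overlapping windows assumed)."""
    classes = [CLASSES[c] for c in seq.upper() if c in CLASSES]
    counts = [0] * 216
    for i in range(len(classes) - 2):
        a, b, c = classes[i:i + 3]
        counts[36 * a + 6 * b + c] += 1  # base-6 index of the triple
    total = sum(counts)
    return [100.0 * x / total if total else 0.0 for x in counts]


print(len(aa_composition("MKVLNKNEL")), len(triple_composition("MKVLNKNEL")))  # 20 216
```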
ATTRIBUTE MATCHING • Difference in amino acid composition values of gene pairs for M. genitalium and M. pneumoniae • Score observations
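Scoring gene pairs by composition difference can be sketched as computing a distance between attribute vectors and ranking all cross-genome pairs. The Euclidean metric and the helper names are assumptions. The slides only state that composition differences of gene pairs are scored.

```python
import math


def composition_difference(x, y):
    """Distance between two composition vectors (Euclidean metric is an assumption)."""
    return math.dist(x, y)


def rank_pairs(comp1, comp2):
    """Score every (gene1, gene2) pair; a smaller difference means more similar composition."""
    scored = [(composition_difference(x, y), g1, g2)
              for g1, x in comp1.items()
              for g2, y in comp2.items()]
    scored.sort()
    return scored
```

Here `comp1` and `comp2` map gene identifiers from each genome to vectors of equal length, such as the 20-element composition arrays. Low-scoring pairs are the natural candidates to match during the graph search.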
STRUCTURAL CONSTRAINTS • Effect of scale-free behavior • Connectivity information: connectivity is highly heterogeneous, so start with the most connected node and work outward from it • Pruning strategy: comparability of nodes is determined by the power law
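The hub-first strategy and degree-based pruning can be sketched as two small helpers. The compatibility rule is a hypothetical reading of "comparability is determined by the power law". It keeps a G2 node as a candidate only if its degree is within a relative tolerance `rel_tol` of the G1 node's degree. Both the rule and the tolerance value are assumptions. Because a scale-free network has few hubs, such a test discards most candidates for hub nodes early in the search.

```python
def hub_first_order(adj):
    """G1 nodes sorted by decreasing degree (ties broken by name), so search starts at hubs."""
    return sorted(adj, key=lambda u: (-len(adj[u]), str(u)))


def degree_compatible(adj1, u, adj2, v, rel_tol=0.5):
    """Hypothetical pruning test: degrees of u and v must be within rel_tol of each other."""
    d1, d2 = len(adj1[u]), len(adj2[v])
    return abs(d1 - d2) <= rel_tol * max(d1, d2, 1)


def candidates(adj1, u, adj2, rel_tol=0.5):
    """G2 nodes that survive the degree-compatibility test for G1 node u."""
    return [v for v in adj2 if degree_compatible(adj1, u, adj2, v, rel_tol)]
```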
STRUCTURAL CONSTRAINTS • Neighborhood connectivity • Choose a neighbor at the next stage • Backtracking • Component by component • Go back to the neighbor with the most connectivity within the component
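One way to read the neighborhood rule is as a visiting order, built one connected component at a time. The order starts at the component's largest hub and always extends to the frontier neighbor with the highest degree. A backtracking matcher would then assign nodes in this order and step back along it when a branch fails. This interpretation and the function name are assumptions, not the authors' exact procedure.

```python
def neighborhood_order(adj):
    """Visit order: component by component, starting at the largest hub, then always
    expanding to the frontier neighbour with the highest degree (ties broken by name)."""
    order, seen = [], set()
    for root in sorted(adj, key=lambda u: (-len(adj[u]), str(u))):
        if root in seen:
            continue  # already covered as part of an earlier component
        seen.add(root)
        order.append(root)
        frontier = {w for w in adj[root] if w not in seen}
        while frontier:
            nxt = min(frontier, key=lambda u: (-len(adj[u]), str(u)))
            frontier.discard(nxt)
            seen.add(nxt)
            order.append(nxt)
            frontier |= {w for w in adj[nxt] if w not in seen}
    return order
```

On a graph with a hub "a" linked to "b", "c", and "d", where "b" also links to "e", plus a separate pair "x"-"y", the order is a, b, c, d, e, x, y.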
TEST CASE • Mycoplasma genitalium: • Smallest known genome (470 ORFs) • Mycoplasma pneumoniae: • Very similar; a superset of M. genitalium (688 ORFs)
TEST CASE (continued) • Mycoplasma genitalium: • 232 nodes • 211 links • Mycoplasma pneumoniae: • 267 nodes • 257 links • Inputs: • MGE links • MPN links • MGE synonyms • MPN synonyms • MGE amino acid sequences • MPN amino acid sequences
RESULTS • Figure: MGE and MPN networks
DISCOVERY OF MISSING DATA • Missing link • The link between MPN632 and MPN637 is missing in our data but exists in the literature
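Once a node mapping is available, candidate missing links can be flagged by comparing edges across the two networks. The helper below is a hypothetical sketch and not the authors' code. An edge between two matched nodes in one network, whose counterparts are not linked in the other, is reported as a candidate interaction missing from the other data set. Any such candidate would still need checking against the literature.

```python
def candidate_missing_links(adj1, adj2, mapping):
    """Return (missing_in_g1, missing_in_g2): unordered node pairs that are linked in the
    other network but not in this one, considering only nodes that were matched."""
    inverse = {v: u for u, v in mapping.items() if v is not None}
    missing_in_g1, missing_in_g2 = set(), set()
    for u, v in mapping.items():
        if v is None:
            continue  # unmatched G1 node: no counterpart to compare against
        for w in adj1[u]:
            mw = mapping.get(w)
            if mw is not None and mw not in adj2[v]:
                missing_in_g2.add(frozenset((v, mw)))  # G1 link with no G2 counterpart
        for y in adj2[v]:
            x = inverse.get(y)
            if x is not None and x not in adj1[u]:
                missing_in_g1.add(frozenset((u, x)))   # G2 link with no G1 counterpart
    return missing_in_g1, missing_in_g2
```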
DISCOVERY OF MISSING DATA • Missing node with known COG • MPN236 --- MPN237 --- MPN238 --- MPN678 • MG098 --- MG099 --- MG100 --- MG459 • MG459 is an ortholog of MPN678
DISCOVERY OF MISSING DATA • Missing node without known ortholog
CONCLUSION • Large-scale genomics • Interaction data captures system structure and dynamics • Graph matching exploits the scale-free characteristics • Novel interactions and genes can be identified
ACKNOWLEDGEMENT • YASEMİN TÜRKELİ