Finding Patterns In Three-Dimensional Graphs

Finding Patterns In Three-Dimensional Graphs Xiong Wang, Member, IEEE, Jason T.L. Wang, Member, IEEE

Outlne • Introduction & illustration • PATTERN-FINDING ALGORITHM- Phase 1 of the Algorithm- Phase 2 of the Algorithm • PERFORMANCE EVALUATION • DATA MINING APPLICATIONS

Introduction • Molecules and chemical compounds are often represented by 3D graphs. • Detecting repeatedly occurring structures in molecules can help biologists to understand functions of the molecules.

3D Graphs • Each node of the graphs we are concerned with is an undecomposable or atomic unit and has a 3D coordinate. • Each node has a label, which is not necessarily unique in a graph. • A node can be identified by a unique, user-assigned number in the graph. • Edges in the graph are links between the atomic units. In the paper, we will consider 3D graphs that are connected.

rigid substructures in a graph (1) • The rigid substructure as a whole can be rotate (we refer to this as a whole-structure rotation or simply a rotation when the context is clear). • For example, in chemical compounds, a ring is often a rigid substructure.

To illustrate rigid substructures in a graph (2)

patterns in 3D Graphs • We consider a pattern to be a rigid substructure that may occur in a graph after allowing for an arbitrary number of rotations and translations as well as a small number (specified by the user) of edit operations in the pattern or in the graph. • We say graph G matches graph G’with n mutations if by applying an arbitrary number of rotations and translations as well as n node insert, delete or relabeling operations, G and G’ have the same size, the nodes in G geometrically match those in G’ , i.e., they have coinciding 3D coordinates, and for each pair of geometrically matching nodes, they have the same label.

To illustrate patterns in graphs

PATTERN-FINDING ALGORITHM • Our algorithm proceeds in two phases to search for the patterns: 1) find candidate patterns from the graphs in S; and 2) calculate the occurrence numbers of the candidate patterns to determine which of them satisfy the user-specified requirements. We describe each phase in turn below.

Phase 1 of the Algorithm(1) • First, we decompose the graph into substructures having no rotatable components and consider the substructures separately.

Phase 1 of the Algorithm(2) • Illustration Stack Step.

Phase 1 of the Algorithm(3) • Second, our algorithm hashes node-triplets within rigid substructures into a 3D table. When a graph as a whole is too large, as in the case of proteins, considering all combinations of three nodes in the graph may become prohibitive. • For example, consider a graph of 20 nodes. There are . On the other hand, if we decompose the graph into five substructures, each having four nodes, then there are only .

Phase 2 of the Algorithm(1) • In subphase A of phase 2, we hash and store the candidate patterns generated from the graphs in phase 1 in a 3D table. • In subphase B, we rehash each candidate pattern into 3D table and calculate its occurrence number.

Phase 2 of the Algorithm(2) • In processing a rigid substructure (pattern) of a 3D graph, we choose all three-node combinations, referred to as node-triplets, in the substructure and hash the node-triplets. • The first node chosen always opposes the longest edge of the triangle and the third node chosen opposes the shortest edge.

Phase 2 of the Algorithm(3) • if the triangle is not isosceles or equilateral then we choose (i, j, k) in that order. • when the triangle is isosceles, we hash and store the two node-triplets (i, j, k)and (i, k, j), assuming that node i opposes the longest edge and edges {i; j}, {i; k} have the same length. • When the triangle is equilateral, we hash and store the six node-triplets (i, j, k), (i, k, j), (j, i, k), (j, k, i), (k, i, j), and (k, j, i).

Phase 2 of the Algorithm(4) • The labels of the nodes in a triplet form a label-triplet, which is encoded as follows: • Suppose the three nodes chosen are v1, v2, v3, in that order. The code for the labels is an unsigned long integer, defined as : where Prime > |Σ| is a prime number, L1, L2, and L3 are the indices for the node labels of v1, v2, and v3, respectively, in the array A. • Example :Suppose Prime is 1009 , if the three nodes label are c, a, and b , then for the three nodes label are 2, 0, and 1.the code for the corresponding label-triplet is ..2 1; 009 . 0. 1; 009. . 1 . 2; 036; 163:

Subphase A of Phase 2 (1) • Suppose the chosen nodes are numbered(i, j, k), and have global coordinates Pi.xi; yi; zi., Pj.xj; yj; zj., and Pk.xk; yk; zk., respectively. Let l1, l2, l3 be three integers where Here, Scale . 10p is a multiplier.

Subphase A of Phase 2 (2) • The node-triplet .i; j; k. is hashed to the 3D bin with the address h[d1,d2,d3].Prime1, Prime2, and Prime3 are three prime numbers and Nrow is the cardinality of the hash table in each dimension. • The hash bin entry for the three chosen nodes i, j, k is

Subphase B of Phase 2 (1) • In subphase B, we calculate the occurrence number of each candidate pattern P by rehashing the node-triplets of P into H. • We are able to match a node-triplet tri of P with a node-triplet tri0 of another substructure (candidate pattern) P0 stored in subphase A where tri and tri0 have the same hash bin address.

Subphase B of Phase 2 (2) • they have coinciding 3D coordinates after rotations and translations, we call the node-triplet match a true match; otherwise it is a false match. • If the SFPis the same as an existing one with counter value Cnt, and the code of the label-triplet of nodes i, j, k; equals the code of the labeltriplet of nodes u, v, w, then Cnt is incremented by one.

Subphase B of Phase 2 (3) • Let P be a pattern where jPj Mut . 3. After rehashing the node-triplets of P, suppose there is an SFP associated with Str whose counter value Cnt > θP , whereThen, P matches Str within Mut mutations

PERFORMANCE EVALUATION • To evaluate the performance of the patternfinding algorithm, we applied the algorithm to two sets of data: 1,000 synthetic graphs and 226 chemical compounds obtained from a drug database maintained in the National Cancer Institute. • When generating the artificial graphs, we randomly generated the 3D coordinates for each node. The node labels were drawn randomly from the range A to E. The size of the rigid substructures in an artificial graph ranged from 4 to 10 and the size of the graphs ranged from 10 to 50. The size of the compounds ranged from 5 to 51.

Effect of Data-Related Parameters (1) where PFound is the number of patterns found by the proposed algorithm, UPFound is the number of patterns found that satisfy the user-specified parameter values, and TotalP is the number of qualified patterns found by exhaustive search.

Effect of Data-Related Parameters (2)

Effect of Algorithm-Related Parameters (1)

DATA MINING APPLICATIONS Classifying Proteins • Two families of proteins chosen from PDB pertaining to RNA-directed DNA Polymerase and Thymidylate Synthase. Each family contains proteins having the same functionality in various organisms. • The binary classification problem studied here is concerned with assigning a protein to one of two families. This problem arises frequently in protein homology detection [31]. • The performance of our technique degrades when applied to the n-ary classification problem, which is concerned with assigning a protein to one of many families

DATA MINING APPLICATIONS Clustering Compounds • Using the proposed pattern-finding algorithm, we examine each graph Gqin S and determine whether each substructure Strpapproximately occurs in Gqwithin Mut mutations. • Each graph Gqis represented as a bit string of length N, i.e., • The distance between two graphs Gxand Gy, denoted d(Gx,Gy), is defined as the Hamming distance [12] between their bit strings.

Conclusion • we have presented an algorithm for finding patterns in 3D graphs • A pattern here is a rigid substructure that may occur in a graph after allowing for an arbitrary number of rotations and translations as well as a small number of edit operations in the pattern or in the graph. • We used the algorithm to find approximately common patterns in a set of synthetic graphs, chemical compounds and proteins.

Finding Patterns In Three-Dimensional Graphs

Finding Patterns In Three-Dimensional Graphs

Presentation Transcript

Three-Dimensional Graphics

Three-Dimensional Geometry

Three Dimensional Plotting

Three-Dimensional Figures

Three-Dimensional Figures

Three-Dimensional Symmetry

Three-Dimensional Figures

Three-Dimensional Activities

THREE DIMENSIONAL MEDIA

PAnG – Finding Patterns in Annotation Graphs

Three-Dimensional Figures

Three-Dimensional Art

Three Dimensional Graphing

Three-Dimensional Figures

Three Dimensional Geometry

Three Dimensional (3D)

Three Dimensional SBM in MM5

Finding Cross Genome Patterns in Annotation Graphs

Three Dimensional Shapes

Three-dimensional Matrices

three dimensional quality

Three-Dimensional Modeling