A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables

Kazuhide Aikou 1, Yusuke Suzuki 1,2, Takayoshi Shoudai 1, Tomoyuki Uchida 2, Tetsuhiro Miyahara 2. Department of Informatics, Kyushu University, Japan

Presentation Transcript


  1. A Polynomial Time Matching Algorithm of Ordered Tree Patterns having Height-Constrained Variables. Kazuhide Aikou (1), Yusuke Suzuki (1,2), Takayoshi Shoudai (1), Tomoyuki Uchida (2), Tetsuhiro Miyahara (2) • (1) Department of Informatics, Kyushu University, Japan • (2) Faculty of Information Sciences, Hiroshima City University, Japan

  2. Contents • Backgrounds and Motivations • Preliminaries • Ordered Term Trees • Height-Constrained Variables • A Matching Algorithm of Ordered Term Trees having Height-Constrained Variables • Conclusions and Future Works

  3. Backgrounds • Increase of tree-structured data (Web documents, HTML/XML, etc.). • Our works: COLT for term trees; Web mining systems using learning algorithms for term trees. • Example of tree-structured data: <Salesperiod> <Quarter>Winter1998</Quarter> <Design> <Designnumber>C365</Designnumber> <Description>North Star Polo</Description> <Unitssold>35500</Unitssold> </Design> </Salesperiod> [Figure: the same data drawn as a tree with root <Salesperiod>, inner vertices <Quarter>, <Design>, <Designnumber>, <Description>, <Unitssold> and leaves Winter1998, C365, North Star Polo, 35500.] • Application: knowledge discovery from Web documents, that is, discovery of tree-structured patterns common to tree-structured data. [Figure, labeled "Ordered Term Trees": a tree pattern over <HTML>, <Head>, <Body>, <Title>, <Table> and Text_university.]

  4. Preliminaries • Ordered trees express semi-structured data (HTML, XML, etc.). • HTML data: <HTML> <HEAD>text1</HEAD> <BODY> <DIV>text2</DIV> <FONT>text3</FONT> <FONT>text4</FONT> </BODY> </HTML> • Object Exchange Model: ordered trees with edge labels. [Figure: the HTML data drawn as an ordered tree with root <HTML>, children <HEAD> and <BODY>, and leaves text1 to text4; the edges are numbered 1, 2, 3 among siblings, and a legend distinguishes TAG and TEXT.]
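The step from an HTML file to an ordered tree can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: labels are placed on vertices rather than on edges, the input is well nested, and the names `Node`, `TreeBuilder` and `parse_html` are hypothetical. It uses the standard library `html.parser` module.

```python
from html.parser import HTMLParser


class Node:
    """A vertex of an ordered tree; the order of `children` is significant."""

    def __init__(self, label):
        self.label = label
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds an ordered tree from well-nested HTML (assumption: no void tags)."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # path from the root to the currently open element

    def handle_starttag(self, tag, attrs):
        node = Node("<" + tag.upper() + ">")
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def handle_endtag(self, tag):
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # A text fragment becomes a leaf below the open element.
            self.stack[-1].children.append(Node(text))


def parse_html(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


doc = ("<HTML> <HEAD>text1</HEAD> <BODY> <DIV>text2</DIV> "
       "<FONT>text3</FONT> <FONT>text4</FONT> </BODY> </HTML>")
tree = parse_html(doc)
```

For the HTML data of this slide, `tree` has the root `<HTML>` with the ordered children `<HEAD>` and `<BODY>`, and `<BODY>` has the ordered children `<DIV>`, `<FONT>`, `<FONT>`.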

  5. Ordered Term Trees with Multi-Child Port Variables (Ordered Tree Patterns with Internal Structured Variables) • An ordered term tree t=(V,E,H), where V is a vertex set, E is an edge set, H is a variable set, and x, y, ... are variable labels. • Single-child port variables: variables with exactly one child port. • Multi-child port variables: variables with at least one child port. • A variable can be substituted with an arbitrary ordered tree. [Figure: a term tree on vertices u1 to u8 with a variable h1 labeled x and a variable h2 labeled y; the parent port and the child port of h1, and the parent port and the child ports of h2, are marked.]

  6. Substitutions: replacements of the variables with T1 and T2 • Identify the root of T1 with the parent port. • Identify the two leaves with the two child ports. • Identify the root of T2 with the parent port. • Choose one of the leaves of T2 and identify it with the child port. [Figure: an ordered term tree t with variables x and y on vertices u1 to u7, an ordered tree T1, an ordered tree T2, and the new ordered tree T obtained by the replacements.]
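The replacement of a single-child port variable can be sketched as follows. This is a minimal illustration, not the authors' formulation: trees are unlabeled nested lists (a tree is the list of its children), a variable is a hypothetical `Var` object sitting in its parent port's child list, and a substitution maps each variable label to a pair (tree, path to the chosen leaf). Multi-child port variables are not covered.

```python
class Var:
    """A single-child port variable; `child` is the subtree at the child port."""

    def __init__(self, name, child):
        self.name = name
        self.child = child


def graft(tree, leaf_path, below):
    """Return the children of `tree`'s root after the leaf reached by
    `leaf_path` has been identified with a vertex whose children are `below`."""
    if not leaf_path:
        if tree:
            raise ValueError("the chosen vertex is not a leaf")
        return below
    k = leaf_path[0]
    if not 0 <= k < len(tree):
        raise IndexError("leaf_path leaves the tree")
    return [graft(c, leaf_path[1:], below) if idx == k else c
            for idx, c in enumerate(tree)]


def substitute(term, theta):
    """Replace every variable in `term` according to `theta`."""
    result = []
    for entry in term:
        if isinstance(entry, Var):
            tree, leaf_path = theta[entry.name]
            if not leaf_path:
                raise ValueError("the root and the chosen leaf must differ")
            below = substitute(entry.child, theta)
            # The root of `tree` is identified with the parent port, so its
            # children are spliced into the parent's child list at this position.
            result.extend(graft(tree, leaf_path, below))
        else:
            result.append(substitute(entry, theta))
    return result


# Term tree: a root with a leaf child, then a variable x whose child port has two leaves.
term = [[], Var("x", [[], []])]
# x is replaced by a tree with a leaf and a two-vertex branch; the leaf of the branch is chosen.
theta = {"x": ([[], [[]]], [1, 0])}
new_tree = substitute(term, theta)
```

Here `new_tree` is `[[], [], [[[], []]]]`: the children of the replacing tree's root appear in place of the variable, and the two leaves below the child port now hang below the chosen leaf.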

  7. Linear Ordered Term Trees: all variables have mutually distinct variable labels, so all variable replacements are decided independently. [Figure: a linear ordered term tree with variables x and y, a substitution, and an ordered tree that the term tree matches.]

  8. Matching Problem for Linear Ordered Term Trees with Multi-Child Port Variables • INPUT: an ordered tree T and a linear ordered term tree t with multi-child port variables. • PROBLEM: Does t match T? • This matching problem can be solved in O(nN) time, where n is the number of vertices in t and N is the number of vertices in T [Suzuki et al., ILP 02].

  9. Observation: most ordered trees obtained from HTML files have low height. • An HTML file: <HTML> <HEAD>text1</HEAD> <BODY> <DIV>text2</DIV> <FONT>text3</FONT> <FONT>text4</FONT> </BODY> </HTML> [Figure: the corresponding ordered tree with its height marked.]

  10. Relationship between the size of the tree representing an HTML file and its height • A tree of large height is rare, so a long branch becomes a distinguishing feature when it occurs. [Plot: Height (axis from 0 to 40) against Size (axis from 0 to 2000), where Size = the number of vertices in a tree.]
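The two quantities plotted here, size and height, can be computed directly from a tree. The sketch below assumes a tree is given as a pair (label, list of children) and that a single vertex has height 0, as in the initialization step later in the talk; the function names are illustrative.

```python
def size(tree):
    """Number of vertices in the tree."""
    _label, children = tree
    return 1 + sum(size(c) for c in children)


def height(tree):
    """Number of edges on a longest path from the root to a leaf."""
    _label, children = tree
    return 0 if not children else 1 + max(height(c) for c in children)


# The HTML example used earlier in the talk, with labels on vertices.
html_tree = ("<HTML>", [
    ("<HEAD>", [("text1", [])]),
    ("<BODY>", [
        ("<DIV>", [("text2", [])]),
        ("<FONT>", [("text3", [])]),
        ("<FONT>", [("text4", [])]),
    ]),
])
```

For this small document `size(html_tree)` is 10 and `height(html_tree)` is 3.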

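  11. Height-constrained single-child port variables • A variable carries a pair (i, j) with 0 < i ≦ j: for the tree substituted for the variable, the trunk length ≧ i and the height ≦ j. • Trunk length: the path length between the root and the leaf which are identified with the ports. [Figure: two variables labeled (i, j) and (i', j').]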
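A small check of a height constraint might look as follows. It rests on an assumed reading of the pair (i, j), namely that the tree replacing the variable must have trunk length at least i and height at most j; trees are unlabeled nested lists and the chosen leaf is given as a path of child indices. All names are illustrative.

```python
def height(tree):
    """Height of a nested-list tree; a single vertex has height 0."""
    return 0 if not tree else 1 + max(height(c) for c in tree)


def is_leaf_path(tree, path):
    """True if following `path` (child indices) from the root ends at a leaf."""
    node = tree
    for k in path:
        if not 0 <= k < len(node):
            return False
        node = node[k]
    return node == []


def satisfies(tree, leaf_path, i, j):
    """Assumed reading of an (i, j) constraint:
    trunk length >= i and height of the whole replacing tree <= j."""
    if not is_leaf_path(tree, leaf_path):
        return False
    trunk_length = len(leaf_path)  # edges between the root and the chosen leaf
    return trunk_length >= i and height(tree) <= j


# A root with a leaf child and a branch of length 3; the branch's leaf is chosen.
candidate = [[], [[[]]]]
path = [1, 0, 0]
```

Under this reading, `satisfies(candidate, path, 2, 4)` holds, while `satisfies(candidate, path, 2, 2)` fails because the height 3 exceeds 2, and `satisfies(candidate, path, 4, 6)` fails because the trunk length 3 is below 4.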

  12. Example. [Figure: an ordered term tree t with variables constrained by (2,4) and (2,2), and an ordered tree T, with labels 1, 2, 3 and the marks O.K. and N.G.]

  13. MATCHING PROBLEM for Linear Ordered Term Trees with Height-Constrained Single-Child Port Variables • INPUT: a linear ordered term tree t and an ordered tree T. • PROBLEM: Does t match T? [Figure: a term tree t with variables constrained by (4,6) and (1,2), and an ordered tree T.]

  14. Main Theorem • MATCHING PROBLEM for Linear Ordered Term Trees with Height-Constrained Single-Child Port Variables can be solved in O(N max{nDmax, S}) time, where n is the number of vertices of t, N is the number of vertices of T, S is the sum of the lowest trunk lengths of all variables of t, and Dmax is the maximum number of children of a vertex of T.

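  15. Sub Term Tree and Subtree • t[u']: the sub term tree of a linear ordered term tree t at a vertex u'. • T[u]: the subtree of an ordered tree T at a vertex u. • T[u]-T[v]: u and all descendants of u which are not proper descendants of v. [Figure: a term tree t with variables constrained by (1,1), (4,6) and (1,2), and an ordered tree T with vertices u and v.]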
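The tree T[u]-T[v] and its height can be sketched as below, assuming the tree is stored as a dictionary from a vertex name to the ordered list of its children. The vertex names other than u and v are made up for the example.

```python
def subtree_minus(children, u, v):
    """Vertices of T[u]-T[v]: u and all descendants of u that are not
    proper descendants of v (v itself is kept)."""
    kept = []
    stack = [u]
    while stack:
        w = stack.pop()
        kept.append(w)
        if w != v:  # do not descend below v
            stack.extend(children.get(w, []))
    return kept


def height_minus(children, u, v):
    """Height of T[u]-T[v]; v is treated as a leaf."""
    if u == v:
        return 0
    kids = children.get(u, [])
    return 0 if not kids else 1 + max(height_minus(children, c, v) for c in kids)


# Hypothetical tree: v lies below u, and v has a deep subtree of its own.
T = {"u": ["a", "b"], "a": ["c"], "b": ["v"], "v": ["d", "e"], "d": ["f"]}
```

With this `T`, `subtree_minus(T, "u", "v")` keeps the vertices u, a, b, c, v, and `height_minus(T, "u", "v")` is 2, although the full subtree below u has height 4.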

  16. Idea: Corresponding Sets CS(u) • t=(Vt,Et,Ht): a term tree; T=(VT,ET): a tree. • CS(u) ⊆ Vt×N×N: a corresponding set of a vertex u ∈ VT. • (v',i,j)∈CS(u) shows that there is a descendant v of u such that (1) t[v'] matches T[v], (2) the length between u and v is i (if i < i'-1), and (3) the height of T[u]-T[v] is j. [Figure: a term tree t in which v' lies below a variable labeled (i',j'), and a tree T in which v is a descendant of u at distance i, T[u]-T[v] has height j, and t[v'] matches T[v].]

  17. Therefore, (v',0,0)∈CS(u) if and only if t[v'] matches T[u]. In particular, (the root of t,0,0)∈CS(the root of T) if and only if t matches T. [Figure: (v',0,0)∈CS(u), with t[v'] below a variable labeled (i',j') in t matching T[u] in T.]

  18. Algorithm Matching(t,T)
      (1) Initialization;
      while there is an unmarked vertex u of T do
      begin
        Mark u;
        (2) VID-Inheriting(u);
        (3) C-Set-Attaching(u)
      end

  20. Initialization: Vertex Identifiers • The vertices of a linear ordered term tree t receive vertex identifiers in breadth-first search order. • The children of an internal vertex have consecutive vertex identifiers. This saves computation time of the main processes. [Figure: a term tree t with vertex identifiers 1 to 9 and variables constrained by (1,2) and (2,2).]
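Breadth-first numbering gives the children of each vertex consecutive identifiers because they enter the queue together. A minimal sketch, assuming the tree is a dictionary from a vertex name to its ordered child list and that identifiers start at 1 (the vertex names are made up):

```python
from collections import deque


def assign_ids(children, root):
    """Number the vertices 1, 2, 3, ... in breadth-first search order."""
    ids = {}
    queue = deque([root])
    next_id = 1
    while queue:
        v = queue.popleft()
        ids[v] = next_id
        next_id += 1
        # Children are enqueued together, so they are numbered consecutively.
        queue.extend(children.get(v, []))
    return ids


t_children = {"r": ["a", "b"], "a": ["c", "d", "e"], "b": ["f"]}
ids = assign_ids(t_children, "r")
```

Here the children c, d, e of a receive the identifiers 4, 5, 6, so a range of children can be described by its first and last identifier.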

  21. Initialization: compute the corresponding set of each vertex from the leaves to the root. • For all leaves u of T: Mark u; CS(u):={(u',0,0) | u' is a leaf of t}; height(u):=0. • Example: [Figure: a term tree t with vertices 1 to 9 and variables constrained by (3,6) and (1,2), and a tree T with vertices A to Q.] For each of the leaves D, F, H, J, K, L, M and Q of T, the corresponding set is {(4,0,0), (6,0,0), (7,0,0), (8,0,0), (9,0,0)} and the height is 0.
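The initialization step can be sketched as follows, assuming T is a dictionary from a vertex to its child list and the leaves of t are given by their vertex identifiers. The function name and the tiny tree used for illustration are not from the slides; the leaf identifiers 4, 6, 7, 8, 9 are those of the example.

```python
def initialize(T_children, T_vertices, t_leaf_ids):
    """Mark every leaf u of T, set CS(u) = {(u', 0, 0) | u' is a leaf of t}
    and height(u) = 0."""
    cs, height, marked = {}, {}, set()
    for u in T_vertices:
        if not T_children.get(u):  # u has no children, so it is a leaf
            marked.add(u)
            cs[u] = {(leaf, 0, 0) for leaf in t_leaf_ids}
            height[u] = 0
    return cs, height, marked


# A fragment of a tree: P has the single child Q, and J is another leaf.
T_children = {"I": ["N", "J"], "N": ["O"], "O": ["P"], "P": ["Q"]}
T_vertices = ["I", "N", "J", "O", "P", "Q"]
cs, height, marked = initialize(T_children, T_vertices, [4, 6, 7, 8, 9])
```

After the call, only the leaves Q and J are marked, and both carry the corresponding set {(4,0,0), (6,0,0), (7,0,0), (8,0,0), (9,0,0)}.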

  23. VID-Inheriting (1/3) • Let v' be the child port of an (i,j)-height constrained variable. For an internal vertex u of a tree, if there is an element (v',i',j') in the CS of a child of u, add (v', min{i'+1,i-1}, *) to CS(u). (The value * is explained on the next slide.) • If i'=i-1 then the parent of u can match the parent port u'. • Example [Figure: a variable constrained by (3,6) with parent port 3 and child port 7; in T, the path I, N, O, P, Q, where J is another child of I]: (7,0,0)∈CS(Q) and (7,0,0)∈CS(J). Add (7,1,1) to CS(P), add (7,2,2) to CS(O), add (7,2,3) to CS(N), add (7,2,4) to CS(I). N can correspond to vertex 3.

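  24. VID-Inheriting (2/3): Case: at least two children have (v',i',*) for a vertex v' and an integer i'. • Example [Figure: a variable constrained by (4,6) with parent port 3 and child port 7; in T, a vertex a with children b and c]: (7,1,1)∈CS(b), (7,1,3)∈CS(c), height(b)=4, height(c)=3. The candidates for CS(a) are (7,2,4) and (7,2,5). • Choose the smallest height: (7,2,4)∈CS(a).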

  25. VID-Inheriting (3/3): Case: a child has (v',i',*) and another child has (v',i'',*) for distinct integers i' and i''. • Add all triplets to CS(u) (at most i triplets). • Example [Figure: a variable constrained by (4,6) with parent port 3 and child port 7; in T, a vertex a with children b and c]: (7,1,3)∈CS(b), (7,2,2)∈CS(c), height(b)=4, height(c)=3, so (7,2,4), (7,3,5)∈CS(a). • CS(a) contains at most S triplets. • Then the total time complexity of Inheriting of a vertex a is O(S m_a), where m_a is the number of the children of a.
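The three inheriting cases can be put together in one short sketch for a single child port v'. The length rule min{i'+1, i-1} is the one stated above. The rule used for the third component is an assumption inferred from the examples: the new height is the larger of j'+1 and one plus the largest height among the other children, and among triplets with the same length the smallest height is kept. The function name and argument layout are illustrative, and the code is not tuned to the stated O(S m_a) bound.

```python
def inherit(child_sets, child_heights, v_prime, i):
    """Triplets for child port `v_prime` that a vertex inherits from its children.

    child_sets[k] is CS of the k-th child, child_heights[k] is its height,
    and (i, j) is the constraint of the variable whose child port is v_prime.
    """
    best = {}  # inherited length -> smallest height found so far
    for k, cs in enumerate(child_sets):
        others = [h for idx, h in enumerate(child_heights) if idx != k]
        sibling_part = 1 + max(others) if others else 0
        for (w, i1, j1) in cs:
            if w != v_prime:
                continue
            new_i = min(i1 + 1, i - 1)
            new_j = max(j1 + 1, sibling_part)  # assumed rule, see lead-in
            if new_i not in best or new_j < best[new_i]:
                best[new_i] = new_j
    return {(v_prime, a, b) for a, b in best.items()}


# Slide 24: children b and c of a, variable constrained by (4,6), child port 7.
case2 = inherit([{(7, 1, 1)}, {(7, 1, 3)}], [4, 3], 7, 4)
# Slide 25: the children carry different lengths.
case3 = inherit([{(7, 1, 3)}, {(7, 2, 2)}], [4, 3], 7, 4)
# Slide 23: P has the single child Q, variable constrained by (3,6).
chain = inherit([{(7, 0, 0)}], [0], 7, 3)
```

Under these assumptions the sketch reproduces the values on the slides: `case2` is {(7,2,4)}, `case3` is {(7,2,4), (7,3,5)}, and `chain` is {(7,1,1)}.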

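  27. C-Set-Attaching (Small Examples) • Example 1 [Figure: in t, vertex 2 above vertices 4, 5, 6; in T, vertex B above D, F, H]: (4,0,0)∈CS(D), (5,0,0)∈CS(F), (6,0,0)∈CS(H). (2,0,0) should be added to CS(B). • Example 2 [Figure: in t, vertex 2 above vertices 4, 5, 6 with a variable constrained by (1,2); in T, vertex B above D, E, F, G, H]: (4,0,0)∈CS(D), (5,0,0)∈CS(G), (6,0,0)∈CS(H), height(E)=1, height(F)=2. (5,0,0)∈CS(G) covers [E,G]. (2,0,0) is added to CS(B).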

  28. C-Set-Attaching (Small Examples, continued) • Example 3 [Figure: in t, vertex 2 above vertices 4, 5, 6 with a variable constrained by (1,2); in T, vertex B above D, E, F, G, H]: (4,0,0)∈CS(D), (5,1,1)∈CS(F), (6,0,0)∈CS(H), height(E)=1, height(G)=2. (5,1,1)∈CS(F) covers [E,G]. (2,0,0) is added to CS(B). • Example 4 [same figure]: (4,0,0)∈CS(D), (5,1,1)∈CS(F), (6,0,0)∈CS(H), height(E)=3, height(G)=2. (5,1,1)∈CS(F) covers [F,G] but cannot cover E. (2,0,0) may not be added to CS(B).

  29. C-Set-Attaching (A Big Example) • An ordered term tree t [Figure: vertex 11 and vertices 1 to 10, with variables constrained by (4,8), (3,4), (4,7) and (5,5).]

  30. An ordered tree T [Figure: vertex O and vertices A to N; each vertex is annotated with its corresponding set and its height.] • Heights: height(A)=9, height(B)=5, height(C)=4, height(D)=5, height(E)=3, height(F)=2, height(G)=5, height(H)=6, height(I)=5, height(J)=7, height(K)=1, height(L)=9, height(M)=4, height(N)=4. • Corresponding sets, in the order the annotations appear in this transcript (labels and triplets are interleaved): CS(F) = CS(B) = (1,0,0), (4,0,0) (7,2,3) (2,0,0), (4,0,0); CS(A) = CS(N) = (1,0,0), CS(J) = CS(H) = (6,0,0), (10,3,4); CS(G) = (7,2,3), (10,3,3) (5,0,0), (6,0,0), (8,4,4), (9,0,0); CS(D) = (2,0,0), (4,0,0), (5,0,0), (8,4,4); CS(C) = CS(I) = CS(E) = (5,0,0) (3,3,4),(6,0,0) (3,3,5), (6,0,0) (3,3,3); CS(M) = (5,0,0), (9,0,0); CS(L) = (4,0,0), (8,4,4); CS(K) = φ.

  31. First, we prepare a virtual table for a new graph. Rows and columns represent vertices of T and t, respectively.

  32. [Figure: the vertices D, E, F, G, H, I below O in the ordered tree, annotated with their corresponding sets and heights as on slide 30, and part of the ordered term tree: vertex 11, the variable constrained by (3,4) and vertex 7.] • CS(F) = {(1,0,0), (4,0,0), (7,2,3)}, height(F)=2, height(E)=3. • (7,2,3)∈CS(F) covers [E,F]. • Add a vertex labeled with [E,F] to F7 in the table.

  33. [Figure: the vertices D, E, F, G, H, I below O in the ordered tree, annotated as on slide 30 (height(D)=5, height(E)=3, height(F)=2, height(G)=5, height(H)=6, height(I)=5), and part of the ordered term tree: vertex 11, the variables constrained by (3,4) and (5,5), and vertices 7 and 8. The table already contains [E,F].] • (8,4,4)∈CS(G) covers [E,G]. • Add a vertex labeled with [E,G] to G8 in the table.

  34. [Figure: the same part of the ordered tree and the ordered term tree as on the previous slide. The table already contains [E,F] and [E,G].] • (8,4,4)∈CS(H) covers [H,H]. • Add a vertex labeled with [H,H] to H8 in the table. • Add a directed edge from [E,F] at F7 to [E,G] at G8, because two consecutive variables cover all vertices from E to G.

  35. [Figure: the completed table with interval vertices such as [B,K], [E,F], [E,G], [H,H], [J,K], [K,N] and [M,N], together with the vertices vstart and vgoal.] • If there is a directed path from vstart to vgoal, (11,0,0) is added to CS(O). • The total time complexity of C-Set-Attaching of a vertex u of T and a vertex u' of t is O(m_u^2 m'_u'), where m_u and m'_u' are the numbers of the children of u and u', respectively.
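The final test, whether vgoal can be reached from vstart, is an ordinary reachability query. The sketch below uses breadth-first search over a dictionary of directed edges. The graph shown is a made-up fragment whose node names only imitate the table entries; it is not the graph of the example.

```python
from collections import deque


def has_path(edges, start, goal):
    """True if `goal` is reachable from `start` along directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if v == goal:
            return True
        for w in edges.get(v, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return False


# Hypothetical fragment: table vertices are written as "interval@cell".
edges = {
    "vstart": ["[E,F]@F7"],
    "[E,F]@F7": ["[E,G]@G8"],
    "[E,G]@G8": ["vgoal"],
    "[H,H]@H8": [],
}
reachable = has_path(edges, "vstart", "vgoal")
```

In this fragment `reachable` is true, which in the algorithm would correspond to adding the triplet for the parent vertex to the corresponding set.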

  36. Total Time Complexity • VID-Inheriting(u): O(S m_u). • C-Set-Attaching(u): O(m_u^2 m'_u'), where m_u is the number of children of a vertex u of T and m'_u' is the number of children of a vertex u' of t. • Total: O(N max{nDmax, S}), where n is the number of vertices of t, N is the number of vertices of T, S is the sum of the lowest trunk lengths of all variables of t, and Dmax is the maximum number of children of a vertex of T.

  37. Conclusions • An O(N max{nDmax, S}) time matching algorithm for ordered term trees with height-constrained variables. • Our related works: polynomial-time learning algorithms for ordered term trees with height-constrained variables [Suzuki et al., PRICAI'04], [Matsumoto and Shoudai, ALT'04]. • Future works: an efficient matching algorithm for ordered term trees with height-constrained multi-child port variables; polynomial-time learning algorithms for ordered term trees with height-constrained multi-child port variables.

  38. Thank you for your attention.
