370 likes | 477 Views
E123 Presentation Title. Joe Celko Data Junction PDS 71062.1056@compuserve.com. Joe Celko - Articles. Member of ANSI X3H2 since 1987 SQL for Smarties - DBMS Magazine Celko on SQL - DBP&D SQL Puzzle - Boxes & Arrows DBMS/Report - Systems Integration WATCOM SQL Column - PBDJ
E N D
E123Presentation Title • Joe Celko • Data Junction • PDS • 71062.1056@compuserve.com
Joe Celko - Articles • Member of ANSI X3H2 since 1987 • SQL for Smarties - DBMS Magazine • Celko on SQL - DBP&D • SQL Puzzle - Boxes & Arrows • DBMS/Report - Systems Integration • WATCOM SQL Column - PBDJ • Celko on Software - COMPUTING(UK) • Celko - Intelligent Enterprise Magazine • SELECT FROM Austin - DB/M (Netherlands)
Joe Celko - Books • JOE CELKO’S SQL FOR SMARTIES - 1995, 1999 Morgan-Kaufmann • INSTANT SQL - 1995, Wrox Press • JOE CELKO’S SQL PUZZLES & ANSWERS - 1997, Morgan-Kaufmann • DATA & DATABASES - 2000, Morgan-Kaufmann
Trees in SQL • Trees are graph structures used to represent • Hierarchies • Parts explosions • Organizational charts • Three methods in SQL • Adjacency list model • Nested set model • Path enumeration
Trees in SQL -2 • Trees are not hierarchies • Hierarchies have subordination • Kill your captain, you still have to take orders from your general • Break an edge in a tree, and you have two or more disjoint trees. • This means an adjacency list model is a tree, but not a hierarchy
Trees as Nested Sets root A0 B0 A1 A2
Graphs as Tables • Nodes and edges are not the same kind of things • Organizational chart & Personnel file • You should use separate tables for the structure and the elements • The structure table will be small (two integers and a key) • You can put more than one structure table on the same elements
Adjacency List Model node parent cost =============== Root NULL 2.50 A0 Root 1.75 A1 A0 2.00 A2 A0 3.50 B0 Root 4.00 • Cost really should not be in the table, but most adjacency list tables mix nodes and edges (see Oracle’s sample database) • Most common method in use.
Path Enumeration Model Tree node cost path ============== Root 2.50 ‘Root’ A0 1.75 ‘Root,A0’ A1 2.00 ‘Root,A0,A1’ A2 3.50 ‘Root,A0,A2’ B0 4.00 ‘Root,B0’ • Cost really should not be in the table, but most path enumeration tables mix nodes and edges. • Paths are search with path LIKE ‘Root,%’predicates
Nested Sets with Numbers 1 2 3 4 5 6 7 8 9 10 A0 B0 Root A1 A2
Nested Sets as Numbers • Split nodes and edges into two tables. • You can join them back together later • This could be personnel and Org chart • Tree.node would be job titles • Nodes would need job titles and the person holding it Tree Nodes node lft rgt node cost ============ ======= Root 1 10 Root 2.50 A0 2 7 A0 1.75 A1 3 4 A1 2.00 A2 5 6 A2 3.50 B0 8 9 B0 4.00
Problems with Adjacency list • You have to use cursors or self-joins to traverse the tree • Cursors are not a table -- their order has meaning -- Closure violation! • Cursors take MUCH longer than queries • Ten level self-joins are worse than cursors
Problems with Path Enumeration • Path can get long in a deep tree • Great for searching down the tree, but not up the tree • SELECT * FROM Tree WHERE path LIKE ‘Root,%’; • SELECT * FROM Tree WHERE path LIKE ‘%,B0’; • Inserting and deleting nodes is complicated • Requires string manipulation to change all the paths beneath the insertion or deletion point
Tree Aggregates • Give me the total cost for all subtrees • (root, 13.75) -- sum of every node in tree • (A0, 7.25) -- sum of “A0” subtree • (A1, 2.00) • (A2, 3.50) • Dropping A2 would reduce all superior rows by 3.50,but would not change A1
Find Root of Tree • SELECT * FROM Tree WHERE lft = 1; • It helps to have an index the lft column • The rgt value will be twice the number of nodes in the tree. • General rule: The number of nodes in any subtree ((rgt -lft) + 1 )/ 2
Find All Leaf Nodes • SELECT * FROM Tree WHERE lft = rgt -1; • An index on lft will help • A covering index on (lft, rgt) is even better
Find Superiors of X • SELECT Super.* FROM Tree AS T1, Tree AS Sup WHERE T1.node = ‘X’ AND T1.lft BETWEEN Sup.lft AND Sup.rgt; • This is the most important trick in this method • The BETWEEN predicates preserve subordination in the hierarchy • One query for any depth tree
Find Subordinates of X • SELECT Sub.* FROM Tree AS T1, Tree AS Sub WHERE T1.node = ‘X’ AND Sub.lft BETWEEN T1.lft ANDT1.rgt; • This is the same pattern as finding superiors
Find Depth of Tree • SELECT T1.node, COUNT(T2.node) AS level FROM Tree AS T1, Tree AS T2 WHERE T1.lft BETWEEN T2.lft AND T2.rgt GROUP BY T1.node; • Count the containing nested sets for levels • The closer to the root a node is, the greater the value of (rgt - lft)
Totals by Level in Tree • SELECT T1.node, SUM(T2.cost) AS total_level_cost FROM Tree AS T1, Tree AS T2 WHERE T2.lft BETWEEN T1.lft AND T1.rgt GROUP BY T1.node; • Uses any aggregate function the same way
Delete a Subtree • Remove subtree rooted at :my_node • DELETE FROM Tree WHERE lft BETWEEN (SELECT T1.lft FROM Tree AS T1 WHERE T1.node = :my_node) AND (SELECT T2.rgt FROM Tree AS T2 WHERE T2.node = :my_node)
Delete a Single Node • Method one - promote a child to the parent’s prior position in the tree. Oldest son inherits family business • Method two- subordinate the entire subtree to the grandparent. Orphans go live with grandmother.
Delete & Promote Oldest - 1 • Delete A0 node
Delete & Promote Subtree - 1 • Delete A0 node
Closing gaps in nested set model • Deleted nodes leave gaps in numbering of lft and rgt nodes. • Fill in gaps by sliding everyone over to the lft until there are no gaps. • First build a table of all lft and rgt values. LftRgt( seq) • UPDATE Tree SET lft = (SELECT COUNT(*) FROM LftRgt WHERE LftRgt.seq < =Tree.lft), rgt = (SELECT COUNT(*) FROM LftRgt WHERE LftRgt.seq < =Tree.rgt);
Converting Nested Sets to Adjacency • SELECT B.emp AS boss, P.emp • FROM OrgChart AS P • LEFT OUTER JOIN • OrgChart AS B • ON B.lft • = (SELECT MAX(lft) • FROM OrgChart AS S • WHERE P.lft > S.lft • AND P.lft < S.rgt);
Inserting into a Tree • The real trick is numbering the subtree correctly before inserting it. • Basic idea is to spread the nested set numbers apart to make a gap, the size of the subtree then you add the subtree. • The position of the subtree within the siblings of the new parent in the tree is another decision.
Inserting into a Tree • insert a new node, G1, under part G. We can insert one node at a time like this: • BEGIN ATOMIC • DECLARE right_most_sibling INTEGER; • SET right_most_sibling • = (SELECT rgt FROM Frammis WHERE part = 'G'); • UPDATE Frammis • SET lft = CASE WHEN lft > right_most_sibling • THEN lft + 2 ELSE lft END, • rgt = CASE WHEN rgt >= right_most_sibling • THEN rgt + 2 ELSE rgt END • WHERE rgt >= right_most_sibling; • INSERT INTO Frammis (part, qty, wgt, lft, rgt) VALUES ('G1', 3, 4, parent, (parent + 1)); • END;
Creating a Tree • If you want to have all the constraints for a proper hierarchy, then it is complicated. • CREATE TABLE Tree (node_id INTEGER NOT NULL REFERENCES Nodes(node_id), lft INTEGER NOT NULL UNIQUE CHECK(lft > 0), rgt INTEGER NOT NULL UNIQUE CHECK (rgt > 1), UNIQUE (lft, rgt), CHECK (lft < rgt), CHECK( EXISTS (SELECT * FROM Tree AS T1 HAVING MAX(rgt) = 2 * COUNT(*)) CHECK (NOT EXISTS (SELECT * FROM Tree AS T1 WHERE Tree.lft BETWEEN T1.lft AND T1.rgt ))); • You can also declare node_id to be the PRIMARY KEY, but then one person cannot hold two jobs.
Converting an Adjacency Model into a Nested Set Model • Current best method is to load nodes into a tree in a host language, then do a recursive pre-order tree traversal to get the lft and rgt traversal numbers. • Adjacency list method does not order siblings; nested set model does automatically • Classic push down stack algorithm works • You can keep both models in one table with a column for the immediate superior
Structure versus Contents • Nested set model allows the structure of trees to be compared. • For each tree find the lft value of the root node of each tree • Make a canonical form and UNION ALL them • EXISTS ( SELECT * FROM ( SELECT (lft - lftmost), (rgt - lftmost) FROM Tree1 UNION ALL SELECT (lft - lftmost), (rgt - lftmost) FROM Tree2) AS Both (lft, rgt) GROUP BY Both.lft, Both.rgt HAVING COUNT (*) =1 )