This paper presents a sextic algorithm for website design, focusing on navigation, access to information, and organization. It introduces the concept of Constrained Subtree Selection (CSS) and explores its optimization for creating good websites.
A sextic algorithm for website design
Brent Heeringa (heeringa@cs.umass.edu)
(Joint work with Micah Adler)
21 October 2004, Union College
A website design problem (for example: a new kitchen store)
• Given products, their popularity, and their organization:
• How do we create a good website?
  • Navigation is natural
  • Access to information is timely
[Figure: organization graph with nodes Knives, Maker, Type, Wüstof, Henkels, paring, chef, bread, steak; product popularities 0.26, 0.33, 0.27, 0.14]
Good website: Natural Navigation
• Organization is a DAG
• The transitive closure (TC) of the DAG enumerates all viable categorical relationships and introduces shortcuts
• A subgraph of the TC preserves the logical relationships between categories
[Figure: a DAG on A, B, C, its transitive closure, and a subgraph of the TC]
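The shortcut idea can be illustrated with a small sketch. The three-node chain A -> B -> C below is a hypothetical organization chosen for illustration (the slide's figure did not survive extraction), and `transitive_closure` is an illustrative helper, not code from the talk.

```python
def transitive_closure(graph):
    """Return the transitive closure of a DAG given as {node: set of children}."""
    closure = {}

    def reachable(node):
        # Memoised DFS: every node reachable from `node` by one or more edges.
        if node not in closure:
            found = set()
            for child in graph.get(node, ()):
                found.add(child)
                found |= reachable(child)
            closure[node] = found
        return closure[node]

    for node in graph:
        reachable(node)
    return closure


# Hypothetical organization (an assumption, not the slide's figure): A -> B -> C.
organization = {"A": {"B"}, "B": {"C"}, "C": set()}
tc = transitive_closure(organization)
# The closure keeps A -> B and B -> C and adds the shortcut A -> C.
```

Any website structure that uses only edges of `tc` still respects the original categories, which is why solutions are drawn from subgraphs of the TC.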
Good website: Timely Access to Info
• Two obstacles to finding info quickly
  • Time scanning a page for the correct link
  • Time descending the DAG
• Associate a cost with each obstacle
  • Page cost (function of out-degree of node)
  • Path cost (sum of page costs on path)
• Good access structure:
  • Minimize expected path cost
  • Optimal subgraph is always a full tree
[Figure: with page cost = # links, a leaf of weight 1/2 has path cost 3+2 = 5 and weighted path cost 5/2]
Constrained Subtree Selection (CSS)
• An instance of CSS is a triple (G, γ, w)
  • G is a rooted DAG with n leaves (constraint graph)
  • γ is a function of the out-degree of each internal node (degree cost)
  • w is a probability distribution over the n leaves (weights)
• A solution is any directed subtree of the transitive closure of G which includes the root and leaves
• An optimal solution is one which minimizes the expected path cost
[Figure: example with leaves A, B, C, D, each of weight 1/4, and γ(x) = x]
Constrained Subtree Selection (CSS)
• Example: leaves A, B, C, D, each with weight 1/4; γ(x) = x
• Path costs of the four leaves: 3, 5, 5, 3
• Cost: 1/4(3+5+5+3) = 1/4(16) = 4
Constrained Subtree Selection (CSS)
• Example: leaves A, B, C, D with weights 1/2, 1/6, 1/6, 1/6; γ(x) = x
• Cost: 3 1/2
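The two example costs can be reproduced with a short sketch. The tree shapes below are assumptions: the figures are lost, so these are simply shapes whose path costs match the numbers on the slides (3, 5, 5, 3 for uniform weights, and a total of 3 1/2 for the skewed weights). The helper `expected_path_cost` and the node name `X` are hypothetical.

```python
from fractions import Fraction


def expected_path_cost(tree, weights, degree_cost=lambda d: d):
    """Expected path cost of a tree.

    tree: {internal node: list of children}, rooted at "root".
    weights: {leaf: probability}.
    degree_cost: the degree cost; defaults to the linear cost gamma(x) = x.
    """
    total = Fraction(0)
    stack = [("root", 0)]
    while stack:
        node, cost_so_far = stack.pop()
        children = tree.get(node)
        if not children:
            # Leaf: its path cost is the sum of page costs above it.
            total += weights[node] * cost_so_far
            continue
        page_cost = degree_cost(len(children))
        for child in children:
            stack.append((child, cost_so_far + page_cost))
    return total


# Assumed shape for the uniform example: root -> A, X, D and X -> B, C.
uniform_tree = {"root": ["A", "X", "D"], "X": ["B", "C"]}
uniform = {leaf: Fraction(1, 4) for leaf in "ABCD"}

# Assumed shape for the skewed example: root -> A, X and X -> B, C, D.
skewed_tree = {"root": ["A", "X"], "X": ["B", "C", "D"]}
skewed = {"A": Fraction(1, 2), "B": Fraction(1, 6),
          "C": Fraction(1, 6), "D": Fraction(1, 6)}

uniform_cost = expected_path_cost(uniform_tree, uniform)  # 1/4(3+5+5+3) = 4
skewed_cost = expected_path_cost(skewed_tree, skewed)     # 1/2(2) + 3(1/6)(5) = 7/2
```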
Constraint-Free Graphs and k-favorability
• Constraint-free graph
  • Every directed, full tree with n leaves is a subtree of the TC
  • CSS is no longer constrained by the graph
• k-favorable degree cost
  • Fix γ. There exists k > 1 such that any constraint-free instance of CSS under γ has an optimal tree with maximal out-degree k
Linear Degree Cost: γ(x) = x
• 5 paths w/ cost 5
• 3 paths w/ cost 5
• 2 paths w/ cost 4
• No unweighted path cost increases, so no weighted path cost increases
• Generalization to n>6 paths is straightforward
Linear Degree Cost: γ(x) = x
• 4 paths w/ cost 4
• 4 paths w/ cost 4
Linear Degree Cost: γ(x) = x
• Prefer binary structure when a leaf has at least half the mass
• Prefer ternary structure when mass is uniformly distributed
• CSS with 2-favorable degree costs and constraint-free graphs is the Huffman coding problem
  • Examples: quadratic, exponential, ceiling of log
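The Huffman connection can be sketched as follows, assuming that under a 2-favorable degree cost an optimal tree is a full binary tree, so every page on a path costs γ(2) and the expected path cost is γ(2) times the expected leaf depth. The function names are illustrative, not from the talk.

```python
import heapq
from fractions import Fraction


def huffman_expected_depth(weights):
    """Expected leaf depth of a Huffman tree built over the given weights."""
    heap = list(weights)
    heapq.heapify(heap)
    total = Fraction(0)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Merging two subtrees pushes every leaf below them one level deeper,
        # which adds their combined weight to the expected depth.
        total += a + b
        heapq.heappush(heap, a + b)
    return total


def css_cost_two_favorable(weights, gamma):
    """Cost of a constraint-free CSS instance, assuming gamma is 2-favorable."""
    return gamma(2) * huffman_expected_depth(weights)


weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
depth = huffman_expected_depth(weights)                    # 7/4
cost = css_cost_two_favorable(weights, lambda x: x * x)    # quadratic degree cost
```

With the quadratic cost each binary page costs 4, so the expected path cost here is 4 times the expected depth.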
Results
• Complexity: NP-complete for equal weights and many γ
  • Sufficient condition on γ
  • Hardness depends on constraint graph
• Highlighted algorithm:
  • Theorem: O(n^6)-time DP algorithm
    • γ(x) = x and G is constraint free
• Other results:
  • Characterizations of optimal trees for uniform probability distributions
  • Theorem: poly-time constant-approximation
    • γ ≥ 1 and k-favorable; G has constant out-degree
  • Approximate Hotlink Assignment [Kranakis et al.]
Related Work
• Adaptive Websites [Perkowitz & Etzioni]
  • Challenge to the AI community
  • Novel views of websites: Page synthesis problem
• Hotlink Assignment [Kranakis, Krizanc, Shende, et al.]
  • Add 1 hotlink per page to minimize expected distance from root to leaves
  • Recently: pages have cost proportional to their size
  • Hotlinks don’t change page cost
• Optimal Prefix-Free Codes [Golin & Rote]
  • Min code for n words with r symbols where symbol ai has cost ci
  • Resembles CSS without a constraint graph
Dynamic Programming Review
• Problems which exhibit:
  • Optimal substructure
    • An optimal solution may be written in terms of optimal solutions to subproblems
    • Inductive definition
  • Overlapping subproblems
    • Different problem instances share subproblems
    • Repeated computation
Dynamic Programming: Fib
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …
Problem: What is the ith Fibonacci number?
• Optimal substructure (inductive definition)
  • Fib(0) = 0
  • Fib(1) = 1
  • Fib(i) = Fib(i-1) + Fib(i-2)
• Overlapping subproblems
  • Fib(7) = Fib(6) + Fib(5) (but Fib(6) calls Fib(5))
  • We only need to calculate Fib(5) once
• Don’t repeat computations
  • Idea: Store solutions to subproblems in a table
Dynamic Programming: Fib
• General Approach
  • Write inductive definition
  • Range of parameters in definition defines table size
  • Fill in table using definition
  • Analysis: (Table size) * (# of lookups)
• Fib(0) = 0, Fib(1) = 1, Fib(i) = Fib(i-1) + Fib(i-2)
• Fib(14): 0 ≤ i ≤ 14
  i:      0  1  2  3  4  5  6  …  12   13   14
  Fib(i): 0  1  1  2  3  5  8  …  144  233  377
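The table-filling approach can be sketched in a few lines. This is a minimal illustration of the general approach above; `fib_table` is a hypothetical name.

```python
def fib_table(n):
    """Fill a table with Fib(0..n) using the inductive definition."""
    table = [0] * (n + 1)          # table size is set by the range 0 <= i <= n
    if n >= 1:
        table[1] = 1
    for i in range(2, n + 1):
        # Each cell needs two lookups, so the total work is O(n).
        table[i] = table[i - 1] + table[i - 2]
    return table


table = fib_table(14)
```

Here `table[12]`, `table[13]` and `table[14]` hold 144, 233 and 377, matching the right end of the table on the slide.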
Dynamic Programming: Subset Sum
Subset Sum (SS): Given a set of n positive integers X=(x1,…,xn) and a positive integer T, is there a subset of X which sums to T?
• Example: X={2, 3, 5, 9, 10, 15, 17} and T=28
  • Yes: {2, 9, 17} and {3, 10, 15}
• Inductive definition:
  • Let Xi = (x1,…,xi) = the first i integers of X
  • SS(t,i) = TRUE if there is a subset of Xi which sums to t, FALSE otherwise
Dynamic Programming Review
• Inductive definition:
  • SS(0,i) = TRUE
  • SS(t,0) = FALSE (for t > 0)
  • SS(t,i) = SS(t-xi,i-1) OR SS(t,i-1)
    • SS(t-xi,i-1): the ith element is in the subset
    • SS(t,i-1): the ith element is not in the subset
• Parameter range: 0 ≤ t ≤ T, 0 ≤ i ≤ n
• Table size: T*n
• Each cell (t,i) depends on 2 other cells
• O(Tn) time for SS
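The recurrence translates directly into a table fill. The sketch below is one straightforward way to do it, with the extra guard that SS(t-xi, i-1) is only consulted when t ≥ xi; `subset_sum` is an illustrative name.

```python
def subset_sum(xs, target):
    """Decide Subset Sum with the SS(t, i) table; xs are positive integers."""
    n = len(xs)
    # table[t][i] holds SS(t, i): can some subset of the first i integers sum to t?
    table = [[False] * (n + 1) for _ in range(target + 1)]
    for i in range(n + 1):
        table[0][i] = True                     # the empty subset sums to 0
    for i in range(1, n + 1):
        x = xs[i - 1]
        for t in range(1, target + 1):
            without_x = table[t][i - 1]        # ith element not in the subset
            with_x = t >= x and table[t - x][i - 1]   # ith element in the subset
            table[t][i] = without_x or with_x
    return table[target][n]


answer = subset_sum([2, 3, 5, 9, 10, 15, 17], 28)
```

For the slide's example the answer is `True`, witnessed by {2, 9, 17} and {3, 10, 15}.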
Lopsided Trees
• Recall: γ(x) = x (3-favorable) and G is constraint free
• Node level = path cost
• Adding an edge increases level
• Grow lopsided trees level by level
Lopsided Trees
• We know exact cost of tree up to the current level i:
  • Exact cost of m leaves
  • Remaining n-m leaves must have path cost at least i
Lopsided Trees: Cost
• Exact cost of C: 3 · (1/3) = 1
• Remaining mass up to level 4: (2/3) · 4 = 8/3
• Total: 1 + 8/3 = 11/3
Lopsided Trees: Cost
• Tree cost at Level 5 in terms of tree cost at Level 4:
  • Add in the mass of remaining leaves
• Cost at Level 5:
  • No new leaves
  • 11/3 + 2/3 = 13/3
• Cost updates don’t depend on level
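The level-by-level cost can be checked with a small sketch. It assumes, as in the example, a single placed leaf C of weight 1/3 at level 3 and a total mass of 1; `cost_at_level` is a hypothetical helper.

```python
from fractions import Fraction


def cost_at_level(placed, total_mass, level):
    """Cost of a partially grown tree whose frontier is at `level`.

    placed: list of (weight, leaf level) for leaves at or above the frontier.
    Leaves not yet placed are charged `level`, a lower bound on their path cost.
    """
    exact = sum((w * lvl for w, lvl in placed), Fraction(0))
    remaining = total_mass - sum((w for w, _ in placed), Fraction(0))
    return exact + remaining * level


placed = [(Fraction(1, 3), 3)]                      # leaf C: weight 1/3 at level 3
cost4 = cost_at_level(placed, Fraction(1), 4)       # 1 + (2/3)*4 = 11/3
cost5 = cost_at_level(placed, Fraction(1), 5)       # 11/3 + 2/3 = 13/3
```

The difference `cost5 - cost4` is exactly the remaining mass 2/3, whatever the level, which is why the update does not depend on the level.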
Lopsided Trees
• Equality on trees:
  • Equal number of leaves at or above frontier
  • Equal number of leaves at each relative level below frontier
• Nodes have out-degree ≤ 3
  • Nodes lie at most γ(3) = 3 levels below the frontier
  • (m; l1, l2, l3) = signature
• Example signature: (2; 3, 2, 0)
  • 2: C and F are leaves
  • 3: G, H, I are 1 level past the frontier
  • 2: J and K are 2 levels past the frontier
  • Signature if F is interior node with 3 children?
Inductive Definition
• Let CSS(m,l1,l2,l3) = min cost tree with sig (m;l1,l2,l3)
• Can we define CSS(m,l1,l2,l3) in terms of optimal solutions to subproblems?
  • Which trees, when grown by one level, have sig (m;l1,l2,l3)?
  • Which parent sigs (m’;l’1,l’2,l’3) lead to the child sig (m;l1,l2,l3)?
Different Signatures
• (2; 2, 0, 0) and (0; 4, 0, 0)
Same Signature
• (2; 0, 2, 3)
• Different signatures lead to (2; 0, 2, 3)
The other direction (which signatures can a tree grow into)
• Growing a tree only affects frontier
• Only l1 affects next level
  • Choose # of leaves
  • The remaining nodes are internal
  • Choose degree-2 (d2)
  • Remaining nodes are degree-3 (d3)
• O(n^2) choices
[Figure: a tree with sig (0; 2, 0, 0) grown into a tree with sig (1; 0, 0, 3)]
The original question (warning: here be symbols)
• Which parent (m’;l’1,l’2,l’3) → child (m;l1,l2,l3)?
• Suppose we know
  • l’1 (the # of nodes one level below the frontier)
  • d2 (the # of l’1 which are degree-2 interior nodes in (m,l1,l2,l3))
• Let’s determine the values of the remaining variables
The original question (warning: here be symbols)
• m = m’ + l’1 - d2 - d3 = m’ + l’1 - d2 - l3/3
  • m: the new number of leaves
  • m’: the old number of leaves
  • l’1: nodes at one level below the frontier
  • d2: internal nodes of degree 2
  • d3 = l3/3: internal nodes of degree 3
The original question (warning: here be symbols)
• l’2 = l1
  • l’2: the old number of nodes 2 levels below the frontier
  • l1: new nodes one level below the frontier
The original question (warning: here be symbols)
• l2 = l’3 + 2d2
  • l2: the new number of nodes 2 levels below the frontier
  • l’3: the old number of nodes 3 levels below the frontier
  • The d2 nodes are binary, so they contribute 2d2 nodes
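The update rules can be collected into one transition function. This is a sketch under two assumptions: the function and argument names are hypothetical, and the last rule is read as l2 = l’3 + 2d2, with the parent's level-3 count on the right-hand side, since that reading agrees with the earlier example where (0; 2, 0, 0) grows into (1; 0, 0, 3).

```python
def grow(parent, leaves, d2):
    """Grow a parent signature (m'; l'1, l'2, l'3) by one level.

    leaves: how many of the l'1 nodes reaching the frontier become leaves.
    d2: how many become degree-2 internal nodes; the rest are degree-3.
    """
    m_p, l1_p, l2_p, l3_p = parent
    d3 = l1_p - leaves - d2
    if leaves < 0 or d2 < 0 or d3 < 0:
        raise ValueError("leaves + d2 must not exceed l'1")
    m = m_p + leaves          # equals m' + l'1 - d2 - d3
    l1 = l2_p                 # old level-2 nodes are now one level below
    l2 = l3_p + 2 * d2        # old level-3 nodes plus children of binary nodes
    l3 = 3 * d3               # children of ternary nodes
    return (m, l1, l2, l3)


child = grow((0, 2, 0, 0), leaves=1, d2=0)   # one leaf, one degree-3 node
```

Fixing the parent's l’1 and choosing `leaves` and `d2` gives the O(n^2) choices mentioned earlier.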
The original question (warning: here be symbols)
• Which parent (m’;l’1,l’2,l’3) → child (m;l1,l2,l3)?
  • l’1 and d2 are sufficient
  • l’1 and d2 are both O(n)
  • O(n^2) possibilities for (m’;l’1,l’2,l’3)
• CSS(m,l1,l2,l3) = min cost tree with sig. (m;l1,l2,l3)
  = min over 1 ≤ d2 ≤ l’1 ≤ n of CSS(m’,l’1,l’2,l’3) + cm’
  (cm’ is the sum of the smallest n-m’ weights)
• CSS(n,0,0,0) = cost of optimal tree
• Analysis:
  • Table size = O(n^4)
  • Each cell takes O(n^2) lookups
  • O(n^6) algorithm
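A compact sketch of the signature DP is given below for γ(x) = x on a constraint-free graph. It is not the authors' table-filling code: it memoizes forward from the root instead of looking up parent signatures, it assumes the root is an internal node of degree 2 or 3 (start signatures (0; 0, 2, 0) and (0; 0, 0, 3)), and it prunes any signature with more than n nodes. All names are illustrative.

```python
from fractions import Fraction
from functools import lru_cache


def optimal_css_cost(weights):
    """Minimum expected path cost for gamma(x) = x, constraint-free graph."""
    w = sorted(weights, reverse=True)        # heavier leaves are placed higher
    n = len(w)
    if n == 1:
        return Fraction(0)
    # tail[m] = sum of the n - m smallest weights (mass of leaves not yet placed)
    tail = [sum(w[m:], Fraction(0)) for m in range(n + 1)]
    INF = float("inf")

    @lru_cache(maxsize=None)
    def best(m, l1, l2, l3):
        """Cheapest way to finish a tree whose current signature is (m; l1, l2, l3)."""
        if m == n:
            return Fraction(0)               # pruning guarantees l1 = l2 = l3 = 0
        if l1 + l2 + l3 == 0:
            return INF                       # no nodes left to grow: dead end
        result = INF
        for leaves in range(l1 + 1):
            for d2 in range(l1 - leaves + 1):
                d3 = l1 - leaves - d2
                child = (m + leaves, l2, l3 + 2 * d2, 3 * d3)
                if sum(child) > n:           # every pending node needs a leaf
                    continue
                result = min(result, best(*child))
        # Growing one level charges every unplaced leaf one more unit.
        return tail[m] + result

    return min(best(0, 0, 2, 0), best(0, 0, 0, 3))


uniform_cost = optimal_css_cost([Fraction(1, 4)] * 4)
skewed_cost = optimal_css_cost(
    [Fraction(1, 2), Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)])
```

On the two four-leaf examples from the CSS slides this returns 4 and 7/2. There are O(n^4) signatures and O(n^2) choices per signature, the same counts as in the analysis above.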
Some Observations
• Generalize algorithm:
  • Theorem: O(n^(γ(k)+k))-time DP algorithm
    • γ is positive, integer-valued, non-decreasing, k-favorable and G is constraint free
  • Signatures = (γ(k)+1)-vectors
  • Table size = O(n^(γ(k)+1))
  • Each cell requires O(n^(k-1)) lookups