Finding Skyline Nodes in Large Networks
Arijit Khan*, Vishwakarma Singh*, Jian Wu#
*Computer Science, University of California, Santa Barbara, USA
#College of Computer Science, Zhejiang University, China
{arijitkhan, vsingh}@cs.ucsb.edu, wujian2000@zju.edu.cn
Motivation
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and Map Reduce, who will be the top-5 people John should ask about these topics?
• Evaluation Metrics:
• Distance from the query node (John).
• Coverage of the query topics (Big Data, Cloud Computing, Map Reduce).
Homogeneous Approach?
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and Map Reduce, who will be the top-5 people John should ask about these topics?
Score = λ · Distance + (1 − λ) · Coverage
How to get λ?
Weighted Set Cover?
• Find nodes with the smallest aggregate distance from the query node such that together they cover all query topics.
• Ignores some interesting nodes.
• Cannot rank the results.
[Figure: example network rooted at the query node u0 = q with Q = {a, b, c}; nodes u1 to u8 carry the labels a, b, c, abc, a, cd, abc, de.]
Graph Skyline
• Dominance on Coverage: u >c v
• The set of query topics covered by node u is a proper superset of the set covered by node v.
• Dominance on Distance: u >d v
• The distance of u from q is less than that of v from q.
• Dominance: u > v
• (1) u >c v and u ≥d v;
• or (2) u ≥c v and u >d v.
Graph Skyline: A node is a skyline node if it is not dominated by any other node in the network.
[Figure: example network rooted at the query node u0 = q with Q = {a, b, c}; nodes u1 to u8 carry the labels a, b, c, abc, a, cd, abc, de.]
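The dominance test above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the node labels and hop distances below are assumptions read off the slide's figure, and the names `Profile`, `dominates`, and `skyline` are made up for this example.

```python
from typing import Dict, FrozenSet, List, Tuple

# (query topics covered, hop distance from q)
Profile = Tuple[FrozenSet[str], int]


def dominates(u: Profile, v: Profile) -> bool:
    """True if u > v: no worse on both criteria, strictly better on one."""
    cov_u, dist_u = u
    cov_v, dist_v = v
    no_worse = cov_u >= cov_v and dist_u <= dist_v      # u >=c v and u >=d v
    strictly_better = cov_u > cov_v or dist_u < dist_v  # u >c v or u >d v
    return no_worse and strictly_better


def skyline(profiles: Dict[str, Profile]) -> List[str]:
    """Names of nodes that no other node dominates (quadratic scan)."""
    return sorted(
        name
        for name, p in profiles.items()
        if not any(dominates(other, p)
                   for other_name, other in profiles.items()
                   if other_name != name)
    )


Q = frozenset("abc")
# ASSUMED toy data: labels and hop distances as read off the slide's figure.
labels = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}
hops = {"u1": 1, "u2": 1, "u3": 1, "u4": 2,
        "u5": 2, "u6": 2, "u7": 3, "u8": 3}
profiles = {n: (frozenset(labels[n]) & Q, hops[n]) for n in labels}

print(skyline(profiles))  # ['u1', 'u2', 'u3', 'u4']
```

The quadratic scan is only meant to make the definition concrete; it does not scale to large networks.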
Ranking of Skyline Nodes
• Too many skyline nodes.
• Rank them.
• Dominance Count: number of nodes dominated by a skyline node. [Lin et al., ICDE '07]
• Higher Dominance Count => more pruning from candidate set.
• 1. DC(u4) = {u5, u6, u7}
• 2. DC(u1) = {u5}
• 3. DC(u2) = Φ
• 4. DC(u3) = Φ
Problem Statement: Given a query node and a set of query topics in a network, find the top-k skyline nodes with maximum dominance count.
[Figure: example network rooted at the query node u0 = q with Q = {a, b, c}.]
Algorithm
• Construct a Query DAG.
• Three variables associated with each DAG node: Count (C), Dominance (D), Traversal (T).
• Naïve Complexity: O(n2r)
• Complexity with Preprocessing: O(nr2)
[Figure: input network (u0 = q, Q = {a, b, c}) next to its Query DAG with nodes abc, ab, ac, bc, a, b, c. Counts: C = 2 for abc; C = 0 for each of ab, ac, bc; C = 1, 2, 2 across a, b, c. D and T are not yet set.]
Query DAG Construction
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Incremental DAG construction.
[Figure: example network (u0 = q, Q = {a, b, c}) with the per-label sorted node lists and the partially built Query DAG.]
Query DAG Construction (cont.)
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Consider the labels and their sorted lists in order.
[Figure: example network (u0 = q, Q = {a, b, c}) with the per-label sorted node lists; the Query DAG now holds abc, ab, a, b, c.]
Query DAG Construction (cont.)
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Consider the labels and their sorted lists in order.
[Figure: example network (u0 = q, Q = {a, b, c}) with the per-label sorted node lists; the Query DAG now holds abc, ab, ac, bc, a, b, c.]
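The two phases above can be sketched as follows. This is a simplified illustration under stated assumptions, not the incremental construction from the slides: the lists are sorted by node name (the slides do not say which sort key is used), the node labels are read off the figure, and only the Count variable C of each non-empty DAG node is produced. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List


def build_inverted_lists(node_labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Preprocessing: for each label, a sorted list of the nodes carrying it."""
    lists: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(node_labels):          # ASSUMED sort key: node name
        for label in node_labels[node]:
            lists[label].append(node)
    return dict(lists)


def query_dag_counts(inverted: Dict[str, List[str]],
                     query: Iterable[str]) -> Dict[FrozenSet[str], int]:
    """Online phase: scan only the query labels' lists and count, per
    covered topic set (one DAG node), how many network nodes map to it."""
    covered: Dict[str, set] = defaultdict(set)
    for label in sorted(query):
        for node in inverted.get(label, []):
            covered[node].add(label)
    counts: Dict[FrozenSet[str], int] = defaultdict(int)
    for topics in covered.values():
        counts[frozenset(topics)] += 1
    return dict(counts)


# ASSUMED labels, as read off the slide's figure.
node_labels = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
               "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}
inverted = build_inverted_lists(node_labels)
counts = query_dag_counts(inverted, "abc")
for topics, c in sorted(counts.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(topics)), c)
```

Nodes that cover no query topic (u8 here) never appear in any scanned list, so they are left out of the DAG without being touched.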
Find Dominance Variable
• Perform a topological ordering of the DAG nodes to evaluate the Dominance variable (D) of each DAG node.
• D = number of nodes dominated (or equal) by coverage.
• Naïve Complexity: O(n2r)
• Complexity by Topological Ordering: O(3^r)
[Figure: input network (u0 = q, Q = {a, b, c}) next to its Query DAG. After this step: D = 7 for abc; D = 4, 3, 3 across ab, ac, bc; D = 1, 2, 2 across a, b, c. T is not yet set.]
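One way to see where the 3^r bound comes from is to compute D for every DAG node as the sum of C over all of its non-empty subsets. The sketch below swaps the slide's topological ordering for plain submask enumeration over bitmasks, which visits each (set, subset) pair once and therefore also does 3^r work in total. The bit assignment a = 1, b = 2, c = 4 and the per-topic counts are assumptions derived from the figure's labels.

```python
from typing import Dict


def dominance_variables(counts: Dict[int, int], r: int) -> Dict[int, int]:
    """D[mask] = sum of C over every non-empty submask of mask."""
    D: Dict[int, int] = {}
    for mask in range(1, 1 << r):
        total = 0
        sub = mask
        while sub:                      # standard submask enumeration
            total += counts.get(sub, 0)
            sub = (sub - 1) & mask
        D[mask] = total
    return D


A, B, C = 1, 2, 4                       # ASSUMED bit per query topic
# ASSUMED Count values: two nodes cover abc, two cover a, one b, two c.
counts = {A | B | C: 2, A: 2, B: 1, C: 2}

D = dominance_variables(counts, r=3)
print(D[A | B | C], D[A | C], D[A | B], D[B | C])  # 7 4 3 3
print(D[A], D[B], D[C])                            # 2 1 2
```

These values agree with the D annotations on the slide's Query DAG (7 for abc; 4, 3, 3 for the two-topic nodes; 1, 2, 2 for the single-topic nodes).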
Find Traversal Variable
• Perform a Breadth First Search (BFS) starting from the query node.
• T = number of nodes not dominated by distance.
• Complexity by BFS: O(n + e)
[Figure: input network (u0 = q, Q = {a, b, c}) next to its Query DAG at hop h = 2: T = 1 for abc; T = 0 for ab, ac, bc; T = 1, 2, 2 across a, b, c.]
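A minimal BFS sketch of this step follows. The edge list is an assumption (the extracted slide text does not preserve the figure's edges); it was chosen so that u1 to u3 sit at hop 1, u4 to u6 at hop 2, and u7, u8 at hop 3. The reading of T as "nodes of a given coverage already reached by the BFS" is likewise an interpretation of the slide, and `traversal_counts` is a hypothetical helper.

```python
from collections import Counter, deque
from typing import Dict, List


def bfs_hops(adj: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Hop distance from source to every reachable node, in O(n + e)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def traversal_counts(hops: Dict[str, int], coverage: Dict[str, str],
                     h: int) -> Counter:
    """Per coverage set, how many nodes the BFS has reached within h hops."""
    return Counter(coverage[n] for n, d in hops.items()
                   if n in coverage and 0 < d <= h)


# ASSUMED undirected edges; not recoverable from the slide text.
edges = [("q", "u1"), ("q", "u2"), ("q", "u3"), ("u1", "u5"),
         ("u2", "u4"), ("u3", "u6"), ("u4", "u7"), ("u6", "u8")]
adj: Dict[str, List[str]] = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

# ASSUMED coverage of Q = {a, b, c}; u8 covers nothing and is omitted.
coverage = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
            "u5": "a", "u6": "c", "u7": "abc"}

hops = bfs_hops(adj, "q")
T = traversal_counts(hops, coverage, h=2)
print(T["abc"], T["a"], T["b"], T["c"])  # 1 2 1 2
```

Under these assumptions the counts at h = 2 match the T values annotated on the slide.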
Find Skyline Nodes
• Store DAG nodes in a Lookup Table, with a Skyline Bit for each DAG node.
• Helps to prune non-skyline nodes directly.
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 1.]
Find Skyline Nodes (cont.)
• Store DAG nodes in a Lookup Table, with a Skyline Bit for each DAG node.
• Helps to prune non-skyline nodes directly.
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 2.]
Dominance Count of Skyline Nodes
• DC(u4) = D(abc) − T(abc) − T(ab) − T(ac) − T(bc) − T(a) − T(b) − T(c) − 1 = 3
• Top-k Buffer to store top-k skyline nodes.
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table while processing hop h = 2: D = 7 for abc; D = 4, 3, 3 across ab, ac, bc; D = 1, 2, 2 across a, b, c; T = 0 for abc, ab, ac, bc; T = 1 for each of a, b, c.]
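The formula for DC(u4) generalizes to any DAG node: take its D value, subtract T over the node and all of its non-empty subsets, then subtract 1 for the skyline node itself. A small sketch, using the D and T values from the slide and an assumed bitmask encoding (a = 1, b = 2, c = 4); the slide does not say how ties in both coverage and distance are handled, and neither does this sketch.

```python
from typing import Dict


def dominance_count(D: Dict[int, int], T: Dict[int, int], mask: int) -> int:
    """DC = D(S) - sum of T(S') over non-empty S' subset-or-equal S, minus 1."""
    reached_earlier = 0
    sub = mask
    while sub:                          # mask itself and every submask
        reached_earlier += T.get(sub, 0)
        sub = (sub - 1) & mask
    return D[mask] - reached_earlier - 1


A, B, C = 1, 2, 4                       # ASSUMED bit per query topic
# D values from the slide; the mapping of 1, 2, 2 onto a, b, c is ASSUMED.
D = {A | B | C: 7, A | C: 4, A | B: 3, B | C: 3, A: 2, B: 1, C: 2}

# T while hop 2 is being processed: one a, one b, one c node seen at hop 1.
T_hop2 = {A: 1, B: 1, C: 1}
print(dominance_count(D, T_hop2, A | B | C))  # 3, as on the slide for u4

# T while hop 1 is being processed: nothing reached yet.
print(dominance_count(D, {}, A))              # 1, matching DC(u1) = {u5}
```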
Pruning and Early Termination
• DC(u4) = D(abc) − T(abc) − T(ab) − T(ac) − T(bc) − T(a) − T(b) − T(c) − 1 = 3
• Top-k Buffer to store top-k skyline nodes.
• Top-k Pruning: prune a DAG node when its Dominance variable is smaller than the smallest Dominance Count in the top-k buffer.
• Early Termination: stop when the Skyline Bits of all entries in the Lookup Table are 1.
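The top-k buffer and its pruning test can be sketched with a bounded min-heap. This is an illustrative sketch only: the class name `TopKBuffer` and its methods are hypothetical, and the dominance counts fed in are the ones listed on the ranking slide.

```python
import heapq
from typing import List, Optional, Tuple


class TopKBuffer:
    """Keeps the k skyline nodes with the largest dominance counts."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._heap: List[Tuple[int, str]] = []   # min-heap of (DC, node)

    def threshold(self) -> Optional[int]:
        """Smallest DC in the buffer, or None while it is not yet full."""
        return self._heap[0][0] if len(self._heap) == self.k else None

    def can_prune(self, d_value: int) -> bool:
        """Top-k pruning: a DAG node whose D is below the threshold cannot
        contribute, because every DC derived from it is smaller than D."""
        t = self.threshold()
        return t is not None and d_value < t

    def offer(self, node: str, dc: int) -> None:
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (dc, node))
        elif dc > self._heap[0][0]:
            heapq.heapreplace(self._heap, (dc, node))

    def result(self) -> List[Tuple[str, int]]:
        ordered = sorted(self._heap, key=lambda e: (-e[0], e[1]))
        return [(node, dc) for dc, node in ordered]


buf = TopKBuffer(k=2)
for node, dc in [("u1", 1), ("u2", 0), ("u3", 0), ("u4", 3)]:
    buf.offer(node, dc)

print(buf.result())       # [('u4', 3), ('u1', 1)]
print(buf.can_prune(0))   # True: D = 0 is below the threshold of 1
print(buf.can_prune(2))   # False
```

A min-heap keeps the threshold lookup at O(1) and each update at O(log k), which is why it is a common choice for this kind of buffer.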
Experimental Results
• DBLP: 0.7M Nodes, 3M Edges, 10 Node Labels (distinct).
• 5 Query Topics.
Efficiency
• DBLP: 185M Nodes, 90M Edges, 1000 Node Labels (distinct).
• 5 Query Topics, Top-5 Result Nodes.
Conclusion and Future Work
• Efficient algorithm to find top-k skyline nodes in large attributed networks.
• Requires experimental evaluation on real and synthetic datasets.
• Time complexity is linear in the number of nodes and edges in the network. Distance-based indexing might improve the efficiency.
• A top-k skyline set instead of top-k skyline nodes might be more effective.
Questions
Thank you!