Chronos: A Graph Engine for Temporal Graph Analysis. Wentao Han 1,3 , Youshan Miao 2,3 , Kaiwei Li 1,3 , Ming Wu 3 , Fan Yang 3 , Lidong Zhou 3 , Vijayan Prabhakaran 3 , Wenguang Chen 1 , Enhong Chen 2
Chronos: A Graph Engine for Temporal Graph Analysis • Wentao Han 1,3, Youshan Miao 2,3, Kaiwei Li 1,3, Ming Wu 3, Fan Yang 3, Lidong Zhou 3, Vijayan Prabhakaran 3, Wenguang Chen 1, Enhong Chen 2 • 1: Tsinghua University • 2: University of Science and Technology of China • 3: Microsoft Research
Temporal Graphs • Real-world graphs evolve – temporal graphs • Temporal graph properties bring more insights • Temporal ranks can tell their differences [Figure: a social graph over the years 2012, 2013, 2014]
Temporal Graph Analysis • Computing properties on a series of graph snapshots [Figure: graph snapshots at t0, t1, t2 (years 2012, 2013, 2014), each passed through static graph analysis to produce graph properties]
Temporal Graph Analysis • Existing graph engines: targeting static graph analysis • A possible solution: computing snapshot by snapshot [Figure: Task 1, Task 2, Task 3, one per snapshot for 2012, 2013, 2014]
Revisit: Static Graph Analysis • Propagation-based graph computation model • Local computation • Data propagation • Edge array scan; accesses to the vertex data array can cause cache misses [Figure: vertices v1, v2, v3, v5 with the vertex data array and the edge array]
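The propagation model on this slide can be sketched as one pass over an edge array for a single static snapshot. This is a minimal illustration, not the engine's code: the function name `pagerank_step`, the damping value, and the choice of PageRank as the local computation are assumptions, and vertices without outgoing edges are simply ignored.

```python
def pagerank_step(num_vertices, edges, rank, damping=0.85):
    """One push-style propagation pass over an edge array (single snapshot).

    edges is a list of (src, dst) pairs; rank is the current vertex data array.
    Hypothetical helper for illustration only.
    """
    # Local information needed per vertex: its out-degree.
    out_degree = [0] * num_vertices
    for src, _dst in edges:
        out_degree[src] += 1

    # Every vertex starts from the teleport term.
    next_rank = [(1.0 - damping) / num_vertices] * num_vertices

    # Edge array scan: the scan itself is sequential, but each update of
    # next_rank[dst] is a scattered access into the vertex data array.
    for src, dst in edges:
        next_rank[dst] += damping * rank[src] / out_degree[src]
    return next_rank
```

Running this once per snapshot is the snapshot-by-snapshot approach: every snapshot repeats the same scattered accesses to its own vertex data array.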
Revisit: Static Graph Analysis • In parallel: partition graph & computations among CPU cores • Cross-partition edges cause inter-core communication [Figure: vertices v1, v2, v3, v5 split between Core 0 and Core 1, with the vertex data array and the edge array scan]
Temporal Graph Analysis: Snapshot by Snapshot • Computation on multiple graph snapshots – multiple costs • One vertex data array per snapshot (Snapshot 1, Snapshot 2, Snapshot 3) • N snapshots • N cache misses • N inter-core comm.
Observations • Real-world graphs often evolve gradually (similar snapshots) [Figure: vertices v1 to v5 in Snapshot 1, Snapshot 2, Snapshot 3]
Observations • Similar propagations across snapshots [Figure: vertices v1 to v5 in Snapshot 1, Snapshot 2, Snapshot 3]
Idea • Group propagations by source & target, not by snapshot • Propagations: 1 -> 3, 1 -> 4, 1 -> 5, 1 -> 2 [Figure: vertices v1 to v5 in Snapshot 1, Snapshot 2, Snapshot 3, with numbered propagation steps]
Chronos: Data Layout • Place together data for the same vertex across multiple snapshots • Vertex data arrays (snapshot-by-snapshot): one array each for Snapshot 1, Snapshot 2, Snapshot 3 • Vertex data array (Chronos, with time-locality): a vertex's data for Snapshots 1, 2, 3 fit in a cache line
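The difference between the two layouts comes down to index arithmetic. The sketch below counts how many cache lines hold one vertex's values across all snapshots under each layout. The 64-byte cache line, the 8-byte value size, and all function names are assumptions made for illustration.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size
VALUE_BYTES = 8        # assumed size of one vertex value (e.g. a double)

def snapshot_major_index(vertex, snapshot, num_vertices, num_snapshots):
    """Snapshot-by-snapshot layout: one full vertex array per snapshot."""
    return snapshot * num_vertices + vertex

def vertex_major_index(vertex, snapshot, num_vertices, num_snapshots):
    """Time-locality layout: a vertex's values for all snapshots are adjacent."""
    return vertex * num_snapshots + snapshot

def cache_lines_touched(index_fn, vertex, num_vertices, num_snapshots):
    """Count distinct cache lines holding one vertex's values in every snapshot."""
    lines = set()
    for s in range(num_snapshots):
        offset = index_fn(vertex, s, num_vertices, num_snapshots) * VALUE_BYTES
        lines.add(offset // CACHE_LINE_BYTES)
    return len(lines)
```

With 1000 vertices and 3 snapshots, vertex 8 spans 3 cache lines in the snapshot-major layout and 1 in the vertex-major layout. Without padding, some vertices in the vertex-major layout straddle two lines, so a real layout would likely align each group.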
Chronos: Propagation Scheduling • Locality Aware Batch Scheduling (LABS): • Batching propagations across snapshots, e.g., vertex 1 -> vertex 3 across snapshots, then vertex 1 -> vertex 2 across snapshots • Edge array scan • Vertex data array: a vertex's data across snapshots fit in a cache line
Chronos: Propagation Scheduling • Locality Aware Batch Scheduling (LABS): • Batching propagations across snapshots • Edge array scan • Vertex data array: a vertex's data across snapshots fit in a cache line • N propagations • 1 cache miss (the remaining accesses are cache hits)
Chronos: Propagation Scheduling • Locality Aware Batch Scheduling (LABS): • Batching propagations across snapshots • Edge array scan • Vertex data array: accessed in a batch • N propagations • 1 inter-core comm.
LABS: The Key of Chronos • A graph layout • Place together vertex/edge data across snapshots • A scheduling mechanism • Batch propagations across snapshots • Efficient • Reduced cache miss / inter-core comm.
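A minimal sketch of batched propagation, assuming each temporal edge is stored as `(src, dst, bits)` where bit `s` of the integer `bits` is set when the edge exists in snapshot `s`. The names `out_degrees` and `labs_propagate`, the bit order, and the PageRank-style update are illustrative assumptions. Python lists show only the order of accesses, not the actual cache behavior.

```python
def out_degrees(temporal_edges, num_vertices, num_snapshots):
    """Per-vertex, per-snapshot out-degree computed from the edge bitsets."""
    deg = [[0] * num_snapshots for _ in range(num_vertices)]
    for src, _dst, bits in temporal_edges:
        for s in range(num_snapshots):
            deg[src][s] += (bits >> s) & 1
    return deg

def labs_propagate(temporal_edges, values, num_snapshots):
    """Visit each edge once and propagate for every snapshot it exists in.

    values[v][s] is the value of vertex v in snapshot s (vertex-major layout).
    Returns acc[v][s], the sum of contributions received by v in snapshot s.
    """
    num_vertices = len(values)
    deg = out_degrees(temporal_edges, num_vertices, num_snapshots)
    acc = [[0.0] * num_snapshots for _ in range(num_vertices)]
    for src, dst, bits in temporal_edges:
        # The source and target rows are fetched once per edge, then reused
        # for all snapshots in the batch instead of once per snapshot.
        src_row, deg_row, dst_row = values[src], deg[src], acc[dst]
        for s in range(num_snapshots):
            if (bits >> s) & 1:
                dst_row[s] += src_row[s] / deg_row[s]
    return acc
```

The outer loop is over edges and the inner loop is over snapshots. The snapshot-by-snapshot baseline inverts this nesting, so it revisits the same source and target data once per snapshot.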
Experimental Evaluation • Large temporal graphs • Various graph algorithms • PageRank • Weakly-connected components (WCC) • Single-source shortest path (SSSP) • Maximal independent set (MIS) • Sparse matrix-vector multiplication (SpMV) • Settings
Chronos: Single-Thread Effectiveness • 5~9x speedup • Baseline: Snapshot by snapshot (normalized to 1)
Chronos: Single-Thread Effectiveness 92% 70% 95% Reduced cache misses
Chronos: Multi-Core Performance • More than 10x faster
Chronos: Multi-Core Performance 98% 98% 98% Reduced inter-core comm.
More in Paper: • Graph computation modes • All benefit from LABS Push Mode Pull Mode Stream Mode
More in Paper: • Incremental graph computation • Leveraging the previous snapshot’s result • Computing only the changed part • Can be enhanced with LABS
Conclusion • Temporal graph analysis • an emerging class of applications • Chronos • supports analysis of temporal graphs efficiently • Joint design of data layout and scheduling • Leveraging the temporal similarity of graphs • Exploit data locality esp. in time dimension
Thank You! Questions? Tsinghua University University of Science and Technology of China Microsoft Research
BACKUP • Experiment Environment Details • Real Graphs Similarities over Time • Batch Size Discussion • LABS Locking • LABS with Incremental Computation • LABS on Cluster • Related Work
Experiment Setup 1. SSD model: TOSHIBA MK4001GRZB
Temporal Distributions of Graphs • Edges increase gradually
On-disk Temporal Graph • Snapshot groups • Ci: checkpoint of vi (edges without time information) • aij: j-th activity of vi (edge changes, e.g., <addE, (v0, v3, w), t2>)
LABS: In-memory Design • Vertex data array • Edge array: each edge indicates which snapshots it exists in [Figure: the batched layout, shown as logically equal to separate per-snapshot graphs]
Temporal Graph Re-construction • User input time points: 0, 10, 20 • Scan the graph activity log [Type, Endpoints, Time]: (addE, v0->v1, 0), (addE, v0->v2, 15), (addE, v0->v3, 6), (delE, v0->v3, 8) • Temporal edges [Endpoints, BitSet]: (v0->v1, 111), (v0->v2, 001) • v0->v3 is added at 6 and deleted at 8, so it exists at none of the time points and is dropped
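The reconstruction on this slide can be sketched as a single scan of the time-ordered activity log. The function name `reconstruct` and the tuple encoding of activities are assumptions. The sketch also assumes that an activity at time t is already visible in the snapshot taken at t, which matches the slide's example where the edge added at time 0 appears in the snapshot at 0. Bit strings are written with the first time point on the left, as on the slide.

```python
def reconstruct(activities, time_points):
    """Build temporal edges {(src, dst): bitstring} from an activity log.

    activities: list of (kind, src, dst, time) with kind "addE" or "delE".
    time_points: ascending snapshot times requested by the user.
    """
    log = sorted(activities, key=lambda a: a[3])  # scan in time order
    alive = set()   # edges present at the current scan position
    exists = {}     # edge -> one flag per time point
    i = 0
    for k, t in enumerate(time_points):
        # Apply every activity up to and including time t.
        while i < len(log) and log[i][3] <= t:
            kind, src, dst, _time = log[i]
            if kind == "addE":
                alive.add((src, dst))
            elif kind == "delE":
                alive.discard((src, dst))
            i += 1
        # Record which edges exist in the snapshot at time t.
        for edge in alive:
            exists.setdefault(edge, [False] * len(time_points))[k] = True
    return {edge: "".join("1" if f else "0" for f in flags)
            for edge, flags in exists.items()}
```

Edges that exist at none of the requested time points never enter the result, so the in-memory graph holds only the snapshots of interest.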
Chronos System Overview • On-disk temporal graph: contains all the graph evolving activities • User inputs multiple time points • Scan activities (log) and reconstruct graph snapshots • In-memory temporal graph: contains only snapshots of interest • Output: temporal properties
Greater Batch Size of LABS • Pros • Possible to further reduce cache misses / inter-core comm. • Cons • Bit-width limit of the instruction: _BitScanForward64 • Less snapshot similarity within a batch • No more cache misses / inter-core comm. to reduce • False sharing with locking
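The bit-width limit follows from how an edge's snapshot bitset is walked. `_BitScanForward64` is a compiler intrinsic that finds the lowest set bit of a 64-bit value, so a single word can describe at most 64 snapshots per batch. The sketch below is a Python stand-in for that loop, not the engine's code. The names `set_bits` and `MAX_BATCH_SIZE` are hypothetical, and Python integers have no width limit, so the 64-bit cap appears only as a constant.

```python
MAX_BATCH_SIZE = 64  # one 64-bit word per edge bitset (assumed from the intrinsic)

def set_bits(mask):
    """Yield the positions of the set bits in mask, lowest first.

    Mirrors a bit-scan-forward loop: find the lowest set bit, emit its
    index, clear it, and repeat until the mask is empty.
    """
    while mask:
        low = mask & -mask           # isolate the lowest set bit
        yield low.bit_length() - 1   # its index, as a forward bit scan reports
        mask ^= low                  # clear that bit
```

Each yielded index is a snapshot in which the edge exists, so the inner loop runs once per existing snapshot rather than once per snapshot in the batch.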
Compute Snapshot by Snapshot (another way) • Snapshot-Parallelism: Snapshot 1 on Core 0, Snapshot 2 on Core 1, Snapshot 3 on Core 2 • 3 cache misses • 3 inter-core comm.
Parallelization -- Summary • Good partitioning: num. of intra-partition edges > num. of inter-partition edges? • Snapshot by snapshot: • Partition-Parallelism: computing partitions of the same snapshot in parallel • Snapshot-Parallelism: computing snapshots in parallel • LABS: • LABS-Parallel: computing LABS-batched partitions in parallel
LABS Performance on Multi-Core • Baseline: Single core (normalized to 1) • LABS-Parallelism outperforms
LABS Performance on Cluster • A small cluster with 4 machines • Up to 10x speedup • Benefits less than in the single-machine test • The benefit of LABS is hidden by the high network overhead
Reduced Lock Contentions • LABS amortizes the lock cost across snapshots • PageRank on the Wiki graph 96% 96% 96% 95% Reduced the time of locking by more than 95%
LABS with Incremental Computation Incremental Computing • Traditional incremental computing • Incremental computing with LABS Snapshot 1 Snapshot 0 Snapshot 2 Snapshot 3 Apply LABS (BatchSize = 3) Snapshot 1 Snapshot 0 Snapshot 2 Snapshot 3
Gain of Incremental LABS Baseline: Traditional Incremental
Related work • Existing graph engines – static graph engines • Pregel (SIGMOD’10) • PowerGraph (OSDI’12) • GraphLab (VLDB’12) • Grace (ATC’12) • X-Stream (SOSP’13) • … • Active studies on changes and new concepts in evolving graphs • Densification law, shrinking diameters (KDD’05) • PageRank (CIKM’07), Facebook user activities (EuroSys’09), centrality in evolving graphs (MLG’10), retweet after N friends’ retweets (WWW’11), rumor detection (SOMA’10)…