Fabrizio Petrini, IBM TJ Watson
Scalable Algorithms for Massive-scale Graphs
KAUST, SC11, November 2011
[Image: LinkedIn network visualization]
Data in the real world: Large Complex Networks

Internet: nodes are the routers, edges are the connections between them. Problems include discovering the network topology and solving bandwidth, flow, and shortest-path problems (1.8 billion Internet users and growing).

Network security: large graphs can be used to dynamically track network behavior and identify potential attacks.

Domains: Social Networks, Internet, Finance, Biological, Transportation

Data sources: DIMACS, Wikipedia
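Shortest paths on unweighted graphs like these reduce to breadth-first search, the kernel at the heart of Graph500. For reference, a minimal sequential BFS over a CSR adjacency list might look like the sketch below; this is illustrative only, not the scalable distributed implementation this talk is about:

```c
#include <stdlib.h>

/* Sequential BFS over a CSR adjacency list: fills dist[] with hop
 * distances from src (-1 = unreachable). Illustrative sketch only --
 * not the distributed Graph500 implementation discussed in this talk. */
void bfs(int n, const int *row, const int *col, int src, int *dist) {
    int *queue = malloc(n * sizeof *queue);
    int head = 0, tail = 0;
    for (int v = 0; v < n; v++) dist[v] = -1;
    dist[src] = 0;
    queue[tail++] = src;
    while (head < tail) {
        int u = queue[head++];
        for (int e = row[u]; e < row[u + 1]; e++) {  /* neighbors of u */
            int v = col[e];
            if (dist[v] == -1) {        /* first visit sets the distance */
                dist[v] = dist[u] + 1;
                queue[tail++] = v;
            }
        }
    }
    free(queue);
}
```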
Data in the real world: Large Complex Networks

Social networks: nodes represent people, joined through personal or professional acquaintance, or through groups and communities.
• Facebook: grew from 100 million to 150 million users in 4 months; more than 400 million users currently
• Twitter: 75 million users (Jan 2010), adding 6.2 million new users per month

Data sources: Nielsen Report, The Inquirer
Blue Gene/Q packaging hierarchy

1. Chip: 16+2 PowerPC cores
2. Single chip module
3. Compute card: one chip module, 8/16 GB DDR3 memory, heat spreader for H2O cooling
4. Node card: 32 compute cards, optical modules, link chips; 5D torus
5a. Midplane: 16 node cards
5b. I/O drawer: 8 I/O cards w/ 16 GB, 8 PCIe Gen2 x8 slots, 3D I/O torus
6. Rack: 2 midplanes
7. System: 96 racks, 20 PF/s

• Sustained single-node performance: 10x Blue Gene/P, 20x Blue Gene/L
• MF/Watt: 6x Blue Gene/P, 10x Blue Gene/L (~2 GF/W, Green 500 criteria)
• Software and hardware support for programming models that exploit node hardware concurrency
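The 20 PF/s system figure follows directly from the packaging counts above. A quick arithmetic check (the constants come from this slide and the chip slide below; only the conversion is ours):

```c
#include <stdio.h>

/* Blue Gene/Q packaging arithmetic, using the counts from the slide. */
int main(void) {
    const long nodes = 96L      /* racks */
                     * 2        /* midplanes per rack */
                     * 16       /* node cards per midplane */
                     * 32;      /* compute cards (nodes) per node card */
    const double gflops_per_node = 204.8;  /* peak, from the chip slide */
    printf("nodes = %ld, peak = %.1f PF/s\n",
           nodes, nodes * gflops_per_node / 1.0e6);
    /* prints: nodes = 98304, peak = 20.1 PF/s -- the slide's 20 PF/s */
    return 0;
}
```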
Message latency and message rate

[Chart: single-threaded latency and message-rate breakdown per messaging stage; individual figures (134, 245, 530 pclk, 779 pclk, 1150 pclk, 1184 pclk) lost their stage labels in extraction]

• Total MPI latency: 2864 pclk
• Total MPI overhead: 1172 pclk
• Our Graph500 implementation relies on SPI rather than MPI
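For context on how such figures are typically gathered (this is a generic benchmark, not the authors' SPI-level harness), a standard MPI ping-pong estimates one-way latency as half the averaged round trip; multiplying by the 1.6 GHz clock converts seconds to pclk:

```c
#include <mpi.h>
#include <stdio.h>

/* MPI ping-pong latency sketch -- a generic benchmark, not the harness
 * behind the slide's figures. One-way latency is half the averaged
 * round-trip time; x 1.6e9 converts seconds to BG/Q pclk. */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    if (rank == 0) {
        double one_way_s = (MPI_Wtime() - t0) / (2.0 * iters);
        printf("one-way latency: %.0f pclk\n", one_way_s * 1.6e9);
    }
    MPI_Finalize();
    return 0;
}
```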
Blue Gene/Q chip architecture

[Block diagram: 17 PPC cores, each with L1 cache, L1 prefetcher (PF), and quad FPU, connected through a full crossbar switch to 16 x 2 MB L2 slices, two DDR-3 controllers driving external DDR3, the network/DMA unit, test logic, and PCI Express; a 2 GB/s I/O link connects to the I/O subsystem. Note: chip I/O shares function with PCI Express.]

• 16+1 core SMP; each core is 4-way hardware threaded
• Transactional memory and thread-level speculation
• Quad floating-point unit on each core; 204.8 GF peak per node
• Frequency: 1.6 GHz
• 563 GB/s bisection bandwidth to the shared L2 (Blue Gene/L at LLNL has 700 GB/s for the entire system)
• 32 MB shared L2 cache
• 42.6 GB/s DDR3 bandwidth (1.333 GHz DDR3; 2 channels, each with chipkill protection)
• 10 intra-rack interprocessor links at 2.0 GB/s each (10 x 2 GB/s intra-rack and inter-rack, 5-D torus)
• One I/O link at 2.0 GB/s
• 16 GB memory per node
• 55 W chip power
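The headline peak and memory-bandwidth figures are consistent with the clock and unit widths above. A quick check, assuming the quad FPU issues one 4-wide fused multiply-add per core per cycle and 16-byte-wide DDR3 channels (the usual way these peaks are counted):

```c
#include <stdio.h>

/* Sanity-check the slide's headline figures from clock and widths.
 * Assumes one 4-wide FMA per core per cycle (2 flops per lane) and
 * two 16-byte-wide DDR3 channels at 1.333 GT/s. */
int main(void) {
    double node_gf = 16    /* cores */
                   * 4     /* FMA lanes */
                   * 2     /* flops per FMA */
                   * 1.6;  /* GHz */
    double ddr_gbs = 2 * 16 * 1.333;  /* channels x bytes x GT/s */
    printf("peak: %.1f GF/node, DDR3: %.1f GB/s\n", node_gf, ddr_gbs);
    /* prints: peak: 204.8 GF/node, DDR3: 42.7 GB/s
       (the slide rounds the latter to 42.6 GB/s) */
    return 0;
}
```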
Scalable atomic operations (example: fetch_and_inc implementing a queueing lock)

[Block diagram: PPC cores with L1/PF and FPUs reaching the 2 MB L2 slices through the crossbar switch; numbered labels mark the steps of the atomic operation]

• Latency: 1 L2 round trip + 4 L2 cycles per thread, where N is the number of threads
• For N = 64 threads and a 75-cycle L2 round trip: 75 + 4 x 64 = 331 cycles
• Compared to ~9600 cycles for the standard implementation
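A queueing lock built on fetch_and_inc is commonly realized as a ticket lock. The sketch below uses portable C11 atomics; it is a generic software rendering, not the BG/Q mechanism measured above, where the L2 itself performs the atomic update and serves waiting threads:

```c
#include <stdatomic.h>

/* Ticket (queueing) lock built on fetch_and_inc -- a portable C11
 * sketch, not the BG/Q L2 hardware mechanism measured on the slide.
 * Each thread atomically takes a ticket, then waits for its turn;
 * the lock is granted in FIFO order. Initialize both fields to 0. */
typedef struct {
    atomic_uint next_ticket;   /* target of the fetch_and_inc */
    atomic_uint now_serving;   /* ticket currently holding the lock */
} ticket_lock;

static void ticket_lock_acquire(ticket_lock *l) {
    unsigned me = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    while (atomic_load_explicit(&l->now_serving,
                                memory_order_acquire) != me)
        ;  /* spin until our ticket is served */
}

static void ticket_lock_release(ticket_lock *l) {
    atomic_fetch_add_explicit(&l->now_serving, 1, memory_order_release);
}
```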