Graph Algorithms for Irregular, Unstructured Data
John Feo
Center for Adaptive Supercomputing Software
Pacific Northwest National Laboratory
July, 2010

Analytic methods and applications
[Figure: example graphs. Recoverable labels: Semantic Web; FaceBook - 300 M users; Train, Anthrax, Bus, Money, Endo, Hayashi, Zaire; People, Places, & Actions]
• Community Activities
• Security
• National Security
• SmartGrid
• Blog Analysis
• Anomaly detection
• Connect-the-dots
• N-x contingency analysis
• Community thought leaders

Data analytics
1000x growth in 3 years!
• Facebook has more than 300 million active users
• Sample queries:
  • Allegiance switching: identify entities that switch communities
  • Community structure: identify the genesis and dissipation of communities
  • Phase change: identify significant change in the network structure
• Traditional graph partitioning often fails:
  • Topology: the interaction graph is low-diameter and has no good separators
  • Irregularity: communities are not uniform in size
  • Overlap: individuals are members of one or more communities

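The "allegiance switching" query can be illustrated with a minimal sketch. It assumes two snapshots of community assignments stored as plain dictionaries, and it assumes community labels mean the same thing in both snapshots; the function name `switched_entities` and the sample data are hypothetical, not taken from the talk.

```python
def switched_entities(before, after):
    """Return entities present in both snapshots whose community label changed.

    Assumption: labels are comparable across snapshots (label "A" denotes
    the same community in `before` and `after`).
    """
    return sorted(
        entity
        for entity in before.keys() & after.keys()
        if before[entity] != after[entity]
    )


# Hypothetical snapshots: entity -> community label.
snapshot_t0 = {"ann": "A", "bob": "A", "cy": "B", "dee": "B"}
snapshot_t1 = {"ann": "A", "bob": "B", "cy": "B", "eve": "A"}

print(switched_entities(snapshot_t0, snapshot_t1))  # ['bob']
```

Real community detection rarely yields stable labels over time, so a practical version would first have to match communities between snapshots.
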
Graphs are not grids
Graphs arising in informatics are very different from the grids used in scientific computing
Scientific Grids
• Static or slowly evolving
• Planar
• Nearest neighbor communication
• Work performed per cell or node
• Work modifies local data
Graphs for Data Informatics
• Dynamic
• Non-planar
• Communications are non-local and dynamic
• Work performed by crawlers or autonomous agents
• Work modifies data in many places

Small-world and scale-free
“Six degrees of separation”
[Figure: scale-free graph. Large hubs are in grey]
• In scale-free graphs
  • difficult to partition
  • work concentrates in a few nodes
• In low diameter graphs
  • work explodes
  • difficult to partition
  • high percentage of nodes are visited

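A small sketch can make the "work explodes" point concrete. It counts how many vertices a breadth-first search first reaches at each depth, on a plain ring and on the same ring with one added hub vertex. The graphs and the helper names (`bfs_levels`, `ring`, `ring_with_hub`) are illustrative assumptions, not examples from the talk.

```python
from collections import deque


def bfs_levels(adj, source):
    """Return a list whose d-th entry is the number of vertices at BFS depth d."""
    depth = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                frontier.append(v)
    counts = [0] * (max(depth.values()) + 1)
    for d in depth.values():
        counts[d] += 1
    return counts


def ring(n):
    """High-diameter graph: each vertex touches only its two neighbours."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def ring_with_hub(n):
    """Same ring plus one hub (vertex n) connected to every other vertex."""
    adj = ring(n)
    adj[n] = list(range(n))
    for i in range(n):
        adj[i].append(n)
    return adj


print(bfs_levels(ring(16), 0))           # [1, 2, 2, 2, 2, 2, 2, 2, 1]
print(bfs_levels(ring_with_hub(16), 0))  # [1, 3, 13]
```

With the hub, the search touches every vertex within two steps and almost all of that work arrives in a single frontier, most of it generated by one vertex.
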
Graph methods
• Paths
  • Shortest path
  • Betweenness
  • Min/max flow
• Structures
  • Spanning trees
  • Connected components
  • Graph isomorphism
• Groups
  • Matching/Coloring
  • Partitioning
  • Equivalence
Influential Factors
• Degree distribution
  • Normal
  • Scale-free
• Planar or non-planar
• Static or dynamic
• Weighted or unweighted
  • Weight distribution
• Typed or untyped edges
[Slide callouts: Load imbalance; Non-planar; Difficult to partition; Concurrent inserts and deletions]

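As one concrete instance of the "Structures" family, connected components can be sketched with a union-find structure. This is a generic sequential textbook approach chosen for brevity, not the method used in the talk; the function name and the sample edge list are assumptions.

```python
def connected_components(num_vertices, edges):
    """Label each vertex with the smallest vertex id in its component."""
    parent = list(range(num_vertices))

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            # Keep the smaller id as root so labels are deterministic.
            parent[max(root_u, root_v)] = min(root_u, root_v)

    return [find(v) for v in range(num_vertices)]


# Hypothetical graph: two small components and one isolated vertex.
print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

Every edge may touch two arbitrary entries of `parent`, which is the single-word, low-locality access pattern the talk goes on to describe.
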
Challenges
• Problem size
  • Ton of bytes, not ton of flops
• Little data locality
  • Have only parallelism to tolerate latencies
• Low computation to communication ratio
  • Single word access
  • Threads limited by loads and stores
• Frequent synchronization
  • Node, edge, record
• Work tends to be dynamic and imbalanced
  • Let any processor execute any thread

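The last point, letting any processor execute any thread, can be approximated in software with a shared work queue from which idle workers pull the next task. The sketch below uses Python threads purely to show the scheduling pattern; the function `run_dynamic`, the task format, and the per-vertex "work" (summing neighbour ids) are illustrative assumptions.

```python
import queue
import threading


def run_dynamic(tasks, num_workers=4):
    """Process (vertex, neighbours) tasks from one shared queue.

    No task is bound to a worker in advance: whichever worker is idle
    takes the next task, so one very large task does not stall the rest.
    """
    work = queue.Queue()
    for task in tasks:
        work.put(task)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                vertex, neighbours = work.get_nowait()
            except queue.Empty:
                return  # all tasks were queued up front, so empty means done
            total = sum(neighbours)  # stand-in for real per-vertex work
            with results_lock:
                results[vertex] = total

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical imbalanced workload: vertex 0 is a hub with 1000 neighbours.
tasks = [(0, list(range(1000))), (1, [2]), (2, [1, 3]), (3, [2])]
print(run_dynamic(tasks))
```

A static split of vertices across workers would leave whichever worker owns the hub far busier than the others; the shared queue avoids deciding that ownership ahead of time.
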
Grids, Uniform, and Scale-Free Graphs
[Figure: METIS Partitioner results. Panels: USA Roadmap, Uniform, Scale-Free]

System requirements
[Figure label: Cray XMT]
• Global shared memory
  • No simple data partitions
  • Local storage for thread private data
• Network support for single word accesses
  • Transfer multiple words when locality exists
• Multi-threaded processors
  • Hide latency with parallelism
  • Single cycle context switching
  • Multiple outstanding loads and stores per thread
• Full-and-empty bits
  • Efficient synchronization
  • Wait in memory
• Message driven operations
  • Dynamic work queues
  • Hardware support for thread migration

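The semantics of full-and-empty bits can be emulated in software to show the idea: a word carries a full/empty tag, a write waits until the word is empty, and a read waits until it is full. The class below is a rough software analogy built on a condition variable. The class and method names are hypothetical and do not reproduce any Cray XMT interface, and a hardware implementation waits in memory rather than inside a lock.

```python
import threading


class FullEmptyWord:
    """One memory word with a full/empty tag (software emulation only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._value = None

    def write_when_empty(self, value):
        """Block until the word is empty, store the value, mark it full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_and_empty(self):
        """Block until the word is full, take the value, mark it empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            value = self._value
            self._full = False
            self._cond.notify_all()
            return value


def demo(count=5):
    """Pass `count` values from a producer to a consumer through one word."""
    word = FullEmptyWord()
    received = []

    def producer():
        for i in range(count):
            word.write_when_empty(i)

    def consumer():
        for _ in range(count):
            received.append(word.read_and_empty())

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(demo())  # [0, 1, 2, 3, 4]
```

The tag on the word is the only coordination between producer and consumer, which is what makes this style suited to fine-grained synchronization on individual nodes, edges, or records.
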
Center for Adaptive Supercomputing Software
Sponsored by DOD
Driving Development of Next-Generation Massively Multithreaded Architectures

Summary
• The new HPC is irregular and sparse
• There are commercial and consumer applications
• If the applications are important enough, machines will be built
• HPC is too large and too diverse for “one size fits all”
• We need to build the right machines for the problems we have to solve