Algorithm Engineering: "Parallel Algorithms"
Stefan Edelkamp
Overview
• Parallel external search
• Parallel delayed duplicate elimination
• Parallel expansion
• Distributed sorting
• Parallel structured duplicate elimination
• Disjoint duplicate-detection scopes
• "Locks"
• Parallel algorithms
• Matrix multiplication
• List ranking
• Euler tour
Distributed Search
• A distributed setting provides more space.
• Experiments show that internal computation time dominates I/O time.
Exploiting Independence
• Since the states in a bucket are independent of each other, they can be expanded in parallel.
• Duplicate removal can be distributed over different processors.
• Bulk (streamed) transfers are much better than single ones.
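
The independence argument above can be illustrated with a minimal sketch. It assumes integer states and a made-up `successors` function; `expand_chunk` and `expand_bucket` are hypothetical names, and threads stand in for the processors of the slide. Each worker expands one contiguous chunk and hands its results back in bulk; duplicates are removed afterwards on the sorted output.

```python
from concurrent.futures import ThreadPoolExecutor

def successors(state):
    """Hypothetical successor function on integer states (illustration only)."""
    return [state + 1, state * 2]

def expand_chunk(chunk):
    """Expand all states of one chunk and return the successors in bulk."""
    out = []
    for s in chunk:
        out.extend(successors(s))
    return out

def expand_bucket(bucket, workers=4):
    """Split a bucket into contiguous chunks and expand the chunks in parallel."""
    size = max(1, (len(bucket) + workers - 1) // workers)
    chunks = [bucket[i:i + size] for i in range(0, len(bucket), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(expand_chunk, chunks))
    # Delayed duplicate removal: sort the merged output, then drop repeats.
    merged = sorted(s for part in results for s in part)
    return [s for i, s in enumerate(merged) if i == 0 or s != merged[i - 1]]
```

In a real distributed setting the chunks would be file segments and the workers separate machines; the sketch only shows that no coordination is needed while a chunk is being expanded.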
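Distributed Queue for Parallel Best-First Search
• Queue entries have the form <g, h, start byte, size>.
• Beware of the mutual exclusion problem!
[Figure: processors P0, P1, P2 and a queue with a TOP pointer; entries shown: <15, 34, 0, 100>, <15, 34, 20, 100>, <15, 34, 40, 100>, <15, 34, 60, 100>]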
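
The mutual exclusion problem mentioned on this slide can be sketched as follows. This is an illustration under assumptions, not the implementation behind the slides: `ChunkQueue`, `worker`, and `run` are hypothetical names, threads stand in for the processors, and a single lock guards the top of the queue so that each `(g, h, start_byte, size)` entry is claimed by exactly one worker.

```python
import threading
from collections import deque

class ChunkQueue:
    """Queue of (g, h, start_byte, size) work descriptors guarded by one lock."""

    def __init__(self, entries):
        self._entries = deque(entries)
        self._lock = threading.Lock()

    def pop_top(self):
        """Atomically remove and return the top entry, or None if the queue is empty."""
        with self._lock:
            return self._entries.popleft() if self._entries else None

def worker(queue, claimed):
    """Claim entries until the queue is empty; `claimed` is private to this worker."""
    while True:
        entry = queue.pop_top()
        if entry is None:
            return
        claimed.append(entry)

def run(entries, n_workers=3):
    """Let n_workers threads drain the queue; return the entries claimed per worker."""
    queue = ChunkQueue(entries)
    claimed = {f"P{i}": [] for i in range(n_workers)}
    threads = [threading.Thread(target=worker, args=(queue, claimed[name]))
               for name in claimed]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Without the lock, two workers could read the same top entry and expand the same file segment twice.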
Multiple Processors, Multiple Disks Variant
[Figure: processors P1 to P4; each processor holds buffers sorted w.r.t. the hash value; the sorted files are divided w.r.t. the hash ranges h_0 … h_{k-1}, h_k … h_{l-1}; the sorted buffers from every processor for one range are merged into one sorted file]
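
The scheme in the figure can be sketched in a few lines. The sketch assumes that states are represented by their integer hash values and that all values lie inside the given ranges; `split_by_ranges` and `distributed_sort` are hypothetical names, and the "processors" are simulated sequentially.

```python
import bisect
import heapq

def split_by_ranges(sorted_buffer, boundaries):
    """Cut a sorted buffer into one slice per range [boundaries[i], boundaries[i+1])."""
    cuts = [bisect.bisect_left(sorted_buffer, b) for b in boundaries]
    return [sorted_buffer[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]

def distributed_sort(buffers, boundaries):
    """Return one sorted, duplicate-free file per hash range."""
    # Phase 1: every processor sorts its own buffer and divides it by hash range.
    parts = [split_by_ranges(sorted(buf), boundaries) for buf in buffers]
    # Phase 2: the owner of range i merges slice i from every processor.
    files = []
    for i in range(len(boundaries) - 1):
        out = []
        for h in heapq.merge(*(p[i] for p in parts)):
            if not out or out[-1] != h:  # drop duplicates while merging
                out.append(h)
        files.append(out)
    return files
```

Because equal hash values always fall into the same range, duplicates meet at the same owner and can be removed during the merge without any further communication.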
Distributed Heuristic Evaluation
• Assume one child processor for each tile and one master processor.
[Figure: blocks B0 to B15]
Distributed Pattern Database Search
• Only pattern databases (PDBs) that include the client's tile need to be loaded on the client.
• Because a pattern contains multiple tiles, from a bird's-eye view each PDB is loaded multiple times.
• In the 15-Puzzle with corner and fringe PDBs, this saves RAM on the order of a factor of 2 on each machine, compared to loading all PDBs.
• In the 36-Puzzle with 6-tile pattern databases, this saves RAM on the order of a factor of 6 on each machine, compared to loading all PDBs.
• Extends to additive pattern databases.
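
The loading rule above can be made concrete with a small sketch. The partition of the 35 tiles into groups of at most six is a hypothetical example chosen for illustration, not the pattern set used in the experiments; `pdbs_for_tile` and `saving_factor` are assumed names, and the factor simply counts databases, which presumes that all PDBs are of similar size.

```python
def pdbs_for_tile(patterns, tile):
    """Names of the pattern databases whose pattern contains the given tile."""
    return [name for name, tiles in patterns.items() if tile in tiles]

def saving_factor(patterns, tile):
    """Ratio of all PDBs to the PDBs that the client for `tile` has to load."""
    return len(patterns) / len(pdbs_for_tile(patterns, tile))

# Hypothetical disjoint partition of tiles 1..35 into groups of at most six tiles.
tiles = list(range(1, 36))
patterns_36 = {f"pdb{i // 6}": set(tiles[i:i + 6]) for i in range(0, len(tiles), 6)}
```

With this partition, the client responsible for tile 8 loads only `pdb1`, so `saving_factor(patterns_36, 8)` evaluates to 6.0, in line with the order of magnitude stated on the slide.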
Bottleneck: Duplicate Detection
• Duplicate paths cause parallelization overhead.
• Same bottleneck in external-memory search.
[Figure: nodes A, B, C, D with duplicate paths leading to D; internal memory (fast) vs. external memory (slow)]
Disjoint duplicate-detection scopes
[Figure: blocks B0 to B15 with several duplicate-detection scopes highlighted]
Finding disjoint duplicate-detection scopes
[Figure: blocks B0 to B15, each annotated with an integer counter between 0 and 4]
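
One way to maintain such counters is sketched below. This is a reconstruction under assumptions, not necessarily the scheme in the figure: the abstract graph is taken to be a 4-connected 4×4 grid, the scope of a block is the block plus its neighbours, and `sigma[b]` counts the currently held blocks whose scope overlaps the scope of `b`. A block can be handed to a processor only while its counter is 0. `ScopeManager` and its methods are hypothetical names.

```python
import threading

N = 4  # assumed 4x4 grid of abstract nodes B0..B15

def neighbours(b):
    """Grid neighbours of block b (assumed 4-connected abstract graph)."""
    r, c = divmod(b, N)
    out = []
    if r > 0:
        out.append(b - N)
    if r < N - 1:
        out.append(b + N)
    if c > 0:
        out.append(b - 1)
    if c < N - 1:
        out.append(b + 1)
    return out

def scope(b):
    """Duplicate-detection scope of b: the block itself plus its neighbours."""
    return {b, *neighbours(b)}

class ScopeManager:
    """Hands out blocks whose scopes are pairwise disjoint, using a single lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.sigma = [0] * (N * N)
        # Blocks whose scope overlaps the scope of a (this includes a itself).
        self.interferes = [[b for b in range(N * N) if scope(a) & scope(b)]
                           for a in range(N * N)]

    def try_acquire(self, b):
        """Acquire block b if no held block has an overlapping scope."""
        with self.lock:
            if self.sigma[b] != 0:
                return False
            for other in self.interferes[b]:
                self.sigma[other] += 1
            return True

    def release(self, b):
        """Release a previously acquired block b."""
        with self.lock:
            for other in self.interferes[b]:
                self.sigma[other] -= 1
```

For example, after B0 has been acquired, B1 and B2 are refused because their scopes share B1 with the scope of B0, while B3 can still be acquired.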
Implementation of Parallel SDD
Requires only one mutex ("lock").
• Hierarchical organization of hash tables
  • One hash table for each abstract node
  • Top-level hash function = state-space projection function
• Shared-memory management
  • Minimum memory-allocation size m
  • Memory wasted is bounded by O(m · #processors)
• External-memory version
  • I/O-efficient order of node expansions
  • I/O-efficient replacement strategy
[Figure: blocks B0 to B15]
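
The hierarchical organization of hash tables can be sketched as follows. The sketch assumes sliding-tile states stored as tuples with 0 for the blank, and uses the blank position as the state-space projection; both choices, and the names `HierarchicalTable` and `blank_position`, are illustrative assumptions.

```python
class HierarchicalTable:
    """One hash table per abstract node, selected by a projection function."""

    def __init__(self, project):
        self.project = project  # top-level hash function: state -> abstract node
        self.tables = {}        # abstract node -> set of stored states

    def insert(self, state):
        """Store the state; return True if it is new, False if it is a duplicate."""
        table = self.tables.setdefault(self.project(state), set())
        if state in table:
            return False
        table.add(state)
        return True

def blank_position(state):
    """Assumed projection for sliding-tile puzzles: index of the blank (0)."""
    return state.index(0)
```

A duplicate check for a state touches only the table of its abstract node, so a processor that holds a duplicate-detection scope needs access to just the tables inside that scope.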