Algorithm Engineering: "Parallel Search" Stefan Edelkamp
Overview • Motivation • PRAM • Termination • Depth-Slicing • Hash-based Partitioning & Transposition Table Scheduling • Stack Splitting & Parallel Window Search • Parallel Search with Treaps
Parallel Shared Memory Graph Search Single-core CPU Multi-core CPU • Parallelization is important for multi-core CPUs • But parallelizing graph-search algorithms such as breadth-first search, Dijkstra’s algorithm, and A* is challenging… • Issues: Load balancing, Locking, …
Parallel Shared Memory Graph Search Single-core CPU Multi-core GPU • Parallelization is even more important for GPUs • But parallelizing graph-search algorithms such as breadth-first search, Dijkstra’s algorithm, and A* is challenging… • Issues: Kernel Function Design, Load balancing, Locking, …
Parallel External Memory Graph Search Single-core CPU+HDD Multi-core C/GPU+HDD • …
Motivation: Parallel and External Memory Graph Search Synergies: • They need partitioned access to large sets of data. • This data needs to be processed individually. • Limited information transfer between two partitions. • Streaming in external-memory programs relates to communication queues in distributed programs (as communication is often realized on files). • Good external implementations often lead to good parallel implementations.
Parallel Random Access Machine: Concurrent Read/Exclusive Write (CREW PRAM)
Definitions • Problem size • Parallel running time • Work • Sequential time: • Efficiency: • Speedup: In the example • Linear speedup • Efficient parallelization: • In the example
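As a reminder, the usual textbook conventions for these quantities can be sketched as follows. The symbols are assumptions chosen here ($n$ problem size, $p$ number of processors, $T_p$ parallel time, $T_s$ sequential time, $W$ work, $S$ speedup, $E$ efficiency); the slide's own notation may differ.

```latex
\begin{align*}
W(n) &= p \cdot T_p(n) && \text{(work)} \\
S(n) &= \frac{T_s(n)}{T_p(n)} && \text{(speedup)} \\
E(n) &= \frac{T_s(n)}{W(n)} = \frac{S(n)}{p} && \text{(efficiency)}
\end{align*}
```

Under these conventions a speedup is called linear when $S(n) = \Theta(p)$, which corresponds to an efficiency bounded below by a constant.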
Application • Using a treap, the need for exclusive locks can be alleviated to some extent. • Each operation on the treap manipulates the data structure in the same top-down direction. • Moreover, it can be decomposed into successive elementary operations. Tree partial locking protocol: Every process holds exclusive access to a sliding window of nodes in the tree. It can move this window down a path in the tree, which allows other processes to access different, non-overlapping windows at the same time. • Parallel search using a treap with partial locking has been tested for the Fifteen-Puzzle on different architectures, with speedups between 2 and 5 on 8 processors.
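The sliding-window idea can be illustrated with hand-over-hand locking (lock coupling). The following is a minimal sketch under simplifying assumptions, not the protocol evaluated on the slide: it uses a plain unbalanced binary search tree rather than a treap (no priorities, no rotations), a window of at most two nodes, and hypothetical names (`Node`, `LockCoupledTree`, `demo`).

```python
import threading


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.lock = threading.Lock()  # one lock per node


class LockCoupledTree:
    """Plain BST (assumption: no treap priorities) with top-down lock coupling."""

    def __init__(self):
        self.root = None
        self.root_lock = threading.Lock()  # guards the root pointer only

    def insert(self, key):
        """Insert key; return False if it was already present."""
        self.root_lock.acquire()
        if self.root is None:
            self.root = Node(key)
            self.root_lock.release()
            return True
        node = self.root
        node.lock.acquire()       # window now covers the root node
        self.root_lock.release()
        while True:
            if key == node.key:
                node.lock.release()
                return False
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(key))
                node.lock.release()
                return True
            child.lock.acquire()  # extend the window downwards ...
            node.lock.release()   # ... then drop its upper end
            node = child

    def keys(self):
        """In-order keys; not thread-safe, call only when no inserts run."""
        out, stack, node = [], [], self.root
        while stack or node:
            while node:
                stack.append(node)
                node = node.left
            node = stack.pop()
            out.append(node.key)
            node = node.right
        return out


def demo(workers=4, n=200):
    tree = LockCoupledTree()

    def worker(start):
        for k in range(start, n, workers):
            tree.insert((k * 37) % n)  # scrambled insertion order

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tree.keys()
```

Because every thread acquires locks strictly top-down, two windows can never wait on each other in a cycle, and threads working in different subtrees do not block one another.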
Self-Adjusting Trees via the Splay Operation • See extra slides
Parallel External-Memory Graph Search • Motivation Shared and Distributed Environments • Parallel Delayed Duplicate Detection • Parallel Expansion • Distributed Sorting • Parallel Structured Duplicate Detection • Finding Disjoint Duplicate Detection Scopes • Locking
Distributed Search over the Network • Distributed setting provides more space. • Experiments show that internal time dominates I/O.
Exploiting Independence • Since the states in a bucket are independent of one another, they can be expanded in parallel. • Duplicate removal can be distributed across different processors. • Bulk (streamed) transfers are much better than single ones.
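A minimal sketch of these three points, assuming a toy successor function on integers and an in-memory bucket (both hypothetical): the bucket is cut into chunks that are expanded independently, successors are handed over in whole batches, and each worker removes duplicates only within its own hash partition. Threads are used for brevity; a real implementation would use processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor


def successors(state):
    # Hypothetical toy successor function; stands in for the real domain.
    return [(state + 1) % 50, (state * 3) % 50]


def expand_chunk(chunk):
    # Expand all states of one chunk; no coordination with other chunks needed.
    out = []
    for state in chunk:
        out.extend(successors(state))
    return out


def dedup_partition(states):
    # Duplicate removal restricted to one hash partition.
    return sorted(set(states))


def expand_bucket(bucket, workers=4):
    chunks = [bucket[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # 1. Parallel expansion, results returned as whole batches (bulk transfer).
        batches = list(pool.map(expand_chunk, chunks))
        # 2. Route every successor to the partition that owns its hash value.
        partitions = [[] for _ in range(workers)]
        for batch in batches:
            for state in batch:
                partitions[state % workers].append(state)
        # 3. Each partition is cleaned independently of the others.
        cleaned = list(pool.map(dedup_partition, partitions))
    return sorted(s for part in cleaned for s in part)
```

Since equal states always land in the same partition, no duplicate can survive across partitions, and no partition needs to see the others' data.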
Parallel Breadth-First Frontier Search: Enumerating the 15-Puzzle • A hash function partitions both layers into files. • When a layer is done, child files are renamed into parent files. • For parallel processing, a work queue contains parent files waiting to be expanded and child files waiting to be merged.
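Distributed Queue for Parallel Best-First Search • Beware of the mutual exclusion problem! • Queue entries have the form <g, h, start byte, size>. • [Figure: processors P0, P1, P2 and a queue with TOP marker; entries shown: <15, 34, 0, 100>, <15, 34, 20, 100>, <15, 34, 40, 100>, <15, 34, 60, 100>]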
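The mutual exclusion issue can be sketched with a shared queue whose top entry may only be removed under a lock. This is an illustrative sketch with hypothetical names (`WorkQueue`, `run`), using threads and an in-memory queue in place of the distributed, file-based one; entries are `(g, h, start_byte, size)` tuples as on the slide.

```python
import threading
from collections import deque


class WorkQueue:
    """Shared queue of (g, h, start_byte, size) entries."""

    def __init__(self, entries):
        self._entries = deque(entries)
        self._lock = threading.Lock()

    def pop_top(self):
        # Without the lock, two workers could read the same TOP entry
        # and process the same file segment twice.
        with self._lock:
            return self._entries.popleft() if self._entries else None


def run(entries, workers=3):
    queue = WorkQueue(entries)
    taken = [[] for _ in range(workers)]  # what each worker P_i received

    def worker(pid):
        while True:
            entry = queue.pop_top()
            if entry is None:
                return
            taken[pid].append(entry)  # a real worker would expand the segment

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken
```

Which worker receives which entry depends on scheduling; the lock only guarantees that every entry is handed out exactly once.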
Distributed Delayed Duplicate Detection: Single Files • Each state can appear several times in a bucket. • A bucket has to be searched completely for duplicates. • Problem: concurrent writes! • [Figure: processors P0, P1, P2, P3 with sorted buffers; label GOAL]
Multiple Processors, Multiple Disks Variant • Sorted buffers w.r.t. the hash value • Divide w.r.t. the hash ranges (h0 … hk-1, hk … hl-1) • Sorted buffers from every processor • Sorted file • [Figure: processors P1, P2, P3, P4; sorted files]
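One way to read this variant is sketched below: every processor sorts its buffer by hash value and cuts it at the hash-range boundaries, and the owner of each range merges the sorted pieces it receives from all processors into one duplicate-free sorted file. The hash function, the boundaries, and all names are assumptions for illustration, and the processors are simulated sequentially with lists in place of files.

```python
import bisect
import heapq


def hash_value(state):
    # Hypothetical hash function with values in 0..15.
    return state % 16


def sort_key(state):
    # Order by hash value first, so that hash ranges are contiguous.
    return (hash_value(state), state)


def split_by_range(buffer, boundaries):
    """Sort one processor's buffer by hash and cut it into one piece per range.

    boundaries = [b0, b1, ..., bm] defines the ranges [b0, b1), ..., [b(m-1), bm).
    """
    pieces = [[] for _ in range(len(boundaries) - 1)]
    for state in sorted(buffer, key=sort_key):
        r = bisect.bisect_right(boundaries, hash_value(state)) - 1
        pieces[r].append(state)  # appended in sorted order, so pieces stay sorted
    return pieces


def merge_range(sorted_pieces):
    """Merge the sorted pieces of one hash range and drop duplicates."""
    out = []
    for state in heapq.merge(*sorted_pieces, key=sort_key):
        if not out or out[-1] != state:  # equal states are adjacent after merging
            out.append(state)
    return out


def distributed_sort(buffers, boundaries):
    split = [split_by_range(b, boundaries) for b in buffers]
    ranges = len(boundaries) - 1
    return [merge_range([pieces[r] for pieces in split]) for r in range(ranges)]
```

Because each hash range is written by exactly one owner, the concurrent-write problem of the single-file variant does not arise in this scheme.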