Operating System Support for Fine-Grain Parallelism on Multicore Architectures

John Giacomoni, Manish Vachharajani
University of Colorado at Boulder
2007.10.14

Problem
• Uniprocessor (UP) performance at “end of life”
• Chip-Multiprocessor systems
• What do we want from multicore systems?
• Individual cores less powerful than UP
• Asymmetric and Heterogeneous
• 10s-100s-1000s of cores
Performance!
[Figure labels: Intel (2x2-core), MIT RAW (16-core), 100-core, 400-core]

Extracting Performance
• Task Parallelism
  • Desktop
• Data Parallelism
  • Web serving
  • Split/Join, MapReduce, etc…
• Pipeline Parallelism
  • Video decoding
  • Network processing

Extracting Performance (2)
• Stream Parallelism
  • Combines
    • Data Parallelism
    • Pipeline Parallelism
• Ad-Hoc Parallelism
  • Semi- or unstructured
  • Usual thread model

Focus on Pipeline Parallelism
• Most stringent timing requirements
• Example applications:
  • Network Processing
    • Network Intrusion Detection
    • DDoS Filtering
  • Multimedia processing
    • Transcoding
  • Signal Processing
    • Software Defined Radio
• Also applies to
  • Data parallelism
  • Stream Parallelism

Soft Network Processing (Soft-NP)
• How do we protect?
• GigE Network Properties:
  • 1,488,095 frames/sec
  • 672 ns/frame
  • Frame dependencies
“Frame Shared Memory: Line-Rate Networking on Commodity Hardware”. To appear: Proceedings of the ACM/IEEE Symposium on Architectures for Networking and Communications Systems 2007 (ANCS), December 2007. John Giacomoni, John K. Bennett, Antonio Carzaniga, Douglas C. Sicker, Manish Vachharajani and Alexander L. Wolf.
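The two GigE figures above follow from standard Ethernet framing at the worst case of back-to-back minimum-size frames. The sketch below reproduces the arithmetic; the constant and function names are illustrative and do not come from the slides.

```python
# Worst-case Gigabit Ethernet load: minimum-size frames sent back to back.
LINK_BITS_PER_SEC = 1_000_000_000  # 1 Gb/s line rate

MIN_FRAME_BYTES = 64        # smallest legal Ethernet frame, FCS included
PREAMBLE_BYTES = 8          # preamble plus start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12   # mandatory idle period between frames


def wire_bits_per_frame():
    """Bits of link time consumed by one minimum-size frame."""
    return (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8


def frames_per_second():
    """Whole frames that fit on the link in one second."""
    return LINK_BITS_PER_SEC // wire_bits_per_frame()


def ns_per_frame():
    """Time available to process each frame, in nanoseconds."""
    return wire_bits_per_frame() * 1_000_000_000 // LINK_BITS_PER_SEC


if __name__ == "__main__":
    print(frames_per_second(), "frames/sec")
    print(ns_per_frame(), "ns/frame")
```

Each 64-byte frame occupies 84 bytes of link time, so every pipeline stage has 672 ns to finish its work on a frame before the next one arrives.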

Frame Shared Memory (Soft-NP)
[Figure: diagram labeled Input (IP) and Output (OP)]

What OS support is necessary?

Low-Overhead Communication
• Gigabit Ethernet
• Syscalls ~170 ns
• pthread mutex ~200 ns
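To see why these costs matter, compare them with the 672 ns per-frame budget quoted on the Soft-NP slide. The sketch below does that arithmetic; the helper names are illustrative, and the costs are the approximate figures from the slide.

```python
# Per-frame time budget at GigE line rate (from the Soft-NP slide).
FRAME_BUDGET_NS = 672

# Approximate costs quoted on the slide.
COSTS_NS = {"system call": 170, "pthread mutex": 200}


def budget_share(cost_ns, budget_ns=FRAME_BUDGET_NS):
    """Fraction of the per-frame budget consumed by one operation."""
    return cost_ns / budget_ns


def ops_per_frame(cost_ns, budget_ns=FRAME_BUDGET_NS):
    """How many such operations fit in one frame time, with no other work."""
    return budget_ns // cost_ns


if __name__ == "__main__":
    for name, cost in COSTS_NS.items():
        print(f"{name}: {budget_share(cost):.1%} of budget, "
              f"at most {ops_per_frame(cost)} per frame")
```

A single system call or mutex operation uses roughly a quarter to a third of the frame time, which leaves little room for a stage that must both receive and hand off every frame.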

FastForward
• Portable software-only framework
• ~35-40 ns/queue operation (2.0 GHz AMD Opteron)
• ~26-28 ns/queue operation (2.6 GHz AMD Opteron)
• Architecturally tuned concurrent lock-free (CLF) queues
• Works with strong to weak consistency models
• Hides die-to-die communication
• Robust against unbalanced stages
• Poster: “FastForward for Efficient Pipeline Parallelism”. Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2007. John Giacomoni, Tipp Moseley, Manish Vachharajani.
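The slide does not show how the queue works. A minimal sketch follows, assuming a single-producer/single-consumer ring buffer in which an empty slot marker (here `None`) stands in for a shared full/empty check, so the producer never reads the consumer's index and vice versa. The class and method names are hypothetical. Python shows only the control flow: the cache-line layout, memory ordering, and timing that the slide's numbers depend on are not modeled.

```python
class SlotQueue:
    """Single-producer/single-consumer ring buffer (illustrative sketch).

    A slot holding None is empty. The producer only touches _head and the
    consumer only touches _tail, so neither index is shared between them.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._slots = [None] * capacity
        self._head = 0  # next slot to write (producer only)
        self._tail = 0  # next slot to read (consumer only)

    def try_enqueue(self, item):
        """Store item and return True, or return False if the queue is full."""
        if item is None:
            raise ValueError("None is reserved as the empty-slot marker")
        if self._slots[self._head] is not None:
            return False  # consumer has not yet drained this slot
        self._slots[self._head] = item
        self._head = (self._head + 1) % len(self._slots)
        return True

    def try_dequeue(self):
        """Return the oldest item, or None if the queue is empty."""
        item = self._slots[self._tail]
        if item is None:
            return None  # producer has not yet filled this slot
        self._slots[self._tail] = None  # hand the slot back to the producer
        self._tail = (self._tail + 1) % len(self._slots)
        return item
```

A producer stage would spin on `try_enqueue` and a consumer stage on `try_dequeue`, for example `q = SlotQueue(1024)` shared between two adjacent pipeline stages.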

FastForward Performance
[Figure: chart with series Lamport, FF, FF Unbalanced, FF Re-Balanced]

Zero-Stall Guarantee

Gang Scheduling
• Optimize for application performance
  • Instead of system throughput or fairness
• Computer Utility -> max(System Utilization)
• Multicore system -> excess of resources
• Dedicate resources to pipeline applications
• Want selective timesharing

System Services
• Fast!
• Synchronous calls introduce too much overhead
  • System calls ~170 ns
• Asynchronous calls may limit parallelism
• Want: System services with independent I/O paths

Pipelinable System Services
• Mixing stages from multiple process domains
• Push model vs. call/return or poll
• Hardware can be an active participant

Heterogeneous Gang Scheduling
• Need a single scheduling label for every pipeline stage
  • Ensures simultaneous scheduling of every necessary resource
  • (zero-stall guarantee)
  • Including hardware stages
• Scheduling multi-domain entities
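The all-or-nothing nature of gang scheduling can be shown with a toy model. This is a sketch under stated assumptions, not the scheduler the talk proposes: `Gang` and `place_gang` are hypothetical names, every stage (software or hardware) is assumed to need exactly one free resource, and resources are plain integer IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gang:
    """All stages of one pipeline, grouped under a single scheduling label."""
    label: str
    stages: tuple  # stage names, in pipeline order


def place_gang(gang, free_resources):
    """Assign one resource to every stage, or refuse the whole gang.

    Returns a {stage: resource} mapping when all stages can run at the same
    time, and None otherwise. A partial placement is never returned, because
    a missing stage would stall the rest of the pipeline.
    """
    if len(gang.stages) > len(free_resources):
        return None
    resources = sorted(free_resources)
    return {stage: resources[i] for i, stage in enumerate(gang.stages)}


if __name__ == "__main__":
    pipeline = Gang("soft-np", ("input", "filter", "output"))
    print(place_gang(pipeline, {0, 1}))        # too few resources
    print(place_gang(pipeline, {0, 1, 2, 3}))  # every stage placed together
```

Scheduling by label rather than by thread is what lets the scheduler treat the pipeline as one unit: either every stage is running, or none is.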

Multi-Domain Entities
• Application state
  • Shared with local stages
• Pipeline private state
  • Stage state shared with pipeline and parent process
The multi-domain application model respects the private data model implicit in single-domain applications while providing first-class naming for multi-domain pipelines.

Summary of Discussion
• Low-overhead communication
• Zero-stall guarantee
• Selective timesharing
• Pipelinable system services
• Heterogeneous gang scheduling
• Pipelines as multi-domain applications

Questions? john.giacomoni@colorado.edu
