
Operating System Support for Fine-Grain Parallelism on Multicore Architectures

John Giacomoni, Manish Vachharajani. University of Colorado at Boulder. 2007.10.14.


Presentation Transcript


  1. Operating System Support for Fine-Grain Parallelism on Multicore Architectures • John Giacomoni, Manish Vachharajani • University of Colorado at Boulder • 2007.10.14

  2. Problem • Uniprocessor (UP) performance at “end of life” • Chip-multiprocessor systems • What do we want from multicore systems? • Individual cores less powerful than UP • Asymmetric and heterogeneous • 10s-100s-1000s of cores • Performance! [Figure labels: Intel (2x2-core), MIT RAW (16-core), 100-core, 400-core]

  3. Extracting Performance • Task parallelism: desktop • Data parallelism: web serving; split/join, MapReduce, etc. • Pipeline parallelism: video decoding; network processing

  4. Extracting Performance (2) • Stream parallelism: combines data parallelism and pipeline parallelism • Ad-hoc parallelism: semi- or unstructured; usual thread model

  5. Focus on Pipeline Parallelism • Most stringent timing requirements • Example applications: network processing (network intrusion detection, DDoS filtering); multimedia processing (transcoding); signal processing (software-defined radio) • Also applies to data parallelism and stream parallelism

  6. Soft Network Processing (Soft-NP) • How do we protect? • GigE network properties: 1,488,095 frames/sec; 672 ns/frame; frame dependencies • Reference: John Giacomoni, John K. Bennett, Antonio Carzaniga, Douglas C. Sicker, Manish Vachharajani, and Alexander L. Wolf. “Frame Shared Memory: Line-Rate Networking on Commodity Hardware”. To appear in Proceedings of the ACM/IEEE Symposium on Architectures for Networking and Communications Systems 2007 (ANCS), December 2007.
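The GigE figures on slide 6 follow from minimum-size Ethernet framing. A minimal sketch of the arithmetic, assuming the standard 64-byte minimum frame plus 8 bytes of preamble and a 12-byte inter-frame gap (these framing constants are not stated on the slide):

```python
# Worst-case frame rate on Gigabit Ethernet.
# Assumption (not on the slide): minimum-size frames with standard
# Ethernet framing overheads.
LINK_BITS_PER_SEC = 1_000_000_000  # 1 Gb/s
MIN_FRAME_BYTES = 64               # minimum Ethernet frame, including FCS
PREAMBLE_BYTES = 8                 # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12          # mandatory idle time between frames

bits_on_wire = (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
frames_per_sec = LINK_BITS_PER_SEC // bits_on_wire
ns_per_frame = bits_on_wire * 1_000_000_000 // LINK_BITS_PER_SEC

print(frames_per_sec)  # 1488095
print(ns_per_frame)    # 672
```

At 1 Gb/s each bit takes 1 ns, so the 672 bits a minimum-size frame occupies on the wire give the 672 ns per-frame budget directly.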

  7. Frame Shared Memory (Soft-NP) [Figure labels: Input (IP), Output (OP)]

  8. What OS support is necessary?

  9. Low-Overhead Communication • Gigabit Ethernet • Syscalls ~170 ns • pthread mutex ~200 ns
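For scale, the overheads quoted on slide 9 can be set against the 672 ns available per minimum-size GigE frame (slide 6). A rough sketch; the helper name `budget_share` is illustrative and the costs are the approximate values from the slide:

```python
# Per-operation overheads quoted on the slides, as a share of the
# 672 ns available per minimum-size GigE frame.
FRAME_BUDGET_NS = 672

overheads_ns = {
    "system call": 170,    # "~170ns" on the slide
    "pthread mutex": 200,  # "~200ns" on the slide
}

def budget_share(cost_ns, budget_ns=FRAME_BUDGET_NS):
    """Return cost as a percentage of the per-frame budget."""
    return 100.0 * cost_ns / budget_ns

for name, cost in overheads_ns.items():
    print(f"{name}: {budget_share(cost):.1f}% of the frame budget")
```

Under these figures, a single system call or mutex operation per frame would consume roughly 25% to 30% of the budget before any useful work is done.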

  10. FastForward • Portable, software-only framework • ~35-40 ns per queue operation (2.0 GHz AMD Opteron) • ~26-28 ns per queue operation (2.6 GHz AMD Opteron) • Architecturally tuned CLF queues • Works with strong to weak consistency models • Hides die-to-die communication • Robust against unbalanced stages • Reference: John Giacomoni, Tipp Moseley, and Manish Vachharajani. Poster: “FastForward for Efficient Pipeline Parallelism”. Proceedings of the 16th International Conference on Parallel Architectures and Compilation Techniques (PACT), September 2007.
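The slides do not show the queue internals. As an illustration only, here is one common way to build a single-producer/single-consumer queue without locks: each slot doubles as its own full/empty flag, so the producer and consumer never share an index. The class name `SlotQueue` and the use of `None` as the empty marker are assumptions, and Python models neither the cache-line layout nor the memory ordering that a tuned implementation depends on.

```python
class SlotQueue:
    """Single-producer/single-consumer ring buffer (illustrative sketch).

    Assumption: an empty slot is marked with None, so the producer only
    touches `head` and the consumer only touches `tail`; neither index
    is shared. Python does not model cache lines or memory ordering.
    """

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0  # next slot to write; producer-private
        self.tail = 0  # next slot to read; consumer-private

    def enqueue(self, item):
        """Non-blocking put; returns False when the queue is full."""
        if item is None:
            raise ValueError("None is reserved as the empty-slot marker")
        if self.slots[self.head] is not None:
            return False  # slot not yet consumed: queue is full
        self.slots[self.head] = item
        self.head = (self.head + 1) % len(self.slots)
        return True

    def dequeue(self):
        """Non-blocking get; returns None when the queue is empty."""
        item = self.slots[self.tail]
        if item is None:
            return None  # nothing produced here yet: queue is empty
        self.slots[self.tail] = None  # hand the slot back to the producer
        self.tail = (self.tail + 1) % len(self.slots)
        return item


q = SlotQueue(capacity=2)
results = [q.enqueue("frame-1"), q.enqueue("frame-2"), q.enqueue("frame-3")]
drained = [q.dequeue(), q.dequeue(), q.dequeue()]
print(results)  # [True, True, False]
print(drained)  # ['frame-1', 'frame-2', None]
```

For comparison with the 672 ns per-frame budget on slide 6, the quoted 26-40 ns per queue operation is roughly 4% to 6% of that budget.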

  11. FastForward Performance [Figure; series: Lamport, FF, FF Unbalanced, FF Re-Balanced]

  12. Zero-Stall Guarantee

  13. Gang Scheduling • Optimize for application performance • Instead of system throughput or fairness • Computer utility -> max(system utilization) • Multicore system -> excess of resources • Dedicate resources to pipeline applications • Want selective timesharing

  14. System Services • Fast! • Synchronous calls introduce too much overhead • System calls ~170 ns • Asynchronous calls may limit parallelism • Want: system services with independent I/O paths

  15. Pipelinable System Services • Mixing stages from multiple process domains • Push model vs. call/return or poll • Hardware can be an active participant

  16. Heterogeneous Gang Scheduling • Need a single scheduling label for every pipeline stage • Ensures simultaneous scheduling of every necessary resource (zero-stall guarantee), including hardware stages • Scheduling multi-domain entities
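The all-or-nothing idea behind a single scheduling label can be sketched as follows. This is a toy model, not the scheduler proposed in the talk: the helper `admit_gangs`, the labels, and the stage counts are all hypothetical, and hardware stages are simplified to "one more resource" counted alongside cores.

```python
def admit_gangs(free_cores, gangs):
    """All-or-nothing admission (hypothetical helper, not from the slides).

    `gangs` maps a scheduling label to the number of stages that must
    run simultaneously. A gang is admitted only if every stage gets its
    own core; otherwise none of its stages run.
    """
    admitted = []
    for label, stages in gangs.items():
        if stages <= free_cores:
            free_cores -= stages
            admitted.append(label)
    return admitted, free_cores


# Hypothetical workload: labels and stage counts are made up.
gangs = {"soft-np": 3, "transcode": 4, "radio": 1}
admitted, idle = admit_gangs(free_cores=4, gangs=gangs)
print(admitted)  # ['soft-np', 'radio']
print(idle)      # 0
```

The point of the sketch is that "transcode" is never given a partial allocation: running three of its four stages would leave the pipeline stalled on the missing one, which is what the zero-stall guarantee is meant to rule out.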

  17. Multi-Domain Entities • Application state • Shared with local stages • Pipeline private state • Stage state shared with pipeline and parent process • The multi-domain application model respects the private data model implicit in single-domain applications while providing first-class naming for multi-domain pipelines.

  18. Summary of Discussion • Low-overhead communication • Zero-stall guarantee • Selective timesharing • Pipelinable system services • Heterogeneous gang scheduling • Pipelines as multi-domain applications

  19. Questions? john.giacomoni@colorado.edu
