S(o)OS in a Nutshell: Towards the Large-Scale OS
Structure of this presentation
Content
• Background / Introduction: current problems of operating systems and HPC / tightly coupled systems
• The basic idea behind S(o)OS: a quick overview of the main concepts
• Project structure & management approach
Presenter
Background: Issues for large-scale systems
Brief overview of current HPC trends
• We face systems with
  • hundreds of thousands of cores
  • heterogeneous cores
  • accelerators attached using a wide range of technologies
  • a myriad of connection options
    • HyperTransport / QuickPath
    • PCI Express
    • Gigabit Ethernet
    • InfiniBand
    • many vendor-specific interconnects
• These systems cannot easily be programmed
• Their potential cannot be exploited
• Current approaches treat the symptoms (PGAS, CUDA, OpenCL, etc.)
Current Operating Systems: Issues
• Focus on homogeneous resource infrastructures
• Scale well with respect to processes, but not with respect to processing units
• Are essentially centralised => bottlenecks, OS jitter, ...
• Future environments will be
  • large-scale,
  • heterogeneous and
  • potentially widely distributed
Current OS architectures cannot deal with this
Current Operating Systems: Communication
[Figure: diagram of an operating system, processing units (Proc. Unit) and caches, with the interactions Assign Process, Procedure Call, Context Switch, Memory Access, Cross Core Comm., Load Process and Process Load]
Massive communication and management overhead
Distributed Programming Issues
• Distributed programming requires a lot of knowledge about
  • the resource infrastructure
  • the relationship between algorithm and data
  • the potential distribution of the algorithms and
  • their connectivity / communication
  • etc.
• Large-scale, tightly coupled systems will become commonplace
• Programming models must become more efficient and manageable
Towards S(o)OS: Revising OS Architectures
General Concept
• Distribute operating system and code across the resource infrastructure
• Rearrange running code and operating system according to
  • availability of resources
  • requirements of the code
  • linkage between data
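The three rearrangement criteria above can be pictured as a small scoring problem. What follows is only a minimal sketch under stated assumptions, not the S(o)OS algorithm: the `Node` and `Segment` types, the `place` function and the penalty values are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A processing unit; every field here is an illustrative assumption."""
    name: str
    free_memory_kb: int
    kind: str  # e.g. "cpu" or "accelerator"


@dataclass
class Segment:
    """A piece of code (or OS functionality) that needs a home."""
    name: str
    memory_kb: int
    preferred_kind: str
    data_home: str  # name of the node that holds the segment's data


def place(segment, nodes, remote_penalty=10, kind_penalty=5):
    """Return the cheapest node that can hold the segment, or None."""
    best, best_cost = None, None
    for node in nodes:
        if node.free_memory_kb < segment.memory_kb:
            continue  # criterion 1: availability of resources
        cost = 0
        if node.kind != segment.preferred_kind:
            cost += kind_penalty  # criterion 2: requirements of the code
        if node.name != segment.data_home:
            cost += remote_penalty  # criterion 3: linkage between data
        if best_cost is None or cost < best_cost:
            best, best_cost = node, cost
    return best


nodes = [
    Node("n0", 64, "cpu"),
    Node("n1", 512, "cpu"),
    Node("n2", 512, "accelerator"),
]
segment = Segment("fft", 128, "accelerator", data_home="n1")
chosen = place(segment, nodes)
```

With these made-up penalties, staying next to the data (`n1`) outweighs running on the preferred kind of unit (`n2`); changing the penalties changes the trade-off, which is the point of treating placement as something to be re-evaluated while code is running.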
Overall Concept
[Figure: "Von Neumann" versus "Real" architecture. Labels: PU, MU, ALU, Control, Memory, In / Out, Data Bus]
The OS Monster
Obstacles
• Limited cache size
• Communication is costly
• Resource environment is dynamic or changes between executions
• Data consistency
• Code consistency
Basic Principles 1
[Figure labels: File Management, Scheduler, Job Mgr, Memory, HAL, I/O]
• Distributed, self-managed microkernels
  • Following the SOA / Grid principle, OS functionalities are separate “services”
  • “Code is where the data is”
  • Dynamic composition of elements according to code segment requirements
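The idea of composing OS "services" on demand can be illustrated with a small registry. This is a hedged sketch only: `ServiceRegistry`, `register` and `compose` are hypothetical names, the service names are taken from the figure labels on this slide, and the string "instances" merely stand in for real OS modules.

```python
from typing import Callable, Dict, Iterable


class ServiceRegistry:
    """Maps service names to factories; a stand-in for separate OS services."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        self._factories[name] = factory

    def compose(self, required: Iterable[str]) -> Dict[str, object]:
        """Instantiate only the services that a code segment asks for."""
        names = list(required)
        missing = [n for n in names if n not in self._factories]
        if missing:
            raise KeyError(f"unknown services: {missing}")
        return {name: self._factories[name]() for name in names}


registry = ServiceRegistry()
for service in ("file_management", "scheduler", "job_manager", "memory", "hal", "io"):
    # Bind the loop variable as a default so each factory keeps its own name.
    registry.register(service, lambda service=service: f"{service}-instance")

# A compute-bound segment might need scheduling and memory, but no file access.
local_os = registry.compose(["scheduler", "memory"])
```

The design choice shown is that a segment receives only the OS functionality it declares, rather than a full monolithic kernel image on every core.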
Basic Principles 2
[Figure labels: OS Modules, Procedure Calls etc., Code Segments, Self-referencing]
• Runtime code behaviour analysis
  • Identify code parts with strong relationships
  • Primary and secondary data-sets
  • OS relation
  • Distribute segments according to requirements and availability
  • Annotate memory with analysis results
“Hacking” Applications
[Figure: annotated virtual memory. Blocks P1.B1, P1.B2, P2.B1, D.B1 and D.B2 and the elements OS1 and OS2, annotated with f: and w: values between 0.2 and 0.9]
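One way to read "identify code parts with strong relationships" is as a grouping problem over weighted links between blocks. The sketch below is an assumption-laden illustration, not the project's analysis: the block names are borrowed from the annotated virtual memory figure, but the weights, the threshold and the `group_segments` function are hypothetical, and the weights are not meant to reproduce the figure's f: and w: values.

```python
def group_segments(weights, threshold=0.7):
    """Group blocks that are linked by a relationship weight >= threshold.

    `weights` maps (block_a, block_b) pairs to a strength in [0, 1].
    Groups are formed with a small union-find structure.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), weight in weights.items():
        root_a, root_b = find(a), find(b)
        if weight >= threshold and root_a != root_b:
            parent[root_a] = root_b  # strongly related: merge the groups

    groups = {}
    for block in list(parent):
        groups.setdefault(find(block), set()).add(block)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical relationship strengths between code and data blocks.
weights = {
    ("P1.B1", "P1.B2"): 0.9,
    ("P1.B2", "D.B1"): 0.8,
    ("P2.B1", "D.B2"): 0.9,
    ("P1.B1", "P2.B1"): 0.3,
}
groups = group_segments(weights)
```

Each resulting group is a candidate for being placed together, so that strongly related code and data end up close to each other while weakly related blocks can be distributed.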
Basic Principles 3
[Figure labels: Virtual process space, Exec Env, Exec Env]
• Distributed execution model
  • Whole code can be distributed, not just threads
  • Execution context may move between cores
  • Distribution may vary with infrastructure
  • Essential distribution information is maintained with code
  • Reduce communication overhead
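The points about a moving execution context and distribution information that stays with the code can be sketched as a record that travels unchanged apart from its location. This is a toy model under assumptions: `ExecutionContext`, its fields and `migrate` are hypothetical names, and a real migration would also have to transfer registers, memory and OS state.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, Set


@dataclass
class ExecutionContext:
    """Hypothetical execution context; the fields are illustrative only."""
    segment: str
    program_counter: int
    core: str
    # Distribution information kept together with the code segment.
    hints: Dict[str, str] = field(default_factory=dict)


def migrate(ctx: ExecutionContext, target: str, available: Set[str]) -> ExecutionContext:
    """Return a copy of the context located on `target`; hints travel along."""
    if target not in available:
        raise ValueError(f"core {target!r} is not available")
    return replace(ctx, core=target)


ctx = ExecutionContext("P1.B1", program_counter=42, core="core0",
                       hints={"data_home": "core3"})
# Move the context to the core named in its own distribution hints.
moved = migrate(ctx, ctx.hints["data_home"], {"core0", "core3"})
```

Because the hints are part of the context, the decision about where to run next does not require a round trip to a central manager, which is one way to read "reduce communication overhead".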
Project Goals
• New OS architectures / paradigms
• New approaches and algorithms to deal with future distributed execution systems
• Proof-of-concept implementation of distributed execution support tools
Managing S(o)OS: Notes on Project Structure
Work Packages
Two main strands:
• Design strand: development of algorithms, architectures and reference implementations
  • WP2: looks at the development from a hardware perspective
  • WP3: examines (communication) protocols
  • WP4: deals with distributed execution models
• Integration & testing strand: aligns the models and performs integrated, application-related tests
  • WP5: OS model
  • WP6: Application
Participants
• HLRS, University of Stuttgart
• Instituto de Telecomunicações, Aveiro
• RETIS Lab, Sant'Anna School of Advanced Studies
• CTIT, Universiteit Twente
• Ecole Polytechnique Fédérale de Lausanne
• European Microsoft Innovation Centre