
OS support for Teraflux: A Prototype




Presentation Transcript


  1. OS support for Teraflux: A Prototype Avi Mendelson, Doron Shamia

  2. System and Execution Models: Data Flow Based • The system is made up of clusters • Each cluster contains 16 cores (may change) • Each cluster is controlled by a single “OS kernel”; e.g., Linux, L4 • Execution is made up of tasks; each task • Has no side effects • Is scheduled together with its data (may use pointers) • May return results • If it fails to complete, it can be rescheduled on the same core or on another core • Tasks can be executed on any (service) cluster and have a unified view of system memory • All resource allocation/management is done at two levels, a local one and a global one
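The task model above can be sketched in C: a task is a descriptor carrying its function, its input data, and a slot for its result, and because tasks have no side effects a failed task can simply be run again. All names here are illustrative, not the actual TERAFLUX API:

```c
#include <stddef.h>

/* Hypothetical task descriptor for the execution model above: a task is
   scheduled with its data, may return a result, and has no side effects. */
typedef struct task {
    int (*fn)(const void *in, void *out); /* 0 on success, nonzero on failure */
    const void *in;                       /* input data (may hold pointers)  */
    void *out;                            /* where the result is written     */
} task_t;

/* Because a task is side-effect free, a failed run can be retried as-is
   (on the same core or another one; here we just loop). */
int run_task(task_t *t, int max_tries)
{
    for (int i = 0; i < max_tries; i++)
        if (t->fn(t->in, t->out) == 0)
            return 0;
    return -1; /* still failing after max_tries attempts */
}

/* Example task: sum a 4-element int array (pure, so restartable). */
static int sum_task(const void *in, void *out)
{
    const int *a = (const int *)in;
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += a[i];
    *(int *)out = s;
    return 0;
}
```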

  3. System Overview [Diagram: the target system and the prototyped system, shown in a cores view and a memory view. Clusters of four CPUs each run Linux or L4; the memory view contains a configuration page and message buffers. In the prototype, one CPU stands for one cluster (CPU == Cluster)]

  4. Target System: OS Requirements [Diagram: a single multi-core chip; some cores run Linux (the full OS), the rest run L4 microkernels] • Linux (full OS): manages jobs on the uKernel (uK) cores • Proxies the uKs' I/O requests • Remote-debugs the uKs and itself • Runs high-level (system) FT, managing uK/self faults • L4 (uKernel): each uK runs a job • Jobs are sent by the full OS (FOS) • Jobs have no side effects • Failed jobs are simply restarted • Runs low-level FT, reporting to the FOS

  5. Communications (1) [Diagram: a configuration page pointing to message buffers shared between L4 and Linux] Each buffer holds: • Ownership (L4/Linux) • Ready flag • Type • Length (bytes) • Data • Fixups (optional)

  6. Communications (2) Buffer fields in detail: • Ownership: who currently uses the buffer • Ready: signals that the buffer is ready to be transferred to the other side (the inverse owner) • Type: the message type • Length: the message length in bytes • Data: simply the raw data (interpreted according to type) • Fixups: a list of fixups, in case we pass pointers
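The buffer fields above can be sketched as a C struct. This is a hypothetical layout; the actual field widths, offsets, and fixup format in the prototype may differ:

```c
#include <stdint.h>

/* Who currently owns the buffer (the "Ownership" field). */
enum owner { OWNER_L4 = 0, OWNER_LINUX = 1 };

/* One message buffer, following the fields listed on the slide. */
typedef struct msg_buffer {
    volatile uint32_t owner;   /* who currently uses the buffer (L4/Linux)   */
    volatile uint32_t ready;   /* set when the buffer may pass to the other side */
    uint32_t type;             /* message type, selects how data is parsed   */
    uint32_t length;           /* payload length in bytes                    */
    uint32_t n_fixups;         /* number of pointer fixups (optional)        */
    uint32_t fixups[8];        /* offsets of embedded pointers to relocate   */
    uint8_t  data[];           /* raw payload, interpreted per type          */
} msg_buffer_t;

/* Hand a filled buffer to the other side: mark it ready and flip the
   ownership. A real implementation also needs memory barriers here so the
   payload is visible before the flags change. */
void msg_send(msg_buffer_t *b, enum owner to)
{
    b->ready = 1;
    b->owner = (uint32_t)to;
}
```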

  7. Current Prototype • Goal: quick development of OS support and applications (later to move to the full COTson prototype) • Quick prototyping via VMs • Linux on both ends (Fedora 13) • Main node = Linux (host) • Service nodes = Linux (VMs) • Shared memory between the host and the VMs, and between VMs • Shared memory uses a kernel driver (ivshmem)

  8. Prototype Architecture [Diagram: a Linux F13 host runs the app in user space; four Linux F13 guests run under QEMU; all are connected through the IVSHMEM driver in the host's kernel space]

  9. IVSHMEM (Inter-VM Shared Memory) Architecture • QEMU maps the shared memory into guest RAM • The region is exposed to the guest as a PCI BAR • The BAR is mmap'ed up to user level
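The last step, mapping the shared region into a user-level process, can be sketched as a plain `open`/`mmap` of the file that exposes the region. The path is an assumption and varies per system; inside a guest, the ivshmem BAR typically appears as a PCI resource file such as `/sys/bus/pci/devices/0000:00:04.0/resource2`:

```c
#include <fcntl.h>     /* open, O_RDWR */
#include <stddef.h>    /* size_t, NULL */
#include <sys/mman.h>  /* mmap, MAP_SHARED */
#include <unistd.h>    /* close */

/* Map a shared-memory region exposed as a file into this process.
   Works for any mmap-able file, including a PCI BAR resource file. */
void *map_shared(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* per POSIX, the mapping stays valid after close */
    return p == MAP_FAILED ? NULL : p;
}
```

Writes through such a `MAP_SHARED` mapping land directly in the shared region, which is what lets the host and the guests see each other's messages.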

  10. Communications [Diagram: the host app logic on the Linux F13 host and the data-flow apps in the Linux F13/QEMU guests each use a message-queue API; messages (Msg) travel through the shared RAM, crossing between user space and kernel space on each side]
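A minimal sketch of such a message-queue API over shared RAM: a fixed-size ring of message slots with one producer on one end (say, the host) and one consumer on the other (a VM). The names and sizes are illustrative, not the prototype's actual API:

```c
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4   /* number of message slots in the ring */
#define MSG_BYTES  64  /* fixed size of one message           */

/* Single-producer/single-consumer ring living in the shared region. */
typedef struct ring {
    volatile uint32_t head;              /* next slot to write */
    volatile uint32_t tail;              /* next slot to read  */
    uint8_t msg[RING_SLOTS][MSG_BYTES];
} ring_t;

int ring_send(ring_t *r, const void *m, size_t n)
{
    if (n > MSG_BYTES || r->head - r->tail == RING_SLOTS)
        return -1;                       /* message too big, or ring full */
    memcpy(r->msg[r->head % RING_SLOTS], m, n);
    r->head++;                           /* publish (barriers omitted)    */
    return 0;
}

int ring_recv(ring_t *r, void *m, size_t n)
{
    if (r->head == r->tail)
        return -1;                       /* ring empty */
    memcpy(m, r->msg[r->tail % RING_SLOTS], n);
    r->tail++;                           /* free the slot */
    return 0;
}
```

The other side polls `ring_recv` on its mapping of the same region; on real hardware the increments would need memory barriers (or atomics) so the payload is visible before the index moves.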

  11. Demo (Toy) Apps • Distributed sum app • Single work dispatcher (host) • Multiple sum engines (VMs) • Distributed Mandelbrot • Single work dispatcher, handing out lines (host) • Multiple compute engines, computing the pixels of each line (VMs)
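The distributed-sum demo reduces to the split shown below, here collapsed into one process for illustration: the "dispatcher" cuts the input into per-worker chunks and each "sum engine" returns a partial sum (in the prototype, the chunks and partial sums would travel between host and VMs over the shared-memory queue):

```c
#include <stddef.h>

/* One sum engine's job: sum a chunk of the input (pure, restartable). */
long sum_chunk(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The dispatcher: split n elements across `workers` engines and combine
   the partial sums. */
long dispatch_sum(const int *a, size_t n, size_t workers)
{
    long total = 0;
    size_t chunk = (n + workers - 1) / workers; /* ceil(n / workers) */
    for (size_t w = 0; w < workers; w++) {
        size_t lo = w * chunk;
        size_t hi = lo + chunk > n ? n : lo + chunk;
        if (lo < hi)
            total += sum_chunk(a + lo, hi - lo);
    }
    return total;
}
```

The Mandelbrot demo has the same shape, with a line of the image as the work unit instead of an array chunk.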

  12. Futures • Single boot • A TeraFlux chip boots a FOS • The FOS boots the uKs on the other cores • Looks like a single boot process • Distributed fault tolerance • Allow uK/FOS to test each other's health • One step beyond FOS-centric FT • Core repurposing • If the FOS cores fail, uK cores re-boot as a FOS • The new FOS takes over using the last valid data snapshot

  13. References • Inter-VM Shared Memory (ivshmem)
