SCC Development Experiences Alexey Pakhunov /XCG, Microsoft Research/ alexeypa@microsoft.com March 30th, 2011
Overview • Black Cloud OS: • A fork of Singularity OS • Our playground for experimenting with message passing in a non-cache-coherent environment • This presentation covers only our development experiences on the SCC • Submission of the paper is on its way
What is Singularity? • A quote from the Singularity home page: “A research operating system prototype, extending programming languages, and developing new techniques and tools for specifying and verifying program behavior” • Written in managed code • Some assembly and C++ in the boot loader and kernel • IPC and inter-component communication are based on passing messages
Our setup • [Diagram] The SCC: a 6×4 mesh of 24 tiles, each with a router (R), plus four DDR3 memory controllers (MC), the VRC and the System Interface • The System Interface connects over PCI-E to the Management Console (Linux), which runs sccTcpServer/mceGui • The Management Console connects over TCP/IP to a Desktop PC (Windows), which runs RcLoader.Net, KdProxy, WinDbg, etc.
RcLoader.Net • Configuration • Generates the system memory map • Configures the SCC registers • Uploads the boot loader and OS images • Supports manual editing of the SCC configuration • Debugging • Allows inspecting the memory and configuration registers
The memory map (top of the address space first)
• 0xFC000000 – 0xFFFFFFFF: Shared memory (OS image, the initial jmp)
• Unused
• 0xC0000000 – 0xC3FFFFFF: Shared memory buffers (256 KB per core)
• 0xA0000000 – 0xB7FFFFFF: Configuration space
• 0x80000000 – 0x97FFFFFF: MPB (16 KB per tile)
• Unused
• 0x00000000 – up to 0x54FFFFFF: Private memory (336 MB – 1360 MB)
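The map above can be read as a small lookup table. The sketch below is only an illustration of that reading, not code from RcLoader.Net: the names `REGIONS`, `classify` and `region_size_mb` are hypothetical, and any address the slide does not list is simply reported as unused.

```python
MB = 1 << 20

# (first address, last address, label), copied from the memory-map slide.
REGIONS = [
    (0xFC000000, 0xFFFFFFFF, "Shared memory (OS image, the initial jmp)"),
    (0xC0000000, 0xC3FFFFFF, "Shared memory buffers"),
    (0xA0000000, 0xB7FFFFFF, "Configuration space"),
    (0x80000000, 0x97FFFFFF, "MPB"),
]


def classify(addr, private_mb=1360):
    """Return the label of the region containing addr, or "Unused".

    private_mb is the size of the private memory, which the slide gives
    as anywhere between 336 MB and 1360 MB.
    """
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address is outside the 32-bit address space")
    if not 336 <= private_mb <= 1360:
        raise ValueError("private memory size is outside 336..1360 MB")
    if addr < private_mb * MB:
        return "Private memory"
    for first, last, label in REGIONS:
        if first <= addr <= last:
            return label
    return "Unused"


def region_size_mb(first, last):
    """Size of an inclusive address range in whole megabytes."""
    return (last - first + 1) // MB


if __name__ == "__main__":
    for address in (0x000B8000, 0x80000000, 0xA0000000, 0xFFFFFFF0):
        print(f"0x{address:08X}: {classify(address)}")
```

With the largest private memory, 1360 MB ends exactly at 0x54FFFFFF, which matches the upper bound on the slide. The x86 reset vector at 0xFFFFFFF0 falls inside the topmost shared region, which is presumably why the initial jmp lives there.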
Debugging challenges • No serial port or console • Memory at 0xb8000 is the console buffer • I/O redirection doesn’t work as expected • Executing an IN or OUT instruction effectively halts both the core and sccTcpServer • Serial KD transport is emulated • A couple of ring buffers on the SCC side • KdProxy.exe exposes a named pipe interface for the debugger
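The slide does not describe the layout of the ring buffers, so the following is only a generic single-producer, single-consumer byte ring of the kind such an emulated serial transport could use, one ring per direction. The class name `RingBuffer` and its methods are hypothetical, and a real implementation on the SCC would also have to deal with caching of the shared buffer.

```python
class RingBuffer:
    """Single-producer, single-consumer byte ring.

    One slot is always left empty so that head == tail means "empty"
    and never "full".
    """

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.buf = bytearray(capacity)
        self.head = 0  # next slot to write; advanced only by the producer
        self.tail = 0  # next slot to read; advanced only by the consumer

    def write(self, data):
        """Copy as many bytes as fit and return how many were written."""
        written = 0
        for byte in data:
            nxt = (self.head + 1) % len(self.buf)
            if nxt == self.tail:  # ring is full
                break
            self.buf[self.head] = byte
            self.head = nxt
            written += 1
        return written

    def read(self, limit):
        """Remove and return up to limit bytes."""
        out = bytearray()
        while self.tail != self.head and len(out) < limit:
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % len(self.buf)
        return bytes(out)


if __name__ == "__main__":
    to_debugger = RingBuffer(8)  # target -> host direction
    print(to_debugger.write(b"0123456789"))  # only 7 bytes fit
    print(to_debugger.read(4))
```

Because each index has exactly one writer, the two sides never need a lock, which suits a setup where the producer and the consumer sit on opposite ends of the link.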
Porting challenges • No BIOS • The system memory map is patched directly in the boot loader • No standard devices • Local APIC is used instead of the i8254 timer and PIC • No RTC • No modern instructions are supported • Context handling code was updated due to the lack of MMX • The 32-bit flavor of Singularity uses only x87 for floating point calculations • The Bartok compiler was patched due to the lack of CMOV instructions
Experimental hardware • Turning on the MPB bypass bit causes a race that corrupts memory • It cost us three days of debugging :-) • We couldn’t take advantage of fast MPB access • Large pages cannot be used together with MPB • Singularity uses large pages to create the identity mapping spanning 4 GB
Interface • A telnet connection to each core • The same serial transport emulation via KdProxy.exe was used
Cache coherency matters • A read-only OS image is shared among all cores • Message passing code uses MPB-mapped buffers and CL1FLUSH-aware memcpy() • Large shared memory storage is accessible via dynamically remapped LUTs • R/W access is possible with proper cache flushing and/or caching settings in PTEs
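To see why the receive path needs an explicit flush or invalidate, here is a toy software model of two cores with private, non-coherent, write-through caches over one shared memory. It is a sketch for intuition only: the classes `SharedMemory` and `Core`, the helper `receive`, and the 32-byte line size are assumptions of this model, not a description of the Black Cloud OS code.

```python
LINE = 32  # bytes per cache line (assumed for this toy model)


class SharedMemory:
    def __init__(self, size):
        self.data = bytearray(size)


class Core:
    """Toy core with a private, non-coherent, write-through cache."""

    def __init__(self, memory):
        self.memory = memory
        self.cache = {}  # line index -> private copy of that line

    def read(self, addr, length):
        out = bytearray()
        for a in range(addr, addr + length):
            line = a // LINE
            if line not in self.cache:  # miss: fill the line from memory
                start = line * LINE
                self.cache[line] = bytes(self.memory.data[start:start + LINE])
            out.append(self.cache[line][a % LINE])
        return bytes(out)

    def write(self, addr, payload):
        if addr < 0 or addr + len(payload) > len(self.memory.data):
            raise ValueError("write is outside the shared memory")
        # Write-through: memory is updated at once. Only this core's own
        # copies are dropped; other cores keep whatever they have cached.
        self.memory.data[addr:addr + len(payload)] = payload
        first = addr // LINE
        last = (addr + len(payload) - 1) // LINE
        for line in range(first, last + 1):
            self.cache.pop(line, None)

    def invalidate(self):
        """Drop every cached line so the next read goes to memory."""
        self.cache.clear()


def receive(core, addr, length):
    """Invalidate first, then copy: the flush-aware read path."""
    core.invalidate()
    return core.read(addr, length)


if __name__ == "__main__":
    mem = SharedMemory(128)
    sender, receiver = Core(mem), Core(mem)
    receiver.read(0, 5)              # receiver caches the empty buffer
    sender.write(0, b"hello")
    print(receiver.read(0, 5))       # stale zeros from the private cache
    print(receive(receiver, 0, 5))   # fresh data after invalidating
```

The plain read returns stale data because nothing tells the receiver that the line changed; invalidating before the copy is the software stand-in for the coherence protocol the hardware does not provide.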
Performance • Core’s memory interface bandwidth is limited • One outstanding memory operation
Performance • Memory controller bandwidth is limited
Conclusions • The SCC is an experimental platform tailored for message passing • Lack of cache coherency makes us think hard about message passing • The chip has enough cores to play with scalability • Compare apples to apples • The cache and memory subsystems are significantly different • The SCC is super parallel, not super fast