Corey: An Operating System for Many Cores. Silas Boyd-Wickizer, Haibo Chen, Rong Chen, Yandong Mao, Frans Kaashoek, Robert Morris, Aleksey Pesterev, Lex Stein, Ming Wu, Yuehua Dai, Yang Zhang, Zheng Zhang. MIT, Fudan University, Microsoft Research Asia, Xi’an Jiaotong University. OSDI 2008.
Presenter: 강동우 (redcdang@gmail.com)
Agenda • Introduction • Motivation • Design • Evaluation • Conclusion
Introduction • Most PCs have or will have multicore chips • Cache-coherent shared memory hardware is the new standard • Performance of some OS services scales very poorly with number of cores/processors
Motivation • Scalability problem • Benchmark: vary the number of threads within a single process • Each thread creates a file descriptor, then repeatedly duplicates it (dup) and closes the duplicate • Hardware: four quad-core AMD Opteron chips (16 cores), Linux 2.6.25
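The benchmark loop above can be sketched in a few lines of Python. This is only an illustration of the workload's shape, not the code used in the paper: the function names are hypothetical, `os.devnull` is an assumed stand-in for "a file descriptor", and Python's global interpreter lock serializes the threads, so the sketch will not reproduce the kernel-level contention that the slide measures.

```python
import os
import threading


def dup_close_worker(iterations, counts, index):
    """Open one descriptor, then repeatedly dup() it and close the duplicate."""
    fd = os.open(os.devnull, os.O_RDONLY)  # assumed stand-in for the thread's file
    try:
        for _ in range(iterations):
            dup_fd = os.dup(fd)   # allocates a slot in the process-wide fd table
            os.close(dup_fd)      # frees that slot again
            counts[index] += 1    # each thread writes only its own counter
    finally:
        os.close(fd)


def run_benchmark(num_threads, iterations):
    """Run the dup/close loop on num_threads threads; return total operations."""
    counts = [0] * num_threads
    threads = [
        threading.Thread(target=dup_close_worker, args=(iterations, counts, i))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)
```

All threads belong to one process, so every `dup` and `close` updates the same descriptor table, which is the shared structure the benchmark stresses.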
Motivation • AMD 16-core system topology • Accessing the L1 or L2 cache of another core takes longer than accessing the shared L3 cache • Accessing caches on another chip takes longer than accessing caches on the local chip • Kernels should therefore mainly access data in the local core’s cache
MapReduce • Map phase: • Processes read parts of application’s inputs • Generate intermediary results and store them locally • Reduce phase: • Processors collate results produced by multiple map instances • Produce the output
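The two phases can be sketched as follows. This is a minimal, sequential Python illustration that uses word counting as an assumed toy workload; the function names are hypothetical and a real MapReduce runtime would run the map instances on separate cores.

```python
from collections import Counter


def map_phase(lines):
    """One map instance: count the words in its part of the input."""
    return Counter(word for line in lines for word in line.split())


def reduce_phase(partials):
    """Collate the intermediate results produced by all map instances."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(text, num_workers):
    """Split the input by line, map each part, then reduce the results."""
    lines = text.splitlines()
    parts = [lines[i::num_workers] for i in range(num_workers)]
    partials = [map_phase(part) for part in parts]  # kept "locally" per worker
    return reduce_phase(partials)
```

The map results are private to each worker until the reduce phase reads them all, which is why the two phases place different demands on memory sharing.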
Design • Goal: allow applications to control shared resources • Three abstractions • Address range abstraction: control page tables and the kernel data used to manage them • Kernel core abstraction: allow applications to dedicate cores to running particular kernel functions • Shares abstraction: control the kernel data used to resolve application references
Design - Address Range Abstraction • Current systems offer two types of address space • Single address space (multiple threads): contention on the shared address space; Map is bad, Reduce is good • Separate address spaces (multiple processes, sharing memory with mmap): extra soft page faults; Map is good, Reduce is bad
Design - Address Range Abstraction • Address ranges • Give applications high performance for both private and shared memory • A kernel-provided abstraction that corresponds to a range of virtual-to-physical mappings • mm_struct • No contention in the Map phase • No extra soft page faults, because cores share the hardware page tables for shared ranges
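One way to picture address ranges is as nestable mapping tables: each core keeps private mappings in its own root range and attaches shared ranges that other cores attach too. The Python model below is a hypothetical sketch under that assumption; the class and method names are invented for illustration and are not Corey's interface.

```python
class AddressRange:
    """Hypothetical model: a set of virtual-to-physical page mappings."""

    def __init__(self):
        self.pages = {}      # virtual page number -> physical frame number
        self.children = []   # (base virtual page, nested AddressRange)

    def map_page(self, vpage, frame):
        """Add a mapping; touches only this range's own table."""
        self.pages[vpage] = frame

    def attach(self, base, child):
        """Make a nested range visible starting at virtual page `base`."""
        self.children.append((base, child))

    def translate(self, vpage):
        """Resolve a virtual page, searching private mappings first."""
        if vpage in self.pages:
            return self.pages[vpage]
        for base, child in self.children:
            frame = child.translate(vpage - base)
            if frame is not None:
                return frame
        return None


# Two cores, each with a private root range, both attaching one shared range.
shared = AddressRange()
root0, root1 = AddressRange(), AddressRange()
root0.attach(1000, shared)
root1.attach(1000, shared)
root0.map_page(1, 10)    # private to core 0: no other core's table is touched
shared.map_page(5, 77)   # mapped once, visible through both roots
```

In this model a private mapping never touches another core's table, and a mapping added to the shared range is immediately visible to every core that attached it.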
Design – Kernel Core Abstraction • System calls normally execute on the core of the invoking process • This works poorly when a system call needs to access large shared data structures (many cache misses) • Applications can dedicate cores to kernel functions and data • A kernel core can manage hardware devices and execute system calls sent from other cores
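The idea of sending calls to a dedicated core can be modeled with a single server thread that owns the shared data and a request queue that other threads write to. The sketch below is a hypothetical Python analogy, not Corey's mechanism: `KernelCore`, its `call` method, and the "put"/"get" operations are all invented for illustration.

```python
import queue
import threading


class KernelCore:
    """Hypothetical model: one dedicated thread owns the data and serves calls."""

    def __init__(self):
        self._requests = queue.Queue()
        self._table = {}  # touched only by the dedicated thread
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        while True:
            op, args, reply = self._requests.get()
            if op == "stop":
                reply.put(None)
                return
            if op == "put":
                key, value = args
                self._table[key] = value
                reply.put(True)
            elif op == "get":
                reply.put(self._table.get(args[0]))
            else:
                reply.put(None)  # unknown operation

    def call(self, op, *args):
        """Send a request from another 'core' and wait for the result."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((op, args, reply))
        return reply.get()

    def stop(self):
        self.call("stop")
        self._thread.join()
```

Because only the server thread ever reads or writes `_table`, the data stays in one place and callers exchange just small request and reply messages.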
Design – Shares Abstraction • Examples: file descriptors, process IDs • Many kernel operations look up identifiers in tables to obtain a pointer to the relevant kernel data structure • A shared FD table is a bottleneck • Shares • Applications can control how cores share the kernel data structures used to do lookups
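A share can be pictured as a lookup table that an application chooses to keep private to one core or to make visible to several. The Python model below is a simplified, hypothetical sketch: `Share`, `Core`, and the search order are assumptions made for illustration and do not reproduce Corey's system-call interface.

```python
class Share:
    """Hypothetical model of a share: a table from identifiers to kernel objects."""

    def __init__(self, name):
        self.name = name
        self.table = {}

    def add(self, ident, obj):
        self.table[ident] = obj

    def remove(self, ident):
        self.table.pop(ident, None)


class Core:
    """Each core searches its private share first, then any shared shares."""

    def __init__(self, shared_shares):
        self.private = Share("private")
        self.search_path = [self.private] + list(shared_shares)

    def lookup(self, ident):
        for share in self.search_path:
            if ident in share.table:
                return share.table[ident]
        raise KeyError(ident)


# Two cores that share one table but also keep private identifiers.
common = Share("shared")
core0, core1 = Core([common]), Core([common])
core0.private.add("fd3", "private file on core 0")
common.add("fd4", "file visible to both cores")
```

Identifiers that only one core uses stay in that core's private table, so lookups for them never touch data that other cores modify.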
Implementation • Low-level kernel: architecture-specific functions and device drivers; 11,000 lines of C and 150 lines of assembly • Unix-like environment: 11,000 lines of C/C++; buffer cache, cfork, and a TCP/IP stack interface (lwIP)
Performance Evaluation • AMD 16-Core System • 8GB Memory • Linux kernel 2.6.25 • Pin one thread to each core • Intel Pro/1000 Ethernet Device
Evaluation - Address Range Abstraction • memclone (private memory): each core allocates its own 100 MB array and modifies each page of the array • mempass (shared memory): allocates a single 100 MB buffer on one of the cores, touches each page of the buffer, and passes it to the next core, which repeats the process
Evaluation – Kernel Cores Abstraction • Simple TCP service • Dedicated: uses a kernel core for all network processing • Polling: uses a kernel core only to poll for packet notifications and transmit completions
Evaluation – Shares Abstraction • Two microbenchmarks • Global share: each core calls share_addobj() to add a per-core segment to a global share, then calls share_delobj() to delete that segment • Local share: the same, but the per-core segment is added to a local share
Evaluation - Applications • MapReduce • wri (word reverse index) MapReduce application • 1 GB input file
Evaluation - Applications • Increase performance by dedicating application data to cores • webd (web server) application called filesum • Returns the sum of the bytes in the requested file • Two modes: random mode and locality mode
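The filesum computation and the difference between the two modes can be sketched as below. This is a hypothetical Python illustration: the slide does not say how requests are assigned to cores, so hashing the file name in locality mode and picking a core at random in random mode are assumptions, as are the function names.

```python
import random
import zlib


def filesum(path):
    """Return the sum of the bytes in the file at `path`."""
    total = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            total += sum(chunk)  # iterating bytes yields integers 0..255
    return total


def pick_core(file_name, num_cores, mode, rng=random):
    """Choose which core serves a request (assumed policy, for illustration)."""
    if mode == "locality":
        # The same file always goes to the same core, so its data stays
        # in that core's cache.
        return zlib.crc32(file_name.encode("utf-8")) % num_cores
    # Random mode: any core may serve any file.
    return rng.randrange(num_cores)
```

Under this assumed policy, locality mode keeps each file's data on one core, while random mode spreads accesses to the same file across all cores.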
Conclusions • Applications should be scalable on multicore architectures • Corey is a new kernel built on three abstractions: address ranges, kernel cores, and shares • The evaluation shows that applications can use them to avoid scalability bottlenecks • Demonstrated with MapReduce and a web application
cfork • cfork(core_id) is an extension of UNIX fork() that creates a new process (pcore) on core core_id • Application can specify multiple levels of sharing between parent and child • Default is copy-on-write
Network • Applications can decide to run • Multiple network stacks • A single shared network stack
Buffer Cache • Shared buffer like regular UNIX buffer cache • Three modifications • A lock-free tree allows multiple cores to locate cached blocks w/o contention • A write scheme tries to minimize contention • A scalable read/write lock
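MCS Lock • FIFO ordering of lock acquisitions • A task that wants to enter the critical section inserts itself into a queue (initially empty) • When releasing the lock, a task designates the next task to hold it • The task chosen is the one that entered the queue immediately after it • Advantages • Local spinning (less bus traffic) • No contention among tasks; the lock is acquired in a fixed order • Waiting time is bounded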
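The queue discipline described above can be modeled in Python as follows. This is a teaching sketch, not a usable lock: a real MCS lock relies on hardware atomic swap and compare-and-swap instructions, which are emulated here with a small `threading.Lock` (an assumption of this sketch), and the class names are hypothetical.

```python
import threading
import time


class MCSNode:
    """Per-task queue node; each task spins only on its own `locked` flag."""

    def __init__(self):
        self.locked = False
        self.next = None


class MCSLock:
    def __init__(self):
        self._tail = None
        self._atomic = threading.Lock()  # emulates atomic swap / CAS on the tail

    def _swap_tail(self, node):
        with self._atomic:
            prev, self._tail = self._tail, node
            return prev

    def _cas_tail(self, expected, new):
        with self._atomic:
            if self._tail is expected:
                self._tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        node.locked = True
        prev = self._swap_tail(node)   # join the end of the queue
        if prev is None:
            return                     # queue was empty: lock acquired
        prev.next = node               # link behind the predecessor
        while node.locked:             # local spin on our own node
            time.sleep(0)

    def release(self, node):
        if node.next is None:
            if self._cas_tail(node, None):
                return                 # no successor: queue is empty again
            while node.next is None:   # successor is still linking itself
                time.sleep(0)
        node.next.locked = False       # hand the lock to the next task


def count_with_lock(num_threads, iterations):
    """Increment a shared counter under the lock from several threads."""
    lock, total = MCSLock(), [0]

    def worker():
        node = MCSNode()
        for _ in range(iterations):
            lock.acquire(node)
            total[0] += 1
            lock.release(node)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Each waiter spins only on the flag in its own node and is released directly by its predecessor, which gives the FIFO order and low bus traffic listed on the slide.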
[Figure: shares example. Thread A on Core 0 and Thread B on Core 1 each have a root share. Diagram labels: per-process share, private share, shared share; file descriptors (fd) refer to paper.pdf, text.txt, and shared_avi.avi.]