
MP3: Virtual Memory Page Fault Measurement

MP3: Virtual Memory Page Fault Measurement. University of Illinois at Urbana-Champaign Department of Computer Science CS423 – Fall 2011. Keun Soo Yim. Goal. A Linux kernel module to profile VM system events. Understand the Linux virtual to physical page mapping and page fault rate.


Presentation Transcript


  1. MP3: Virtual Memory Page Fault Measurement University of Illinois at Urbana-Champaign Department of Computer Science CS423 – Fall 2011 Keun Soo Yim

  2. Goal
     A Linux kernel module to profile VM system events
     • Understand the Linux virtual-to-physical page mapping and page fault rate.
     • Design a lightweight tool that can profile page fault rate.
     • Implement the profiler tool as a Linux kernel module.
     • Learn how to use the kernel-level APIs for workqueue, character device driver, vmalloc, and mmap.
     • Test the kernel-level profiler by using a given user-level benchmark program.
     • Analyze, plot, and document the profiled data as a function of the workload characteristics.
     CS423 MP3

  3. Introduction
     • Because of the growing performance gap between memory and disk, the management efficiency of the OS virtual memory (VM) system becomes more important.
     • Inefficient page replacement can seriously harm the response time of user-level programs.
     • To optimize the VM system, it is necessary to understand the characteristics of the current VM system under various workloads.

  4. Metrics
     • Major and minor page fault counts:
       • A major page fault is a fault handled by using a disk I/O operation (e.g., a memory-mapped file, or page replacement that causes page swapping).
       • A minor page fault is a fault handled without using a disk I/O operation (e.g., the first access to memory allocated by malloc()).
       • Plotted as a function of allocated memory size, the counts show the thrashing effect.
     • CPU utilization:
       • Plotted as a function of the degree of multiprogramming, it shows the correlation between workload size and system utilization.
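A user-level process can look at its own minor and major fault counters with getrusage(). This is a different mechanism from the kernel module built in this MP (which reads the counters inside the kernel); it is shown only to make the two metrics concrete. A minimal sketch, assuming Linux; the names fault_counts, read_fault_counts, and minor_faults_for_touch are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical container for the two counters discussed above. */
struct fault_counts {
	long minor;
	long major;
};

/* Read the calling process's cumulative fault counters. Returns 0 or -1. */
int read_fault_counts(struct fault_counts *out)
{
	struct rusage ru;

	assert(out != NULL);
	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	out->minor = ru.ru_minflt;
	out->major = ru.ru_majflt;
	return 0;
}

/*
 * Map npages of fresh anonymous memory, touch each page once, and return
 * how many minor faults were counted while doing so (or -1 on error).
 */
long minor_faults_for_touch(size_t npages)
{
	struct fault_counts before, after;
	long page = sysconf(_SC_PAGESIZE);
	size_t len, i;
	void *mem;
	volatile char *p;

	assert(npages > 0);
	if (page <= 0)
		return -1;
	len = npages * (size_t)page;
	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return -1;
	p = mem;
	if (read_fault_counts(&before) != 0) {
		munmap(mem, len);
		return -1;
	}
	for (i = 0; i < len; i += (size_t)page)
		p[i] = 1;	/* first touch of each page */
	if (read_fault_counts(&after) != 0) {
		munmap(mem, len);
		return -1;
	}
	munmap(mem, len);
	return after.minor - before.minor;
}
```

The exact number returned depends on the system (for example, on whether huge pages are in use), so it is best treated as an observation rather than a fixed value.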

  5. Measurement Challenge
     • To measure such metrics accurately, many profiling operations are needed in a short time interval.
     • Because such data are available only in the OS kernel address space, this would cause a non-negligible performance overhead:
       • switching contexts between user and kernel, and copying data between these two address spaces.

  6. A Solution
     • This measurement overhead problem can be addressed by using mmap():
       • Create a shared buffer between the OS kernel and the user-level process.
       • By mapping a set of physical pages allocated in kernel space into the virtual address space of the user-level process, the user-level process can access the data stored in the buffer without any extra overhead other than accessing the memory.
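The idea of a shared mapping can be tried entirely in user space. The sketch below is an analogy, not the MP's kernel-to-user buffer: it shares one anonymous MAP_SHARED page between a parent and a child process, so the writer's plain store is visible to the reader's plain load with no read()/write() copy. The function name shared_page_roundtrip is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * The child stores `value` into a shared page; the parent loads it back.
 * Returns the value seen by the parent, or -1 on failure.
 */
long shared_page_roundtrip(long value)
{
	long *buf;
	long seen = -1;
	int status;
	pid_t pid;

	assert(value >= 0);	/* -1 is reserved for errors */
	buf = mmap(NULL, sizeof(*buf), PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	*buf = 0;

	pid = fork();
	if (pid < 0) {
		munmap(buf, sizeof(*buf));
		return -1;
	}
	if (pid == 0) {
		*buf = value;	/* writer: an ordinary memory store */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) == pid)
		seen = *buf;	/* reader: an ordinary memory load */
	munmap(buf, sizeof(*buf));
	return seen;
}
```

In the MP the writer is the kernel module and the reader is the monitor process, but the access pattern on the reader side is the same: ordinary loads from a mapped region.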

  7. Overview
     • A kernel module to profile page fault counts and CPU utilization of registered processes.
     [Figure: Work Process 1 (100MB), Work Process 2 (10MB), Work Process 3 (1GB), and a Monitor Process run on the Linux kernel with the MP3 Profiler Kernel Module; labels also include Disk and Post-Mortem Analysis.]

  8. Interface of Kernel Module
     • Three types of interfaces between the OS kernel module and user processes:
       • a Proc file
       • a character device driver
       • a shared memory area

  9. Proc File System
     • Proc filesystem entry (/proc/mp3/status)
       • Register: Application to notify its intent to monitor its page fault rate and utilization.
         • ‘R <PID>’
       • Deregister: Application to notify that the application has finished using the profiler.
         • ‘U <PID>’
       • Read Registered Task List: To query which applications are registered.
         • Return a list with the PID of each application
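Whatever receives the text written to /proc/mp3/status has to split it into an operation and a PID. The sketch below shows only that parsing step, written as plain C so it can be exercised in user space; the function name parse_command and its return convention are hypothetical, and a real module would also need to copy the buffer from user space first.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a command of the form "R <PID>" or "U <PID>".
 * On success stores the operation letter and PID and returns 0.
 * Returns -1 for anything else.
 */
int parse_command(const char *buf, char *op, int *pid)
{
	char c;
	int p;

	assert(buf != NULL && op != NULL && pid != NULL);
	if (sscanf(buf, " %c %d", &c, &p) != 2)
		return -1;
	if ((c != 'R' && c != 'U') || p <= 0)
		return -1;
	*op = c;
	*pid = p;
	return 0;
}
```

For example, parse_command("R 1234", &op, &pid) yields op == 'R' and pid == 1234, while an unknown letter or a missing PID is rejected.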

  10. Char Device & Shared Mem
     • A character device driver is used as the control interface of the shared memory.
       • Map Shared Memory (i.e., mmap()): maps the profiler buffer memory allocated in the kernel address space into the virtual address space of a requesting user-level process.
     • Shared memory
       • Normal memory access: used to deliver profiled data from the kernel to user processes.

  11. Synthetic Workload
     • Work program (given for case studies)
       • A single-threaded user-level application with three parameters: memory size, locality pattern, and memory access count per iteration.
       • Allocates the requested amount of virtual memory (e.g., up to 1GB).
       • Accesses it with a certain locality pattern (i.e., random or temporal locality) for the requested number of times.
       • The access step is repeated 20 times.
       • Multiple instances of this program can be created (i.e., forked) simultaneously.

  12. Monitoring Program
     • The monitor application is also given.
       • It requests the kernel module to map the kernel-level profiler buffer into its user-level virtual address space (i.e., using mmap()).
       • This request is sent through the character device driver created by the kernel module.
       • The application reads the profiling values (i.e., major and minor page fault counts and utilization of all registered processes).
       • By using a pipe, the profiled data is stored in a regular file so that it can be plotted and analyzed later.

  13. Design
     [Figure: design diagram. Labeled components: Work Process, Monitor Process, and, in kernel space, Proc FS Write Op., Char. Device Driver Interface, Module Init/Exit, Linked List for Reg. Tasks, Monitor Work Queue, Profiler buffer, and Process Control Block. Labeled actions: Control a Work Queue, Allocate or free, mmap().]
     • Work process steps: A1. Register; A2. Allocate Memory Block; A3. Memory Accesses; A4. Free Memory Blocks; A5. Unregister
     • Monitor process steps: B1. Open; B2. mmap(); B3. Read Profiled Data; B4. Close

  14. Work Queue
     • Work queue
       • The simplest to use among all bottom halves (e.g., thread/sleep, tasklet).
       • The only bottom-half mechanism that runs in process context.
     • Work queues run in process context.
       • Work queues can sleep, invoke the scheduler, and so on.
       • The kernel schedules bottom halves running in work queues.
     • The other bottom halves run in interrupt context.
       • Interrupt context cannot perform blocking operations,
       • e.g., taking a semaphore, copying to/from user memory, or non-atomically allocating memory.
     Reference: http://www.linuxjournal.com/article/6916

  15. Work Queue
     • A default set of kernel threads handles work queues.
       • One of these default kernel threads runs per processor (named events/n, where n is the processor ID).
       • The work queue threads execute the user's bottom half as a specific function, called a work queue handler.
     • It is possible to run work queues in the user's own kernel thread.
       • Whenever your bottom half is activated, your unique kernel thread wakes up and handles it.
       • Having a unique work queue thread is useful only in certain performance-critical situations.

  16. Work Queue Interface
     • Header
         #include <linux/workqueue.h>
     • Creates a work queue structure
         void my_wq_handler(struct work_struct *work);
       Static:
         DECLARE_WORK(name, my_wq_handler)
         This macro creates and initializes a struct work_struct.
       Dynamic:
         INIT_WORK(p, function)
         p is a pointer to a struct work_struct.
         INIT_DELAYED_WORK(p, function)
         p is a pointer to a struct delayed_work.

  17. Work Queue Interface
     • Schedule to run immediately
         int schedule_work(struct work_struct *work)
         Returns zero if the work was already queued, non-zero otherwise.
     • Schedule to run after a delay (in jiffies)
         int schedule_delayed_work(struct delayed_work *work, unsigned long delay)
       Example, to run after at least 5 seconds:
         schedule_delayed_work(&my_work, 5*HZ)

  18. Work Queue Interface
     • Wait for all pending work on the default work queue
         void flush_scheduled_work(void)
     • Cancel a delayed work
         int cancel_delayed_work(struct delayed_work *work)

  19. Work Queue Interface
     • When the user's own thread is used:
         struct workqueue_struct *create_workqueue(const char *name)
         int queue_work(struct workqueue_struct *wq, struct work_struct *work)
         int queue_delayed_work(struct workqueue_struct *wq, struct delayed_work *work, unsigned long delay)
         void flush_workqueue(struct workqueue_struct *wq)
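The calls on the work queue slides can be combined into a periodic sampler: a delayed work item whose handler re-arms itself. The fragment below is a minimal sketch for the default work queue, assuming a kernel where handlers take a struct work_struct pointer. It only builds as part of a kernel module, and the names sample_work, sample_handler, sampler_init, sampler_exit, and the 50 ms period are all assumptions made for illustration.

```c
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

/* Hypothetical sampling period: 50 ms, i.e. 20 samples per second. */
#define SAMPLE_PERIOD_MS 50

static struct delayed_work sample_work;

/* Runs in process context, so it may sleep or take a mutex. */
static void sample_handler(struct work_struct *work)
{
	/* ... read the counters of the registered tasks here ... */

	/* Re-arm so the handler runs again after one period. */
	schedule_delayed_work(&sample_work,
			      msecs_to_jiffies(SAMPLE_PERIOD_MS));
}

static int __init sampler_init(void)
{
	INIT_DELAYED_WORK(&sample_work, sample_handler);
	schedule_delayed_work(&sample_work,
			      msecs_to_jiffies(SAMPLE_PERIOD_MS));
	return 0;
}

static void __exit sampler_exit(void)
{
	/* Cancel pending work and wait for a running handler to finish. */
	cancel_delayed_work_sync(&sample_work);
}

module_init(sampler_init);
module_exit(sampler_exit);
MODULE_LICENSE("GPL");
```

The sketch uses cancel_delayed_work_sync() on exit rather than cancel_delayed_work() followed by a flush, because the handler re-queues itself; this is a design choice of the sketch, not something the slides prescribe.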

  20. Character Device Driver
     • Initialize data structure
         void cdev_init(struct cdev *cdev, const struct file_operations *fops);
       or
         struct cdev *my_cdev = cdev_alloc();
         my_cdev->ops = &my_fops;
     • Add to the kernel
         int cdev_add(struct cdev *dev, dev_t num, unsigned int count);
     • Delete from the kernel
         void cdev_del(struct cdev *dev);

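  21. Character Device Driver
         static int my_open(struct inode *inode, struct file *filp);
         static int my_release(struct inode *inode, struct file *filp);
         static int my_mmap(struct file *filp, struct vm_area_struct *vma);

         /* Callbacks the kernel invokes for this character device. */
         static struct file_operations my_fops = {
             .open    = my_open,
             .release = my_release,
             .mmap    = my_mmap,
             .owner   = THIS_MODULE,
         };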

  22. Memory Map
     [Figure: two virtual address spaces (0GB to 4GB, with a boundary at 3GB) and the physical address space, each showing the profiler buffer. Labels: vmalloc(), kmalloc(), “PG_reserved”.]

  23. Memory Map
     • Gets the page frame number
         pfn = vmalloc_to_pfn(virt_addr);
     • Maps a virtual page to a physical frame
         remap_pfn_range(vma, start, pfn, PAGE_SIZE, PAGE_SHARED);
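Because a vmalloc() buffer is contiguous only in kernel virtual addresses, the two calls above are typically applied one page at a time. The fragment below is a sketch of an mmap handler built that way. It compiles only inside a kernel module; prof_buf and prof_buf_len are hypothetical names for a buffer assumed to be allocated with vmalloc() elsewhere in the module, and the sketch passes vma->vm_page_prot where the slide shows PAGE_SHARED.

```c
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Assumed to be set up at module init: a vmalloc()'ed buffer whose
 * length is a multiple of PAGE_SIZE. Both names are hypothetical. */
static void *prof_buf;
static unsigned long prof_buf_len;

static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
	unsigned long len = vma->vm_end - vma->vm_start;
	unsigned long off;

	/* Refuse requests larger than the buffer. */
	if (len > prof_buf_len)
		return -EINVAL;

	/* Map each page separately: consecutive vmalloc pages need not
	 * be backed by consecutive physical frames. */
	for (off = 0; off < len; off += PAGE_SIZE) {
		unsigned long pfn = vmalloc_to_pfn((char *)prof_buf + off);

		if (remap_pfn_range(vma, vma->vm_start + off, pfn,
				    PAGE_SIZE, vma->vm_page_prot))
			return -EAGAIN;
	}
	return 0;
}
```

The sketch ignores the file offset passed to mmap() and always maps from the start of the buffer, which matches a caller that requests offset 0.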

  24. Interface for User Process
     • Character device file
         $ insmod mp3.ko
         $ cat /proc/devices
         <check the created device’s major #>
         $ mknod node c <major #> 0

  25. Interface for User Process
     • Open and mmap requests (in monitor.c)
         if ((buf_fd = open(fname, O_RDWR | O_SYNC)) < 0) {
             printf("file open error. %s\n", fname);
             return NULL;
         }
         kadr = mmap(0, buf_len, PROT_READ | PROT_WRITE, MAP_SHARED, buf_fd, 0);
         if (kadr == MAP_FAILED) {
             printf("mmap error. %s\n", fname);
             return NULL;
         }
         /* ... read the profiled data through kadr ... */
         munmap(kadr, buf_len);

  26. Interface for User Process
         $ cat /proc/<pid>/maps
         00400000-00401000 r-xp 00000000 08:01 660011      /root/mp3dev/monitor
         00600000-00601000 rw-p 00000000 08:01 660011      /root/mp3dev/monitor
         3183e00000-3183e1f000 r-xp 00000000 08:01 543075  /lib64/ld-2.14.so
         318401e000-318401f000 r--p 0001e000 08:01 543075  /lib64/ld-2.14.so
         318401f000-3184020000 rw-p 0001f000 08:01 543075  /lib64/ld-2.14.so
         3184020000-3184021000 rw-p 00000000 00:00 0
         3184200000-318438f000 r-xp 00000000 08:01 551625  /lib64/libc-2.14.so
         318438f000-318458f000 ---p 0018f000 08:01 551625  /lib64/libc-2.14.so
         318458f000-3184593000 r--p 0018f000 08:01 551625  /lib64/libc-2.14.so
         3184593000-3184594000 rw-p 00193000 08:01 551625  /lib64/libc-2.14.so
         3184594000-318459a000 rw-p 00000000 00:00 0
         7f9b65eda000-7f9b65f5a000 rw-s 00000000 08:01 660101  /root/mp3dev/node
         7f9b65f5a000-7f9b65f5d000 rw-p 00000000 00:00 0
         7f9b65f75000-7f9b65f76000 rw-p 00000000 00:00 0
         7fffc1683000-7fffc16a4000 rw-p 00000000 00:00 0       [stack]
         7fffc17ff000-7fffc1800000 r-xp 00000000 00:00 0       [vdso]
         ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0  [vsyscall]
     • Format: start-end perm offset major:minor inode image
       • Start-end: The beginning and ending virtual addresses for this memory area.
       • Perm: A bit mask with the memory area’s read, write, and execute permissions.
       • Offset: Where the memory area begins in the file.
       • Major/Minor: Major and minor numbers of the device holding the file (or partition).
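The field layout described above can be read back with a single sscanf() call. The sketch below parses one line of /proc/<pid>/maps into a struct; map_entry and parse_maps_line are hypothetical names, and paths that contain spaces are cut at the first space.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one line of /proc/<pid>/maps. */
struct map_entry {
	unsigned long long start, end, offset;
	unsigned long inode;
	unsigned int dev_major, dev_minor;
	char perms[5];		/* e.g. "rw-s" */
	char path[256];		/* empty for anonymous areas */
};

/* Returns 0 on success, -1 if the line does not match the format. */
int parse_maps_line(const char *line, struct map_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	e->path[0] = '\0';
	n = sscanf(line, "%llx-%llx %4s %llx %x:%x %lu %255s",
		   &e->start, &e->end, e->perms, &e->offset,
		   &e->dev_major, &e->dev_minor, &e->inode, e->path);
	/* The path is optional, so seven converted fields are enough. */
	return n >= 7 ? 0 : -1;
}
```

Applied to the /root/mp3dev/node line of the listing, end minus start is 0x80000 bytes and the fourth permission character is 's', which marks the area as a shared mapping.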

  27. Case Study 1
     • Thrashing and locality.
       • Work process 1: 512MB memory, random access, and 50,000 accesses per iteration
       • Work process 2: 512MB memory, random access, and 10,000 accesses per iteration
         $ nice ./work 512 R 50000 & nice ./work 512 R 10000 &
         … <after completing the two processes>
         $ ./monitor > profile1.data
     • Plot a graph where the x-axis is the time and the y-axis is the accumulated page fault count of the two work processes (work processes 1 and 2).
     • Analyze the quantitative difference between graphs and discuss where such differences come from.
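The y-axis of the requested plot is a running total. If the profiled data is available as one page fault count per sampling interval (an assumption; the layout of profile1.data is not described here), the accumulated series is a prefix sum. A minimal sketch with the hypothetical helper accumulate_faults:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Turn per-interval fault counts into an accumulated series:
 * cumulative[i] = per_sample[0] + ... + per_sample[i].
 */
void accumulate_faults(const unsigned long *per_sample, size_t n,
		       unsigned long *cumulative)
{
	unsigned long total = 0;
	size_t i;

	assert(per_sample != NULL && cumulative != NULL);
	for (i = 0; i < n; i++) {
		total += per_sample[i];
		cumulative[i] = total;
	}
}
```

For per-interval counts 3, 0, 5, 2 the accumulated series is 3, 3, 8, 10; flat stretches in such a curve correspond to intervals with no page faults.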
