ECE 7995 Caching And Prefetching Techniques In Computer System

ECE 7995Caching And Prefetching Techniques In Computer System Professor: Dr. Song Jiang

Presentation TPF: A System Thrashing Protection Facility Presented By: Gurdeep Kaur Gill Rina Brahmbhatt

Outline • Objective of the study • Introduction • Evolution of replacement techniques • CPU and Memory Utilization discussion in replacement techniques • Experiments on different replacement techniques in Linux kernels

Outline(Cont.) • The Design and Implementation of TPF • Performance Measurement and Analysis • Related work • Conclusion

Objective • Operating system designers attempt to keep high CPU utilization by maintaining an Optimal multiprogramming Level(MPL) which in turn many processes incur serious memory competition and introduce thrashing • Thrashing eventually lowers CPU utilization

Objective(Cont.) • Objective of this study is to provide highly responsive and cost-effective thrashing protection by introducing priority page replacement in a timely manner • Study introduces a Dynamic System Thrashing Protection Facility in system kernel

Introduction • Thrashing • Multiprogramming Level, acronymic as MPL • CPU Utilization - Can be increased by increasing MPL, running more processes • MPL versus System Thrashing

Introduction(Cont.) • MPL versus System Thrashing - Existing operating systems, like BSD and Solaris, provide load control facility by swapping out/in processes, if necessary for thrashing protection. This facility allows the system to lower MPL, but in turn swapping is quite expensive for both system and user program….leads to look for new optimal technique…..!!

Introduction(Cont.) • Thrashing and page replacement - Thrashing can be directly affected by the method of conducting page replacement Ex. Most operating systems adopt global LRU replacement.

Evolution of page replacement techniques in Linux kernel • We will discuss the different algorithm used in different versions of Linux. • In Linux kernels the Clock algorithm is used for page replacement over LRU. • We take the three recent versions of Linux kernels to illustrate how the thrashing source is introduced from page replacement

Linux 2.0 • The NRU (Not recently used) page contributions are proportionally distributed among interacting processes • swap_cnt variable for each process, initialized to a quantity (RSS/1 MB) proportional to its resident set size

Once an NRU page is taken away from the process, its swap_cnt will be decreased by one • When swap_cnt becomes zero or the searching for an NRU page fails, then the next process in list will be examined • This strategy balances memory usage by making all the processes provide proportional NRU pages.

Disadvantage of this approach is its high potential for thrashing, in turn low CPU utilization

Linux Kernel 2.2 • Kernel 2.2 makes each identified process continuously contribute its NRU pages until no NRU pages available • This strategy allows the rest of the interacting processes to build up their working sets easily by penalizing the memory usage of one process at a time

Code to select a process for page replacement in kernel function swap_out in mm/vmscan.c

Linux Kernel 2.4 • To make memory more efficiently utilized, Kernel 2.4 reintroduces the method used in Kernel 2.0 for selecting processes to contribute NRU pages. Going through a process list each time, it uses about 6% of the address space in each process to search NRU pages. Compared with Kernel 2.2, this method increases its possibility of thrashing.

Evaluation of page replacements in Linux kernels 2.2 • Performance evaluation is experimental measurement based • Machine: Pentium II of 400 MHz with physical memory of 384 MB • Operating system: Redhat Linux release 6.1 with kernel 2.2.14 • Disk: IBM hercules with capacity of 8450MB

To gain insight into VM behavior of application program, we have monitored program execution at kernel level • Monitor prg. Has two functions: User memory space adjustment and system data collection • The monitoring program also collects the MAD, RSS, NAP, NPF, these are the memory status quanta periodically noted every second during execution of prg.

Memory Allocation demand (MAD): The total amount of requested memory address space in pages. • Resident set size (RSS): The total amount of physical memory used by a process in pages • Number of page faults (NPF): It is the number of page faults of process -Minor page fault: This type will cause an operation to relink the page table to the requested page in physical memory

major page fault: It happens when the requested page is not in memory and has to be fetched from disk We only consider major page fault events for each process • Number of Accessed pages (NAP): It is the number of pages accessed by a process within a time interval of 1s All these quanta are collected from task_struct

Memory intensive programs run in a dedicated environment to observe memory access behavior

DESIGN AND IMPLEMENTATION OF TPF PURPOSE Improve system stability under a heavy load by identify a process and help to establish its working set by temporarily granting a privilege to the process for its page replacement. Increase CPU utilization quickly. Memory space is expected to be released after completion and used to satisfy the memory demands of other processes. *Identify Process: smallest difference between its MAD and RSS.

Implementation of TPF in the Linux Kernel 2.2.14 • Detection Routines:Monitors • the page fault rate of each process. • CPU utilization of the system. • Whether Identified process has lowered its page fault rate to a certain degree, if so, then disabled a privilege 2.Protection Routines: • Conduct priority based page replacement • CPU utilization < predetermined threshold • Page fault rate of more than 1 process > threshold • Grants a privilege to an identify process

DETECTION ROUTINE Predetermined Parameters in TPF: • CPU_Low: lowest CPU utilization that the system can tolerate. • CPU_High: targeted CPU utilization for TPF to achieve. • PF_Low: targeted page fault rate of the identified process for TPF to achieve. • PF_High: page fault rate threshold of the process to cause thrashing.

DETECTION ROUTINE(cont’d) Global Link List:when page fault rate > PF_High Process enters into global link list e.g. high_PF_proc: to record interacting process with high page fault rates. Fields in Data Structure: • num_pf: number of page faults detected recently. • start_time: system time for first page fault • privilege: granted a privilege to process(=1) or not (=0)

DETECTION PROTECTION (cont’d)(1.kernel operation to determine and manage the process exceeding the threshold page fault rate) if (process p encounters page faults) { if (p->num_pf == 0) p->start_time = current system time; p->num_pf++; if (p is not in the ‘hign_PF_proc’ list) if (p->num_pf > high_PF) { if (current system time – p->start_time <= 1 second) place p in high_PF_proc; p->num_pf = 0; } }

2. Kernel operation If page fault rate < PF_Low (remove the process from the list) If (length(hign_PF_proc) >= 1) { for each p in the list do { if (current system time – p->start_time >= 1 second) { if (p->num_pf/(current system time – p->start_time) < PF_Low { if (p->privilege = 1) p->privilege = 0; remove p from the list; } p->num_pf = 0; p->start_time = current_system_time; } } }

When protection routine triggered? When following 3 conditions are all true: If ((CPU utilization < CPU_Low) && (length(high_PF_proc >= 2) && (no process has been protected)) { for all processes in high_PF_proc select the least memory hungry process p; p->privilege = 1; } CPU utilization: current CPU utilization = (1 – Idle Ratio) Idle Ratio: CPU time portion used for the idle processes in the last second.

PROTECTION ROUTINE 1.Reset swap_cnt = 0, even if NRU page is obtained 2.NRU pages from all interacting processes used to build working set. 3.If page fault rate(protected process) < PF_Low CPU utilization > CPU_High Remove from high_PF_proc list.

STATE TRANSITION IN THE SYSTEM • Normal State • Monitoring state • Protection State

PERFORMANCE MEASUREMENT AND ANALYSIS Observation and measurements of TPF facility The predetermined threshold values are set as: CPU_Low = 40%, CPU_High = 80%, PF_High = 10 faults/sec PF_Low = 1 fault/sec PF_High: used to identify an active process for a protection when thrashing starts CPU_Low: used to triggered TPF to select process 10 Faults/sec: 1.how much time process spends on page swapping 2.How much process demands more memory . e.g 10 faults/sec implies 10% more memory space.

Why we chose a small PF_High value? e.g 10 faults/sec • Makes TPF responsive at an early stage to handle thrashing • Value of 10 faults/sec provides a proper basis for identifying a process for protection • TPF will quickly reduce a page fault rate even if a non-active process waiting for I/O.

Under what conditions, thrashing happen in multiprogramming environment? 1.The memory demands of one interacting program have sudden ‘spikes’ 2.The variance of memory demands, access patterns, and access rates of interaction processes are similar 3.Serious memory shortage happens in the system. For what cases is TPF most effective? For what cases is TPF ineffective? How do the threshold parameters affect the performance of TPF?

Working Set Model Used to estimate current memory demand of a running program in the system Working set: set of its recently used pages. Ensures the same program with the same input data would have the same locality measurements, which is independent of the memory size, the multiprogramming level. Ensures no pages in the working set of of a running program will be replaced.

Conclusion • Conducting experiments and performance evaluation, we show that the TPF facility can effectively provide thrashing protection without negative effects to overall system performance for three reasons: (1) the privilege is granted only when a thrashing problem is detected; (2) although the protected process could lower the memory usage of the rest of the interacting processes for a short period of time, the system will soon become stable by the protection; and (3) TPF is simple to implement with little overhead in the Linux kernel.

ECE 7995 Caching And Prefetching Techniques In Computer System

ECE 7995 Caching And Prefetching Techniques In Computer System

Presentation Transcript

Supporting High Performance I/O with Effective Caching and Prefetching

Replication, Caching, Prefetching and Hoarding for Mobile Computing

Advance Caching Techniques

Prefetching

Informed Prefetching and Caching

ECE 695: Fault-tolerant Computer System Design

ECE7995 Caching and Prefetching Techniques in Computer Systems

File System Variations and Software Caching

Caching in Distributed File System

Hierarchical Caching and Prefetching for Continuous Media Servers with Smart Disks

Energy Efficient Prefetching and Caching

Advanced Caching Techniques

Predictive Caching and Prefetching of Query Results in Search Engines

ASP.NET 2.0 Caching Techniques + Ajax Overview

A Prefetching Memory System for Mediaprocessors

15-740/18-740 Computer Architecture Lecture 17: Prefetching, Caching, Multi-core

Prefetching Techniques

ECE 695: Fault-tolerant Computer System Design

Lecture 25: Advanced Data Prefetching Techniques

Advanced Topics: Prefetching ECE 454 Computer Systems Programming

Energy Efficient Prefetching and Caching