Memory Management and RMAP VM of 2.6 By A.R.Karthick (karthick_r@infosys.com)
Memory Hierarchies • Access time increases as you move down the hierarchy: L1 cache → L2 cache → RAM → Hard Disk
Page Tables • Define the virtual-to-physical mapping • The page global directory (PGD), page mid-level directory (PMD) and page table entry (PTE) define the course of the translation • Example (32-bit x86, PMD folded into the PGD): the virtual address 0x00080c0f splits into a 10-bit PGD index, a 10-bit PTE index and a 12-bit page offset: 0000 0000 00 | 00 1000 0000 | 1100 0000 1111, giving pgd index 0, pte index 128 (1 << 7) and offset 0xc0f (see the sketch below)
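A minimal user-space sketch of this split. The constants below mirror the classic 32-bit x86 two-level layout; they are illustrative stand-ins, not the kernel's own macros:

    #include <stdio.h>

    /* Illustrative constants for the classic 32-bit x86 two-level layout
     * with the PMD folded into the PGD: 10-bit pgd index, 10-bit pte index,
     * 12-bit page offset. */
    #define PAGE_SHIFT   12
    #define PTRS_PER_PTE 1024
    #define PGDIR_SHIFT  22

    int main(void)
    {
        unsigned long vaddr = 0x00080c0f;  /* example address from the slide */

        unsigned long pgd_index = vaddr >> PGDIR_SHIFT;
        unsigned long pte_index = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
        unsigned long offset    = vaddr & ((1UL << PAGE_SHIFT) - 1);

        /* Prints: pgd=0 pte=128 offset=0xc0f */
        printf("pgd=%lu pte=%lu offset=0x%lx\n", pgd_index, pte_index, offset);
        return 0;
    }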
Page Table Entry (PTE) Status Bits • PAGE_PRESENT • PAGE_RW • PAGE_USER • PAGE_RESERVED • PAGE_ACCESSED • PAGE_DIRTY • INTERNAL_STATUS
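A small sketch of how these flag bits would be tested once a pte value is in hand. The numeric values are an assumption modeled on the i386 bit layout, not quoted from the kernel headers:

    #include <stdio.h>

    /* Assumed flag values modeled on i386; only for illustration. */
    #define PAGE_PRESENT  0x001
    #define PAGE_RW       0x002
    #define PAGE_USER     0x004
    #define PAGE_ACCESSED 0x020
    #define PAGE_DIRTY    0x040

    int main(void)
    {
        unsigned long pte = 0x063;  /* present | rw | accessed | dirty */

        if (pte & PAGE_PRESENT)
            printf("page is resident in RAM\n");
        if (pte & PAGE_RW)
            printf("page is writable\n");
        if (pte & PAGE_DIRTY)
            printf("page has been written since the last writeback\n");
        return 0;
    }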
Page Fault • Processor exception raised when the CPU cannot map a virtual address to a physical address. • Handled by do_page_fault in arch/i386/mm/fault.c. • Write-protection (COW) faults are routed to do_wp_page. • For pages that are in swap, do_swap_page is called. • For pages not present at all, do_no_page is called; it faults in either an anonymous zero page or an existing page. • Page faults populate the LRU cache.
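A simplified sketch of this routing; the enum and helper are hypothetical stand-ins, and the real dispatch in do_page_fault()/handle_pte_fault() is considerably richer:

    #include <stdio.h>

    enum fault_kind { FAULT_NOT_PRESENT, FAULT_IN_SWAP, FAULT_WRITE_PROTECT };

    static void route_fault(enum fault_kind kind)
    {
        switch (kind) {
        case FAULT_NOT_PRESENT:
            /* do_no_page(): fault in an existing file-backed page or an
             * anonymous zero page. */
            printf("do_no_page: map existing or anonymous zero page\n");
            break;
        case FAULT_IN_SWAP:
            /* do_swap_page(): bring the page back from swap (it may still
             * be sitting in the swap cache). */
            printf("do_swap_page: read page back from swap\n");
            break;
        case FAULT_WRITE_PROTECT:
            /* do_wp_page(): break copy-on-write, give the writer its own copy. */
            printf("do_wp_page: COW break, copy the page\n");
            break;
        }
    }

    int main(void)
    {
        route_fault(FAULT_WRITE_PROTECT);
        return 0;
    }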
Page Replacement Algorithms • Optimal replacement: not realizable in practice • Not Recently Used (NRU): a crude approximation • FIFO: inefficient • Second chance: better than the above • Clock replacement: more efficient than second chance
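As a reference point for the list above, a toy second-chance/clock eviction over a fixed set of frames; the frame count and referenced bits are invented for illustration:

    #include <stdio.h>

    #define NFRAMES 4

    struct frame { int page; int referenced; };

    /* Sweep the clock hand: clear referenced bits until an unreferenced
     * frame is found, which becomes the victim. */
    static int clock_evict(struct frame frames[], int *hand)
    {
        for (;;) {
            struct frame *f = &frames[*hand];
            *hand = (*hand + 1) % NFRAMES;
            if (!f->referenced)
                return f->page;          /* evict this frame's page */
            f->referenced = 0;           /* give it a second chance */
        }
    }

    int main(void)
    {
        struct frame frames[NFRAMES] = { {10, 1}, {11, 0}, {12, 1}, {13, 1} };
        int hand = 0;

        printf("evicting page %d\n", clock_evict(frames, &hand)); /* page 11 */
        return 0;
    }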
Page Replacement Algorithms • LRU: least recently used replacement • NFU: not frequently used replacement • Page-ageing based replacement • Working-set algorithm based on each process's locality of reference • Working-set based clock algorithms • LRU with ageing and the working-set algorithms are efficient and the ones commonly used
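The ageing idea behind the NFU/page-ageing entries above comes down to a single counter update: every scan shifts the age right and ORs the referenced bit into the top, so recent references outweigh old ones. Values below are illustrative:

    #include <stdio.h>

    static unsigned char age_tick(unsigned char age, int referenced)
    {
        age >>= 1;              /* old references decay */
        if (referenced)
            age |= 0x80;        /* a fresh reference dominates */
        return age;
    }

    int main(void)
    {
        unsigned char age = 0;

        age = age_tick(age, 1);   /* 0x80: just referenced */
        age = age_tick(age, 0);   /* 0x40: reference ages away */
        age = age_tick(age, 1);   /* 0xa0: new reference on top */
        printf("age counter = 0x%02x\n", age);
        return 0;
    }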
Page replacement handling in the Linux kernel • Page Cache • Pages are added to the page cache for fast lookup. • Page-cache pages are looked up by their address space and page index. • Inode/disk-block pages, shared pages and anonymous pages make up the page cache. • Swap-cache pages, also part of the page cache, represent pages being swapped. • Anonymous pages enter the swap cache at swap-out time; shared pages enter it when they become dirty.
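A toy lookup keyed on the (address space, page index) pair described above; the table size, hash and types are invented, so this only illustrates the keying idea, not the kernel's actual page-cache data structure:

    #include <stdio.h>
    #include <stdint.h>

    #define TABLE_SIZE 64

    struct cached_page {
        const void   *mapping;   /* which address space owns the page   */
        unsigned long index;     /* page-sized offset within that space */
        int           in_use;
    };

    static struct cached_page table[TABLE_SIZE];

    static unsigned int hash_key(const void *mapping, unsigned long index)
    {
        return (unsigned int)(((uintptr_t)mapping / 64 + index) % TABLE_SIZE);
    }

    static void add_page(const void *mapping, unsigned long index)
    {
        unsigned int h = hash_key(mapping, index);
        table[h].mapping = mapping;
        table[h].index   = index;
        table[h].in_use  = 1;
    }

    static int find_page(const void *mapping, unsigned long index)
    {
        unsigned int h = hash_key(mapping, index);
        return table[h].in_use &&
               table[h].mapping == mapping && table[h].index == index;
    }

    int main(void)
    {
        int inode_a, inode_b;   /* stand-ins for two address spaces */

        add_page(&inode_a, 3);
        printf("(a,3) cached? %d\n", find_page(&inode_a, 3));  /* 1 */
        printf("(b,3) cached? %d\n", find_page(&inode_b, 3));  /* 0 */
        return 0;
    }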
LRU Cache • The LRU cache is made up of per-zone active and inactive lists. • Per-CPU active and inactive page vectors make LRU-cache additions faster. • These lists are populated during page faults and when page-cache pages are accessed or referenced. • kswapd is the per-node page-out kernel thread that balances the LRU cache and trickles out pages using an approximation of the LRU algorithm. • Page stealing is performed on a page vector, i.e. in batches. • Page states: active, inactive dirty, inactive clean, per-CPU cold pages
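The per-CPU page-vector idea can be sketched in user space: pages are buffered locally and folded into the zone LRU lists one batch at a time, so the list lock is taken once per batch instead of once per page. The vector size and types below are illustrative:

    #include <stdio.h>

    #define PAGEVEC_SIZE 14          /* illustrative batch size */

    struct pagevec {
        int nr;
        int pages[PAGEVEC_SIZE];     /* stand-in for struct page pointers */
    };

    static void drain_to_lru(struct pagevec *pv)
    {
        /* One lock acquisition would cover this whole batch. */
        printf("adding %d pages to the zone LRU lists\n", pv->nr);
        pv->nr = 0;
    }

    static void lru_cache_add(struct pagevec *pv, int page)
    {
        pv->pages[pv->nr++] = page;
        if (pv->nr == PAGEVEC_SIZE)
            drain_to_lru(pv);        /* amortise the list-lock cost */
    }

    int main(void)
    {
        struct pagevec pv = { 0 };

        for (int i = 0; i < 30; i++)
            lru_cache_add(&pv, i);
        if (pv.nr)
            drain_to_lru(&pv);       /* flush the leftover partial batch */
        return 0;
    }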
Zone Balancing • kswapd performs zone balancing based on the pages_high, pages_low and pages_min watermarks. • A zone is considered balanced when its free pages are above pages_high. • The page-out process takes pages by scanning the inactive list in batches. • Batch page stealing scales well for large physical memory.
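A simplified check in the spirit of this watermark test; the field names mirror the slide, while the numbers and decision strings are illustrative, not the kernel's exact policy:

    #include <stdio.h>

    struct zone_info {
        const char   *name;
        unsigned long free_pages;
        unsigned long pages_min, pages_low, pages_high;
    };

    static const char *zone_state(const struct zone_info *z)
    {
        if (z->free_pages <= z->pages_min)
            return "critical: direct (synchronous) reclaim";
        if (z->free_pages <= z->pages_low)
            return "low: wake kswapd for background reclaim";
        if (z->free_pages >= z->pages_high)
            return "balanced: kswapd can sleep";
        return "reclaiming until pages_high is reached";
    }

    int main(void)
    {
        struct zone_info normal = { "Normal", 900, 255, 510, 765 };

        /* 900 >= pages_high, so the zone is considered balanced. */
        printf("zone %s: %s\n", normal.name, zone_state(&normal));
        return 0;
    }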
RMAP • Maintains a mapping from a page back to its pte/virtual address • Greatly speeds up the page-unmap path by removing the need to scan each process's virtual address space • Unmapping of shared pages improves greatly because pte mappings are available for shared pages • Page faults are reduced because pte entries are unmapped only when required • Reduced search space during page replacement, since only inactive pages are touched • Low overhead is added to the fork, page fault, mmap and exit paths to maintain the reverse mapping
RMAP
struct pte_chain {
    unsigned long next_and_idx;
    pte_addr_t ptes[NRPTE];
} ____cacheline_aligned;
• The next_and_idx field packs both the index of the lowest used pte slot in this chain and the pointer to the next pte_chain, aiding fast pte chaining (see the sketch below). • pte chains keep their free slots at the top (head) of the ptes[] array, and additions happen from the tail. • The process's mm_struct pointer is kept in the page's address space field and is used at swap-out time.
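A toy illustration of the next_and_idx packing: because each chain block is cache-line aligned, the low bits of its address are free to carry the slot index. NRPTE, the helpers and the 64-byte alignment here are assumptions for the sketch, not the kernel's definitions:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define NRPTE    7
    #define IDX_MASK ((uintptr_t)NRPTE)    /* low bits reserved for the index */

    struct chain {
        uintptr_t     next_and_idx;        /* next chain pointer | slot index */
        unsigned long ptes[NRPTE];
    };

    static struct chain *chain_next(const struct chain *c)
    {
        return (struct chain *)(c->next_and_idx & ~IDX_MASK);
    }

    static unsigned int chain_idx(const struct chain *c)
    {
        return (unsigned int)(c->next_and_idx & IDX_MASK);
    }

    int main(void)
    {
        /* 64-byte aligned blocks keep the low address bits clear. */
        struct chain *a = aligned_alloc(64, 64);
        struct chain *b = aligned_alloc(64, 64);

        if (!a || !b)
            return 1;

        a->next_and_idx = (uintptr_t)b | 3;   /* next chain b, lowest used slot 3 */

        printf("next chain at %p, lowest used slot %u\n",
               (void *)chain_next(a), chain_idx(a));
        free(a);
        free(b);
        return 0;
    }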
VM Overcommit Policies • Allow processes to commit more address space than the actual memory available (RAM plus swap). • Overcommit policy is set through sysctl vm.overcommit_{memory,ratio}. • 0: heuristic overcommit (the default; obviously excessive commits are refused). • 1: always overcommit. • 2: strict accounting; the commit limit is total swap pages plus overcommit_ratio percent of total RAM pages (see the sketch below). • mmap, mprotect, munmap, brk and shared memory all affect the committed total.
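A sketch of the strict-mode (2) commit limit, computed as total swap plus overcommit_ratio percent of total RAM; the numbers are illustrative:

    #include <stdio.h>

    int main(void)
    {
        unsigned long total_ram_pages  = 262144;  /* 1 GiB of 4 KiB pages */
        unsigned long total_swap_pages = 131072;  /* 512 MiB of swap      */
        unsigned long overcommit_ratio = 50;      /* vm.overcommit_ratio  */

        unsigned long commit_limit =
            total_ram_pages * overcommit_ratio / 100 + total_swap_pages;

        /* 131072 + 131072 = 262144 pages (1 GiB) may be committed in total. */
        printf("commit limit = %lu pages\n", commit_limit);
        return 0;
    }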
References • Primarily the Linux kernel 2.6 source code • "Towards an O(1) VM" by Rik van Riel, Proceedings of the Ottawa Linux Symposium