Chapter 4 File System: File System Cache • Li Wensheng • wenshli@bupt.edu.cn
Outline • Introduction to File Caching • Page Cache and Virtual memory System • File System performance
Introduction to File Caching • File Caching • One of the most important features of a file system • Traditional Unix file system caching is implemented in the I/O subsystem by keeping copies of recently read or written blocks in a block cache • In Solaris, file caching is implemented in the virtual memory system
Solaris Page Cache • Page cache • a new method of caching file system data • developed at Sun as part of the virtual memory system • used by System V Release 4 Unix • now also used in Linux and Windows NT • major differences from the old caching method: • it is dynamically sized and can use all memory that is not being used by applications • it caches file blocks rather than disk blocks • The key difference is that the page cache is a virtual file cache rather than a physical block cache
The Solaris Page Cache [Figure: the block buffer cache holds internal file system data, i.e. metadata items (direct/indirect blocks, inodes); the page cache holds file data]
Block Buffer Cache • used for caching inodes and other file metadata • In old versions of Unix, it was fixed in size by nbuf • nbuf specified the number of 512-byte buffers • It is now also dynamically sized • it can grow by nbuf buffers at a time, as needed, until it reaches a ceiling specified by bufhwm • By default, it is allowed to grow until it uses 2 percent of physical memory • The upper limit for the buffer cache can be displayed with the sysdef command
sysdef command
    # sysdef
    *
    * Tunable Parameters
    *
     7757824  maximum memory allowed in buffer cache (bufhwm)
        5930  maximum number of processes (v.v_proc)
          99  maximum global priority in sys class (MAXCLSYSPRI)
        5925  maximum processes per user id (v.v_maxup)
          30  auto update time limit in seconds (NAUTOUP)
          25  page stealing low water mark (GPGSLO)
           5  fsflush run rate (FSFLUSHR)
          25  minimum resident memory for avoiding deadlock (MINARMEM)
          25  minimum swapable memory for avoiding deadlock (MINASMEM)
Buffer cache size needed • about 300 bytes per inode • about 1 MB per 2 GB of files • Example • A database system with 100 files and a total of 100 GB of storage space • Only 50 GB is accessed at any one time • Need: • 100 * 300 bytes = 30 KB for inodes • 50 / 2 * 1 MB = 25 MB for metadata (direct and indirect blocks) • On a system with 5 GB of physical memory, the default bufhwm (2 percent of memory) will be about 102 MB, well above the roughly 25 MB needed
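The sizing rule of thumb above is simple arithmetic, sketched here in C. This assumes binary units (1 MB = 2^20 bytes) and takes the 2 percent default literally; bufcache_needed and default_bufhwm are hypothetical helper names, not Solaris interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)   /* binary megabyte (assumption) */
#define GB (1024ULL * MB)

/* Rule of thumb from the slide: about 300 bytes per cached inode. */
#define BYTES_PER_INODE 300ULL

/* Estimated buffer cache needed, in bytes: one inode per file plus about
 * 1 MB of direct/indirect block metadata per 2 GB of actively accessed
 * data (1 MB / 2 GB = 1/2048). */
uint64_t bufcache_needed(uint64_t nfiles, uint64_t active_bytes)
{
    uint64_t inode_bytes = nfiles * BYTES_PER_INODE;
    uint64_t meta_bytes = active_bytes / 2048;
    return inode_bytes + meta_bytes;
}

/* Default ceiling (bufhwm): 2 percent of physical memory, in bytes. */
uint64_t default_bufhwm(uint64_t physmem_bytes)
{
    assert(physmem_bytes > 0);
    return physmem_bytes * 2 / 100;
}
```

For the database example, bufcache_needed(100, 50 * GB) is 30,000 bytes plus 25 MB, while default_bufhwm(5 * GB) comes to about 102 MB, so the default ceiling is comfortably large enough.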
Monitor the buffer cache hit statistics with sar -b • bread/s and bwrit/s are physical (disk) reads and writes, lread/s and lwrit/s are logical accesses to the buffer cache, and %rcache and %wcache are the read and write hit ratios
    # sar -b 3 333
    SunOS zangief 5.7 Generic sun4u    06/27/99
    22:01:51 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
    22:01:54       0    7118     100       0       0     100       0       0
    22:01:57       0    7863     100       0       0     100       0       0
    22:02:00       0    7931     100       0       0     100       0       0
    22:02:03       0    7736     100       0       0     100       0       0
    22:02:06       0    7643     100       0       0     100       0       0
    22:02:09       0    7165     100       0       0     100       0       0
    22:02:12       0    6306     100       8      25      68       0       0
    22:02:15       0    8152     100       0       0     100       0       0
    22:02:18       0    7893     100       0       0     100       0       0
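The hit ratios can be recomputed from the logical and physical columns. The sketch below assumes %cache = (logical - physical) / logical * 100; the 22:02:12 row (bwrit/s 8, lwrit/s 25, %wcache 68) is consistent with that. Whether sar rounds or truncates is not shown in the sample, so this version truncates; cache_hit_pct is a hypothetical name.

```c
#include <assert.h>

/* Percentage of logical (buffer cache) operations that did not need a
 * physical (disk) operation. With no logical operations we report 100,
 * as the sample output does for %wcache when lwrit/s is 0.
 * The result is truncated to a whole percent. */
int cache_hit_pct(long logical, long physical)
{
    assert(logical >= 0 && physical >= 0);
    if (logical == 0)
        return 100;
    return (int)((logical - physical) * 100 / logical);
}
```

For the 22:02:12 sample, cache_hit_pct(25, 8) gives 68 for %wcache and cache_hit_pct(6306, 0) gives 100 for %rcache.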
Outline • Introduction to File Caching • Page Cache and Virtual memory System • File System performance
File system caching behavior • Physical memory is divided into pages • To read data from a file into memory, the virtual memory system “pages in” the file, reading one page at a time • The page scanner searches for least recently used (LRU) pages and puts them back on the free list
File system caching behavior (Cont.) [Figure: free memory, scan rate, and page-in activity]
File System Paging Optimizations • Goal: reduce the amount of memory pressure • invoke free-behind with sequential access • free pages when free memory falls to lotsfree • limit the file system’s use of the page cache • pages_before_pager (default 200 pages) is the amount of memory above the point where the page scanner starts (lotsfree) • when free memory falls to 1.6 MB (200 pages of 8 KB on UltraSPARC) above lotsfree, the file system throttles back its use of the page cache
File System Paging Optimizations (Cont.) • When free memory falls to lotsfree + pages_before_pager: • Solaris file systems free all pages after they are written • UFS and NFS enable free-behind on sequential access • NFS disables read-ahead • NFS writes synchronously rather than asynchronously • VxFS enables free-behind (some versions only)
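The throttle point is lotsfree + pages_before_pager, counted in pages. A minimal sketch, using the lotsfree value of 730 pages that appears in the adb example later in this chapter; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Free-memory level (pages) at which the file system begins to throttle
 * its use of the page cache. */
long fs_throttle_point(long lotsfree, long pages_before_pager)
{
    assert(lotsfree >= 0 && pages_before_pager >= 0);
    return lotsfree + pages_before_pager;
}

/* True once free memory has fallen to the throttle point, i.e. when the
 * optimizations listed above are in effect. */
bool fs_throttling(long freemem, long lotsfree, long pages_before_pager)
{
    return freemem <= fs_throttle_point(lotsfree, pages_before_pager);
}

/* Size of a page count in kilobytes. */
long pages_to_kb(long pages, long pagesize_kb)
{
    assert(pagesize_kb > 0);
    return pages * pagesize_kb;
}
```

With lotsfree = 730 and the default pages_before_pager = 200, throttling starts at 930 free pages; pages_to_kb(200, 8) gives the 1600 KB (about 1.6 MB) quoted for UltraSPARC's 8 KB pages.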
Outline • Introduction to File Caching • Page Cache and Virtual memory System • File System performance
Paging affects the user’s application • The page scanner can put too much pressure on an application’s private process memory • If the scan rate is several hundred pages a second, the time a page has before the scanner checks whether it has been accessed falls to a few seconds • Any page that has not been used in the last few seconds will be taken • This behavior negatively affects application performance
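The "few seconds" window comes from the scanner's two-handed clock: a front hand clears each page's reference bit, and a back hand, handspreadpages behind it, frees any page whose bit is still clear. A page therefore has roughly handspread / scan rate seconds to be touched again. The simulation below is a simplified sketch under that assumption; struct page, clock_scan, and reference_window_secs are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated page frame: only the bits the scanner looks at. */
struct page {
    bool referenced;  /* set when the page is accessed */
    bool free;        /* already placed on the free list */
};

/* Advance both hands `steps` pages around a ring of `npages` frames.
 * The front hand is at *front; the back hand trails it by `handspread`
 * pages. Returns the number of pages moved to the free list. */
size_t clock_scan(struct page *pages, size_t npages, size_t handspread,
                  size_t *front, size_t steps)
{
    assert(pages != NULL && front != NULL);
    assert(npages > 0 && handspread < npages && *front < npages);
    size_t freed = 0;
    for (size_t i = 0; i < steps; i++) {
        size_t back = (*front + npages - handspread) % npages;
        pages[*front].referenced = false;      /* front hand clears the bit */
        if (!pages[back].free && !pages[back].referenced) {
            pages[back].free = true;           /* untouched since the front */
            freed++;                           /* hand passed: steal it     */
        }
        *front = (*front + 1) % npages;
    }
    return freed;
}

/* Seconds between the two hands passing a given page. */
double reference_window_secs(size_t handspread, double pages_per_sec)
{
    assert(pages_per_sec > 0.0);
    return (double)handspread / pages_per_sec;
}
```

With a hypothetical handspread of 1000 pages and a scan rate of 400 pages per second, reference_window_secs(1000, 400.0) is 2.5 seconds: a page left idle longer than that is stolen.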
Example • Consider an OLTP application that makes heavy use of the file system • The database is generating file system I/O, making the page scanner actively steal pages from the system • The user of the OLTP application has paused for 15 seconds to read the contents of a screen from the last transaction • During this time, the page scanner finds that the pages associated with the user’s application have not been referenced and makes them available for stealing • The pages are stolen; when the user types the next keystroke, he is forced to wait until the application is paged back in, usually several seconds • Our user is forced to wait for the application to page in from the swap device, even though the application is running on a system with enough memory to keep all of it in physical memory!
The priority paging algorithm • places a boundary around the file cache so that file system I/O does not cause unnecessary paging of applications • prioritizes the different types of pages in the page cache, in order of importance: • Highest: pages associated with executables and shared libraries, including application process memory (anonymous memory) • Lowest: regular file cache pages • As long as the system has sufficient memory, the scanner steals only pages associated with regular files
Enable priority paging • Set the parameter priority_paging in /etc/system:
    set priority_paging=1
• To enable priority paging on a live 32-bit system, set the following with adb:
    # adb -kw /dev/ksyms /dev/mem
    lotsfree/D
    lotsfree:       730      <- value of lotsfree is printed
    cachefree/W 0t1460       <- insert 2 x value of lotsfree preceded with 0t (decimal)
    dyncachefree/W 0t1460    <- insert 2 x value of lotsfree preceded with 0t (decimal)
    cachefree/D
    cachefree:      1460
    dyncachefree/D
    dyncachefree:   1460
Enable priority paging (Cont.) • To enable priority paging on a live 64-bit system, set the following with adb:
    # adb -kw /dev/ksyms /dev/mem
    lotsfree/E
    lotsfree:       730      <- value of lotsfree is printed
    cachefree/Z 0t1460       <- insert 2 x value of lotsfree preceded with 0t (decimal)
    dyncachefree/Z 0t1460    <- insert 2 x value of lotsfree preceded with 0t (decimal)
    cachefree/E
    cachefree:      1460
    dyncachefree/E
    dyncachefree:   1460
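The page priorities can be expressed as a steal decision. This sketch assumes the usual priority paging thresholds: the scanner starts when free memory drops below cachefree (set to 2 x lotsfree in the adb steps above), takes only regular file pages while free memory lies between cachefree and lotsfree, and pages any type below lotsfree. The enum and may_steal are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Page types, in the order of importance given above. */
enum page_type {
    PAGE_EXEC,  /* executables and shared libraries (highest) */
    PAGE_ANON,  /* application process memory (anonymous) */
    PAGE_FILE   /* regular file cache pages (lowest) */
};

/* May the scanner steal a page of this type at this free-memory level?
 * All quantities are in pages. */
bool may_steal(enum page_type type, long freemem, long lotsfree, long cachefree)
{
    assert(lotsfree >= 0 && cachefree >= lotsfree);
    if (freemem >= cachefree)
        return false;              /* plenty of memory: scanner idle */
    if (freemem >= lotsfree)
        return type == PAGE_FILE;  /* inside the file cache boundary */
    return true;                   /* real shortage: any page type */
}
```

With lotsfree = 730 and cachefree = 1460, a system at 1000 free pages gives up only file cache pages, so the OLTP user's idle application pages survive.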
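Paging types • The execute bit associated with an address space distinguishes executable files from regular files • This gives three paging types: executable, application (anonymous), and file • memstat command • Output is similar to that of vmstat, but with extra fields to differentiate paging types: • pi po fr sr: page-in, page-out, free, and scan rate, as in vmstat • epi epf: executable page-ins and page frees • api apo apf: anonymous (application) page-ins, page-outs, and page frees • fpi fpo fpf: file system page-ins, page-outs, and page frees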
Paging caused by an application memory shortage
    # ./readtest testfile&
    # memstat 3
    memory -------- paging ---------- executable- -anonymous- --filesys-- ----cpu----
      free re  mf  pi  po  fr de   sr epi epo epf api apo apf fpi fpo fpf us sy wt id
      2080  1   0 749 512 821  0  264   0   0 269   0 512 549 749   0   2  1  7 92  0
      1912  0   0 762 384 709  0  237   0   0 290   0 384 418 762   0   0  1  4 94  0
      1768  0   0 738 426 610  0 1235   0   0 133   0 426 434 738   0  42  4 14 82  0
      1920  0   2 781 469 821  0  479   0   0 218   0 469 525 781   0  77 24 54 22  0
      2048  0   0 754 514 786  0  195   0   0 152   0 512 597 754   2  37  1  8 91  0
      2024  0   0 741 600 850  0  228   0   0 101   0 597 693 741   2  56  1  8 91  0
      2064  0   1 757 426 589  0  143   0   0  72   8 426 498 749   0  18  1  7 92  0
Paging through the file system
    # ./readtest testfile&
    # memstat 3
    memory -------- paging ---------- executable- -anonymous- --filesys-- ----cpu----
      free re  mf  pi  po  fr de   sr epi epo epf api apo apf fpi fpo fpf us sy wt id
      3616  6   0 760   0 752  0  673   0   0   0   0   0   0 760   0 752  2  3 95  0
      3328  2 198 816   0 925  0 1265   0   0   0   0   0   0 816   0 925  2 10 88  0
      3656  4 195 765   0 792  0  263   0   0   0   2   0   0 762   0 792  7 11 83  0
      3712  4   0 757   0 792  0  186   0   0   0   0   0   0 757   0 792  1  9 91  0
      3704  3   0 770   0 789  0  203   0   0   0   0   0   0 770   0 789  0  5 95  0
      3704  4   0 757   0 805  0  205   0   0   0   0   0   0 757   0 805  2  6 92  0
      3704  4   0 778   0 805  0  266   0   0   0   0   0   0 778   0 805  1  6 93  0
Paging parameters affecting performance • When priority paging is enabled, the file system scan rate is higher • High scan rates should therefore not be used as a factor for determining memory shortage • If file system activity is heavy, the default scanner parameters are insufficient and will limit file system performance • Set the scanner parameters fastscan and maxpgio to allow the scanner to scan at a high enough rate to keep up with the file system
Scanner parameters • fastscan • the number of pages per second the scanner can scan • defaults to ¼ of memory per second, limited to 64 MB per second • this limits file system throughput • when memory is at lotsfree, the scanner runs at half of fastscan, limited to 32 MB per second • If only 1/3 of physical memory pages are file pages, the scanner will only be able to put 32 / 3 ≈ 11 MB per second of memory on the free list
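The throughput limit above can be worked through in a few lines. This sketch follows the slide's figures (¼ of memory per second capped at 64 MB/s, half of fastscan at lotsfree) and assumes binary megabytes; the function names are hypothetical.

```c
#include <assert.h>

#define MB (1024.0 * 1024.0)   /* binary megabyte, as a double */

/* Default fastscan expressed in bytes per second: one quarter of
 * physical memory per second, capped at 64 MB per second. */
double default_fastscan_bps(double physmem_bytes)
{
    assert(physmem_bytes > 0.0);
    double quarter = physmem_bytes / 4.0;
    return quarter < 64.0 * MB ? quarter : 64.0 * MB;
}

/* Rate (bytes per second) at which file pages reach the free list when
 * memory is at lotsfree: the scanner runs at half of fastscan, and only
 * the file-page fraction of what it scans can be freed. */
double file_free_rate_bps(double fastscan_bps, double file_fraction)
{
    assert(file_fraction >= 0.0 && file_fraction <= 1.0);
    return fastscan_bps / 2.0 * file_fraction;
}
```

On a 4 GB machine the cap applies (64 MB/s), and file_free_rate_bps(64 MB/s, 1/3) comes to about 10.7 MB/s, the "11 MB per second" on the slide.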
Scanner parameters (Cont.) • maxpgio • the maximum number of pages the page scanner can push • limits the write performance of the file system • If memory is sufficient, set maxpgio to a large value, such as 1024 • Example: on a 4 GB machine, in /etc/system:
    set fastscan=131072
    set handspreadpages=131072
    set maxpgio=1024
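The example value of fastscan can be related to the ¼-of-memory rule. Assuming UltraSPARC's 8 KB page size, 131072 pages per second is 1 GB per second, which is one quarter of a 4 GB machine's memory with the 64 MB per second cap removed. A small sketch of that conversion (hypothetical helper names):

```c
#include <assert.h>
#include <stdint.h>

/* Pages per second needed to scan one quarter of physical memory each
 * second (the default fastscan rule without the 64 MB/s cap). */
uint64_t quarter_memory_pages(uint64_t physmem_bytes, uint64_t pagesize)
{
    assert(pagesize > 0);
    return physmem_bytes / pagesize / 4;
}

/* Convert a page count to bytes. */
uint64_t pages_to_bytes(uint64_t pages, uint64_t pagesize)
{
    return pages * pagesize;
}
```

quarter_memory_pages(4 GB, 8192) returns 131072, the value used for both fastscan and handspreadpages in the example.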
Direct I/O • unbuffered I/O that bypasses the file system page cache • UFS Direct I/O • allows reads and writes to files in a regular file system to bypass the page cache and access the file at near raw disk performance • can be advantageous when accessing a file in a manner where caching is of no benefit • e.g., copying a very large file from one disk to another • eliminates the double copy that is performed when the read and write system calls are used, by arranging for the DMA transfer to occur directly into the user’s address space
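A large-file copy of the kind described above might look like the following minimal sketch. It assumes Solaris's directio() from <sys/fcntl.h> and calls it only when compiled on Solaris (__sun); on other systems those calls are compiled out and the loop is an ordinary buffered copy. The name copy_large_file and the 1 MB block size are illustrative choices, not from the slides.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#if defined(__sun)
#include <sys/types.h>
#include <sys/fcntl.h>   /* directio(), DIRECTIO_ON (Solaris only) */
#endif

#define COPY_BLOCK (1024 * 1024)  /* large transfers suit direct I/O */

/* Copy src to dst in 1 MB blocks. On Solaris both descriptors are
 * switched to direct I/O first; elsewhere this is a buffered copy.
 * Returns the number of bytes copied, or -1 on error. */
long copy_large_file(const char *src, const char *dst)
{
    /* Sector-aligned buffer; static, so this function is not reentrant. */
    static _Alignas(512) char buf[COPY_BLOCK];
    assert(src != NULL && dst != NULL);

    int in = open(src, O_RDONLY);
    if (in < 0) { perror(src); return -1; }
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { perror(dst); close(in); return -1; }

#if defined(__sun)
    /* Treated as advisory: requests that do not meet the direct I/O
     * conditions are assumed to go through the page cache as usual. */
    (void)directio(in, DIRECTIO_ON);
    (void)directio(out, DIRECTIO_ON);
#endif

    long total = 0;
    ssize_t n;
    while ((n = read(in, buf, COPY_BLOCK)) > 0) {
        for (ssize_t off = 0; off < n; ) {   /* handle short writes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) { perror(dst); total = -1; goto done; }
            off += w;
        }
        total += n;
    }
    if (n < 0) { perror(src); total = -1; }
done:
    close(in);
    close(out);
    return total;
}
```

The large block size matters: the next slides note that direct I/O is fastest with big transfers and can be slow with small ones.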
Enable direct I/O • Direct I/O will only bypass the page cache if all of the following are true: • The file is not memory mapped • The file is not on a logging file system • The file does not have holes • The read/write is sector aligned (512 bytes) • Ways to enable direct I/O: • mount an entire file system with the forcedirectio mount option:
    # mount -o forcedirectio /dev/dsk/c0t0d0s6 /u1
• use the directio() call on a per-file basis:
    int directio(int fildes, int advice);   /* advice: DIRECTIO_ON or DIRECTIO_OFF */
UFS direct I/O • Direct I/O can provide extremely fast transfers when moving data with big block sizes (>64 kB), but it can be a significant performance limitation for smaller sizes • The ufs_directio_kstats structure holds direct I/O statistics:
    struct ufs_directio_kstats {
        uint_t logical_reads;   /* Number of fs read operations */
        uint_t phys_reads;      /* Number of physical reads */
        uint_t hole_reads;      /* Number of reads from holes */
        uint_t nread;           /* Physical bytes read */
        uint_t logical_writes;  /* Number of fs write operations */
        uint_t phys_writes;     /* Number of physical writes */
        uint_t nwritten;        /* Physical bytes written */
        uint_t nflushes;        /* Number of times cache was cleared */
    } ufs_directio_kstats;
Directory Name Cache • caches path names for vnodes • DNLC: the Directory Name Lookup Cache • Each time we find the path name for a vnode, we store it in the DNLC • ncsize, a system-tunable parameter, sets the number of entries in the DNLC • it is set at boot time: • ncsize = (17 * maxusers) + 90 in Solaris 2.4, 2.5, 2.5.1 • ncsize = (68 * maxusers) + 360 in Solaris 2.6, 2.7 • maxusers is equal to the number of megabytes of memory installed in the system, up to a maximum of 1024; it can also be overridden to 2048 • Hit rate • the percentage of name lookups that were found in the name cache
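The boot-time sizing formulas and the hit rate translate directly into code. This sketch models only the default maxusers cap of 1024 (not the override to 2048); the enum and function names are hypothetical.

```c
#include <assert.h>

/* Releases covered by the two ncsize formulas on the slide. */
enum dnlc_formula {
    NCSIZE_SOLARIS_24_TO_251,  /* Solaris 2.4, 2.5, 2.5.1 */
    NCSIZE_SOLARIS_26_TO_27    /* Solaris 2.6, 2.7 */
};

/* maxusers defaults to megabytes of memory, capped at 1024. */
int default_maxusers(int mem_mb)
{
    assert(mem_mb > 0);
    return mem_mb < 1024 ? mem_mb : 1024;
}

/* Boot-time ncsize derived from maxusers. */
int default_ncsize(int maxusers, enum dnlc_formula f)
{
    assert(maxusers > 0);
    return f == NCSIZE_SOLARIS_24_TO_251 ? 17 * maxusers + 90
                                         : 68 * maxusers + 360;
}

/* Hit rate as a percentage of lookups (0 if there were no lookups). */
double dnlc_hit_pct(unsigned long hits, unsigned long lookups)
{
    return lookups ? 100.0 * (double)hits / (double)lookups : 0.0;
}
```

For example, a 4 GB Solaris 7 system gets maxusers = 1024 and ncsize = 68 * 1024 + 360 = 69992 entries.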
Inode Caches • keep a number of inodes in memory • to minimize disk inode reads • to keep the inode’s vnode in memory • ufs_ninode • sizes the tables for the expected number of inodes • affects the number of inodes in memory • How UFS maintains inodes • Inodes are created when a file is first referenced • States: referenced, or on an idle queue • Inodes are destroyed when pushed off the end of the idle queue
Inode Caches (Cont.) • The number of inodes in memory is dynamic • There is no upper bound on the number of inodes open at a time • The idle queue • When an inode is no longer referenced, it is placed on the idle queue • The queue's size is controlled by the ufs_ninode parameter (which other subsystems also refer to) and is limited to ¼ of ufs_ninode
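The idle queue behaves like a bounded FIFO. An inode whose last reference goes away joins the tail, and when the queue already holds ufs_ninode / 4 entries, the oldest idle inode is pushed off the head and destroyed. This is a simplified sketch of that policy under those assumptions. It tracks only inode numbers, omits reactivating an idle inode on a new reference, and its names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define IDLE_MAX 64  /* storage bound for this sketch only */

/* FIFO of idle (unreferenced) inodes, oldest at head. */
struct idle_queue {
    unsigned long ino[IDLE_MAX];
    size_t head;   /* index of the oldest idle inode */
    size_t count;  /* inodes currently idle */
    size_t limit;  /* ufs_ninode / 4 */
};

void idle_init(struct idle_queue *q, size_t ufs_ninode)
{
    assert(q != NULL && ufs_ninode / 4 <= IDLE_MAX);
    q->head = 0;
    q->count = 0;
    q->limit = ufs_ninode / 4;
}

/* Called when the last reference to inode `ino` goes away. If the queue
 * is already full, the oldest idle inode is pushed off the end: its
 * number is stored in *destroyed and true is returned. */
bool idle_release(struct idle_queue *q, unsigned long ino, unsigned long *destroyed)
{
    assert(q != NULL && destroyed != NULL);
    if (q->limit == 0) {          /* no room for idle inodes at all */
        *destroyed = ino;
        return true;
    }
    bool pushed_off = false;
    if (q->count == q->limit) {   /* full: destroy the oldest idle inode */
        *destroyed = q->ino[q->head];
        q->head = (q->head + 1) % IDLE_MAX;
        q->count--;
        pushed_off = true;
    }
    q->ino[(q->head + q->count) % IDLE_MAX] = ino;  /* append at tail */
    q->count++;
    return pushed_off;
}
```

With ufs_ninode = 8 the queue holds two idle inodes; releasing a third pushes the oldest one off and destroys it.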
Inode Caches (Cont.)
    # sar -v 3 3
    SunOS devhome 5.7 Generic sun4u    08/01/99
    11:38:09 proc-sz  ov inod-sz     ov file-sz ov lock-sz
    11:38:12 100/5930  0 37181/37181  0 603/603  0 0/0
    11:38:15 100/5930  0 37181/37181  0 603/603  0 0/0
    11:38:18 101/5930  0 37181/37181  0 607/607  0 0/0

    # netstat -k ufs_inode_cache
    ufs_inode_cache:
    buf_size 440 align 8 chunk_size 440 slab_size 8192 alloc 1221573 alloc_fail 0
    free 1188468 depot_alloc 19957 depot_free 21230 depot_contention 18
    global_alloc 48330 global_free 7823 buf_constructed 3325 buf_avail 3678
    buf_inuse 37182 buf_total 40860 buf_max 40860 slab_create 2270 slab_destroy 0
    memory_class 0 hash_size 0 hash_lookup_depth 0 hash_rescale 0
    full_magazines 219 empty_magazines 332 magazine_size 15
    alloc_from_cpu0 579706 free_to_cpu0 588106 buf_avail_cpu0 15
    alloc_from_cpu1 573580 free_to_cpu1 571309 buf_avail_cpu1 25
Inode Caches (Cont.) • A hash table is used to look up inodes • Its size is also controlled by ufs_ninode • By default, ufs_ninode is set to the size of the directory name cache (ncsize) • ufs_ninode can be set separately in /etc/system:
    set ufs_ninode = new_value
End