
Chapter 4 File System —— File System Cache



  1. Chapter 4 File System —— File System Cache • Li Wensheng • wenshli@bupt.edu.cn

  2. Outline • Introduction to File Caching • Page Cache and Virtual Memory System • File System Performance

  3. Introduction to File Caching • File Caching • One of the most important features of a file system • Traditional Unix file system caching is implemented in the I/O subsystem by keeping copies of recently read or written blocks in a block cache • In Solaris, caching is implemented in the virtual memory system

  4. The Old-Style Buffer Cache

  5. Solaris Page Cache • Page cache • a newer method of caching file system data • developed at Sun as part of the virtual memory system • used by System V Release 4 Unix • now also used in Linux and Windows NT • major differences from the old caching method: • it is dynamically sized and can use all memory that is not being used by applications • it caches file blocks rather than disk blocks • The key difference is that the page cache is a virtual file cache rather than a physical block cache

  6. The Solaris Page Cache • [Diagram: the block buffer cache holds internal file system data, i.e., metadata items (direct/indirect blocks, inodes); the page cache holds file data]

  7. Block Buffer Cache • used for caching inodes and file metadata • In old versions of Unix, fixed in size by nbuf, which specified the number of 512-byte buffers • now also dynamically sized • can grow, as needed, until it reaches a ceiling specified by bufhwm • By default, it is allowed to grow until it uses 2 percent of physical memory • We can look at the upper limit for the buffer cache by using the sysdef command.
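  If 2 percent of physical memory is more than a workload needs, the ceiling can be lowered in /etc/system. A hypothetical setting (note that the value given to bufhwm is a kilobyte count, while sysdef reports the resulting limit in bytes):

  * cap the buffer cache at 8000 KB (about 8 MB) - hypothetical value
  set bufhwm=8000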

  8. sysdef command. • # sysdef • * • * Tunable Parameters • * • 7757824 maximum memory allowed in buffer cache (bufhwm) • 5930 maximum number of processes (v.v_proc) • 99 maximum global priority in sys class (MAXCLSYSPRI) • 5925 maximum processes per user id (v.v_maxup) • 30 auto update time limit in seconds (NAUTOUP) • 25 page stealing low water mark (GPGSLO) • 5 fsflush run rate (FSFLUSHR) • 25 minimum resident memory for avoiding deadlock (MINARMEM) • 25 minimum swapable memory for avoiding deadlock (MINASMEM)

  9. Buffer cache size needed • 300 bytes per inode • about 1 MB per 2 GB of files • Example • a database server with 100 files, totaling 100 GB of storage space • accesses only 50 GB at any one time • needs: • 100 * 300 bytes = 30 KB for inodes • 50/2 * 1 MB = 25 MB for metadata (direct and indirect blocks) • On a system with 5 GB of physical memory • the default bufhwm (2 percent) will be 102 MB, far more than needed
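  A minimal sketch of this sizing rule of thumb in C (the 300-bytes-per-inode and 1-MB-per-2-GB constants come from the slide; the function name is ours):

  #include <stdio.h>

  /* Rough buffer cache sizing from the rule of thumb above:
   * ~300 bytes of inode cache per file, plus ~1 MB of metadata
   * (direct/indirect blocks) per 2 GB of actively accessed file data. */
  static double bufcache_needed_mb(int nfiles, double active_gb)
  {
      double inode_mb = nfiles * 300.0 / (1024 * 1024);
      double meta_mb  = active_gb / 2.0;   /* 1 MB per 2 GB */
      return inode_mb + meta_mb;
  }

  int main(void)
  {
      /* The example from the slide: 100 files, 50 GB active at once. */
      printf("%.2f MB needed\n", bufcache_needed_mb(100, 50.0));
      return 0;
  }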

  10. monitor the buffer cache hit statistics • # sar -b 3 333 • SunOS zangief 5.7 Generic sun4u 06/27/99 • 22:01:51 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s • 22:01:54 0 7118 100 0 0 100 0 0 • 22:01:57 0 7863 100 0 0 100 0 0 • 22:02:00 0 7931 100 0 0 100 0 0 • 22:02:03 0 7736 100 0 0 100 0 0 • 22:02:06 0 7643 100 0 0 100 0 0 • 22:02:09 0 7165 100 0 0 100 0 0 • 22:02:12 0 6306 100 8 25 68 0 0 • 22:02:15 0 8152 100 0 0 100 0 0 • 22:02:18 0 7893 100 0 0 100 0 0
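  The %rcache column is derived from the logical (lread/s) and block (bread/s) read counts; a small sketch of the relationship, assuming the standard sar definition of the read hit ratio:

  #include <stdio.h>

  /* %rcache as reported by sar -b: the fraction of logical reads (lread/s)
   * satisfied from the buffer cache, i.e., not requiring a block read (bread/s). */
  static double rcache_pct(long lread, long bread)
  {
      return lread ? 100.0 * (lread - bread) / lread : 100.0;
  }

  int main(void)
  {
      /* From the sample above: 7118 logical reads, 0 block reads -> 100% */
      printf("%%rcache = %.0f\n", rcache_pct(7118, 0));
      return 0;
  }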

  11. Outline • Introduction to File Caching • Page Cache and Virtual Memory System • File System Performance

  12. file system caching behavior • physical memory is divided into pages • to read data from a file into memory, the virtual memory system “pages in” the file one page at a time • the page scanner searches for least recently used (LRU) pages and puts them back on the free list

  13. file system caching behavior (Cont.) • [Diagram: as a file is paged in, free memory falls and the page scanner's scan rate rises to replenish the free list; labels: “free”, “scan rate”, “page in”]

  14. File System Paging Optimizations • reduce the amount of memory pressure • invoke free-behind with sequential access • free pages when free memory falls to lotsfree • limit the file system’s use of the page cache • pages_before_pager, default 200 pages • reflects the amount of memory above the point where the page scanner starts (lotsfree) • when memory falls to 1.6 megabytes (200 8-KB pages on UltraSPARC) above lotsfree, the file system throttles back its use of the page cache
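  pages_before_pager can be tuned like the other parameters in this chapter; a hypothetical /etc/system entry that doubles the throttle point:

  * throttle file system page cache use at 400 pages above lotsfree (hypothetical)
  set pages_before_pager=400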

  15. File System Paging Optimizations (Cont.) • when memory falls to lotsfree + pages_before_pager: • Solaris file systems free all pages after they are written • UFS and NFS enable free-behind on sequential access • NFS disables read-ahead • NFS writes synchronously, rather than asynchronously • VxFS enables free-behind (some versions only)

  16. Outline • Introduction to File Caching • Page Cache and Virtual Memory System • File System Performance

  17. Paging affects the user’s application • the page scanner can put too much pressure on the user application’s private process memory • if the scan rate reaches several hundred pages a second, the window in which the scanner checks whether a page has been accessed falls to a few seconds • any page that has not been used in the last few seconds will be taken • this behavior negatively affects application performance

  18. Example • consider an OLTP application that makes heavy use of the file system • the database generates file system I/O, making the page scanner actively steal pages from the system • the user of the OLTP application pauses for 15 seconds to read the contents of a screen from the last transaction • during this time, the page scanner finds that the pages associated with the user application have not been referenced and makes them available for stealing • the pages are stolen; when the user types the next keystroke, he is forced to wait until the application is paged back in, usually several seconds • our user is forced to wait for the application to page in from the swap device, even though the application is running on a system with sufficient memory to keep all of it in physical memory!

  19. The priority paging algorithm • places a boundary around the file cache so that file system I/O does not cause unnecessary paging of applications • prioritizes the different types of pages in the page cache, in order of importance: • Highest — Pages associated with executables and shared libraries, including application process memory (anonymous memory) • Lowest — Regular file cache pages • as long as the system has sufficient memory, the scanner only steals pages associated with regular files

  20. Enable priority paging • set the parameter priority_paging in /etc/system: set priority_paging=1 • To enable priority paging on a live 32-bit system, set the following with adb:
  # adb -kw /dev/ksyms /dev/mem
  lotsfree/D
  lotsfree: 730            <- value of lotsfree is printed
  cachefree/W 0t1460       <- insert 2 x value of lotsfree, preceded by 0t (decimal)
  dyncachefree/W 0t1460    <- insert 2 x value of lotsfree, preceded by 0t (decimal)
  cachefree/D
  cachefree: 1460
  dyncachefree/D
  dyncachefree: 1460

  21. Enable priority paging (Cont.) • To enable priority paging on a live 64-bit system, set the following with adb:
  # adb -kw /dev/ksyms /dev/mem
  lotsfree/E
  lotsfree: 730            <- value of lotsfree is printed
  cachefree/Z 0t1460       <- insert 2 x value of lotsfree, preceded by 0t (decimal)
  dyncachefree/Z 0t1460    <- insert 2 x value of lotsfree, preceded by 0t (decimal)
  cachefree/E
  cachefree: 1460
  dyncachefree/E
  dyncachefree: 1460

  22. Paging types • the execute bit associated with the address-space mapping distinguishes executable files from regular files • three paging types: executable, application (anonymous), and file • memstat command • output is similar to that of vmstat, but with extra fields that differentiate the paging types • overall: pi po fr sr • executable: epi epo epf • application (anonymous): api apo apf • file system: fpi fpo fpf

  23. paging caused by an application memory shortage
  # ./readtest testfile&
  # memstat 3
  memory --------- paging --------- - executable - - anonymous - -- filesys -- --- cpu ---
  free re mf pi po fr de sr epi epo epf api apo apf fpi fpo fpf us sy wt id
  2080 1 0 749 512 821 0 264 0 0 269 0 512 549 749 0 2 1 7 92 0
  1912 0 0 762 384 709 0 237 0 0 290 0 384 418 762 0 0 1 4 94 0
  1768 0 0 738 426 610 0 1235 0 0 133 0 426 434 738 0 42 4 14 82 0
  1920 0 2 781 469 821 0 479 0 0 218 0 469 525 781 0 77 24 54 22 0
  2048 0 0 754 514 786 0 195 0 0 152 0 512 597 754 2 37 1 8 91 0
  2024 0 0 741 600 850 0 228 0 0 101 0 597 693 741 2 56 1 8 91 0
  2064 0 1 757 426 589 0 143 0 0 72 8 426 498 749 0 18 1 7 92 0

  24. paging through the file system
  # ./readtest testfile&
  # memstat 3
  memory --------- paging --------- - executable - - anonymous - -- filesys -- --- cpu ---
  free re mf pi po fr de sr epi epo epf api apo apf fpi fpo fpf us sy wt id
  3616 6 0 760 0 752 0 673 0 0 0 0 0 0 760 0 752 2 3 95 0
  3328 2 198 816 0 925 0 1265 0 0 0 0 0 0 816 0 925 2 10 88 0
  3656 4 195 765 0 792 0 263 0 0 0 2 0 0 762 0 792 7 11 83 0
  3712 4 0 757 0 792 0 186 0 0 0 0 0 0 757 0 792 1 9 91 0
  3704 3 0 770 0 789 0 203 0 0 0 0 0 0 770 0 789 0 5 95 0
  3704 4 0 757 0 805 0 205 0 0 0 0 0 0 757 0 805 2 6 92 0
  3704 4 0 778 0 805 0 266 0 0 0 0 0 0 778 0 805 1 6 93 0

  25. Paging parameters affecting performance • When priority paging is enabled, the file system scan rate is higher. • High scan rates should not then be used as a factor for determining memory shortage. • If file system activity is heavy, the default scanner parameters are insufficient and will limit file system performance. • set the scanner parameters fastscan and maxpgio to allow the scanner to scan at a high enough rate to keep up with the file system

  26. Scanner parameters • fastscan • the number of pages per second the scanner can scan • defaults to ¼ of memory per second, limited to 64 MB per second • limits file system throughput • when memory is at lotsfree, the scanner runs at half of fastscan, limited to 32 MB per second • if only one-third of the scanned pages are file pages, the scanner will be able to put only 32 / 3 ≈ 11 MB per second of memory on the free list
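  A quick sanity check of that arithmetic in C (the 32 MB/s cap and the one-third file fraction are the figures quoted above):

  #include <stdio.h>

  int main(void)
  {
      /* At lotsfree the scanner runs at fastscan/2, capped at 32 MB/s. */
      double scan_mb_per_sec = 32.0;
      /* Suppose only one-third of the pages scanned are file pages. */
      double file_fraction = 1.0 / 3.0;
      /* Only the file pages can be freed for the file system's benefit,
         so the effective replenish rate is the product of the two. */
      printf("freeable: %.1f MB/s\n", scan_mb_per_sec * file_fraction);
      return 0;
  }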

  27. Scanner parameters (Cont.) • maxpgio • the maximum number of pages the page scanner can push • limits the write performance of the file system • if memory is sufficient, set maxpgio large, e.g., 1024 • Example: on a 4 GB machine • set fastscan=131072 • set handspreadpages=131072 • set maxpgio=1024

  28. VM Parameters That Affect File Systems • [Summary table of VM tunables not preserved in this transcript]

  29. Direct I/O • unbuffered I/O, bypassing the file system page cache • UFS Direct I/O • allows reads and writes to files in a regular file system to bypass the page cache and access the file at near raw-disk performance • can be advantageous when accessing a file in a manner where caching is of no benefit • e.g., copying a very large file from one disk to another • eliminates the double copy that is performed when the read and write system calls are used • by arranging for the DMA transfer to occur directly into the user’s address space

  30. Enable direct I/O • Direct I/O will bypass the buffer cache only if all of the following are true: • the file is not memory mapped • the file is not on a logging file system • the file does not have holes • the read/write is sector aligned (512 bytes) • ways to enable direct I/O: • mount an entire file system with the forcedirectio mount option • # mount -o forcedirectio /dev/dsk/c0t0d0s6 /u1 • or use the directio system call on a per-file basis • int directio(int fildes, int advice); /* advice: DIRECTIO_ON or DIRECTIO_OFF */
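  A minimal per-file sketch using the directio(3C) call above (error handling abbreviated; the path /u1/bigfile is a hypothetical example):

  #include <sys/types.h>
  #include <sys/fcntl.h>
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
      /* A large file where caching is of no benefit (e.g., a one-pass copy). */
      int fd = open("/u1/bigfile", O_RDONLY);   /* hypothetical path */
      if (fd < 0) { perror("open"); exit(1); }

      /* Advise UFS to bypass the page cache for this file. */
      if (directio(fd, DIRECTIO_ON) < 0)
          perror("directio");

      /* Reads should be sector aligned (512 bytes) for the bypass to apply. */
      char buf[64 * 1024];
      while (read(fd, buf, sizeof(buf)) > 0)
          ;   /* consume the data */

      (void) directio(fd, DIRECTIO_OFF);   /* restore cached behavior */
      (void) close(fd);
      return 0;
  }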

  31. UFS direct I/O • Direct I/O can provide extremely fast transfers when moving data with big block sizes (>64 KB), but it can be a significant performance limitation for smaller sizes. • Structure ufs_directio_kstats holds the direct I/O statistics:
  struct ufs_directio_kstats {
          uint_t logical_reads;   /* Number of fs read operations */
          uint_t phys_reads;      /* Number of physical reads */
          uint_t hole_reads;      /* Number of reads from holes */
          uint_t nread;           /* Physical bytes read */
          uint_t logical_writes;  /* Number of fs write operations */
          uint_t phys_writes;     /* Number of physical writes */
          uint_t nwritten;        /* Physical bytes written */
          uint_t nflushes;        /* Number of times cache was cleared */
  } ufs_directio_kstats;

  32. Directory Name Cache • caches path names for vnodes • DNLC, the Directory Name Lookup Cache • each time we find the path name for a vnode, we store it in the DNLC • ncsize, a system-tunable parameter, sets the number of entries in the DNLC • it is set at boot time • ncsize = (17 * maxusers) + 90 in Solaris 2.4, 2.5, 2.5.1 • ncsize = (68 * maxusers) + 360 in Solaris 2.6, 2.7 • maxusers is equal to the number of megabytes of memory installed in the system, to a maximum of 1024; it can also be overridden, up to 2048 • hit rate • the proportion of name lookups that found the name in the cache
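  For example, with the Solaris 2.6/2.7 formula, a machine with 512 MB of memory gets maxusers = 512 and ncsize = 68 * 512 + 360 = 35,176 entries. To override the computed value, a hypothetical /etc/system entry:

  * size the DNLC explicitly (hypothetical value)
  set ncsize=35176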

  33. Inode Caches • keep a number of inodes in memory • to minimize disk inode reads • to keep the inode’s vnode in memory • ufs_ninode • sizes the tables for the expected number of inodes • affects the number of inodes kept in memory • how UFS maintains inodes: • inodes are created when a file is first referenced • states: referenced, or on the idle queue • inodes are destroyed when pushed off the end of the idle queue

  34. Inode Caches (Cont.) • the number of inodes in memory is dynamic • there is no upper bound on the number of inodes open at a time • the idle queue • when an inode is no longer referenced, it is placed on the idle queue • its size is controlled by the ufs_ninode parameter and is limited to ¼ of ufs_ninode • inodes still referenced by another subsystem (e.g., the DNLC) are not placed on the idle queue

  35. Inode Caches (Cont.)
  # sar -v 3 3
  SunOS devhome 5.7 Generic sun4u 08/01/99
  11:38:09 proc-sz ov inod-sz ov file-sz ov lock-sz
  11:38:12 100/5930 0 37181/37181 0 603/603 0 0/0
  11:38:15 100/5930 0 37181/37181 0 603/603 0 0/0
  11:38:18 101/5930 0 37181/37181 0 607/607 0 0/0
  # netstat -k ufs_inode_cache
  ufs_inode_cache:
  buf_size 440 align 8 chunk_size 440 slab_size 8192 alloc 1221573 alloc_fail 0
  free 1188468 depot_alloc 19957 depot_free 21230 depot_contention 18
  global_alloc 48330 global_free 7823 buf_constructed 3325 buf_avail 3678
  buf_inuse 37182 buf_total 40860 buf_max 40860 slab_create 2270 slab_destroy 0
  memory_class 0 hash_size 0 hash_lookup_depth 0 hash_rescale 0
  full_magazines 219 empty_magazines 332 magazine_size 15
  alloc_from_cpu0 579706 free_to_cpu0 588106 buf_avail_cpu0 15
  alloc_from_cpu1 573580 free_to_cpu1 571309 buf_avail_cpu1 25

  36. Inode Caches (Cont.) • a hash table is used to look up inodes • its size is also controlled by ufs_ninode • by default, ufs_ninode is set to the size of the directory name cache (ncsize) • to set ufs_ninode separately, in /etc/system: • set ufs_ninode = new_value

  37. End
