Scale and Performance in a Distributed File System John H. Howard et al. ACM Transactions on Computer Systems, 1988 Presented by Gangwon Jo, Sangkuk Kim
Andrew File System • Andrew • Distributed computing environment for Carnegie Mellon University • 5,000 – 10,000 Andrew workstations at CMU • Andrew File System • Distributed file system for Andrew • Files are distributed across multiple servers • Presents a homogeneous file name space to all client workstations
Andrew File System (contd.) • Architecture (diagram in the original slide): several servers, each running Vice on top of the Unix kernel with attached disks, are connected over the network to many clients; each client runs user programs and Venus on top of the Unix kernel and has a local disk.
Andrew File System (contd.)
• Design goal: Scalability • As much work as possible is performed by Venus
• Solution: Caching • Venus caches files from Vice • Venus contacts Vice only when a file is opened or closed • Reading and writing are performed directly on the cached copy
• Workflow shown in the original animation (a code sketch follows this slide):
• open(A): Venus fetches the whole file A from Vice and places it in the local file cache
• read/write: the user program works on the cached copy, with no server traffic
• close(A): the modified copy A' is shipped back to Vice
• A subsequent open(A) is served from the cache as long as the cached copy is still valid
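The open/close traffic in the animation above can be captured in a few lines. The following is a minimal sketch, not AFS code: the Venus class, its vice stub with fetch/store methods, and the cache-file naming are all assumptions made for illustration.

```python
import os


class Venus:
    """Minimal sketch of whole-file caching, assuming a `vice` object with
    fetch(path) -> bytes and store(path, data) methods (both hypothetical)."""

    def __init__(self, vice, cache_dir):
        self.vice = vice
        self.cache_dir = cache_dir
        self.dirty = set()
        os.makedirs(cache_dir, exist_ok=True)

    def _local(self, path):
        # Flatten the Vice pathname into a cache file name (illustrative only).
        return os.path.join(self.cache_dir, path.strip("/").replace("/", "__"))

    def open(self, path, mode="r"):
        local = self._local(path)
        if not os.path.exists(local):          # cache miss: fetch the whole file
            with open(local, "wb") as f:
                f.write(self.vice.fetch(path))
        if any(m in mode for m in ("w", "a", "+")):
            self.dirty.add(path)
        return open(local, mode)               # reads and writes never touch the server

    def close(self, path, fileobj):
        fileobj.close()
        if path in self.dirty:                 # ship the modified copy back to Vice
            with open(self._local(path), "rb") as f:
                self.vice.store(path, f.read())
            self.dirty.discard(path)
```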
Outline • Building a prototype • Qualitative Observation • Performance Evaluation • Changes for performance • Performance Evaluation • Comparison with a Remote-Open File System • Change for operability • Conclusion
The Prototype
• Preserve directory hierarchy • Each server contained a directory hierarchy mirroring the structure of the Vice files
• .admin directories contain Vice file status information
• Stub directories represent portions of the name space located on other servers
• (Diagram: the server disk holds the mirrored tree a/, b/ → Server 2, c/c1/ → Server 3, …; the client disk holds Venus's file cache and status cache. A sketch of one possible server-side layout follows this slide.)
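The slide above can be made concrete with a small lookup routine. This is only a plausible reconstruction of the prototype's on-disk layout: the exact placement of .admin entries and the representation of stub directories are assumptions, and `stub_map` stands in for the real stub-directory mechanism.

```python
import os

def locate_on_server(vice_root, rel_path, stub_map):
    """Sketch: the Vice tree is mirrored under vice_root, status information
    lives in .admin directories, and stub directories mark subtrees held by
    other servers. stub_map, e.g. {"b": "Server 2", "c/c1": "Server 3"},
    is a hypothetical stand-in for the stub-directory mechanism."""
    parts = rel_path.strip("/").split("/")
    # If any prefix of the path is a stub directory, the file lives elsewhere.
    for i in range(1, len(parts) + 1):
        prefix = "/".join(parts[:i])
        if prefix in stub_map:
            return ("remote", stub_map[prefix])
    data_path = os.path.join(vice_root, *parts)
    status_path = os.path.join(vice_root, *parts[:-1], ".admin", parts[-1])
    return ("local", data_path, status_path)

# Example:
# locate_on_server("/vice", "c/c1/c11", {"b": "Server 2", "c/c1": "Server 3"})
# -> ("remote", "Server 3")
```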
The Prototype (contd.) • Preserve directory hierarchy • The Vice-Venus interface names files by their full pathname (e.g., Venus asks Vice for a/a1)
The Prototype (contd.) • Dedicated processes • The server runs one dedicated process for each client
The Prototype (contd.) • Use two caches • One for files, and the other for status information about files
The Prototype (contd.) • Verify the cached timestamp on each open • Before using a cached file, Venus verifies its timestamp against the one on the server (in the original diagram: "is my copy of a/a1 with timestamp 5 still current?" → "OK") • A sketch of this check follows this slide
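The per-open validation can be sketched as below. This is illustrative only, roughly corresponding to what the TestAuth call did; `venus.file_cache`, `venus.cache_file`, `vice.is_current`, and `vice.fetch_with_timestamp` are hypothetical names, not the prototype's real interface.

```python
def open_with_validation(venus, vice, path):
    """Sketch of the prototype's per-open validity check (all names assumed)."""
    entry = venus.file_cache.get(path)        # (local_path, cached_timestamp) or None
    if entry is not None:
        local_path, cached_ts = entry
        # One server interaction on every open, even when the cache is warm.
        if vice.is_current(path, cached_ts):
            return open(local_path, "rb")
    # Cache miss or stale copy: fetch the whole file plus its current timestamp.
    data, ts = vice.fetch_with_timestamp(path)
    local_path = venus.cache_file(path, data, ts)
    return open(local_path, "rb")
```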
Qualitative Observation • stat primitive • Used for testing the presence of files, obtaining status information, ... • Programs using stat ran much slower than the authors expected • Each stat involves a cache validity check with the server • Dedicated processes • Excessive context-switching overhead • High virtual memory paging demands • File location • Difficult to move users' directories between servers
Performance Evaluation • Experience: the prototype was used at CMU • The authors + 400 other users • 100 workstations and 6 servers • Benchmark • A command script that operates on a collection of source files • MakeDir → Copy → ScanDir → ReadAll → Make (sketched below) • Multiple clients (load units) run the benchmark simultaneously
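The five benchmark phases can be outlined as follows. This is only a sketch of the structure of such a script, not the actual benchmark used in the paper; the target directory handling and the final `make` invocation are placeholders.

```python
import os
import shutil
import subprocess

def run_benchmark(src_tree, target_dir):
    """Sketch of the benchmark's five phases on a tree of source files."""
    # MakeDir: recreate the directory skeleton of the source tree.
    for dirpath, _, _ in os.walk(src_tree):
        rel = os.path.relpath(dirpath, src_tree)
        os.makedirs(os.path.join(target_dir, rel), exist_ok=True)
    # Copy: copy every source file into the target tree.
    for dirpath, _, filenames in os.walk(src_tree):
        rel = os.path.relpath(dirpath, src_tree)
        for name in filenames:
            shutil.copy(os.path.join(dirpath, name),
                        os.path.join(target_dir, rel, name))
    # ScanDir: stat every file without reading its data.
    for dirpath, _, filenames in os.walk(target_dir):
        for name in filenames:
            os.stat(os.path.join(dirpath, name))
    # ReadAll: read every byte of every file.
    for dirpath, _, filenames in os.walk(target_dir):
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as f:
                f.read()
    # Make: compile and link the copied sources (placeholder command).
    subprocess.run(["make"], cwd=target_dir, check=False)
```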
Performance Evaluation (contd.) • Cache hit ratio • File cache: 81% • Status cache: 82%
Performance Evaluation (contd.) • Distribution of Vice calls in the prototype, averaged across servers (table in the original slide; cache-validation and status calls such as TestAuth and GetFileStat dominated)
Performance Evaluation (contd.) • Server usage • CPU utilizations are up to 40% • Disk utilizations are less than 15% • Server loads are imbalanced
Performance Evaluation (contd.) • Benchmark performance • Time for TestAuth rises rapidly beyond a load of 5
Performance Evaluation (contd.) • Caches work well! • We need to • Reduce the frequency of cache validity checks • Reduce the number of server processes • Require workstations rather than servers to do pathname traversals • Balance server usage by reassigning users
Outline • Building a prototype • Qualitative Observation • Performance Evaluation • Changes for performance • Performance Evaluation • Comparison with a Remote-Open File System • Change for operability • Conclusion
Changes for Performance • Cache management: use callbacks • Vice notifies Venus when a cached file or directory is modified by another workstation • Cache entries remain valid unless such a notification arrives, so per-open verification is no longer needed • Vice and Venus each maintain callback state information • (A sketch of the callback mechanism follows this slide)
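A minimal sketch of callback-based cache management, assuming an in-memory stand-in for the server's disk and direct method calls instead of the real RPC interface; all class and method names are illustrative.

```python
class ViceWithCallbacks:
    """Server side: remember which clients cache each file and break their
    callback when the file is updated, so clients can treat cached entries
    as valid without a per-open check."""

    def __init__(self):
        self.files = {}                  # path -> bytes (stand-in for the server disk)
        self.callbacks = {}              # path -> set of client objects

    def fetch(self, path, client):
        self.callbacks.setdefault(path, set()).add(client)   # callback promise
        return self.files[path]

    def store(self, path, data, writer):
        self.files[path] = data
        # Notify every other client that its cached copy is now stale.
        for client in self.callbacks.get(path, set()) - {writer}:
            client.break_callback(path)
        self.callbacks[path] = {writer}


class VenusWithCallbacks:
    """Client side: a cached copy is used freely until its callback is broken."""

    def __init__(self, vice):
        self.vice = vice
        self.cache = {}                  # path -> cached file contents

    def break_callback(self, path):      # invoked by Vice on an update
        self.cache.pop(path, None)       # drop the stale copy

    def open(self, path):
        if path not in self.cache:       # only contact the server on a miss
            self.cache[path] = self.vice.fetch(path, self)
        return self.cache[path]
```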
Changes for Performance (contd.) • Name resolution and storage representation • CPU overhead was caused by the namei routine, which maps a pathname to an inode • Identify files by fids instead of pathnames • A volume is a collection of files located on one server • A volume contains multiple vnodes, each identifying a file in the volume • The uniquifier allows vnode numbers to be reused • fid = volume number (32 bits) + vnode number (32 bits) + uniquifier (32 bits)
Changes for Performance (contd.) • Name resolution and storage representation (diagram): clients present a fid (volume number, vnode number, uniquifier); the volume number is mapped to a server through the volume location database replicated at each server, and the vnode number is mapped to an inode through the server's vnode lookup table.
Changes for Performance (contd.) • Name resolution and storage representation • Identify files by fids instead of pathnames • Each directory entry maps a component of a pathname to a fid • Venus performs the logical equivalent of a namei operation, walking the path one component at a time (sketched below)
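The fid layout and the client-side walk can be sketched as follows. This is illustrative only: `read_directory` is an assumed helper returning the name-to-fid map of a cached directory, and `volume_location_db` is a stand-in for the replicated volume location database.

```python
from collections import namedtuple

# A fid identifies a file without embedding any location information:
# 32-bit volume number, 32-bit vnode number, 32-bit uniquifier
# (the uniquifier lets vnode numbers be reused safely).
Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])

def resolve_path(root_fid, path, read_directory):
    """Sketch of Venus doing the logical equivalent of namei: each directory
    is itself a cached file whose entries map a name component to a fid."""
    fid = root_fid
    for component in path.strip("/").split("/"):
        entries = read_directory(fid)    # cached directory contents: {name: Fid}
        fid = entries[component]         # step one component deeper
    return fid

def locate_server(fid, volume_location_db):
    """The volume number alone is enough to find the custodian server."""
    return volume_location_db[fid.volume]
```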
Changes for Performance (contd.) • Server process structure • Use lightweight processes (LWPs) instead of one dedicated process per client • An LWP serves requests from any client rather than being bound to a single one (see the sketch below)
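The structural change can be illustrated with a worker pool. Python threads here merely stand in for the paper's user-level LWP package; the request and handler names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    # Placeholder for decoding an RPC, doing the file operation, and replying.
    return f"handled {request}"

def serve(incoming_requests, workers=5):
    """A small, fixed pool of workers serves requests from any client, in
    contrast to the prototype's one dedicated process per client workstation."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, incoming_requests))

# Example: serve(["open a/a1 from client 3", "close b/b2 from client 7"])
```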
Performance Evaluation • Scalability
Performance Evaluation (contd.) • Server utilization during benchmark
Outline • Building a prototype • Qualitative Observation • Performance Evaluation • Changes for performance • Performance Evaluation • Comparison with a Remote-Open File System • Change for operability • Conclusion
Comparison with a Remote-Open File System • Caching in the Andrew File System • Locality of reference makes caching attractive • The whole-file transfer approach contacts servers only on opens and closes • Most files in a 4.2BSD environment are read in their entirety • Disk caches retain their entries across reboots • Caching entire files simplifies cache management
Comparison with a Remote-Open File System • Caching in the Andrew File System – drawbacks • Requires a local disk on each workstation • Files larger than the local disk cache are hard to handle • Strict emulation of 4.2BSD concurrent read/write semantics is impossible
Comparison with a Remote-Open File System • Remote open • The data in a file are not fetched en masse • Instead, the remote site potentially participates in each individual read and write operation • The file is actually opened on the remote site rather than the local one • Example: NFS
Comparison with a Remote-Open File System • Serious functional problems with NFS at high loads • (Figure: network traffic for Andrew and NFS)
Comparison with a Remote-Open File System • Advantage of a remote-open file system: low latency • (Figure: latency of Andrew and NFS)
Outline • Building a prototype • Qualitative Observation • Performance Evaluation • Changes for performance • Performance Evaluation • Comparison with a Remote-Open File System • Change for operability • Conclusion
Change for Operability • Volume • A collection of files forming a partial subtree of the Vice name space • Volumes are glued together at mount points to form the complete name space • Operational transparency: volumes can be moved between servers without disrupting users • (Diagram: several volumes spread over the servers, with one volume mounted inside another; a mount-point resolution sketch follows this slide)
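Crossing a mount point during name resolution can be sketched as below. The directory contents, the mount-point table, and the fid values are all made up for illustration; they only show how volumes are glued into one seamless name space.

```python
from collections import namedtuple

Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])

# Hypothetical tables: directory contents per fid, and mount points that splice
# the root of one volume into a directory of another.
DIRECTORIES = {
    Fid(1, 1, 1): {"usr": Fid(1, 7, 1), "tmp": Fid(1, 8, 1)},
    Fid(2, 1, 1): {"alice": Fid(2, 5, 1)},
}
MOUNT_POINTS = {Fid(1, 7, 1): Fid(2, 1, 1)}   # "usr" is a mounted volume

def resolve(path, root=Fid(1, 1, 1)):
    """Walk the path, transparently crossing volume boundaries at mount points."""
    fid = root
    for component in path.strip("/").split("/"):
        fid = DIRECTORIES[fid][component]
        fid = MOUNT_POINTS.get(fid, fid)       # continue in the mounted volume
    return fid

print(resolve("/usr/alice"))                   # Fid(volume=2, vnode=5, uniquifier=1)
```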