The New Yorker, September 6, 1999, page 76.
File System Usage in Windows NT 4.0 Werner Vogels Dept. of Computer Science Cornell University
Before After
Goals of the study • Create a new data point with respect to the BSD/Sprite traces. • Perform a rigorous statistical analysis of the trace data. • Study behavior of Windows NT File I/O components. • Investigate the complexity of Windows NT operations.
Quiz • Number of files on a local file system? • Number of files added per day? • Most active directory? • 75% of files are open for less than ? • Cache read-ahead size set by NTFS? • What percentage of sequential reads is satisfied by a single read-ahead?
Top 10 observations on Windows NT file system usage 10. Using commercial data-mining tools for experimental data analysis was a big win.
Some numbers … • 4 groups of users, 45 workstations • 24 days of continuous tracing • 1042 valid trace files, 195 idle trace days • 237 million trace records • 31 million open requests, 2.9 million failed opens • 410 GByte of data requested • 315 processes, 7 million opens by WinLogon • 289 file types, 3.4 million gifs
Top 10 observations on Windows NT file system usage 9. Executables, DLLs & fonts dominate the local File System content. 10. Using commercial data-mining tools was very useful.
Observations on file system content Mandatory reading: Douceur and Bolosky, SIGMETRICS’99 • “C:\” typically holds 24,000 to 45,000 files • File type distribution is highly variable • File size distributions are identical • File types weighted by size are similar: • Dominated by executables, DLLs, fonts, etc. • Shifts only in extreme cases
One more cookie … [Figure: file type count]
Top 10 observations on Windows NT file system usage 8. The WWW cache is the hot spot in the local FS. 9. Executables, DLLs & fonts dominate the local FS. 10. Using commercial data-mining tools was very useful.
Observations on file system content – II • Differences are in the profile tree • Downloaded from a central server per user • Hot spot is the WWW cache in the profile • 2000 – 9500 files, 5 – 45 MBytes • Changes over time, daily pattern: • 300 – 500 files added to the system (up to 3000) • 93% in the WWW cache
Top 10 observations on Windows NT file system usage 7. Files are open for increasingly shorter periods. 8. The WWW cache is the hot spot in the local FS. 9. Executables, DLLs & fonts dominate the local FS. 10. Commercial data-mining tools are very useful.
Create, Cleanup & Close • Open request arrivals: 40% within 1 msec, 90% within 30 msec • Open times: 40% less than 1 msec; 90% less than 10 seconds (data), 1 second (average), 20 msec (control) • Strong heavy tail; variance is high • Mainly depends on the process, not on the file type
Top 10 observations on Windows NT file system usage 6. What are all those #$%@ control operations about? 7. Files are open for increasingly shorter periods. 8. The WWW cache is the hot spot in the local FS. 9. Executables, DLLs & fonts dominate the local FS. 10. Using commercial data-mining tools was very useful.
File Control Operations 74% of the open requests are made to perform a control operation • 33 different major requests • Many originate in the runtime libraries (volume mounted). • Some are triggered by system components (SetEndOfFile). • Control operations can only be made on open files.
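To make the last point concrete, here is a minimal sketch using a POSIX analogy rather than the NT API (a hypothetical example: `os.ftruncate` stands in for NT's SetEndOfFile): the control operation is issued against a handle that is already open.

```python
import os
import tempfile

# Hypothetical POSIX analogy: like NT's SetEndOfFile, ftruncate is a control
# operation that can only be issued against a file that is already open.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"0123456789")
    os.ftruncate(fd, 4)            # control operation on the open handle
    os.lseek(fd, 0, os.SEEK_SET)
    data = os.read(fd, 100)        # only b'0123' remains after truncation
finally:
    os.close(fd)
    os.unlink(path)
```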
Top 10 observations on Windows NT file system usage 5. The FastIO path is extremely important.
The importance of FastIO • Procedural interface with 27 methods. • Provides a direct path for reading and writing data directly from/to the cache. • Packet-based IO sets up the cache, after which FastIO takes over.
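The interplay between the two paths can be sketched as a toy model (invented names, not the actual NT kernel interface): the first read takes the packet-based path and sets up the cache; subsequent reads are served directly by the fast path.

```python
# Toy model with invented names -- not the NT kernel interface.
backing_store = {"a.txt": b"hello"}   # stands in for the on-disk file system
cache = {}

def packet_read(name):
    # Packet-based (IRP-style) path: goes to the file system, sets up the cache.
    data = backing_store[name]
    cache[name] = data
    return data, "packet"

def fast_read(name):
    # Fast path: serve directly from the cache when possible.
    if name in cache:
        return cache[name], "fast"
    return packet_read(name)

first = fast_read("a.txt")    # miss: packet path populates the cache
second = fast_read("a.txt")   # hit: fast path takes over
```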
Top 10 observations on Windows NT file system usage 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIO path is extremely important.
Life-time of new files 80% of newly created files are deleted within 4 seconds (30 seconds in Sprite) • Create & delete (62%) • 72% deleted within 4 seconds after Create • 60% deleted within 1.5 seconds after Close • 36% of the processes that create also perform delete. • 18% are opened multiple times between create and delete • Create & overwrite (37%) • 75% overwritten within 4 milliseconds after Create • 75% overwritten within 0.7 milliseconds after Close • 94% of the processes that Create also overwrite.
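Given per-file create and delete timestamps from a trace, the "deleted within 4 seconds" figure is a simple fraction over observed lifetimes; a minimal sketch with made-up records:

```python
# Hypothetical (create_time, delete_time) pairs in seconds -- made-up data,
# not the study's traces.
records = [(0.0, 0.5), (1.0, 2.2), (3.0, 3.1), (4.0, 9.0), (5.0, 5.8)]
lifetimes = [delete - create for create, delete in records]
within_4s = sum(t <= 4.0 for t in lifetimes) / len(lifetimes)
print(f"{within_4s:.0%} of new files deleted within 4 s")
```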
Top 10 observations on Windows NT file system usage 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIO path is extremely important.
Top 10 observations on Windows NT file system usage 2. Life isn’t a simple Poisson process … 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIO path is extremely important.
File access patterns revisited [Table 3: access patterns]
Process assumptions [Figure: arrival rate of open requests in trace sample #239 vs. a synthesized sample assuming a Poisson process]
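Why the Poisson assumption fails can be illustrated numerically (a hedged sketch on synthetic data, not the study's traces): a Poisson process has exponential inter-arrival times with squared coefficient of variation near 1, while heavy-tailed (Pareto, alpha < 2) inter-arrivals show far higher, bursty variability.

```python
import random

random.seed(1)
n = 10_000
# Poisson process: exponentially distributed inter-arrival times.
expo = [random.expovariate(1.0) for _ in range(n)]
# Heavy-tailed alternative: Pareto inter-arrivals with alpha < 2.
pareto = [random.paretovariate(1.2) for _ in range(n)]

def cv2(xs):
    # Squared coefficient of variation: variance / mean^2.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs) / (m * m)

poisson_cv2 = cv2(expo)    # close to 1 for a Poisson process
heavy_cv2 = cv2(pareto)    # much larger: burstiness a Poisson model misses
```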
Top 10 observations on Windows NT file system usage 1. Black box analysis does not lead to relevant insights. 2. Life isn’t a simple Poisson process … 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIO path is extremely important.
The failure of Black Box data analysis It is wrong to assume that all trace data combined can be seen as one unified trace representing the behavior of a single Windows NT workstation.
The failure of Black Box data analysis - II • No statistical proof that any two non-idle traces draw their values from the same distribution. • The best result possible was that values come from the same type of distribution. • Traditional statistical fitting of a predefined model to large amounts of trace data does not lead to any real insight: “on average each open request transfers 7 KBytes of data”
The origin of the complexity • No more “well behaved” applications. • Too many variants. • Pervasive presence of heavy tails in the distributions of almost all variables (tail estimators indicate infinite variance). • Closed-loop processing amplifies the presence of the heavy-tail distributions found in the file system content.
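A common tail estimator of the kind alluded to is the Hill estimator; a small sketch (synthetic Pareto data with a known tail index, not the traced variables) showing how an estimated tail index below 2 signals infinite variance:

```python
import math
import random

def hill_estimate(xs, k):
    # Hill estimator of the tail index alpha from the k largest values.
    xs = sorted(xs, reverse=True)
    return k / sum(math.log(x / xs[k]) for x in xs[:k])

# Synthetic Pareto sample with known tail index 1.1 -- not the traced data.
random.seed(7)
sample = [random.paretovariate(1.1) for _ in range(50_000)]
alpha = hill_estimate(sample, 1000)   # should land near 1.1; < 2 => infinite variance
```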
Last words Why you should read the paper … • The paper reports an initial analysis of the results of a large-scale file system usage study. • Detailed description of the methods used for tracing and analysis. • Historical comparison with the BSD & Sprite traces. • Lots of data on file system content, cache behavior, IO specifics, etc. • Analysis of the presence of heavy-tailed distributions in all of the traced variables.