
The New Yorker, September 6, 1999, page 76

Presentation Transcript


  1. The New Yorker, September 6, 1999, page 76

  2. File System Usage in Windows NT 4.0 Werner Vogels Dept. of Computer Science Cornell University

  3. Before After

  4. Goals of the study • Create a new data point with respect to the BSD/Sprite traces. • Perform a rigorous statistical analysis of the trace data. • Study behavior of Windows NT File I/O components. • Investigate the complexity of Windows NT operations.

  5. Quiz • Number of files on a local file system? • Number of files added per day? • Most active directory? • 75% of files are open for less than ? • Cache read-ahead size set by NTFS? • What percentage of sequential reads is satisfied by a single read-ahead?

  6. Top 10 observations on Windows NT file system usage 10. Using commercial data-mining tools for experimental data analysis was a big win.

  7. Some numbers … • 4 groups of users • 45 workstations • 24 days of continuous tracing • 1042 valid trace files • 195 idle trace days • 237 million trace records • 31 million open requests • 2.9 million failed opens • 410 GByte of data requested • 315 processes • 7 million WinLogon • 289 file types • 3.4 million GIFs

  8. Top 10 observations on Windows NT file system usage 9. Executables, DLLs & fonts dominate the local File System content. 10. Using commercial data-mining tools was very useful.

  9. Observations on file system content Mandatory reading: Douceur and Bolosky, SIGMETRICS’99 • “C:\” typically holds 24,000 to 45,000 files • File type distribution is highly variable • File size distributions are identical • File type distributions weighted by size are similar: • Dominated by executables, DLLs, fonts, etc. • Shifts only in extreme cases

  10. One more cookie … File type count

  11. Top 10 observations on Windows NT file system usage 8. The WWW cache is the hot spot in the local File System. 9. Executables, DLLs & fonts dominate the local FS. 10. Using commercial data-mining tools was very useful.

  12. Observations on file system content – II • Differences are in the profile tree • Downloaded per user from a central server • Hot spot is the WWW cache in the profile • 2000 – 9500 files, 5 – 45 Mbytes • Changes over time, daily pattern: • 300 – 500 files added to the system (up to 3000) • 93% in the WWW cache

  13. Top 10 observations on Windows NT file system usage 7. On average files are open for longer periods. 8. The WWW cache is the hot spot in the local FS. 9. Executables, DLLs & fonts dominate the local FS. 10. Commercial data-mining tools are very useful.

  14. Open request arrivals • 40% within 1 msec • 90% within 30 msec. Open times • 40% less than 1 msec • 90% less than 10 seconds (data), 1 second (average), 20 msec (control) • Strong heavy tail • Variance is high • Mainly depends on process, not on type • Create, Cleanup & Close
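The arrival and open-time figures on this slide are empirical fractions and percentiles of per-request durations. A minimal sketch of how such numbers can be computed, assuming durations in milliseconds; the function names and the sample values are hypothetical and are not taken from the study's traces:

```python
import math
from typing import Sequence


def fraction_within(durations_ms: Sequence[float], threshold_ms: float) -> float:
    """Fraction of durations that are at most threshold_ms."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    return sum(1 for d in durations_ms if d <= threshold_ms) / len(durations_ms)


def percentile(durations_ms: Sequence[float], p: float) -> float:
    """Nearest-rank p-th percentile (0 < p <= 100) of the durations."""
    if not durations_ms or not 0 < p <= 100:
        raise ValueError("need data and 0 < p <= 100")
    ordered = sorted(durations_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


# Hypothetical open times (ms) for ten open requests, one of them very long.
opens = [0.2, 0.5, 0.8, 0.9, 3.0, 12.0, 25.0, 28.0, 29.0, 5000.0]
print(fraction_within(opens, 1.0))   # 0.4
print(fraction_within(opens, 30.0))  # 0.9
print(percentile(opens, 90))         # 29.0
```

With heavy-tailed data like this, the mean (about 510 ms here) says little about typical behavior, while the percentiles show that 90% of these opens finish within 30 ms.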

  15. Top 10 observations on Windows NT file system usage 6. What are all those #$%@ control operations about? 7. Files are open for increasingly shorter periods. 8. The WWW cache is the hot spot in the local FS. 9. Executables, DLLs & fonts dominate the local FS. 10. Using commercial data-mining tools was very useful.

  16. File Control Operations • 74% of open requests are made to perform control operations • 33 different major requests • Many originate in the runtime libraries (e.g., volume mounted) • Some are triggered by system components (SetEndOfFile) • Control operations can only be made on open files

  17. Top 10 observations on Windows NT file system usage 5. The FastIo path is extremely important.

  18. …and somebody should document this soon …

  19. The importance of FASTIO • Procedural interface with 27 methods • Provides a direct path for reading and writing data from/to the cache • Packet-based IO sets up the cache, after which FastIO takes over
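The split between the packet-based path and the procedural FastIO path can be illustrated with a toy model: the first read of a file goes through a request object that sets up caching, and later reads copy straight from the cache. This is a hypothetical Python sketch, not the NT kernel interface (which uses I/O request packets and a FAST_IO_DISPATCH table of callbacks); every class and method name below is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReadRequest:
    """Hypothetical stand-in for a packet-based I/O request."""
    path: str
    offset: int
    length: int


@dataclass
class ToyCache:
    files: Dict[str, bytes]  # path -> contents "on disk"
    cached: Dict[str, bytes] = field(default_factory=dict)  # path -> cached contents
    fast_hits: int = 0
    packet_reads: int = 0

    def fast_read(self, path: str, offset: int, length: int) -> Optional[bytes]:
        """Procedural fast path: copy straight from the cache, or return None."""
        data = self.cached.get(path)
        if data is None:
            return None  # caching not set up yet; the caller must fall back
        self.fast_hits += 1
        return data[offset:offset + length]

    def packet_read(self, req: ReadRequest) -> bytes:
        """Slow path: serve the request from 'disk' and set up the cache."""
        self.packet_reads += 1
        data = self.files[req.path]
        self.cached[req.path] = data  # later reads can take the fast path
        return data[req.offset:req.offset + req.length]

    def read(self, path: str, offset: int, length: int) -> bytes:
        """Try the fast path first; fall back to a packet-based request."""
        result = self.fast_read(path, offset, length)
        if result is None:
            result = self.packet_read(ReadRequest(path, offset, length))
        return result


cache = ToyCache(files={"report.txt": b"hello world"})
print(cache.read("report.txt", 0, 5))       # b'hello' (packet path, sets up cache)
print(cache.read("report.txt", 6, 5))       # b'world' (fast path)
print(cache.packet_reads, cache.fast_hits)  # 1 1
```

In this model every read after the first avoids building a request object, which is the intuition behind the slide's claim that the FastIo path matters so much.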

  20. Top 10 observations on Windows NT file system usage 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIo path is extremely important.

  21. Life-time of new files: 80% of newly created files are deleted within 4 seconds (30 seconds in Sprite) • Create & overwrite (37%) • 75% overwritten within 4 milliseconds after Create • 75% overwritten within 0.7 milliseconds after Close • 94% of the processes that Create also overwrite • Create & delete (62%) • 72% deleted within 4 seconds after Create • 60% deleted within 1.5 seconds after Close • 36% of the processes that create also perform delete • 18% are opened multiple times between create and delete
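A lifetime figure such as "80% of newly created files are deleted within 4 seconds" can be derived by pairing each Create with its later Delete. A minimal sketch, assuming a hypothetical pre-processed trace of (file id, create time, delete time) tuples in seconds; this record layout is an assumption for illustration, not the study's trace format:

```python
from typing import Iterable, Optional, Tuple

# Hypothetical record: (file_id, create_time_s, delete_time_s or None if never deleted).
Record = Tuple[str, float, Optional[float]]


def fraction_deleted_within(records: Iterable[Record], window_s: float) -> float:
    """Fraction of newly created files deleted within window_s seconds of Create.

    Files that are never deleted during the trace count as surviving the window.
    """
    total = 0
    short_lived = 0
    for _file_id, created, deleted in records:
        total += 1
        if deleted is not None and deleted - created <= window_s:
            short_lived += 1
    if total == 0:
        raise ValueError("no newly created files in the trace")
    return short_lived / total


records = [
    ("a", 0.0, 1.5),
    ("b", 10.0, 13.9),
    ("c", 20.0, None),
    ("d", 30.0, 30.2),
    ("e", 40.0, 100.0),
]
print(fraction_deleted_within(records, 4.0))  # 0.6
```

Files still alive when tracing stops are right-censored; counting them as survivors, as done here, is one simple convention and can understate short lifetimes only if the trace ends within the window after their creation.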

  22. Top 10 observations on Windows NT file system usage 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIo path is extremely important.

  23. User activity

  24. File access patterns - counts

  25. File access patterns - bytes

  26. Top 10 observations on Windows NT file system usage 2. Life isn’t a simple Poisson process … 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIo path is extremely important.

  27. File access patterns revisited Table 3: access patterns

  28. File access patterns revisited Table 3: access patterns

  29. 10 minute throughput revisited

  30. Process assumptions: arrival rate of open requests in trace sample #239 vs. a synthesized sample assuming a Poisson process
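The comparison on this slide contrasts measured open-request arrivals with a synthesized Poisson sample. A minimal sketch of both halves, assuming arrivals are timestamps in seconds: a Poisson process is synthesized from exponential inter-arrival gaps, and per-bin counts are summarized by their variance-to-mean ratio (index of dispersion), which is close to 1 for a Poisson process and much larger for bursty traffic. The function names, rate, and bin size are hypothetical, and the slide does not say which statistic the study used:

```python
import random
from statistics import mean, pvariance
from typing import List


def poisson_arrivals(rate_per_s: float, duration_s: float, seed: int = 0) -> List[float]:
    """Arrival timestamps of a homogeneous Poisson process on [0, duration_s).

    Inter-arrival gaps are exponential with mean 1 / rate_per_s.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals: List[float] = []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return arrivals
        arrivals.append(t)


def counts_per_bin(arrivals: List[float], duration_s: float, bin_s: float) -> List[int]:
    """Number of arrivals in each consecutive bin of width bin_s."""
    n_bins = int(duration_s // bin_s)
    counts = [0] * n_bins
    for t in arrivals:
        i = int(t // bin_s)
        if i < n_bins:
            counts[i] += 1
    return counts


def index_of_dispersion(counts: List[int]) -> float:
    """Variance-to-mean ratio of per-bin counts; about 1 for a Poisson process."""
    return pvariance(counts) / mean(counts)


arrivals = poisson_arrivals(rate_per_s=50.0, duration_s=2000.0, seed=1)
counts = counts_per_bin(arrivals, duration_s=2000.0, bin_s=1.0)
print(f"synthetic Poisson sample: dispersion = {index_of_dispersion(counts):.2f}")
print(f"bursty example: dispersion = {index_of_dispersion([0, 0, 0, 100]):.1f}")
```

Applying the same binning to a real trace and obtaining a dispersion far above 1 is one simple way to see that the arrivals are not a simple Poisson process.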

  31. Top 10 observations on Windows NT file system usage 1. Black box analysis does not lead to relevant insights. 2. Life isn’t a simple Poisson process … 3. User activity and file access patterns appear to have changed less prominently. 4. The life-time expectation of new files has decreased by an order of magnitude. 5. The FastIo path is extremely important.

  32. The failure of Black Box data analysis It is wrong to assume that all trace data combined can be seen as one unified trace representing the behavior of a single Windows NT workstation.

  33. The failure of Black Box data analysis - II • No statistical proof that any two non-idle traces draw their values from the same distribution. • The best result possible was that values come from the same type of distribution. • Using traditional statistical fitting of a predefined model to large amounts of trace data does not lead to any real insights, e.g., “on average each open request transfers 7 KBytes of data”
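One standard way to ask whether two traces draw their values from the same distribution is the two-sample Kolmogorov-Smirnov test. The slide does not name the test used in the study, so the following is only a stdlib sketch under that assumption; it uses the usual large-sample critical value c(α)·sqrt((n+m)/(nm)) with c ≈ 1.358 for α = 0.05, and the function names are hypothetical:

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    for v in xs + ys:  # the supremum is attained at one of the sample points
        cdf_a = bisect_right(xs, v) / len(xs)
        cdf_b = bisect_right(ys, v) / len(ys)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def same_distribution(a: Sequence[float], b: Sequence[float], c_alpha: float = 1.358) -> bool:
    """True if 'same distribution' is not rejected (large-sample approximation, ~5% level)."""
    n, m = len(a), len(b)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) <= critical


trace_a = [float(x) for x in range(100)]
trace_b = [x + 1000.0 for x in trace_a]
print(ks_statistic(trace_a, trace_b))       # 1.0
print(same_distribution(trace_a, trace_a))  # True
print(same_distribution(trace_a, trace_b))  # False
```

Note that a test like this can only reject the hypothesis of a common distribution; failing to reject it is not a proof that two traces share one, which matches the slide's careful wording.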

  34. The origin of the complexity • No more “well behaved” applications. • Too many variants. • Pervasive presence of heavy tails in the distributions of almost all variables (tail estimators indicate infinite variance). • Closed-loop processing amplifies the heavy-tailed distributions found in the file system content.
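The remark that tail estimators indicate infinite variance refers to estimating the tail index α of a Pareto-like tail, P[X > x] ~ x^(-α); when α < 2 the variance is infinite. The Hill estimator is one common choice, but the slide does not say which estimator the study used, so this stdlib sketch is illustrative only; the function name, the synthetic data, and the choice of k (the number of upper order statistics) are assumptions:

```python
import math
import random
from typing import Sequence


def hill_tail_index(samples: Sequence[float], k: int) -> float:
    """Hill estimate of the tail index alpha from the k largest positive samples.

    A tail P[X > x] ~ x**(-alpha) has infinite variance when alpha < 2
    and infinite mean when alpha <= 1.
    """
    xs = sorted((x for x in samples if x > 0), reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("need 0 < k < number of positive samples")
    threshold = xs[k]  # the (k+1)-th largest value
    h = sum(math.log(x / threshold) for x in xs[:k]) / k
    if h == 0:
        raise ValueError("top k samples are all equal; tail index undefined")
    return 1.0 / h


# Synthetic Pareto data with alpha = 1.5 (infinite variance), for illustration only.
rng = random.Random(7)
samples = [rng.paretovariate(1.5) for _ in range(20_000)]
alpha_hat = hill_tail_index(samples, k=2_000)
print(f"estimated alpha = {alpha_hat:.2f}; infinite variance: {alpha_hat < 2}")
```

In practice the estimate is usually plotted against a range of k values (a "Hill plot") rather than read off a single k, because real trace data are rarely exactly Pareto.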

  35. Last words Why you should read the paper … • The paper reports on an initial analysis of the results of a large-scale file system usage study. • Detailed description of the methods used for tracing and analysis. • Historical comparison with BSD & Sprite traces. • Lots of data on file system content, cache behavior, IO specifics, etc. • Analysis of the presence of heavy-tailed distributions in all of the traced variables.
