Accurate and Efficient Replaying of File System Traces

Accurate and Efficient Replaying of File System Traces • Nikolai Joukov, Timothy Wong, and Erez Zadok, Stony Brook University • USENIX Conference on File and Storage Technologies (FAST 2005) • Presented by Hsu Hao Chen

Outline • Introduction • Design • Architecture • Reproduce original timing problem • Replayfs trace • Threads and their scheduling • Zero copying of data • File system caches • Implementation • Evaluation • Conclusions

Introduction • Trace replaying is useful for file system benchmarking, stress-testing, debugging, and forensics. • File system traces can be captured and replayed at different logical levels: • System calls • Virtual File System (VFS) • Network level for network file systems • Device driver

Design • Architecture(1/2) • Tracefs: a stackable file system that captures traces at the VFS level

Design • Architecture(2/2) • Replayfs: VFS-level replayer

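Design • Reproduce original timing problem • If t_replayer > t_user (the replayer needs more time per operation than the original program spent in user mode between operations), the timing and I/O rate cannot be reproduced correctly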
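The timing condition above can be checked per operation. The following is a minimal sketch, assuming a hypothetical trace given as a list of original issue times (integer microseconds) and a fixed per-operation replayer overhead; the function name and the fixed-overhead model are illustrative simplifications, not taken from the paper.

```python
def late_operations(issue_times_us, replayer_overhead_us):
    """Return the indexes of operations that cannot be issued on time.

    issue_times_us: original issue timestamps in microseconds, ascending.
    replayer_overhead_us: assumed fixed preparation cost per operation
    (a hypothetical simplification; operation run time is ignored).
    """
    late = []
    ready_at = 0  # earliest moment the next operation could be issued
    for index, original_time in enumerate(issue_times_us):
        ready_at += replayer_overhead_us  # prepare this operation
        if ready_at > original_time:
            late.append(index)            # issued late, at ready_at
        else:
            ready_at = original_time      # wait for the original time
    return late
```

With operations 10 µs apart, an overhead of 2 µs leaves every operation on time, while an overhead of 15 µs makes every operation late and the delay accumulates.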

Design • Problems with system-call replayers: • They run in user mode • Redundant data copying between user and kernel buffers • Page eviction is not completely controlled from the user level • Replaying processes can be preempted by other tasks • Some kernels are not preemptive and have long execution paths

Design • Replayfs trace(1/4)

Design • Replayfs trace(2/4) • Tracefs • A trace captured by a tracer is often portable, descriptive, and verbose, to offer as much information as possible • Trace compiler • User-mode program for conversion and optimization of the raw Tracefs traces • Splits the raw Tracefs trace into three components: • Commands • Resource Allocation Table (RAT) • Memory buffers
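The three-way split can be illustrated with a small sketch. The record layout below (dictionaries with `op`, `file`, and optional `data` keys) is hypothetical, as is the tuple format of the compiled commands; the slides do not describe the real trace format.

```python
def compile_trace(raw_records):
    """Split verbose raw records into commands, a RAT size, and a buffer.

    Each raw record is a dict with hypothetical keys "op", "file", and
    optionally "data". Compiled commands refer to resources by RAT slot
    index and to payloads by (offset, length) in the shared buffer.
    """
    commands = []
    rat_slots = {}        # resource name -> slot index in the RAT
    buffer = bytearray()  # all write payloads packed back to back
    for record in raw_records:
        # Reuse the slot if this resource was seen before.
        slot = rat_slots.setdefault(record["file"], len(rat_slots))
        data = record.get("data", b"")
        offset = len(buffer)
        buffer += data
        commands.append((record["op"], slot, offset, len(data)))
    return commands, len(rat_slots), bytes(buffer)
```

Keeping payloads out of the command stream keeps each command small and fixed in shape, which is one plausible reason for compiling the trace before replay.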

Design • Replayfs trace(3/4)

Design • Replayfs trace(4/4) • Memory buffers are accessed for reading only, because the information read from the disk during replay is discarded.

Design • Threads and their scheduling • Replayfs issues requests to the lower file system on behalf of different threads • Resource contention (disk head repositioning, locks, etc.) • Replayfs reuses threads if possible • Pre-spin • Increases event precision (standard event timers have about 1 ms resolution) • Clock thread • CPU cycle counters
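The pre-spin idea can be shown with a user-space analogy: sleep with the coarse timer until shortly before the deadline, then busy-wait on a high-resolution counter. This is only a sketch; Replayfs does this inside the kernel with CPU cycle counters, whereas the function name, the `spin_ns` parameter, and the use of `time.perf_counter_ns` here are assumptions made for illustration.

```python
import time


def wait_until(deadline_ns, spin_ns=2_000_000):
    """Block until deadline_ns on the time.perf_counter_ns() clock.

    Sleeps coarsely first, then busy-waits (pre-spins) for roughly the
    last spin_ns nanoseconds so the wake-up is not limited by the
    granularity of the sleep timer.
    """
    remaining = deadline_ns - time.perf_counter_ns()
    if remaining > spin_ns:
        time.sleep((remaining - spin_ns) / 1e9)  # coarse wait
    while time.perf_counter_ns() < deadline_ns:
        pass  # pre-spin: poll the high-resolution counter
    return time.perf_counter_ns()
```

The trade-off is CPU time: the spin window burns cycles in exchange for a wake-up that is much closer to the requested time.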

Design • Zero copying of data • There is no easy way for a user-mode program to read data while avoiding a copy to user space. • Running in kernel mode helps • A data page that belongs to the trace file can simply be moved to the target file by changing a few pointers
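As a loose user-space analogy only (the kernel mechanism moves page-cache pages, which Python cannot do), the idea of handing out references to trace data instead of copies can be sketched with `memoryview`. The function name and the `(offset, length)` command format are hypothetical.

```python
def write_payloads(trace_buffer, commands):
    """Return one zero-copy view per (offset, length) command.

    Each returned memoryview points into trace_buffer; no payload bytes
    are duplicated when the views are created.
    """
    view = memoryview(trace_buffer)
    return [view[offset:offset + length] for offset, length in commands]
```

Each view shares storage with the original buffer, so creating it costs a small constant amount of memory regardless of the payload size.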

Design • File system caches • Replayfs supports three replaying modes for dealing with read operations • Current cache state • Replayfs calls all the captured buffer read operations • Original cache state • Reads are invoked at the page level, only for the pages that were not found in the cache during tracing • Reads are not replayed at all
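The three modes amount to a filter over the captured reads. A minimal sketch follows, assuming each captured read carries a hypothetical `cache_hit` flag recorded during tracing; the enum and function names are illustrative, and the page-level granularity of the second mode is simplified to one flag per read.

```python
from enum import Enum


class ReadMode(Enum):
    CURRENT_CACHE = 1   # replay every captured buffer read
    ORIGINAL_CACHE = 2  # replay only reads that missed the cache when traced
    NO_READS = 3        # do not replay reads at all


def reads_to_replay(read_ops, mode):
    """Select which captured read operations to issue in the given mode."""
    if mode is ReadMode.CURRENT_CACHE:
        return list(read_ops)
    if mode is ReadMode.ORIGINAL_CACHE:
        return [op for op in read_ops if not op["cache_hit"]]
    return []
```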

Implementation • Implemented for the Linux kernel; both Tracefs and Replayfs can now be used on either 2.4 or 2.6 Linux kernels • Components: Tracefs (kernel module), trace compiler (application program), Replayfs (kernel module)

Evaluation • Test environment • 1.7GHz Pentium 4 machine with 1GB of RAM • The system disk was a 30GB 7200 RPM IDE disk formatted with Ext3 • The machine had two Maxtor Atlas 15,000 RPM 18.4GB Ultra320 SCSI disks formatted with Ext2 • Used for storing the traces and the Replayfs traces

Evaluation • Evaluation Tools and Workloads • Am-utils build • Building Am-utils is a CPU-intensive benchmark • Postmark • Simulates the operation of electronic mail servers • Pread • Used to evaluate Replayfs's CPU time consumption • It spawns two threads that concurrently read 1KB buffers of cached data using the pread system call • Pread performed 100 million read operations
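A scaled-down sketch of a Pread-style workload is shown below. It is not the authors' benchmark: the function names, the temporary file, the fixed read offset of 0, and the small read count are all assumptions chosen to keep the example short. `os.pread` is available on Unix-like systems.

```python
import os
import tempfile
import threading


def make_small_file(size=1024):
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as handle:
        handle.write(b"x" * size)
    return path


def pread_workload(path, reads_per_thread, block=1024, threads=2):
    """Read `block` bytes at offset 0 repeatedly from several threads.

    Returns the total number of bytes read across all threads.
    """
    counts = [0] * threads  # one slot per thread, so no lock is needed

    def worker(index):
        fd = os.open(path, os.O_RDONLY)
        try:
            for _ in range(reads_per_thread):
                counts[index] += len(os.pread(fd, block, 0))
        finally:
            os.close(fd)

    pool = [threading.Thread(target=worker, args=(i,)) for i in range(threads)]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return sum(counts)
```

Because the same small region is read over and over, the data stays cached and the workload mostly measures per-call CPU cost rather than disk speed, which matches the stated purpose of Pread.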

Evaluation • Memory Overheads • [Figure: memory overheads; annotated values 56%, 70%, 45%]

Evaluation • Timing Precision of Replaying(1/2) • [Figure: two plots; axis labels: Number of operations, Time (seconds)]

Evaluation • Timing Precision of Replaying(2/2) • [Figure: two plots; axis labels: Number of operations, Time (seconds)]

Evaluation • CPU Time Consumption • [Figure: CPU time consumption; annotated values 32%, 61%] • User-level replayers cannot replay traces like Pread at the same rate as the original

Conclusions • Trace replaying offers a number of advantages for file system benchmarking, debugging, and forensics • Replayfs has three distinct benefits: • Captures and replays all file system operations • Including important memory-mapped operations • Runs as a kernel module • Avoids unnecessary data copying • Reduces the number of context switches • Optimizes trace data prefetching • Precise control over thread scheduling • Pre-spin
