A Semantic-based Cache Replacement Algorithm for Mobile File Access Sharun Santhosh and Weisong Shi Department of Computer Science Wayne State University weisong@wayne.edu http://mist.cs.wayne.edu
Motivation • The Future • Staying connected anywhere, anytime will become a reality • How? • Cable modem or DSL connection at home • High-speed Ethernet network at work or school • Satellite network in the car • WiFi network at the airport or the neighborhood coffee shop • Challenges • Effectiveness - adapt to the varying underlying connectivity • Convenience - adaptation should be transparent to the user • Security - secure access on resource-constrained devices
Heterogeneous Environment [Figure: the networks a mobile device moves between: Personal Area Network (Bluetooth, about 10 meters), Local Area Network (wired LAN with workgroup switches, 802.11a/b/g wireless LAN, wireless bridge), and Wide Area Network (GSM/CDMA, GPS). Labeled bandwidths range from 9.6 Kbit/s to under 100 Mb/s; labeled uses include access, synchronization, "hot spots", voice, SMS, e-mail, web browsing, mCommerce, Internet access, document transfer, and low/high quality video.]
Our Solution • CEGOR: ClosE and Go, Open and Resume • Semantic-based caching • Adaptive communication optimization (Fractal) • Connection-view-based secure and transparent reconnection
Roadmap • Motivation • Caching • Semantic-based Caching • Simulation Results • Conclusion
Caching • Three basic steps are involved in accessing data anywhere and anytime • Retrieve the files from the server • Work on them locally • Write the changes back to the server • A cache optimizes this process • Reduces the frequency of disk operations • Reduces the frequency of requests to the file servers • Reduces network load • Problem being addressed • Minimization of communication
Why Study Caching? • It has been studied extensively, yet LRU is still the most commonly used algorithm • Used in NFS, AFS, Sprite, CODA and most operating systems' buffer caches • Why? • It’s simple to implement • Cache misses are acceptable in existing systems • The number of files replaced does not matter • High hit ratio vs. number of replacements • But in a heterogeneous environment • Each miss implies additional communication • Work must be stored locally (when in a weakly connected or disconnected state) • A reliable link to the server cannot be assumed
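To make the trade-off between hit ratio and number of replacements concrete, here is a minimal sketch of an LRU cache that counts both. The class name `LRUCache`, its counters, and the toy trace are hypothetical illustrations and do not come from the slides; capacity is counted in files rather than bytes to keep the example short.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that counts hits, misses, and replacements (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of cached files (assumption)
        self.files = OrderedDict()    # least recently used entry comes first
        self.hits = 0
        self.misses = 0
        self.replacements = 0

    def access(self, name):
        """Record one access to `name`; return True on a hit, False on a miss."""
        if name in self.files:
            self.files.move_to_end(name)       # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self.files) >= self.capacity:
            self.files.popitem(last=False)     # evict the least recently used file
            self.replacements += 1
        self.files[name] = True
        return False


cache = LRUCache(capacity=2)
for name in ["a", "b", "a", "c", "b", "a"]:
    cache.access(name)
print(cache.hits, cache.misses, cache.replacements)
```

On a weak or intermittent link, every miss and every replacement in such a run would correspond to extra traffic to the server, which is the cost the slides argue LRU ignores.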
Usage Scenario “Imagine a field engineer is accessing layout diagrams for a faulty electricity sub-station, half way through communications go down. A cache MISS may cause several minutes delay, perhaps longer, e.g., Which was the 10,000 volt cable?”
Goals • Caches for distributed file systems that operate across heterogeneous networks must • Provide the hit rates of conventional caches that operate over homogeneous networks • Minimize communication overhead, i.e., minimize replacements, which means increased file availability and
Our Approach to Caching • File access patterns aren’t random • A semantic relationship exists between two files in a file access sequence • User behavior • Program execution • We define and investigate two kinds of such relations • Inter-file relations • Intra-file relations • We introduce the notion of eviction index for each cached item
Inter-file relations Analysis of DFS traces
Inter-file relations • An inter-file relationship exists between two files i and j if i is the next file opened after j is closed. File j is called file i’s precursor. • Xi - the number of times file i is accessed • Ti - the time since the last access to file i • Yj - the number of times file j precedes file i
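The counters defined on this slide can be gathered in a single pass over an open/close trace. The sketch below is an assumption-laden illustration: the event format `(time, op, name)`, the function name `inter_file_stats`, and the rule that only the most recently closed file counts as the precursor of the next open are all choices made here, not details given in the slides.

```python
from collections import defaultdict


def inter_file_stats(events):
    """Collect inter-file counters from time-ordered (time, op, name) events.

    Returns (access_count, last_access, precursor_count) where
      access_count[i]       ~ Xi, number of times file i was opened
      last_access[i]        ~ time of the last open of i (Ti = now - last_access[i])
      precursor_count[i][j] ~ Yj, number of times j was closed right before i opened
    """
    access_count = defaultdict(int)
    last_access = {}
    precursor_count = defaultdict(lambda: defaultdict(int))
    last_closed = None  # most recently closed file not yet followed by an open

    for time, op, name in events:
        if op == "open":
            access_count[name] += 1
            last_access[name] = time
            if last_closed is not None:
                precursor_count[name][last_closed] += 1
                last_closed = None  # only the next open counts (assumption)
        elif op == "close":
            last_closed = name

    return (
        dict(access_count),
        last_access,
        {i: dict(js) for i, js in precursor_count.items()},
    )


trace = [
    (0, "open", "a"), (1, "close", "a"),
    (2, "open", "b"), (3, "close", "b"),
    (4, "open", "a"), (5, "close", "a"),
    (6, "open", "b"), (7, "close", "b"),
]
x, last, y = inter_file_stats(trace)
print(x, last, y)
```

In this toy trace, file a precedes file b twice and b precedes a once, so a is the stronger precursor of b.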
Intra-file relations • An intra-file relationship is said to exist between two files i and j if they are both open before they are closed. • Intra-file relations are based on the shared time Si,j, where O(i) and C(i) are the times at which file i was opened and closed, respectively. [Figure: timeline of the open (O) and close (C) events of files i and j, with the overlapping interval marked Sij]
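The exact formula for Si,j is not reproduced in this text. As a hedged sketch only, the code below assumes that shared time is the length of the interval during which both files are open, computed from O(i), C(i), O(j), and C(j). The function names `shared_time` and `total_shared_time` are hypothetical, and the authors' actual definition may differ.

```python
def shared_time(open_i, close_i, open_j, close_j):
    """Assumed shared time: length of the overlap of the two open intervals."""
    overlap = min(close_i, close_j) - max(open_i, open_j)
    return max(0.0, overlap)  # no overlap means no shared time


def total_shared_time(interval_i, other_intervals):
    """Assumed Stotal for file i: sum of shared time with every other interval."""
    open_i, close_i = interval_i
    return sum(shared_time(open_i, close_i, o, c) for o, c in other_intervals)


# File i is open during [0, 10]; file j during [4, 15]; file k during [12, 20].
s_ij = shared_time(0, 10, 4, 15)
s_ik = shared_time(0, 10, 12, 20)
s_total = total_shared_time((0, 10), [(4, 15), (12, 20), (8, 9)])
print(s_ij, s_ik, s_total)
```

Under this assumption, files that never overlap contribute nothing, so only genuinely concurrent use strengthens an intra-file relation.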
Intra-file relations Ti - represents the time since the last access to file i. Tj - represents the time since the last access to file j where j is open before i is closed. Si,j - represents the shared time of file i with respect to file j where i is closed before j. Stotal - represents the total shared time with all files that are open before i is closed
Workload DFS Traces from CMU were utilized during the simulation
Implementation • Seven replacement algorithms • RR – Round Robin • LRU – Least recently used • LFU – Least frequently used • GDS – Greedy dual size • INTER – based only on inter-file relations • INTRA – based on intra-file relations • BOTH – based on both intra-file and inter-file relations • Varying cache sizes • 10KB, 25KB, 50KB, 100KB, 500KB … • Seven traces • The simulator maintains a cache (hash list), an open list (currently open files), and a close list (files that have been closed).
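A trace-driven simulator of this shape can be sketched in a few lines. Everything below is an assumption made for illustration: the event format `(op, name, size)`, the function names, and the sample policy `evict_earliest_closed` are hypothetical and are not the authors' simulator or any of the seven algorithms listed.

```python
def evict_earliest_closed(cache, open_list, close_list):
    """Sample policy (assumption): evict the earliest-closed file that is not open."""
    for name in close_list:
        if name in cache and name not in open_list:
            return name
    return next(iter(cache))  # fall back to any cached file


def simulate(events, capacity, choose_victim):
    """Replay (op, name, size) events against a byte-limited cache."""
    cache = {}        # name -> size in bytes
    open_list = []    # currently open files
    close_list = []   # files that have been closed, in closing order
    stats = {"hits": 0, "misses": 0, "replacements": 0}

    for op, name, size in events:
        if op == "open":
            open_list.append(name)
            if name in cache:
                stats["hits"] += 1
                continue
            stats["misses"] += 1
            if size > capacity:
                continue  # file can never fit, so it is not cached
            while sum(cache.values()) + size > capacity:
                victim = choose_victim(cache, open_list, close_list)
                del cache[victim]
                stats["replacements"] += 1
            cache[name] = size
        elif op == "close":
            if name in open_list:
                open_list.remove(name)
            close_list.append(name)
    return stats


events = [
    ("open", "a", 60), ("close", "a", 60),
    ("open", "b", 50), ("close", "b", 50),
    ("open", "a", 60), ("close", "a", 60),
    ("open", "a", 60),
]
stats = simulate(events, capacity=100, choose_victim=evict_earliest_closed)
print(stats)
```

Passing the policy as a function keeps the replay loop identical across algorithms, so differences in hits and replacements can be attributed to the policy alone.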
Hit rates of all algorithms [Figure: hit-rate plot; surviving curve label: INTER/BOTH]
Replace attempts of all algorithms [Figure: plot of replacement attempts; surviving curve labels: INTRA, INTER/BOTH]
The Need For File System Tracing • Traces haven’t been collected frequently enough to reflect present-day usage • Publicly available traces, such as those collected at the disk driver level or web proxy traces, do not give relevant information on file system workload.
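System Call Interception [Figure: in user space, a standard library call such as FILE *fopen(const char *name, const char *mode) leads to the open system call; in kernel space, a trace module replaces the open and close entries of the system call table with new versions that record each call through a data logger and then invoke the original open and close.]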
Analysis Summary • Most files were open for less than a hundredth of a second • The majority of files are accessed only a few times; a small percentage of files are very popular • The majority of files are smaller than 100KB, but large files can be very large (heavy tail) • Almost half of the accesses repeat within a short period of first occurring • File throughput has greatly increased due to the presence of large files • The majority of files accessed have a unique predecessor
Summary • We have presented a semantic-based caching algorithm and shown that it performs better than conventional caching approaches in terms of hit ratio and byte hit ratio • We have also shown that it does this while performing far fewer replacements • Compared to prevalent replacement strategies that ignore file relations and communication overhead, this approach appears better suited to distributed file systems that operate across heterogeneous environments
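The two metrics named in the summary differ in what they weight. As a small worked example, assuming the usual definitions (hit ratio = hits / accesses, byte hit ratio = bytes served from cache / bytes requested), the hypothetical helper `hit_ratios` below computes both from a list of accesses. The sample numbers are made up for illustration and are not results from the study.

```python
def hit_ratios(accesses):
    """accesses: list of (size_in_bytes, was_hit). Returns (hit ratio, byte hit ratio)."""
    total = len(accesses)
    total_bytes = sum(size for size, _ in accesses)
    hits = sum(1 for _, was_hit in accesses if was_hit)
    hit_bytes = sum(size for size, was_hit in accesses if was_hit)
    hit_ratio = hits / total if total else 0.0
    byte_hit_ratio = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_ratio, byte_hit_ratio


# Two small files hit, two large files miss (illustrative numbers only).
ratios = hit_ratios([(100, True), (900, False), (100, True), (900, False)])
print(ratios)
```

Here half of the accesses hit, but only a tenth of the requested bytes come from the cache, which is why the two ratios are reported separately.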
Future Work • Collecting more up-to-date distributed file system traces • Applying the cache replacement algorithm to a real wireless file system in a computer-assisted surgery application • Extending the idea to more general applications, such as mobile database access
Questions & Comments? weisong@wayne.edu http://mist.cs.wayne.edu