
A Semantic-based Cache Replacement Algorithm for Mobile File Access



Presentation Transcript


  1. A Semantic-based Cache Replacement Algorithm for Mobile File Access Sharun Santhosh and Weisong Shi Department of Computer Science Wayne State University weisong@wayne.edu http://mist.cs.wayne.edu

  2. Motivation • The Future • Staying connected anywhere, anytime will become a reality • How? • Cable modem or DSL connection at home • High-speed Ethernet network at work or school • Satellite network in the car • WiFi network at the airport or the neighborhood coffee shop • Challenges • Effectiveness - adapt to the varying underlying connectivity • Convenience - adaptation should be transparent to the user • Security - secure access on resource-constrained devices

  3. Heterogeneous Environment [Figure: diagram of network types. Network labels: Personal Area Network (PAN), Bluetooth; wLAN, 802.11a/b/g; Local Area Network (LAN), workgroup switches, wireless bridge; Wide Area Network (WAN), GSM/CDMA; GPS. Bandwidth labels: <1 Mb/s, <2 Mb/s, <100 Mb/s, 9.6 Kbit/s. Usage labels: access, synchronization, 10 meters; access, "hot spots", LAN equivalent; voice, SMS, e-mail, web browsing, mCommerce, Internet access, document transfer, low/high quality video.]

  4. Our Solution: CEGOR (ClosE and Go, Open and Resume) • Adaptive Communication Optimization (Fractal) • Semantic-based Caching • Connection View based Secure and Transparent Reconnection

  5. Roadmap • Motivation • Caching • Semantic-based Caching • Simulation Results • Conclusion

  6. Caching • Accessing data anywhere and anytime involves three basic steps • Retrieve the files from the server • Work on them locally • Write the changes back to the server • A cache optimizes this process • Reduces the frequency of disk operations • Reduces the frequency of requests to the file servers • Reduces network load • Problem being addressed • Minimization of communication

  7. Why Study Caching? • Caching has been studied extensively, yet LRU is still the most commonly used algorithm • Used in NFS, AFS, Sprite, CODA and most operating systems' buffer caches • Why? • It’s simple to implement. • Cache misses are acceptable in existing systems. • The number of files replaced does not matter • High hit ratio vs. number of replacements • But in a heterogeneous environment • Each miss implies additional communication • Storage of work (when in a weakly connected or disconnected state) • Cannot assume a reliable link exists with the server
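To make the "hit ratio vs. number of replacements" trade-off concrete, here is a minimal Python sketch of a size-bounded LRU file cache that counts both hits and replacements. It is an illustration only, not the simulator used in the talk; the class name `LRUCache`, the byte capacity, and the toy access sequence are all assumptions made for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Size-bounded LRU file cache that counts hits, misses, and replacements.

    Hypothetical illustration; not the authors' implementation.
    """

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.files = OrderedDict()  # name -> size, least recently used first
        self.hits = 0
        self.misses = 0
        self.replacements = 0

    def access(self, name, size):
        """Record one file access; return True on a cache hit."""
        if name in self.files:
            self.hits += 1
            self.files.move_to_end(name)  # mark as most recently used
            return True
        self.misses += 1
        if size > self.capacity:
            return False  # file is too large to cache at all
        # Evict least recently used files until the new file fits.
        while self.used + size > self.capacity:
            _, evicted_size = self.files.popitem(last=False)
            self.used -= evicted_size
            self.replacements += 1
        self.files[name] = size
        self.used += size
        return False


# Toy access sequence (made up): file name and size in bytes.
cache = LRUCache(capacity_bytes=100)
for name, size in [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40)]:
    cache.access(name, size)
print(cache.hits, cache.misses, cache.replacements)
```

In a well-connected setting only the hit count would usually be reported; over a weak link, the replacement count matters too, because an evicted file may have to be fetched again.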

  8. Usage Scenario “Imagine a field engineer is accessing layout diagrams for a faulty electricity sub-station, half way through communications go down. A cache MISS may cause several minutes delay, perhaps longer, e.g., Which was the 10,000 volt cable?”

  9. Is simple caching (LRU) enough?

  10. Goals • Caches for distributed file systems that operate across heterogeneous networks must • Provide the hit rates of conventional caches that operate over homogeneous networks • Minimize communication overhead, i.e., minimize replacements, which means increased file availability and

  11. Our Approach to Caching • File access patterns aren’t random • A semantic relationship exists between two files in a file access sequence • User behavior • Program execution • We define and investigate two kinds of such relations • Inter-file relations • Intra-file relations • We introduce the notion of eviction index for each cached item

  12. Outline • Motivation • Caching • Semantic-based Caching • Simulation Results • Conclusion

  13. Inter-file relations Analysis of DFS traces

  14. Inter-file relations An inter-file relationship exists between two files i and j if i is the next file opened after j is closed. File j is called file i’s precursor. Xi - represents the number of times file i is accessed. Ti - represents the time since the last access to file i. Yj - represents the number of times file j precedes file i.
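The three quantities defined on this slide can be gathered in a single pass over an open/close trace. The Python sketch below shows one way to do that. The event format `(time, op, name)`, the function name `inter_file_stats`, and the choices to measure "last access" from the most recent open and to ignore a file preceding itself are assumptions made here. How these quantities are combined into an eviction index is not reproduced in this transcript, so the sketch stops at the raw counts.

```python
from collections import Counter, defaultdict


def inter_file_stats(events, now):
    """Collect Xi, Ti and precursor counts from (time, op, name) events.

    Hypothetical helper: op is "open" or "close"; events are time-ordered.
    """
    access_count = Counter()                 # Xi: number of accesses to file i
    last_access = {}                         # time of the most recent open of i
    precursor_count = defaultdict(Counter)   # precursor_count[i][j]: times j preceded i
    last_closed = None
    for time, op, name in events:
        if op == "open":
            access_count[name] += 1
            last_access[name] = time
            # j is a precursor of i if i is the next file opened after j closes.
            if last_closed is not None and last_closed != name:
                precursor_count[name][last_closed] += 1
            last_closed = None  # only the *next* open counts
        elif op == "close":
            last_closed = name
    time_since = {n: now - t for n, t in last_access.items()}  # Ti
    return access_count, time_since, precursor_count


# Made-up trace: files a and b are used alternately.
events = [
    (0, "open", "a"), (1, "close", "a"),
    (2, "open", "b"), (3, "close", "b"),
    (4, "open", "a"), (5, "close", "a"),
    (6, "open", "b"), (7, "close", "b"),
]
x, t, y = inter_file_stats(events, now=10)
print(x["a"], t["a"], y["b"]["a"], y["a"]["b"])
```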

  15. Intra-file relations • An intra-file relationship is said to exist between two files i and j if both are open at the same time, i.e., each is opened before the other is closed. • Intra-file relations are based on the shared time Si,j • O(i) and C(i) are the times at which file i was opened and closed, respectively [Figure: timeline of the open (O) and close (C) times of files i and j, with the shared time Si,j marked]

  16. Intra-file relations Ti - represents the time since the last access to file i. Tj - represents the time since the last access to file j where j is open before i is closed. Si,j - represents the shared time of file i with respect to file j where i is closed before j. Stotal - represents the total shared time with all files that are open before i is closed
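The formula for the shared time did not survive in this transcript. As a rough illustration only, the Python sketch below assumes shared time means the length of the overlap between the two files' open intervals, which may differ from the exact definition on the original slide. The function names and the sample intervals are likewise hypothetical.

```python
def shared_time(interval_i, interval_j):
    """Overlap of two (open_time, close_time) intervals; 0 if they never overlap.

    Assumption: shared time is taken to be the interval overlap.
    """
    open_i, close_i = interval_i
    open_j, close_j = interval_j
    return max(0.0, min(close_i, close_j) - max(open_i, open_j))


def total_shared_time(interval_i, others):
    """Sum of the shared time of file i with every other file's interval."""
    return sum(shared_time(interval_i, other) for other in others)


# Made-up open/close times in seconds.
file_i = (0.0, 10.0)
file_j = (4.0, 15.0)   # opened before i is closed: overlaps i for 6 s
file_k = (12.0, 20.0)  # opened after i is closed: no overlap
file_m = (2.0, 5.0)    # opened and closed while i is open: overlaps i for 3 s

s_ij = shared_time(file_i, file_j)
s_total = total_shared_time(file_i, [file_j, file_k, file_m])
print(s_ij, s_total)
```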

  17. Inter + Intra

  18. Workload DFS Traces from CMU were utilized during the simulation

  19. Implementation • Seven replacement algorithms • RR – Round Robin • LRU – Least recently used • LFU – Least frequently used • GDS – Greedy dual size • INTER – based only on inter-file relations • INTRA – based only on intra-file relations • Both – based on both intra- and inter-file relations • Varying cache sizes • 10KB, 25KB, 50KB, 100KB, 500KB … • Seven traces • Simulator maintains a cache (hashlist), an open list (list of currently open files), and a close list (list of files that are closed).
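Of the baseline algorithms listed, GDS is perhaps the least familiar. The Python sketch below follows the commonly described GreedyDual-Size scheme: each cached file gets a value H = L + cost/size, the file with the smallest H is evicted, and L is raised to the evicted value. It is not taken from the talk's simulator. The class name, the default cost of 1 per file, and the linear scan for the victim (chosen for brevity over a heap) are assumptions of this example.

```python
class GreedyDualSize:
    """Minimal GreedyDual-Size cache (illustrative sketch, uniform cost)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running value L
        self.entries = {}      # name -> (H value, size)
        self.replacements = 0

    def access(self, name, size, cost=1.0):
        """Record one access; return True on a hit."""
        hit = name in self.entries
        if not hit:
            if size > self.capacity:
                return False  # cannot be cached at all
            while self.used + size > self.capacity:
                # Evict the entry with the smallest H and raise L to that value.
                victim = min(self.entries, key=lambda n: self.entries[n][0])
                h_value, victim_size = self.entries.pop(victim)
                self.inflation = h_value
                self.used -= victim_size
                self.replacements += 1
            self.used += size
        # On both hit and insert, (re)set H = L + cost / size.
        self.entries[name] = (self.inflation + cost / size, size)
        return hit


# Made-up example: with uniform cost, the large file is evicted first.
gds = GreedyDualSize(capacity_bytes=100)
gds.access("big", 80)
gds.access("small", 10)
gds.access("other", 30)
print(sorted(gds.entries), gds.replacements)
```

With a uniform cost, large files get small H values and are evicted first, so one eviction tends to free enough room for several small files.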

  20. Structure of simulator

  21. Simulator pseudocode

  22. Outline • Motivation • Caching • Semantic-based Caching • Simulation Results • Conclusion

  23. Hit rates of all algorithms [Figure; curve label: INTER/BOTH]

  24. Replace attempts of all algorithms [Figure; curve labels: INTRA, INTER/BOTH]

  25. Performance – DFS Traces

  26. Performance – DFS Traces

  27. Performance

  28. The Need For File System Tracing • Traces haven’t been collected periodically enough to reflect present day usage activity • Publicly available traces such as traces collected at the disk driver level or web proxy traces do not give us relevant information on file system workload.

  29. System Call Interception [Figure: user space: standard library call int fopen(char *name, char *mode). Kernel space: system call table (open, close), original open, original close, trace module (new open), data logger.]

  30. Analysis Summary • Most files were opened for less than a hundredth of a second • The majority of files are accessed only a few times. There is a small percentage of very popular files • The majority of files are less than 100KB in size. Large files can be very large (heavy tail) • Almost half the accesses repeat within a short period of initially occurring • File throughput has greatly increased due to the presence of large files • The majority of files accessed have a unique predecessor

  31. MIST Traces – Hit Rates

  32. MIST Traces – Files Replaced

  33. MIST Traces – Byte Hit Rate
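Hit rate and byte hit rate can diverge when file sizes vary widely. The Python sketch below computes both using their usual definitions: hits divided by accesses, and bytes served from the cache divided by bytes requested. The function name and the sample access log are made up for illustration and do not correspond to the MIST traces.

```python
def hit_rates(accesses):
    """Return (hit rate, byte hit rate) for a list of (was_hit, size_bytes)."""
    total = len(accesses)
    total_bytes = sum(size for _, size in accesses)
    hits = sum(1 for was_hit, _ in accesses if was_hit)
    hit_bytes = sum(size for was_hit, size in accesses if was_hit)
    hit_rate = hits / total if total else 0.0
    byte_hit_rate = hit_bytes / total_bytes if total_bytes else 0.0
    return hit_rate, byte_hit_rate


# Made-up log: three small hits and one large miss.
log = [(True, 10), (False, 90), (True, 10), (True, 10)]
hit_rate, byte_hit_rate = hit_rates(log)
print(hit_rate, byte_hit_rate)
```

Here three of four accesses hit, yet only a quarter of the requested bytes come from the cache, because the one miss is for a large file.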

  34. Summary • We have presented a semantic-based caching algorithm and shown that it performs better than conventional caching approaches in terms of hit ratio and byte hit ratio • We have also shown that it does so while performing far fewer replacements • Compared to prevalent replacement strategies that ignore file relations and communication overhead, this approach appears better suited to distributed file systems that operate across heterogeneous environments

  35. Future Work • Collecting more state-of-the-art distributed file system traces • Applying the cache replacement algorithm to a real wireless file system in a computer-assisted surgery application • Investigating the idea in more general applications, such as mobile database access

  36. Questions & Comments? weisong@wayne.edu http://mist.cs.wayne.edu
