Sleepers & Workaholics. Caching Strategies in Mobile Computing Dr. Daniel Barbará Dr. Tomasz Imielinski. About Me. Peter Rosegger 5th year Computer Science Specialization: Databases Graduation: December 2007.
Sleepers & Workaholics Caching Strategies in Mobile Computing Dr. Daniel Barbará Dr. Tomasz Imielinski
About Me Peter Rosegger • 5th year Computer Science • Specialization: Databases • Graduation: December 2007
Sleepers & Workaholics Caching Strategies in Mobile Computing Dr. Daniel Barbará • Professor at George Mason University • Several patents associated with mobile caching Dr. Tomasz Imielinski • Professor at Rutgers University • Senior VP: Search Technology at Ask.com
1994: 16 million cellular subscribers in the US
The Future of Mobile Computing Use Habits: • Large # of users • Check weather, stocks, scores, etc. • Mobile between cells (& wireless networks) Hardware: • Low-powered palmtop machines • Poor battery life • Narrow bandwidth
The Future of Mobile Computing Query complex databases, but… • Frequently powered off to save battery • Frequently changing cells • Network traffic must be minimized to conserve bandwidth
Why Caching is Important Conserve: • COMPUTATIONAL RESOURCES • BATTERY LIFE • BANDWIDTH
Traditional Strategies Fail Server lacks knowledge of: • Which units are in its cell • Which units are powered ON Client caches cannot be tracked
The Solution Purpose of Sleepers & Workaholics: "…to propose a taxonomy of different cache invalidation strategies and study the impact of clients' disconnection times on their performance."
Strategies • Timestamps (TS) • Amnesic Terminals (AT) • Signatures (SIG) Control Strategy: • No Cache (NC)
Timestamps -Cache entries have timestamps -Synchronous, history based, uncompressed reports SERVER: Notify clients of the identifiers and update timestamps of items changed within last w seconds CLIENT: For each item in cache: • If in report with a timestamp newer than the cached one, purge from cache • Otherwise, update timestamp to the time of the report
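A minimal sketch of the TS client step, written in Python. It assumes the report carries, for each changed item, the time of its latest update; if a report held identifiers only, every reported item would simply be purged. The names `CacheEntry` and `apply_ts_report` are hypothetical, and dropping the whole cache after sleeping longer than w is an assumption about how a unit should treat reports it missed.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: object
    timestamp: float  # last time this entry was known to be valid


def apply_ts_report(cache, report, report_time, last_report_time, w):
    """Apply one TS invalidation report to `cache` (dict: id -> CacheEntry).

    `report` maps item id -> time of the item's latest update, for items
    changed within the last `w` seconds. Returns the report time so the
    caller can remember when it last heard a report.
    """
    # Assumption: after sleeping longer than the window, nothing can be trusted.
    if last_report_time is not None and report_time - last_report_time > w:
        cache.clear()
        return report_time

    for item_id in list(cache):
        entry = cache[item_id]
        updated_at = report.get(item_id)
        if updated_at is not None and updated_at > entry.timestamp:
            del cache[item_id]             # changed after we cached it
        else:
            entry.timestamp = report_time  # still valid as of this report
    return report_time
```

A unit would call this once per broadcast, passing the time of the previous report it heard (or `None` on first contact).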
Amnesic Terminals -Cache entries have identifiers -Synchronous, history based, uncompressed reports SERVER: Notify clients of identifiers of items changed since the last invalidation report CLIENT: For each item in cache: • If in report, purge from cache • If NOT in report, do nothing
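A comparable sketch for the AT client, again in Python. Reports are assumed to carry a sequence number so that a unit can tell whether it missed one. Both that numbering and the rule of forgetting the whole cache after a missed report are assumptions made for illustration; `apply_at_report` is a hypothetical name.

```python
def apply_at_report(cache, changed_ids, report_seq, last_seq):
    """Apply one AT invalidation report to `cache` (dict: id -> value).

    `changed_ids` holds the identifiers listed in the report.
    `last_seq` is the sequence number of the last report this unit heard,
    or None if it has heard none. Returns `report_seq`.
    """
    if last_seq is None or report_seq != last_seq + 1:
        # Assumption: a missed report means the unit can no longer tell
        # which entries are stale, so it forgets everything.
        cache.clear()
    else:
        for item_id in changed_ids:
            cache.pop(item_id, None)  # purge reported items, keep the rest
    return report_seq
```

Because nothing is stored per entry, the report and the client state stay small, which is the trade-off against TS.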
Signatures -Checksums calculated over value of data to form Signature -Signatures combined using XOR -Synchronous, state based, compressed reports SERVER: Server broadcasts the set of combined signatures CLIENT: Item in cache is declared invalid if it belongs to “too many” unmatching signatures (suspected of being out of date)
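The signature idea can be sketched in Python as follows. Everything concrete here is an assumption chosen for illustration: SHA-256 truncated to 32 bits as the checksum, subsets built by including each item with probability 0.5, and "too many" expressed as a fraction `threshold` of the subsets an item belongs to. The client is assumed to keep the combined signatures from the previous broadcast for comparison.

```python
import hashlib
import random


def item_signature(value, bits=32):
    """Checksum over an item's value, truncated to `bits` bits."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def make_subsets(item_ids, m, seed=0):
    """Build m pseudo-random subsets of items, known to server and clients."""
    rng = random.Random(seed)
    return [{i for i in item_ids if rng.random() < 0.5} for _ in range(m)]


def combined_signatures(db, subsets):
    """Server side: XOR the signatures of the items in each subset."""
    combined = []
    for subset in subsets:
        acc = 0
        for item_id in subset:
            acc ^= item_signature(db[item_id])
        combined.append(acc)
    return combined


def find_suspects(cached_ids, subsets, old_sigs, new_sigs, threshold):
    """Client side: flag cached items that sit in too many changed subsets."""
    changed = [old != new for old, new in zip(old_sigs, new_sigs)]
    suspects = set()
    for item_id in cached_ids:
        votes = [c for s, c in zip(subsets, changed) if item_id in s]
        if votes and sum(votes) > threshold * len(votes):
            suspects.add(item_id)
    return suspects
```

An updated item changes every combined signature it takes part in, while an unchanged item only sees the subsets it happens to share with updated items change. The threshold separates the two cases, at the cost of occasional false diagnoses.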
Analysis Calculate THROUGHPUT for each strategy… $L$ = time between invalidation report broadcasts $W$ = bandwidth $B_c$ = # bits in the broadcast (invalidation report) $LW - B_c$ = # bits available for answering queries (cache misses)
Analysis $T$ = THROUGHPUT; queries per interval handled by the system $h$ = cache hit rate, a value in [0, 1] $b_q$ = # bits for a query $b_a$ = # bits to answer a query Traffic (in bits) due to cache misses: $T(1 - h)(b_q + b_a)$
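Setting the bits left over after the report equal to the traffic caused by cache misses and solving for the throughput gives T = (LW − B_c) / ((b_q + b_a)(1 − h)). The short Python sketch below evaluates that expression; the function name and the sample numbers are illustrative assumptions, not values from the paper.

```python
def throughput(L, W, B_c, h, b_q, b_a):
    """Queries per interval: (L*W - B_c) / ((b_q + b_a) * (1 - h)).

    L   : seconds between invalidation reports
    W   : bandwidth in bits per second
    B_c : bits used by the invalidation report
    h   : cache hit rate, 0 <= h < 1
    b_q : bits per query, b_a : bits per answer
    """
    if not 0 <= h < 1:
        raise ValueError("h must satisfy 0 <= h < 1")
    available = L * W - B_c
    if available <= 0:
        return 0.0  # the report alone uses up the whole interval
    return available / ((b_q + b_a) * (1 - h))


# Illustrative numbers only: a 10 s interval on a 10 kbit/s channel.
with_cache = throughput(L=10, W=10_000, B_c=20_000, h=0.5, b_q=400, b_a=400)
no_cache = throughput(L=10, W=10_000, B_c=0, h=0.0, b_q=400, b_a=400)
print(with_cache, no_cache)  # 200.0 125.0
```

The second call models the no-cache control: no report (B_c = 0) and no hits (h = 0), so every query goes uplink.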
Maximal Throughput Server knows: -What units are in the cell -What those units have in their caches Server can: -instantaneously notify units when an item changes
Maximal Hit Ratio The Hit Ratio achieved in ideal conditions:
No Caching -No invalidation report -No intervals
Signatures Consider the probability of false diagnosis: • Probability of a false positive • Probability of a false negative
Asymptotic Analysis Analyze throughput in extreme cases: • As probability of sleeping s → 0, s → 1 Analyze throughput as system parameters vary: • Database size • Update frequency • Bandwidth • Etc.
Workaholics Unit sleeps less and less: s → 0 • All hit ratios approach the same value • SIG lags behind TS and AT by a factor of BEST THROUGHPUT: • AT, because its report is the shortest
Sleepers Unit sleeps more and more: s → 1 • All hit ratios approach 0 BEST THROUGHPUT: • No Caching eventually wins as s gets very close to 1 • For practical purposes, SIG is the best choice
Infrequent Updates Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth Effectiveness as s ranges from 0 to 1
Update Intensive Effectiveness as s ranges from 0 to 1
Increase Database Size & Bandwidth Effectiveness as s ranges from 0 to 1
Conclusions on Effectiveness Strategy depends on circumstances: • SIG is best for sleepers • TS is best for query-intensive scenarios, but… • AT is best for workaholics How can we improve effectiveness?
Relax: Consistency of the Cache Depending on data type, data may not need to be exact… EX: stocks, weather, etc. Makes shorter invalidation reports possible
How Do We Decide to Update? - Consider cached copies to be quasi-copies - Each quasi-copy has a coherency condition attached to it Coherency Conditions: Delay Condition - updated based on time Arithmetic Condition - updated based on difference between data and quasi-copy
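The two coherency conditions can be written as simple predicates. This Python sketch assumes numeric data and a fixed tolerance per item; the function and parameter names are hypothetical.

```python
def delay_condition_holds(copy_time, now, max_delay):
    """Delay condition: the quasi-copy may lag the data by at most max_delay."""
    return now - copy_time <= max_delay


def arithmetic_condition_holds(copy_value, current_value, max_deviation):
    """Arithmetic condition: the quasi-copy may differ by at most max_deviation."""
    return abs(current_value - copy_value) <= max_deviation


# A stock quote cached at 100.0 with a tolerance of 0.5:
print(arithmetic_condition_holds(100.0, 100.4, 0.5))  # True, no need to invalidate
print(arithmetic_condition_holds(100.0, 101.0, 0.5))  # False, the item must be reported
```

An item only needs to appear in an invalidation report once its condition stops holding, which is what allows the reports to get shorter.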
Adaptive Invalidation Reports -Start with TS strategy -Use algorithms to optimize strategy Examples: • If an item is queried very often by units that sleep a lot, include it in reports for longer • If an item changes frequently, do not bother caching
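One way to picture the two example rules is a per-item policy that picks how long an item stays in TS reports. The Python sketch below is purely illustrative: the thresholds, the stretch factor, and the function name are all hypothetical, and the slides do not specify how such a decision would actually be made.

```python
def report_window(base_w, sleeper_query_rate, update_rate,
                  hot_query_rate=1.0, max_update_rate=10.0, stretch=4):
    """Return the report window (seconds) for one item, or None if the
    item should not be cached at all.

    sleeper_query_rate : queries per interval from units that sleep a lot
    update_rate        : updates per interval applied to the item
    """
    if update_rate > max_update_rate:
        return None               # changes too often, not worth caching
    if sleeper_query_rate >= hot_query_rate:
        return base_w * stretch   # keep it in reports for longer
    return base_w                 # default TS window


print(report_window(30, sleeper_query_rate=2.0, update_rate=0.1))   # 120
print(report_window(30, sleeper_query_rate=0.1, update_rate=50.0))  # None
```

A longer window lets sleepy units keep the item across longer naps, at the price of a longer report for everyone.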
Criticism • Units rarely powered down • Battery life better than predicted • Battery life does not dictate use • Units still lose reception frequently • Lost reception is today’s most common “sleeper” condition, yet it is explicitly excluded from the definition in S&W • Bandwidth better than predicted
However… • Adjust “sleeper” to include lost reception • Caching is still important • Endless demand for computational resources • Endless demand for battery life • Endless demand for more bandwidth