Locality Optimizations in OceanStore

An introduction to introspective techniques for exploiting locality in wide-area storage utilities. Patrick R. Eaton, Dennis Geels

Agenda • OceanStore Review • Problem Overview • Previous Work • Proposed Solution • Prefetching Algorithm • Preliminary Results • Future Work

OceanStore Review • Properties of OceanStore relevant to introspective locality optimizations • implemented in the extremely wide area • has many places to put any single piece of data • cannot rely on users to make relationships among data explicit • depends on effective locality optimizations for improved performance • No possible way to solve exactly

Problem Overview • Passively observe data accesses • data shared among multiple users • single users accessing the network from different physical locations • data is replicated across the network • Optimize the location of data to provide quicker access to users • cluster semantically related data • replicate data to move it closer to consumers • migrate primary replicas toward the source of updates

Measurable Attributes • File Temperature • A measure that indicates the frequency of access to the file • A hot file is frequently accessed • Semantic Distance (Kuenning) • Any measure that can quantify relationships between files on the range [0, ∞) • Local distance relates one instance of a file access to another • Reference distance is an aggregate measure that summarizes all local distances for a pair of files • Typical measures use access order or timing information
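The two attributes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: temperature is taken to be a raw access count with no time decay, local distance is the number of accesses separating two files within a hypothetical `window`, and reference distance is the arithmetic mean of the local distances. Kuenning's own definitions are more refined, and the slides do not say which variant the authors used.

```python
from collections import defaultdict


def temperatures(trace):
    """File temperature as a raw access count (assumption: no time decay)."""
    counts = defaultdict(int)
    for f in trace:
        counts[f] += 1
    return dict(counts)


def local_distances(trace, window=4):
    """Yield (a, b, d): file b was accessed d accesses after file a, d <= window."""
    for i, a in enumerate(trace):
        for d in range(1, window + 1):
            if i + d < len(trace):
                yield a, trace[i + d], d


def reference_distances(trace, window=4):
    """Summarize all local distances of each ordered pair by their mean."""
    sums = defaultdict(lambda: [0, 0])  # (a, b) -> [sum of distances, count]
    for a, b, d in local_distances(trace, window):
        if a != b:
            sums[(a, b)][0] += d
            sums[(a, b)][1] += 1
    return {pair: total / n for pair, (total, n) in sums.items()}


trace = ["A", "B", "C", "A", "B"]
temps = temperatures(trace)
ref = reference_distances(trace, window=2)
```

A smaller reference distance means the pair tends to be accessed closer together; pairs that never co-occur within the window simply have no entry, which stands in for an unbounded distance.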

Prefetching Techniques • Automatic Prefetching (Griffioen and Appleton) • construct a probability graph that records accesses which follow within a lookahead period • predict a prefetch when the chance of an access is above a tunable parameter • Context Modeling (Kroeger and Long) • record in a trie all access sequences which have been observed • maintain pointers to all nodes which represent current contexts • predict a prefetch when the chance of an access to a child of a current context is above a probability threshold
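As a rough illustration of the first technique, the sketch below builds a probability graph from an access stream. The class name, the `lookahead` and `threshold` parameters, and the way the chance is estimated (followers of a file divided by the number of times that file was accessed) are assumptions made for this example, not details taken from the original paper.

```python
from collections import defaultdict, deque


class ProbabilityGraph:
    """Hypothetical sketch of a probability-graph prefetcher."""

    def __init__(self, lookahead=2, threshold=0.5):
        self.threshold = threshold
        self.recent = deque(maxlen=lookahead)  # files inside the lookahead period
        self.follow = defaultdict(lambda: defaultdict(int))  # follow[a][b] = count
        self.seen = defaultdict(int)  # number of accesses to each file

    def access(self, f):
        # Credit every distinct recent file with an edge to the new access.
        for prev in set(self.recent):
            if prev != f:
                self.follow[prev][f] += 1
        self.seen[f] += 1
        self.recent.append(f)

    def predict(self, f):
        """Files whose estimated chance of following f meets the threshold."""
        total = self.seen.get(f, 0)
        if total == 0:
            return []
        edges = self.follow.get(f, {})
        return sorted(b for b, n in edges.items() if n / total >= self.threshold)


graph = ProbabilityGraph(lookahead=1, threshold=0.6)
for name in ["A", "B", "A", "B", "A", "C"]:
    graph.access(name)
```

With this trace, B followed A in two of A's three accesses, so `graph.predict("A")` returns only B at a threshold of 0.6. A trie-based context model would replace the single-file key with the sequence of recent accesses.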

Our Approach • Exploit the ideas of semantic distance to compute relationships among data • Cluster data based on the observed relationships • Store a summary of these relationships with the data • Migrate (prefetch) files based on familiar patterns in the access stream • recognize higher order correlations as in context modeling • tolerate noise in the access stream

Motivation for Prefetching Algorithm • Many patterns can be predicted only by observing higher-order correlations, that is, by combining several pieces of past history. • Other patterns can be detected only by identifying and filtering noise. [Figure: example access patterns among files A, B, C, K, Y, and Z]

General Prefetching Algorithm • Update • Record the most recent file accesses in the file history buffer (FHB) • Each time a new file S is accessed, extract all triples of the form (FHB(i), FHB(j)) → S from the FHB and update them in the second-order distance table • Predict • Each time a new file S is accessed, examine the distance table entries for (FHB(i), S) • Prefetch files that are predicted with confidence above a certain threshold • Problems • O(k^2) work to update the distance table • Noise infects the distance table
[Figure: example FHB (y B w g o F w K) with distance table entries (B,F): w K; (y,B): w g o F K; (o,w): K]
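The update and predict steps can be sketched as follows. The class name, the buffer length `k`, the `threshold`, and the confidence estimate (a follower's share of all followers recorded for a pair) are assumptions for illustration; the slides do not define how confidence is computed.

```python
from collections import defaultdict, deque
from itertools import combinations


class SecondOrderPrefetcher:
    """Hypothetical sketch of the unoptimized second-order algorithm."""

    def __init__(self, k=4, threshold=0.5):
        self.fhb = deque(maxlen=k)  # file history buffer
        self.threshold = threshold
        # Second-order distance table: (a, b) -> {s: times s followed the pair}
        self.table = defaultdict(lambda: defaultdict(int))

    def access(self, s):
        history = list(self.fhb)
        # Update: credit every ordered pair (FHB(i), FHB(j)), i < j, with s.
        # This loop is the O(k^2) cost the slide lists as a problem.
        for a, b in combinations(history, 2):
            self.table[(a, b)][s] += 1
        # Predict: look up every pair (FHB(i), s) in the distance table.
        predictions = set()
        for a in history:
            followers = self.table.get((a, s))
            if not followers:
                continue
            total = sum(followers.values())
            for f, n in followers.items():
                if n / total >= self.threshold:
                    predictions.add(f)
        self.fhb.append(s)
        return predictions


prefetcher = SecondOrderPrefetcher(k=2, threshold=0.6)
results = [prefetcher.access(f) for f in "ABCAB"]
```

After the sequence A, B, C has been seen once, the second access to B (preceded by A) predicts C. Because every pair in the buffer is updated, unrelated files in the buffer also earn table entries, which is the noise problem noted on the slide.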

Optimizations to the Prefetching Algorithm • First-order distance table • Records files that are close, as measured by semantic distance • Allows reverse lookup • Use first-order distance tables to filter out irrelevant file relationships • Update only relevant entries in the second-order distance table • Search for predictions based on only relevant access pairs
[Figure: indicative FHBs and the first-order distance table built from them]
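A first-order table with reverse lookup might look like the sketch below. It assumes that two files count as "close" when one is accessed within `max_distance` accesses after the other; that rule, the class name, and the method names are hypothetical stand-ins for whatever semantic distance measure the authors used.

```python
from collections import defaultdict


class FirstOrderTable:
    """Hypothetical sketch of a first-order distance table."""

    def __init__(self, max_distance=2):
        self.max_distance = max_distance
        self.children = defaultdict(set)  # a -> files seen shortly after a
        self.parents = defaultdict(set)   # b -> files seen shortly before b

    def observe(self, trace):
        for i, a in enumerate(trace):
            # Files within max_distance accesses after position i.
            for b in trace[i + 1 : i + 1 + self.max_distance]:
                if a != b:
                    self.children[a].add(b)
                    self.parents[b].add(a)

    def parents_of(self, f):
        """Reverse lookup: files that have been observed close before f."""
        return set(self.parents.get(f, set()))


first_order = FirstOrderTable(max_distance=1)
first_order.observe(list("yBwK"))
```

Keeping the `parents` index alongside `children` is what makes the reverse lookup cheap: given a newly accessed file, the table can return its likely predecessors without scanning every entry.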

Prefetching Algorithm Example • Update • Extract relevant triples by intersecting the FHB with the results from the reverse lookup in the first-order tables • Predict • Extract relevant pairs by intersecting the FHB with the results from the reverse lookup in the first-order tables • Prefetch if the second-order table predicts a future access with sufficient confidence
[Figure: worked example with an FHB, a first-order table, and a second-order table; steps shown: find parents of S, find parents of R, update table, check table for prediction]
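The filtered update and predict steps can be sketched as below. The `parents` mapping stands in for the first-order reverse lookup and is supplied by the caller; the class name, `k`, `threshold`, and the confidence estimate are assumptions for this example. As a simplification, the sketch ignores the order of files within the FHB.

```python
from collections import defaultdict, deque


class FilteredPrefetcher:
    """Hypothetical sketch of the first-order-filtered algorithm."""

    def __init__(self, parents, k=8, threshold=0.5):
        self.parents = parents  # file -> set of first-order parents
        self.fhb = deque(maxlen=k)
        self.threshold = threshold
        self.table = defaultdict(lambda: defaultdict(int))  # (q, r) -> {s: count}

    def _relevant(self, f, history):
        # Intersect the FHB with the reverse lookup for f.
        return set(history) & self.parents.get(f, set())

    def access(self, s):
        history = list(self.fhb)
        # Update: find parents R of S, then parents Q of R, all within the FHB.
        for r in self._relevant(s, history):
            for q in self._relevant(r, history):
                if q != r:
                    self.table[(q, r)][s] += 1
        # Predict: only pairs (Q, S) where Q is a parent of S in the FHB.
        predictions = set()
        for q in self._relevant(s, history):
            followers = self.table.get((q, s))
            if not followers:
                continue
            total = sum(followers.values())
            for f, n in followers.items():
                if n / total >= self.threshold:
                    predictions.add(f)
        self.fhb.append(s)
        return predictions


parents = {"S": {"R"}, "R": {"Q"}}
filtered = FilteredPrefetcher(parents, k=8, threshold=0.5)
outcomes = [filtered.access(f) for f in ["Q", "x", "R", "S", "Q", "R"]]
```

In this trace the unrelated file x sits in the FHB between Q and R, yet it never enters the second-order table, because it is not a first-order parent of anything. The only entry created is (Q, R) → S, and the second access to R predicts S.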

Preliminary Results (Local System)

Future Work • Retarget the simulations to model OceanStore • Continue to refine the prefetching algorithm • Examine the potential of higher order prefetching • Combine prefetching and clustering • Look for opportunities to test the ideas on different workloads
