Shruti: A Self-Tuning Hierarchical Aggregation System Praveen Yalagandula HP Labs Mike Dahlin University of Texas at Austin
Motivation • Distributed information aggregation • Building block for many large-scale applications • Examples: Resource scheduling, File location, Multicast, etc. • An important issue: when do you aggregate? • Proactive (push): Aggregate on updates/writes • E.g., Astrolabe, Ganglia • Reactive (pull): Aggregate on probes/reads • E.g., MDS-2, Sophia • Hybrid: Aggregate partially on writes and complete on reads • E.g., DHT-based systems • SDIMS: First system with the flexibility to choose among these • But applications need to know their read-write patterns a priori
Contributions • Shruti: Self-tuning aggregation system • Tune the aggregation aggressiveness • Based on observed read/write patterns • Goal: Minimize communication costs • Lease based technique • Maintain lease invariants for correct answers • Handle node and network failures • Optimization: Default up-lease initial state
Outline • Motivation for self-tuning aggregation • Background: SDIMS • Shruti: Architecture • Leases • Leasing policy • Default lease state • Reconfigurations • Evaluation • Summary
SDIMS • Hierarchical aggregation system • Physical machines are leaves • Virtual nodes: groups • (Attribute, value) pairs • E.g., (CPU, 3MHz), (Mem, 2GB) • Aggregation function (f) • E.g., MAX, MIN, AVG, CONCAT • DHT-based system for constructing multiple trees • [Figure: aggregation tree with leaf values a, b, c, d; level aggregates labeled A0, A1, A2; intermediate values f(a,b) and f(c,d); root value f(f(a,b), f(c,d))] • Praveen Yalagandula and Mike Dahlin, “SDIMS: A Scalable Distributed Information Management System”, SIGCOMM 2004
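The tree in the figure can be sketched in a few lines of Python. This is only an illustration of hierarchical aggregation, not SDIMS code: the nested-list tree encoding, the `aggregate` helper, and the sample leaf values are all assumptions made for the example.

```python
from functools import reduce

def aggregate(node, f):
    """Return a leaf's value, or f folded over the aggregates of the subtrees."""
    if not isinstance(node, list):      # a leaf holds a plain value
        return node
    return reduce(f, (aggregate(child, f) for child in node))

# Hypothetical leaf values for a, b, c, d (not taken from the slides).
a, b, c, d = 3, 7, 2, 5
tree = [[a, b], [c, d]]                 # root computes f(f(a,b), f(c,d))

global_max = aggregate(tree, max)
global_sum = aggregate(tree, lambda x, y: x + y)
```

Folding pairwise like this only works for functions such as MAX, MIN, or SUM. An AVG aggregate would need to carry (sum, count) pairs up the tree and divide at the point of use.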
SDIMS: UP and DOWN knobs • Application developer sets Up=u, Down=d • Policy settings • Update-Local: Up=0, Down=0 • Update-Up: Up=all, Down=0 • Update-All: Up=all, Down=all
Shruti • Self-tunes aggregation aggressiveness • Goal: minimize communication cost • Number of messages for updates and probes • Tracks updates and probes at each node • Decides when to send updates up or down • Employs a lease-based architecture
Leases • A lease from node A to node B for an aggregate • A will forward all future updates to B • So B need not contact A on probes • Holds until B relinquishes, B dies, or A revokes the lease • [Figure: nodes A and B under the Update-Up policy (Up=all, Down=0)]
Lease: Invariants for correctness • A node can lease if and only if it already gets updates • Upward path: A node can lease its local aggregate iff • It is a leaf, or • It has leases from all its children • Downward path: A node can lease to a child iff • It has a lease from the parent • [Figure: example trees labeled as correct and incorrect lease states]
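The two invariants can be checked mechanically. The sketch below is an illustration under assumptions, not Shruti's implementation: the `Node` class, its `up_lease` and `down_leases` fields, and the helper names are hypothetical, and the treatment of the root (which has no parent) is an assumption the slides do not spell out.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []
        self.up_lease = False      # True: this node leases its aggregate to its parent
        self.down_leases = set()   # names of children this node leases to

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child

def may_lease_up(node):
    # Upward path: a leaf (no children) or leases held from all children.
    return all(child.up_lease for child in node.children)

def may_lease_down(node):
    if node.parent is None:
        # Assumption: the root "already gets updates" once all children lease up.
        return may_lease_up(node)
    # Downward path: the node must hold a lease from its parent.
    return node.name in node.parent.down_leases

def violations(root):
    """Return the sorted names of nodes whose granted leases break a rule."""
    bad, stack = [], [root]
    while stack:
        node = stack.pop()
        up_bad = node.up_lease and not may_lease_up(node)
        down_bad = bool(node.down_leases) and not may_lease_down(node)
        if up_bad or down_bad:
            bad.append(node.name)
        stack.extend(node.children)
    return sorted(bad)

root = Node("root")
x = root.add_child(Node("x"))
leaf_a = x.add_child(Node("a"))
leaf_b = x.add_child(Node("b"))

leaf_a.up_lease = True
x.up_lease = True              # x leases up without a lease from b
before = violations(root)

leaf_b.up_lease = True         # now x holds leases from all its children
after = violations(root)
```

Here `before` flags `x`, because it leases its aggregate upward while one child has not leased to it; once `b` grants its lease, the state satisfies both invariants.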
Leasing policy • When to grant and when to relinquish • Intuition: Useful to grant only if the probe rate is high enough relative to the update rate • Costs per operation on a link • probe = 2 messages (request + response) • update = 1 message • Grant a lease if probe rate >= 0.5 * update rate • Else, relinquish
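The 0.5 factor follows from the per-link costs: without a lease, each probe costs 2 messages and updates cost nothing on the link; with a lease, each update costs 1 message and probes are answered locally. A small sketch of that comparison, where the function names and the idea of passing rates as plain numbers are assumptions made for illustration:

```python
PROBE_MSGS = 2    # request + response on a link without a lease
UPDATE_MSGS = 1   # one forwarded update on a link with a lease

def link_cost(probe_rate, update_rate, leased):
    """Messages per unit time on one link under the cost model above."""
    if leased:
        return UPDATE_MSGS * update_rate
    return PROBE_MSGS * probe_rate

def should_grant(probe_rate, update_rate):
    # Grant when the leased cost is no worse than the unleased cost,
    # i.e. update_rate <= 2 * probe_rate, or probe_rate >= 0.5 * update_rate.
    return link_cost(probe_rate, update_rate, True) <= link_cost(probe_rate, update_rate, False)

at_break_even = should_grant(probe_rate=5, update_rate=10)
below_break_even = should_grant(probe_rate=4, update_rate=10)
```

With 10 updates per unit time, 5 probes sit exactly at the break-even point and 4 probes fall below it.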
Shruti: Leasing Policy • Set and release based on number of messages observed • Policy defined with two knobs • setThresh • relThresh • Set a lease • If #probes since last update >= setThresh • Relinquish a lease • If #updates since last probe >= relThresh • In both cases: AND only when invariants allow
Example with two nodes • setThresh=1, relThresh=2 • Lease set (Response + LEASE) when number of probes since last update = 1 == setThresh • Lease relinquished when number of updates since last probe = 2 == relThresh • [Sequence diagram between nodes A and B; message labels: Update, Probe, Response, Response + LEASE, Relinquish]
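The threshold policy can be sketched as a pair of counters on one link. This is a simplified illustration, not the authors' code: the `LinkLease` class collapses both endpoints into one object, assumes the invariants always allow the action (true for two nodes), and counts the relinquish notice as one message, which is an assumption. The event trace is hypothetical and only loosely follows the diagram.

```python
class LinkLease:
    def __init__(self, set_thresh, rel_thresh):
        self.set_thresh = set_thresh
        self.rel_thresh = rel_thresh
        self.leased = False
        self.probes_since_update = 0
        self.updates_since_probe = 0
        self.messages = 0

    def on_probe(self):
        self.updates_since_probe = 0
        self.probes_since_update += 1
        if not self.leased:
            self.messages += 2              # request + response
            if self.probes_since_update >= self.set_thresh:
                self.leased = True          # lease rides on the response
        # With a lease, the probe is answered locally: no messages.

    def on_update(self):
        self.probes_since_update = 0
        self.updates_since_probe += 1
        if self.leased:
            self.messages += 1              # update forwarded under the lease
            if self.updates_since_probe >= self.rel_thresh:
                self.leased = False
                self.messages += 1          # assumed cost of the relinquish notice

link = LinkLease(set_thresh=1, rel_thresh=2)
states = []
for event in "UPPUPUU":                     # U = update, P = probe (hypothetical trace)
    link.on_update() if event == "U" else link.on_probe()
    states.append(link.leased)
```

In this trace the first probe sets the lease, and the lease is given up only after two updates arrive with no probe in between.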
Default initial lease state • Default = no leases: initial probes cost O(N) • Common case: Sparse attributes • Only a few nodes are interested in an attribute • Examples: File location, Multicast, etc. • Default initial state: Start with leases up to the root • Initial updates and probes incur O(log N) msgs
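A rough count makes the O(N) versus O(log N) gap concrete. The model below is an assumption made for illustration, not a description of the actual system: a complete binary tree with a power-of-two number of leaves, one message per edge per direction, and a probe issued at a leaf that is routed to the root.

```python
def tree_depth(n_leaves):
    """Depth of a complete binary tree; n_leaves must be a power of two."""
    assert n_leaves > 0 and n_leaves & (n_leaves - 1) == 0
    return n_leaves.bit_length() - 1

def first_probe_no_leases(n_leaves):
    # No leases anywhere: the probe must fetch from every node,
    # so each of the 2N - 2 edges carries a request and a response.
    edges = 2 * n_leaves - 2
    return 2 * edges

def probe_with_up_leases(n_leaves):
    # The root already holds the aggregate: request and response
    # travel only along the leaf-to-root path.
    return 2 * tree_depth(n_leaves)

def update_with_up_leases(n_leaves):
    # An update at a leaf is forwarded once per level up to the root.
    return tree_depth(n_leaves)

n = 512
no_lease_probe = first_probe_no_leases(n)
leased_probe = probe_with_up_leases(n)
leased_update = update_with_up_leases(n)
```

Under these assumptions a 512-leaf tree needs 2044 messages for the first probe with no leases, against 18 for a probe and 9 for an update when leases up to the root are in place from the start.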
Handling failures • Goal: revert back to an invariant-satisfying state • Losing a child or the parent: OK (no violations) • Acquiring a child: OK (default lease state assumption) • Acquiring a new parent: can violate invariants • A solution: Revoke leases • [Figure: reconfigured tree with the leases that violate invariants marked X]
Evaluation • Simulation experiments • 512-node system initialized with [attribute=“dummy”, value=0] • Aggregation • Function: summation operation • Update: increments value of the attribute • Probe: global aggregate, i.e., sum of values at all nodes • Cases • Uniform read-write ratio across nodes • Spatial heterogeneity: Zipf-like distribution across nodes • Temporal heterogeneity: varying read-write rates with time • Failure handling
Uniform RW ratio across nodes • Simulation with 512 nodes • [Plot: average message count vs. read-to-write ratio; curves: Update-None, Update-Up, Update-All, (Up=all, Down=3), (Up=3, Down=0), Shruti]
Spatial heterogeneity • Zipf-like distribution across nodes • [Plot; curves: Update-None, Update-Up, Update-All, (Up=all, Down=3), (Up=3, Down=0), Shruti]
Temporal heterogeneity • Three phases with reads:writes ratios of 100:1 and 1:100 • [Plot; curves: Shruti (reads), Shruti (writes), Static (reads), Static (writes)]
Failure handling (1024-node system) • [Plot annotations: start with no leases set; root node fails; all leases set towards the prober]
Summary • Shruti: Self-tuning hierarchical aggregation system • Goal: Minimize communication costs • Lease based mechanism • Satisfy invariants to ensure consistency in the results • Default lease state for sparse attributes • Revert to invariant-satisfying state on failures • Applicable to more general aggregation over spanning trees • [Plaxton et al IPDPS’07] prove the competitive ratio with optimal offline algorithm and consistency properties
Multiple attributes: Uniform Writes, Zipf-like Distribution of Reads