A Scalable Distributed Information Management System (SDIMS) P. Yalagandula, M. Dahlin cs.utexas.edu SIGCOMM 2004
Outline • Introduction • Goal : Aggregation • Innovation • Flexibility • Scalability • Robustness • Implementation • Evaluation • Conclusions
Introduction • Why SDIMS? • Monitoring, querying, and reacting to changes are core components of applications such as system management, service placement, and data sharing and caching. • An SDIMS in a networked system would provide a distributed operating system backbone and facilitate the development and deployment of new distributed services.
Introduction (cont.) • Fundamental • Hierarchical aggregation • A node accesses detailed views of nearby information and summary views of global information. • A hierarchical system aggregates information through reduction trees.
Introduction (cont.) • An SDIMS should have four properties. • Scalability • Flexibility • Administrative isolation • Robustness
Scalability • SDIMS should accommodate large numbers of nodes. • SDIMS should allow applications to install and monitor large numbers of data attributes.
Flexibility • SDIMS should accommodate a range of applications and attributes. • Read-dominated attributes (rarely change) • Number of CPUs • Write-dominated attributes (change often) • Number of processes • SDIMS should leave the policy decision of tuning replication to applications.
Administrative isolation • Nodes can be arranged in an organizational or administrative hierarchy. • Domain-based control. • Monitor • Query
Robustness • SDIMS should adapt to reconfigurations in a timely fashion when nodes fail or disconnect. • SDIMS should provide mechanisms so that applications can trade off the cost of adaptation against the consistency level of aggregated results when reconfigurations occur.
Related Works • Astrolabe • A single logical aggregation tree that mirrors a system administrative hierarchy. • A general interface for installing new aggregation functions. • An unstructured gossip protocol for disseminating information and replicating all aggregated attribute values for a sub-tree to all nodes in the sub-tree.
Related Works (cont.) • Any node can answer queries using only local information. • Not scalable (full replication). • Not flexible (types of attributes). • Solution: P2P
Tree • For each level in the hierarchy, the agent maintains a record with the list of child zones (and their attributes), and which child zone represents its own zone (self).
Gossip protocol • Periodically, each agent selects some other agent at random and exchanges state information with it. • If the two agents are in the same zone, the state exchanged relates to MIBs in that zone. • If the two agents are in different zones, they exchange state associated with the MIBs of their least common ancestor zone.
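The zone-selection rule in the gossip slide can be sketched as follows. This is only an illustration, not Astrolabe's implementation: representing a zone as a tuple of names from the root down, and the function name `least_common_ancestor_zone`, are assumptions made for this example.

```python
def least_common_ancestor_zone(zone_a, zone_b):
    """Return the deepest zone shared by two agents.

    Assumption for this sketch: a zone is a tuple of names from the
    root down, e.g. ("root", "univ", "cs").
    """
    common = []
    for a, b in zip(zone_a, zone_b):
        if a != b:
            break
        common.append(a)
    return tuple(common)


# Agents in the same zone exchange state for that zone; agents in
# different zones exchange state for their least common ancestor zone.
same = least_common_ancestor_zone(("root", "univ", "cs"), ("root", "univ", "cs"))
diff = least_common_ancestor_zone(("root", "univ", "cs"), ("root", "univ", "ee"))
```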
Related Works (cont.) • DHT • SkipNet, CAN, Pastry, Chord, Tapestry
Problem • How to scalably map different attributes to different aggregation trees in a DHT mesh? {physical network vs. overlay network} • How to provide flexibility in the aggregation to accommodate different application requirements? {flexible API for installing and controlling the system}
Problem (cont.) • How to adapt a DHT mesh to attain the administrative isolation property? {virtual organization} • How to provide robustness without unstructured gossip and total replication? {cache; pre-computing or on-demand re-aggregation}
Aggregation Abstraction • Each physical node in the system is a leaf in the tree. • An internal non-leaf node, which we call a virtual node, is simulated by one or more physical nodes at the leaves of the sub-tree for which the virtual node is the root.
Aggregation Abstraction (cont.) • Each physical node has local data stored as a set of (attributeType, attributeName, value) tuples. • The system associates an aggregation function ftype with each attribute type.
Aggregation Abstraction (cont.) • Each level-i sub-tree Ti in the system has an aggregate value $V_{i,type,name}$ for each (attributeType, attributeName) pair. • The aggregate value for a level-i sub-tree Ti is the aggregation function for the type, ftype, computed across the aggregate values of each of Ti's k children: $V_{i,type,name} = f_{type}(V^{0}_{i-1,type,name}, V^{1}_{i-1,type,name}, \ldots, V^{k-1}_{i-1,type,name})$
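The recursive definition of the aggregate value can be sketched in a few lines. The nested-list tree encoding and the name `aggregate` are assumptions for illustration only; the slides do not prescribe a data structure.

```python
def aggregate(subtree, f_type):
    """Compute the aggregate value of a sub-tree.

    Assumed encoding for this sketch: a leaf (physical node) is a plain
    number holding its local value; an internal (virtual) node is a
    list of child sub-trees.
    """
    if not isinstance(subtree, list):
        return subtree  # level-0: the physical node's own value
    # Level i: apply f_type across the aggregates of the k children.
    return f_type([aggregate(child, f_type) for child in subtree])


tree = [[1, 2], [3, [4, 5]]]
total = aggregate(tree, sum)
largest = aggregate(tree, max)
```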
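Aggregation Abstraction (cont.) • Examples of ftype • $Avg(V_1, \ldots, V_n) = \frac{1}{n}\sum_{i=1}^{n} V_i$ (incorrect) • $SUM(V_1, \ldots, V_n) = \sum_{i=1}^{n} V_i$ (correct) • An aggregation function must satisfy the hierarchical computation property.
Aggregation Abstraction (cont.) • [Figure: aggregation tree; labels: node, virtual node]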
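A small check makes the difference between the two examples concrete. The sketch assumes the hierarchical computation property means that aggregating per-child aggregates must give the same result as aggregating all leaf values at once. The `(sum, count)` workaround for averages is a common technique added here for illustration, not something stated on the slide.

```python
def naive_avg(values):
    return sum(values) / len(values)


def merge_avg(partials):
    """Combine (sum, count) pairs; this form can be applied hierarchically."""
    return (sum(p[0] for p in partials), sum(p[1] for p in partials))


left, right = [1, 2, 3], [10]
flat = left + right

# SUM: aggregating the children's sums equals the flat sum.
hier_sum = sum([sum(left), sum(right)])
flat_sum = sum(flat)

# Avg: averaging the children's averages does not equal the flat average.
hier_avg = naive_avg([naive_avg(left), naive_avg(right)])
flat_avg = naive_avg(flat)

# Carrying (sum, count) up the tree recovers the correct average.
s, c = merge_avg([(sum(left), len(left)), (sum(right), len(right))])
fixed_avg = s / c
```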
Innovation • Flexibility • Scalability • Administrative isolation • Robustness
Flexibility • Operation API • Install • Update • Probe
Install Operation • The Install operation installs an aggregation function in the system.
Probe Operation • Used to force a reconfiguration and refresh all caches.
Probe Operation (cont.) • When node A issues a continuous probe at level l for an attribute, updates for the attribute at any node in the subtree of A's level-l ancestor are aggregated up to level l and propagated down along the path from the ancestor to A.
Update Operation API • Update-UpK-downj: aggregate up to the k-th level, and propagate the aggregate values of a node at level l (l ≤ k) downward for j levels.
Operation API • [Figure: Update-UpK-downj on a tree with Level-0 through Level-4, marking K, L, and J]
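The up/down rule can be sketched as a function that lists which levels are touched by a single leaf update. This is a simplified model under stated assumptions: `update_plan` and its return shape are hypothetical, levels are numbered from 0 at the leaves, and message routing between actual nodes is ignored.

```python
def update_plan(k, j, height):
    """Return (up_levels, down_targets) for one leaf update.

    up_levels: levels whose aggregate is recomputed, 1..min(k, height).
    down_targets: for each such level l, the levels l-1 down to l-j
    (floored at 0) that receive the level-l aggregate.
    """
    top = min(k, height)
    up_levels = list(range(1, top + 1))
    down_targets = {
        l: list(range(l - 1, max(l - j, 0) - 1, -1)) for l in up_levels
    }
    return up_levels, down_targets


# Aggregate up to level 3 and push each aggregate down one level,
# in a tree whose root is at level 4.
up, down = update_plan(k=3, j=1, height=4)
```

With j = 0 nothing is pushed down, so reads must travel up to the level that holds the aggregate; larger j trades more update messages for cheaper reads.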
Dynamic Adaptation • An SDIMS implementation can dynamically adjust its up/down strategies for an attribute based on its measured read/write frequency.
Scalability • SDIMS defines the aggregation abstraction to mesh with its underlying scalable DHT system. • SDIMS refines the basic DHT abstraction to form an Autonomous DHT (ADHT) to achieve the administrative isolation properties
Mapping to DHT • An attribute is aggregated along the aggregation tree corresponding to DHTtree_k, where k = hash(attribute type, attribute name). • Different attributes will be aggregated along different trees.
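The key derivation can be sketched as below. The slide only says that k is a hash of the (attribute type, attribute name) pair; the choice of SHA-1, the ":" separator, and the name `tree_key` are assumptions for this example.

```python
import hashlib


def tree_key(attr_type, attr_name):
    """Map an (attribute type, attribute name) pair to a DHT key.

    Assumption: SHA-1 over "type:name"; the slides do not name a
    specific hash function.
    """
    digest = hashlib.sha1(f"{attr_type}:{attr_name}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


# Different attributes hash to different keys, so they are aggregated
# along different trees and the load is spread across nodes.
k_cpu = tree_key("sum", "numCPUs")
k_proc = tree_key("sum", "numProcesses")
```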
Administrative isolation • For security • Updates and Probes are not accessible outside the domain • For availability • Queries for values in a domain are not affected by failures of nodes in other domains • For efficiency • Domain-scoped queries can be simple and efficient.
Administrative isolation • Autonomous DHT • Path Locality: Search paths should always be contained in the smallest possible domain. • Path Convergence: Search paths for a key from different nodes in a domain should converge at a node in that domain.
Administrative isolation • [Figure: plain DHT; labels: Domain univ., Domain dept., L0: host, L2: univ.; the isolation property is violated]
Administrative isolation • [Figure: Autonomous DHT; labels: Domain dept., Domain univ., L0: host, L2: dept.]
Robustness • ADHT • Distributed Computing (?) • Aggregation Management Layer (AML) • Lazy re-aggregation • On-demand Re-aggregation • Replication in Space
Two-layer architecture: ADHT and AML • The ADHT layer informs the AML layer about reconfigurations in the network. • NewParent • FailedChild • NewChild
Implementation DifferentOverlay(?)
MIB • Child MIBs: raw aggregate values gathered from children. • Reduction MIB: locally aggregated values computed across this raw information. • Ancestor MIBs: aggregate values scattered down from ancestors.
Implementation • [Figure: labels: parent, child]
Implementation (cont.) • Attribute key: used by the aggregation function to retrieve data. • (attribute type, attribute name)
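The three kinds of MIB can be pictured as per-attribute state held at a virtual node. The class below is a minimal sketch, not the SDIMS implementation: the class name, field names, and the `receive_from_child` method are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualNodeState:
    """State one virtual node keeps for a single attribute key (sketch)."""
    child_mibs: dict = field(default_factory=dict)     # child id -> raw aggregate
    reduction_mib: object = None                       # local aggregate of child_mibs
    ancestor_mibs: dict = field(default_factory=dict)  # level -> aggregate from above

    def receive_from_child(self, child_id, value, f_type):
        # Store the child's latest value, then recompute the local aggregate.
        self.child_mibs[child_id] = value
        self.reduction_mib = f_type(list(self.child_mibs.values()))
        return self.reduction_mib


state = VirtualNodeState()
first = state.receive_from_child("a", 2, sum)
second = state.receive_from_child("b", 5, sum)
third = state.receive_from_child("a", 1, sum)  # child "a" reports a new value
```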
Implementation (cont.) • A node acts • as leaf for all attribute keys • as a level-1 subtree root for keys whose hash matches the node’s ID in b prefix bits. • as a level-i subtree root for keys whose hash matches the node’s ID in the initial i * b bits. • as the system’s global root for attribute keys whose hash matches the node’s ID in more prefix bits than any other node
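The prefix-matching rule above can be sketched with IDs written as bit strings. The helper names, the bit-string encoding, and the absence of tie-breaking in `global_root` are assumptions for illustration; real DHTs resolve ties and sparse ID spaces in their own ways.

```python
def shared_prefix_len(a, b):
    """Number of leading bits two equal-length bit strings share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def highest_root_level(node_id, key_hash, b):
    """Highest level i such that the first i * b bits match (0 = leaf only)."""
    return shared_prefix_len(node_id, key_hash) // b


def global_root(node_ids, key_hash):
    """Node whose ID matches the key in the most prefix bits (ties not handled)."""
    return max(node_ids, key=lambda n: shared_prefix_len(n, key_hash))


level = highest_root_level("10110010", "10111111", b=2)
root = global_root(["0011", "1010", "1000"], "1011")
```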
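Evaluation • [Figure annotations: update only the node's own MIB; update the MIBs of all nodes; Up-All, Down-0; monitored attributes that change rarely; monitored attributes that change often]
Evaluation (cont.) • The session size (domain size) is set to 8 and the branching factor is set to 16. • [Figure: axis labels: message size, nodes]
Evaluation (cont.) • Bf: branching factor • [Figure: average path length to root]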
Evaluation (cont.) Bf: Branch Factor