ACMS: The Akamai Configuration Management System A. Sherman, P. H. Lisiecki, A. Berkheimer, and J. Wein Presented by Parya Moinzadeh
The Akamai Platform • Over 15,000 servers • Deployed in 1200+ different ISP networks • In 60+ countries
Motivation • Customers need to maintain close control over how their web content is served • Customers configure options that determine how the CDN serves their content • These options require frequent updates, or “reconfigurations”
Akamai Configuration Management System (ACMS) • Supports configuration propagation management • Accepts and disseminates distributed submissions of configuration information • Availability • Reliability • Asynchrony • Consistency • Persistent storage
Problem • The end clients of the system are widely dispersed • At any point in time some servers may be down or have connectivity problems • Configuration changes are generated from widely dispersed places • Strong consistency requirements
Assumptions • The configuration files will vary in size from a few hundred bytes up to 100MB • Most updates must be distributed to every Akamai node • There is no particular arrival pattern of submissions • The Akamai CDN will continue to grow • Submissions could originate from a number of distinct applications running at distinct locations on the Akamai CDN • Each submission of a configuration file foo completely overwrites the earlier submitted version of foo • For each configuration file there is either a single writer or multiple idempotent (non-competing) writers
Requirements • High Fault-Tolerance and Availability • Efficiency and Scalability • Persistent Fault-Tolerant Storage • Correctness • Acceptance Guarantee • Security
Approach • A small set of Storage Points front-ends the entire Akamai CDN
Architecture • [Diagram: Publishers submit to an Accepting Storage Point (SP); the Storage Points replicate among themselves; Edge Servers download from the SPs]
Quorum-based Replication • A quorum is defined as a majority of the ACMS SPs • Any update submission should be replicated and agreed upon by the quorum • A majority of operational and connected SPs should be maintained • Every future majority overlaps with the earlier majority that agreed on a file
Acceptance Algorithm • The Accepting SP copies the update to at least a quorum of the SPs • Agreement is then reached via the Vector Exchange protocol
Acceptance Algorithm Continued • A publisher contacts an Accepting SP • The Accepting SP first creates a temporary file with a unique filename (UID), e.g. “foo.A.1234” for a configuration file foo • The Accepting SP sends this file to a number of SPs • If replication succeeds, the Accepting SP initiates an agreement algorithm called Vector Exchange • Upon success the Accepting SP “accepts” and all SPs upload the new file
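A minimal sketch of the acceptance flow above, with in-memory stand-ins for the replication and agreement steps; the function names, the quorum check, and the hard-coded sequence number in the UID are illustrative, not Akamai's actual interfaces.

```python
# Sketch of the acceptance flow: create a uniquely named temporary file,
# replicate it to at least a quorum (majority) of SPs, run the agreement step
# (Vector Exchange, sketched after the walk-through below), then accept.

def accept_submission(filename, accepting_sp, contents, sps, replicate, vector_exchange):
    uid = f"{filename}.{accepting_sp}.1234"        # e.g. "foo.A.1234" (illustrative)
    acks = sum(1 for sp in sps if replicate(sp, uid, contents))
    if acks <= len(sps) // 2:                      # must reach a majority of SPs
        return "reject: could not replicate to a quorum"
    if not vector_exchange(uid, sps):              # quorum agreement via Vector Exchange
        return "reject: no quorum agreement"
    return f"accept: {uid}"                        # all SPs may now upload the new file

# Example run with stand-ins that always succeed:
print(accept_submission("foo", "A", b"config-bytes",
                        sps=["A", "B", "C", "D", "E"],
                        replicate=lambda sp, uid, data: True,
                        vector_exchange=lambda uid, sps: True))
```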
Acceptance Algorithm Continued • The VE vector is just a bit vector with a bit corresponding to each Storage Point • A 1-bit indicates that the corresponding Storage Point knows of a given update • When a majority of bits are set to 1, we say that an agreement occurs and it is safe for any SP to upload this latest update
Acceptance Algorithm Continued • “A” initiates and broadcasts the vector A:1 B:0 C:0 D:0 E:0 • “C” sets its own bit and rebroadcasts A:1 B:0 C:1 D:0 E:0 • “D” sets its bit and rebroadcasts A:1 B:0 C:1 D:1 E:0 • Any SP learns of the “agreement” when it sees a majority of bits set
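A minimal sketch of the Vector Exchange idea walked through above: each Storage Point sets its own bit when it learns of the update, merges incoming vectors, and declares agreement once a majority of bits are set. The class and method names are illustrative, not Akamai's implementation.

```python
class StoragePoint:
    def __init__(self, name, all_sps):
        self.name = name
        self.all_sps = all_sps                      # e.g. ["A", "B", "C", "D", "E"]
        self.vector = {sp: 0 for sp in all_sps}     # one bit per Storage Point

    def receive(self, vector):
        """Merge an incoming VE vector, set our own bit, and report agreement."""
        for sp, bit in vector.items():
            self.vector[sp] |= bit
        self.vector[self.name] = 1                  # we now know of the update
        return self.agreed()

    def agreed(self):
        """Agreement occurs once a majority of bits are set to 1."""
        return sum(self.vector.values()) > len(self.all_sps) // 2


sps = ["A", "B", "C", "D", "E"]
a, c, d = (StoragePoint(x, sps) for x in ("A", "C", "D"))
a.vector["A"] = 1                  # "A" initiates: A:1 B:0 C:0 D:0 E:0
print(c.receive(a.vector))         # False -- only A and C set so far
print(d.receive(c.vector))         # True  -- A, C, D form a majority of five
```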
Recovery • The recovery protocol is called Index Merging • SPs continuously run the background recovery protocol with one another • The downloadable configuration files are represented on the SPs in the form of an index tree • The SPs “merge” their index trees to pick up any missed updates from one another • The Download Points also need to sync up state
The Index Tree • A snapshot is a hierarchical index structure that describes the latest versions of all accepted files • Each SP updates its own snapshot when it learns of a quorum agreement • For full recovery, each SP only needs to merge in snapshots from (majority - 1) other SPs • Snapshots are also used by the edge servers to detect changes
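A minimal sketch of snapshot merging, assuming each snapshot is flattened to a map from configuration filename to the latest accepted version number; the real index is a hierarchical tree, but the merge follows the same keep-the-newest rule.

```python
def merge_snapshots(local, others):
    """Merge snapshots from other SPs, keeping the newest version per file."""
    merged = dict(local)
    for snapshot in others:
        for fname, version in snapshot.items():
            if version > merged.get(fname, -1):
                merged[fname] = version
    return merged

local = {"foo": 3, "bar": 7}
peers = [{"foo": 4, "baz": 1}, {"bar": 7}]     # snapshots from (majority - 1) other SPs
print(merge_snapshots(local, peers))           # {'foo': 4, 'bar': 7, 'baz': 1}
```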
Data Delivery • Processes on edge servers subscribe to specific configurations via their local Receiver process • A Receiver checks for updates to the subscription tree by making HTTP IMS (If-Modified-Since) requests recursively • If the updates match any subscriptions, the Receivers download the files
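A minimal sketch of a Receiver-style check using a plain HTTP If-Modified-Since (IMS) request; the URL scheme and the recursion over the subscription tree are assumptions for illustration, not Akamai's actual delivery protocol.

```python
import urllib.request
from urllib.error import HTTPError

def fetch_if_modified(url, last_modified=None):
    """Return (content, Last-Modified) if the file changed, or None on a 304."""
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except HTTPError as err:
        if err.code == 304:        # not modified since our last download
            return None
        raise

# A Receiver would issue such requests recursively over the subscription tree,
# descending into subtrees that changed and downloading files that match a subscription.
```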
Evaluation • Measured acceptance time: the period from the time an Accepting SP is first contacted by a publishing application until it replies with “Accept” • Workload of the ACMS front-end over a 48-hour period in the middle of a work week • 14,276 total file submissions on the system • Five operating Storage Points
Propagation Time Distribution • A random sampling of 250 Akamai nodes • The average propagation time is approximately 55 seconds
Propagation times for various file sizes • [Plot: the average propagation time, and the average time for each file to propagate to 95% of its recipients, by file size] • Mean and 95th-percentile delivery time for each submission • 99.95% of updates arrived within three minutes • The remaining 0.05% were delayed by temporary network connectivity issues
Discussion • Push-based vs. pull-based update • The effect of increasing the number of SPs on efficiency • The effect of having fewer nodes in the quorum • The effect of having a variable-sized quorum • Consistency vs. availability trade-offs in quorum selection • How is a unique and synchronized ordering of all update versions of a given configuration file maintained? Can it be optimized? • Is VE expensive? Can it be optimized? • Can we optimize the index tree structure? • The trade-off of having great cacheability…
CS525 – Dynamo vs. Bigtable Keun Soo Yim April 21, 2009 • Dynamo: Amazon’s Highly Available Key-value Store, G. DeCandia et al. (Amazon), SOSP 2007. • Bigtable: A Distributed Storage System for Structured Data, F. Chang et al. (Google), OSDI 2006.
Scalable Distributed Storage Systems • Compared with RDBMS and NFS on: • High throughput and scalability • High availability vs. consistency • Relational data-processing model and security • Cost-effectiveness
Amazon Dynamo – Consistent Hashing • Node: random value → position in the ring • Data: key → position in the ring • Interface • Get(Key) • Put(Key, Data, [Context])
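A minimal sketch of consistent hashing for routing Get/Put requests: node names and keys are hashed onto the same ring, and a key belongs to the first node clockwise from its position. The MD5 hash and node names are illustrative choices.

```python
import hashlib
from bisect import bisect_right

def ring_pos(value):
    """Map a node name or a key to a position on the hash ring."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((ring_pos(n), n) for n in nodes)

    def coordinator(self, key):
        """The first node clockwise from the key's position owns the key."""
        positions = [p for p, _ in self.ring]
        idx = bisect_right(positions, ring_pos(key)) % len(self.ring)
        return self.ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.coordinator("user:42"))     # Get/Put for this key is routed here
```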
Virtual Node for Load Balancing • If a node fails, the load is evenly dispersed across the rest • If a node joins, its virtual nodes accept a roughly equivalent amount of load from the rest • How does it handle heterogeneity in nodes? The number of virtual nodes for a physical node is decided based on the node’s capacity
Replication for High Availability • Each data item is replicated at N hosts • The key is stored at its coordinator and at the coordinator’s N-1 clockwise successors • N is a per-instance configurable parameter
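A minimal sketch of building the replica set (preference list) for a key: the coordinator plus the next N-1 clockwise positions on the ring. The ring positions and node names are illustrative, and virtual-node deduplication is ignored for brevity.

```python
from bisect import bisect_right

# Illustrative ring: (position, physical node)
RING = sorted([(10, "node-a"), (45, "node-b"), (80, "node-c"), (120, "node-d")])

def preference_list(key_pos, n):
    """Coordinator plus its N-1 clockwise successors hold replicas of the key."""
    positions = [p for p, _ in RING]
    start = bisect_right(positions, key_pos) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(n)]

print(preference_list(50, n=3))        # ['node-c', 'node-d', 'node-a']
```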
Quorum for Consistency • W + R > N insures that every read quorum overlaps every write quorum • [Diagram: write/read quorum configurations] • W = 2, R = 2 • Slow writes with W = 3, R = 1 • Ambiguous and slow reads (cache)
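A minimal sketch of the quorum arithmetic: with N replicas, requiring W write acknowledgements and R read responses such that W + R > N forces every read set to intersect every write set. The example settings mirror the slide.

```python
def quorums_overlap(n, w, r):
    """True if any R-node read set must intersect any W-node write set."""
    return w + r > n

N = 3
print(quorums_overlap(N, w=2, r=2))    # True  -- balanced reads and writes
print(quorums_overlap(N, w=3, r=1))    # True  -- fast reads, slow writes
print(quorums_overlap(N, w=1, r=1))    # False -- fast, but reads may be stale
```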
Vector Clock for Eventual Consistency • Vector clock: a list of (node, counter) pairs • The client is asked to reconcile conflicting versions • Why is reconciliation not likely to happen?
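A minimal sketch of vector-clock comparison: each version carries a {node: counter} map, and if neither clock dominates the other the versions are concurrent and the client must reconcile them. Node names and counters are illustrative.

```python
def descends(a, b):
    """True if clock `a` is equal to or newer than clock `b` on every node."""
    return all(a.get(node, 0) >= cnt for node, cnt in b.items())

def compare(a, b):
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    return "concurrent -- client must reconcile"

print(compare({"sx": 2, "sy": 1}, {"sx": 2}))            # a supersedes b
print(compare({"sx": 2, "sz": 1}, {"sx": 2, "sy": 1}))   # concurrent -- client must reconcile
```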
Service Level Agreements (SLAs) • Latency measured at the 99.9th percentile
Google’s Bigtable • Key: <Row, Column, Timestamp> • Rows are ordered lexicographically • Column = family:optional_qualifier • API: lookup, insert, and delete • No support for relational DBMS model
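A minimal sketch of the data model as a sparse map keyed by (row, column, timestamp), with columns named family:qualifier; the row and column names follow the Bigtable paper’s examples, and the dict-based store is purely illustrative.

```python
table = {}

def insert(row, column, value, timestamp):
    table[(row, column, timestamp)] = value

def lookup(row, column):
    """Return the most recent value stored for (row, column)."""
    versions = [(ts, v) for (r, c, ts), v in table.items() if r == row and c == column]
    return max(versions)[1] if versions else None

insert("com.cnn.www", "anchor:cnnsi.com", "CNN", timestamp=3)
insert("com.cnn.www", "anchor:cnnsi.com", "CNN Sports", timestamp=5)
print(lookup("com.cnn.www", "anchor:cnnsi.com"))   # CNN Sports (newest timestamp wins)
```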
Tablet • Tablet (size: 100-200 MB): a set of adjacent rows, e.g. edu.illinois.cs/i.html, edu.illinois.csl/i, edu.illinois.ece/i.html • The unit of distribution and load balancing • Each tablet lives at only one tablet server • A tablet server splits tablets that get too big • [Diagram: a tablet (start: aardvark, end: apple) built from SSTables; each SSTable is an immutable, sorted file of key-value pairs stored as 64K blocks plus an index]
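A minimal sketch of an SSTable-like structure: an immutable, sorted run of key-value pairs split into blocks, with a sparse index mapping each block’s first key to its offset. The block size and API are illustrative, not Bigtable’s on-disk format.

```python
from bisect import bisect_right

class SSTable:
    def __init__(self, items, block_size=2):
        self.rows = sorted(items)          # immutable once built
        self.block_size = block_size
        # Sparse index: first key of each block -> block's starting offset.
        self.index = [(self.rows[i][0], i) for i in range(0, len(self.rows), block_size)]

    def lookup(self, key):
        keys = [k for k, _ in self.index]
        block = bisect_right(keys, key) - 1        # the only block that could hold key
        if block < 0:
            return None
        start = self.index[block][1]
        for k, v in self.rows[start:start + self.block_size]:
            if k == key:
                return v
        return None

sst = SSTable([("aardvark", 1), ("apple", 2), ("apple_two_E", 3), ("boat", 4)])
print(sst.lookup("apple"))     # 2
```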
System Organization • [Diagram: BigTable clients; a BigTable master; N tablet servers, each running the BigTable server, a GFS chunk server, and a scheduler slave on Linux; plus the cluster scheduler master (Google WorkQueue), the Lock Service (Chubby, OSDI’06), and the GFS master] • Master for load balancing and fault tolerance • Metadata: use Chubby to monitor the health of tablet servers and restart failed servers [OSDI’06] • Data: GFS replicates data [SOSP’03]
Finding a Tablet • In most cases, clients directly communicate with the Tablet server
Editing a Table • Mutations are logged, then applied to an in-memory version • Logfile stored in GFS
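A minimal sketch of that write path: append each mutation to a commit log (kept in GFS in the real system), apply it to an in-memory table, and later freeze the in-memory table into a sorted, immutable run. Class names and the log format are illustrative.

```python
import tempfile

class TabletWriter:
    def __init__(self, log_path):
        self.log = open(log_path, "a")     # commit log (stored in GFS in Bigtable)
        self.memtable = {}                 # in-memory version of recent mutations

    def apply(self, row, column, value):
        self.log.write(f"{row}\t{column}\t{value}\n")
        self.log.flush()                   # mutation is durable before it is applied
        self.memtable[(row, column)] = value

    def flush_to_sstable(self):
        """Freeze the memtable into an immutable, sorted run of key-value pairs."""
        return sorted(self.memtable.items())

w = TabletWriter(tempfile.mkstemp(suffix=".log")[1])
w.apply("com.example/index.html", "contents:", "<html>...</html>")
print(w.flush_to_sstable())
```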
Table • Multiple tablets make up a table • SSTables can be shared between tablets • Tablets do not overlap; SSTables can overlap • [Diagram: two tablets, aardvark-apple and apple_two_E-boat, built from four SSTables, one of which is shared]
Discussion Points • What’s the difference between these two and NFS/DBMS in terms of interface? • Non-hierarchical name space: Key vs. <Row, Col, Timestamp> • Dynamo vs. Bigtable? • Partitioning: hashing without a master vs. alphabetically ordered keys with a master • Consistency: quorum/versioning vs. Chubby • Fault tolerance: replication vs. GFS • Load balancing: virtual nodes vs. tablets