Learn about the modernization journey, storage characteristics, and use cases of Cosmos DB in this session by Anil Nori, a Distinguished Engineer at Microsoft. Explore the vision of the Universal Store Team (UST) and the transition from legacy to modern cloud services, uncovering the benefits and challenges along the way.
Building Microsoft first-party Modern Cloud Services with Azure Cosmos DB Anil Nori Distinguished Engineer Microsoft BRK3158
In this session… • UST Introduction • Modernization journey • Modern Services • Storage characteristics • Cosmos DB Fit • Cosmos DB Use Cases • Cosmos DB under the hood • Storage Guidance • Q&A
Universal Store Team (UST) Vision • One Store, Many Storefronts • A Universal Store for all Microsoft commerce encompassing everything we sell and everything others sell through us (consumer and commercial, digital and physical, subscription and transaction) via all channels and storefronts.
One store and many storefronts • Active in 240 markets • 100s of TB of data • ~1,000 services
Modern UST Services • Modern UST Services delivering data and business logic, at extreme scale, at all times, on demand • Security • Compliance • Global Reach • Microservices • REST APIs • Extreme Scale • Always Available • Feature Agility • Service Agility • Engineering and Operational Agility • Real-Time Visibility • Business Insights and Reporting
Why modernize? • Built and evolved over a decade with too much legacy • Stack was never re-architected • Too many silos • Hard to enhance • Not designed for Cloud scale • Scale-up based • SQL-centric (more later) • Lack of robust HA and DR • Lack of predictable SLAs • No COGS consideration • No Service Orientation • Not decoupled, too many dependencies • No real-time visibility into overall system • Hard to operate a service • Too many systems with too much custom and duplicate infrastructure • Queues, Alerting mechanisms, replication mechanisms, Engineering systems, etc.
Storage Issues • On-premises storage infrastructure • Legacy commerce platforms started as a Single DB solution • Worked well when the data and transaction volumes were low • Very client-server centric • “Relational”-centric data models • No separation of compute (logic) and storage – most business logic in stored procedures • Too much locking and serializable isolation • By default strong consistency is assumed • Evolved into multiple DBs (instances) as the volumes increased • Distributed transactions across DBs • Scale-up instead of scale-out • Too many HA/DR mechanisms used • Mirroring, Replication, Log Shipping – all of them used • Expensive infrastructure • High-end machines, SANs, Fusion IO cards • Same DB is used for real-time, batch, backend, and management operations
Data model issues • Over-normalization causes too much database I/O • Consider Purchase Order • Multiple tables: Order, Order Lines, Billing Address, Shipping Address, Payment info, etc. • Single purchase order access requires multiple table writes and queries (complex joins) • Increased Read/Write cost • Referential Integrity in DB makes it hard to partition and distribute data • Data cannot be partitioned across multiple services/DBs • Most DBMSs do not support referential integrity across DBs • Requires distributed transactions • Strict Consistency makes cross-geo/region high availability hard • Requires synchronous replication across geos/regions • Write failures at a region cause total failure • One Size Does Not Fit All • No Read (query) – Write separation. Same DB for Heavy writes and complex queries. • Conflicting data model objectives • Unpredictable SLAs
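To illustrate the alternative to the over-normalized layout, a purchase order can be held as one self-contained document, so a single key-based read or write replaces the multi-table join. This is a minimal sketch only; every field name and value below is hypothetical and not taken from the session.

```python
import json

# Hypothetical denormalized purchase order: order lines, addresses and
# payment info are embedded instead of living in separate tables.
purchase_order = {
    "id": "order-1001",
    "userId": "user-42",  # assumed partition key (partitioned by user)
    "orderLines": [
        {"sku": "game-123", "quantity": 1, "price": 59.99},
        {"sku": "addon-7", "quantity": 2, "price": 4.99},
    ],
    "billingAddress": {"city": "Redmond", "country": "US"},
    "shippingAddress": {"city": "Redmond", "country": "US"},
    "payment": {"method": "card", "last4": "0000"},
}


def order_total(order: dict) -> float:
    """Compute the total from the embedded lines; no join is needed."""
    return round(
        sum(line["quantity"] * line["price"] for line in order["orderLines"]), 2
    )


# One serialized document is what a single object write would store.
serialized = json.dumps(purchase_order)
```

The trade-off is that relationships such as referential integrity are no longer enforced by the database and move into business logic.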
Modern Service Design Principles • Micro-services based architecture • Unit of functionality and deployment, management, SLA boundary • Well-defined application contracts (APIs and resources) • Decoupled services • Service types: real-time, batch, query, management • REST APIs • Resource based design • Scale-out based architecture • Commodity HW/SW • Stateless business logic • Distributed storage – SQL and NoSQL Storage • Predictable latency • Weak/Eventual Consistency • Availability over consistency • Global availability • Multi-master Active/active • Rich semantics (relationships, referential integrity, etc.) in business logic
Real time transaction services • API invoked • Object-at-a-time operations • Key-based lookup • Object read, Object write • Operates on small amounts of data • Extreme Scale, Predictable and low latency • Generally partitioned by user or transaction ID • Real-time, X-DC, Strict/Eventual, HA, Active/Active, App-initiated DR • E.g. Order, Subscription, Billing, Payment, Entitlement, Customer • Generally data access is within the service
Real time transaction services Globally Distributed Database Service Azure Cosmos DB
Azure Cosmos DB • A globally distributed, massively scalable, multi-model database service • SQL, MongoDB, Table API, Graph, Document, Column-family, Key-value • Guaranteed low latency at the 99th percentile • Elastic scale out of storage & throughput • Five well-defined consistency models • Turnkey global distribution • Comprehensive SLAs
Cosmos DB fit • NoSQL store, globally distributed database service, with predictable SLAs • Elastic Scale • Size and throughput • Transparent partitioning • Dynamic addition/removal of partitions • Global distribution • For failover and for low latency local access • Distributed across regions for scale and availability • Single write region and multiple read regions • Replication across regions – synchronous and async • Varying Consistency Levels • Tunable consistency per request (lower consistency level from account level setting) • Strong within region through quorum writes and reads • Strong consistency across regions through sync replication • Eventual consistency across regions through async replication • High Availability • Other features • TTLs for auto-purging of documents • Multiple API models – SQL, MongoDB, Azure Table, ...
Predictable SLAs • JSON document – flexible schema • Primary key on document ID • Efficient support for path (secondary) indexing • Fast, auto index creation • Configurable • Predictable latency • < 10ms for read @P99 • < 10ms for write @P99 (including indexing, multiple copies within region) • Predictable throughput • Reserved/provisioned • Cosmos DB throttles the application when the access throughput for a partition exceeds the reserved throughput for that partition • Guaranteed availability – five 9s
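Because requests above the reserved throughput are throttled, callers typically wait and retry. The following is a minimal sketch of that pattern, not a Cosmos DB SDK call: `ThrottledError` and `call_with_retry` are hypothetical names standing in for a "request rate too large" (HTTP 429) response that carries a suggested wait time.

```python
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a throttled (HTTP 429) response."""

    def __init__(self, retry_after_s: float):
        super().__init__("throttled")
        self.retry_after_s = retry_after_s


def call_with_retry(operation, max_attempts: int = 5, sleep=time.sleep):
    """Run operation(); when throttled, wait the suggested time and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ThrottledError as err:
            if attempt == max_attempts:
                raise  # give up and surface the throttling to the caller
            sleep(err.retry_after_s)
```

Passing `sleep` as a parameter keeps the helper testable; sustained throttling is usually a sign that the provisioned throughput or the partition key needs revisiting rather than more retries.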
Scale • Cosmos DB document collection can span one or more partitions (known as logical partitions). Logical partitions are defined by partition key (application defined) • All data belonging to a key is stored in a partition • Applications just declare collection size (S) and required collection throughput (T) • Partitioning is automatic and transparent • Number of physical partitions is determined by the size of the document collection and size of partition • Number of partitions (N) = Collection size (S) / Partition size (P) • Throughput (T) is uniformly distributed across N partitions, each partition reserving T/N RUs • Each partition can provide maximum throughput (t) = 10,000 RUs. If the total collection throughput (T) > N*t, Cosmos DB creates more partitions • For effective partitioning and throughput provisioning: • Choose a selective partition key (large set of keys) • Data for each key should be uniform • If data is very skewed, throughput distribution is not uniform, thereby requiring higher RU setting, increasing the overall cost • Larger partition size (in future) • Application managed partitions with separate document collections
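The partition arithmetic on this slide can be sketched as follows. It uses only the relations stated above (N = S/P, T/N per partition, at most t = 10,000 RUs per partition); the function names are hypothetical and the partition size passed in is an assumed input, since the slide does not give a value for P.

```python
import math

MAX_RU_PER_PARTITION = 10_000  # t, as stated on the slide


def plan_partitions(collection_size_gb, throughput_ru, partition_size_gb):
    """Return (N, RUs per partition) for size S, throughput T and size P."""
    by_size = math.ceil(collection_size_gb / partition_size_gb)     # N = S / P
    by_throughput = math.ceil(throughput_ru / MAX_RU_PER_PARTITION)  # T <= N * t
    n = max(1, by_size, by_throughput)
    return n, throughput_ru / n  # T is spread uniformly: T / N each


def throughput_for_hot_partition(hot_partition_ru, n):
    """T needed so that one skewed partition still receives hot_partition_ru."""
    return hot_partition_ru * n
```

For example, with an assumed P of 10 GB, a 250 GB collection at 100,000 RUs lands on 25 partitions of 4,000 RUs each. If one hot partition needs 6,000 RUs, the whole collection must be provisioned at 150,000 RUs, which is why skew raises cost.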
Throughput • Cosmos DB guarantees application throughput by reserving required resource utilization. Instead of thinking about CPU, IO, and memory and how they each impact your application throughput, a Request Unit (RU) is the measure of throughput in Azure Cosmos DB. • 1 RU corresponds to the throughput of the GET of a 1 KB document. • Every operation in Azure Cosmos DB, including reads, writes, queries, and stored procedure executions, has a deterministic Request Unit value based on the throughput required to complete the operation. • Throughput is provisioned/reserved as # of RUs/sec. • Cosmos DB billing is not consumption based; the application is billed whether or not the reserved throughput is consumed. • Understanding and managing RUs is important for managing storage cost. • Factors that impact RUs are: • Document Size – RUs in increments of KBs. For example, reading a 2 KB document costs twice as much as reading a 1 KB document • Write cost is more than read; read-write is more expensive than write • Index creation – a large number of indexes increases write cost • Consistency level – RU/sec costs increase from eventual to strict consistency
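A rough budgeting sketch under the rule of thumb above (1 RU per 1 KB read, charged in KB increments). The slide says only that writes cost more than reads, so `write_multiplier` is an explicit assumption to be measured for a real workload; the function names are hypothetical.

```python
import math


def read_ru(doc_size_kb: float) -> int:
    """Rule of thumb from the slide: 1 RU per 1 KB read, in KB increments."""
    return max(1, math.ceil(doc_size_kb))


def provisioned_ru_per_sec(read_qps, write_qps, doc_size_kb, write_multiplier):
    """Estimate RUs/sec to reserve; write_multiplier is an assumed ratio."""
    per_read = read_ru(doc_size_kb)
    per_write = per_read * write_multiplier
    return read_qps * per_read + write_qps * per_write


# Example with an assumed write multiplier of 5 for 1 KB documents.
estimate = provisioned_ru_per_sec(
    read_qps=3500, write_qps=2000, doc_size_kb=1, write_multiplier=5
)
```

Since billing is on reserved rather than consumed throughput, an estimate like this is a starting point to refine against observed request charges.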
Azure Cosmos DB Demo: Capabilities Andrew Liu Product Management @ Azure Cosmos DB
Replication and High availability • Automatic and transparent replication worldwide • Each partition hosts a replica set per region • All regions are hidden behind a single global URI with multi-homing capabilities • Customers can dynamically add / remove regions • Automatic and Application Controlled failovers • Writes are constrained until failover is completed – immediate in case of manual failover • Read availability is always provided from read regions/replicas. Multi-Master Active – Active configuration for write availability. • Active – Active (more later)
Consistency • Cosmos DB enables availability and geo scale with relaxed consistency (CAP). • If strong consistency is always required, availability cannot be guaranteed in case of region failures. • 5 consistency levels: • Strong: Reads are guaranteed to return the up-to-date version of a document • Cosmos DB supports strong consistency within and across geo regions, with synchronous replication. • Cannot guarantee low latency. Latency is bound by the longest/farthest region writes • Zero data loss on region failovers • Multi-master document collections for Active – Active (more later) • Bounded Staleness: Reads lag behind by k writes from the up-to-date version; provides strong consistency with some data loss on failover • Session: Ordered committed reads, read-your-writes. Can provide strong consistency for your application, but the application has to manage propagating the Cosmos DB session token (like a cookie) from and to Cosmos DB • Consistent Prefix, Eventual.
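The session-token idea can be illustrated with a toy model. This is not how Cosmos DB is implemented: the token is modelled here as a simple write sequence number (an assumption), and `ToyStore` is a hypothetical class with one primary and one lagging replica.

```python
class Replica:
    def __init__(self):
        self.lsn = 0    # sequence number of the last applied write
        self.data = {}

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class ToyStore:
    """Toy model only: a primary plus one asynchronously updated replica."""

    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet replicated

    def write(self, key, value):
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.pending.append((lsn, key, value))
        return lsn  # returned to the caller as its session token

    def replicate(self):
        for lsn, key, value in self.pending:
            self.secondary.apply(lsn, key, value)
        self.pending.clear()

    def read(self, key, session_token=0):
        # Use the replica only if it has caught up to the caller's token.
        caught_up = self.secondary.lsn >= session_token
        replica = self.secondary if caught_up else self.primary
        return replica.data.get(key)
```

A read without the token may miss the caller's own write until replication completes, while a read that passes the token back always sees it. That is why the application has to carry the token across its API calls.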
Weak/Eventual consistency • Transaction services write consistent entities • Most of the time, an insert • Writes are never lost • Most services can tolerate weak/eventual consistency • Committed reads work • Weak/Eventual consistency enables decoupling between services • No shared data across service boundaries • Data is published/replicated for sharing – asynchronously • Data publishing for backend processing (e.g. in billing) • Purchase history in account portals • Async replication for Active-Active availability/DR • No relationships spanning services • In commerce services, entities are seldom deleted – they are marked for delete. • Primary keys are immutable • No need for referential integrity maintenance – references can be lazily fixed • Handled in business logic • Enables decoupling
Multi-master active-active • Write scalability around the world • Low latency writes around the world • 99.999% High Availability around the world • Comprehensive conflict management • (Example) Consider Payment Transactions • Payment transactions (e.g. a credit card charge) must always be available. • Payment transactions are mostly inserts (writes) • A given write is routed to a write region • If the write region is down, the write is routed to another write region (multi-master for writes). • Writes do not stop • Documents are merged when the failed region is online again. No data loss • No conflicts since transactions are inserts • 100% write availability
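Why insert-only workloads merge without conflicts can be sketched as below. This is an illustration under the assumption that every payment transaction has a globally unique ID; `merge_regions` is a hypothetical helper, not Cosmos DB's conflict-management logic.

```python
def merge_regions(*regions: dict) -> dict:
    """Union insert-only documents from several write regions by document ID."""
    merged = {}
    for region in regions:
        for doc_id, doc in region.items():
            # With unique IDs and inserts only, the same ID can appear twice
            # only as an identical copy, so this branch never fires.
            if doc_id in merged and merged[doc_id] != doc:
                raise ValueError(f"conflict on {doc_id}")
            merged[doc_id] = doc
    return merged


# Writes accepted by two regions while one of them was unreachable.
west = {"txn-1": {"amount": 10}, "txn-2": {"amount": 25}}
east = {"txn-3": {"amount": 7}}
merged = merge_regions(west, east)
```

Updates to the same document in two regions would not have this property and would need an explicit conflict-resolution policy.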
Azure Cosmos DB Demo: Active - Active Andrew Liu Product Management @ Azure Cosmos DB
Cosmos DB Change feed: Read-Write Separation • Real-time Transactional APIs • Query Service • New Order • Order, Payments, etc. micro-services • Read From Change Feed
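The read-write separation on this slide can be sketched with a toy change feed. The classes below are hypothetical stand-ins (an in-memory ordered log and a consumer with a continuation position), not the Cosmos DB change feed API.

```python
from collections import defaultdict


class ChangeFeedToy:
    """In-memory stand-in for an ordered feed of written documents."""

    def __init__(self):
        self._log = []

    def append(self, doc):
        self._log.append(doc)

    def read_from(self, continuation):
        changes = self._log[continuation:]
        return changes, continuation + len(changes)


class OrderQueryService:
    """Builds a query-optimized view from the feed, off the write path."""

    def __init__(self):
        self.orders_by_user = defaultdict(list)
        self.continuation = 0  # position of the next unread change

    def poll(self, feed):
        changes, self.continuation = feed.read_from(self.continuation)
        for doc in changes:
            self.orders_by_user[doc["userId"]].append(doc["id"])
        return len(changes)
```

The transactional service only appends new orders; the query service catches up asynchronously, so heavy queries never compete with writes on the same store.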
Cosmos DB cost • Here are a few ways applications can optimize cost: • Select a partition key of higher cardinality to distribute the load across multiple partitions and prevent hot partitions. Provisioned capacity is uniformly distributed across all partitions • Cosmos DB plans to scale up underlying partitions, increasing the overall throughput and cost efficiency • Create indexes on only the required fields (default is all). Exclude fields/paths that are not searched • Use the appropriate consistency mode while accessing data • “Session” consistency is optimal but requires passing Cosmos DB session tokens on APIs • “Strong” costs more RUs than “Eventual” • Read from secondary replicas if your application scenario allows it (you are paying for them, so you might as well use them) • Compress the payload
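Indexing only the searched fields might look like the policy below. This sketch assumes the JSON indexing-policy shape with `includedPaths` and `excludedPaths`; check it against the current Cosmos DB documentation before use. The helper name and field names are hypothetical.

```python
def selective_indexing_policy(searched_fields):
    """Build a policy that indexes only the listed top-level fields.

    Assumed shape: "/field/?" targets a scalar value at that path and
    "/*" matches every other path, which is excluded here.
    """
    return {
        "indexingMode": "consistent",
        "includedPaths": [{"path": f"/{field}/?"} for field in searched_fields],
        "excludedPaths": [{"path": "/*"}],
    }


# Hypothetical order documents that are only ever queried by user and status.
policy = selective_indexing_policy(["userId", "status"])
```

Fewer indexed paths means less index maintenance per write, which lowers the RU cost of writes.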
Microsoft Rewards • Loyalty program • Reaches customers to drive engagement by motivating existing customers to do more in our ecosystem • Through incentives, recognition and premium experiences • Engagement with Store (app installs), Xbox (Game purchases, Mixer streams, Achievements, Groups…), Bing (queries), Edge (sessions) • Rewards Basics • Holds user profile, activities, points summary, etc. • Over 30 M active users • Hash of user ID used as the key • Document size varies from a few KB to 500 KB • ~3.5K QPS read, ~2K QPS writes • Cosmos DB Configuration • 10-20 partitions/region, five regions, eventual consistency • Total of about 250GB, 100K RU
Contact Permission Master (CPM) • CPM Basics • Service for the whole company that provides customer contact preferences data • Holds user contactability permissions for email, phone, SMS and postal address • Migrated data primarily from legacy systems using SQL Server + Azure Tables • Migrated >7B emails, >350M phones, and >100M postal addresses • Read heavy (primarily lookup by email address) • Very bursty traffic based on usage across many partners • Cosmos DB Configuration • 7 collections, largest one has 12TB data size • Small document size (1-2K bytes) • 2000 partitions, two regions • Session consistency • Email hash used as the lookup key • Verified that data and request load are reasonably uniform • Peak of roughly 4K TPS, tested for 12K TPS with 500K RU configuration [Diagram labels: Outlook, AMC, XBOX, Skype, other 100+ partners; CPM; Azure Service Fabric; Azure Cosmos DB; West US geo-replicates to Central US]
Entitlement Service • Primary use is to store entitlements, which are records of ownership for digital content such as Xbox titles, Office subscriptions, and Windows Store applications. • Extremely read-heavy system that serves millions of clients around the world seeking licenses, or tokens, which allow digital content to run by querying the entitlements store for ownership. • Designed for 5 9’s SLA for the customer experience. • Migrating off discrete on-premises SQL. • Migrating 120B entitlements through bulk copy and shadowing to Cosmos DB • Caching sits above Cosmos DB for load reduction of simple lookups [Diagram labels: Modern clients; Collections, Licensing, Purchase; Windows 8, Xbox 360; Legacy purchase/licensing; Entitlements; GLS; StorageFD]
Entitlement Service • Cosmos DB Configuration • 3 collections. Each collection is partitioned (in application) across 4 write regions • Application partitions (partitioned by users) • Each write region is replicated in another region • 3200 partitions per collection • Session (strong) consistency • Document size usually <1K, 70TB of documents • Estimate 10M RUs, 200K RPS, Read-heavy • Load not evenly distributed. Ordered by West US2, East US2, North Europe, East Asia. • Requests routed to the appropriate Cosmos DB collection using GLS (Global Location Service). Discussed later • Cosmos DB Passive (Backup) configuration • Exact mirror of all collection data in the Active configuration pushed via change feed. [Diagram: Application Partitions 1 to 4]
Global Locator Service • The Global Lookup Service (GLS) provides static partitioning as a service to partners. Static partitioning is partitioning that does not change in response to resources becoming available or unavailable. • 10+ Billion Keys • 120K rps at peak across all regions • Collection size of 9TB, ~1 KB document size, 1500 partitions • 4 Regions – 1 write and 3 read regions • GLS maintains reliability of 99.99% over 5 min
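Static partitioning as described here can be sketched as a lookup table that records a key's location once and never moves it. This is an illustration only, not the GLS design: `StaticLocator`, the partition names, and the use of a hash for the initial placement are all assumptions.

```python
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic hash (Python's built-in hash() varies between runs)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class StaticLocator:
    """Assigns each key to a partition once; later changes never move it."""

    def __init__(self, partitions):
        self._partitions = list(partitions)
        self._assignments = {}  # key -> partition, the lookup table

    def locate(self, key: str) -> str:
        if key not in self._assignments:
            index = stable_hash(key) % len(self._partitions)
            self._assignments[key] = self._partitions[index]
        return self._assignments[key]

    def add_partition(self, name: str) -> None:
        # Only keys seen for the first time after this call can land here.
        self._partitions.append(name)
```

Because existing assignments are stored rather than recomputed, adding or losing a partition does not reshuffle keys; the cost is that the lookup table itself must be highly available, which matches the reliability figure quoted for GLS.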
Cosmos DB Under the hood • Log Structured Store • Writes are fast, written sequentially to log • Append only • In-memory index tables • Updates and deletes performed in memory • Random reads are fast, served out of memory; data is flushed to flash/SSD • Garbage collection/Compaction • Bw-Tree storage • Log structured store • No in-place updates. Delta writes • Minimizes CPU cache invalidations • Latch free, High concurrency • Operations (R/W) do not block • High utilization of CPU cores • Periodic checkpointing of document index • Document store • Document index: Bw-Tree • Documents in blobs • Compact and efficient secondary indexing • Bw-Tree • Bitmaps as postings (inverted) lists • Run length encoding • Resource governance from the ground up
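The "bitmaps as postings lists" and "run length encoding" bullets can be illustrated generically as below. This is a textbook sketch of the two techniques, not Cosmos DB's on-disk format; the function names are hypothetical.

```python
def to_bitmap(doc_ids, universe_size):
    """Postings list as a bitmap: bit i is 1 if document i contains the term."""
    bits = [0] * universe_size
    for doc_id in doc_ids:
        bits[doc_id] = 1
    return bits


def rle_encode(bits):
    """Compress a bitmap into (bit, run_length) pairs."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(bit, length) for bit, length in runs]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bitmap."""
    return [bit for bit, length in runs for _ in range(length)]


# Documents 2, 3, 4 and 9 out of 12 match a given indexed value.
bitmap = to_bitmap([2, 3, 4, 9], 12)
encoded = rle_encode(bitmap)
```

Long runs of identical bits, which are common when matching documents cluster together or are sparse, collapse into a handful of pairs, which is what keeps the secondary index compact.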
Cosmos DB Best Practices • Uniform partition distribution • Small documents • “Session” consistency • Large partitions (in future) • Distribute reads • Use for key based access, simple list queries • If possible, spread transactions across regions with multiple document collections in active/active configuration (e.g. Payment Transaction service)
Summary • Cosmos DB is a well-designed storage platform for modern cloud services • Cosmos DB is suitable for transactional services • High throughput • Scale • Low latency • Predictability • Availability with tunable consistency • Cosmos DB has enterprise scale, reliability, and maturity, and is used by many MSFT 1st party services • The Cosmos DB team is very agile and responsive