560 likes | 568 Views
Explore the transition from normalized to denormalized data modeling, benefits of NoSQL databases for scalability, and strategies for effective data modeling in a cloud-native environment. Learn key concepts, historical context, and practical insights from industry experts.
E N D
Where’s my lookup table? Modelingrelational data in a denormalized world. Rick Houlihan Principal Technologist, NoSQL - AWS
What are we talking about? A Brief History of Data Processing – Why NoSQL? The Great Migration – Amazon’s path to database freedom Making it Work - Normalized versus denormalized data modeling Document, Wide Column, Key-Value –Why it’s all the same Why Cloud Native NoSQL is the best choice
History of Data Processing “History repeats itself because nobody was listening the first time” - Anonymous
Amazon’s Great Migration “We all know about certain problems we wish we didn't have.” – Dean Kamen
20 years of code 3,000 Oracle Instances Over 12,000 services ~25,000 developers 54.5MM TPS Peak (Prime Day 2019)
Why make the move? RDBMS was breaking at scale • Required rollups and denormalized data to avoid complex queries Impact of bad deployments • One bad stored procedure would break a dozen services NoSQL databases are “flexible” • Schemaless design allows the app to evolve over time Cost of RDBMS was too high • CPU is expensive, storage is cheap
Sizing the Workloads • Problems with limited scope are easier to solve • “I need a system to track employee vacation time.” • Unbounded problems are harder to solve • “I need a root cause analysis engine to correlate transaction level events to buying patterns across global markets.”
Sharded Relational DBs? ? C D A B
Denormalize and shard to provide horizontal scale Near unbounded throughput and storage NoSQL Databases Collection 1 1 TB Shard B Shard A 500 GB 500 GB
Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale Partition Keys in NoSQL Id = 1 Name = Jim Id = 2 Name = Andy Dept = Eng Id = 3 Name = Kim Dept = Ops Hash (1) = 7B Hash (2) = 48 Hash (3) = CD 00 54 55 A9 AA FF Key Space 00 FF
Online Transaction Processing (OLTP) Most common type of app Online Analytics Processing (OLAP) BI and ad-hoc data projections Decision Support Systems (DSS) Long running query aggregations and projections Types of Database Workloads Operations Analytics
Why NoSQL? SQL NoSQL
NoSQLData Modeling “A ship in port is safe, but that’s not what ships were built for.” - Grace Hopper
It’s all about relationships… Social network Document management Process control IT monitoring Data trees
The NoSQL Model • Table/Collection • Items • Attributes • PartitionKey • Sort Key All items for key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values Mandatory Key-value access pattern Determines data distribution Optional Model 1:N and N:N relationships Enables rich query capabilities
SQL vs. NoSQL design pattern SELECT * FROM Products INNER JOIN Books SELECT * FROM PRODUCTS SELECT * FROM Products INNER JOIN Albums INNER JOIN Tracks SELECT * FROM Products INNER JOIN Videos INNER JOIN ActorVideo INNER JOIN Actors
Ad hoc “Joins” in SQL SELECT * FROM PRODUCTS INNER JOIN BOOKS ON productId = productId WHERE name = “Book Title” SELECT * FROM PRODUCTS INNER JOIN ALBUMS ON productId = productId INNER JOIN TRACKS ON albumId= albumId WHERE name = “Album Title” SELECT * FROM PRODUCTS INNER JOIN VIDEOS ON productId = productId INNER JOIN ACTORVIDEO ON videoId = videoId INNER JOIN ACTORS ON actorId = actorId WHERE name = “Movie Title”
Modeled “Joins” in NoSQL SELECT * WHERE PK=“Book Title” SELECT * WHERE PK=“Album Title” SELECT * WHERE PK=“Movie Title”
Modeled “Joins” in NoSQL Swap PK and SK on index SELECT * WHERE PK=“Author Name” SELECT * WHERE PK=“Song Title” SELECT * WHERE PK=“Actor Name” SELECT * WHERE PK=“Director Name” SELECT * WHERE PK=“Musician”
Document vs. Wide Column Data Modeling { Default “_id” Index supports K/V access patterns, e.g “Get employee data by email”, etc. _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, Secondary index on “desk” supports sub-tree aggregations, e.g. StartsWith(“SEA58”) desk: “SEA58-07.650.O1” }
Document vs. Wide Column { _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, desk: “SEA58-07.650.O1” }
NoSQL Data Modeling - KEY CONCEPTS • Selecting a Partition/Shard Key • Large number of distinct values • Items are uniformly requestedand randomly distributed • Creating Compound Indexes • Model 1:n and n:nrelationships • Efficient/selective patterns • Query multiple entities • Leverage range queries • Examples: • Bad: Status, Gender • Good: CustomerId, DeviceId • Examples: • Orders.OrderItem • Hierarchical relationships
TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review
TENETS OF NoSQL DATA MODELING • Nature of the application • OLTP / OLAP / DSS • Define the Entity-Relationship Model • Identify Data Life Cycle • TTL, Backup/Archival, etc. • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review
TENETS OF NoSQL DATA MODELING • Identify data sources • Define queries and write patterns • Document all workflows • Understand the use case • Define the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review
TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review 1 application service = 1 table • Reduce round trips • Simplify access patterns Identify Primary Keys • How will items be inserted and read? • Overload items into partitions Define indexes for secondary access patterns
TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat-> Review
Complex Queries “Computers are useless. They can only give you answers.” - Pablo Picasso
Triggers Item/table level metrics • Amazon CloudSearch Notify change Kinesis Firehose Lambda function
Advanced Data Modeling “It turns out it's important to build a product and not just a bunch of data models.” - Hilary Mason
Managing Relational Transactions • Configuration Management Service • Resolver Groups • Contacts • Configuration Items • Transactional Workflows • Add Config Items to Resolver Groups • Update Config Item status • Add Contacts to Resolver Groups
DynamoDB Transactions API TransactWriteItems • Synchronous update, put, delete, and check • Atomic • Automated Rollbacks • Up to 10 items within a transaction • Supports multiple tables • Complex conditional checks Good Use Cases • Commit changes across items • Conditional batch inserts/updates • Bad Use Case • Maintaining normalized data
DynamoDB Table Schema De-normalized Contacts Contact Partitions Configuration Items Resolver Partitions Resolver Metadata
Complex Relational Data “Dude, where’s my lookup table?” - Anonymous Amazon SDE
Optimize for Common Patterns “If you optimize everything, you will always be unhappy.” - Donald Knuth
Access Patterns Matter • Insurance quote service • Store all quote versions • 200+ attributes per quote • 50KB Average record size • 800 quotes per minute peak • 1K WCU provisioned
Optimized for Writes • Version category items as they are updated • Send all versions when queried • Process with client side logic • 20 WCU Provisioned