Modeling Relational Data in a Denormalized World

Where’s my lookup table? Modelingrelational data in a denormalized world. Rick Houlihan Principal Technologist, NoSQL - AWS

What are we talking about? A Brief History of Data Processing – Why NoSQL? The Great Migration – Amazon’s path to database freedom Making it Work - Normalized versus denormalized data modeling Document, Wide Column, Key-Value –Why it’s all the same Why Cloud Native NoSQL is the best choice

History of Data Processing “History repeats itself because nobody was listening the first time” - Anonymous

Timeline of Database Technology

Technology Adoption and the Hype Curve

Amazon’s Great Migration “We all know about certain problems we wish we didn't have.” – Dean Kamen

20 years of code 3,000 Oracle Instances Over 12,000 services ~25,000 developers 54.5MM TPS Peak (Prime Day 2019)

Why make the move? RDBMS was breaking at scale • Required rollups and denormalized data to avoid complex queries Impact of bad deployments • One bad stored procedure would break a dozen services NoSQL databases are “flexible” • Schemaless design allows the app to evolve over time Cost of RDBMS was too high • CPU is expensive, storage is cheap

Sizing the Workloads • Problems with limited scope are easier to solve • “I need a system to track employee vacation time.” • Unbounded problems are harder to solve • “I need a root cause analysis engine to correlate transaction level events to buying patterns across global markets.”

Sizing the Database

Scaling Relational DBs

Sharded Relational DBs? ? C D A B

Denormalize and shard to provide horizontal scale Near unbounded throughput and storage NoSQL Databases Collection 1 1 TB Shard B Shard A 500 GB 500 GB

Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale Partition Keys in NoSQL Id = 1 Name = Jim Id = 2 Name = Andy Dept = Eng Id = 3 Name = Kim Dept = Ops Hash (1) = 7B Hash (2) = 48 Hash (3) = CD 00 54 55 A9 AA FF Key Space 00 FF

Online Transaction Processing (OLTP) Most common type of app Online Analytics Processing (OLAP) BI and ad-hoc data projections Decision Support Systems (DSS) Long running query aggregations and projections Types of Database Workloads Operations Analytics

Why NoSQL? SQL NoSQL

NoSQLData Modeling “A ship in port is safe, but that’s not what ships were built for.” - Grace Hopper

It’s all about relationships… Social network Document management Process control IT monitoring Data trees

The NoSQL Model • Table/Collection • Items • Attributes • PartitionKey • Sort Key All items for key ==, <, >, >=, <= “begins with” “between” sorted results counts top/bottom N values Mandatory Key-value access pattern Determines data distribution Optional Model 1:N and N:N relationships Enables rich query capabilities

SQL vs. NoSQL design pattern SELECT * FROM Products INNER JOIN Books SELECT * FROM PRODUCTS SELECT * FROM Products INNER JOIN Albums INNER JOIN Tracks SELECT * FROM Products INNER JOIN Videos INNER JOIN ActorVideo INNER JOIN Actors

Ad hoc “Joins” in SQL SELECT * FROM PRODUCTS INNER JOIN BOOKS ON productId = productId WHERE name = “Book Title” SELECT * FROM PRODUCTS INNER JOIN ALBUMS ON productId = productId INNER JOIN TRACKS ON albumId= albumId WHERE name = “Album Title” SELECT * FROM PRODUCTS INNER JOIN VIDEOS ON productId = productId INNER JOIN ACTORVIDEO ON videoId = videoId INNER JOIN ACTORS ON actorId = actorId WHERE name = “Movie Title”

Modeled “Joins” in NoSQL SELECT * WHERE PK=“Book Title” SELECT * WHERE PK=“Album Title” SELECT * WHERE PK=“Movie Title”

Modeled “Joins” in NoSQL Swap PK and SK on index SELECT * WHERE PK=“Author Name” SELECT * WHERE PK=“Song Title” SELECT * WHERE PK=“Actor Name” SELECT * WHERE PK=“Director Name” SELECT * WHERE PK=“Musician”

Document vs. Wide Column Data Modeling { Default “_id” Index supports K/V access patterns, e.g “Get employee data by email”, etc. _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, Secondary index on “desk” supports sub-tree aggregations, e.g. StartsWith(“SEA58”) desk: “SEA58-07.650.O1” }

Document vs. Wide Column { _id: “jsmith@abc.com”, firstName: “John”, lastName: “Smith”, address: “123 A Street”, city: “Seattle”, state: “WA”, desk: “SEA58-07.650.O1” }

Indexing Efficiently in NoSQL

NoSQL Data Modeling - KEY CONCEPTS • Selecting a Partition/Shard Key • Large number of distinct values • Items are uniformly requestedand randomly distributed • Creating Compound Indexes • Model 1:n and n:nrelationships • Efficient/selective patterns • Query multiple entities • Leverage range queries • Examples: • Bad: Status, Gender • Good: CustomerId, DeviceId • Examples: • Orders.OrderItem • Hierarchical relationships

TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

TENETS OF NoSQL DATA MODELING • Nature of the application • OLTP / OLAP / DSS • Define the Entity-Relationship Model • Identify Data Life Cycle • TTL, Backup/Archival, etc. • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

TENETS OF NoSQL DATA MODELING • Identify data sources • Define queries and write patterns • Document all workflows • Understand the use case • Define the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review

TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat -> Review 1 application service = 1 table • Reduce round trips • Simplify access patterns Identify Primary Keys • How will items be inserted and read? • Overload items into partitions Define indexes for secondary access patterns

TENETS OF NoSQL DATA MODELING • Understand the use case • Identify the access patterns • Read/Write workloads • Query dimensions and aggregations • Data-modeling • Avoid relational design patterns, use one table • Review -> Repeat-> Review

Complex Queries “Computers are useless. They can only give you answers.” - Pablo Picasso

DynamoDB Streams and AWS Lambda

Triggers Item/table level metrics • Amazon CloudSearch Notify change Kinesis Firehose Lambda function

Advanced Data Modeling “It turns out it's important to build a product and not just a bunch of data models.” - Hilary Mason

Managing Relational Transactions • Configuration Management Service • Resolver Groups • Contacts • Configuration Items • Transactional Workflows • Add Config Items to Resolver Groups • Update Config Item status • Add Contacts to Resolver Groups

DynamoDB Transactions API TransactWriteItems • Synchronous update, put, delete, and check • Atomic • Automated Rollbacks • Up to 10 items within a transaction • Supports multiple tables • Complex conditional checks Good Use Cases • Commit changes across items • Conditional batch inserts/updates • Bad Use Case • Maintaining normalized data

DynamoDB Table Schema De-normalized Contacts Contact Partitions Configuration Items Resolver Partitions Resolver Metadata

Reverse Lookup GSI

Complex Relational Data “Dude, where’s my lookup table?” - Anonymous Amazon SDE

Modeling Complex Relationships

The Table Schema

The Index Schema (GSI1)

The Final Result

Optimize for Common Patterns “If you optimize everything, you will always be unhappy.” - Donald Knuth

Access Patterns Matter • Insurance quote service • Store all quote versions • 200+ attributes per quote • 50KB Average record size • 800 quotes per minute peak • 1K WCU provisioned

Optimized for Writes • Version category items as they are updated • Send all versions when queried • Process with client side logic • 20 WCU Provisioned

Modeling Relational Data in a Denormalized World

Modeling Relational Data in a Denormalized World

Presentation Transcript