Time Series Data in MongoDB Massimo Brignoli #mongodb Senior Solutions Architect, MongoDB Inc.
Agenda • What is time series data? • Schema design considerations • Broader use case: operational intelligence • MMS Monitoring schema design • Thinking ahead • Questions
Time Series Data is Everywhere • Financial markets pricing (stock ticks) • Sensors (temperature, pressure, proximity) • Industrial fleets (location, velocity, operational) • Social networks (status updates) • Mobile devices (calls, texts) • Systems (server logs, application logs)
Time Series Data at a Higher Level • Widely applicable data model • Applies to several different “data use cases” • Various schema and modeling options • Application requirements drive schema design
Time Series Data Considerations • Resolution of raw events • Resolution needed to support • Applications • Analysis • Reporting • Data retention policies • Data ages out • Retention
Designing For Writing and Reading • Document per event • Document per minute (average) • Document per minute (by second) • Document per hour (by second)
Document Per Event { server: "server1", load: 92, ts: ISODate("2013-10-16T22:07:38.000-0500") } • Relational-centric approach • Insert-driven workload • Aggregations computed at the application level
Document Per Minute (Average) { server: "server1", load_num: 92, load_sum: 4500, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Pre-aggregate to compute average per minute more easily • Update-driven workload • Resolution at the minute level
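A minimal sketch of how the per-minute average document might be maintained, assuming one load sample arrives at a time. The field names come from the slide; the helper names (`truncateToMinute`, `minuteAverageUpdate`, `averageLoad`), the `metrics` collection, and the use of an upsert are assumptions made for illustration.

```javascript
// Truncate a Date to the start of its minute (works on epoch milliseconds).
function truncateToMinute(date) {
  return new Date(Math.floor(date.getTime() / 60000) * 60000);
}

// Build the filter and update for one load sample.
// Hypothetical helper; field names (server, ts, load_num, load_sum) follow the slide.
function minuteAverageUpdate(server, sampleTime, load) {
  return {
    filter: { server: server, ts: truncateToMinute(sampleTime) },
    update: { $inc: { load_num: 1, load_sum: load } },
  };
}

// The per-minute average is derived at read time from the two counters.
function averageLoad(doc) {
  return doc.load_num === 0 ? null : doc.load_sum / doc.load_num;
}

const avgOp = minuteAverageUpdate("server1", new Date("2013-10-17T03:07:38Z"), 92);
// In the mongo shell this could be applied as an upsert
// (assumes a collection named "metrics"):
// db.metrics.update(avgOp.filter, avgOp.update, { upsert: true })
```

Storing the count and the sum rather than the average itself keeps each write a pair of increments; the division only happens when the value is displayed.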
Document Per Minute (By Second) { server: "server1", load: { 0: 15, 1: 20, …, 58: 45, 59: 40 }, ts: ISODate("2013-10-16T22:07:00.000-0500") } • Store per-second data at the minute level • Update-driven workload • Pre-allocate structure to avoid document moves
Document Per Hour (By Second) { server: "server1", load: { 0: 15, 1: 20, …, 3598: 45, 3599: 40 }, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 3599 steps
Document Per Hour (By Second) { server: "server1", load: { 0: {0: 15, …, 59: 45}, …, 59: {0: 25, …, 59: 75} }, ts: ISODate("2013-10-16T22:00:00.000-0500") } • Store per-second data at the hourly level with nesting • Update-driven workload • Pre-allocate structure to avoid document moves • Updating last second requires 59+59 steps
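The nested hourly layout can be sketched as follows. This is an illustration only: the helper names (`preallocateHour`, `fieldPath`, `setLoadUpdate`) and the placeholder value `0` for not-yet-recorded seconds are assumptions, not part of the deck.

```javascript
// Pre-allocate an hour document: 60 minute sub-documents, each with 60 second slots.
// The placeholder value 0 is an assumption; any fixed-size value would do.
function preallocateHour(server, hourStart) {
  const load = {};
  for (let m = 0; m < 60; m++) {
    load[m] = {};
    for (let s = 0; s < 60; s++) {
      load[m][s] = 0;
    }
  }
  return { server: server, load: load, ts: hourStart };
}

// Dotted field path for a second-of-hour, e.g. 3599 -> "load.59.59".
function fieldPath(secondOfHour) {
  const minute = Math.floor(secondOfHour / 60);
  const second = secondOfHour % 60;
  return "load." + minute + "." + second;
}

// Update document that overwrites a single pre-allocated slot.
function setLoadUpdate(secondOfHour, value) {
  return { $set: { [fieldPath(secondOfHour)]: value } };
}

const hourDoc = preallocateHour("server1", new Date("2013-10-17T03:00:00Z"));
const lastSecond = setLoadUpdate(3599, 40); // { $set: { "load.59.59": 40 } }
```

With the two-level path, reaching the last second means walking at most 59 minute keys and then 59 second keys, which is where the slide's 59+59 figure comes from, compared with 3599 keys in the flat layout.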
Characterizing Write Differences • Example: data generated every second • Capturing data per minute requires: • Document per event: 60 writes • Document per minute: 1 write, 59 updates • Transition from insert-driven to update-driven • Individual writes are smaller • Performance and concurrency benefits
Characterizing Read Differences • Example: data generated every second • Reading data for a single hour requires: • Document per event: 3600 reads • Document per minute: 60 reads • Read performance is greatly improved • Optimal with tuned block sizes and read ahead • Fewer disk seeks
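The read counts above follow from simple arithmetic, sketched here under the slide's assumption of one sample per second. The function name `documentsFor` is hypothetical.

```javascript
// Number of documents covering a time window when each document
// holds bucketSeconds worth of one-per-second samples.
function documentsFor(windowSeconds, bucketSeconds) {
  return Math.ceil(windowSeconds / bucketSeconds);
}

const HOUR = 3600;
const perEvent = documentsFor(HOUR, 1);    // document per event
const perMinute = documentsFor(HOUR, 60);  // document per minute
const perHour = documentsFor(HOUR, 3600);  // document per hour
```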
MMS Monitoring • MongoDB Management Service (MMS) Monitoring • Available in two flavors • Free cloud-hosted monitoring • On-premise with MongoDB Enterprise • Monitor single node, replica set, or sharded cluster deployments • Metric dashboards and custom alert triggers
MMS Application Requirements • Resolution defines granularity of stored data • Range controls the retention policy, e.g. after 24 hours only 5-minute resolution • Display dictates the stored pre-aggregations, e.g. total and count
Monitoring Schema Design { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), num_samples: 58, total_samples: 108000000, type: "memory_used", values: { 0: 999999, …, 59: 1800000 } } • Per-minute document model • Documents store individual metrics and counts • Supports "total" and "avg/sec" display
Monitoring Data Updates db.metrics.update( { timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"), type: "memory_used" }, { $set: { "values.59": 2000000 }, $inc: { num_samples: 1, total_samples: 2000000 } } ) • Single update required to add new data and increment associated counts
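A sketch of how the filter and update for such a call might be derived from a raw sample, assuming the slot index is the UTC second of the sample's timestamp. The helper name `sampleUpdate` is hypothetical; the field names follow the schema slide.

```javascript
// Build the filter and update for one metric sample.
// Hypothetical helper; assumes the slot key is the UTC second within the minute.
function sampleUpdate(type, sampleTime, value) {
  const minuteStart = new Date(Math.floor(sampleTime.getTime() / 60000) * 60000);
  const second = sampleTime.getUTCSeconds();
  return {
    filter: { timestamp_minute: minuteStart, type: type },
    update: {
      // Store the raw value in its per-second slot...
      $set: { ["values." + second]: value },
      // ...and keep the running count and total in the same operation.
      $inc: { num_samples: 1, total_samples: value },
    },
  };
}

const memOp = sampleUpdate("memory_used", new Date("2013-10-10T23:06:59Z"), 2000000);
// In the mongo shell: db.metrics.update(memOp.filter, memOp.update)
```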
Monitoring Data Management • Data stored at different granularity levels for read performance • Collections are organized into specific intervals • Retention is managed by simply dropping collections as they age out • Document structure is pre-created to maximize write performance
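Retention by dropping whole collections could look roughly like this. The deck does not give the actual interval or naming scheme, so the one-collection-per-UTC-day layout and the `metrics_YYYYMMDD` names below are assumptions for illustration, as are the helper names.

```javascript
// Hypothetical naming scheme: one collection per UTC day, e.g. "metrics_20131010".
function collectionFor(date) {
  return "metrics_" + date.toISOString().slice(0, 10).replace(/-/g, "");
}

// Return the collection names whose day is older than the retention window.
// YYYYMMDD suffixes sort lexicographically, so a string comparison is enough.
function expiredCollections(names, now, retentionDays) {
  const cutoff = collectionFor(new Date(now.getTime() - retentionDays * 86400000));
  return names.filter((name) => /^metrics_\d{8}$/.test(name) && name < cutoff);
}

// In the mongo shell, each expired name could then be dropped with:
// db.getCollection(name).drop()
```

Dropping a collection removes all of its documents at once, so no per-document deletes are needed as data ages out.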
What is Operational Intelligence • Storing log data • Capturing application and/or server generated events • Hierarchical aggregation • Rolling approach to generate rollups • e.g. hourly > daily > weekly > monthly • Pre-aggregated reports • Processing data to generate reporting from raw events
Storing Log Data 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)" { _id: ObjectId("4f442120eb03305789000000"), host: "127.0.0.1", user: "frank", time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", request: "GET /apache_pb.gif HTTP/1.0", status: 200, response_size: 2326, referrer: "http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)" }
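One way the log line above could be turned into that document, sketched in plain JavaScript. The regular expression assumes the Apache combined log format shown on the slide; the function names are hypothetical, and `_id` is left out on the assumption that it is generated on insert.

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// host ident user [time] "request" status size "referrer" "user agent"
const COMBINED_LOG =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-) "([^"]*)" "([^"]*)"$/;

// Convert "10/Oct/2000:13:55:36 -0700" to a Date in UTC.
function parseLogTime(text) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(text);
  if (!m) return null;
  // Treat the wall-clock fields as UTC first, then subtract the zone offset.
  const wallClockMs = Date.UTC(Number(m[3]), MONTHS[m[2]], Number(m[1]),
                               Number(m[4]), Number(m[5]), Number(m[6]));
  const sign = m[7] === "-" ? -1 : 1;
  const offsetMs = sign * (Number(m[8]) * 60 + Number(m[9])) * 60000;
  return new Date(wallClockMs - offsetMs);
}

// Parse one log line into a document shaped like the one on the slide.
function parseLogLine(line) {
  const m = COMBINED_LOG.exec(line);
  if (!m) return null;
  const request = m[5];
  return {
    host: m[1],
    user: m[3] === "-" ? null : m[3],
    time: parseLogTime(m[4]),
    path: request.split(" ")[1] || null,
    request: request,
    status: Number(m[6]),
    response_size: m[7] === "-" ? 0 : Number(m[7]),
    referrer: m[8],
    user_agent: m[9],
  };
}
```

Storing `time` as a date and `status` and `response_size` as numbers, rather than keeping the raw string, is what makes later range queries and aggregation straightforward.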
Pre-Aggregation • Analytics across raw events can involve many reads • Alternative schemas can improve read and write performance • Data can be organized into more coarse buckets • Transition from insert-driven to update-driven workloads
Pre-Aggregated Log Data { timestamp_minute: ISODate("2000-10-10T20:55:00Z"), resource: "/index.html", page_views: { 0: 50, …, 59: 250 } } • Leverage time-series style bucketing • Track individual metrics (e.g. page views) • Improve performance for reads/writes • Minimal processing overhead
Hierarchical Aggregation • Analytical approach as opposed to schema approach • Leverage built-in Aggregation Framework or MapReduce • Execute multiple tasks sequentially to aggregate at varying levels • Raw events > Hourly > Weekly > Monthly • Rolling approach distributes the aggregation workload
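One step of such a rollup, from per-minute documents to hourly totals, might look like the sketch below. It is written in plain JavaScript over in-memory documents so that it runs without a database; in practice the slide suggests the Aggregation Framework or MapReduce for this. The function names and the output field `timestamp_hour` are assumptions; the input shape follows the pre-aggregated log document shown earlier.

```javascript
// Sum the per-second counters in one pre-aggregated minute document.
function minuteTotal(doc) {
  return Object.values(doc.page_views).reduce((sum, n) => sum + n, 0);
}

// Roll minute documents up into one total per resource per hour.
function rollupHourly(minuteDocs) {
  const buckets = new Map();
  for (const doc of minuteDocs) {
    const hourMs = Math.floor(doc.timestamp_minute.getTime() / 3600000) * 3600000;
    const hour = new Date(hourMs);
    const key = doc.resource + "|" + hour.toISOString();
    const bucket = buckets.get(key) ||
      { resource: doc.resource, timestamp_hour: hour, page_views: 0 };
    bucket.page_views += minuteTotal(doc);
    buckets.set(key, bucket);
  }
  return Array.from(buckets.values());
}
```

The same function shape can be repeated at each level, feeding hourly output into the next rollup, so each stage reads only the already-reduced output of the stage before it.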
Before You Start • What are the application requirements? • Is pre-aggregation useful for your application? • What are your retention and age-out policies? • What are the gotchas? • Pre-create document structure to avoid fragmentation and performance problems • Organize your data for growth – time series data grows fast!
Down The Road • Scale-out considerations • Vertical vs. horizontal (with sharding) • Understanding the data • Aggregation • Analytics • Reporting • Deeper data analysis • Patterns • Predictions
Scaling Time Series Data in MongoDB • Vertical growth • Larger instances with more CPU and memory • Increased storage capacity • Horizontal growth • Partitioning data across many machines • Dividing and distributing the workload
Time Series Sharding Considerations • What are the application requirements? • Primarily collecting data • Primarily reporting data • Both • Map those back to • Write performance needs • Read/write query distribution • Collection organization (see MMS Monitoring) • Example: {metric name, coarse timestamp}
Aggregates, Analytics, Reporting • Aggregation Framework can be used for analysis • Does it work with the chosen schema design? • What sorts of aggregations are needed? • Reporting can be done on predictable, rolling basis • See “Hierarchical Aggregation” • Consider secondary reads for analytical operations • Minimize load on production primaries
Deeper Data Analysis • Leverage MongoDB-Hadoop connector • Bi-directional support for reading/writing • Works with online and offline data (e.g. backup files) • Compute using MapReduce • Patterns • Recommendations • Etc. • Explore data • Pig • Hive
Resources • Schema Design for Time Series Data in MongoDB: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb • Operational Intelligence Use Case: http://docs.mongodb.org/ecosystem/use-cases/#operational-intelligence • Data Modeling in MongoDB: http://docs.mongodb.org/manual/data-modeling/ • Schema Design (webinar): http://www.mongodb.com/events/webinar/schema-design-oct2013