1 / 24

Add Real-time Streaming SQL to Your MySQL Skill Set

Add Real-time Streaming SQL to Your MySQL Skill Set. Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services. The Data Crunch. Data volumes rising fast Human-originated data (e.g. e-commerce purchases) rising fast

imala
Download Presentation

Add Real-time Streaming SQL to Your MySQL Skill Set

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Add Real-time Streaming SQL to Your MySQL Skill Set Julian Hyde - Chief Architect Steve Herskovitz – Director of Professional Services

  2. The Data Crunch Data volumes rising fast Human-originated data (e.g. e-commerce purchases) rising fast Machine-generated data (e.g. e-commerce events and network packets) rising even faster Every business needs answers with lower latency Every significant problem is distributed: Geographically distributed organizations Multiple boxes for scale Exploit multiple cores

  3. Data management is hard If you make a mistake, the system won’t be fast enough Can’t afford to lose data New technologies are very difficult to use MapReduce NoSQL Multi-threaded programming in Java, C++, Erlang, Scala, … Collaborate, interoperate, evolve

  4. Today’s Computing Model Transaction Processing Application Business Transactions Database $$$$ Real-Time Application Polling Batch Load Process Infrastructure Application Log file Business Events 4

  5. Stream Computing Model Database $$ Traditional BI Application Business Transactions Transaction Processing Application Stream Processor Real-Time Application Real-Time Answers Infrastructure Application Business Events 5

  6. Case study: Mozilla

  7. Demo: Mozilla downloads

  8. SQL – life in the old dinosaur yet Widely spoken Rich Orthogonal Declarative Tune your system without changing your logical schema Apps don’t interfere with each other Adaptive Route around failure Exploit available resources Make tradeoffs to meet QoS goals

  9. Streaming SQL: example #1 Tweets about this conference: • SELECT STREAM ROWTIME, author, text FROM TweetsWHERE text LIKE ‘%#MySQL%'

  10. Demo: studio & simple query

  11. Streaming SQL basics • Streams: • CREATE STREAM Tweets ( author VARCHAR(20), text VARCHAR(140)); • Relational operators have streaming counterparts: • Project (SELECT) • Filter (WHERE) • Union • Join • Aggregation (GROUP BY) • Windowed aggregation (e.g. SUM(x) OVER window) • Sort (ORDER BY)

  12. Streaming SQL: example #2 • Each minute, return the number of clicks on each web page: • SELECT STREAM ROWTIME, uri, COUNT(*)FROM PageRequestsGROUP BY FLOOR(ROWTIME TO MINUTE), uri

  13. Streaming SQL: Time • ROWTIME pseudo-column • Provided by source application or generated by system • WINDOW • Present in regular SQL (e.g. SQL:2003) but more important in streaming SQL • Defines a ‘working set’ for streaming JOIN, GROUP BY, windowed aggregation • Monotonicity (“sortedness”) • Prerequisite for certain streaming operations

  14. Streaming SQL: example #3 Find all orders from New York that shipped within an hour: • CREATE VIEW compliant_orders ASSELECT STREAM *FROM orders OVER sla JOIN shipments ON orders.id = shipments.orderidWHERE city = 'New York'WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)

  15. Streaming SQL: other stuff • Schemas, views, tables • Ability to nest queries • User-defined functions and transforms • Adapters make external systems look like read/write streams

  16. Streaming SQL: example #4 Find all stock trades where the average price over the last ten trades is two standard deviations higher than the average over the last hour: • SELECT STREAM *FROM ( SELECT STREAM ticker, price, volume, AVG(price) OVER lastHour AS avgHr, STDDEV(price) OVER lastHour AS stddevHr, AVG(price) OVER lastTenTrades AS avg10 FROM Trades WINDOW lastTenTrades AS (PARTITION BY ticker ROWS 10 PRECEDING)), lastHour AS (PARTITION BY ticker RANGE INTERVAL ’1’ HOUR PRECEDING)WHERE avg10 > avgHr + 2 * stddevHr

  17. Streaming SQL for business intelligence Conventional BI: • Star schema: • Fact table • Dimension tables • Aggregate tables • Data warehouse populated using an ETL process • OLAP servers (e.g. Mondrian) provide a top-down view of data Challenge: • Keep all of these systems up to date in real time • Alert when key metrics are outside acceptable range

  18. ETL Process for OLAP OLAP OLAP cache flushed after load Conventional ETL Operationaldatabase Datawarehouse Aggregate tables populated from DW

  19. OLAP imperatives Highly aggregated data – e.g. one number computed from 10M rows Therefore: Use a cache Materialize results as aggregates Real-time imperatives View latest version of the data Maintaining N aggregates requires ~N blocks of I/O per incoming row Therefore: Don’t use a cache Don’t maintain aggregates Real-time OLAP: Challenges

  20. Real-time OLAP: Solutions • Notify cache when underlying data has changed • Populate cache from data warehouse • Continuous ETL process • Build aggregates in memory • Flush to disk intermittently • OLAP engine looks for aggregates in memory first

  21. Continuous ETL for Real-time OLAP OLAP OLAP cache flushed proactively SQLstream Operationaldatabase Datawarehouse Aggregate tables populated incrementally

  22. Summary • Data problems are getting harder • People are trying – and failing – to solve these problems with SQL databases • Stream computing is a powerful new kind of platform • Streaming SQL is pragmatic and powerful

  23. Any questions?

  24. Thank you for attending! Further reading: • “Data in Flight” by Julian Hyde (Communications of the ACM, Vol. 53 No. 1, Pages 48-52) Blog: http://julianhyde.blogspot.com/ Twitter: @julianhyde

More Related