1 / 30

“One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker

“One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker. Current DBMS Gold Standard. Store fields in one record contiguously on disk Use B-tree indexing Use small (e.g. 4K) disk blocks Align fields on byte or word boundaries

sancho
Download Presentation

“One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. “One Size Fits All”An Idea Whose Time Has Come and GonebyMichael Stonebraker

  2. Current DBMS Gold Standard • Store fields in one record contiguously on disk • Use B-tree indexing • Use small (e.g. 4K) disk blocks • Align fields on byte or word boundaries • Conventional (row-oriented) query optimizer and executor

  3. Terminology -- “Row Store” Record 1 Record 2 Record 3 Record 4 E.g. DB2, Oracle, Sybase, SQLServer, …

  4. Row Stores are Write Optimized • Can insert and delete a record in one physical write • Good for OLTP • But not for the data warehouse and other read-mostly markets

  5. The Elephants and Warehouses • Bitmap indexes • Star schema optimization • Materialized views • Compression (coding) or attributes • But there is a better idea……

  6. A Column Store (Like Sybase IQ)

  7. Among the Ideas…… • Only read the attributes you need • Coding is more effective • No alignment • Big data blocks Huge win on stuff like TPC-H!! Stream Processing is another example

  8. Example Application – Feed Alarms Custom-coded Feed alarm application Feed A alarms Feed B

  9. Characteristics of Feed Alarm Pilot • 500 rapidly updating tickers (5 sec. interval) + 4000 slowly updating tickers (60 sec. interval) in each FEED. • Problem Types • Low-level alarm  Ticker not seen within update interval. • Problem in Feed  More than 100 low-alarms from Feed A or Feed B • Problem in Exchange  More than 100 low-level alarms from NASDAQ or NYSE • Suppression: • When problems of type 2 or 3 detected, do not emit (distracting) problems of type 1.

  10. Results • StreamBase implementation: • ~ 150K msgs/sec on a 3.2GHz Linux pentium • Elephant solution • ~900 msgs/sec on the same hardware More than 2 orders of magnitude difference……

  11. Why? • Inbound vs outbound processing • The right primitives • Integration of application logic

  12. Traditional ModelOutbound Processing Processing And queries Data Updates Storage

  13. Stream Processing ModelInbound Processing Application Data Storage

  14. 500 fast Alarm D=5 sec Filter Alarm=1 Map U Hi = Prob in Reuters Reuters Count 100 Filter hi/lo Lo = Problem in Security In Reuters 4000 slow Alarm D=60 sec Filter Alarm=1 Hi = Prob in NY NY Filter ex =NY U Count 100 Filter hi=1 NA Hi = Prob in NA Filter ex = NY NY U Count 100 Filter hi=1 NA 500 fast Alarm D=5 sec Filter Alarm=1 Map U Hi = Prob in Comstock ComStock Count 100 Filter hi/lo Lo = Prob in Security in Comstock 4000 slow Alarm D=60 sec Filter Alarm=1 Alarm Correlation Application

  15. Inbound Processing • Never store the data! • Lower overhead • Lower latency

  16. Inbound Processing in DBMSs • Triggers (glue-on) • Limited support • Often slow In theory, a DBMS could be both inbound and outbound, but this is a research project…. Hooking a query plan up to a stream is a start…..

  17. Windowed Time Series Operators • Windowed time series operators • Group by stock_id • Window is 2 ticks • Slide by 1 tick • Resilient to stream imperfections • User-specified timeouts for late data

  18. 500 fast Alarm D=5 sec Filter Alarm=1 Map U Hi = Prob in Reuters Reuters Count 100 Filter hi/lo Lo = Problem in Security In Reuters 4000 slow Alarm D=60 sec Filter Alarm=1 Hi = Prob in NY NY Filter ex =NY U Count 100 Filter hi=1 NA Hi = Prob in NA Filter ex = NY NY U Count 100 Filter hi=1 NA 500 fast Alarm D=5 sec Filter Alarm=1 Map U Hi = Prob in Comstock ComStock Count 100 Filter hi/lo Lo = Prob in Security in Comstock 4000 slow Alarm D=60 sec Filter Alarm=1 Alarm Correlation Application

  19. Windowed Aggregates with Timeout in DBMSs • In the trigger system? • On stored data (polling)?

  20. Integration of Application Logic • All required capabilities in single system • No process switches • Integrated storage (not client-server)

  21. Integrated Code • Map • F.evaluate: • cnt++ • if (cnt % 100 != 0) if !suppress emit lo-alarm • else emit drop-alarm • else emit hi-alarm, set suppress = true Count100 same as • Lets first 100 low-alarms through. • Emits one high-alarm for every 100 low-alarms. • Suppresses low-alarms after 1sthigh-alarm.

  22. Application Integration in DBMSs • Client-server present for protection • Stored procedures are a start • tough to do control flow • Object-relational blades are better • But still tough to do control flow • Unified programming language never made it • E.g. Rigel or Pascal R • No support for embedded DBMS applications

  23. Transactions in Streams • Locking • Critical sections are enough; no need for xacts • Crash recovery • Log-based recovery slow • doesn’t recover whole state • System unavailable during recovery • Much better to just do HA • Failover to a backup (Tandem-style) • Forget about state recovery

  24. Net-Net • Inbound vs outbound processing • Windowed primitives vs end-of-table primitives • Separate app vs embedded app • HA failover vs transactions

  25. Whenever These Matter a Lot • Separate engine • To get 2 orders of magnitude benefit

  26. Candidates for a Separate Engine • OLTP • Warehouses • Stream processing • Sensor networks (TinyDB, etc.) • Text retrieval (Google, etc.) • Scientific data bases (lineage, arrays, etc.)

  27. Obvious Research Template • Pick an area where “one size doesn’t fit” • And figure out what does

  28. More Generally • Current system software factored into • App server (e.g. Websphere) • Messaging system (e.g, MQSeries) • DBMS (e.g. DB2) • Stream processing engines integrate pieces of all three • To avoid process switches • How many other interesting factorings are there?

  29. High Level Stream Processing Bit • StreamSQL? • Rule engine?

  30. Interesting Stream Issues • Morph from history to real time seamlessly • Replay • On the fly • Stream imperfections • Late • Missing • Out-of-order • Causality • Tick (symbol, volume, price, time) • Splits (symbol, factor) Produce the split-adjusted price!

More Related