Intel “Big Data” Science and Technology Center Michael Stonebraker

Presentation Transcript


  1. Intel “Big Data” Science and Technology Center Michael Stonebraker

  2. Context • Intel held a national “beauty contest” to locate their next S & T center • MIT won, with a “Big Data” proposal • 160 proposals • $2.5M per year for 3-5 years plus 5 Intel scientists • 20 PIs, half at MIT

  3. Big Data Means What? • Volume too large • Stupid analytics (i.e. SQL) • solved by commercial data warehouse products • Smart analytics (predictive modelling, machine learning, …) • Velocity too big • Drink from a firehose • Variety too large • Data integration problem • And what does this mean to computer architecture?

  4. Big Data Means What? • Volume too large – smart analytics • Array data bases • Parallel algorithms • Integration of linear algebra • Scalable visualization • Velocity too big • Main memory DBs • And what does this mean to computer architecture? • Many core • Son-of-flash • Xeon Phi

  5. Array Data Bases • Elasticity in SciDB • Query optimizer for SciDB • Genomics benchmark • Run on SciDB, SciDB + Phi, column stores, row stores, MADlib, Hadoop • Graphs as sparse arrays • EarthDB
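The "graphs as sparse arrays" bullet on slide 5 can be illustrated with a minimal sketch. It is not SciDB code. It assumes a plain dictionary keyed by (source, destination) as the sparse 2-D array, and every function name here is hypothetical. One breadth-first step then becomes a sparse matrix-vector product over the boolean semiring.

```python
# Hypothetical illustration only; none of these names come from SciDB.

def edges_to_sparse(edges):
    """Store a directed graph as a sparse 2-D array: {(src, dst): 1}."""
    return {(src, dst): 1 for src, dst in edges}


def frontier_step(adj, frontier):
    """One BFS step: every dst whose src lies in the current frontier."""
    return {dst for (src, dst) in adj if src in frontier}


def reachable(adj, start):
    """Repeat frontier steps until no new vertex is discovered."""
    seen = {start}
    frontier = {start}
    while frontier:
        frontier = frontier_step(adj, frontier) - seen
        seen |= frontier
    return seen


adj = edges_to_sparse([(0, 1), (1, 2), (2, 0), (3, 4)])
print(sorted(reachable(adj, 0)))
```

Only the non-empty cells are stored, so storage grows with the number of edges rather than with the square of the number of vertices.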

  6. Scalable Algo • Parallelizing locality sensitive hashing • Other algo people are going to work in other areas • Pick your favorite algo, parallelize and make scale • Scalable Julia

  7. Integration of Linear Algebra • Hardly anybody can beat BLAS/LAPACK/ScaLAPACK • 10 ** 5 difference between Python and Intel-optimized C++ • If you write operation X, chances are you will lose to Jack Dongarra by an order of magnitude • Don’t fight the wizard

  8. Integration of Linear Algebra • DBMS + ScaLAPACK • Federation required • Resource manager required • Recoverable ScaLAPACK required • Someday • A common storage format • Would make ACID much easier, …

  9. Visualization • Resolution reduction • Using “explain” • Choose the rendering automatically • Decision tree • Smart prefetch • Integrate with SciDB backend and Stanford visualizer front end
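Slide 9 does not say how resolution reduction is performed. One common reading, assumed here, is block averaging: the backend returns one averaged value per tile instead of every cell, so the front end never receives more points than it can draw. The function name and the averaging rule are illustrative assumptions, not the project's method.

```python
def reduce_resolution(grid, factor):
    """Downsample a 2-D list by averaging non-overlapping factor x factor blocks.

    Blocks on the right and bottom edges may be smaller when the grid size
    is not a multiple of `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(reduce_resolution(grid, 2))  # [[3.5, 5.5], [11.5, 13.5]]
```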

  10. High Velocity • Big pattern – little state • Find me a “banana” followed within 10 msec by a strawberry • Historically CEP • Big state – little pattern • Assemble my global real-time risk • Main memory DBMS
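The "big pattern, little state" case on slide 10 can be sketched as a single pass over a time-ordered stream. This is a minimal illustration and not how any particular CEP engine or H-Store implements it. The event format, a (timestamp in milliseconds, name) tuple, and the function name are assumptions.

```python
from collections import deque


def match_followed_by(events, first, second, window_ms):
    """Yield (t_first, t_second) where `second` follows `first` within window_ms.

    `events` is an iterable of (timestamp_ms, name) pairs in timestamp order.
    """
    pending = deque()  # timestamps of unexpired `first` events: the "little state"
    for ts, name in events:
        # Drop `first` events that are now too old to be matched.
        while pending and ts - pending[0] > window_ms:
            pending.popleft()
        if name == first:
            pending.append(ts)
        elif name == second:
            for t0 in pending:
                yield (t0, ts)


events = [
    (0, "banana"),
    (4, "apple"),
    (8, "strawberry"),
    (20, "banana"),
    (35, "strawberry"),
]
print(list(match_followed_by(events, "banana", "strawberry", 10)))  # [(0, 8)]
```

The only state kept is the queue of recent "banana" timestamps, which is why the slide describes this kind of workload as carrying little state. A matched "banana" is not consumed in this sketch, so it can pair with a later "strawberry" inside the same window. That is a design choice, not something the slide specifies.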

  11. High Velocity • Lots of commonality between CEP and MM DBMS • We are adding queues/windows to H-Store • It’s clear we will do ACID – CEP as fast as CEP • I predict the death of CEP

  12. High Velocity – Other Predictions • Death of ARIES • Command logging much faster than data logging • Death of disk-oriented OLTP data bases • H-Store with anti-caching is wildly faster than MySQL with or without memcached • Trying an emulator for “son of flash” • Will make MM DBMSs even more attractive
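The difference between command logging and data logging on slide 12 can be shown with a toy in-memory store. The sketch below is not H-Store's log format. The procedure registry, the function names, and the record layouts are all hypothetical. It shows only that a command log holds one small record per transaction, while a data log holds one record per changed item. It does not measure speed.

```python
# Hypothetical toy store; record layouts are assumptions for illustration.
PROCEDURES = {}


def procedure(fn):
    """Register a stored procedure under its function name."""
    PROCEDURES[fn.__name__] = fn
    return fn


@procedure
def transfer(db, src, dst, amount):
    db[src] -= amount
    db[dst] += amount


def run_with_command_log(db, log, name, *args):
    """Command logging: record only the procedure name and its arguments."""
    log.append((name, args))
    PROCEDURES[name](db, *args)


def run_with_data_log(db, log, name, *args):
    """Data logging: record a before/after image for every changed key."""
    before = dict(db)
    PROCEDURES[name](db, *args)
    for key in db:
        if before.get(key) != db[key]:
            log.append((key, before.get(key), db[key]))


def recover_from_command_log(snapshot, log):
    """Replay commands in order; correct only if procedures are deterministic."""
    db = dict(snapshot)
    for name, args in log:
        PROCEDURES[name](db, *args)
    return db


snapshot = {"a": 100, "b": 0}

db1, command_log = dict(snapshot), []
run_with_command_log(db1, command_log, "transfer", "a", "b", 30)
run_with_command_log(db1, command_log, "transfer", "a", "b", 20)

db2, data_log = dict(snapshot), []
run_with_data_log(db2, data_log, "transfer", "a", "b", 30)
run_with_data_log(db2, data_log, "transfer", "a", "b", 20)

print(len(command_log), len(data_log))
print(recover_from_command_log(snapshot, command_log) == db1)
```

Recovery from a command log re-executes the transactions, so it trades longer recovery for less work at run time.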

  13. Many Core • 1000 cores will give major heartburn to all system software • Traditional DBMSs will collapse • DBMSs cannot have shared data structures • H-Store approach • Move the computation • Hardware-supported “move” • New concurrency control algorithms (revival of DORA?)
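The "no shared data structures" and "move the computation" bullets on slide 13 can be sketched as a store split into partitions, each owning a private slice of the data. This is a single-threaded illustration and not H-Store code. The class names and the modulo routing rule are assumptions. In a real system each partition would presumably run on its own core.

```python
class Partition:
    """Owns a private slice of the data; nothing else reads or writes it."""

    def __init__(self):
        self.rows = {}

    def execute(self, op, key, value=None):
        if op == "put":
            self.rows[key] = value
            return None
        return self.rows.get(key)


class PartitionedStore:
    def __init__(self, n_partitions):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def route(self, key):
        # Hypothetical routing rule: modulo on an integer key.
        return key % len(self.partitions)

    def submit(self, op, key, value=None):
        # "Move the computation": ship the operation to the owning partition
        # instead of locking a structure shared by all workers.
        return self.partitions[self.route(key)].execute(op, key, value)


store = PartitionedStore(4)
store.submit("put", 10, "x")
print(store.route(10), store.submit("get", 10))  # 2 x
```

Because each key has exactly one owner, single-partition operations need no latches. Operations that span partitions are the hard case, and the sketch does not address them.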