Intel “Big Data” Science and Technology Center Michael Stonebraker

Context • Intel held a national “beauty contest” to locate their next S & T center • MIT won, with a “Big Data” proposal • 160 proposals • $2.5M per year for 3-5 years plus 5 Intel scientists • 20 PIs, half at MIT

Big Data Means What? • Volume too large • Stupid analytics (e.g., SQL) • Solved by commercial data warehouse products • Smart analytics (predictive modelling, machine learning, …) • Velocity too big • Drink from a firehose • Variety too large • Data integration problem • And what does this mean for computer architecture?

Big Data Means What? • Volume too large – smart analytics • Array databases • Parallel algorithms • Integration of linear algebra • Scalable visualization • Velocity too big • Main memory DBMSs • And what does this mean for computer architecture? • Many core • Son-of-flash • Xeon Phi

Array Databases • Elasticity in SciDB • Query optimizer for SciDB • Genomics benchmark • Run on SciDB, SciDB + Phi, column stores, row stores, MADlib, Hadoop • Graphs as sparse arrays • EarthDB
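The "graphs as sparse arrays" bullet can be illustrated with a minimal sketch. It assumes a directed graph stored as a sparse two-dimensional array that keeps only the non-empty (source, destination) cells, and treats one breadth-first step as a sparse matrix-vector product. The function names are hypothetical and this is not SciDB's API.

```python
def to_sparse(edges):
    """Store a directed graph as a sparse adjacency array.

    Only non-empty cells are kept, keyed by (source, destination)."""
    return {(src, dst): 1 for src, dst in edges}


def step(cells, frontier):
    """One breadth-first step: a sparse matrix-vector product over booleans.

    A destination is reached if any cell links it to a frontier vertex."""
    return {dst for (src, dst), value in cells.items() if value and src in frontier}


def reachable(cells, start):
    """Repeat the product until no new vertices appear."""
    seen = {start}
    frontier = {start}
    while frontier:
        frontier = step(cells, frontier) - seen
        seen |= frontier
    return seen


edges = [(0, 1), (1, 2), (2, 0), (3, 4)]
adjacency = to_sparse(edges)
print(sorted(reachable(adjacency, 0)))  # prints [0, 1, 2]
```

A real array DBMS would chunk the cells and run the product in parallel; the dictionary here only stands in for that storage.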

Scalable Algorithms • Parallelizing locality-sensitive hashing • Other algorithms people are going to work in other areas • Pick your favorite algorithm, parallelize it, and make it scale • Scalable Julia
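As a rough illustration of why locality-sensitive hashing parallelizes well, the sketch below uses random-hyperplane signatures (a swapped-in, textbook LSH variant, not necessarily the one the center studied). Each shard is hashed independently and the buckets are merged at the end. All names are hypothetical, and the thread pool only stands in for real parallel workers, since CPython threads do not run this arithmetic concurrently.

```python
import random
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def hash_shard(shard, hyperplanes):
    """Hash one shard on its own; shards share no mutable state."""
    buckets = defaultdict(list)
    for key, vector in shard:
        buckets[signature(vector, hyperplanes)].append(key)
    return buckets


def parallel_lsh(items, hyperplanes, num_workers=4):
    """Split the items into shards, hash each shard, then merge the buckets."""
    shards = [items[i::num_workers] for i in range(num_workers)]
    merged = defaultdict(list)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        for buckets in pool.map(lambda s: hash_shard(s, hyperplanes), shards):
            for sig, keys in buckets.items():
                merged[sig].extend(keys)
    return merged


planes = make_hyperplanes(num_bits=8, dim=3)
items = [
    ("a", [1.0, 2.0, 3.0]),
    ("b", [2.0, 4.0, 6.0]),     # same direction as "a"
    ("c", [-1.0, -2.0, -3.0]),  # opposite direction
]
table = parallel_lsh(items, planes)
```

Vectors pointing the same way ("a" and "b") land in the same bucket, while the opposite vector "c" does not. The only coordination needed is the final merge.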

Integration of Linear Algebra • Hardly anybody can beat BLAS/LAPACK/ScaLAPACK • 10^5 performance difference between Python and Intel-optimized C++ • If you write operation X, chances are you will lose to Jack Dongarra by an order of magnitude • Don’t fight the wizard

Integration of Linear Algebra • DBMS + ScaLAPACK • Federation required • Resource manager required • Recoverable ScaLAPACK required • Someday • A common storage format • Would make ACID much easier, …

Visualization • Resolution reduction • Using “explain” • Choose the rendering automatically • Decision tree • Smart prefetch • Integrate with SciDB backend and Stanford visualizer front end

High Velocity • Big pattern – little state • Find me a “banana” followed within 10 msec by a “strawberry” • Historically the domain of CEP (complex event processing) • Big state – little pattern • Assemble my global real-time risk • Main memory DBMS
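The "banana followed within 10 msec by a strawberry" pattern can be sketched as follows. This is a minimal illustration, assuming a timestamp-ordered stream of (timestamp_ms, symbol) pairs and a rule that each "banana" pairs with the first "strawberry" that follows it. The function name and event format are hypothetical, not taken from any CEP engine.

```python
from collections import deque


def match_followed_by(events, first, second, within_ms):
    """Yield (t_first, t_second) for each `first` followed by `second`
    at most `within_ms` milliseconds later.

    `events` is an iterable of (timestamp_ms, symbol) in timestamp order."""
    pending = deque()  # timestamps of `first` events still inside the window
    for ts, symbol in events:
        # Expire `first` events that are too old to match anything.
        while pending and ts - pending[0] > within_ms:
            pending.popleft()
        if symbol == second:
            for t_first in pending:
                yield (t_first, ts)
            pending.clear()  # each `first` is matched at most once
        elif symbol == first:
            pending.append(ts)


stream = [
    (0, "banana"), (4, "apple"), (9, "strawberry"),
    (20, "banana"), (35, "strawberry"),
    (40, "banana"), (50, "strawberry"),
]
matches = list(match_followed_by(stream, "banana", "strawberry", within_ms=10))
print(matches)  # prints [(0, 9), (40, 50)]
```

The only state kept is the short queue of unexpired "banana" timestamps, which is the "little state" half of the slide's first case.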

High Velocity • Lots of commonality between CEP and main memory DBMSs • We are adding queues/windows to H-Store • It’s clear we will do ACID CEP as fast as conventional CEP • I predict the death of CEP

High Velocity – Other Predictions • Death of ARIES • Command logging much faster than data logging • Death of disk-oriented OLTP databases • H-Store with anti-caching is wildly faster than MySQL with or without memcached • Trying an emulator for “son of flash” • Will make main memory DBMSs even more attractive
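The difference between command logging and data logging can be sketched in a few lines. This is a toy model, assuming deterministic stored procedures and recovery from an empty database: the command log records one entry per transaction (procedure name and arguments), while the data log records an after-image for every modified record. The procedures and helper names are hypothetical, and real ARIES-style logging is considerably more involved.

```python
import json


# Hypothetical deterministic stored procedures.
def deposit(db, account, amount):
    db[account] = db.get(account, 0) + amount


def transfer(db, src, dst, amount):
    db[src] = db.get(src, 0) - amount
    db[dst] = db.get(dst, 0) + amount


PROCEDURES = {"deposit": deposit, "transfer": transfer}


def execute(db, command_log, data_log, name, *args):
    """Run one transaction and log it both ways for comparison."""
    command_log.append(json.dumps([name, *args]))  # one record per transaction
    before = dict(db)
    PROCEDURES[name](db, *args)
    for key, value in db.items():  # one record per modified row
        if before.get(key) != value:
            data_log.append(json.dumps([key, value]))


def recover_from_commands(command_log):
    """Re-execute every logged transaction in order."""
    db = {}
    for record in command_log:
        name, *args = json.loads(record)
        PROCEDURES[name](db, *args)
    return db


def recover_from_data(data_log):
    """Re-apply every logged after-image in order."""
    db = {}
    for record in data_log:
        key, value = json.loads(record)
        db[key] = value
    return db


db, command_log, data_log = {}, [], []
execute(db, command_log, data_log, "deposit", "alice", 100)
execute(db, command_log, data_log, "transfer", "alice", "bob", 30)
print(len(command_log), len(data_log))  # prints 2 3
```

Both logs rebuild the same state, but the command log writes less at run time and pays for it by re-executing transactions during recovery.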

Many Core • 1000 cores will give major heartburn to all system software • Traditional DBMSs will collapse • DBMSs cannot have shared data structures • H-Store approach • Move the computation • Hardware-supported “move” • New concurrency control algorithms (revival of DORA?)
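The "no shared data structures, move the computation" idea can be sketched with one single-threaded worker per partition. This is a simplified illustration, assuming hash partitioning by key and one message queue per partition. The class and function names are hypothetical and this is not H-Store's actual code.

```python
import queue
import threading


class Partition(threading.Thread):
    """Owns one slice of the data; only this thread ever touches `self.data`."""

    def __init__(self):
        super().__init__(daemon=True)
        self.data = {}
        self.inbox = queue.Queue()

    def run(self):
        while True:
            task = self.inbox.get()
            if task is None:  # shutdown signal
                break
            func, args, reply = task
            reply.put(func(self.data, *args))


class Router:
    """Ships each function to the partition that owns the key."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]
        for partition in self.partitions:
            partition.start()

    def submit(self, key, func, *args):
        reply = queue.Queue(maxsize=1)
        owner = self.partitions[hash(key) % len(self.partitions)]
        owner.inbox.put((func, (key, *args), reply))
        return reply.get()  # wait for the owning partition to answer

    def shutdown(self):
        for partition in self.partitions:
            partition.inbox.put(None)
        for partition in self.partitions:
            partition.join()


def increment(data, key, amount):
    """Runs inside the owning partition, so it needs no locks or latches."""
    data[key] = data.get(key, 0) + amount
    return data[key]


router = Router(num_partitions=4)
for _ in range(100):
    router.submit("counter", increment, 1)
total = router.submit("counter", increment, 0)
router.shutdown()
print(total)  # prints 100
```

Because every key has exactly one owner, the data itself needs no latching; the cost moves into message passing, which is where a hardware-supported "move" would help.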
