Optimization with Big Data

Optimization with Big Data = Extreme* Mountain Climbing
* in a billion dimensional space on a foggy day
Peter Richtarik, School of Mathematics

BIG DATA: BIG Volume, BIG Velocity, BIG Variety
Sources:
• digital images & videos
• transaction records
• government records
• health records
• defence
• internet activity (social media, wikipedia, ...)
• scientific measurements (physics, climate models, ...)

• Arup (Truss Topology Design)
• Western General Hospital (Creutzfeldt-Jakob Disease)
• Ministry of Defence dstl lab (Algorithms for Data Simplicity)
• Royal Observatory (Optimal Planet Growth)

GOD’S Algorithm = Teleportation

If you are not a God...
[Figure: successive iterates x0, x1, x2, x3]

Optimization as Lock Breaking
Setup: a combination x = (x1, x2, x3, x4) has a “quality” given by a number F(x) = F(x1, x2, x3, x4). The combination maximizing F opens the lock.
Optimization Problem: Find the combination maximizing F.
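The lock-breaking setup can be made concrete with a tiny brute-force search. This is only an illustrative sketch: the quality function F, the SECRET combination, and the choice of 10 positions per dial are assumptions made up for the example, not part of the talk.

```python
from itertools import product

# Hypothetical winning combination, used only to define a toy F.
SECRET = (3, 1, 4, 1)

def F(x):
    """Toy quality score (assumed): highest, at 0, when x equals SECRET."""
    return -sum((xi - si) ** 2 for xi, si in zip(x, SECRET))

def brute_force(num_dials=4, positions=10):
    """Evaluate F on every combination and return a maximizer."""
    return max(product(range(positions), repeat=num_dials), key=F)

best = brute_force()
print(best, F(best))
```

With 4 dials this means 10^4 evaluations of F; with n dials it is 10^n, which is why exhaustive search is hopeless once the number of dials grows.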

Optimization Algorithm

How to Open a Lock with a Billion Interconnected Dials?
# variables/dials = n = 10^9
F : R^n → R
Assumption: F = F1 + F2 + ... + Fn, where Fj depends on the neighbours of xj only.
Example: F1 depends on x1, x2, x3 and x4; F2 depends on x1 and x2; ...
[Figure: interconnected dials x1, x2, x3, x4, ..., xn]
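The assumption that F is a sum of terms, each depending on a few dials, is what makes single-dial moves cheap: only the terms that involve the moved dial need to be re-evaluated. A minimal sketch, assuming a hypothetical chain structure in which term j looks at dials j-1, j and j+1 (the talk's own neighbourhoods differ), and a made-up local term F_j:

```python
def neighbours(j, n):
    """Assumed chain structure: term j depends on dials j-1, j, j+1."""
    return [i for i in (j - 1, j, j + 1) if 0 <= i < n]

def F_j(j, x):
    """Toy local term (assumed): penalizes disagreement with neighbours."""
    return -sum((x[j] - x[i]) ** 2 for i in neighbours(j, len(x)))

def F(x):
    """Full objective F = F_1 + ... + F_n; costs O(n) to evaluate."""
    return sum(F_j(j, x) for j in range(len(x)))

def delta_F(x, k, new_value):
    """Change in F when dial k moves to new_value.

    Only terms whose neighbourhood contains k are re-evaluated; for the
    symmetric chain above these are exactly the terms in neighbours(k, n).
    """
    affected = neighbours(k, len(x))
    before = sum(F_j(j, x) for j in affected)
    old = x[k]
    x[k] = new_value
    after = sum(F_j(j, x) for j in affected)
    x[k] = old  # restore the dial; this function only measures the change
    return after - before

x = [0, 1, 2, 3, 4, 5]
change = delta_F(x, 2, 7)
```

Here delta_F touches at most three terms regardless of n, whereas recomputing F from scratch would touch all n of them.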

Optimization Methods: Effectivity, Efficiency, Scalability, Parallelism, Distribution, Asynchronicity, Randomization
Computing Architectures:
• Multicore CPUs
• GPGPU accelerators
• Clusters / Clouds

Optimization Methods for Big Data
• Randomized Coordinate Descent
  P. R. and M. Takac: Parallel coordinate descent methods for big data optimization, ArXiv:1212.0873 [can solve a problem with 1 billion variables in 2 hours using 24 processors]
• Stochastic (Sub) Gradient Descent
  P. R. and M. Takac: Randomized lock-free methods for minimizing partially separable convex functions [can be applied to optimize an unknown function]
• Both of the above
  M. Takac, A. Bijral, P. R. and N. Srebro: Mini-batch primal and dual methods for SVMs, ArXiv:1302.xxxx
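To give a feel for randomized coordinate descent, here is a minimal serial sketch on a small convex quadratic f(x) = 0.5 x'Ax - b'x. It is not the parallel method from the cited paper; the matrix A, vector b, iteration count and seed are assumptions chosen for illustration, and the problem is stated as minimization, as in the cited work.

```python
import random

def randomized_coordinate_descent(A, b, iters=2000, seed=0):
    """Minimize 0.5*x'Ax - b'x for symmetric positive definite A.

    Each step picks one coordinate uniformly at random and minimizes f
    exactly along that coordinate, leaving all others fixed.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        i = rng.randrange(n)
        # Partial derivative of f with respect to x[i]: (Ax - b)[i].
        grad_i = sum(A[i][j] * x[j] for j in range(n)) - b[i]
        # Exact minimization along coordinate i.
        x[i] -= grad_i / A[i][i]
    return x

# Small assumed test problem (A is symmetric positive definite).
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x = randomized_coordinate_descent(A, b)
# At the minimizer the gradient Ax - b vanishes, so the residual is near 0.
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i])
               for i in range(3))
print(x, residual)
```

Each step reads a single row of A, so its cost depends on the number of nonzeros in that row rather than on n, which is what makes the approach attractive when n is very large and the problem is sparse.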

Theory vs Reality

Parallel Coordinate Descent
[Figure labels: start, settle for this, holy grail]

TOOLS: Probability, HPC, Matrix Theory, Machine Learning
