Learn how Python is becoming the glue that binds data science, how rapid integration empowers data scientists to combine new technologies, and the two primary goals in store for Anaconda.
GPU Computing with Python and Anaconda: The Next Frontier
Accelerate. Connect. Empower.
Stan Seibert, Director of Community Innovation
© 2017 Anaconda, Inc. - Confidential & Proprietary

GPUs & Python: A Great Combination
• Python is becoming the glue that binds data science
• Rapid integration empowers data scientists to combine new technologies
• This is our goal for Anaconda:
  • Free distribution of Python and R for Win/Mac/Linux
  • Includes GPU-accelerated packages: Caffe, TensorFlow, PyTorch, Theano, Numba, Pyculib...

Deep Learning: An Early Success
• Powerful machine learning technique
• Many great open source options
• Every major package has a Python interface
• Very compute intensive ➡ Perfect for GPU acceleration
[Figure: diagram labeled with ReLU units]

Numba: JIT Python Compilation
• Compile numerical Python functions for CPU or GPU
• Based on the LLVM compiler library
• Great for rapid, custom algorithm development

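As a minimal sketch of what "compile numerical Python functions" can look like, the snippet below decorates a plain Python function with Numba's `njit`. The `mandel` function and its arguments are illustrative and do not come from the talk. If Numba is not installed, the sketch falls back to running the same function as ordinary Python, so the results are identical and only the speed differs. Targeting the GPU uses a different decorator (`numba.cuda.jit`) and a kernel-style function, which is not shown here.

```python
# Use Numba's JIT if it is available; otherwise fall back to a no-op decorator
# so the example still runs (assumption: this fallback is for illustration only).
try:
    from numba import njit
except ImportError:
    def njit(func):
        return func


@njit
def mandel(x, y, max_iters):
    """Return how many iterations the point (x, y) takes to escape the Mandelbrot set."""
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z * z + c
        # Escape test on |z|^2 avoids a square root in the inner loop.
        if (z.real * z.real + z.imag * z.imag) >= 4.0:
            return i
    return max_iters


inside = mandel(0.0, 0.0, 50)   # the origin never escapes
outside = mandel(2.0, 2.0, 50)  # this point escapes on the first iteration
```

With Numba present, the first call triggers compilation for the given argument types, and later calls with the same types reuse the compiled machine code.
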
Problem: An Ecosystem of Silos?
[Diagram: ETL/Data Prep, Machine Learning, Database, and Visualization each hold their data on the GPU, with "CPU transfer" links between them]
Why do GPU applications share data through slow CPU memory?

GPU Open Analytics Initiative
Goal: Standardize data exchange between GPU analytics applications
Current Members: MapD, Anaconda, H2O.ai, BlazingDB, Graphistry, Gunrock
http://gpuopenanalytics.com/

Streamlining the Data Science Pipeline
[Diagram labels: GPU Database, Python Data Transformation, Generalized Linear Model; exchange formats: Packed Array, GDF, Apache Arrow]
All data stays on the GPU

GPU Dataframe (GDF)
• A format for tabular data in GPU memory
• Exchange GDF between different libraries
• Move between processes using CUDA IPC
• Based on Apache Arrow
• Code in separate library
• Work in progress to move functionality into Arrow project

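To make "based on Apache Arrow" more concrete, here is a rough CPU-only sketch of the columnar idea Arrow uses: each column is one contiguous buffer of fixed-width values plus a validity bitmap marking which entries are non-null (bit set means valid, least-significant bit first). The helper names `build_column`, `is_valid`, and `column_sum` are hypothetical and are not part of GDF or Arrow; the real format has more pieces (offsets, padding, metadata) and lives in GPU memory.

```python
from array import array


def build_column(values):
    """Pack ints (None = missing) into a contiguous values buffer and a validity bitmap."""
    # Missing entries still occupy a slot in the values buffer; the bitmap marks them invalid.
    data = array("q", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return data, bitmap


def is_valid(bitmap, i):
    """Return True if entry i is non-null according to the bitmap."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


def column_sum(data, bitmap):
    """Sum only the valid entries of a column."""
    return sum(data[i] for i in range(len(data)) if is_valid(bitmap, i))


data, bitmap = build_column([10, None, 30, 40])
total = column_sum(data, bitmap)
```

Because the column is a flat buffer with a known layout, a library that understands the layout can read it in place, which is the property that lets GDF be exchanged between libraries without copying.
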
PyGDF: Python GPU Dataframes
• A Python library for manipulating GPU Dataframes:
  • Create from NumPy arrays and Pandas Dataframes
  • Exchange between processes
  • Math operations
  • Sort, Filter, Join, Group By
• Ideal for data manipulation and feature engineering stages between data source and machine learning
• Not intended to replace dedicated database applications
• Interoperates with our Python compiler for GPU: Numba

PyGDF: Group By Performance
• GPU speedups become very large above 10 million elements
• Aggregation functions are extremely efficient on the GPU

Dask: Distributed Computing
• Scalable execution of task graphs from single computers to 1000+ node clusters
• Scheduler is "resource aware" and can direct GPU tasks to nodes with appropriate hardware. Great for heterogeneous clusters!

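The "resource aware" idea can be sketched in a few lines of plain Python. This is not Dask's API or its scheduling algorithm: `assign_tasks`, the worker names, and the task names are all hypothetical, and the placement is a simple greedy pass that never releases resources. It only illustrates how declaring resources lets a scheduler keep GPU tasks off nodes without GPUs.

```python
def assign_tasks(tasks, workers):
    """Greedily place each task on the first worker with enough free resources.

    tasks:   {task_name: {resource_name: amount_needed}}
    workers: {worker_name: {resource_name: amount_available}}
    Returns {task_name: worker_name or None if no worker can run it}.
    """
    # Copy the worker resources so the caller's dictionaries are not modified.
    free = {name: dict(res) for name, res in workers.items()}
    placement = {}
    for task, needs in tasks.items():
        placement[task] = None
        for name, avail in free.items():
            if all(avail.get(r, 0) >= n for r, n in needs.items()):
                for r, n in needs.items():
                    avail[r] -= n  # reserve the resources for this task
                placement[task] = name
                break
    return placement


workers = {"cpu-node": {"CPU": 4}, "gpu-node": {"CPU": 4, "GPU": 1}}
tasks = {
    "load_csv": {"CPU": 1},
    "train_model": {"CPU": 1, "GPU": 1},
    "score_model": {"GPU": 1},
}
placement = assign_tasks(tasks, workers)
```

Here `load_csv` lands on the CPU-only node, `train_model` is directed to the GPU node, and `score_model` stays unplaced because the single GPU is already reserved. A real scheduler would queue it until the GPU is free.
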
The Future
• In flight:
  • Merger of common code into Apache Arrow GPU support
  • Node.js interface to GDF (Graphistry)
  • Dask GDF: Distributed GPU dataframe
• Other potential future projects:
  • Tensor exchange between Python GPU libraries
  • GPU shared memory service (Plasma for GPU)
  • Can we improve the interaction of unified memory and IPC?
• What do you want to see?

Learn More
• GPU Open Analytics Website: http://gpuopenanalytics.com
• GOAI Github Organization: https://github.com/gpuopenanalytics/
• GOAI Google Group: https://groups.google.com/forum/#!forum/gpuopenanalytics