This presentation gives an overview of the Apache SystemML AI/ML project. It explains Apache SystemML in terms of its functionality and dependencies, and describes how SystemDS was forked from it to provide greater functionality.

Links for further information and connecting:
http://www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
https://nz.linkedin.com/pub/mike-frampton/20/630/385
https://open-source-systems.blogspot.com/
What Is Apache SystemML?
● A machine learning system
● Designed to scale to Spark / Hadoop clusters
● Open source / Apache 2 license
● Developed in Java
● Supports R-like and Python-like languages, which are designed to scale into the big data range
● Automatic optimization at scale for data and cluster
SystemML Execution Modes
● SystemML supports multiple execution modes, including
  – Standalone
  – Spark Batch
  – Spark MLContext
  – Hadoop Batch
  – Java Machine Learning Connector (JMLC)
SystemML Dependencies
● SystemDS was forked from SystemML 1.2
● Current dependencies
  – Java 8+
  – Scala 2.11+
  – Python 2.7/3.5+
  – Hadoop 2.6+
  – Spark 2.1+
What Is Apache SystemDS?
● Forked from Apache SystemML 1.2 in September 2018
● Supports linear algebra programs over matrices
● Replaces the underlying data model and compiler
● Substantially extends the supported functionalities
● Supports the whole data science lifecycle
  – Data integration, cleaning
  – Feature engineering
  – Efficient local and distributed ML model training
  – Deployment, serving
What Is Apache SystemDS?
● R-like languages for
  – The data science lifecycle stages
  – Differing expertise levels
● High-level scripts are compiled into hybrid execution plans
  – For local, in-memory CPU / GPU operations
  – For distributed operations on Apache Spark
● The underlying data model is DataTensors
  – Tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema
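As a rough illustration of what a "hybrid execution plan" means, the toy sketch below assigns each operation to a local or a Spark backend based on an estimated memory footprint. This is not how the SystemDS compiler is implemented: the function names, the 8-bytes-per-cell dense estimate, and the memory budget are all assumptions made for the example.

```python
def estimate_dense_bytes(rows, cols, bytes_per_cell=8):
    """Assumed estimate: a dense matrix of 8-byte doubles."""
    return rows * cols * bytes_per_cell


def choose_backend(rows, cols, local_budget_bytes):
    """Pick 'local' if the operand fits the (hypothetical) budget, else 'spark'."""
    if estimate_dense_bytes(rows, cols) <= local_budget_bytes:
        return "local"
    return "spark"


def plan(ops, local_budget_bytes):
    """Map (name, rows, cols) operations to (name, backend) pairs."""
    return [(name, choose_backend(r, c, local_budget_bytes)) for name, r, c in ops]


if __name__ == "__main__":
    one_gib = 1 << 30
    ops = [("small_matmul", 1000, 1000), ("large_matmul", 10**6, 10**4)]
    for name, backend in plan(ops, one_gib):
        print(name, "->", backend)
```

The point of the sketch is only that one script can mix both kinds of operation; a real optimizer would also weigh sparsity, data location, and operator cost.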
SystemDS Algorithms
● Descriptive Statistics
  – Univariate Statistics
  – Bivariate Statistics
  – Stratified Bivariate Statistics
● Classification
  – Multinomial Logistic Regression
  – Support Vector Machines
    ● Binary-Class Support Vector Machines
    ● Multi-Class Support Vector Machines
  – Naive Bayes
  – Decision Trees
  – Random Forests
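To make "univariate statistics" concrete, here is a minimal sketch of the kind of per-column summary such an algorithm produces, written with Python's standard `statistics` module. It does not call SystemDS; the function name `univariate_summary` and the choice of reported measures are assumptions for illustration.

```python
import statistics


def univariate_summary(values):
    """Return basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample variance and standard deviation (n - 1 denominator).
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }


if __name__ == "__main__":
    print(univariate_summary([1, 2, 3, 4, 5]))
```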
SystemDS Algorithms
● Clustering
  – K-Means Clustering
● Regression
  – Linear Regression
  – Stepwise Linear Regression
  – Generalized Linear Models
  – Stepwise Generalized Linear Regression
  – Regression Scoring and Prediction
● Matrix Factorization
  – Principal Component Analysis
  – Matrix Completion via Alternating Minimizations
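As a small worked example of the regression family, the sketch below fits a single-feature linear regression with the closed-form least-squares solution in plain Python. It is an illustration of the technique only, not SystemDS code; the function name `fit_simple_linear_regression` is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    intercept, slope = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
    print(intercept, slope)
```

Systems such as SystemDS express the multi-feature version of this as linear algebra over matrices, which is what allows the same script to scale out.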
SystemDS Algorithms
● Survival Analysis
  – Kaplan-Meier Survival Analysis
  – Cox Proportional Hazard Regression Model
● Factorization Machines
  – Factorization Machine
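For readers unfamiliar with Kaplan-Meier analysis, the following is a minimal pure-Python sketch of the estimator: at each distinct event time the survival probability is multiplied by (1 - deaths / number at risk), and censored subjects leave the risk set without counting as deaths. The function name `kaplan_meier` and the input layout (parallel lists of durations and event flags) are assumptions for illustration, not the SystemDS interface.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: observed time for each subject
    events:    1/True if the event occurred, 0/False if the subject was censored
    """
    pairs = sorted(zip(durations, events))
    n_at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(pairs) and pairs[i][0] == t:
            if pairs[i][1]:
                deaths += 1
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        n_at_risk -= removed
    return curve


if __name__ == "__main__":
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(t, round(s, 3))
```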
SystemDS Deep Neural Nets
● Use SystemDS to implement deep neural networks
  – Specify the network in Keras format and invoke it with the Keras2DML API
  – Specify the network in Caffe format and invoke it with the Caffe2DML API
  – Use the DML-bodied SystemDS-NN library
● Ease training compute resource issues with
  – Native BLAS (Basic Linear Algebra Subprograms)
  – SystemDS GPU backend
Available Books
● See “Big Data Made Easy”
  – Apress Jan 2015
● See “Mastering Apache Spark”
  – Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
  – Apress Jan 2018
● Find the author on Amazon
  – www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
  – www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
  – open-source-systems.blogspot.com/
● I am always interested in
  – New technology
  – Opportunities
  – Technology based issues
  – Big data integration