An introduction to Apache Spark MLlib: what is it, how does it work, and what can it do?
Apache Spark MLlib
• What is Apache Spark?
• What is MLlib?
• Functionality
• Dependencies
• Books
• Ecosystem
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Spark – What is it?
• An alternative to MapReduce for certain applications
• A low-latency cluster computing system
• Designed for very large data sets
• Can run up to 100 times faster than MapReduce for in-memory workloads
• Used with Hadoop / HDFS
• Uses in-memory cluster computing
• Memory access is faster than disk access
• Has APIs in Scala / Java / Python
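As a rough illustration of the map-then-reduce style of computation that MapReduce and Spark both distribute across a cluster, here is a single-process sketch in plain Python. It is not Spark code: the sample `lines` and the variable names are made up for this example, and everything runs in memory on one machine.

```python
from collections import Counter
from functools import reduce

# Hypothetical input; on a cluster these lines would be split across nodes.
lines = ["spark is fast", "spark uses memory", "memory is fast"]

# Map step: turn each line into its own partial word count.
partial_counts = map(lambda line: Counter(line.split()), lines)

# Reduce step: merge the partial counts pairwise into one total.
word_counts = reduce(lambda a, b: a + b, partial_counts, Counter())

print(word_counts)
```

The intermediate counts never touch disk here, which is the property the slide attributes to Spark's in-memory approach.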
Spark MLlib – What is it?
• The Spark machine learning library
• Provided with the Spark install
• Code in Scala / Java / Python
• Contains two packages
  • spark.mllib
  • spark.ml (since V1.2)
• Provides common functionality
  • Classification, regression, clustering
  • Collaborative filtering, dimensionality reduction
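To make the clustering item concrete, here is a minimal k-means sketch in plain Python. K-means is one of the clustering algorithms MLlib offers, but this is not MLlib's API or implementation: the function names, the sample `points`, and the naive "first k points" initialisation are assumptions chosen to keep the example small and deterministic.

```python
import math

def closest(point, centers):
    """Return the index of the center nearest to point."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def kmeans(points, k, iterations=10):
    # Assumption: seed with the first k points (real libraries use smarter seeding).
    centers = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[closest(p, centers)].append(p)
        # Update step: move each center to the mean of its cluster.
        centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers

# Hypothetical data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (9.0, 9.0), (0.0, 1.0), (9.0, 8.0), (1.0, 0.0), (8.0, 9.0)]
centers = kmeans(points, k=2)
print(centers)
```

A library such as MLlib performs the same assign-and-update loop, but with the points partitioned across a cluster.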
Spark MLlib – Functionality
• Basic stats
• Classification and regression
• Collaborative filtering
• Clustering
• Dimensionality reduction
• Feature extraction and transformation
• Optimization
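The "basic stats" item covers things such as per-column summary statistics. The sketch below shows that idea with Python's standard `statistics` module on a tiny made-up data set; it is a single-machine illustration with hypothetical `rows`, not MLlib code.

```python
import statistics

# Hypothetical data set: each row is one observation with two features.
rows = [
    (1.0, 10.0),
    (2.0, 20.0),
    (3.0, 60.0),
]

# Transpose rows into columns so each feature can be summarised on its own.
columns = list(zip(*rows))

col_means = [statistics.mean(c) for c in columns]
col_variances = [statistics.variance(c) for c in columns]  # sample variance

print(col_means)      # [2.0, 30.0]
print(col_variances)  # [1.0, 700.0]
```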
Spark MLlib – Dependencies
• NumPy for Python
• Breeze (linear algebra)
• netlib-java
• jblas
• gfortran runtime library
Spark Ecosystem
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems