An introduction to Apache Spark

Apache Spark
• What is it?
• How does it work?
• Benefits
• Tuning
• Examples
www.semtech-solutions.co.nz info@semtech-solutions.co.nz

Spark – What is it?
• Open source
• Alternative to MapReduce for certain applications
• A low-latency cluster computing system
• For very large data sets
• May be up to 100 times faster than MapReduce for
    • Iterative algorithms
    • Interactive data mining
• Used with Hadoop / HDFS
• Released under the BSD License (later versions use the Apache License 2.0)

Spark – How does it work?
• Uses in-memory cluster computing
• Memory access is faster than disk access
• Has APIs in
    • Scala
    • Java
    • Python
• Can be accessed from Scala and Python shells
• An Apache incubator project at the time of writing
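The in-memory point above can be sketched as follows. This is only an illustration, not code from the original slides: the object name CacheSketch, the helper countsAbove and the sample data are hypothetical, and it assumes the current org.apache.spark package names rather than the older spark package used in the slide example. The data set is cached once and then reused by several passes, which is the access pattern where keeping data in memory pays off.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical sketch: reuse one cached data set across several passes.
object CacheSketch {

  // For each threshold, count how many values are greater than it.
  // cache() asks Spark to keep the data in memory after the first pass,
  // so later passes do not have to rebuild it.
  def countsAbove(sc: SparkContext, thresholds: Seq[Int]): Seq[Long] = {
    val values = sc.parallelize(1 to 100).cache()
    thresholds.map(t => values.filter(_ > t).count())
  }

  def main(args: Array[String]): Unit = {
    // "local[2]" runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("Cache Sketch")
    val sc = new SparkContext(conf)
    try {
      println(countsAbove(sc, Seq(0, 50, 90)).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

On a real cluster the same pattern applies to data read from HDFS, with the master URL replaced by the cluster's own.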

Spark – Benefits
• Scales to very large clusters
• Uses in-memory processing for increased speed
• High-level APIs
    • Java, Scala, Python
• Low-latency shell access

Spark – Tuning
• Bottlenecks can occur in the cluster via
    • CPU, memory or network bandwidth
• Tune the data serialization method, e.g.
    • Java ObjectOutputStream vs Kryo
• Memory tuning
    • Use primitive types
    • Set JVM flags
    • Store objects in serialized form, e.g.
        • RDD persistence
        • MEMORY_ONLY_SER
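Two of the tuning points above, Kryo serialization and serialized RDD persistence, might look like this in code. This is a minimal sketch under stated assumptions: the object name TuningSketch, the helpers kryoConf and summarize, and the sample data are hypothetical, and it uses the current org.apache.spark package names, which differ from the older spark package in the slide example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical sketch: Kryo serialization plus serialized in-memory storage.
object TuningSketch {

  // Configuration that swaps the default Java serializer for Kryo.
  def kryoConf(): SparkConf =
    new SparkConf()
      .setMaster("local[2]") // local mode, for illustration only
      .setAppName("Tuning Sketch")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

  // Persist the data in serialized form, then run two actions over it.
  // Returns (number of even values, sum of all values).
  def summarize(sc: SparkContext): (Long, Long) = {
    val numbers = sc.parallelize(1 to 1000)
    // MEMORY_ONLY_SER keeps each partition in memory as serialized bytes,
    // trading some CPU time for a smaller memory footprint.
    numbers.persist(StorageLevel.MEMORY_ONLY_SER)
    val evens = numbers.filter(_ % 2 == 0).count()
    val total = numbers.map(_.toLong).reduce(_ + _)
    (evens, total)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(kryoConf())
    try {
      val (evens, total) = summarize(sc)
      println(s"evens=$evens total=$total")
    } finally {
      sc.stop()
    }
  }
}
```

Whether serialized storage helps depends on the workload: it saves memory but adds deserialization cost on every read, so it is worth measuring both ways.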

Spark – Examples
Example from spark-project.org: a Spark job in Scala that counts the lines in a system log containing the letters "a" and "b".

    /*** SimpleJob.scala ***/
    import spark.SparkContext
    import SparkContext._

    object SimpleJob {
      def main(args: Array[String]) {
        val logFile = "/var/log/syslog" // Should be some file on your system
        // Arguments: master, job name, Spark home, jars to ship to the workers
        val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
        // Read the file in 2 slices and cache it, since it is scanned twice below
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }

Contact Us
• Feel free to contact us at
    • www.semtech-solutions.co.nz
    • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems
