Scala/Spark Review (6/27/2014): Small Scale Scala Spark Demo, DataRepos, Spark Executors
Scala 2.9.3: partition
// withoutTen({1, 10, 10, 2}) → {1, 2, 0, 0}
// withoutTen({10, 2, 10}) → {2, 0, 0}
// withoutTen({1, 99, 10}) → {1, 99, 0}
def withoutTen(nums: Array[Int]): Array[Int] = {
  // can't get span and dropWhile/takeWhile to work correctly in 2.9.3
  nums.partition(_ == 10)._2.padTo(nums.size, 0)
}
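A simpler alternative not shown on the slide (a sketch; withoutTenAlt is my own name) is filter, which drops the 10s directly and also works in 2.9.3:

// filter keeps everything that is not 10, padTo restores the original length with zeros
def withoutTenAlt(nums: Array[Int]): Array[Int] =
  nums.filter(_ != 10).padTo(nums.size, 0)

// withoutTenAlt(Array(1, 10, 10, 2)) == Array(1, 2, 0, 0)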
partition splits into two arrays/lists
scala> a
res37: Array[Int] = Array(1, 2, 3, 10, 10, 10, 1)
scala> a.partition(_ == 10)
res38: (Array[Int], Array[Int]) = (Array(10, 10, 10), Array(1, 2, 3, 1))
span looks like it should do the same thing, but doesn't
• scala> a.span(_ == 10)
• res44: (Array[Int], Array[Int]) = (Array(), Array(1, 2, 3, 10, 10, 10, 1))
• span only splits off a leading run that matches the predicate, so it only "works" when the matching elements come first
• scala> a.span(_ == 1)
• res45: (Array[Int], Array[Int]) = (Array(1), Array(2, 3, 10, 10, 10, 1))
• takeWhile/dropWhile behave the same way (see the sketch below)
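A short sketch of why these methods only handle a leading run (expected results shown as comments; they follow the standard library's documented behavior, not output from the original slides):

val a = Array(1, 2, 3, 10, 10, 10, 1)
// span/takeWhile/dropWhile stop at the FIRST element that fails the predicate,
// unlike partition, which scans the whole collection
a.takeWhile(_ != 10)  // Array(1, 2, 3)
a.dropWhile(_ != 10)  // Array(10, 10, 10, 1)
a.span(_ != 10)       // (Array(1, 2, 3), Array(10, 10, 10, 1))
// a.span(_ == 10) gives an empty prefix because a(0) is not 10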
Using fold/reduce for accumulation
• foldLeft/foldRight take a binary operator; use a named add function instead of ( _ + _ ) so we can add logging and logic
• def add(res: Int, x: Int) = { println("res:" + res + " x:" + x); res + x }
• val a = Array(1, 2, 3, 10, 10, 10, 1)
• Add an if statement so only even values are accumulated:
• scala> def add(res: Int, x: Int) = { println("res:" + res + " x:" + x); if (x % 2 == 0) res + x else res }
• add: (res: Int, x: Int)Int
• a.foldLeft(0)(add)
foldLeft
scala> a.foldLeft(0)(add)
res:0 x:1
res:0 x:2
res:2 x:3
res:2 x:10
res:12 x:10
res:22 x:10
res:32 x:1
res51: Int = 32
reduceLeft
scala> a.reduceLeft(add)
res:1 x:2
res:3 x:3
res:3 x:10
res:13 x:10
res:23 x:10
res:33 x:1
res49: Int = 33
Notes on embedded functions
• Can't add extra logic to _ + _ inline; you need a named function
• The other collection methods are limited to functions that return a boolean, a count, or another collection
• reduceRight/foldRight supply the arguments in the opposite order (element first, accumulator second), so the add parameters have to be swapped (see the sketch below)
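A minimal sketch of the swapped argument order for foldRight (addRight and its parameter names are mine, not from the slide):

def addRight(x: Int, acc: Int): Int = {
  // foldRight passes the element first and the accumulator second
  println("x:" + x + " acc:" + acc)
  if (x % 2 == 0) acc + x else acc
}
a.foldRight(0)(addRight)  // also 32: the same even values are accumulated, just right to left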
Spark
• CDK, MapReduce parallelism vs. Spark executors on Mesos/YARN
• Spark Job Server demo
• Change Dependencies.scala:
lazy val commonDeps = Seq(...
  "org.apache.hadoop" % "hadoop-common" % "2.3.0",
  "org.apache.hadoop" % "hadoop-client" % "2.3.0",
  "org.apache.hadoop" % "hadoop-hdfs" % "2.3.0"
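A sketch of how the edited commonDeps might read once the Hadoop entries are added (the entries elided as ... above stay as they were; the Hadoop version should match the cluster, which is what the IPC error on a later slide is about):

lazy val commonDeps = Seq(
  // ...existing job server dependencies, unchanged...
  "org.apache.hadoop" % "hadoop-common" % "2.3.0",
  "org.apache.hadoop" % "hadoop-client" % "2.3.0",
  "org.apache.hadoop" % "hadoop-hdfs"   % "2.3.0"
)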
Test HDFS access: HelloWorld.scala

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkContext
import spark.jobserver.SparkJob

object HelloWorld extends SparkJob {
  def main(args: Array[String]) {
    println("asdf") // won't see this when the job runs through the job server
    val sc = new SparkContext("local[2]", "HelloWorld")
    val config = ConfigFactory.parseString("")
    val results = runJob(sc, config)
    println("results:" + results)
  }
IPC error

Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
  at org.apache.hadoop.ipc.Client.call(Client.java:1113)

Wrong version of the Hadoop client libs: the job server was built against a Hadoop 1.x client, which cannot talk to a Hadoop 2.x NameNode. Fix the hadoop-* versions in Dependencies.scala (earlier slide).
Validate, runJob

def validate(sc: SparkContext, config: Config): SparkJobValidation = {
  Try(config.getString("input.string")).map(x => SparkJobValid).getOrElse(SparkJ$
}

override def runJob(sc: SparkContext, config: Config): Any = {
  val dd = sc.textFile("hdfs://localhost:8020/user/dc/books")
  dd.count()
}
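The validate line above is cut off on the slide; a complete sketch, assuming the usual spark-jobserver 0.3.x API (SparkJobValid / SparkJobInvalid; the error message and the import block placed at the top of HelloWorld.scala are my own additions):

import scala.util.Try
import com.typesafe.config.Config
import org.apache.spark.SparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobInvalid, SparkJobValidation}

def validate(sc: SparkContext, config: Config): SparkJobValidation = {
  // valid only if the caller supplied an input.string value (message wording is an assumption)
  Try(config.getString("input.string"))
    .map(_ => SparkJobValid)
    .getOrElse(SparkJobInvalid("No input.string config param"))
}

override def runJob(sc: SparkContext, config: Config): Any = {
  // count the lines of the HDFS books file
  sc.textFile("hdfs://localhost:8020/user/dc/books").count()
}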
Results
Test in spark-shell first: count the number of lines in the HDFS books file (see the sketch below).
1) sbt package to create a jar
2) Start the Spark Job Server: > re-start
   Verify you see a UI at localhost:8090
3) Load the jar you packaged in 1):
[dc@localhost spark-jobserver-master]$ curl --data-binary @job-server-tests/target/job-server-tests-0.3.1.jar localhost:8090/jars/test
OK
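A sketch of the spark-shell check mentioned in the first step (sc is already provided by the shell; the count should match the result the job server returns later):

val dd = sc.textFile("hdfs://localhost:8020/user/dc/books")
dd.count()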
Jobserver Hadoop HelloWorld
4) Run the jar on Spark through the job server:
[dc@localhost spark-jobserver-master]$ curl -d "input.string = a a a a a a a b b" 'localhost:8090/jobs?appName=test&classPath=spark.jobserver.HelloWorld'
{
  "status": "STARTED",
  "result": {
    "jobId": "ce208815-f445-4a77-866c-0be46fdd5df9",
    "context": "70b92cb1-spark.jobserver.HelloWorld"
  }
}
Query the job server for results
[dc@localhost spark-jobserver-master]$ curl localhost:8090/jobs/ce208815-f445-4a77-866c-0be46fdd5df9
{
  "status": "OK",
  "result": 5
}