How Conviva used Spark to Speed Up Video Analytics by 25x Dilip Antony Joseph (@DilipAntony)
Conviva monitors and optimizes online video for premium content providers. We see 10s of millions of streams every day.
Conviva data processing architecture
[Architecture diagram: Video Player data feeds live data processing for the Monitoring Dashboard; Hadoop holds historical data for Reports; Spark is used for ad-hoc analysis]
Group By queries dominate our workload
    SELECT videoName, COUNT(1)
    FROM summaries
    WHERE date='2011_12_12' AND customer='XYZ'
    GROUP BY videoName;
• 10s of metrics, 10s of group bys • Hive scans data again and again from HDFS ⇒ Slow • Conviva GeoReport took ~24 hours using Hive
Group By queries can be easily written in Spark
    // Load session summaries from HDFS and keep only the fields of interest
    val sessions = sparkContext.sequenceFile[SessionSummary, NullWritable](
        pathToSessionSummaryOnHdfs,
        classOf[SessionSummary], classOf[NullWritable])
      .flatMap { case (summary, _) => summary.fieldsOfInterest }

    // Filter down to the desired day and cache the sessions in memory
    val cachedSessions = sessions.filter(
        whereConditionToFilterSessionsForTheDesiredDay).cache

    val mapFn : SessionSummary => (String, Long) = { s => (s.videoName, 1) }
    val reduceFn : (Long, Long) => Long = { (a, b) => a + b }

    val results = cachedSessions.map(mapFn).reduceByKey(reduceFn).collectAsMap
Spark is blazing fast! • Spark keeps sessions of interest in RAM • Repeated group by queries are very fast • Spark-based GeoReport runs in 45 minutes (compared to 24 hours with Hive)
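To illustrate why repeated group bys are cheap once the data is cached, here is a minimal sketch reusing the cachedSessions RDD from the earlier slide; the field names customer and bufferingTime are assumptions for illustration, not part of the original code. Each additional query like this runs against the in-memory data instead of rescanning HDFS.

    // A second group by over the same cached RDD: no HDFS rescan needed.
    // customer and bufferingTime are hypothetical SessionSummary fields.
    val bufferingByCustomer = cachedSessions
      .map { s => (s.customer, s.bufferingTime) }
      .reduceByKey(_ + _)
      .collectAsMap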
Spark queries require more code, but are not too hard to write • Writing queries in Scala – there is a learning curve • Type safety offered by Scala is a great boon • Code completion via the Eclipse Scala plugin • Complex queries are easier to write in Scala than in Hive (see the sketch below) • Cascading IF()s in Hive
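As a hedged sketch of the "cascading IF()s" point: a derived column that would be written as nested IF(cond, x, IF(...)) expressions in HiveQL becomes an ordinary Scala expression. The qualityBucket helper and the bufferingRatio field below are hypothetical names, used only to show the shape of such a query against the cached sessions.

    // Bucketing a session by an assumed bufferingRatio field.
    val qualityBucket: SessionSummary => String = { s =>
      if (s.bufferingRatio < 0.01) "good"
      else if (s.bufferingRatio < 0.05) "ok"
      else "bad"
    }

    // Count sessions per bucket, reusing the cached RDD from earlier.
    val sessionsPerBucket =
      cachedSessions.map { s => (qualityBucket(s), 1L) }
        .reduceByKey(_ + _)
        .collectAsMap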
Challenges in using Spark • Learning Scala • Always on the bleeding edge – getting dependencies right • More tools required
Spark @ Conviva today • Using Spark for about 1 year • 30% of our reports use Spark, rest use Hive • Analytics portal with canned Spark/Hive jobs • More projects in progress • Anomaly detection • Interactive console to debug video quality issues • Near real-time analysis and decision making using Spark • Blog Entry: http://www.conviva.com/blog/engineering/using-spark-and-hive-to-process-bigdata-at-conviva
We are Hiring! jobs@conviva.com