The Spark Debugger

Arthur: The Spark Debugger
Ankur Dave, Matei Zaharia, Murphy McCauley, Scott Shenker, Ion Stoica
UC Berkeley

Motivation
Debugging large parallel jobs is hard.
Current approaches to debugging:
• Repeatedly modify and rerun the program
• Run isolated code in the Spark shell

Introducing Arthur
Interactive replay debugger for Spark programs
• Reconstruct and query intermediate datasets
• Visualize the program's data flow
• Rerun any task in a single-process debugger
• Trace records across transformations
• Aggregate exceptions at the master

Spark Programming Model
Example: Find how many Wikipedia articles match a search term
HDFS file -> map(_.split('\t')(3)) -> articles -> filter(_.contains("Berkeley")) -> matches -> count() -> 10,000
Diagram labels: Resilient Distributed Datasets (RDDs) (articles, matches); deterministic transformations (map, filter)
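The pipeline on this slide can be sketched without a cluster. The following is a minimal illustration, not the code from the talk: plain Scala collections stand in for RDDs, and the object and method names (`WikipediaSearchSketch`, `articles`, `matches`, `countMatches`) are hypothetical. It assumes each input line is tab-separated with the article text in column 3, as the slide's `split('\t')(3)` suggests.

```scala
// Plain Scala collections stand in for RDDs so the sketch runs locally.
object WikipediaSearchSketch {
  // Assumed input layout: tab-separated lines, article text in column 3.
  def articles(lines: Seq[String]): Seq[String] =
    lines.map(_.split('\t')(3))

  // Keep only the articles that mention the search term.
  def matches(articles: Seq[String], term: String): Seq[String] =
    articles.filter(_.contains(term))

  // The action at the end of the chain: count the matching articles.
  def countMatches(lines: Seq[String], term: String): Int =
    matches(articles(lines), term).size
}
```

Both steps are deterministic functions of their input, which is the property a replay debugger relies on when it recomputes `articles` or `matches` on demand.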

Approach
[Diagram: Master, Workers, Log. Arrow labels: tasks; results, checksums, events; lineage, checksums, events]

Approach
[Diagram: Master, Workers, Log. Arrow labels: lineage; user input; tasks; results, checksums]
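One way to picture how a log of lineage and checksums could support replay is sketched below. The slides do not give Arthur's log format, so every name here (`RddCreated`, `TaskFinished`, `lineage`, `ancestors`) is an assumption made for illustration. The sketch rebuilds a lineage graph from logged events and lists the RDDs that would have to be recomputed to reconstruct a given dataset.

```scala
object EventLogSketch {
  // Hypothetical event types; the real log format is not shown in the slides.
  sealed trait Event
  final case class RddCreated(rddId: Int, op: String, parents: List[Int]) extends Event
  final case class TaskFinished(rddId: Int, partition: Int, checksum: Long) extends Event

  // Rebuild the lineage graph (RDD id -> parent ids) from the logged events.
  def lineage(log: Seq[Event]): Map[Int, List[Int]] =
    log.collect { case RddCreated(id, _, parents) => id -> parents }.toMap

  // All ancestors of an RDD: the datasets to recompute when reconstructing it.
  def ancestors(graph: Map[Int, List[Int]], rddId: Int): Set[Int] = {
    val parents = graph.getOrElse(rddId, Nil)
    parents.toSet ++ parents.flatMap(p => ancestors(graph, p))
  }
}
```

Keeping lineage and task results as separate event types means the graph can be rebuilt from the log alone, without rerunning the user program.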

Detecting Nondeterministic Transformations
Re-running a nondeterministic transformation may yield different results.
Arthur checksums RDD contents and alerts the user when a re-run produces a different checksum.
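The checksum idea can be illustrated with a small sketch. The slides do not say which checksum Arthur uses, so the one below (a sum of element hash codes, chosen so that element order does not matter) is an assumption, as are the names `checksum` and `sameOnRerun`. The sketch applies a transformation twice to the same input and compares the two checksums.

```scala
object ChecksumSketch {
  // Order-independent checksum: the sum of the elements' hash codes.
  // An assumed scheme for illustration, not necessarily Arthur's.
  def checksum[T](partition: Seq[T]): Long =
    partition.foldLeft(0L)((acc, x) => acc + x.hashCode)

  // Apply f to the same input twice and report whether the checksums agree.
  // A mismatch means f is nondeterministic; a match does not prove the opposite.
  def sameOnRerun[A, B](input: Seq[A])(f: A => B): Boolean =
    checksum(input.map(f)) == checksum(input.map(f))
}
```

For example, `ChecksumSketch.sameOnRerun(Seq(1, 2, 3))(_ * 2)` holds, while a function that reads mutable state or a random source can fail the check. A real debugger would compare the replayed checksum against the one logged during the original run instead of running the transformation twice.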

Demo
Example dataset: 1 GB partial Wikipedia dump
• Reconstruct and query intermediate datasets
• Visualize the program's data flow
• Rerun any task in a single-process debugger

Record Tracing
Example: query a database of users and groups
HDFS file A -> map(_.split('\t')) -> users
HDFS file B -> map(_.split('\t')) -> groups
users, groups -> join() -> groupCounts
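Forward tracing through a join can be sketched as follows. This is a simplified illustration under stated assumptions, not Arthur's mechanism, which the slides do not describe. The schemas are hypothetical (users as `groupId<TAB>userName`, groups as `groupId<TAB>groupName`), plain collections stand in for RDDs, and the trace is computed by pushing a single input line through the same pipeline.

```scala
object RecordTracingSketch {
  // Hypothetical schema: two tab-separated fields, the first being the join key.
  def parse(line: String): (String, String) = {
    val fields = line.split('\t')
    (fields(0), fields(1))
  }

  // Inner join on the key, mirroring join() on two keyed datasets.
  def join(users: Seq[(String, String)],
           groups: Seq[(String, String)]): Seq[(String, (String, String))] =
    for {
      (userKey, user) <- users
      (groupKey, group) <- groups
      if userKey == groupKey
    } yield (userKey, (user, group))

  // Forward trace: the joined records that descend from one line of the users file.
  def traceForward(userLine: String,
                   groupLines: Seq[String]): Seq[(String, (String, String))] =
    join(Seq(parse(userLine)), groupLines.map(parse))
}
```

Rerunning the pipeline on one record works here because each output of the join depends on exactly one record from each side. Transformations that aggregate many records would need extra bookkeeping.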

Performance
Event logging introduces minimal overhead

Future Plans
• More analyses like backward tracing and culprit detection
• Profiling tools for GC and memory
• Real bugs

Arthur is in development at https://github.com/mesos/spark, branch arthur
Documentation: https://github.com/mesos/spark/wiki/Spark-Debugger
Ankur Dave
ankurd@eecs.berkeley.edu
http://ankurdave.com
