Watching Pigs Fly with the Netflix Hadoop Toolkit Hadoop Summit 2013 San Jose, CA
Our Motivation Data should be accessible, easy to discover, and easy to process for everyone.
Our Users Analysts Engineers
Hadoop Platform as a Service Data Platform
Data Platform as a Service Ignite (A/B Test Analytics) Lipstick (Pig Workflow Visualization) Spock (Data Auditing) Sting (Adhoc Visualization) Looper (Backloading) Forklift (Data Movement) Genie (Hadoop PaaS) Franklin (Metadata API) Event Service (Orchestration) Hadoop Other Processing S3
But, what makes good recommendations? Similarity Personalization
COLORS! Box art is colorful…
We’re Sorry COLORS! Box art is colorful…
Hadoop Platform as a Service RDS Redshift Cassandra Teradata S3
Data Platform as a Service Franklin (Metadata API) RDS Redshift Cassandra Teradata S3
Data Platform as a Service Franklin (Metadata API)
Whether your dataset is large or small, being able to visualize it makes it easier to explain.
Data Platform as a Service Sting (Adhoc Visualization) Franklin (Metadata API)
Sting
• Allows users to cache the results of a Genie job in memory
• Sub-second response to OLAP-style operations (slicing, dicing, aggregations)
• Ad hoc or recurring schedule
• Easy to use!
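The OLAP-style operations named on the Sting slide can be illustrated with a small sketch. This is not Sting's implementation or API; it only shows what slicing and aggregating a result set already held in memory might look like. The rows, column names, and the helpers `slice_rows` and `aggregate` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical cached result set (assumption): rows as a Hive query run
# through Genie might have returned them. Values are made up.
CACHED_ROWS = [
    {"country": "US", "device": "tv", "hours": 5.0},
    {"country": "US", "device": "phone", "hours": 2.0},
    {"country": "CA", "device": "tv", "hours": 3.0},
    {"country": "CA", "device": "phone", "hours": 1.5},
]


def slice_rows(rows, **fixed):
    """Slice/dice: keep only rows whose dimensions match every fixed value."""
    return [r for r in rows if all(r[k] == v for k, v in fixed.items())]


def aggregate(rows, group_by, measure):
    """Aggregate: sum `measure` for each distinct value of `group_by`."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_by]] += r[measure]
    return dict(totals)


# Aggregate over the whole cached set, then over the "tv" slice only.
by_country = aggregate(CACHED_ROWS, "country", "hours")
tv_by_country = aggregate(slice_rows(CACHED_ROWS, device="tv"), "country", "hours")
```

Because the rows stay in memory, each follow-up slice or aggregation is a plain scan of the cached list rather than a new Hadoop job, which is the property the slide attributes to Sting.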
Hive Query Schema
Hemlock Grove Arrested Development House of Cards
House of Cards Macbeth
Toddlers & Tiaras Star Trek: Voyager
Big Data # of subscribers X # of titles = ???,000,…,000 (big data)
Data Platform as a Service Sting (Adhoc Visualization) Franklin (Metadata API) Lipstick
Lipstick
• Allows users to visualize their data flow
• Allows users to see common errors
• Allows users to easily monitor their jobs
• Empowers users to support themselves
• Facilitates communication between infrastructure team and users
Overall Job Progress
Overall Job Progress Logical Plan
Records Loaded Logical Operator (map side) Map/Reduce Job Logical Operator (reduce side) Intermediate Row Count
Hadoop Counters
Common Problem #1 My Job has stalled.
Unoptimized/Optimized Logical Plan Toggle Dangling Operator
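A rough sketch of how a dangling operator could be spotted in a logical plan, assuming "dangling" means an operator whose output is never consumed and never stored. This is an illustration only, not how Lipstick does it. The plan, the operator names, and `dangling_operators` are hypothetical.

```python
# Hypothetical logical plan (assumption): each operator maps to the
# operators that consume its output.
PLAN = {
    "load_plays": ["filter_us", "group_all"],
    "filter_us": ["store_us"],
    "group_all": [],  # nothing reads this output and it is never stored
    "store_us": [],   # a legitimate end point: it writes results out
}

# Operators that are allowed to have no consumers (stores, dumps).
SINKS = {"store_us"}


def dangling_operators(plan, sinks):
    """Return operators with no consumers that are not store/dump sinks."""
    return sorted(op for op, consumers in plan.items()
                  if not consumers and op not in sinks)


dangling = dangling_operators(PLAN, SINKS)
```

Under this assumption, `group_all` is the only dangling operator in the example plan.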
Common Problem #2 I didn’t get the data I was expecting
Common Problem #3 I don’t understand why my job failed.