Data Science 101: A Love Story
Agenda
• Introduction to Data Science
• Who’s who in Data Science?
• That Data Science Life.
• [Case Study] How Spotify manages their data.
• [VM] The Data Science life at VaynerMedia.
• Conclusions.
“If you can measure it, you can hack it.” E -> A -> E
We’re generating (and tracking) exponentially more data online than ever before.
I get it. “Big Data” is real, and Data Scientists are awesome.
But what is a Data Scientist? Who are they, and how do they work with “Big Data”?
Angel has 2 mutual friends with Vikash. Tim has 20 mutual friends with Vikash. If John is friends with Vikash, he might know Tim and his mutual friends.
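The mutual-friend idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not LinkedIn's actual method: the toy graph, the `friend_N` names, and the helper functions `build_graph` and `rank_by_mutual_friends` are all assumptions made up for the example.

```python
from collections import Counter, defaultdict


def build_graph(edges):
    """Build an undirected friendship graph from (person, person) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def rank_by_mutual_friends(graph, user):
    """Rank people who are not yet friends with `user` by shared friends."""
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user and anyone they are already connected to.
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    return counts.most_common()


# Hypothetical data mirroring the slide: Tim shares 20 friends with Vikash,
# Angel shares 2.
edges = [("Vikash", f"friend_{i}") for i in range(20)]
edges += [("Tim", f"friend_{i}") for i in range(20)]
edges += [("Angel", "friend_0"), ("Angel", "friend_1")]

suggestions = rank_by_mutual_friends(build_graph(edges), "Vikash")
print(suggestions)
```

Ranking by a simple count is the "Simple Analysis" part: no model is needed to see that Tim is a stronger suggestion than Angel.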
This increased platform usage, making the experience on LinkedIn more valuable.
Active Users = selling point for LinkedIn when pitching to Brands.
A leg up for users looking for employment in the informal job market.
Big Data. Real Business objective. Simple Analysis. Valuable Data-driven Product.
VM analysts do the same thing; we just don’t use the same tools.
Google started downloading the entire internet in the late 90s-early 00s.
Google created a better way to process Big Data. They created MapReduce.
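To make the MapReduce idea concrete, here is a minimal single-process sketch of its three steps (map, shuffle, reduce) applied to a word count. It is an illustration only, not Google's implementation and not the Hadoop API; the function names are hypothetical, and a real system would spread each phase across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big deal"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the approach suit very large datasets.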
Hadoop is an open-source framework for distributed storage and processing. It pairs a distributed file system (HDFS) with an implementation of MapReduce.
Hive is a data “warehouse” tool built to query data stored in Hadoop using SQL-like syntax.
Querying this data also allows us to work on our data retrieval skills.
Less time cleaning data. Less time “fishing”. Fewer spreadsheets. BOOM.
Amazon Web Services makes processing data in the cloud easy and cheap.
[Diagram components: Spotify Client; AWS EMR (Hadoop); ad hoc MapReduce jobs; Hive (data warehouse infrastructure; SQL-like syntax); PostgreSQL]