BIG DATA, We have a communication problem.

GINORMOUS SYSTEMS, April 30–May 1, 2013, Washington, D.C. Daniel Tunkelang, Head of Query Understanding, LinkedIn.


Presentation Transcript


  1. BIG DATA, We have a communication problem. GINORMOUS SYSTEMS, April 30–May 1, 2013, Washington, D.C. Daniel Tunkelang, Head of Query Understanding, LinkedIn

  2. BIG DATA IS EVERYWHERE

  3. BIG DATA POWERS EVERYTHING

  4. DATA SCIENTISTS WORRY ABOUT VOLUME, VELOCITY, VARIETY, …

  5. BUT THE BOTTLENECK ISN’T COMPUTATIONAL, IT’S COGNITIVE

  6. BIG DATA IS A TOOL. Doug Engelbart, inventor of the mouse, hypertext, etc.: TOOLS AUGMENT HUMAN INTELLECT

  7. NOT EVERYONE SUBSCRIBES TO THIS POINT OF VIEW… Claudia Perlich, Chief Scientist of media6degrees, speaking at the TTI/Vanguard 2012 conference “Understanding Understanding”:

  8. SHE HAS A POINT

  9. BUT PREDICTIVE MODELING IS NOT ENOUGH

  10. TRAINING DATA? OBJECTIVE FUNCTION?

  11. WE NEED A PEOPLE-CENTRIC APPROACH TO BIG DATA: INTERPRETABILITY, INTERACTION, INSIGHT

  12. LET’S START WITH INTERPRETABILITY

  13. EXAMPLE: SVM vs. DECISION TREE

  14. DECISION TREES HAVE FLAWS… THEY’RE DISCRETE

  15. BUT THEY COMMUNICATE (if they’re shallow): early splits provide the big picture… …or reveal training-data problems; fat leaves guide feature engineering
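Slides 13 to 16 argue that a shallow tree communicates: its top splits read as rules, and leaf sizes show where the data piles up. Below is a minimal pure-Python sketch of that idea, assuming Gini-impurity splits on numeric features (a standard choice, not necessarily what the talk used). The feature names (`clicks`, `dwell_s`), the labels, and the data are hypothetical, invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the (feature, threshold) pair that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # this threshold does not actually split the data
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build(rows, labels, features, max_depth):
    """Grow a tree greedily, stopping at max_depth or when no split helps."""
    split = best_split(rows, labels, features) if max_depth > 0 else None
    if split is None:
        majority = Counter(labels).most_common(1)[0][0]
        return {"leaf": majority, "n": len(labels)}  # n = leaf size ("fat" leaves)
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]

    def child(idx):
        return build([rows[i] for i in idx], [labels[i] for i in idx], features, max_depth - 1)

    return {"feature": f, "threshold": t, "left": child(left), "right": child(right)}


def rules(node, path=()):
    """Flatten the tree into human-readable IF/THEN rules."""
    if "leaf" in node:
        cond = " AND ".join(path) or "TRUE"
        return [f"IF {cond} THEN {node['leaf']} (n={node['n']})"]
    f, t = node["feature"], node["threshold"]
    return (rules(node["left"], path + (f"{f} <= {t}",))
            + rules(node["right"], path + (f"{f} > {t}",)))


# Hypothetical toy data: did a search session end in abandonment?
rows = [
    {"clicks": 0, "dwell_s": 1},
    {"clicks": 0, "dwell_s": 3},
    {"clicks": 1, "dwell_s": 2},
    {"clicks": 2, "dwell_s": 40},
    {"clicks": 3, "dwell_s": 55},
    {"clicks": 1, "dwell_s": 30},
]
labels = ["abandoned"] * 3 + ["satisfied"] * 3

tree = build(rows, labels, ["clicks", "dwell_s"], max_depth=2)
for rule in rules(tree):
    print(rule)
```

On this toy data the best split is on `dwell_s`, so the whole model prints as two rules a person can check by eye, and the `n=` counts point at the leaves worth examining for new features.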

  16. WHICH SUPPORTS ITERATION

  17. INTERPRETABILITY DELIVERS
      • A key search leader favors a rule-based approach for key scoring algorithms.
      • Replacing regression with a decision tree in a local search model gained both accuracy and insight.
      • Trees are used to recognize spam, analyze search abandonment, and model/quantify social proof.

  18. GO DEEP vs. INTERPRETABILITY: A KEY DATA SCIENCE TRADE-OFF

  19. ON TO INTERACTION

  20. DON’T OVERPAY FOR PRECISION

  21. BE FAST, CHEAP, AND 98% RIGHT http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/
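The linked Metamarkets post is about approximate cardinality (distinct-count) estimation. HyperLogLog is a widely used sketch for this; the following minimal Python version illustrates the technique and is not the code from the post. With `p = 12` it keeps 4,096 small registers, and the textbook standard error 1.04/√m works out to about 1.6%, in the spirit of "98% right". The hash choice (BLAKE2b via `hashlib`) and the `user-<i>` keys are hypothetical.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not production-grade)."""

    def __init__(self, p=12):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash from BLAKE2b; any well-mixed hash would do.
        h = int.from_bytes(
            hashlib.blake2b(str(item).encode(), digest_size=8).digest(), "big"
        )
        w = 64 - self.p
        idx = h >> w                 # first p bits choose a register
        rest = h & ((1 << w) - 1)    # remaining w bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = w - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog(p=12)
for i in range(100_000):
    hll.add(f"user-{i % 50_000}")  # 50,000 distinct keys, each added twice
est = hll.count()
print(f"estimate: {est:.0f} (true: 50000)")
```

The sketch's memory is fixed by `p` regardless of how many items arrive, and adding a key that was already seen never changes the estimate; that trade of a little accuracy for a lot of speed and space is the point of the slide.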

  22. ARE PEOPLE THAT IMPATIENT? (Tolerable wait time for web users.) A 0.1 s increase in latency significantly reduces the number of searches and ad revenue. tl;dr: YES

  23. IMPATIENCE IS GOOD: SPEED MATTERS

  24. INSIGHT

  25. http://blog.takejune.com/archives/52334044.html

  26. BE TRENDY AND NORMALIZE vs

  27. SOLVE FOR INTERESTINGNESS: Sept. 11th, Abu Ghraib, Weapons Inspectors
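Slides 26 and 27 contrast raw trendiness with normalized interestingness. One common way to normalize (an assumption here; the slides do not say which scoring was used) is to rank terms by the smoothed log-ratio of their relative frequency in a current window versus a baseline window. The function name `interestingness` and all the counts below are hypothetical, invented for illustration.

```python
import math
from collections import Counter


def interestingness(current, baseline, smoothing=1.0):
    """Rank terms by smoothed log-ratio of relative frequency (current vs. baseline).

    A term that is common in both windows scores near zero even if its raw
    count is large; a term that is rare in the baseline but frequent now
    scores high.
    """
    current, baseline = Counter(current), Counter(baseline)
    vocab = set(current) | set(baseline)
    # Add-`smoothing` (Laplace-style) so unseen terms never give log(0).
    n_cur = sum(current.values()) + smoothing * len(vocab)
    n_base = sum(baseline.values()) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        p_cur = (current[term] + smoothing) / n_cur
        p_base = (baseline[term] + smoothing) / n_base
        scores[term] = math.log(p_cur / p_base)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical term counts for a baseline period and a current period.
baseline = {"news": 500, "weather": 300, "sports": 200, "weapons inspectors": 2}
current = {"news": 520, "weather": 280, "sports": 190, "weapons inspectors": 60}

print(Counter(current).most_common(1))  # raw "trendiest": [('news', 520)]
ranked = interestingness(current, baseline)
print(ranked[0][0])  # top normalized term: weapons inspectors
```

Raw counts crown the term that is always popular, while the normalized score surfaces the term whose frequency jumped. Such a ranking only computes candidate insights; deciding which ones matter is still left to human judgment.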

  28. COMPUTE POTENTIAL INSIGHTS, APPLY HUMAN INTUITION

  29. SUMMARY: Let’s have a conversation with Big Data. INTERPRETABILITY, INTERACTION, INSIGHT