
Several traps of and solutions to big data analytics

Eric Zheng. Presentation at the Symposium on Big Data, April 25, 2014. Agenda: three traps in big data analytics (BDA); my solutions; my two cents on where BDA is heading.


Presentation Transcript


  1. Several traps of and solutions to big data analytics. Eric Zheng. Presentation at the Symposium on Big Data, April 25, 2014

  2. Agenda
     • Three traps in big data analytics (BDA)
     • My solutions
     • My two cents on where BDA is heading

  3. Big Data Analytics Trap 1: Quantity of data does not mean quality
     • The Parable of Google Flu Trends (GFT), by David Lazer et al., in Science, March 14, 2014
     • GFT has used search trends and social media data to predict flu trends since 2009
     • A widely cited example of BDA
     • In Feb. 2013, GFT made headlines for its poor predictive performance
     • Missed almost all big events (e.g. H1N1)
     • Doubled the error rate of the traditional method using surveillance lab reports
     • Why?
     • Google screened 50 million search terms to fit only 1,152 data points
     • Too little useful data (big data overfits small useful data)
     • Ignored the basic science, e.g. seasonal trends and events (basketball games)
     • Users’ search behavior is not exogenous; it is endogenously cultivated by Google’s algorithm
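The overfitting point on this slide can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a reconstruction of GFT: the target series and every candidate predictor are pure random noise, and the sizes (`n_points`, `n_candidates`) are hypothetical values chosen only to keep the run fast. It screens many candidates against a short series, keeps the one with the strongest in-sample correlation, and then checks the same candidate on a holdout window.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_match(n_points=20, n_candidates=5000, seed=0):
    """Pick the noise series that best 'predicts' a noise target in-sample.

    Returns (in-sample r, holdout r) for the selected candidate.
    All series are independent Gaussian noise, so any fit is spurious.
    """
    rng = random.Random(seed)

    def noise():
        return [rng.gauss(0, 1) for _ in range(n_points)]

    target_fit, target_holdout = noise(), noise()
    best_fit_r, best_holdout_r = 0.0, 0.0
    for _ in range(n_candidates):
        cand_fit, cand_holdout = noise(), noise()
        r = pearson(cand_fit, target_fit)
        if abs(r) > abs(best_fit_r):
            # Selection uses only the fitting window, as a naive screen would.
            best_fit_r = r
            best_holdout_r = pearson(cand_holdout, target_holdout)
    return best_fit_r, best_holdout_r


r_fit, r_holdout = best_spurious_match()
print(f"best in-sample r = {r_fit:+.2f}, same candidate on holdout r = {r_holdout:+.2f}")
```

With many candidates and few data points, the selected in-sample correlation tends to look strong even though nothing is related to anything, which is why the holdout check matters.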

  4. BDA Trap 2 – Biased big data?
     George Gallup (1901-84)
     • The Gallup story
     • In 1936, the presidential election was between FDR and Alf Landon
     • The polling authority of the day was the Literary Digest (“Digest”)
     • It used a “big data approach”: it sent out 10+ million surveys and got 3M+ responses
     • In October 1936, Gallup announced two predictions: 1) FDR would win handily (60% vs. 40%); 2) the Digest would predict that Landon would win
     • A few days later, the Digest announced its prediction: Landon winning with 55%
     • What went wrong? A problematic data generating process!
     • The Digest acquired addresses through telephone numbers and car registrations
     • Big data generated a biased sample!
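The sampling-frame problem in the Gallup story can be sketched with a small simulation. Every number here is a hypothetical assumption made for illustration (the share of phone or car owners, the support rates in each group, and both sample sizes); none comes from the slide or from the historical polls. The sketch compares a very large sample drawn only from owners with a much smaller sample drawn from the whole electorate.

```python
import random


def simulate_polls(seed=42, big_n=200_000, small_n=2_000):
    """Compare a huge biased sample with a small random one.

    Returns (true_share, big_biased_estimate, small_random_estimate),
    each the fraction of voters supporting candidate A.
    """
    rng = random.Random(seed)
    owner_share = 0.30  # hypothetical: fraction with a phone or car
    support = {"owner": 0.45, "non_owner": 0.70}  # hypothetical support for A
    true_share = (owner_share * support["owner"]
                  + (1 - owner_share) * support["non_owner"])

    def draw_vote(owners_only):
        # The biased frame reaches owners only; the random frame reaches anyone.
        is_owner = owners_only or rng.random() < owner_share
        group = "owner" if is_owner else "non_owner"
        return rng.random() < support[group]

    big_biased = sum(draw_vote(True) for _ in range(big_n)) / big_n
    small_random = sum(draw_vote(False) for _ in range(small_n)) / small_n
    return true_share, big_biased, small_random


true_share, big_biased, small_random = simulate_polls()
print(f"true share      : {true_share:.3f}")
print(f"big biased poll : {big_biased:.3f}")
print(f"small random poll: {small_random:.3f}")
```

Under these assumptions the large poll converges tightly on the wrong answer, because more responses reduce sampling noise but do nothing about who was reachable in the first place.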

  5. Are we repeating the same mistake?
     • The under-reporting bias
     • Dominance of positiveness
     • Yelp and YouTube
     • The social media data we observe may represent a biased sample!

  6. BDA Trap 3 – Big data generated with different user intentions
     • Saying so doesn’t mean so?
     • The Internet “water army” (网络水军): paid posters

  7. Intention mining

  8. My Solution – Modeling the Data Generating Process in BDA
     • Why do (or don’t) people contribute?
     • How do people form opinions?
     • What is the intention of the user saying so?
     Diagram stages: Purchase, Opinion Formation, Expressed Opinions, Economic Impact
     • A reduced form view: establish the relationship among observables directly.
     • A generative view: how is the data generated (the data generating process, DGP)?
     Diagram labels: “My research”, “Focus of current BDA”

  9. Media Coverage
     • http://www.consumerreports.org/cro/2013/09/online-ratings-services/index.htm
     • http://business.time.com/2013/09/21/guess-whos-getting-some-pretty-awful-reviews-user-review-sites/

  10. What we found in Blockbuster’s online review system
      Chart panels: “Without considering silent users” and “Factoring in silent users”
      • Estimated reporting probability: negative 6%, neutral 23%, positive 32%
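One way to see what "factoring in silent users" could mean is inverse-probability weighting, sketched below. This is an illustrative assumption, not the estimation method behind the slide: it reads each reporting probability as the chance that a user holding that opinion posts a review, and it divides observed counts by that chance to recover the implied opinion mix among all users. The reporting probabilities are the ones quoted on the slide; the observed counts and the function name `correct_for_silent_users` are hypothetical.

```python
# Reporting probabilities as quoted on the slide.
REPORTING_PROB = {"negative": 0.06, "neutral": 0.23, "positive": 0.32}


def shares(counts):
    """Normalize a dict of non-negative counts into shares that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def correct_for_silent_users(observed_counts, reporting_prob=REPORTING_PROB):
    """Reweight observed review counts by 1 / P(report | opinion).

    Assumes reporting depends only on the opinion category.
    """
    implied = {k: observed_counts[k] / reporting_prob[k] for k in observed_counts}
    return shares(implied)


# Hypothetical observed review counts, chosen for illustration only.
observed = {"negative": 60, "neutral": 230, "positive": 320}

naive = shares(observed)
corrected = correct_for_silent_users(observed)
for opinion in observed:
    print(f"{opinion:>8}: observed {naive[opinion]:.1%}, corrected {corrected[opinion]:.1%}")
```

Because negative opinions are reported least often under these probabilities, the observed reviews understate them, and the reweighted shares shift toward the negative category.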

  11. My two cents on the three stages of BDA
      • The Galileo Stage: improved the telescope, settled the geocentric vs. heliocentric debate. Build infrastructure and tools to see the patterns in the data; this is where we currently are for BDA.
      • The Newton Stage: law of gravity, laws of motion, calculus. Build theory to see through the patterns in the data. Data changes but theories are stable.
      • The Einstein Stage: theory of relativity. Imagine two competitors that both implement top-notch BDA applications. Web 3.0, strategy mining, smart business down the road?
