Massive Data: Impetus, Types, and Challenges

This symposium explores the impact of technological, computing, and methodological advances on the availability of massive data. It discusses different types of structured and unstructured data and addresses the challenges faced in analyzing and utilizing these large datasets. The symposium also highlights the need for real-time computation, decision optimization, and interdisciplinary collaborations in managing and modeling massive data.

tschiller

Presentation Transcript


  1. Massive Choice Data • Co-Chairs: Prasad Naik and Michel Wedel • 7th Triennial Choice Symposium • Wharton Business School • June 13-17, 2007

  2. Impetus for “Massive” Data? • Technological advances (Internet, RFID) • Computing advances • Methodological advances • Detailed data • Large sample, N • Many variables, p • Long time-series, T • Several products and SKUs, K

  3. Different Types of Massive Data • Structured Data • Scanner panel, Loyalty card, CRM, Click-stream • Unstructured Data • Text data (e.g., product reviews, blogs, complaints) • Images, Music • Emerging Data Types • RFID, Video, social networks, recommendations, auctions, games, eye tracking, semantic Web 2.0

  4. Is the data set just getting bigger? • What is the qualitative difference? • Sometimes Nothing • Just a scale-up problem • But the bigger size makes it harder to analyze in real time • Sometimes Everything • Empty space phenomenon • Statistical inference, diagnostics, sparseness • Visualization becomes tricky when p > 10
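The "empty space phenomenon" on slide 4 can be illustrated with a small calculation. The sketch below is not from the presentation; it assumes one common way of showing the effect, namely the share of the cube [-1, 1]^p that is covered by the inscribed unit ball. The function name `ball_to_cube_volume_ratio` is hypothetical.

```python
import math


def ball_to_cube_volume_ratio(p: int) -> float:
    """Fraction of the cube [-1, 1]^p occupied by the inscribed unit ball."""
    # Volume of the unit ball in p dimensions: pi^(p/2) / Gamma(p/2 + 1).
    ball = math.pi ** (p / 2) / math.gamma(p / 2 + 1)
    # Volume of the enclosing cube with side length 2.
    cube = 2.0 ** p
    return ball / cube


if __name__ == "__main__":
    # The ratio shrinks rapidly as the number of variables p grows,
    # so most of the cube's volume sits in its corners, far from the center.
    for p in (1, 2, 3, 10, 20):
        print(p, ball_to_cube_volume_ratio(p))
```

With p = 2 the ball covers about 79% of the square; by p = 10 it covers well under 1% of the cube, which is one reason a fixed sample size N leaves high-dimensional space nearly empty.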

  5. Managers and Models • Managers need • real-time computation • decision optimization • Man-Machine engagement • managerial inputs plus data analyses • Models need to be both • Simple, for quick computation (real-time decisions) • Complex, for realism in assumptions • How? • The notion of “Workbench” • Model averaging, forecast combination
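Slide 5 names forecast combination as one way to keep individual models simple while still gaining realism. The sketch below is an illustration only, assuming one simple weighting scheme (weights proportional to the inverse of each model's historical mean squared error); the slides do not say which scheme the speakers had in mind. All function names and numbers are hypothetical.

```python
def mse(forecasts, actuals):
    """Mean squared error of one model's past forecasts."""
    return sum((f - a) ** 2 for f, a in zip(forecasts, actuals)) / len(actuals)


def inverse_mse_weights(model_forecasts, actuals):
    """Weights proportional to 1 / MSE, normalized to sum to 1.

    Assumes every model has a strictly positive MSE.
    """
    inverse = [1.0 / mse(f, actuals) for f in model_forecasts]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(point_forecasts, weights):
    """Weighted average of the models' new point forecasts."""
    return sum(w * f for w, f in zip(weights, point_forecasts))


# Made-up history: model A is off by +1 each period, model B by -2.
actuals = [10.0, 12.0, 11.0, 13.0]
model_a = [11.0, 13.0, 12.0, 14.0]
model_b = [8.0, 10.0, 9.0, 11.0]

weights = inverse_mse_weights([model_a, model_b], actuals)
combined = combine([15.0, 12.0], weights)  # new forecasts from A and B
print(weights, combined)
```

Because the weights are computed once from past errors, combining new forecasts is a single weighted sum, which suits the real-time setting the slide describes.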

  6. Estimation and Computation • Estimation methods • Identified promising approaches for massive data analysis • Inverse regression methods • Regularization techniques (e.g., Lasso) • Particle filters • Logistic regression or Support Vector Machines • Computation power • Grid computing is needed • waiting for a faster computer is not an option • Gap between industry and practice • Google has 2 million processors
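Of the estimation methods listed on slide 6, the Lasso is easy to sketch. The code below is a minimal, dependency-free illustration, assuming the common objective (1/(2n))·||y − Xb||² + lam·||b||₁ solved by cyclic coordinate descent with soft-thresholding. It is not the speakers' implementation, and the function names and toy data are hypothetical.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for the Lasso (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return beta


# Toy data with orthogonal columns: y = 2.0 * x1 + 0.1 * x2.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.1, 1.9, -1.9, -2.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print(beta)  # the weak second coefficient is shrunk exactly to zero
```

The penalty sets small coefficients exactly to zero, which is what makes this kind of regularization attractive when the number of variables p is large.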

  7. Directions and Action Points • Incentives for academics? • Industry-Academic partnerships • Cross-disciplinary collaborations

  8. Thank you for this forum to share ideas! • Credits: Lynd Bacon (LBA Inc), Anand Bodapati (UCLA), Wagner Kamakura (Duke), Jeffrey Kreulen (IBM Research), Peter Lenk (Michigan), David Madigan (Rutgers), Alan Montgomery (CMU)
