Outline

gpittman

Presentation Transcript


  1. Outline • Background • Lessons and challenges presented • Business-level • Technical-level (by data mining lifecycle stages) • Data collection • Data warehouse construction • Business intelligence • Deployment

  2. Background • Blue Martini Software • From the beginning, significant consideration was given to data transformation and analysis needs • Lessons from 1999-2003 • More than 20 clients • Durations from a few person-weeks to several person-months • Some are available as case studies • Sources of data • Customer registration and demographic information, web clickstreams, responses to direct mail (DM) and email campaigns, orders placed through a website, call center, or in-store POS systems • A few thousand records to more than 100 million records • Collected over a few months to several years

  3. Business lessons • By data mining lifecycle stages • Requirement gathering • Data collection • Data warehouse construction • Business intelligence • Deployment

  4. Requirement gathering lessons • Clients are often reluctant to list specific business questions • Whet the clients’ appetite by presenting preliminary findings • Push clients to ask characterization and strategic questions • “What is the distribution of males/females among those spending more than $500?” • “What characterizes people who spend more than $500?” • Challenges: developing methodology and best practices to help business people define appropriate questions
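
The first characterization question on this slide maps onto a simple filter-and-count. The sketch below is illustrative only: the record layout, the sample customers, and the function name `gender_distribution` are assumptions, not part of the original presentation.

```python
from collections import Counter

# Hypothetical customer records: (customer_id, gender, total_spend_usd).
customers = [
    ("c1", "F", 720.0),
    ("c2", "M", 150.0),
    ("c3", "F", 510.0),
    ("c4", "M", 980.0),
    ("c5", "F", 90.0),
]


def gender_distribution(records, min_spend=500.0):
    """Return the share of each gender among customers spending more than min_spend."""
    counts = Counter(gender for _, gender, spend in records if spend > min_spend)
    total = sum(counts.values())
    # An empty dict signals that nobody crossed the spending threshold.
    return {gender: n / total for gender, n in counts.items()} if total else {}


distribution = gender_distribution(customers)
```

The second question ("what characterizes people who spend more than $500?") is harder to answer with a single query, which is where the later modeling stages come in.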

  5. Data collection • The system transparently collects • Every search and the number of results returned • Shopping cart events • Important events such as registration, initiation of checkout, and order confirmation • Any form field failure • User’s local time zone, data for robot detection, color depth, screen resolution

  6. Data collection lessons • Collect the right data, up front • Integrate external events

  7. Data warehouse construction • Lessons • Automatic generation of Decision Support System database is appreciated • Challenges • Firewalls • Integration

  8. Business intelligence lessons • Expect the operational channels to be higher priority than decision support • Crawl, walk, run • Start from basic reporting • Train data analysts • Tell people the time, not how to build clocks • Define the terminology • Writing a good glossary and sharing the terms across reports is important

  9. Business intelligence challenges • Make it easier to map business questions to data transformations • Automate feature construction • Build comprehensible models • Experiment because correlation does not imply causality • Explain counter-intuitive insights • Assess the ROI (return on investment) of insights
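
One way to act on "experiment because correlation does not imply causality" is a controlled experiment that compares conversion rates between a control group and a treatment group. The sketch below uses a standard pooled two-proportion z-statistic; the visitor and conversion counts are made-up numbers, and the slides do not say which test the authors used.

```python
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-statistic for group B versus group A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b))
    if standard_error == 0.0:
        # Every visitor converted, or none did: the statistic is undefined.
        return 0.0
    return (p_b - p_a) / standard_error


# Hypothetical experiment: 1000 visitors per group.
z = two_proportion_z(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A |z| above roughly 1.96 corresponds to significance at the 5% level for a two-sided test, under the usual large-sample assumptions.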

  10. Deployment • Lessons • Share insights • Take action • Challenges • Have transformed data available for scoring

  11. Technical details (1) - data definition, collection, and preparation • Data collection and management • Data cleansing • Data processing

  12. Data collection and management • Lessons • Collect data at the right abstraction levels • Design forms with data mining in mind • Validate forms to ease data cleansing and analysis • Determine thresholds based on careful data analysis • Example: session timeout
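
The session timeout example can be explored by sessionizing the same clickstream under several candidate thresholds and seeing how the session count changes. This is a minimal sketch: the timestamps, the candidate timeouts, and the 1800-second default are assumptions for illustration, not values from the slides.

```python
def sessionize(timestamps, timeout_seconds=1800):
    """Group one visitor's request timestamps (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    exceeds timeout_seconds.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout_seconds:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_counts_by_timeout(timestamps, candidate_timeouts):
    """Count the sessions produced by each candidate timeout."""
    return {t: len(sessionize(timestamps, t)) for t in candidate_timeouts}


# Hypothetical request times for a single visitor, in seconds.
requests = [0, 60, 400, 2500, 2600, 9000]
counts = session_counts_by_timeout(requests, [300, 1800, 3600])
```

Running the same comparison over real traffic shows how sensitive session-level metrics are to the threshold, which is the kind of analysis the slide recommends before fixing a value.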

  13. Data collection and management • Challenges • Sample at collection • Support slowly changing dimensions • Perform data warehouse updates effectively
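
A common way to support slowly changing dimensions is to keep a dated history row for each value instead of overwriting it (often called a Type 2 dimension). The sketch below assumes an in-memory list of rows with hypothetical field names (`customer_id`, `value`, `valid_from`, `valid_to`); the slides do not describe how the authors' system handled this.

```python
def apply_scd2(history, customer_id, new_value, as_of):
    """Record a new attribute value while preserving earlier values.

    The open row (valid_to is None) is closed at as_of and a new open row
    is appended. Nothing changes when the value is the same as before.
    """
    current = next(
        (row for row in history
         if row["customer_id"] == customer_id and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["value"] == new_value:
            return history
        current["valid_to"] = as_of
    history.append({
        "customer_id": customer_id,
        "value": new_value,
        "valid_from": as_of,
        "valid_to": None,
    })
    return history


# Hypothetical example: a customer's city changes once.
history = []
apply_scd2(history, "c1", "Boston", "2002-01-01")
apply_scd2(history, "c1", "Boston", "2002-06-01")  # no change, no new row
apply_scd2(history, "c1", "Denver", "2003-03-01")
```

Keeping the history lets an analysis join each order to the attribute value that was valid when the order was placed.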

  14. Data cleansing • Lessons • Audit the data

  15. Data cleansing • Challenges • Detect bots • Between 5% and 40% of visits are due to bots • Perform regular de-duping of customers and accounts • many-to-many relationship
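
Bot detection is listed as an open challenge, so any concrete rule is only a starting point. The sketch below flags a session using three simple heuristics (user-agent hints, a request for `/robots.txt`, and an unusually high request rate). The heuristics, the threshold, and the session field names are all assumptions for illustration and would miss bots that disguise themselves.

```python
# Substrings that self-identifying crawlers often include in their user agent.
BOT_AGENT_HINTS = ("bot", "crawler", "spider")


def looks_like_bot(session, max_requests_per_minute=60):
    """Return True if a session matches any of the simple bot heuristics."""
    agent = session.get("user_agent", "").lower()
    if any(hint in agent for hint in BOT_AGENT_HINTS):
        return True
    paths = session.get("paths", [])
    if "/robots.txt" in paths:
        return True
    # Treat very short sessions as lasting one minute to avoid inflated rates.
    duration_minutes = max(session.get("duration_seconds", 0) / 60.0, 1.0)
    return len(paths) / duration_minutes > max_requests_per_minute


# Hypothetical sessions.
human = {"user_agent": "Mozilla/5.0", "paths": ["/", "/cart"], "duration_seconds": 120}
crawler = {"user_agent": "ExampleBot/1.0", "paths": ["/"], "duration_seconds": 5}
scraper = {"user_agent": "Mozilla/5.0", "paths": ["/p"] * 500, "duration_seconds": 60}

flags = [looks_like_bot(s) for s in (human, crawler, scraper)]
```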

  16. Data processing • Lessons • Support hierarchical attributes • Handle cyclical attributes • Support rich data transformations
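
Cyclical attributes such as hour of day or month wrap around, so hour 23 should be close to hour 0. One common way to handle this, shown here as an assumption rather than as the presenters' method, is to map each value onto the unit circle with sine and cosine.

```python
import math


def encode_cyclical(value, period):
    """Map a cyclical value (e.g. hour 0-23 with period 24) to (sin, cos)."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def cyclical_distance(a, b, period):
    """Euclidean distance between two values after the circular encoding."""
    sin_a, cos_a = encode_cyclical(a, period)
    sin_b, cos_b = encode_cyclical(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# Hour 23 is as close to hour 0 as hour 0 is to hour 1, and far from noon.
late_to_midnight = cyclical_distance(23, 0, 24)
midnight_to_one = cyclical_distance(0, 1, 24)
late_to_noon = cyclical_distance(23, 12, 24)
```

With a plain numeric encoding, hours 23 and 0 would be 23 units apart; the circular encoding removes that artificial gap.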

  17. Data processing • Challenges • Support hierarchical attributes • Handle “unknown” and “not applicable” attribute values • NULL
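
A single NULL cannot distinguish a value that exists but was not captured ("unknown") from a value that has no meaning for the record ("not applicable"). The sketch below keeps the two cases apart with explicit sentinels; the `Missing` enum and the "days since last purchase" example are hypothetical.

```python
from enum import Enum


class Missing(Enum):
    """Explicit reasons for a missing value, instead of one overloaded NULL."""
    UNKNOWN = "unknown"                # a value exists but was not captured
    NOT_APPLICABLE = "not applicable"  # no value can exist for this record


def summarize(values):
    """Count known, unknown, and not-applicable entries separately."""
    summary = {"known": 0, "unknown": 0, "not_applicable": 0}
    for value in values:
        if value is Missing.UNKNOWN:
            summary["unknown"] += 1
        elif value is Missing.NOT_APPLICABLE:
            summary["not_applicable"] += 1
        else:
            summary["known"] += 1
    return summary


# Hypothetical attribute: days since last purchase.
# Customers who never purchased get NOT_APPLICABLE rather than a fake number.
days_since_last_purchase = [12, Missing.NOT_APPLICABLE, 45, Missing.UNKNOWN, 3]
summary = summarize(days_since_last_purchase)
```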

  18. Technical details (2) - Analysis • Understanding and enriching the data • Building models and identifying insights • Deploying models, acting upon the insights, and closing the loop • Empowering business users to conduct their own analysis

  19. Understanding and enriching the data • Lessons • Statistics • Distributions, min, max, mean, number of NULL and non-NULL • Weighted average • Visualization • Line chart, bar chart, scatter plot, heatmap, filter chart
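
The statistics named on this slide can be computed with a few lines of standard-library Python. This is a sketch under the assumption that NULL is represented as `None`; the function names and sample values are made up.

```python
def profile_column(values):
    """Basic profile of a numeric column where None stands for NULL."""
    present = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(present),
        "non_null_count": len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": sum(present) / len(present) if present else None,
    }


def weighted_average(values, weights):
    """Weighted average that skips NULL values together with their weights."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight


# Hypothetical order values, one of them missing.
profile = profile_column([10.0, None, 30.0, 20.0])
average = weighted_average([10.0, None, 30.0], [1, 5, 3])
```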

  20. Building models and identifying insights • Lessons • Mine data at the right granularity levels • Handle leaks in predictive models • Leaks are attributes highly correlated with the target but not useful in practice as good predictors • Improve scalability • Build simple models first • Use data mining suites • Peel the onion and validate results
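
A simple first screen for leaks is to flag attributes whose correlation with the target is suspiciously high and then review them by hand. The sketch below uses Pearson correlation with an arbitrary 0.95 threshold; the threshold, the feature names, and the data are assumptions, and a leak can also hide behind a weaker or non-linear relationship.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for constant columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return covariance / (spread_x * spread_y)


def suspected_leaks(features, target, threshold=0.95):
    """Names of features whose absolute correlation with the target is >= threshold."""
    return sorted(
        name for name, column in features.items()
        if abs(pearson(column, target)) >= threshold
    )


# Hypothetical data: the target is whether the visitor purchased.
purchased = [1, 0, 1, 0, 1, 0]
features = {
    # Only buyers ever see the confirmation page, so this attribute
    # restates the target instead of predicting it.
    "order_confirmation_views": [1, 0, 1, 0, 1, 0],
    "pages_viewed": [5, 4, 6, 2, 3, 7],
}
leaks = suspected_leaks(features, purchased)
```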

  21. Sharing insights, deploying models, and closing the loop • Lessons • Represent models visually for better insights • Understand the importance of the deployment context • Create actionable models and close the loop

  22. Empowering business users to conduct their own analysis • Lessons • Share the results among business users via simple, easy-to-understand reports • Provide canned reports that business users can run by simply specifying values for a few parameters • Technically savvy business users might be comfortable designing their own investigations, given a simple user interface

  23. Empowering business users to conduct their own analysis • Challenges • Visualize models • Prune rules and associations • Analyze and measure long-term impact of changes

  24. Summary • Top three lessons • Integrate data collection into operations to support analytics and experimentation • Do not confuse yourself with the target user • Provide simple reports and visualizations before building more complex models • Top three challenges • The ability to translate business questions to the desired data transformations • Efficient algorithms whose output is comprehensible for business insight, and which can handle multiple data types • Integrated workflow