Outline • Background • Lessons and challenges presented • Business-level • Technical-level (by data mining lifecycle stages) • Data collection • Data warehouse construction • Business intelligence • Deployment
Background • Blue Martini Software • From the beginning, significant consideration was given to data transformation and analysis needs • Lessons from 1999 to 2003 • More than 20 clients • Durations from a few person-weeks to several person-months • Some are available as case studies • Sources of data • Customer registration and demographic information, web click streams, responses to direct mail (DM) and email campaigns, orders placed through a website, call center, or in-store POS systems • A few thousand records to more than 100 million records • Collected over periods from a few months to several years
Business lessons • By data mining lifecycle stages • Requirement gathering • Data collection • Data warehouse construction • Business intelligence • Deployment
Requirement gathering lessons • Clients are often reluctant to list specific business questions • Whet the clients’ appetite by presenting preliminary findings • Push clients to ask characterization and strategic questions • “What is the distribution of males/females among those spending more than $500?” • “What characterizes people who spend more than $500?” • Challenges: developing methodology and best practices to help business people define appropriate questions
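The first characterization question above maps to a simple filter-and-count. The sketch below is illustrative only: the record layout (`gender`, `total_spend`) and the sample customers are assumptions, not data from the case studies.

```python
from collections import Counter

def gender_distribution(customers, min_spend=500.0):
    """Share of each gender among customers spending more than min_spend."""
    high_spenders = [c for c in customers if c["total_spend"] > min_spend]
    counts = Counter(c["gender"] for c in high_spenders)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}

# Hypothetical customer records.
customers = [
    {"gender": "F", "total_spend": 720.0},
    {"gender": "M", "total_spend": 510.0},
    {"gender": "F", "total_spend": 1250.0},
    {"gender": "M", "total_spend": 80.0},
    {"gender": "F", "total_spend": 640.0},
]
dist = gender_distribution(customers)
print(dist)
```

The second question ("what characterizes...") is harder: it asks for a model or a comparison across many attributes, not a single distribution.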
Data collection • The system transparently collects • Every search and the number of results returned • Shopping cart events • Important events such as registration, initiation of checkout, and order confirmation • Any form field failure • User’s local time zone, data for robot detection, color depth, screen resolution
Data collection lessons • Collect the right data, up front • Integrate external events
Data warehouse construction • Lessons • Automatic generation of Decision Support System database is appreciated • Challenges • Firewalls • Integration
Business intelligence lessons • Expect the operational channels to be higher priority than decision support • Crawl, walk, run • Start from basic reporting • Train data analysts • Tell people the time, not how to build clocks • Define the terminology • Writing a good glossary and sharing the terms across reports is important
Business intelligence challenges • Make it easier to map business questions to data transformations • Automate feature construction • Build comprehensible models • Experiment because correlation does not imply causality • Explain counter-intuitive insights • Assess the ROI (return on investment) of insights
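One way to act on "experiment because correlation does not imply causality" is a controlled split test: show a change to a random half of visitors and compare conversion rates. The sketch below uses a standard two-proportion z-test; the function name and the visitor and conversion counts are hypothetical, and the slides do not say which test was used.

```python
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value under the normal approximation.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: control (A) versus a changed page (B).
z, p_value = two_proportion_z(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Because visitors are assigned at random, a small p-value here supports a causal effect of the change, which an observed correlation in historical data cannot do.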
Deployment • Lessons • Share insights • Take action • Challenges • Have transformed data available for scoring
Technical details (1) - data definition, collection, and preparation • Data collection and management • Data cleansing • Data processing
Data collection and management • Lessons • Collect data at the right abstraction levels • Design forms with data mining in mind • Validate forms to ease data cleansing and analysis • Determine thresholds based on careful data analysis • Example: session timeout
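For the session timeout example, a threshold can be checked against the data by counting how many sessions each candidate value produces. This is a minimal sketch under assumed inputs: the request timestamps and the candidate timeouts are made up for illustration.

```python
def sessions_for_timeout(timestamps, timeout_seconds):
    """Count sessions when a gap longer than timeout_seconds starts a new one."""
    if not timestamps:
        return 0
    ordered = sorted(timestamps)
    sessions = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > timeout_seconds:
            sessions += 1
    return sessions

# Hypothetical request times (in seconds) for a single visitor.
requests = [0, 60, 200, 1500, 1560, 4000, 4100, 9000]
for minutes in (10, 30, 60):
    print(minutes, "min ->", sessions_for_timeout(requests, minutes * 60), "sessions")
```

Running this over real click streams shows how sensitive session counts are to the threshold, which is the kind of analysis the lesson asks for before fixing a value.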
Data collection and management • Challenges • Sample at collection • Support slowly changing dimensions • Perform data warehouse updates effectively
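A slowly changing dimension is an attribute, such as a customer's city, that changes occasionally while analyses still need its historical values. One common approach, often called "Type 2", keeps a dated row per version. The sketch below assumes that approach and a hypothetical row layout; the slides do not say how Blue Martini handled it.

```python
from datetime import date

def apply_change(history, customer_id, new_city, change_date):
    """Close the customer's current row and append a new version."""
    for row in history:
        if row["customer_id"] == customer_id and row["valid_to"] is None:
            if row["city"] == new_city:
                return history  # value unchanged, keep the current row open
            row["valid_to"] = change_date
    history.append({"customer_id": customer_id, "city": new_city,
                    "valid_from": change_date, "valid_to": None})
    return history

def city_as_of(history, customer_id, when):
    """Look up the value that was valid on a given date."""
    for row in history:
        if (row["customer_id"] == customer_id
                and row["valid_from"] <= when
                and (row["valid_to"] is None or when < row["valid_to"])):
            return row["city"]
    return None

history = []
apply_change(history, 1, "Boston", date(2001, 1, 1))
apply_change(history, 1, "Denver", date(2002, 6, 1))
print(city_as_of(history, 1, date(2001, 12, 31)))
print(city_as_of(history, 1, date(2002, 6, 1)))
```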
Data cleansing • Lessons • Audit the data
Data cleansing • Challenges • Detect bots • Between 5% and 40% of visits are due to bots • Perform regular de-duping of customers and accounts • Many-to-many relationship
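Bot detection usually starts with simple heuristics before anything more elaborate. The sketch below is an assumption-laden illustration: the session fields, the user-agent tokens, and the request-rate threshold are all hypothetical and would need tuning against real traffic.

```python
def looks_like_bot(session):
    """Flag a session as a likely bot using a few simple heuristics."""
    ua = session.get("user_agent", "").lower()
    # Heuristic 1: self-identifying crawlers.
    if any(token in ua for token in ("bot", "crawler", "spider")):
        return True
    # Heuristic 2: humans rarely request robots.txt.
    if session.get("requested_robots_txt", False):
        return True
    # Heuristic 3: implausibly fast browsing (hypothetical threshold).
    duration = max(session.get("duration_seconds", 0), 1)
    return session.get("page_views", 0) / duration > 2.0

# Hypothetical sessions.
sessions = [
    {"user_agent": "Mozilla/5.0", "page_views": 12, "duration_seconds": 600},
    {"user_agent": "ExampleBot/1.0", "page_views": 40, "duration_seconds": 300},
    {"user_agent": "Mozilla/5.0", "page_views": 500, "duration_seconds": 60},
]
flags = [looks_like_bot(s) for s in sessions]
bot_share = sum(flags) / len(flags)
print(flags, bot_share)
```

Heuristics like these miss bots that imitate browsers, which is why the slide lists detection as an open challenge.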
Data processing • Lessons • Support hierarchical attributes • Handle cyclical attributes • Support rich data transformations
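Cyclical attributes such as hour of day wrap around: 23:00 and 00:00 are one hour apart, but their raw values differ by 23. One widely used encoding, shown here as an assumption and not as the approach described in the talk, maps each value to a point on a circle.

```python
import math

def encode_hour(hour):
    """Map an hour (0-23) to (sin, cos) coordinates on the unit circle."""
    angle = 2 * math.pi * (hour % 24) / 24
    return (math.sin(angle), math.cos(angle))

def cyclical_distance(h1, h2):
    """Euclidean distance between two hours in the encoded space."""
    (s1, c1), (s2, c2) = encode_hour(h1), encode_hour(h2)
    return math.hypot(s1 - s2, c1 - c2)

print(cyclical_distance(23, 0))   # small: adjacent hours
print(cyclical_distance(0, 12))   # largest possible: opposite sides of the clock
```

The same idea applies to day of week or month of year by changing the period from 24 to 7 or 12.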
Data processing • Challenges • Support hierarchical attributes • Handle “unknown” and “not applicable” attribute values • NULL
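A single NULL cannot distinguish a value that exists but was not collected ("unknown") from one that does not exist at all ("not applicable"). A minimal sketch of keeping the two apart follows; the `Missing` enum and the spouse-age attribute are hypothetical examples.

```python
from enum import Enum

class Missing(Enum):
    UNKNOWN = "unknown"                # a value exists but was not collected
    NOT_APPLICABLE = "not applicable"  # no value can exist for this record

def spouse_age(customer):
    """Return the spouse's age, or the reason it is missing."""
    if not customer["married"]:
        return Missing.NOT_APPLICABLE
    age = customer.get("spouse_age")
    return age if age is not None else Missing.UNKNOWN

print(spouse_age({"married": False}))
print(spouse_age({"married": True}))
print(spouse_age({"married": True, "spouse_age": 41}))
```

Keeping the distinction lets an analysis drop "not applicable" rows from an average while still reporting how many values are simply unknown.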
Technical details (2) - Analysis • Understanding and enriching the data • Building models and identifying insights • Deploying models, acting upon the insights, and closing the loop • Empowering business users to conduct their own analysis
Understanding and enriching the data • Lessons • Statistics • Distributions, min, max, mean, number of NULL and non-NULL • Weighted average • Visualization • Line chart, bar chart, scatter plot, heatmap, filter chart
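The basic statistics listed above are cheap to compute and often reveal data problems early. The sketch below is illustrative: the order amounts, daily conversion rates, and visit counts are invented, and `None` stands in for NULL.

```python
def summarize(values):
    """Min, max, mean, and NULL / non-NULL counts for one column."""
    present = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(present),
        "non_null_count": len(present),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "mean": sum(present) / len(present) if present else None,
    }

def weighted_average(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

order_amounts = [20.0, None, 50.0, 80.0]
stats = summarize(order_amounts)
print(stats)

# Daily conversion rates weighted by the number of visits each day.
overall_rate = weighted_average([0.02, 0.04], [1000, 3000])
print(overall_rate)
```

The weighted average matters here because the plain mean of the two daily rates (0.03) would understate the overall rate when the busier day converts better.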
Building models and identifying insights • Lessons • Mine data at the right granularity levels • Handle leaks in predictive models • Leaks are attributes highly correlated with the target but not useful in practice as good predictors • Improve scalability • Build simple models first • Use data mining suites • Peel the onion and validate results
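A simple screen for leaks is to flag attributes whose correlation with the target is suspiciously high and review them by hand. The sketch below assumes numeric columns and a hypothetical 0.95 threshold; the column names and values are invented, and a flagged attribute still needs human judgment about whether it is really a leak.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def suspected_leaks(columns, target, threshold=0.95):
    """Names of attributes whose |correlation| with the target is too good."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]

# Hypothetical data: the target is whether the visitor purchased.
purchased = [1, 0, 1, 0, 1, 0]
columns = {
    # Only purchasers ever see the confirmation page: a classic leak.
    "saw_order_confirmation": [1, 0, 1, 0, 1, 0],
    "page_views": [12, 9, 7, 11, 15, 4],
}
leaks = suspected_leaks(columns, purchased)
print(leaks)
```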
Sharing insights, deploying models, and closing the loop • Lessons • Represent models visually for better insights • Understand the importance of the deployment context • Create actionable models and close the loop
Empowering business users to conduct their own analysis • Lessons • Share the results among business users via simple, easy-to-understand reports • Provide canned reports that business users can run by simply specifying values for a few parameters • Technically savvy business users might be comfortable designing their own investigations, given a simple user interface
Empowering business users to conduct their own analysis • Challenges • Visualize models • Prune rules and associations • Analyze and measure long-term impact of changes
Summary • Top three lessons • Integrate data collection into operations to support analytics and experimentation • Do not confuse yourself with the target user • Provide simple reports and visualizations before building more complex models • Top three challenges • The ability to translate business questions to the desired data transformations • Efficient algorithms whose output is comprehensible for business insight, and which can handle multiple data types • Integrated workflow