Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing. Bhagi Narahari. Outline of Lecture. What and Why of Data Mining and KDD? Importance and Applications to E-commerce How ? Personalization personalized one-to-one business on the internet
Data Mining & Knowledge Discovery: Personalization Technologies for One to One Marketing Bhagi Narahari
Outline of Lecture • What and Why of Data Mining and KDD? • Importance and Applications to E-commerce • How? • Personalization • personalized one-to-one business on the internet • Part 1: Overview of Personalization • Part 2: The Data Mining Process
Predictive Modelling • A “black box” that makes predictions about the future based on information from the past and present • [Figure: inputs (age, balance, income) → Model (crystal ball?) → output: how much will the customer spend on the next catalog order?]
What is Data Mining? • It is the exploration and analysis, by automatic or semiautomatic means, of large quantities of data in order to discover meaningful patterns and rules.
Why now? (A historical perspective) • Because data is now available (it wasn’t always) • Distributed sources • Technology evolution • Competition (do what you can to outdo competitors)
Why DM? • CRM (Customer Relationship Management) - important success factor in E-commerce • price differentiation no longer enough • customer service more important • Links with suppliers already exist (B2B) - JIT, joint forecasting, planning, procurement • Current emphasis on links with customers - feedback, input in design, etc.
CRM • Identifying profitable customers • Better service for more valued customers • Retaining profitable customers • Getting a new customer costs a lot more than retaining an existing one • acquiring a new customer costs about 5X as much as retaining one (Peppers & Rogers) • An increase from 75% to 80% in retention reduces costs by about 10% • Larger share of customer pool
CRM • Product differentiations based on “price” and “quality” are increasingly difficult • need to differentiate based on relationships • Increasingly sophisticated mass marketing increases probability of success • cost of mass marketing is driven down by internet (reach)
CRM • Goal: Positively interact with your customers and prospects • define customer segments • lights out execution of campaigns against segments • attribution and evaluation of responses
Personalization in E-commerce • Positive: • much better chance of personalization • customer identification • tracking across visits and within a visit • ability to do ‘what if’ experiments • Negative: • the customer’s cost of switching is much lower • is web-based shopping good for ‘touchy-feely’ things? • price differentiation across geographies is not easy
Personalization • [Figure: customer chain and producer chain shown side by side] • Customer Chain: Product Discovery, Product Evaluation, Terms Negotiation, Order Placement, Order Payment, Customer Service & Support • Producer Chain: Market Research, Market Stimulation/Education, Terms Negotiations, Order Receipt, Order billing and payment management, Customer Service & Support
B2C Personalization Objectives • Know the customer • profile - registration, cookies • Determine what the customer wants • Ask: Questionnaires • what is the incentive for truthfulness? • Deduce: click streams, history, collaborative filtering (Amazon!!) • Deliver • Customize the look and feel • offer special promotions • offer customized products (Holy Grail)
Use of Personalization • In addition to storing and retrieving information on the individual’s profile “on the fly” • can also use mining software to analyze the information in the database to make recommendations or comments specific to the individual
Impact of Personalization • Customer relationship • Learn more about customers • understand why and how they prefer to do business with your organization • In tandem with tracking, provides you with a tool to monitor your website • what works, what doesn’t, what makes your audience “click”
Security and Privacy as Barrier to Personalization • Large number of customers are concerned about personalization (DoubleClick!) • will they pay more to preserve privacy? • Some falsify info to preserve privacy • customers give more info to a trusted site • need a secure site with clear privacy policies stated at the site
Personalization • [Figure: flow from knowing the customer, to predicting the wants, to giving the customer his/her wants] • Know the Customer • Identify: Login, Credit Card # • Profile: Questionnaires, Past history, Click Streams • Predicting the wants • Extrapolation from past • Mapping to “peers” • Extrapolation from peers (firefly.com) • Give the customer his/her wants • Product selection & promotions • Look & feel • New Product
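The “mapping to peers” and “extrapolation from peers” idea can be illustrated with a small user-based collaborative filtering sketch. This is only one possible reading of the slide, not the method used by firefly.com or Amazon; the customer names, purchase sets, the Jaccard similarity measure, and the `recommend` helper are all assumptions made for illustration.

```python
from collections import Counter

def jaccard(a: set, b: set) -> float:
    """Overlap between two purchase sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target: str, purchases: dict, k: int = 2) -> list:
    """Suggest items bought by the k customers most similar to `target`."""
    mine = purchases[target]
    # Rank the other customers ("peers") by similarity to the target.
    peers = sorted(
        (c for c in purchases if c != target),
        key=lambda c: jaccard(mine, purchases[c]),
        reverse=True,
    )[:k]
    # Count items the peers bought that the target has not bought yet.
    votes = Counter(item for p in peers for item in purchases[p] - mine)
    # Most-voted items first; ties are broken alphabetically.
    return [item for item, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical purchase histories.
purchases = {
    "howard": {"tie", "dress shirt"},
    "john": {"tie", "dress shirt", "cufflinks"},
    "ray": {"toy robot", "gadget"},
    "amy": {"tie", "scarf"},
}
print(recommend("howard", purchases))
```

Here the two nearest peers of `howard` are `john` and `amy`, so the suggestions come from their baskets rather than from `ray`, who shares no purchases with him.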
Know the customer • Cookies • backlash (users do not trust them) • OPS: Open Profiling Standard • combined with eTrust certification • Registration • User certificates: logons • Key Question: • how do you know that this online customer is the same one who goes to your storefront? • need standard warehouse techniques like address resolution, credit card resolution, etc.
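A minimal sketch of address resolution, assuming the goal is to decide whether a web registration and a storefront record refer to the same customer. The abbreviation table, the record fields (`address`, `zip`), and the exact-match rule are hypothetical; production warehouse tools use far richer matching.

```python
import re

# Assumed abbreviation table; a real one would be much larger.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}

def address_key(address: str) -> str:
    """Normalize case, punctuation, and common abbreviations into a match key."""
    words = re.sub(r"[^a-z0-9 ]", " ", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def same_customer(a: dict, b: dict) -> bool:
    """Treat two records as one customer if normalized address and ZIP agree."""
    return address_key(a["address"]) == address_key(b["address"]) and a["zip"] == b["zip"]

# Hypothetical records from two systems.
web = {"address": "123 Main Street, Apt. 4", "zip": "20052"}
store = {"address": "123 MAIN ST APT 4", "zip": "20052"}
print(same_customer(web, store))
```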
Know the Customer: OPS • Two drivers • users should not have to retype basic info again and again • users need assurance that data is used in a trusted fashion (not leaked, other data not seen, etc.) • Two parts • Common data • demographics (country, zip, age, gender) • Contact (name, address, credit card…) • User agent preferences • Per-site sections (can be shared across sites, if the user allows)
What if no profile??? • Deduce • collect information: history of purchases, time spent on pages • ask questions (offer rewards) • combine with database marketing data • Predict behaviour • buy probabilities • build customer relationship • mining is key!
Personalization: Actions to take - Look and feel • Personalized pages • specific data • specific presentation and design • sent through various mediums • Manage customers, not products: 1-1 marketing • Strategy.com • deliver personalized pages • e.g.: stock portfolio, personal info including alarm, travel reservations • use different mediums • WAP-enabled phones (e.g.: Sprint PCS Web)
Storefront Personalization • Customers visit Store Website • Howard buys ties • Rob buys Baby Products • Ray buys toys • Amy buys clothes • Provide a view of the store to these customers • present them with what they are likely to buy? • Howard: ties, and men’s formal wear • Ray: Toys and gadgets • Rob: Infant, Toddler section • Amy: Women’s Clothes section
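One simple way to pick a customer’s view of the store is to rank store sections by how often the customer has bought from them. The sketch below assumes a hypothetical item-to-section catalog and purchase history; it is an illustration of the idea, not a description of any particular storefront product.

```python
from collections import Counter

def storefront_view(history: list, catalog: dict, top_n: int = 2) -> list:
    """Return the store sections a customer has bought from most often."""
    counts = Counter(catalog[item] for item in history if item in catalog)
    return [section for section, _ in counts.most_common(top_n)]

# Hypothetical mapping from items to store sections.
catalog = {
    "silk tie": "Men's formal wear",
    "dress shirt": "Men's formal wear",
    "rattle": "Infant & Toddler",
    "toy car": "Toys and gadgets",
    "blouse": "Women's clothes",
}

# Hypothetical purchase history for Howard.
howard = ["silk tie", "dress shirt", "toy car"]
print(storefront_view(howard, catalog))
```

Sections with the most past purchases come first, so Howard’s landing page would lead with men’s formal wear.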
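More Actions: Product Presentations & Promotions • [Figure: basic storefront product hierarchy. Clothes → Men’s, Women’s, Children’s; lower-level categories: Casuals, Evening, Shirts, Pants, Infants, Kids. “Mary’s View” and “John’s View” each cover a different part of the hierarchy.]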
BroadVision.com • BroadVision One-to-One application • allows businesses to develop and manage personalized web sites • interactively profiles each visitor and dynamically matches info based on their profile and on business rules specified by providers of the site & services • users do not have to jump through hoops to find relevant data
DM Terminology • Rule Based Systems • OLAP • Data Marts • ROLAP • SQL • Neural Networks • Data Warehouse • Data Stores • Data Mining • Genetic Algorithms
How? • Determine probability of buying as a function of customer attributes such as age, income, and past buying patterns • Target customers by ranking from highest to lowest probabilities • Other techniques: Decision Trees, Neural Networks, etc.
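The scoring-and-ranking idea can be sketched with a logistic function. The slide does not name a specific model, so the logistic form, the coefficient values, and the customer records below are assumptions for illustration; in practice the coefficients would be fitted to past responses.

```python
import math

# Hypothetical coefficients (not fitted to any real data).
INTERCEPT = -4.0
WEIGHTS = {"age": 0.02, "income_k": 0.03, "past_orders": 0.5}

def buy_probability(customer: dict) -> float:
    """Logistic score: probability of buying as a function of attributes."""
    z = INTERCEPT + sum(weight * customer[name] for name, weight in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def rank(customers: list) -> list:
    """Order customers from highest to lowest buy probability."""
    return sorted(customers, key=buy_probability, reverse=True)

# Hypothetical customers (income in thousands).
customers = [
    {"id": "A", "age": 30, "income_k": 50, "past_orders": 0},
    {"id": "B", "age": 45, "income_k": 80, "past_orders": 4},
    {"id": "C", "age": 60, "income_k": 40, "past_orders": 1},
]
for c in rank(customers):
    print(c["id"], round(buy_probability(c), 3))
```

A campaign would then target customers from the top of the ranked list down to whatever depth the budget allows.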
KDD • Knowledge Discovery in Databases • It is the process of identifying valid, novel, potentially useful, and understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth) • It involves data preparation, pattern extraction, knowledge evaluation, and refinement, in iteration
KDD • Data mining is a step in the KDD process that involves the application of certain algorithms to extract patterns • Steps in the KDD process: • Select Data • Data Cleansing and Pre-processing • Data Mining • Results interpretation • Implementation
Pre-processing in KDD • 80-90% of KDD process is spent here • Why? • Operational data is incomplete, inconsistent, in different formats across systems • DM techniques might require data in a specific format
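As a small illustration of why pre-processing takes so long, the sketch below brings records from two systems into one format. The field names, the list of date formats, and the cleaning rules are hypothetical; real operational data needs many more such rules.

```python
from datetime import datetime

# Assumed date formats used by the different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def parse_date(raw: str):
    """Try each known format; return None when the value cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None

def clean(record: dict) -> dict:
    """Normalize identifiers, dates, and state codes into one consistent form."""
    return {
        "customer_id": str(record.get("customer_id", "")).strip().upper(),
        "signup": parse_date(record.get("signup") or ""),
        "state": (record.get("state") or "").strip().upper() or None,
    }

# Hypothetical rows describing the same customer in two systems.
billing = {"customer_id": " c-101 ", "signup": "03/05/1999", "state": "va"}
web = {"customer_id": "C-101", "signup": "1999-03-05", "state": None}
print(clean(billing))
print(clean(web))
```

After cleaning, both rows share the same `customer_id` and `signup` values, and the missing state is explicitly marked as `None` instead of being silently inconsistent.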
Data Mining Problems • Classification/Segmentation • Binary (Yes/No) • Multiple Category (Large/Medium/Small) • Forecasting (how much) • Association Rule extraction (market basket analysis) • Sequence detection • balance increase -> missed payment -> default
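Sequence detection can be sketched as counting how many customers’ event histories contain a pattern in order, with other events allowed in between. The event names follow the slide’s example; the histories and the `support` measure are assumptions for illustration.

```python
def contains_sequence(events: list, pattern: list) -> bool:
    """True if `pattern` occurs in `events` in order (gaps are allowed)."""
    remaining = iter(events)
    # `step in remaining` advances the iterator until it finds the step,
    # so later steps can only match later events.
    return all(step in remaining for step in pattern)

def support(histories: dict, pattern: list) -> float:
    """Fraction of customers whose history contains the pattern."""
    hits = sum(contains_sequence(events, pattern) for events in histories.values())
    return hits / len(histories)

pattern = ["balance increase", "missed payment", "default"]

# Hypothetical account histories.
histories = {
    "c1": ["balance increase", "missed payment", "default"],
    "c2": ["balance increase", "payment", "missed payment", "payment"],
    "c3": ["missed payment", "balance increase", "default"],
    "c4": ["balance increase", "payment", "missed payment", "late fee", "default"],
}
print(support(histories, pattern))
```

Customer `c3` has all three events but in the wrong order, so only `c1` and `c4` count toward the pattern.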
Typical DM tasks • Prediction and Classification • Directed • Decision trees, Neural networks, memory based reasoning, logistic regression • Examples: • How many units will be sold on a given day? • What will be the stock price on a given day? • Will a customer buy the product or not?
DM tasks • Affinity grouping • Undirected • Which products go together naturally? • The beer-diaper syndrome? • Market basket analysis • Examples: • Which products peak in demand simultaneously?
DM tasks • Clustering task • Undirected • Segmenting into similar clusters • Different from classification • Examples • Customers with similar buying profiles • Products with similar demand patterns
DM success factors • Integration with data warehouses and DSS • Users should develop a good understanding of techniques • Recognize that these tools cannot automatically find patterns without being told what to do • Most methods now used are extensions of analytical methods that have been around for decades
Legal and Ethical Issues • Privacy concerns • becoming more important • will impact the way that data can be used and analyzed • ownership issues • European data laws have implications for the US • Often data included in the data warehouse cannot legally be used in the decision making process • Race, Gender, Age • Data contamination will become critical
Making Decisions • [Figure: multiple Data sources → Data Warehouse? → Models → Decisions]
Data Warehouse • Bill Inmon: “A data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions.” • A data warehouse is managed data that is situated after and outside the operational systems
Data Warehousing • Increasing need to find, summarize, and interpret large amounts of data effectively • Especially when data is distributed across many different databases • Transaction processing systems not easily accessible to other systems • Plus TP systems have time constraints
Enter the Data Warehouse • To deliver decision data to decision makers • by integrating data from various TPS to a single storage which can then • feed a range of decision support applications • through an OLAP interface!
Data Complications • Noise • Missing data • Transformation • numeric data • text • Need to differentiate between variables you can control and those you cannot • Actionable: size of discount, number of offers etc. • Non-actionable: age, income ..
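Two of the complications above, missing data and transformation of text into numeric form, can be sketched as follows. Median imputation and one-hot encoding are common choices but are assumptions here, since the slide does not prescribe a technique; the sample values are hypothetical.

```python
from statistics import median

def impute_median(values: list) -> list:
    """Replace missing entries (None) with the median of the known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

def one_hot(labels: list) -> list:
    """Turn text categories into 0/1 columns, one column per category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

# Hypothetical income values (in thousands) with one missing entry.
incomes = [40, None, 60, 100]
print(impute_median(incomes))

# Hypothetical purchase channels; columns are ordered alphabetically.
channels = ["web", "store", "web"]
print(one_hot(channels))
```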
Data Mining Techniques • Market Basket Analysis • Memory Based Reasoning • Cluster Detection • Link Analysis • Decision Trees and Rule Induction • Neural Networks • Genetic Algorithms • OLAP
OLAP: On Line Analytical Processing • While a data warehouse brings data together, OLAP lets you look at data and manipulate interactively • OLAP allows users to “slice and dice” data • Allows user to drill-down into detail data
Multidimensional Terminology • East, West, Central are input members of the Region dimension. Total Region is an output member of the Region dimension. Similarly, Nuts, Screws, Bolts, Washers, and Total are members of the Product dimension. • Variables are typically numerical measures like Sales, Costs, Profits, Expenses, and so forth. • Dimensions are roughly equivalent to Fields in a relational database. Cells are roughly equivalent to Records.
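The terminology can be made concrete with a tiny in-memory cube. The sketch stores one variable (Sales) in cells keyed by (Region, Product), computes output members such as Total Region by rolling up, and slices by fixing one dimension. The sales figures and the helper names `roll_up` and `slice_cube` are hypothetical, and a real OLAP server works very differently internally.

```python
from collections import defaultdict

DIMENSIONS = ("Region", "Product")

# Hypothetical Sales cells keyed by (Region member, Product member).
sales = {
    ("East", "Nuts"): 100,
    ("East", "Bolts"): 50,
    ("West", "Nuts"): 70,
    ("West", "Screws"): 30,
    ("Central", "Bolts"): 20,
}

def roll_up(cells: dict, keep: str) -> dict:
    """Total the variable over every dimension except `keep`."""
    axis = DIMENSIONS.index(keep)
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[key[axis]] += value
    return dict(totals)

def slice_cube(cells: dict, dimension: str, member: str) -> dict:
    """Fix one dimension to a single member and keep only the matching cells."""
    axis = DIMENSIONS.index(dimension)
    return {key: value for key, value in cells.items() if key[axis] == member}

print(roll_up(sales, "Product"))             # Total Region for each product
print(roll_up(sales, "Region"))              # Total Product for each region
print(slice_cube(sales, "Region", "East"))   # the East slice
```

Drilling down is the reverse of rolling up: starting from a total such as East, the user returns to the individual cells that produced it.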
Steps in DW and OLAP • [Figure: multiple Data sources → Data Loader → Data Converter → Data Scrubber → Data Transformer → Data Warehouse → OLAP Server → OLAP Interface]
Cluster Detection • Undirected data mining • Finds records that are similar to each other (clusters) • Clusters are found using geometric methods, statistical methods, and neural networks • Good way to start any analysis
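As an example of a geometric method, the sketch below runs k-means (Lloyd’s algorithm) on two-dimensional points. The slide does not name k-means specifically, so the choice of algorithm, the starting centroids, and the customer data are assumptions for illustration.

```python
import math

def kmeans(points: list, centroids: list, iterations: int = 10):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-center."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if it has none.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (orders per year, average basket size).
points = [(1, 2), (2, 1), (1, 1), (8, 9), (9, 8), (9, 9)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
print(clusters)
```

The result depends on the starting centroids and on the number of clusters chosen, which is one reason clustering is treated as a starting point for analysis rather than a final answer.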
Market Basket Analysis • Form of clustering used for finding items that occur together (in a transaction or market basket) • Expresses the likelihood of different products being purchased together as rules • Uses: planning store layouts, limiting specials to one of the products in a set, ...
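A rule such as “beer → diapers” is usually scored with support, confidence, and lift. The sketch below computes these standard measures over a handful of hypothetical baskets; the basket contents are invented for illustration.

```python
def support(baskets: list, items) -> float:
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets: list, antecedent, consequent) -> float:
    """Of the baskets with the antecedent, the fraction that also have the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

def lift(baskets: list, antecedent, consequent) -> float:
    """Confidence relative to how common the consequent is overall (>1 = positive link)."""
    return confidence(baskets, antecedent, consequent) / support(baskets, consequent)

# Hypothetical transactions.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]
print(support(baskets, {"beer", "diapers"}))
print(confidence(baskets, {"beer"}, {"diapers"}))
print(lift(baskets, {"beer"}, {"diapers"}))
```

A lift above 1 suggests the two products are bought together more often than chance would predict, which is the kind of rule used for store layout and promotion decisions.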