Statistical Techniques

MAR 6648: Marketing Research, February 1, 2011

Overview • We’ll talk about basic statistical tools • T-tests, crosstabs, and regression are useful tools • We’ll talk about what they can and can’t do • More sophisticated tools can give a deeper view of your customers • Conjoint analysis, cluster analysis, and factor analysis can help you understand who your customers are and what they like

A Quick Note on Data Analysis • Statistics are just one part of an argument • People are easily persuaded by numbers and statistics • The more complicated the analysis, the less likely it is to be challenged • The strongest challenge to many statistical arguments is not in how the data are analyzed, but in how the data are collected • Methodological expertise always trumps data analytic experience • Data analytic knowledge allows for more careful consideration of methodology

Really Basic: Comparing Groups • In marketing we often have a need to understand differences between groups • Segmentation • Are two or more segments really different along some dimension of behavior or attitude? • Experiments • Did the treatment work? • We need a systematic approach that allows us to say when two (or more) groups of customers, companies, markets, etc. really are different

Most Basic: t-tests • Do web shoppers pay a different price for cars than dealership shoppers? • Do a hypothesis test: • Null hypothesis: $H_0: \mu_{online} = \mu_{dealer}$ • Alternative: $H_a: \mu_{online} \neq \mu_{dealer}$

T-test Results • “Customers who bought their new vehicles on the Auto Online website report having paid less for their vehicles than did customers who purchased their vehicles at the dealership (M_online = $11,582 vs. M_dealer = $13,594), t(1398) = -6.14, p < .001.” • If the p-value of the test is “small,” we reject the null hypothesis • Here “small” typically means less than 5% (p < .05) • Now try answering a different question: • Are customers who purchase a car online more likely to buy their next car online as well?
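The t-test above can be sketched in a few lines. This is a minimal sketch, assuming the pooled-variance (equal-variance) version of the two-sample test; if that is the version on the slide, t(1398) implies 1,400 customers in total, since df = n1 + n2 − 2. The p-value here uses a normal approximation to the t distribution, which is only reasonable for large df; the price lists are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Two-sample t-test assuming equal variances (pooled variance).

    Returns (t, df, p_approx). p_approx is a two-sided p-value from the
    normal approximation to the t distribution: use it only for large df.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / se
    # Two-sided normal tail area: P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_approx


# Hypothetical prices paid (in dollars), for illustration only
online = [11200, 11900, 11500, 11650, 11400]
dealer = [13400, 13800, 13550, 13700, 13500]
print(pooled_t_test(online, dealer))
```

A negative t means the first group (online) paid less on average, matching the sign of the slide's t(1398) = -6.14. For small samples, a statistics package's exact t distribution should be used instead of the normal approximation.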

Understanding Associations • One of the most common questions in Marketing Research: • Are two (or more) variables associated? [Table: customer type by subsequent transaction]

Tools for Analyzing Associations • Cross tabulation • Only for two categorical variables • Easy to understand • Regression • Applies to any number of variables • Not necessarily categorical variables • Slightly harder to understand

χ²-test for Association • We can do a statistical test here • The null hypothesis is that there is no association between method of first purchase and method of subsequent purchase • This means that the percentage of people who buy their next car online is the same regardless of how they purchased their previous car • Again, if the p-value of the test is less than .05, we reject the null hypothesis

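Intuition for χ²-test • The χ²-test is based on comparing the actual cell counts to what we would expect them to be if there were no association • Step 1: compute the marginal proportions from the table totals (e.g., 333/500 = 0.666 and 154/500 = 0.308, so the complementary share is 1 − 0.308 = 0.692) • Step 2: multiply the marginal proportions to get each cell's expected proportion under no association (e.g., 0.692 × 0.666 ≈ 0.461) • Step 3: multiply each expected proportion by N to get the expected count (e.g., 0.231 × 500 = 115.5) • [Table: expected cell counts, i.e., what the table would look like if there were no association]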

Intuition for χ²-test (continued) • [Tables: actual vs. expected cell counts] • Conclusion: Based on these data, customers who purchased a car online do not appear to be more likely to buy their next car online than customers who bought their initial car from a dealer, χ²(1, N = 500) = 3.002, p = .08. Because p > .05, we fail to reject the null hypothesis of no association.
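The expected-count logic from the χ² steps can be written out directly. This is a minimal sketch for a 2 × 2 table; the slide's underlying counts are not given, so the table in the usage lines is hypothetical. It computes the plain Pearson statistic; some statistics packages apply Yates's continuity correction to 2 × 2 tables by default, which gives a smaller value. For 1 degree of freedom the p-value has a closed form, because a χ²(1) variable is a squared standard normal.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of association for a 2x2 table of counts.

    table = [[a, b], [c, d]] (rows: first purchase channel,
    columns: subsequent purchase channel; a hypothetical layout).
    Returns (chi2, expected, p) with p computed for 1 degree of freedom.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected count under no association: row share * column share * N
    expected = [[(r / n) * (c / n) * n for c in col_totals] for r in row_totals]
    # Sum of (actual - expected)^2 / expected over all four cells
    chi2 = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # chi2 with 1 df is Z^2, so P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, expected, p


# Hypothetical counts (N = 500), for illustration only
counts = [[120, 60], [200, 120]]
chi2, expected, p = chi_square_2x2(counts)
print(round(chi2, 3), expected, round(p, 3))
```

Plugging the slide's statistic into the same tail formula, math.erfc(math.sqrt(3.002 / 2)) is about 0.083, consistent with the reported p = .08.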

Crosstabs • Crosstabs are a quick and easy tool for analyzing the association between two categorical variables • Caveats: • You find associations, not causation • An observed association may be driven by a third variable not captured in the analysis • In crosstabs we cannot control for other variables; we need regression for this • Warning: Be careful when cell counts are low. The χ²-test does not work well in this case (a common rule of thumb is an expected count of at least 5 in every cell; stats programs should warn you)

Key Points • T-tests: • Good for analyzing data with a continuous dependent variable and a 2-level categorical variable • Does not extend to more complex designs • Does not allow the analysis to control for the presence of another known variable • Crosstabs • An easy method for describing categorical data • Easily analyzed using simple non-parametric tests (e.g., chi-square) • Poorly suited for handling non-categorical data • Often unable to isolate causation in data

Regression • Regression analysis is widely used in Marketing Research • It can detect associations between variables • It can help make forecasts • It can test Marketing Mix models: Impact of marketing mix variables on sales • It can analyze results of experiments

Example: Minute Maid Sales • Imagine that you’ve been hired as a consultant for the Minute Maid Company • Before an important meeting with senior management, you have been asked to analyze the sales data for MM orange juice for the Southern California market • To assist in your deliberations, some data have become available from one of your key accounts (the largest grocery chain in the market)

Example: Minute Maid Sales • The data were collected from weekly store scanner records that capture information such as sales (# of cartons sold), price, and other promotion information for each product • Management is particularly interested in understanding how different pricing strategies affect sales

The data

Weekly Minute Maid Sales and Price

A Linear Sales Model • We wish to explain variation in sales as a function of price • Assume that sales and price are related as: $S_t = \beta_0 + \beta_1 P_t + \varepsilon_t$ • We have now assumed that sales in week t is a linear function of price plus a random component • We need to estimate β0 and β1 from the data
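Estimating β0 and β1 by ordinary least squares has a closed form for one predictor. This is a minimal sketch of that calculation, including the standard errors and the t-statistic for H0: β1 = 0 that regression output reports; the weekly price and sales numbers in the usage lines are hypothetical.

```python
import math


def simple_ols(x, y):
    """Fit S_t = b0 + b1 * P_t by ordinary least squares.

    Returns (b0, b1, se_b0, se_b1, t_b1); t_b1 tests H0: beta1 = 0.
    Needs at least 3 points that do not all lie exactly on one line.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx  # slope: change in sales per unit change in price
    b0 = y_bar - b1 * x_bar  # intercept: line passes through the means
    # Residual variance uses n - 2 degrees of freedom (two estimated coefficients)
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    se_b0 = math.sqrt(s2 * (1 / n + x_bar ** 2 / sxx))
    t_b1 = b1 / se_b1
    return b0, b1, se_b0, se_b1, t_b1


# Hypothetical weekly prices ($/carton) and cartons sold, for illustration only
prices = [1.79, 1.99, 2.19, 2.29, 2.49, 2.59]
sales = [440, 355, 270, 250, 160, 120]
print(simple_ols(prices, sales))
```

A large |t_b1| (small p-value) is what lets us reject H0: β1 = 0, i.e., conclude that price and sales are linearly related.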

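SPSS Regression output [Figure: annotated SPSS coefficient table for $S_t = \beta_0 + \beta_1 P_t + \varepsilon_t$, marking the estimates b0 and b1, their standard errors (≈ the uncertainty associated with b0 and b1), and the t-statistic and p-value for the test of $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$]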

What does this mean?

Key Points • Regression: • Generates a specific equation describing the relationship between a predictor (e.g., price) and an outcome variable (e.g., sales) • The results can offer precise (if imperfect) prescriptions for managers

Example: Minute Maid Sales • We previously identified a relationship between Minute Maid prices and Minute Maid sales • Essentially, Sales = 1093 − 377 × Price • This model seems a little simplistic • What about accounting for the behavior of competitors? • Regression is good at that too
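To see what the fitted line says in practice, plug prices into it. This sketch assumes price is measured in dollars per carton and sales in weekly cartons (the slide does not state the units), and the prices chosen are illustrative; predictions far outside the price range observed in the data are extrapolation and should be treated with caution.

```python
def predicted_sales(price, b0=1093.0, b1=-377.0):
    """Weekly cartons predicted by the slide's fitted line: Sales = b0 + b1 * price."""
    return b0 + b1 * price


# The slope means each $0.10 price cut adds about 377 * 0.10 = 37.7 cartons per week
for price in (1.50, 2.00, 2.50):
    print(f"${price:.2f} -> {predicted_sales(price):.1f} cartons")
# $1.50 -> 527.5 cartons
# $2.00 -> 339.0 cartons
# $2.50 -> 150.5 cartons
```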

$S_t = \beta_0 + \beta_1 P_{mm,t} + \beta_2 P_{tp,t} + \beta_3 P_{tr,t} + \beta_4 P_{sb,t} + \varepsilon_t$ • Sales = 289 − 479 × MMprice + 131 × TPprice + 175 × TRprice + 144 × SBprice
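With several predictors, least squares solves the normal equations (X′X)b = X′y. The sketch below does this with plain Gaussian elimination, which is fine for a teaching example but not a substitute for a statistics package. As a self-check it uses hypothetical price rows (MM, TP, TR, SB) and generates sales noise-free from the slide's estimates (289, −479, 131, 175, 144), so the fit should recover those numbers.

```python
def ols(X, y):
    """Multiple regression by solving the normal equations (X'X) b = X'y.

    X: list of rows of predictor values; an intercept column of 1s is added.
    Uses Gaussian elimination with partial pivoting (small examples only).
    Returns [b0, b1, ..., bk].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Build X'X and X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    a = [xtx[i] + [xty[i]] for i in range(k)]  # augmented matrix [X'X | X'y]
    # Forward elimination, swapping in the largest pivot for stability
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


# Hypothetical weekly prices: MM, TP, TR, SB
X = [
    [1.8, 2.5, 2.0, 1.5], [2.0, 2.4, 2.1, 1.6], [2.2, 2.6, 1.9, 1.4],
    [1.9, 2.8, 2.2, 1.5], [2.1, 2.3, 2.0, 1.7], [2.4, 2.5, 2.3, 1.6],
    [1.7, 2.7, 1.8, 1.8], [2.3, 2.2, 2.4, 1.3],
]
true_b = [289, -479, 131, 175, 144]  # the slide's estimates, used to generate y
y = [true_b[0] + sum(bj * xj for bj, xj in zip(true_b[1:], row)) for row in X]
print([round(b, 2) for b in ols(X, y)])
```

Dummy-coded predictors (1 = promotion running that week, 0 = not) go into X the same way as prices; their coefficients are then the average change in sales when the promotion is present, holding the other predictors fixed.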

These are dummy-coded variables representing the presence or absence of specific product promotions in the OJ market. Question: Did our Minute Maid promotions positively influence sales, controlling for the presence of other known variables?

Multiple Regression Controlling for the other variables in the model, the advertisement was still effective: an ad increased sales by about 202 units. (Now, given the cost of advertising, you can make a recommendation about whether advertising is a good idea.)
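The recommendation comes down to simple arithmetic: an ad pays for itself when the extra units times the contribution margin per unit exceed the ad's cost. The 202-unit lift is from the slide; the cost and margin below are hypothetical inputs. This sketch ignores uncertainty in the 202 estimate and any longer-run effects of advertising.

```python
def break_even_margin(lift_units, ad_cost):
    """Smallest contribution margin per unit at which an ad pays for itself."""
    return ad_cost / lift_units


def ad_pays_off(lift_units, margin_per_unit, ad_cost):
    """True if the incremental contribution (lift x margin) exceeds the ad's cost."""
    return lift_units * margin_per_unit > ad_cost


LIFT = 202  # extra units per ad, from the regression on the slide
# Hypothetical ad cost and margin, for illustration only
print(break_even_margin(LIFT, ad_cost=300.0))
print(ad_pays_off(LIFT, margin_per_unit=1.25, ad_cost=300.0))  # False: 202 * 1.25 = 252.5 < 300
```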

Multiple Regression What else can we learn? Tropicana ads do not influence Minute Maid sales, but Store Brand ads do. It looks like ads generally decrease price sensitivity. (We would need to test price × ad interactions to learn more.)
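One way to test whether ads change price sensitivity is to add an interaction term: S = b0 + b1·P + b2·Ad + b3·(P × Ad). The price slope is then b1 when no ad runs and b1 + b3 when one does, so a positive b3 alongside a negative b1 would mean sales are less price-sensitive during ads. This sketch shows how to build the interaction column and read the slopes; the coefficients are hypothetical, not estimates from the slides.

```python
def add_interaction(prices, ads):
    """Predictor rows [P, Ad, P * Ad] for a price-by-ad interaction regression.

    ads holds 0/1 dummies (1 = ad running that week).
    """
    return [[p, a, p * a] for p, a in zip(prices, ads)]


def price_slope(b1, b3, ad):
    """Slope of sales with respect to price when Ad = ad (0 or 1)."""
    return b1 + b3 * ad


# Hypothetical coefficients, not estimates from the slides
b1, b3 = -480.0, 150.0
print(price_slope(b1, b3, ad=0))  # -480.0
print(price_slope(b1, b3, ad=1))  # -330.0: less price-sensitive when an ad runs
print(add_interaction([2.0, 1.5], [0, 1]))  # [[2.0, 0, 0.0], [1.5, 1, 1.5]]
```

The rows from add_interaction can be passed to any multiple-regression routine; the test of b3 = 0 is then the test of whether ads change price sensitivity.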

Multiple Regression • Conceptually, the procedure allows you to track multiple variables at once • Track the influence of competition • Control for exogenous factors (e.g., weather, seasonality, etc.) • Adding a variable never lowers the model's fit to the given data (in-sample R² cannot decrease)

Multiple Regression • Pitfalls: • A better fit to the given data does not necessarily make the model better at predicting the future. You can “overfit” the data • Bad things happen when the predictors are strongly related to each other (multicollinearity): coefficient estimates become unstable and their standard errors inflate • It intrinsically assumes that a linear model is a good approximation • It often is • But not always…
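Because R² can only go up as variables are added, it cannot by itself flag overfitting. Adjusted R² is one common guard: it penalizes each added predictor and drops when a new variable adds less than chance would. Checking predictions on held-out data is the more direct test of forecasting ability. The numbers in the usage lines are hypothetical.

```python
def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """R^2 penalized for k predictors (intercept not counted) with n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same R^2 = 0.80 from 30 weeks of data: 2 predictors vs. 10 predictors
print(round(adjusted_r_squared(0.80, n=30, k=2), 3))
print(round(adjusted_r_squared(0.80, n=30, k=10), 3))
```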

Key Points • Regression not only helps make precise predictions, it can simultaneously account for multiple influences • In so doing, it gets much closer to causal inferences (and good market researchers are after causal inferences) • Nevertheless, regression is not a panacea, and should be used as a tool, not the only tool • Nothing fixes poor research design

Specialized Techniques • Research for segmentation decisions • Segmentation is an essential part of the marketing plan, but how do we actually find the segments? • Demographics? • Sometimes useful, but demographics are often a poor predictor of behaviors and attitudes • Attitudes • Segment customers based on attitudinal info (e.g., “optimists” vs. “pessimists,” “leaders” vs. “followers”) • Benefits • Segment customers based on benefits sought from product/service • Behavior • Segment customers based on similar behavior (e.g., “heavy users,” “light users”)

Cluster Analysis • Cluster analysis is a technique used to identify groups of ‘similar’ customers in a market (i.e., market segmentation). • If some customers are very similar to one another but different from other (groups of) customers, cluster analysis can help you identify these (multiple) segments. [Figure: customers plotted by brand loyalty and price sensitivity]

Cluster Analysis • What is it actually doing? • The algorithm measures the “distance” between every pair of points and generates a solution that minimizes distances within clusters and maximizes distances between clusters • Note that this language is very close to how you were taught to think about the attributes of good segmentation • What, exactly, is “distance”? • A rare literal example
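The slides don't say which clustering algorithm is used. As one common choice, here is a minimal k-means sketch using Euclidean (straight-line) distance, assuming each customer has two hypothetical scores on a 0 to 1 scale: brand loyalty and price sensitivity. k-means directly minimizes the distances from customers to their own cluster's center; the farthest-first starting rule is an assumption made here so the result is deterministic.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two customers' attribute scores."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def k_means(points, k, max_iter=100):
    """Group points (tuples) into k clusters; returns (centers, clusters)."""
    # Farthest-first start: first point, then repeatedly the point
    # farthest from its nearest already-chosen center
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(euclidean(p, c) for c in centers)))
    for _ in range(max_iter):
        # Assignment step: each customer joins the nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        new_centers = [
            tuple(sum(vals) / len(members) for vals in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # no change: converged
            break
        centers = new_centers
    return centers, clusters


# Hypothetical (brand loyalty, price sensitivity) scores
customers = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
centers, clusters = k_means(customers, k=2)
print(centers)
```

The number of clusters k is the analyst's choice, which is one reason labeling and interpreting clusters requires judgment.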

Cluster Analysis: Baseball • Baseball batters attempt to hit balls to parts of the field without any defensive players. • Baseball coaches have seven players to distribute wherever they want on the field. • Despite this general flexibility, fielders are almost always positioned in the same standard locations. • Is that where batted balls tend to land?

Let’s look at clustering of batted balls for a single player: Chase Utley. [Figure: batted-ball locations for Chase Utley]

Example: Shopping Attitudes • V1: Shopping is fun • V2: Shopping is bad for your budget • V3: I combine shopping with eating out • V4: I try to get the best buys while shopping • V5: I don’t care about shopping • V6: You can save a lot of money by comparing prices

Example: Shopping • Cluster 1: _______________ • Cluster 2: _______________ • Cluster 3: _______________

Key Points • Cluster Analysis allows us to simplify across respondents • When used effectively, it can guide marketing strategy • Nevertheless, it is by no means pure computational science. Identifying and labeling clusters requires some interpretation • This is a strength (in flexibility) • And a weakness

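Clusters versus Factors [Figure: a data matrix with respondents as rows and variables V1, V2, … V20 as columns; factor analysis groups the columns (variables), cluster analysis groups the rows (respondents)]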

Factor Analysis • Factor Analysis can be used for data reduction (i.e., to reduce the number of variables). • Factor analysis: Summarize the information contained in a larger number of variables into a smaller number of ‘factors’ without significant loss of information. • Data reduction is important when you need to measure “fuzzy” concepts like “love,” “trust,” or “satisfaction” • Ask a series of questions that tap into the different components of the concept • Too many variables! Factor analysis can help to reduce this dimensionality problem

Factor Analysis: Intuition • Factor analysis assumes that the correlations among a large number of variables arise because they all depend on the same small number of “factors” • Example: Choice of movies • Suppose individuals choose movies based on two main attributes: • Plot/story line (A1) • Production quality (A2) • Each individual has a preference for A1 and A2

Example: Choice of Movies

Key Points • Factor Analysis allows us to simplify across measures • It helps home in on large, difficult concepts that a single item measures poorly • It has a set of guidelines for interpretation and use (e.g., eigenvalues > 1, KMO > .6), but it is only slightly less flexible than Cluster Analysis
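The "eigenvalues > 1" guideline is applied to the eigenvalues of the items' correlation matrix to decide how many factors to keep. This sketch computes only that step, on simulated ratings built like the movie example: three hypothetical items driven by plot (A1) and three by production quality (A2), plus noise. It does not estimate loadings or rotations, which dedicated factor-analysis routines do; the Jacobi eigenvalue method is used here only because it is short and needs no libraries.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlations between the columns (items) of a respondents x items table."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    centered = [[v - m for v in c] for c, m in zip(cols, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    p = len(cols)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]


def matmul(x, y):
    return [[sum(x[i][m] * y[m][j] for m in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def jacobi_eigenvalues(sym, sweeps=50, tol=1e-12):
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in sym]
    p = len(a)
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(p) for j in range(p) if i != j) < tol:
            break
        for i in range(p):
            for j in range(i + 1, p):
                if abs(a[i][j]) < 1e-15:
                    continue
                # Rotation angle that makes entry (i, j) of G^T A G zero
                theta = 0.5 * math.atan2(2 * a[i][j], a[j][j] - a[i][i])
                c, s = math.cos(theta), math.sin(theta)
                g = [[1.0 if r == q else 0.0 for q in range(p)] for r in range(p)]
                g[i][i], g[j][j], g[i][j], g[j][i] = c, c, s, -s
                gt = [list(r) for r in zip(*g)]
                a = matmul(gt, matmul(a, g))
    return sorted((a[d][d] for d in range(p)), reverse=True)


# Hypothetical ratings: 500 respondents, 6 items driven by two latent attributes
rng = random.Random(7)
ratings = []
for _ in range(500):
    plot, quality = rng.gauss(0, 1), rng.gauss(0, 1)  # latent A1, A2
    ratings.append([plot + rng.gauss(0, 0.5) for _ in range(3)]
                   + [quality + rng.gauss(0, 0.5) for _ in range(3)])

eigs = jacobi_eigenvalues(correlation_matrix(ratings))
n_factors = sum(1 for e in eigs if e > 1)  # Kaiser rule: keep eigenvalues > 1
print([round(e, 2) for e in eigs], "->", n_factors, "factors retained")
```

With two underlying attributes, two eigenvalues should stand well above 1 and the rest fall well below it; the eigenvalues of a correlation matrix always sum to the number of items.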

Key Points • Market research data are often extremely bulky and complicated. We need tools simply to make them comprehensible • Cluster Analysis helps with complexity across consumers, Factor Analysis helps with complexity across measures, and Perceptual maps can helpfully present this information • These analytic tools are well suited to basic strategic concerns • Identifying segments and matching them to preferences and brand perceptions • In combination they are even better • Use these tools carefully: Because there is room for interpretation, there is also room for clumsiness (or deceptiveness)

How do individuals form preferences over a large selection of different brands within a product category?

Think of different brands as different combinations of attributes!

Engine Size   HP    Type    #Doors   Brand    Price
2.5L          184   Sedan   4        BMW      $27,800
4.0L          203   SUV     2        Ford     $21,715
6.0L          316   SUV     4        Hummer   $48,455
3.0L          215   Sedan   4        Lexus    $29,435
…             …     …       …        …        …
2.4L          157   Sedan   4        Toyota   $18,970

Attribute-based approach • Think of a product (a certain car) as a bundle of attributes. • A consumer prefers a certain car, car A, to another, car B, because the attributes of car A are more appealing to the consumer than the attributes of car B. • Suppose we assume that consumers form preferences over brands implicitly by forming preferences for the attributes of which the brands consist. • So if we present consumers with lists of attribute combinations, they can rank them.
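The attribute-based assumption is usually made concrete as an additive model: a product's overall utility is the sum of "part-worths" for each of its attribute levels, and consumers rank products by that utility. This sketch runs in the forward direction with hypothetical part-worths for one consumer and hypothetical car profiles. Conjoint analysis works in reverse: it observes the rankings and estimates the part-worths, often by regression on dummy-coded attribute levels.

```python
# Hypothetical part-worths (utility contributions) for one consumer
PART_WORTHS = {
    "type": {"Sedan": 0.4, "SUV": 0.1},
    "doors": {2: -0.3, 4: 0.2},
    "brand": {"BMW": 0.6, "Ford": 0.0, "Toyota": 0.3},
    "price": {"low": 0.5, "high": -0.5},
}


def utility(profile):
    """Additive model: a profile's utility is the sum of its attribute-level part-worths."""
    return sum(PART_WORTHS[attr][level] for attr, level in profile.items())


# Hypothetical product profiles (bundles of attribute levels)
profiles = {
    "A": {"type": "Sedan", "doors": 4, "brand": "Toyota", "price": "low"},
    "B": {"type": "SUV", "doors": 2, "brand": "Ford", "price": "low"},
    "C": {"type": "Sedan", "doors": 4, "brand": "BMW", "price": "high"},
}
ranking = sorted(profiles, key=lambda name: utility(profiles[name]), reverse=True)
print(ranking)  # ['A', 'C', 'B']
```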
