Learn about the quality of secondary data, different types of measures, and how to test hypotheses using chi-square in market intelligence.
Secondary Data, Measures, Hypothesis Formulation, Chi-Square • Market Intelligence • Julie Edell Britton • Session 3 • August 21, 2009
Today’s Agenda • Announcements • Secondary data quality • Measure types • Hypothesis Testing and Chi-Square
Announcements • National Insurance Case for Sat. 8/22 • Stephen will do a tutorial today, Friday, 8/21, from 1:00 to 2:15 in the MBA PC Lab and will be available tonight from 7 to 9 pm in the MBA PC Lab to answer questions • Submit slides by 8:00 am on Sat. 8/22 • 2 slides with your conclusions; you may add appendices to support your conclusions
Primary vs. Secondary Data • Primary -- collected anew for current purposes • Secondary -- exists already, was collected for some other purpose • Finding Secondary Data Online @ Fuqua • http://library.fuqua.duke.edu
Evaluating Sources of Secondary Data • If you can’t find the source of a number, don’t use it. Look for further data. • Always give sources when writing a report. • Applies for Focus Group write-ups too • Be skeptical.
Secondary Data: Pros & Cons • Advantages • cheap • quick • often sufficient • there is a lot of data out there • Disadvantages • there is a lot of data out there • numbers sometimes conflict • categories may not fit your needs
Types of Secondary Data *IRI = Information Resources, Inc. (http://us.infores.com/)
Secondary Data Quality: KAD p. 120 & “What’s Behind the Numbers?” • Data consistent with other independent sources? • What are the classifications? Do they fit needs? • When were numbers collected? Obsolete? • Who collected the numbers? Bias, resources? • Why were the data collected? Self-interest? • How were the numbers generated? • Sample size • Sampling method • Measure type • Causality (MBA Marketing Timing & Internship)
Measure Types • Nominal: unordered categories • Male = 1; Female = 2 • Ordinal: ordered categories; intervals can't be assumed to be equal • I-95 is east of I-85; I-80 is north of I-40; preference data • Interval: equally spaced categories; the zero point and the units are arbitrary • Fahrenheit temperature (each degree is equal); attitudes • Ratio: equally spaced categories; 0 on the scale means 0 of the underlying quantity • $ sales, market share
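One way to see why the interval/ratio distinction matters: a ratio of two values survives a change of units only on a ratio scale. The following is a minimal Python sketch; the temperatures and sales figures are made up purely for illustration and do not come from the slides.

```python
def fahrenheit_to_celsius(f):
    """Linear change of units for an interval scale (moves the zero point)."""
    return (f - 32) * 5 / 9


def dollars_to_cents(d):
    """Change of units for a ratio scale (rescales, keeps the zero point)."""
    return d * 100


# Hypothetical values, for illustration only.
temp_a_f, temp_b_f = 80.0, 40.0
sales_a, sales_b = 200.0, 100.0

ratio_f = temp_a_f / temp_b_f
ratio_c = fahrenheit_to_celsius(temp_a_f) / fahrenheit_to_celsius(temp_b_f)
ratio_dollars = sales_a / sales_b
ratio_cents = dollars_to_cents(sales_a) / dollars_to_cents(sales_b)

print(f"Temperature ratio: {ratio_f:.2f} in F, {ratio_c:.2f} in C")
print(f"Sales ratio: {ratio_dollars:.2f} in dollars, {ratio_cents:.2f} in cents")
```

The temperature ratio changes with the unit, so "twice as hot" is not a meaningful statement on an interval scale, while "twice the sales" holds in any currency unit.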
Today’s Agenda • Announcements • Southwestern Conquistador Beer Case • Backward Market Research • Secondary data quality • Measure types • Hypothesis Testing and Chi-Square
Cross Tabs of MBA Acceptance by Gender A. Raw Frequencies B. Cell Percentages
C. Row Percentages D. Column Percentages
Rule of Thumb • If a potential causal interpretation exists, make numbers add up to 100% at each level of the causal factor. • Above: it is possible that gender (row) causes or influences acceptance (column), but not that acceptance influences gender. Hence, row percentages (format C) would be desirable.
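The rule of thumb can be sketched in a few lines of Python. The counts are the observed frequencies that appear in the chi-square computation later in the deck; the row labels and the accepted/rejected column order are assumptions, because the original table is not recoverable from this transcript.

```python
# Observed counts as used in the deck's chi-square computation.
# ASSUMPTION: the row labels and the accepted/rejected column order are
# hypothetical; the original cross tab is not available.
observed = {
    "group_1": {"accepted": 140, "rejected": 860},
    "group_2": {"accepted": 60, "rejected": 740},
}


def row_percentages(table):
    """Return percentages that sum to 100 within each row."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100 * n / row_total for col, n in cells.items()}
    return result


row_pct = row_percentages(observed)
for row, cells in row_pct.items():
    print(row, {col: round(p, 1) for col, p in cells.items()})
```

Because each row sums to 100%, the acceptance rates of the two groups can be compared directly, which is what the causal reading (group influences acceptance) calls for.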
Hypothesis Formulation and Testing • Hypothesis: what you believe the relationship is between the measures • Sources: theory, empirical evidence, beliefs, experience • Here: believe that acceptance is related to gender • Null hypothesis: acceptance is not related to gender • Logic of hypothesis testing: negative inference • The null hypothesis is rejected by showing that a given observation would be quite improbable if the null hypothesis were true • Want to see if we can reject the null.
Steps in Hypothesis Testing • State the hypothesis in Null and Alternative Form • Ho: There is no relationship between gender and MBA acceptance • Ha1: Gender and Acceptance are related (2-sided) • Ha2: Fewer Women are Accepted (1-sided) • Choose a test statistic • Construct a decision rule
Chi-Square Test • Used for nominal data, to compare the observed frequency of responses to what would be “expected” under the null hypothesis. • Two types of tests • Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists between the two variables • Goodness of fit test – Compare whether the data sampled is proportionate to some standard
Chi-Square Test • $$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$ with $(r-1)(c-1)$ degrees of freedom • $O_i$ = observed number in cell $i$ • $E_i$ = expected number in cell $i$ under independence = column proportion × row proportion × total number observed • $k$ = number of cells, $r$ = number of rows, $c$ = number of columns
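The expected count under independence (column proportion × row proportion × total) simplifies to row total × column total / n. A minimal Python sketch follows; the function name `expected_counts` and the 2x2 example table are hypothetical and chosen only for illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table of counts, for illustration only.
example = [[10, 20],
           [30, 40]]
expected = expected_counts(example)
print(expected)
```

The expected table has the same row and column totals as the observed one, which is why only (r-1)(c-1) cells are free to vary.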
MBA Acceptance Data Contingency A. Observed Frequencies B. Cell Percentages C. Expected Frequencies
Chi-Square Test • With $(r-1)(c-1)$ degrees of freedom: $$\chi^2 = \frac{(140-111)^2}{111} + \frac{(860-890)^2}{890} + \frac{(60-89)^2}{89} + \frac{(740-710)^2}{710} = 19.30$$ • So? • 3. Construct a decision rule
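The computation can be checked with a short Python sketch that derives the expected counts from the observed margins instead of using rounded values. The layout of the table (which count sits in which row and column) is an assumption inferred from the four terms above. With unrounded expected counts the statistic comes out at about 19.01 rather than 19.30; the difference comes from rounding the expected frequencies and does not change the conclusion against the critical value of 3.84 quoted in the deck.

```python
# Observed counts from the slide's computation.
# ASSUMPTION: rows are the two groups, columns are accepted / rejected.
observed = [[140, 860],
            [60, 740]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```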
Decision Rule • Significance level (α): the probability of rejecting the null hypothesis when it is true; the critical values below correspond to α = 0.05 • Degrees of freedom: the number of unconstrained data used in calculating a test statistic; for chi-square it is (r-1)*(c-1), so here that would be 1. When the number of cells is larger, we need a larger test statistic to reject the null. • Two-tailed or one-tailed test: significance tables are (unless otherwise specified) two-tailed tables. Chi-sq is on pg 517 • Ha1: Gender and acceptance are related (2-sided): critical value = 3.84 • Ha2: Fewer women are accepted (1-sided): critical value = 2.71 • Decision rule: reject Ho if the calculated chi-sq value (19.3) > the test critical value, 3.84 for Ha1 or 2.71 for Ha2
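The decision rule can also be expressed as a tail probability. For 1 degree of freedom the upper-tail area of the chi-square distribution has a closed form, so a sketch needs only the Python standard library. The helper name `chi2_sf_df1` is hypothetical; the statistic and critical values are the ones quoted on the slide.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses the identity P(Z**2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2))


statistic = 19.30          # value computed on the slide
critical_two_sided = 3.84  # critical value quoted for Ha1
critical_one_sided = 2.71  # critical value quoted for Ha2

p_value = chi2_sf_df1(statistic)
tail_at_384 = chi2_sf_df1(critical_two_sided)
tail_at_271 = chi2_sf_df1(critical_one_sided)
reject_null = statistic > critical_two_sided

print(f"p-value = {p_value:.6f}, reject Ho: {reject_null}")
print(f"tail area beyond 3.84 = {tail_at_384:.3f}")
print(f"tail area beyond 2.71 = {tail_at_271:.3f}")
```

The tail area beyond 3.84 is about 0.05 and beyond 2.71 about 0.10, which is consistent with the one-sided alternative using the 0.10 table value to keep 0.05 in the direction of interest.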
Chi-Square Test • Used for nominal data, to compare the observed frequency of responses to what would be “expected” under some specific null hypothesis. • Two types of tests • Contingency (or Relationship) – tests if the variables are independent – i.e., no significant relationship exists • Goodness of fit test – Compare whether the data sampled is proportionate to some standard
Goodness of fit – Chi-Square • Ho: Car color preferences have not shifted • Ha: Car color preferences have shifted
Color     Data    Historic Distribution    Expected # = Prob*n
Red        680    30%                      750
Green      520    25%                      625
Black      675    25%                      625
White      625    20%                      500
Tot (n)   2500
Do we observe what we expected?
Chi-Square Test • With $(k-1)$ degrees of freedom: $$\chi^2 = \frac{(680-750)^2}{750} + \frac{(520-625)^2}{625} + \frac{(675-625)^2}{625} + \frac{(625-500)^2}{500} = 59.42$$ • So? • 3. Construct a decision rule
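The goodness-of-fit statistic can be reproduced with a short Python sketch. The counts and historic shares are the ones from the car color slide; the variable names are illustrative.

```python
# Observed counts and historic distribution from the car color slide.
observed = {"Red": 680, "Green": 520, "Black": 675, "White": 625}
historic_share = {"Red": 0.30, "Green": 0.25, "Black": 0.25, "White": 0.20}

n = sum(observed.values())
# Expected count per category under Ho: historic share times sample size.
expected = {color: share * n for color, share in historic_share.items()}

chi_square = sum(
    (observed[color] - expected[color]) ** 2 / expected[color]
    for color in observed
)
df = len(observed) - 1
print(f"n = {n}, chi-square = {chi_square:.2f}, df = {df}")
```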
Decision Rule • Significance level (α): the probability of rejecting the null hypothesis when it is true; the critical value below corresponds to α = 0.05 • Degrees of freedom: the number of unconstrained data used in calculating a test statistic; for chi-square goodness of fit it is (k-1), so here that would be 3. When the number of cells is larger, we need a larger test statistic to reject the null. • Two-tailed or one-tailed test: significance tables are (unless otherwise specified) two-tailed tables. Chi-sq is on pg 517 • Ha: Preferences have changed (2-sided): critical value = 7.81 • Decision rule: reject Ho if the calculated chi-sq value (59.42) > the test critical value (7.81)
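As a cross-check on the table lookup, the upper-tail area for 3 degrees of freedom also has a closed form, so the decision can be sketched with the Python standard library alone. The helper name `chi2_sf_df3` is hypothetical; the statistic and critical value are the ones quoted on the slide.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    Closed form for df = 3: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic = 59.42      # value computed on the slide
critical_value = 7.81  # critical value quoted on the slide

p_value = chi2_sf_df3(statistic)
tail_at_critical = chi2_sf_df3(critical_value)
reject_null = statistic > critical_value

print(f"tail area beyond 7.81 = {tail_at_critical:.3f}")
print(f"p-value = {p_value:.2e}, reject Ho: {reject_null}")
```

The tail area beyond 7.81 is about 0.05, and the observed statistic lies far beyond it, so the historic color distribution is rejected under this rule.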
Recap • Finding & Evaluating Secondary Data • Measure Types • permissible transformations • Meaningful statistics • Index #s • Crosstabs • Casting right direction • Chi-square statistic • Contingency Test • Goodness of Fit Test