IIMC Long Duration Executive Education: Executive Programme in Business Management. Statistics for Managerial Decisions: Advanced Statistical Inference. Prof. Saibal Chattopadhyay, IIM Calcutta
A Brief Review • Uncertainty and Randomness: Theory of Probability • Decision Making Under Uncertainty: Utility Theory • Random Variables & Probability Distributions: Binomial, Poisson, Normal, Exponential • Joint Distribution of Two Random Variables: Marginal Distributions, Mean, Variance, Covariance, Correlation Coefficient, Independence of random variables • Regression Approach to the analysis of bivariate data: Curve fitting and Least Squares Principle • Sampling Theory: SRS, Stratified RS, Systematic Sampling, Central Limit Theorem, Multistage Sampling, Chi-Square, t and F distributions
Statistical Inference • Sample based Inference about a population • Estimation (Point and Interval) • Hypothesis Testing Characteristics of Interest: • Population Mean • Population SD • Population Proportion One sample problems: Mean (SD known or unknown; n large or small) Two Sample Problems: • Difference of two means • Ratio of Two SD’s • Difference of two proportions • Case Studies 1-5
Some other Inference problems • Categorical Data Analysis Variable is categorical in nature: Information available in terms of frequencies (number of individuals) belonging to different categories Example: 100 randomly selected items returned to a department store are categorized as: Cash Refund: 34 Credit to Charge Account: 18 Merchandise Exchange: 31 Return Refused: 17
Categorical Data Analysis Research Question: Do these four possible dispositions for a return request occur with equal frequency? • Need a hypothesis test to assess whether the data (four frequencies: 34, 18, 31, 17) support the theory that the probabilities for observations to fall in these four categories are all equal • P1, P2, P3, and P4 are these probabilities, with P1 + P2 + P3 + P4 = 1 • To test Ho: P1 = P2 = P3 = P4
Hypothesis-testing for categorical data • What is the alternative hypothesis? Ha: Not all Pi’s are equal How to proceed? With 2 such categories, no problem: the test is for the equality of two proportions With multiple categories? • Goodness-of-fit tests for Ho versus Ha: an extension for testing equality of proportions from several populations
Goodness-of-fit test General idea: • k categories • P1, P2, …, Pk: true unknown proportions for these k categories; P1 + P2 + … + Pk = 1 • Ho: P1 = P1o; P2 = P2o; …; Pk = Pko • Ha: Ho not true; at least one Pi differs from the corresponding hypothesized value • Level of significance: $\alpha$ = 0.05 or 0.01 • Data given: Observed frequencies f1, f2, …, fk for these k categories; f1 + f2 + … + fk = n = sample size
Goodness-of-fit test • Calculate the ‘expected frequencies’ for these k categories if Ho is true; under Ho, Expected Frequency = Probability × Sample Size • fe1 = n·P1o; fe2 = n·P2o; …; fek = n·Pko • fe1 + fe2 + … + fek = n = total frequency • Examine how closely these correspond to the actual observed frequencies • If they match closely, do not reject Ho • Reject Ho otherwise
Goodness-of-fit test How to judge: Test Statistic? $\chi^2 = \sum (\text{obs. freq.} - \text{exp. freq.})^2 / (\text{exp. freq.}) = \sum_{i=1}^{k} (f_i - f_{ei})^2 / f_{ei}$ A Chi-square based on frequencies, both observed and expected (under Ho) • A Frequency Chi-Square Test • Distribution of this Chi-square? • Approximately Chi-square with (k − 1) d.f., provided all expected frequencies are ‘large’ • How large: all $f_{ei} \ge 5$
Goodness-of-fit Chi-Square Test • If Ho is true, discrepancies are small and so the Chi-Square value is ‘small’ • Reject Ho if $\chi^2$ is ‘large’: $\chi^2 > C$ • How large is large? Use level $\alpha$ = 0.05 or 0.01 • $C = \chi^2_{\alpha}$: upper $\alpha$-point of $\chi^2$ (d.f. = k − 1), read from the table Back to the Example: • k = 4 (number of categories) • Ho: P1 = P2 = P3 = P4 = ¼; Ha: Not Ho • Obs. Freq: f1 = 34, f2 = 18, f3 = 31, f4 = 17 • n = total frequency = 100
Goodness-of-fit Chi-Square • Expected Frequencies: fe1 = fe2 = fe3 = fe4 = 100 × ¼ = 25 • $\chi^2 = (34 - 25)^2/25 + (18 - 25)^2/25 + (31 - 25)^2/25 + (17 - 25)^2/25 = 9.2$ • Suppose $\alpha$ = 0.05 (to test at 5% level) • $\chi^2$ value from table (d.f. = k − 1 = 3) = 7.815 • Observed $\chi^2$ = 9.2 > 7.815: Reject Ho • The four dispositions of a return request are not equally frequent, at 5% level
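The arithmetic of this example can be reproduced with a short Python sketch. The function name `chi_square_gof` is an illustrative choice, not part of the course material; the observed counts and the tabled value 7.815 are taken from the slides.

```python
def chi_square_gof(observed, null_probs):
    """Frequency chi-square for a goodness-of-fit test.

    observed   : list of observed category counts f_i
    null_probs : hypothesized probabilities P_io under Ho (should sum to 1)
    Returns (statistic, expected frequencies).
    """
    n = sum(observed)
    expected = [n * p for p in null_probs]  # fe_i = n * P_io
    # The chi-square approximation needs every expected frequency >= 5.
    if min(expected) < 5:
        raise ValueError("some expected frequency is below 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


# Returned-items example: cash refund, credit, exchange, refused.
observed = [34, 18, 31, 17]
null_probs = [0.25, 0.25, 0.25, 0.25]  # Ho: P1 = P2 = P3 = P4 = 1/4

stat, expected = chi_square_gof(observed, null_probs)
critical_value = 7.815  # tabled upper 5% point, d.f. = k - 1 = 3
reject = stat > critical_value

print("expected:", expected)
print("chi-square:", round(stat, 2), "reject Ho:", reject)
```

Unequal hypothesized proportions can be tested the same way by changing `null_probs`; the degrees of freedom, and hence the tabled value, depend only on the number of categories.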
Another Application: Test of Homogeneity • 2 or more similarly classified populations • Data: Frequencies falling in each category are known from each population • To test if the populations are identical 2 populations, k classes each P1, P2, …, Pk: Probabilities for Population 1 P1*, P2*, …, Pk*: Probabilities for Population 2 Ho: P1 = P1*, P2 = P2*, …, Pk = Pk* Ha: They are not all equal
Case Study 6: Right of Advertising • A study of consumers’ and dentists’ attitudes toward advertising of dental services: “Should Dentists Advertise?”, Journal of Advertising Research, June 1982, 33-38. Two samples, 101 consumers (population 1) and 124 dentists (population 2), were asked to respond to the following statement: “I favour the use of advertising by dentists to attract new patients” Possible responses are: strongly agree, agree, neutral, disagree, strongly disagree.
Should Dentists Advertise? • Data table
Should Dentists Advertise? Research Question: Do the two groups, consumers and dentists, differ in their attitudes toward advertising? Probability Table:
Should Dentists Advertise? To Test Ho: P1 = P1*, …, P5 = P5* Expected Cell Count Formula: Exp = (Row marginal total) × (Column marginal total) / (Grand Total) Chi-sq = $\sum (\text{obs. freq.} - \text{exp. freq.})^2 / (\text{exp. freq.})$ DF = (# Rows − 1)(# Columns − 1) Reject Ho if observed Chi-sq > tabled Chi-sq. (Assumption: all expected frequencies $\ge 5$)
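The expected-count formula and the degrees of freedom can be sketched in Python as follows. The 2 × 3 table below is purely hypothetical and is NOT the dentists' data; the function name `chi_square_table` is likewise an illustrative assumption. The tabled value 5.991 is the standard upper 5% point of chi-square with 2 d.f.

```python
def chi_square_table(table):
    """Chi-square statistic for an r x c table of observed counts.

    Returns (statistic, degrees of freedom, expected counts).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Exp = (row marginal total)(column marginal total) / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Hypothetical counts: 2 populations (rows) x 3 response classes (columns).
table = [
    [20, 30, 50],
    [30, 30, 40],
]

stat, df, expected = chi_square_table(table)
critical_value = 5.991  # tabled upper 5% point, d.f. = 2
print("expected:", expected)
print("chi-square:", round(stat, 3), "d.f.:", df, "reject Ho:", stat > critical_value)
```

The same computation serves the test for independence, since both tests use the identical expected-count formula and degrees of freedom; only the sampling scheme and the wording of Ho differ.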
Should Dentists Advertise? Table of observed (expected) counts:
Should Dentists Advertise? Calculation of the Test Statistic: Here all expected frequencies are $\ge 5$. Chi-sq = $(34 - 19.30)^2/19.30 + \dots + (46 - 28.11)^2/28.11 = 84.47$ Degrees of freedom = (2 − 1)(5 − 1) = 4 Use alpha = 0.05 Chi-sq from table = 9.488 Reject Ho if obs. Chi-sq > 9.488
Should Dentists Advertise? Conclusion: Since the observed value of Chi-sq = 84.47 > 9.488, we reject Ho at the 5% level of significance. Thus, in the light of the given data, it appears that the two groups (consumers and dentists) differ significantly in their attitudes toward advertising.
A Test for Independence • Two attributes A and B • A has k levels A1, A2, …, Ak • B has l levels B1, B2, …, Bl • Data available on k.l level combinations fij = number of observations (frequency) belonging to (Ai, Bj), n = total frequency • To test Ho: A and B are independent • Alternative Ha: they are associated
Case Study 7: TV viewing and Fitness “Television viewing and Physical fitness in adults”: Research Quarterly for Exercise and Sport (1990), 315-320. A: Physical Fitness has k=2 levels A1=physically fit, A2=not physically fit B: TV viewing time (in hours per day, rounded to the nearest hour) has l=4 levels B1= 0, B2= (1-2), B3= (3-4), B4 =(5 or more)
TV viewing and Physical Fitness • A survey of 1200 adult males gave the following counts:
TV viewing and Physical Fitness Ho: TV viewing and Physical fitness are independent attributes Ha: They are associated Expected Cell Counts under Ho: (Row total) × (Column total) / (Total frequency) Chi-sq = $\sum (\text{obs.} - \text{exp.})^2 / \text{exp.}$ Degrees of freedom = (k − 1)(l − 1) Reject Ho if observed Chi-sq > tabled Chi-sq.
TV viewing and Physical Fitness Table of Observed (Expected) Frequencies
TV viewing and Physical Fitness • All expected frequencies are $\ge 5$, so we may use the goodness-of-fit chi-square Degrees of Freedom = (2 − 1)(4 − 1) = 3 Chi-sq = $(35 - 25.5)^2/25.5 + \dots + (34 - 32.7)^2/32.7 = 6.13$ At 5% level, tabled Chi-sq = 7.815 Decision Rule: Reject Ho if Chi-sq > 7.815
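As a cross-check on the tabled value, the upper-tail probability of a chi-square variable with 3 d.f. has a closed form that Python's standard library can evaluate. This is a sketch that goes slightly beyond the slides, which use only the table lookup; the function name `chi2_upper_tail_df3` is an assumption, and the formula holds for 3 d.f. only.

```python
import math


def chi2_upper_tail_df3(x):
    """P(chi-square with 3 d.f. > x), valid for x >= 0.

    Closed form for 3 d.f.:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# The tabled 5% point should leave about 0.05 in the upper tail.
tail_at_table_value = chi2_upper_tail_df3(7.815)

# Tail areas for the two 3-d.f. statistics computed in these slides.
tail_fitness = chi2_upper_tail_df3(6.13)  # TV viewing and fitness
tail_returns = chi2_upper_tail_df3(9.2)   # returned-items example

print(round(tail_at_table_value, 4), round(tail_fitness, 4), round(tail_returns, 4))
```

A tail area above 0.05 corresponds to a statistic below 7.815 (fail to reject Ho), and a tail area below 0.05 to a statistic above it (reject Ho), so this agrees with the table-based decision rule.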
TV Viewing and Physical Fitness • Conclusion: Since the observed Chi-sq = 6.13 is less than the tabled value 7.815, we fail to reject Ho at the 5% level. This means that the given data do not provide sufficient evidence of an association between Physical Fitness and TV viewing; the data are consistent with the two attributes being independent.
References Text Book for the Course • Statistical Methods in Business and Social Sciences: Shenoy, G.V. & Pant, M. (Macmillan India Limited) Suggested Reading • Complete Business Statistics: Aczel, A.D. & Sounderpandian, J. – Fifth Edition (Tata McGraw-Hill)