Inferential Statistics: Hypothesis Testing

dima

Presentation Transcript


  1. Inferential Statistics: Hypothesis Testing

    Test of Categorical Data / Proportion
  2. Inferential Statistics. Estimation: estimate population means; estimate population proportion; estimate population variance. Hypothesis testing: testing population means; testing categorical data / proportion; testing population variances. Hypotheses about many population means: one-way ANOVA; two-way ANOVA.
  3. Tests for Categorical Data. Test a proportion of interest in a population, e.g. the proportion of defects in production, or the proportion of people travelling by Skytrain in BKK. Test whether the proportion in the population is as expected, using data collected from a sample.
  4. Testing Categorical Data or Proportion. Binomial proportion: single population; two populations. Multiple-group proportions (Chi-square test): single population (test of homogeneity, goodness of fit test); two populations (test of homogeneity, test of independence).
  5. Binomial Proportion. The sample is categorized into two groups. Single population: determine whether the proportion in one of the two categories is different from a specified proportion. Two populations: compare the proportions of two populations. The steps are similar to testing a population mean. Assumptions: the sampling distribution of the sample proportion (or of the difference in proportions, in the case of two populations) is approximately normal, and the sample size (of both samples, in the case of two populations) is sufficiently large (n ≥ 30).
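  6. Binomial – Single Population. Test statistic: z = (p̂ − p0) / √(p0·q0 / n), where p̂ = x/n is the sample proportion, p0 is the hypothesized proportion, and q0 = 1 − p0.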
  7. Example 1. A plastic product factory takes a sample of 400 plastic containers, 12 of which are defective. From this data, test whether the proportion of defects is more than 2% at significance level 0.05. Hypotheses: H0: P ≤ 0.02; H1: P > 0.02; α = 0.05.
  8. Example 1. Calculate the test statistic: p̂ = 12/400 = 0.03, so z = (0.03 − 0.02) / √(0.02 × 0.98 / 400) = 0.01 / 0.007 = 1.429. z-score from table: z0.05 = 1.645. The calculated z-score is 1.429 < 1.645, not falling in the right-tailed critical region, so H0 is not rejected. The data do not show that the proportion of defects is more than 2% at significance level 0.05.
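
The calculation in Example 1 can be checked with a short script. This is only a sketch using the Python standard library; the function name `one_proportion_z` is an illustrative choice, not something from the slides, and the critical value is computed from the normal distribution instead of being read from a table.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """z statistic for a single proportion, using the standard error under H0."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


alpha = 0.05
z = one_proportion_z(x=12, n=400, p0=0.02)

# Right-tailed test (H1: P > 0.02): critical value is the upper alpha quantile.
z_crit = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

With the numbers from the slide, z is about 1.429 and the critical value about 1.645, so H0 is not rejected.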
  9. Example 2. In an observation of students wearing and not wearing safety helmets when riding motorcycles, 75 of 500 sampled students wear a helmet. Can it be concluded that the proportion of students wearing a helmet is less than 20% at significance level 0.01? Hypotheses: H0: P ≥ 0.2; H1: P < 0.2; α = 0.01.
  10. Example 2. Calculate the test statistic: p̂ = 75/500 = 0.15, so z = (0.15 − 0.2) / √(0.2 × 0.8 / 500) = −0.05 / 0.01789 = −2.80. z-score from table: z0.01 = 2.326. The calculated z-score is −2.80 < −2.326, falling in the left-tailed critical region. Reject H0 and accept H1. The proportion of students wearing a helmet is less than 20% at significance level 0.01.
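
The left-tailed decision in Example 2 can be sketched the same way. The p-value shown here is an addition for illustration (the slides use only the critical-value approach); both routes should lead to the same decision.

```python
from math import sqrt
from statistics import NormalDist

# Data and settings from Example 2 (helmet use).
x, n = 75, 500
p0 = 0.20
alpha = 0.01

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Left-tailed test (H1: P < 0.2): reject when z falls below the lower alpha quantile.
z_crit = NormalDist().inv_cdf(alpha)
p_value = NormalDist().cdf(z)  # area to the left of the observed z

print(f"z = {z:.2f}, critical value = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0:", z < z_crit)
```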
  11. Extra example. A shampoo company expects that after advertising a new product for 2 months, the product will be popular among 60% of consumers. After the advertising period, 300 bottles are given out to 300 sampled consumers, 220 of whom respond positively. Test whether the assumption is true at significance level 0.05.
  12. Extra example. From a sample of 90 students, 28 students have private cars. Test whether the proportion of students having private cars is more than 25% at significance level 0.05.
  13. Binomial – Two Populations
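  14. Binomial – Two Populations. Test statistic for a hypothesized difference d0: z = ((p̂1 − p̂2) − d0) / √(p̂1·q̂1/n1 + p̂2·q̂2/n2), where p̂i = xi/ni and q̂i = 1 − p̂i.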
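  15. Binomial – Two Populations. If d0 = 0, e.g. H0: P1 = P2 or P1 – P2 = 0, use the pooled estimated proportion p̂ = (x1 + x2) / (n1 + n2), where x1, x2 are the numbers of items of interest in the first and second samples and n1, n2 are the sizes of the first and second samples. Additional assumption: n1·p̂, n1·q̂, n2·p̂, and n2·q̂ (with q̂ = 1 − p̂) are all sufficiently large.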
  16. Binomial – Two Populations. If d0 = 0, e.g. H0: P1 = P2 or P1 – P2 = 0, the estimated variance of the proportion difference is p̂·q̂·(1/n1 + 1/n2), and the z-score calculation is adjusted to z = (p̂1 − p̂2) / √(p̂·q̂·(1/n1 + 1/n2)).
  17. Example 1. From a survey, 70 of 100 IT students have a smart phone, and 72 of 150 art students have a smart phone. Test whether the proportion of IT students who have a smart phone is more than 10 percentage points higher than that of art students at significance level 0.05. Hypotheses: H0: P1 − P2 ≤ 0.1; H1: P1 − P2 > 0.1 (d0 = 0.1); α = 0.05.
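  18. Example 1. Calculate the test statistic: p̂1 = 70/100 = 0.70 and p̂2 = 72/150 = 0.48, so z = ((0.70 − 0.48) − 0.1) / √(0.70 × 0.30/100 + 0.48 × 0.52/150) = 0.12 / 0.06135 = 1.956. z-score from table: z0.05 = 1.645. The calculated z-score is 1.956 > 1.645, falling in the right-tailed critical region. Reject H0 and accept H1. The proportion of IT students who have a smart phone is more than 10 percentage points higher than that of art students at significance level 0.05.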
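
A minimal sketch of the two-population calculation with a non-zero hypothesized difference follows. It assumes the unpooled standard error built from the two sample proportions, which is the form that reproduces the z-score of 1.956 quoted on the slide; the function name `two_proportion_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, d0):
    """z statistic for H0: P1 - P2 = d0, using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - d0) / se


alpha = 0.05
# IT students: 70 of 100; art students: 72 of 150; hypothesized difference 0.10.
z = two_proportion_z(x1=70, n1=100, x2=72, n2=150, d0=0.10)

z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
reject_h0 = z > z_crit

print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```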
  19. Example 2. From a survey, 120 of 200 university students have a notebook computer, and 240 of 500 high school students have a notebook computer. Test whether the proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025. Hypotheses: H0: P1 ≤ P2; H1: P1 > P2 (d0 = 0); α = 0.025.
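  20. Example 2. Calculate the pooled estimated proportion: p̂ = (120 + 240) / (200 + 500) = 360/700 ≈ 0.51, so q̂ ≈ 0.49. Check the additional assumption: n1p = 200*0.51 = 102; n1q = 200*0.49 = 98; n2p = 500*0.51 = 255; n2q = 500*0.49 = 245. Calculate the test statistic: z = (0.60 − 0.48) / √(0.51 × 0.49 × (1/200 + 1/500)) = 0.12 / 0.0418 ≈ 2.9.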
  21. Example 2. z-score from table: z0.025 = 1.96. The calculated z-score is 2.9 > 1.96, falling in the right-tailed critical region. Reject H0 and accept H1. The proportion of university students who have a notebook computer is higher than that of high school students at significance level 0.025.
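
When d0 = 0 the pooled proportion replaces the separate sample proportions in the standard error. The sketch below reruns Example 2 that way; it keeps the pooled proportion unrounded (360/700 instead of 0.51), so the z-score comes out near 2.87, which the slide rounds to 2.9. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# University students: 120 of 200; high school students: 240 of 500.
x1, n1 = 120, 200
x2, n2 = 240, 500
alpha = 0.025

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # pooled estimated proportion
q_pool = 1 - p_pool

# Quantities checked on the slide as the additional assumption.
counts = [n1 * p_pool, n1 * q_pool, n2 * p_pool, n2 * q_pool]

se = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
reject_h0 = z > z_crit

print(f"pooled p = {p_pool:.4f}, z = {z:.2f}, critical value = {z_crit:.2f}")
print("reject H0:", reject_h0)
```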
  22. Example 3. From the previous observation of students wearing and not wearing safety helmets when riding motorcycles, the 500 sampled students are grouped by gender as shown in the table. Can it be concluded that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05?
  23. Example 3 Hypothesis H0: Pf ≤ Pm H1: Pf > Pm
  24. Example 3. Calculate test statistic. z-score from table: z0.05 = 1.645. The calculated z-score is 1.28 < 1.645, not falling in the right-tailed critical region, so H0 is not rejected. The data do not show that the proportion of female students wearing a helmet is higher than that of male students at significance level 0.05.
  25. Extra Example. In a polio vaccination program in a school, 16 out of 100 vaccinated female students are infected, and 20 out of 200 vaccinated male students are infected. Test whether the proportion of infected female students is more than 5 percentage points higher than the proportion of infected male students at significance level 0.10.
  26. Multiple Groups Proportion. Categorical data cannot be measured numerically but can be grouped, e.g. a 5-point rating scale, religion, occupation, and gender. The data for each group is then a frequency, which can be tested using the Chi-square test (χ²). The test determines whether the observed proportions of the groups differ from a specified expected ratio.
  27. Multiple Groups Proportion. Assumptions: the sample size must be sufficiently large, 4-5 times the number of groups. The frequency of each group must not be less than 5; if any group has a smaller frequency, combine it with an adjacent group (reducing the degrees of freedom). The test cannot be applied to repeated-measures designs, such as measuring the same sample after a time period (e.g. measuring the effect of a drug after the 1st, 2nd, and 3rd hour) or measuring the same variable after changing treatment (e.g. measuring the blood pressure of the same sample after administering different drug dosages).
  28. Limitation. If the sample contains 2 groups (degrees of freedom = 1) and the total frequency is less than 50, Frank Yates suggested using the corrected Chi-square. If the total frequency is 50 or more, there is no need to use the corrected Chi-square. This correction is not covered further here.
  29. Multiple Groups Proportion. Degrees of freedom: df = k − 1 − m. Single variable: test of homogeneity; goodness of fit test. Two variables: test of homogeneity; test of independence.
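  30. One Variable Test of Homogeneity. Used to determine whether the proportions of two or more groups in a population are similar. Hypotheses: H0: the proportions of all groups are equal; H1: at least one proportion differs. Test statistic: χ² = Σ (Oi − Ei)² / Ei, summed over the k groups, where Oi is the observed frequency in each group, Ei is the expected frequency in each group, and k is the number of groups.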
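  31. Chi-square Critical Region. The acceptance region lies below the critical value and the rejection region lies above it. Reject H0 when the calculated χ² is greater than the critical χ² from the table.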
  32. Example 1. In the teaching evaluation of a course, from a total of 200 students, 72 are very satisfied, 60 are satisfied, 22 are indifferent, and 46 are unsatisfied. Are the proportions of the satisfaction levels similar at significance level 0.01? Hypotheses: H0: the frequency of each satisfaction level is not different; H1: the frequency of each satisfaction level is different.
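  33. Example 1. Calculate the test statistic: under H0 each expected frequency is Ei = 200/4 = 50, so χ² = (72 − 50)²/50 + (60 − 50)²/50 + (22 − 50)²/50 + (46 − 50)²/50 = 27.68.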
  34. Example 1. Critical Chi-square: degrees of freedom = k − 1 = 4 − 1 = 3, so the critical χ² at α = 0.01 is 11.34. The calculated Chi-square is 27.68 > 11.34, falling in the critical region. Reject H0 and accept H1. The proportions of the satisfaction levels are not similar at significance level 0.01.
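
The Chi-square statistic for this example is easy to reproduce. In the sketch below the critical value 11.34 is simply the table value quoted on the slide (the Python standard library has no chi-square quantile function), and the variable names are illustrative.

```python
# Observed frequencies: very satisfied, satisfied, indifferent, unsatisfied.
observed = [72, 60, 22, 46]

n = sum(observed)          # 200 students in total
k = len(observed)          # 4 groups
expected = [n / k] * k     # equal proportions under H0, so 50 per group

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1

chi2_crit = 11.34          # table value for df = 3, alpha = 0.01, as quoted on the slide
reject_h0 = chi2 > chi2_crit

print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```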
  35. Extra Example. A coffee bean reseller assumes that the sales proportions of 4 types of coffee beans are equal. 500 customers are sampled, and the number of sales of each type of coffee bean is shown in the table. Test whether the assumption is true at significance level 0.01.
  36. One Variable Goodness of Fit. Used to determine whether the proportions of two or more groups in a population fit a specified proportion. Degrees of freedom: df = k − 1 − m. Oi: observed frequency in each group. Ei: expected frequency in each group, Ei = n·pi, where n is the total frequency and pi is the probability of the group under the specified distribution. k: number of groups. m: number of parameters to be estimated (only the non-parametric chi-square is studied here, so m is ignored).
  37. Example 1. A financial institution studies the history of its loan clients. It finds that 80% of the clients repay their loan within 1 year, 10% in 2 years, 6% in 3 years, and 4% in over 3 years. To assess the current situation, 400 recent loan clients are sampled: 287 repay their loan within 1 year, 49 in 2 years, 30 in 3 years, and 34 in over 3 years. Test whether the clients' ability to repay loans has changed.
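  38. Example 1. Hypotheses: H0: p1:p2:p3:p4 = 0.8:0.1:0.06:0.04; H1: p1:p2:p3:p4 ≠ 0.8:0.1:0.06:0.04. Or: H0: clients' ability to repay loans has not changed; H1: clients' ability to repay loans has changed. α = 0.05. Calculate the test statistic: the expected frequencies Ei = n·pi are 320, 40, 24, and 16, so χ² = (287 − 320)²/320 + (49 − 40)²/40 + (30 − 24)²/24 + (34 − 16)²/16 = 3.403 + 2.025 + 1.500 + 20.250 = 27.178.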
  39. Example 1. Degrees of freedom = 4 − 1 = 3, so the critical χ² at α = 0.05 is 7.81. The calculated Chi-square is 27.178 > 7.81, falling in the critical region. Reject H0 and accept H1. Clients' ability to repay loans has changed at significance level 0.05.
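
For the goodness of fit test the only change is that the expected frequencies come from the specified proportions, Ei = n·pi. A minimal sketch, with the hypothetical helper name `chi_square_gof` and the critical value 7.81 taken from the slide:

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic with expected counts E_i = n * p_i."""
    n = sum(observed)
    expected = [n * p for p in probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, expected


# Loan repayment: within 1 year, 2 years, 3 years, over 3 years.
observed = [287, 49, 30, 34]
probs = [0.80, 0.10, 0.06, 0.04]

chi2, expected = chi_square_gof(observed, probs)
df = len(observed) - 1

chi2_crit = 7.81           # table value for df = 3, alpha = 0.05, as quoted on the slide
reject_h0 = chi2 > chi2_crit

print("expected:", [round(e, 2) for e in expected])
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```

Most of the statistic (20.25 of 27.178) comes from the "over 3 years" group, where 34 clients were observed against 16 expected.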
  40. Extra Example. In an exam for a sales training program with 150 participants, the manager expects that the proportion of the results, categorized into 3 groups (very good, good, and fair), will be 2:1:2. After the exam, the actual frequencies in the 3 groups are 70, 30, and 50 participants respectively. Are the actual and the expected proportions different at significance level 0.05?
  41. Two Variables Chi-square Test of Homogeneity. Used to determine whether the proportions of the groups in one variable are similar when grouped by another variable. There are two or more groups in each variable. H0: p1 = p2 = p3 = … = pn; H1: at least one proportion differs from the others. E.g. the proportions of occupations in three countries.
  42. Two Variables Chi-square Test of Independence. Used to determine whether the effects of one variable depend on the value of another variable (2 variables). H0: variable x and variable y are independent of each other (are not related); H1: variable x and variable y are dependent on each other (are related).
  43. Two Variables Chi-square. Data is grouped in the rows and columns of a two-way table.
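  44. Two Variables Chi-square. Test statistic: χ² = Σi Σj (Oij − Eij)² / Eij, with degrees of freedom (r − 1)(c − 1), where r is the number of rows, c is the number of columns, Oij is the observed frequency of row i column j, and Eij is the expected frequency of row i column j.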
  45. Expected Frequency. Eij = (total of row i × total of column j) / n, where n is the total frequency.
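  46. Chi-square Critical Region. The acceptance region lies below the critical value and the rejection region lies above it. Reject H0 when the calculated χ² is greater than the critical χ² from the table.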
  47. Example: Test of Homogeneity. In a survey of 1200 sampled individuals grouped by four occupations, the numbers of smokers and non-smokers are listed in the table. Test whether the proportion of smokers and non-smokers differs between occupations.
  48. Example: Test of Homogeneity. Hypotheses: H0: p1 = p2 = p3 = p4; H1: at least one proportion differs from the others; α = 0.05.
  49. Example: Test of Homogeneity. Calculate the expected frequencies: E11 = (300*233)/1200 = 58.25; E12 = (300*967)/1200 = 241.75; E21 = (250*233)/1200 = 48.54; E22 = (250*967)/1200 = 201.46; E31 = (300*233)/1200 = 58.25; E32 = (300*967)/1200 = 241.75; E41 = (350*233)/1200 = 67.96; E42 = (350*967)/1200 = 282.04.
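
The expected frequencies above depend only on the row and column totals, so they can be reproduced without the observed table (which is not part of this transcript). A small sketch, assuming the rows are the four occupations and the columns are smokers and non-smokers, as the totals on the slide suggest:

```python
# Marginal totals read off the slide's calculations.
row_totals = [300, 250, 300, 350]   # four occupations
col_totals = [233, 967]             # assumed order: smokers, non-smokers
n = sum(row_totals)                 # 1200 individuals

# E_ij = (row i total * column j total) / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

for i, row in enumerate(expected, start=1):
    print(f"row {i}:", [round(e, 2) for e in row])

# Degrees of freedom for an r x c table.
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print("df =", df)
```

Each row of expected frequencies sums back to its row total, which is a quick check that the table has been filled in correctly.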
  50. Example: Test of Homogeneity Calculate test statistic
  51. Example: Test of Homogeneity. Degrees of freedom = (r − 1)(c − 1) = 3*1 = 3, so the critical χ² at α = 0.05 is 7.81. The calculated Chi-square is 20.59 > 7.81, falling in the critical region. Reject H0 and accept H1. The proportion of smokers and non-smokers differs between occupations at significance level 0.05.
  52. Example: Test of Independence To test if the achievement score of a training program is related to the achievement score of the actual operation at significant level 0.01, 400 employees are sampled. The scores are listed in the table.
  53. Example: Test of Independence Hypothesis H0: score of the training program and the score of the actual operation are not related H1: score of the training program and the score of the actual operation are related α = 0.01
  54. Example: Test of Independence. Calculate the test statistic: χ² = 20.178.
  55. Example: Test of Independence. Degrees of freedom = (r − 1)(c − 1) = 2*2 = 4, so the critical χ² at α = 0.01 is 13.28. The calculated Chi-square is 20.178 > 13.28, falling in the critical region. Reject H0 and accept H1. The score of the training program and the score of the actual operation are related (or are dependent on each other) at significance level 0.01.
  56. Extra Example 1. A factory manager believes that the efficiency of workers depends on how long they have worked in the factory. To test this belief, 100 sample products are inspected. The quality of the samples is listed in the table. Test the belief at significance level 0.05.
  57. Extra Example 2. A toothpaste company wants to know whether the color of the toothpaste is related to the gender of buyers. Samples of 500 males and 500 females are randomly selected and asked for their favored toothpaste color. Test whether the color of the toothpaste is related to gender at significance level 0.01.
  58. Are they the same? The Test of Homogeneity and the Test of Independence use the same calculation. The Test of Homogeneity tells whether the proportions are the same: H0: the proportions are similar for all groups; H1: the proportions are not similar for some or all groups. The Test of Independence tells whether two variables are dependent: H0: the two variables are independent; H1: the two variables are dependent.
  59. Are they the same? Consider this: the proportion of selected majors is the same for any gender. That means that no matter the gender, the proportions remain the same. That means gender has no effect on the selection of major, and therefore the two are independent.