Basic concept of statistics

madeline

Presentation Transcript


  1. Basic concept of statistics • Measures of central tendency • Measures of dispersion & variability

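  2. Measures of central tendency: arithmetic mean (= simple average) • The best estimate of the population mean, μ, is the sample mean, $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $X_i$ is a measurement in the population, $\sum$ denotes summation, n is the sample size, and i is the index of measurement.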
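A minimal Python sketch of the sample mean formula, using a sample of five body weights (43, 44, 45, 44, 44 kg), the first sample of n = 5 drawn from the population in the body-weight slides. The helper `sample_mean` is a hypothetical name for illustration; the standard library's `statistics.mean` computes the same quantity.

```python
import statistics

def sample_mean(values):
    """Arithmetic mean: the sum of the measurements divided by the sample size n."""
    n = len(values)
    return sum(values) / n

sample = [43, 44, 45, 44, 44]  # body weights (kg), first sample of n = 5
x_bar = sample_mean(sample)
print(x_bar)                    # 44.0
print(statistics.mean(sample))  # the stdlib function agrees
```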

  3. Measures of variability: all describe how “spread out” the data are • Sum of squares (SS): the sum of squared deviations from the mean • For a sample, $SS = \sum_{i=1}^{n}(X_i - \bar{X})^2$
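A short sketch of the sum of squares for the same five body weights. The second function uses the algebraically equivalent form $\sum X_i^2 - (\sum X_i)^2 / n$, which is not on the slide; both helper names are hypothetical. The second form is convenient by hand but can lose precision in floating point when the values are large relative to their spread.

```python
def sum_of_squares(values):
    """SS = sum of squared deviations from the sample mean."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values)

def sum_of_squares_computational(values):
    """Equivalent form: sum of X^2 minus (sum of X)^2 / n."""
    n = len(values)
    return sum(x * x for x in values) - sum(values) ** 2 / n

sample = [43, 44, 45, 44, 44]
print(sum_of_squares(sample))                # 2.0
print(sum_of_squares_computational(sample))  # 2.0
```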

  4. Why? • Average or mean sum of squares = variance, s² • For a sample, $s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$
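The sample variance divides SS by n – 1. A minimal sketch, assuming the same five body weights; the standard library's `statistics.variance` also uses the n – 1 divisor, while `statistics.pvariance` divides by n (the population formula). The helper `sample_variance` is a hypothetical name.

```python
import statistics

def sample_variance(values):
    """s^2 = SS / (n - 1): the mean sum of squares with n - 1 degrees of freedom."""
    n = len(values)
    x_bar = sum(values) / n
    ss = sum((x - x_bar) ** 2 for x in values)
    return ss / (n - 1)

sample = [43, 44, 45, 44, 44]
print(sample_variance(sample))       # 0.5
print(statistics.variance(sample))   # 0.5, stdlib divides by n - 1
print(statistics.pvariance(sample))  # 0.4, divides by n instead
```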

  5. n – 1 represents the degrees of freedom, ν (Greek letter “nu”), or the number of independent quantities in the estimate s². • Because the deviations from the mean always sum to zero, $\sum_{i=1}^{n}(X_i - \bar{X}) = 0$, once n – 1 of the deviations are specified, the last deviation is already determined.
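To see why only n – 1 deviations are free, a small sketch using the second sample of n = 5 body weights from the slides (46, 44, 46, 45, 44 kg): the deviations sum to zero, so the last one can be recovered from the other four.

```python
sample = [46, 44, 46, 45, 44]
x_bar = sum(sample) / len(sample)         # 45.0
deviations = [x - x_bar for x in sample]  # [1.0, -1.0, 1.0, 0.0, -1.0]

print(sum(deviations))  # 0.0: deviations from the mean always sum to zero

# Knowing any n - 1 deviations fixes the last one
last_deviation = -sum(deviations[:-1])
print(last_deviation == deviations[-1])   # True
```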

  6. Standard deviation, s • Variance has squared measurement units; to regain the original units, take the square root. • For a sample, $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$

  7. Standard error of the mean • The standard error of the mean measures the variability among the means of repeated samples from a population. • For a sample, $s_{\bar{X}} = \frac{s}{\sqrt{n}}$
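A sketch computing the standard deviation s and the standard error of the mean for the first sample of five body weights, assuming only the standard library `statistics` and `math` modules.

```python
import math
import statistics

sample = [43, 44, 45, 44, 44]
n = len(sample)

s = statistics.stdev(sample)  # sample standard deviation, sqrt(s^2) with n - 1
sem = s / math.sqrt(n)        # standard error of the mean, s / sqrt(n)

print(round(s, 4))    # 0.7071
print(round(sem, 4))  # 0.3162
```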

  8. Body Weight Data (kg): A Population of Values • N = 28, μ = 44, σ² = 1.214 • 44 45 43 44 44 43 42 46 44 44 44 46 43 44 44 43 42 44 43 44 43 46 44 43 44 45 45 46
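As a check on the slide's parameters, a sketch that treats the 28 listed body weights as the whole population: the population mean and variance divide by N rather than n – 1 (here via `statistics.fmean` and `statistics.pvariance`).

```python
import statistics

# The 28 body weights (kg) listed on the slide, treated as the whole population
population = [44, 45, 43, 44, 44, 43, 42, 46, 44, 44,
              44, 46, 43, 44, 44, 43, 42, 44, 43, 44,
              43, 46, 44, 43, 44, 45, 45, 46]

N = len(population)
mu = statistics.fmean(population)            # population mean
sigma_sq = statistics.pvariance(population)  # divides by N, not N - 1

print(N, mu, round(sigma_sq, 3))  # 28 44.0 1.214
```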

  9–26. Body Weight Data (kg): repeated random samples from the population above, each of sample size n = 5 values (the slides add one value at a time) • Sample 1: 43, 44, 45, 44, 44 • Sample 2: 46, 44, 46, 45, 44 • Sample 3: 42, 42, 43, 45, 43 • …

  27. For a large enough number of sufficiently large samples, the frequency distribution of the sample means (the sampling distribution) approaches a normal distribution.
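A simulation sketch of the sampling distribution for the body-weight population, assuming sampling with replacement (so the standard error of the mean equals σ/√n exactly) and an arbitrary fixed seed; the slides' own sampling scheme may differ. The mean of many sample means should land near μ = 44, and their standard deviation near σ/√5 ≈ 0.493.

```python
import math
import random
import statistics

population = [44, 45, 43, 44, 44, 43, 42, 46, 44, 44,
              44, 46, 43, 44, 44, 43, 42, 44, 43, 44,
              43, 46, 44, 43, 44, 45, 45, 46]
sigma = statistics.pstdev(population)  # population standard deviation
n = 5
rng = random.Random(42)                # fixed seed (assumption) for reproducibility

# Draw many samples of size n with replacement and record each sample mean
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(20_000)]

print(round(statistics.fmean(sample_means), 2))  # close to mu = 44
print(round(statistics.stdev(sample_means), 3),  # spread of the sample means...
      round(sigma / math.sqrt(n), 3))            # ...close to sigma / sqrt(n)
```

A histogram of `sample_means` would show the bell shape described on the next slide.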

  28. Normal distribution: bell-shaped curve

  29. Testing statistical hypotheses about two means • State the research question in terms of statistical hypotheses. Always start with a statement that hypothesizes “no difference”, called the null hypothesis, H0. • E.g., H0: The mean bill length of female hummingbirds is equal to the mean bill length of male hummingbirds.

  30. Then formulate a statement that must be true if the null hypothesis is false, called the alternative hypothesis, HA. • E.g., HA: The mean bill length of female hummingbirds is not equal to the mean bill length of male hummingbirds. • If we reject H0 on the basis of the sample evidence, we conclude in favor of HA.

  31. William Sealey Gosset (a.k.a. “Student”) • Choose an appropriate statistical test that would allow you to reject H0 if H0 were false. E.g., Student’s t test for hypotheses about means

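  32. t statistic: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $\bar{X}_1$ is the mean of sample 1, $\bar{X}_2$ is the mean of sample 2, and $s_{\bar{X}_1 - \bar{X}_2}$ is the standard error of the difference between the sample means. To estimate $s_{\bar{X}_1 - \bar{X}_2}$, we must first know the relation between the two populations.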

  33. How to evaluate the success of this experimental design class • Compare the statistics and experimental design scores of several students • Compare the experimental design scores of several students from two successive classes • Compare the experimental design scores of several students from two different classes

  34. Comparing the statistics and experimental design scores of several students • Decision diagram: same students: dependent populations; different students: independent populations, with either identical or non-identical variance

  35. Comparing the experimental design scores of several students from two successive classes • Different students: independent populations, with either identical or non-identical variance

  36. Comparing the experimental design scores of several students from two different classes • Different students: independent populations, with either identical or non-identical variance

  37. Relation between populations • Dependent populations • Independent populations • Identical (homogeneous) variance • Not identical (heterogeneous) variance

  38. Dependent populations • Sample: the difference d within each pair • Null hypothesis: the mean difference is equal to 0 • Test statistic: $t = \frac{\bar{d}}{s_d / \sqrt{n}}$ • Null distribution: t with n – 1 df (*n is the number of pairs) • Compare: how unusual is this test statistic? • P < 0.05: reject H0; P > 0.05: fail to reject H0
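A minimal sketch of the paired (dependent-populations) t test. The scores are hypothetical paired data for five students, and the critical value 2.776 (two-tailed, α = 0.05, ν = 4) is taken from a standard t table; comparing |t| with the critical value is equivalent to checking whether P < 0.05. The helper `paired_t` is an illustrative name.

```python
import math
import statistics

# Hypothetical scores of the same five students in two courses (paired data)
stats_scores = [70, 65, 80, 75, 60]
design_scores = [74, 66, 85, 78, 62]

def paired_t(first, second):
    """Paired t statistic: mean difference divided by its standard error."""
    d = [b - a for a, b in zip(first, second)]  # difference within each pair
    n = len(d)                                  # n is the number of pairs
    se = statistics.stdev(d) / math.sqrt(n)
    return statistics.fmean(d) / se, n - 1      # (t, degrees of freedom)

t, df = paired_t(stats_scores, design_scores)
T_CRIT_4 = 2.776  # two-tailed alpha = 0.05, nu = 4, from a standard t table
print(round(t, 3), df)  # 4.243 4
print("reject H0" if abs(t) > T_CRIT_4 else "fail to reject H0")  # reject H0
```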

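  39. Independent populations with homogeneous variances • Pooled variance: $s_p^2 = \frac{SS_1 + SS_2}{\nu_1 + \nu_2}$, where $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ • Then, $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_p^2}{n_1} + \frac{s_p^2}{n_2}}$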

  40. Independent populations with homogeneous variances
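A sketch of the pooled-variance t test, applied to the first and third body-weight samples from the slides (43, 44, 45, 44, 44 and 42, 42, 43, 45, 43 kg). The helper name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(x1, x2):
    """Two-sample t statistic assuming equal (homogeneous) population variances."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = statistics.fmean(x1), statistics.fmean(x2)
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    df = (n1 - 1) + (n2 - 1)              # nu_1 + nu_2
    sp2 = (ss1 + ss2) / df                # pooled variance
    se = math.sqrt(sp2 / n1 + sp2 / n2)   # standard error of the difference
    return (m1 - m2) / se, df

sample_1 = [43, 44, 45, 44, 44]
sample_3 = [42, 42, 43, 45, 43]
t, df = pooled_t(sample_1, sample_3)
print(round(t, 3), df)  # 1.581 8
```

Here $s_p^2 = (2 + 6)/8 = 1$, and t ≈ 1.581 with ν = 8 is below the tabled $t_{0.05(2),8} = 2.306$, so H0 is not rejected, as expected for two samples drawn from the same population.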

  41. When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution. The shape of the t distribution depends on the degrees of freedom, ν = n – 1.

  42. [Figure: t distributions for ν = 1, 5, and 25 compared with the standard normal, Z = t(ν = ∞); horizontal axis: t]
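A stdlib-only sketch of the t density, to compare its shape with the standard normal for the ν values in the figure. It uses the textbook density formula; the function name `t_pdf` is hypothetical. Lower ν gives a lower peak and heavier tails, and as ν grows the density approaches `NormalDist().pdf`.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

z = NormalDist()  # standard normal, i.e. Z = t(nu = infinity)
for nu in (1, 5, 25):
    # density at the centre (x = 0) and in the tail (x = 3)
    print(nu, round(t_pdf(0, nu), 4), round(t_pdf(3, nu), 4))
print("Z", round(z.pdf(0), 4), round(z.pdf(3), 4))
```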

  43. For α = 0.05 (two-tailed), the distribution of a test statistic is divided into an area of acceptance and areas of rejection: the area of acceptance (0.95) lies between the lower and upper critical values around 0, and each area of rejection (0.025) lies in one tail. [Figure: t distribution with the acceptance and rejection areas marked]

  44. Critical t for a two-tailed test about equality: $t_{\alpha(2),\nu}$
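Critical values are normally read from a t table or a statistics library. As a self-contained alternative, this sketch numerically integrates the t density (Simpson's rule) and uses bisection to find $t_{\alpha(2),\nu}$; the helper names are hypothetical and the method trades speed for having no dependencies. The results should match tabled values such as 12.706 (ν = 1), 2.776 (ν = 4), and 2.306 (ν = 8).

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule area from 0 to x (symmetry)."""
    h = x / steps
    area = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + area * h / 3

def t_critical(alpha, nu):
    """Two-tailed critical value: the t with P(|T| > t) = alpha."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < target:  # widen the bracket until it contains the root
        hi *= 2
    for _ in range(40):            # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for nu in (1, 4, 8, 25):
    print(nu, round(t_critical(0.05, nu), 3))
```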

  45. Independent populations with heterogeneous variances
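The slide's formula did not survive extraction. One common approach for independent populations with heterogeneous variances is Welch's approximate t test, sketched here with the Welch–Satterthwaite degrees of freedom; this is a named substitute, not necessarily the method on the original slide. It is applied to the first two body-weight samples from the slides, and the helper `welch_t` is a hypothetical name.

```python
import math
import statistics

def welch_t(x1, x2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # squared standard error of mean 1
    v2 = statistics.variance(x2) / n2  # squared standard error of mean 2
    t = (statistics.fmean(x1) - statistics.fmean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

sample_1 = [43, 44, 45, 44, 44]
sample_2 = [46, 44, 46, 45, 44]
t, df = welch_t(sample_1, sample_2)
print(round(t, 3), round(df, 2))  # -1.826 7.2
```

The non-integer ν is usual for this test; critical values are then read for ν rounded down or interpolated.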

  46. Analysis of Variance (ANOVA)

  47. Independent t test • Compares the means of one variable for TWO groups of cases. • Statistical formula: $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$. Meaning: compare the ‘standardized’ mean difference. • But this is limited to two groups. What if there are more than two groups? • Pairwise t tests (previous example) • ANOVA (ANalysis Of Variance)

  48. From t test to ANOVA, 1. Pairwise t tests • If you compare three or more groups using t tests at the usual 0.05 level of significance, you have to compare each pair (for three groups: A to B, A to C, B to C), so the chance of at least one wrong result (a false positive), treating the three tests as independent, is $1 - (0.95 \times 0.95 \times 0.95) = 14.3\%$. • Multiple t tests increase the rate of false alarms.
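The same calculation generalizes to k groups, which need k(k – 1)/2 pairwise tests. A minimal sketch, keeping the slide's simplifying assumption that the tests are independent (in practice they are correlated, so this is an approximation); the helper `familywise_error` is a hypothetical name.

```python
def familywise_error(k_groups, alpha=0.05):
    """P(at least one false positive) over all pairwise tests, assuming independence."""
    m = k_groups * (k_groups - 1) // 2  # number of pairwise comparisons
    return 1 - (1 - alpha) ** m

for k in (3, 4, 5):
    print(k, round(familywise_error(k), 3))
# 3 0.143
# 4 0.265
# 5 0.401
```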

  49. From t test to ANOVA, 2. Analysis of variance • The t test uses the mean difference. Similarly, ANOVA compares the observed variance among means. • The logic behind ANOVA: • If the groups are from the same population, the variance among means will be small (note that the group means will still not be exactly the same). • If the groups are from different populations, the variance among means will be large.

  50. What is ANOVA? • ANOVA (analysis of variance) is a procedure designed to determine whether the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable. • Assumptions • Each independent variable is categorical (nominal scale). Independent variables are called factors, and their values are called levels. • The dependent variable is numerical (ratio scale). • The basic idea is to check whether the “variance” of the dependent variable given the influence of one or more independent variables {the mean square for a factor} is significantly greater than the “variance” of the dependent variable assuming no influence of the independent variables {the mean square error (MSE)}; the ratio of the two is the F statistic.
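A minimal one-way ANOVA sketch computing the F ratio of the mean square for the factor to the MSE. The three groups are hypothetical experimental design scores from three classes; the helper `one_way_anova` is an illustrative name.

```python
import statistics

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n_total = len(groups), len(all_values)
    means = [statistics.fmean(g) for g in groups]

    # Variation of group means around the grand mean, and within each group
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)       # mean square for the factor
    ms_within = ss_within / (n_total - k)   # mean square error (MSE)
    return ms_between / ms_within, k - 1, n_total - k

# Hypothetical experimental design scores from three classes
class_a = [70, 72, 68, 74, 71]
class_b = [75, 78, 74, 77, 76]
class_c = [69, 71, 70, 72, 68]
F, df1, df2 = one_way_anova(class_a, class_b, class_c)
print(round(F, 2), df1, df2)  # 15.5 2 12
```

For these data F = 15.5 with (2, 12) degrees of freedom, which exceeds the tabled critical F of about 3.89 at α = 0.05.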
