
Basic Concepts in Statistics The Background Preparation for Data Analysis



Presentation Transcript


  1. Basic Concepts in Statistics: The Background Preparation for Data Analysis by Teoh Sian Hoon

  2. Statistical Analysis • Inferential Statistics: testing hypotheses • Describing Data: computing descriptive statistics • Visualizing Data: looking at the distribution • Identify values that appear to be unusual • Check the original records to make sure that these values are not the result of coding errors • Detect outliers and non-normality (many statistical procedures require the distribution to be more or less symmetric) • Determine whether the statistical techniques being considered for the data analysis are appropriate

  3. 1. Basic Concept - Statistical terms • Parameter: describes a population, e.g. the population mean $\mu$ • Statistic: describes a sample, e.g. the sample mean $\bar{x}$

  4. 1. Basic Concept (continued) - Mean • Income: Jam RM1,950; Jack RM2,000; Man RM2,200; Su RM2,500; San RM2,800; Tan RM3,000; Jul RM3,100; Mad RM3,700; Ina RM3,900; Shan RM4,000 • Mean = RM2,915 • Median = RM2,900

  5. 1. Basic Concept (continued) - Descriptive statistics • Income: Jam RM1,950; Jack RM2,000; Man RM2,200; Su RM2,500; San RM2,800; Tan RM3,000; Jul RM3,100; Mad RM3,700; Ina RM3,900; Bill Gates ?????? • Mean = ??????

  6. 1. Basic Concept (continued) - Descriptive statistics • Income: Jam RM1,950; Jack RM2,000; Man RM2,200; Su RM2,500; San RM2,800; Tan RM3,000; Jul RM3,100; Mad RM3,700; Ina RM3,900 • How widely are the values in the dataset spread apart? • If Bill Gates earns RM500,000: Mean = ????? • Median = ????

  7. 1. Basic Concept (continued) - Descriptive statistics • Income: Jam RM1,950; Jack RM2,000; Man RM2,200; Su RM2,500; San RM2,800; Tan RM3,000; Jul RM3,100; Mad RM3,700; Ina RM3,900 • If Bill Gates earns RM500,000: Mean = RM52,515 • Median = RM2,900 • But those other nine people didn't become millionaires just because Bill Gates was included.
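The income example can be checked with a short Python sketch. It uses only the figures shown on the slides; the variable names are illustrative, and it assumes the second group is the nine remaining earners plus Bill Gates at RM500,000, which is the reading that reproduces the slide's mean of RM52,515.

```python
from statistics import mean, median

# Monthly incomes (RM) of the ten people on the first income slide.
incomes = {
    "Jam": 1950, "Jack": 2000, "Man": 2200, "Su": 2500, "San": 2800,
    "Tan": 3000, "Jul": 3100, "Mad": 3700, "Ina": 3900, "Shan": 4000,
}

# The nine people listed on the later slides, plus Bill Gates at RM500,000.
with_gates = {name: rm for name, rm in incomes.items() if name != "Shan"}
with_gates["Bill Gates"] = 500_000

original_mean = mean(list(incomes.values()))
original_median = median(list(incomes.values()))
gates_mean = mean(list(with_gates.values()))
gates_median = median(list(with_gates.values()))

print(f"Original group:  mean = RM{original_mean:,.0f}, median = RM{original_median:,.0f}")
print(f"With Bill Gates: mean = RM{gates_mean:,.0f}, median = RM{gates_median:,.0f}")
```

A single extreme value pulls the mean from RM2,915 to RM52,515 while the median stays at RM2,900, which is why the median is the safer summary when outliers are present.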

  8. 1. Basic Concept (continued) - Descriptive statistics • Standard deviation

  9. 1. Basic Concept (continued) - Descriptive statistics • Standard deviation: a measure of dispersion around the mean.
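As a minimal sketch of what "dispersion around the mean" means in practice, the snippet below computes the standard deviation of the ten incomes from the earlier slide by hand and compares it with Python's `statistics.stdev`. The slides do not say which divisor is intended; the sample form (dividing by n - 1) is assumed here, and `sample_sd` is an illustrative helper name.

```python
from math import sqrt
from statistics import mean, stdev

# Monthly incomes (RM) from the earlier income slide.
incomes = [1950, 2000, 2200, 2500, 2800, 3000, 3100, 3700, 3900, 4000]

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    m = mean(values)
    squared_deviations = sum((x - m) ** 2 for x in values)
    return sqrt(squared_deviations / (len(values) - 1))

sd_by_hand = sample_sd(incomes)
sd_library = stdev(incomes)  # library version, also uses the n - 1 divisor

print(f"SD by hand = RM{sd_by_hand:,.2f}")
print(f"SD library = RM{sd_library:,.2f}")
```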

  10. 1. Basic Concept (continued) - Normal Distribution • In a normal distribution, 68.3% of cases fall within 1 SD of the mean and 95.4% of cases fall within 2 SD. • For example, if the mean age is 45 with a standard deviation of 10, 95.4% of the cases would be between 25 and 65.

  11. 1. Basic Concept (continued) - Normal Distribution • [Figure: normal curve with mean age 45 and standard deviation 10, marking the 68.3% and 95.4% regions]
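The 68.3% and 95.4% figures can be reproduced with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch using the slide's example of mean age 45 and SD 10; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

# Normal distribution from the slide's example: mean age 45, SD 10.
ages = NormalDist(mu=45, sigma=10)

def share_within(dist, k):
    """Proportion of cases within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

within_1_sd = share_within(ages, 1)  # ages 35 to 55
within_2_sd = share_within(ages, 2)  # ages 25 to 65

print(f"within 1 SD (35 to 55): {within_1_sd:.1%}")
print(f"within 2 SD (25 to 65): {within_2_sd:.1%}")
```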

  12. 1. Basic Concept (continued) - Skewness • A normal distribution is symmetric, with a skewness value of zero: Mean = Median = Mode • [Figure: frequency (f) plotted against the variable]

  13. 1. Basic Concept (continued) - Skewness • Skewed to the left (negatively skewed): a distribution with a significant negative skewness has a long left tail. The mean is the smallest value and the mode is the largest, with the median lying between them (Mean < Median < Mode). • Skewed to the right (positively skewed): a distribution with a significant positive skewness has a long right tail. The mean is the largest value and the mode is the smallest, with the median lying between them (Mode < Median < Mean). • [Figures: frequency (f) plotted against the variable for each case]

  14. 1. Basic Concept (continued) - Skewness • How skewed can a distribution be before it is considered a problem? • As a rough guide, a skewness value more than twice its standard error is taken to indicate a departure from symmetry. • Example: |-0.067| < 2(0.687), so there is no indication of a departure from symmetry.
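The rule of thumb can be sketched in Python. The formulas below are the adjusted sample skewness and its standard error as commonly reported by statistical packages; that choice is an assumption, since the slide shows only the resulting numbers. For n = 10 the standard-error formula gives 0.687, the value on the slide. The data are the ten vehicle weights listed elsewhere in the deck; the slide does not say which variable produced -0.067, so the skewness printed here need not match it.

```python
from math import sqrt
from statistics import mean, stdev

def skewness(values):
    """Adjusted sample skewness (assumed to be the form the package reports)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def skewness_se(n):
    """Standard error of the sample skewness for a sample of size n."""
    return sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def departs_from_symmetry(values):
    """Rough guide from the slide: |skewness| > 2 * SE suggests asymmetry."""
    return abs(skewness(values)) > 2 * skewness_se(len(values))

# Vehicle weights from the deck's SPSS data slide.
weights = [3504, 3693, 3436, 3433, 3449, 4341, 4354, 4312, 4425, 3850]

se = skewness_se(len(weights))
print(f"SE of skewness for n = {len(weights)}: {se:.3f}")
print(f"skewness = {skewness(weights):.3f}")
print(f"departs from symmetry: {departs_from_symmetry(weights)}")
```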

  15. 1. Basic Concept (continued) - Kurtosis • Kurtosis = 0 → mesokurtic • Kurtosis > 0 → leptokurtic • Kurtosis < 0 → platykurtic • In general, kurtosis is not very important for an understanding of statistics.

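  16. 1. Basic Concept (continued) - Central Limit Theorem • If $X_1, \dots, X_n$ are independent observations from a population with mean $\mu$ and standard deviation $\sigma$, then for n large the sample mean is approximately normally distributed: $\bar{X} \sim N(\mu, \sigma^2/n)$.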
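A small simulation can make the Central Limit Theorem concrete. The sketch below is not from the slides: the exponential population, the sample size of 50, the 2,000 replicates and the seed are all illustrative assumptions. It draws repeated samples from a strongly skewed population and checks that the sample means centre on the population mean with a spread close to the population SD divided by the square root of n.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 50                   # size of each sample ("n large")
replicates = 2000        # number of samples drawn

# Exponential population with rate 1: mean 1, SD 1, strongly right-skewed.
population_mean, population_sd = 1.0, 1.0

sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)
theoretical_sd = population_sd / sqrt(n)

print(f"mean of sample means = {mean_of_means:.3f} (population mean {population_mean})")
print(f"SD of sample means   = {sd_of_means:.3f} (theory {theoretical_sd:.3f})")
```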

  17. 1. Basic Concept (continued) - SPSS • Data (engine, weight): (307, 3504); (350, 3693); (318, 3436); (304, 3433); (302, 3449); (429, 4341); (454, 4354); (440, 4312); (455, 4425); (390, 3850)

  18. 1. Basic Concept (continued) - SPSS • Steps: • Analyze • Descriptive Statistics • Explore • In Dependent List, select ‘weight’ • In Statistics, select Descriptives and Outliers • In Plots, select Histogram and Normality plots with tests

  19. 1. Basic Concept (continued) - SPSS • The 5% trimmed mean excludes the 5% largest and 5% smallest values. • The trimmed mean provides an alternative to the median when some data values are far removed from the rest. • Skewness check: |-0.067| < 2(0.687)
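A trimmed mean can be sketched in a few lines of Python. This is a simplified version, not the statistical package's exact algorithm: the number of values dropped from each end is rounded down, so a 5% trim removes nothing from a sample of fewer than 20 values. To show the effect on a small sample, the example applies a 10% trim (an illustrative choice) to the nine incomes plus Bill Gates from the earlier slides.

```python
from statistics import mean

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping the lowest and highest `proportion` of the values.

    Simplification: the count dropped from each end is rounded down.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # values to drop from each end
    return mean(ordered[k:len(ordered) - k])

# Nine incomes from the earlier slides plus Bill Gates at RM500,000.
incomes = [1950, 2000, 2200, 2500, 2800, 3000, 3100, 3700, 3900, 500_000]

plain_mean = mean(incomes)
trimmed_10 = trimmed_mean(incomes, proportion=0.10)  # drops one value per end
trimmed_5 = trimmed_mean(incomes, proportion=0.05)   # drops nothing when n = 10

print(f"mean             = RM{plain_mean:,.0f}")
print(f"10% trimmed mean = RM{trimmed_10:,.0f}")
```

With one value removed from each end, the trimmed mean falls back to RM2,900, in line with the median.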

  20. 1. Basic Concept (continued) - SPSS • The significance levels of the normality tests are reasonably large, indicating that normality is not an unreasonable assumption.

  21. 1. Basic Concept (continued) - Example of study • Refer to Appendix A: GRAIN-SIZE ANALYSIS (Geology) http://darkwing.uoregon.edu/~dogsci/dorsey/geo334/Lab5.pdf#search='using%20skewness'

  22. 1. Basic Concept (continued) - GRAIN-SIZE ANALYSIS (Geology) • Grain-size distribution of sediments is important for characterizing substrate behavior in engineering and hazards applications, and is commonly analyzed as part of soil and sedimentation surveys in Quaternary and older sediments. • It is especially important in studies of dam effects in regulated rivers, because the size distribution of the sediment determines to a large extent whether it is transported and where it will be stored under the regulated flow regime. • Attempts to use plots of statistical parameters to identify sediments of different depositional environments sparked great interest in the early 1960s. • Parameters that are commonly used include the first four statistical moments: mean, standard deviation, skewness, and kurtosis, as described by Boggs (p. 64-71). • Friedman (1961) attempted to distinguish between beach and river sands using skewness and standard deviation.

  23. 2. Graphs • Data (engine, weight): (307, 3504); (350, 3693); (318, 3436); (304, 3433); (302, 3449); (429, 4341); (454, 4354); (440, 4312); (455, 4425); (390, 3850)

  24. 2. Graphs - Histogram • Steps: • Graphs • Histogram • Enter “vehicle weight” • Mark “Display normal curve”

  25. 2. Graphs - Histogram

  26. 2. Graphs - Histogram • Check whether the distribution is symmetric. • Look for separate clumps of data values.

  27. 2. Graphs - Boxplots • From the menus, choose Graphs → Boxplot. • In the Boxplot initial dialog box, select the icon for Simple. • Select an option under Data in Chart Are. • Select Define. • Select variables and options for the chart.

  28. 2. Graphs - Boxplots • Boxplot labels, from top to bottom: • Extreme outlier (*) • Mild outlier • Largest observed value that is not an outlier • 3rd Quartile (Q3) • Median (Q2) • 1st Quartile (Q1) • Smallest observed value that is not an outlier

  29. 2. Graphs - Boxplots
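The boxplot labels can be turned into a small classification sketch. The slides do not state the cut-offs; the code assumes the common convention that a mild outlier lies more than 1.5 box lengths (interquartile ranges) beyond the box and an extreme outlier more than 3 box lengths beyond it. The quartiles come from Python's `statistics.quantiles`, whose default method may differ slightly from the hinges a statistical package uses. The data are the nine incomes plus Bill Gates from the earlier slides, and `classify` is an illustrative helper name.

```python
from statistics import quantiles

def classify(values):
    """Label each value as an extreme outlier, a mild outlier, or neither."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles Q1, Q2 (median), Q3
    iqr = q3 - q1                       # box length
    labels = []
    for x in values:
        # Distance outside the box; negative when x lies inside it.
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            labels.append((x, "extreme outlier"))
        elif distance > 1.5 * iqr:
            labels.append((x, "mild outlier"))
        else:
            labels.append((x, "not an outlier"))
    return labels

incomes = [1950, 2000, 2200, 2500, 2800, 3000, 3100, 3700, 3900, 500_000]
labels = classify(incomes)

for value, label in labels:
    print(f"RM{value:>9,}: {label}")
```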

  30. 2. Graphs - P-P Plot • Steps: • Graphs • P-P Plots • Enter “vehicle weight” • Test distribution “Normal”

  31. 2. Graphs - P-P Plot

  32. 2. Graphs - P-P Plot

  33. 2. Graphs - P-P Plot

  34. 3. Inferential Statistics • Testing Hypotheses

  35. Research Questions • Objectives: • To describe a population • To determine significant difference(s) when comparing sample(s) • To analyze the significance of the relationship between 2 variables • Types of Analysis: Descriptive; Inferential • Variables: Independent Variables; Dependent Variables

  36. Objectives • To describe a population • To determine significant difference(s) when comparing sample(s): t-test (in one group; between 2 groups); ANOVA (> 2 groups) • To analyze the significance of the relationship between 2 variables

  37. Z test and Chi Square

  38. Describing a Population • Level of measurement for the dependent variable: • Interval/ratio: mean, variance • Ordinal (rank): median • Nominal (count): proportion, tested with a Z-test or Chi-square test

  39. Describing a Population • Level of measurement for the dependent variable: • Interval/ratio: mean, variance • Ordinal (rank): median • Nominal (count): proportion, tested with a Z-test or Chi-square test • The program does not have an option for a one-proportion Z-test. However, the Chi-Square goodness-of-fit test can be used to produce an equivalent result.

  40. Example: Z test / Chi Square • To test whether the proportion of female engineers who attended the event differs from the proportion of male engineers.

  41. Hypothesis: • H0: p = .5 • H1: p ≠ .5

  42. Steps: Data weight cases  weight cases by freq Analyze  nonparametric tests  chi-square Test variable list gender Expected values all categories equal by Teoh Sian Hoon

  43. Conclusion • Since p-value = 0.527 > 0.05, do not reject H0. • There is not enough evidence that the proportion of female engineers who attended the event differs from the proportion of male engineers.

  44. Statistical Terms • In many areas of research, the p-value of .05 is customarily treated as a "border-line acceptable" error level.

  45. Example: Chi Square • The marketing manager for an automobile manufacturer is interested in determining the proportion of new compact-car owners who would have purchased a passenger-side inflatable air bag if it had been available for an additional cost of RM300. • The manager believes from previous information that the proportion is .30. • Suppose that a sample of 200 new compact-car owners is surveyed and 79 indicate that they would have purchased the air bags.

  46. Example: Chi Square (continued) • Since this is a hypothesis test for the proportion, it will be a Z-test. • At the .10 level of significance, is there enough evidence that the population proportion is different from .30?

  47. Hypothesis: • H0: p = .30 • H1: p ≠ .30

  48. Conclusion • Since p-value = 0.003 < 0.10, we reject H0. • Therefore, the population proportion is different from 0.30.
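The air-bag example can be reproduced with the Python standard library alone. The sketch below runs the one-proportion Z-test and the Chi-Square goodness-of-fit test side by side to illustrate that they give the same result (the chi-square statistic is the square of z). The numbers come from the slides; the variable names are illustrative, and the p-values are computed from the normal distribution with `math.erfc` rather than by any statistical package.

```python
from math import erfc, sqrt

# Figures from the example: 79 of 200 owners, hypothesised proportion .30.
n, successes, p0 = 200, 79, 0.30
alpha = 0.10

# One-proportion Z-test (two-sided).
p_hat = successes / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value_z = erfc(abs(z) / sqrt(2))

# Chi-Square goodness-of-fit test on the same counts (1 degree of freedom).
observed = [successes, n - successes]
expected = [n * p0, n * (1 - p0)]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value_chi2 = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

reject_h0 = p_value_z < alpha

print(f"z = {z:.3f}, p-value = {p_value_z:.3f}")
print(f"chi-square = {chi2:.3f}, p-value = {p_value_chi2:.3f}")
print(f"reject H0 at the {alpha} level: {reject_h0}")
```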

  49. Example: Chi-Square • The level of education attained by women from a rural region is divided into three categories: can read/write; primary; secondary and above. • A demographer estimates that 28% of them can read/write, 61% have a primary education and 11% have a secondary education or above. • To verify these percentages, a random sample of n = 100 women in the region was selected and their level of education recorded. • The number of women whose level of education falls into each of the three categories is shown in the following table.

  50. Example: Chi-Square (continued)
