
I. Introduction to Data and Statistics


kitra




Presentation Transcript


  1. I. Introduction to Data and Statistics A. Basic terms and concepts Data set - variable - observation - data value

  2. Central Gulf States age > 65 < 19 $ Rent $ 53 19 34 98 25 TX 34 14 58 89 78 LA MS 35 65 78 25 56 25 78 65 12 89 AL

  3. B. Primary and Secondary data 1. Primary data - original data - collected for a specific purpose - sample design and procedures - time and $

  4. 2. Secondary data - archival data - agency or organization - organized in a set format - time and $ - data quality an issue - sample design

  5. C. Individual and spatially aggregated data (figure, two panels: State 1 through State 4 within a Region)

  6. D. Discrete and Continuous data 1. Discrete

  7. 2. Continuous

  8. E. Qualitative and Quantitative data 1. Qualitative (categorical) Ex: land cover, sex, political party, race 2. Quantitative Ex: population, precipitation, grades

  9. II. Scales of Measurement A. Nominal B. Ordinal C. Interval D. Ratio. For comparison, variables must use the same scale of measurement

  10. A. Nominal - Mutually exclusive - Exhaustive Ex: Name: George = 1, Wanda = 2, Bob = 3 Land Cover: Forested = 45, urban = 39, etc... Climate regimes: polar = 1, temperate = 2, tropical = 3 Sex: Male = 1, Female = 2

  11. B. Ordinal - ranked data - arbitrary - comparisons - not a set interval between rankings Ex: Places rated (cities, beaches…) Level of satisfaction (poor, ok, good)

  12. C. Interval - separated by absolute differences - does not have an absolute zero Ex: - temperature - elevation

  13. D. Ratio - separated by absolute differences - absolute zero Ex: - precipitation - tree growth - income

  14. III. Graphing procedures (univariate) A. frequency histogram B. cumulative histogram

  15. A. Frequency histogram (frequency polygon). Figure: frequency (#, %) on the y-axis; a variable such as income or grades on the x-axis (0 to 100)

  16. B. Cumulative frequency histogram (cumulative frequency polygon). Figure: cumulative frequency (#, %) on the y-axis; x-axis from 0 to 100
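The counts behind a frequency histogram and its cumulative version can be sketched as follows. This is a minimal illustration, not part of the original slides: the grades, the bin width of 10, and the helper name `frequency_table` are all assumptions made for the example.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values, bin_width, start=0):
    """Return (bin_lower_edge, count, cumulative_count) rows for equal-width bins.

    Hypothetical helper: bins are [edge, edge + bin_width); empty bins are omitted.
    """
    # Map each value to the lower edge of the bin it falls in, then count per bin
    counts = Counter(start + ((v - start) // bin_width) * bin_width for v in values)
    edges = sorted(counts)
    freqs = [counts[e] for e in edges]
    # Running total of the bin counts gives the cumulative frequency
    cumulative = list(accumulate(freqs))
    return list(zip(edges, freqs, cumulative))

grades = [55, 62, 67, 71, 74, 78, 78, 83, 88, 95]  # hypothetical grades
bin_width = 10
table = frequency_table(grades, bin_width, start=50)
for edge, count, cum in table:
    print(f"[{edge}, {edge + bin_width}): freq={count} cumulative={cum}")
```

The last cumulative count always equals the number of observations, which is a quick check that no value was dropped.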

  17. IV. Descriptive Statistics (univariate) - summary of data characteristics - inferential; extend sample to a larger population A. Measures of Central Tendency B. Measures of Dispersion C. Measures of Shape

  18. A. Measures of Central Tendency • attempt to define the most typical value of a larger data set • 1. Mode • 2. Median • 3. Mean (average)

  19. 1. Mode (nominal only) • value that occurs most frequently • only measure of central tendency appropriate for nominal level data • works better for grouped data, not raw values • many data sets will not have two identical data values

  20. 2. Median • the middle value from a set of ranked observations • equal number of observations on either side • appropriate when data is heavily skewed • ordinal, interval, or ratio level data, not nominal

  21. 3. Mean (average), $\bar{X} = \sum x_i / n$ • most commonly used measure of central tendency • interval or ratio level data • sensitive to outliers • most easily understood • assumptions: • unimodal • symmetric distribution
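The three measures of central tendency can be compared on a small example. This is a sketch using Python's standard `statistics` module; the income values and variable names are hypothetical and chosen only to show how one outlier moves the mean far more than the median.

```python
import statistics

incomes = [12, 25, 25, 34, 56, 65, 78, 89]  # hypothetical data values

mode_value = statistics.mode(incomes)      # most frequent value
median_value = statistics.median(incomes)  # middle of the ranked observations
mean_value = statistics.mean(incomes)      # sum of x_i divided by n

print(mode_value, median_value, mean_value)

# Adding a single extreme value shifts the mean strongly, the median only slightly
with_outlier = incomes + [1000]
print(statistics.median(with_outlier), statistics.mean(with_outlier))
```

With an even number of observations, `statistics.median` averages the two middle values, which is why the median here need not be one of the data values.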

  22. Figure: normal distribution, with mode, mean, and median marked (x-axis 0 to 100)

  23. Figure: distribution with mode, median, and mean marked separately (x-axis 0 to 100)

  24. B. Measures of Dispersion • provide information about distribution of data • 1. Range • 2. Standard deviation • 3. Coefficient of variation

  25. 1. Range • difference between largest and smallest value • simplest measure of dispersion • easy to calculate • can be misleading • ignores all other values • does not take into account clustering of data

  26. 2. Standard deviation • roughly the average deviation of each value from the mean (the square root of the mean squared deviation) • based on the mean • better indicator of the dispersion of the entire sample (in comparison to the range) • scale dependent value

  27. 3. Coefficient of variation • standard deviation / mean • allows you to compare dispersion independent of scale • should be used to make comparisons where there are differences in mean

  28. Figure (x-axis 0 to 100; data from 15 to 85; mean $\bar{X} = 50$). Range: 85 - 15 = 70. Std. dev.: based on the deviations $x_i - \bar{X}$. C.V. = Std. dev. / mean

  29. C.V. = Std. dev. / mean
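The three measures of dispersion can be computed together as below. This is a minimal sketch: the data values are hypothetical (picked so that the range is 85 - 15 = 70 and the mean is 50, as on the slide), the helper name `dispersion` is an assumption, and the population form of the standard deviation (dividing by n) is used; the slides do not say whether n or n - 1 is intended.

```python
import statistics

def dispersion(values):
    """Return (range, standard deviation, coefficient of variation)."""
    value_range = max(values) - min(values)
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # population form: divides by n (assumption)
    cv = std_dev / mean                  # C.V. = Std. dev. / mean
    return value_range, std_dev, cv

values = [15, 30, 50, 70, 85]  # hypothetical data, mean 50
value_range, std_dev, cv = dispersion(values)
print(value_range, round(std_dev, 2), round(cv, 2))

# Rescaling the data (e.g. changing units) changes the standard deviation
# but leaves the coefficient of variation unchanged
scaled = [v * 100 for v in values]
scaled_range, scaled_std, scaled_cv = dispersion(scaled)
print(round(scaled_std, 2), round(scaled_cv, 2))
```

The second call illustrates why the coefficient of variation is the measure to use when comparing data sets with different means or units.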

  30. C. Measures of Shape 1. Skewness 2. Kurtosis

  31. Leptokurtic Mesokurtic Platykurtic

  32. Symmetrical (bell shaped) (+) skew (-) skew
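The slides name skewness and kurtosis without giving formulas. One common choice, assumed here, is the moment-based definition: skewness as the third central moment divided by the 1.5 power of the variance, and kurtosis as the fourth central moment divided by the squared variance (about 3 for a normal, mesokurtic distribution). The helper name and data below are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, kurtosis) using population central moments (assumed definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5  # > 0: (+) skew, < 0: (-) skew, 0: symmetrical
    kurtosis = m4 / m2 ** 2    # > 3: leptokurtic, about 3: mesokurtic, < 3: platykurtic
    return skewness, kurtosis

symmetric = [1, 2, 3, 4, 5]         # hypothetical symmetric data
right_tailed = [1, 1, 2, 2, 3, 10]  # hypothetical data with a long right tail

sym_skew, sym_kurt = shape_measures(symmetric)
right_skew, right_kurt = shape_measures(right_tailed)
print(round(sym_skew, 2), round(sym_kurt, 2))
print(round(right_skew, 2), round(right_kurt, 2))
```

Other definitions exist (for example sample-corrected versions, or "excess" kurtosis that subtracts 3), so values from different software may not match exactly.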

  33. Mean Center

  34. Figure: points plotted on axes x (0 to 6) and y (0 to 4): A (2.8, 1.5), B (1.6, 3.8), C (3.5, 3.3), D (4.4, 2.0), E (4.3, 1.1), F (5.2, 2.4), G (4.9, 3.5)

  35. Figure: the same points A through G with the Mean Center marked at (3.81, 2.51)
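The mean center is the average of the x coordinates paired with the average of the y coordinates. A short sketch using the seven points from the figure (the function name `mean_center` is an assumption for illustration):

```python
# Coordinates of points A through G as given on the slide
points = {
    "A": (2.8, 1.5), "B": (1.6, 3.8), "C": (3.5, 3.3), "D": (4.4, 2.0),
    "E": (4.3, 1.1), "F": (5.2, 2.4), "G": (4.9, 3.5),
}

def mean_center(coords):
    """Return (mean of x values, mean of y values) for a collection of (x, y) pairs."""
    coords = list(coords)
    n = len(coords)
    mean_x = sum(x for x, _ in coords) / n
    mean_y = sum(y for _, y in coords) / n
    return mean_x, mean_y

cx, cy = mean_center(points.values())
print(f"Mean center: ({cx:.2f}, {cy:.2f})")  # Mean center: (3.81, 2.51)
```

The result agrees with the mean center of (3.81, 2.51) shown on the slide.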

  36. Weighted Mean Center

  37. Figure: the same points on axes x (0 to 6) and y (0 to 4), labeled with weights: A (5), B (20), C (8), D (4), E (6), F (5), G (3)

  38. Figure: the weighted points A through G with the Weighted Mean Center marked at (3.10, 2.88)
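The weighted mean center multiplies each coordinate by its point's weight before averaging, so heavily weighted points pull the center toward themselves. A sketch using the coordinates and weights from the figures (the function name `weighted_mean_center` is an assumption for illustration):

```python
# (x, y, weight) for points A through G, taken from the slides
weighted_points = {
    "A": (2.8, 1.5, 5), "B": (1.6, 3.8, 20), "C": (3.5, 3.3, 8), "D": (4.4, 2.0, 4),
    "E": (4.3, 1.1, 6), "F": (5.2, 2.4, 5), "G": (4.9, 3.5, 3),
}

def weighted_mean_center(rows):
    """Return (sum of w*x / sum of w, sum of w*y / sum of w) for (x, y, w) rows."""
    rows = list(rows)
    total_weight = sum(w for _, _, w in rows)
    wx = sum(x * w for x, _, w in rows) / total_weight
    wy = sum(y * w for _, y, w in rows) / total_weight
    return wx, wy

wx, wy = weighted_mean_center(weighted_points.values())
print(f"Weighted mean center: ({wx:.2f}, {wy:.2f})")  # Weighted mean center: (3.10, 2.88)
```

Point B carries a weight of 20 out of a total of 51, which is why the center moves from (3.81, 2.51) toward B at (1.6, 3.8).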

  39. Correlation - bivariate relationship. Scattergrams: 1. Direction: negative or positive 2. Strength of relationship: perfect, strong, weak, none
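Direction and strength can be summarized in a single number. The slides do not name a specific statistic; the sketch below assumes Pearson's correlation coefficient r, whose sign gives the direction and whose absolute value (from 0 to 1) gives the strength. The function name and the data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations (assumed measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and the two sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Undefined (division by zero) if either variable has no variation
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])   # points on a rising line
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])   # points on a falling line
weaker_positive = pearson_r(x, [2, 1, 4, 3, 5])     # rising trend with scatter

print(round(perfect_positive, 2), round(perfect_negative, 2), round(weaker_positive, 2))
```

A value near 0 indicates no linear relationship; clustered ("clumped") data can still produce a misleadingly large r, so the scattergram should always be inspected as well.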

  40. Positive (direct) correlation (scattergram)

  41. Negative (inverse) correlation (scattergram)

  42. Perfect correlation (scattergram)

  43. Strong correlation (scattergram)

  44. Weak correlation (scattergram)

  45. No correlation ?? (scattergram)

  46. Controlled correlation (scattergram)

  47. Controlled correlation (clumping) (scattergram)

