I. Introduction to Data and Statistics A. Basic terms and concepts Data set - variable - observation - data value
Central Gulf States

State   age   > 65   < 19   $    Rent $
TX      53    19     34     98   25
LA      34    14     58     89   78
MS      35    65     78     25   56
AL      25    78     65     12   89
B. Primary and Secondary data 1. Primary data - original data - collected for a specific purpose - sample design and procedures - time and $
2. Secondary data - archival data - agency or organization - organized in a set format - time and $ - data quality an issue - sample design
C. Individual and spatially aggregated data [Figure, two panels: State 1, State 2, State 3 and State 4 within a Region]
D. Discrete and Continuous data 1. Discrete
E. Qualitative and Quantitative data 1. Qualitative (categorical) Ex: land cover, sex, political party, race 2. Quantitative Ex: population, precipitation, grades
II. Scales of Measurement A. Nominal B. Ordinal C. Interval D. Ratio - comparisons must use the same scale of measurement
A. Nominal - Mutually exclusive - Exhaustive Ex: Name: George = 1, Wanda = 2, Bob = 3 Land Cover: Forested = 45, urban = 39, etc... Climate regimes: polar = 1, temperate = 2, tropical = 3 Sex: Male = 1, Female = 2
B. Ordinal - ranked data - arbitrary - comparisons - not a set interval between rankings Ex: Places rated (cities, beaches…) Level of satisfaction (poor, ok, good)
C. Interval - separated by absolute differences - does not have an absolute zero Ex: - temperature - elevation
D. Ratio - separated by absolute differences - absolute zero Ex: - precipitation - tree growth - income
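The difference between interval and ratio data can be illustrated with a short sketch. The temperature and precipitation values below are hypothetical and chosen only to show why a ratio is meaningful when the scale has an absolute zero.

```python
# Temperatures in degrees Celsius are interval data: differences are
# meaningful, but the zero point is arbitrary, so ratios are not.
temp_a_c = 10.0  # hypothetical value
temp_b_c = 20.0  # hypothetical value

difference_c = temp_b_c - temp_a_c   # a valid interval-scale comparison
naive_ratio = temp_b_c / temp_a_c    # equals 2.0, but is NOT "twice as warm"

# Kelvin has an absolute zero, so a ratio of kelvin values is meaningful.
KELVIN_OFFSET = 273.15
kelvin_ratio = (temp_b_c + KELVIN_OFFSET) / (temp_a_c + KELVIN_OFFSET)

# Precipitation is ratio data: 0 mm means no precipitation at all,
# so "twice as much" is a meaningful statement.
precip_a_mm = 10.0  # hypothetical value
precip_b_mm = 20.0  # hypothetical value
precip_ratio = precip_b_mm / precip_a_mm

print(difference_c, naive_ratio, round(kelvin_ratio, 3), precip_ratio)
```

The Celsius ratio of 2.0 shrinks to about 1.035 once the same temperatures are expressed on a scale with an absolute zero, while the precipitation ratio needs no such correction.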
III. Graphing procedures (univariate) A. frequency histogram B. cumulative histogram
A. Frequency histogram (frequency polygon) [Figure: y-axis Freq. (#, %); x-axis a variable such as income or grades, scaled 0 to 100]
B. Cumulative frequency histogram (cumulative frequency polygon) [Figure: y-axis Cumulative Freq. (#, %); x-axis scaled 0 to 100]
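As a rough sketch of what the two histograms plot, the counts behind a frequency histogram and a cumulative frequency histogram can be computed as follows. The grades, the bin width and the number of bins are hypothetical choices, not values from the slides.

```python
from itertools import accumulate

# Hypothetical grades on a 0-100 scale (not from the slides).
grades = [12, 35, 47, 52, 55, 58, 61, 64, 70, 88]

BIN_WIDTH = 25
N_BINS = 4  # bins: [0,25), [25,50), [50,75), [75,100]

# Frequency (#): how many observations fall in each bin.
counts = [0] * N_BINS
for g in grades:
    index = min(g // BIN_WIDTH, N_BINS - 1)  # a grade of 100 goes in the last bin
    counts[index] += 1

# Cumulative frequency (#): running total of the bin counts.
cumulative = list(accumulate(counts))

# The same two series expressed as percentages (%).
percent = [100 * c / len(grades) for c in counts]
cumulative_percent = [100 * c / len(grades) for c in cumulative]

for i in range(N_BINS):
    low = i * BIN_WIDTH
    print(f"{low:3d}-{low + BIN_WIDTH:3d}: n={counts[i]}  cum={cumulative[i]}  "
          f"{percent[i]:.0f}%  cum {cumulative_percent[i]:.0f}%")
```

The cumulative series never decreases and ends at the total number of observations (100%), which is why the cumulative polygon always rises from left to right.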
IV. Descriptive Statistics (univariate) - summary of data characteristics - inferential; extend sample to a larger population A. Measures of Central Tendency B. Measures of Dispersion C. Measures of Shape
A. Measures of Central Tendency • attempt to define the most typical value of a larger data set • 1. Mode • 2. Median • 3. Mean (average)
1. Mode • value that occurs most frequently • only measure of central tendency appropriate for nominal level data • works better for grouped data, not raw values • many data sets will not have two identical data values
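The points about nominal data and grouping can be sketched in a few lines. The land cover labels, the incomes and the $10,000 class width are hypothetical; `modes` is an illustrative helper, not a function from any library.

```python
from collections import Counter

def modes(values):
    """Return every value tied for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

# Nominal data: the mode is the only usable measure of central tendency.
land_cover = ["forest", "urban", "forest", "water", "forest", "urban"]
print(modes(land_cover))

# Raw values: no two incomes are identical, so every value ties.
incomes = [18200, 23950, 31400, 36780, 38120, 52600]  # hypothetical
print(modes(incomes))

# Grouped into $10,000 classes, a modal class appears.
grouped = [f"{(x // 10000) * 10}-{(x // 10000) * 10 + 10}k" for x in incomes]
print(modes(grouped))
```

With the raw incomes every value occurs once, so the mode is uninformative; after grouping, the 30-40k class stands out as the modal class.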
2. Median • the middle value from a set of ranked observations • equal number of observations on either side • appropriate when data is heavily skewed • ordinal, interval or ratio level data, not nominal
3. Mean (average), $\bar{X} = \sum x_i / n$ • most commonly used value of central tendency • interval or ratio level data • sensitive to outliers • most easily understood • assumptions: • unimodal • symmetric distribution
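The sensitivity of the mean to outliers, compared with the median, can be seen in a small sketch. The income values are hypothetical (in thousands of dollars), and the standard library `statistics` module is used for both measures.

```python
from statistics import mean, median

# Hypothetical incomes in $1000s (not from the slides).
incomes = [20.0, 22.0, 25.0, 27.0, 30.0]
with_outlier = incomes + [500.0]  # add one extreme value

mean_before = mean(incomes)
median_before = median(incomes)
mean_after = mean(with_outlier)
median_after = median(with_outlier)

print("without outlier:", mean_before, median_before)
print("with outlier:   ", mean_after, median_after)
```

One extreme observation moves the mean from 24.8 to 104, while the median only moves from 25 to 26, which is why the median is preferred for heavily skewed data.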
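[Figure: normal distribution on a 0 to 100 axis; mode, mean and median fall at the same point]
[Figure: skewed distribution on a 0 to 100 axis; mode, median and mean fall at different points, in that order]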
B. Measures of Dispersion • provide information about distribution of data • 1. Range • 2. Standard deviation • 3. Coefficient of variation
1. Range • difference between largest and smallest value • simplest measure of dispersion • easy to calculate • can be misleading • ignores all other values • does not take into account clustering of data
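A brief sketch of why the range can mislead: the two hypothetical data sets below share the same minimum (15) and maximum (85) but cluster very differently. `value_range` and `near_center` are illustrative helpers.

```python
def value_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)

def near_center(values, center=50, tolerance=5):
    """Count the values lying within `tolerance` of `center`."""
    return sum(1 for v in values if abs(v - center) <= tolerance)

# Hypothetical data: same extremes, different clustering.
clustered = [15, 48, 49, 50, 51, 52, 85]
spread = [15, 25, 40, 50, 60, 75, 85]

print(value_range(clustered), value_range(spread))
print(near_center(clustered), near_center(spread))
```

Both sets have a range of 70, yet five of the seven values in the first set sit within 5 units of 50 while only one value in the second set does; the range ignores that difference entirely.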
2. Standard deviation • a typical deviation of the values from the mean (the square root of the average squared deviation) • based on the mean • better indicator of the dispersion of the entire sample (in comparison to the range) • scale dependent value
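A minimal sketch of the calculation follows. The slides do not say whether the divisor is n or n - 1, so the function below offers both as an assumption; the data values are hypothetical.

```python
from math import sqrt

def std_dev(values, sample=True):
    """Standard deviation: square root of the mean squared deviation.

    sample=True divides by n - 1 (sample estimate);
    sample=False divides by n (population value).
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    divisor = n - 1 if sample else n
    return sqrt(squared_deviations / divisor)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

print(std_dev(values, sample=False))
print(std_dev(values, sample=True))
```

Because the result is expressed in the same units as the data, multiplying every value by 10 also multiplies the standard deviation by 10, which is what "scale dependent" means here.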
3. Coefficient of variation • standard deviation / mean • allows you to compare dispersion independent of scale • should be used to make comparisons where there are differences in mean
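The scale independence of the coefficient of variation can be sketched as below. The two data sets are hypothetical, and the population standard deviation (`statistics.pstdev`) is an assumed choice, since the slides do not specify the divisor.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (unitless)."""
    return pstdev(values) / mean(values)

# Hypothetical data on very different scales.
rainfall_mm = [40, 50, 60]
income_usd = [40000, 50000, 60000]

sd_rain = pstdev(rainfall_mm)
sd_income = pstdev(income_usd)
cv_rain = coefficient_of_variation(rainfall_mm)
cv_income = coefficient_of_variation(income_usd)

print(sd_rain, sd_income)   # standard deviations differ by a factor of 1000
print(cv_rain, cv_income)   # coefficients of variation are essentially equal
```

The standard deviations suggest income is far more dispersed, but relative to their means the two variables are equally variable, which is the comparison the coefficient of variation is meant to support.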
Range: 85 - 15 = 70
Std. dev.: based on the deviations $x_i - \bar{X}$
C.V. = Std. dev. / mean
$\bar{X} = 50$
[Figure: distribution on a 0 to 100 axis with minimum 15, mean 50 and maximum 85]
C. Measures of Shape 1. Skewness 2. Kurtosis
[Figure: kurtosis] Leptokurtic (sharply peaked) - Mesokurtic (normal) - Platykurtic (flat)
[Figure: skewness] Symmetrical (bell shaped) - (+) skew - (-) skew
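The slides name skewness and kurtosis without giving formulas. The sketch below assumes the common moment-based definitions (third and fourth central moments standardized by the variance), with kurtosis reported as excess over the normal value of 3; the data are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

def skewness(values):
    """Moment-based skewness: 0 if symmetric, > 0 for (+) skew, < 0 for (-) skew."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    """Moment-based kurtosis minus 3: > 0 leptokurtic, < 0 platykurtic."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]            # hypothetical
right_skewed = [1, 1, 2, 2, 3, 10]     # long tail toward high values
left_skewed = [-v for v in right_skewed]

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(symmetric))
```

Other definitions (for example, sample-corrected versions) give slightly different numbers, but the signs carry the same interpretation.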
[Figure: point locations, x-axis 0 to 6, y-axis 0 to 4] A (2.8, 1.5) B (1.6, 3.8) C (3.5, 3.3) D (4.4, 2.0) E (4.3, 1.1) F (5.2, 2.4) G (4.9, 3.5)
[Figure: the same points with the Mean Center marked] Mean Center (3.81, 2.51)
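The mean center shown on the slide can be reproduced by averaging the x and y coordinates separately. The sketch below uses the seven points as read from the slide; `mean_center` is an illustrative helper name.

```python
# Point coordinates (x, y) as labeled on the slide.
points = {
    "A": (2.8, 1.5),
    "B": (1.6, 3.8),
    "C": (3.5, 3.3),
    "D": (4.4, 2.0),
    "E": (4.3, 1.1),
    "F": (5.2, 2.4),
    "G": (4.9, 3.5),
}

def mean_center(coords):
    """Average the x values and the y values separately."""
    coords = list(coords)
    n = len(coords)
    mean_x = sum(x for x, _ in coords) / n
    mean_y = sum(y for _, y in coords) / n
    return mean_x, mean_y

center_x, center_y = mean_center(points.values())
print(round(center_x, 2), round(center_y, 2))
```

Rounded to two decimals, this gives (3.81, 2.51), matching the Mean Center on the slide.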
[Figure: point weights, x-axis 0 to 6, y-axis 0 to 4] A (5) B (20) C (8) D (4) E (6) F (5) G (3)
[Figure: the same weighted points with the Weighted Mean Center marked] Weighted Mean Center (3.10, 2.88)
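The weighted mean center can be sketched the same way, with each coordinate multiplied by its weight before averaging. The sketch assumes the weights on this slide attach, letter by letter, to the point coordinates A to G given on the earlier slide; `weighted_mean_center` is an illustrative helper name.

```python
# Coordinates (x, y) from the earlier slide and weights from this one.
# Pairing them by letter is an assumption.
coords = {
    "A": (2.8, 1.5), "B": (1.6, 3.8), "C": (3.5, 3.3), "D": (4.4, 2.0),
    "E": (4.3, 1.1), "F": (5.2, 2.4), "G": (4.9, 3.5),
}
weights = {"A": 5, "B": 20, "C": 8, "D": 4, "E": 6, "F": 5, "G": 3}

def weighted_mean_center(coords, weights):
    """Weight each coordinate, sum, and divide by the total weight."""
    total = sum(weights[k] for k in coords)
    x = sum(weights[k] * coords[k][0] for k in coords) / total
    y = sum(weights[k] * coords[k][1] for k in coords) / total
    return x, y

wx, wy = weighted_mean_center(coords, weights)
print(round(wx, 2), round(wy, 2))
```

Under that pairing the result rounds to (3.10, 2.88), matching the slide. The center shifts toward point B because B carries 20 of the 51 total weight units.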
Correlation - Bivariate relationship Scattergrams 1. Direction: negative or positive 2. Strength of relationship: perfect, strong, weak, none
Positive (direct) correlation [Figure: scattergram]
Negative (inverse) correlation [Figure: scattergram]
Perfect correlation [Figure: scattergram]
Strong correlation [Figure: scattergram]
Weak correlation [Figure: scattergram]
No correlation ?? [Figure: scattergram]
Controlled correlation [Figure: scattergram]
Controlled correlation (clumping) [Figure: scattergram]
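Direction and strength can both be read from a single correlation coefficient. The slides do not name a specific coefficient, so the sketch below assumes Pearson's r, whose sign gives the direction and whose magnitude (0 to 1) gives the strength; the data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                  # hypothetical
perfect_positive = [2, 4, 6, 8, 10]  # y rises exactly with x
perfect_negative = [10, 8, 6, 4, 2]  # y falls exactly as x rises
weak = [2, 5, 1, 4, 3]               # little pattern

r_pos = pearson_r(x, perfect_positive)
r_neg = pearson_r(x, perfect_negative)
r_weak = pearson_r(x, weak)
print(r_pos, r_neg, r_weak)
```

A coefficient near +1 or -1 corresponds to the perfect and strong scattergrams, and a value near 0 to the weak and no-correlation cases. A single coefficient can hide clumping, so the scattergram should still be inspected.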