I. Introduction to Data and Statistics

A. Basic terms and concepts
- data set
- variable
- observation
- data value

[Table: example data set for the Central Gulf States]
Observations (rows): TX, LA, MS, AL
Variables (columns): age, > 65, < 19, $, rent $
Data values: one number per state per variable (e.g., 53, 19, 34, 98, 25)

B. Primary and Secondary data
1. Primary data
- original data
- collected for a specific purpose
- sample design and procedures
- time and $

2. Secondary data
- archival data
- agency or organization
- organized in a set format
- time and $
- data quality can be an issue
- sample design

C. Individual and spatially aggregated data
[Figure: State 1, State 2, State 3, and State 4 shown individually and grouped into a Region]

D. Discrete and Continuous data
1. Discrete

2. Continuous

E. Qualitative and Quantitative data
1. Qualitative (categorical)
Ex: land cover, sex, political party, race
2. Quantitative
Ex: population, precipitation, grades

II. Scales of Measurement
A. Nominal
B. Ordinal
C. Interval
D. Ratio
For comparison, data must use the same scale of measurement.

A. Nominal
- mutually exclusive
- exhaustive
Ex:
Name: George = 1, Wanda = 2, Bob = 3
Land cover: forested = 45, urban = 39, etc.
Climate regimes: polar = 1, temperate = 2, tropical = 3
Sex: male = 1, female = 2

B. Ordinal
- ranked data
- arbitrary
- comparisons
- no set interval between rankings
Ex:
Places rated (cities, beaches, ...)
Level of satisfaction (poor, ok, good)

C. Interval
- separated by absolute differences
- does not have an absolute zero
Ex:
- temperature (°C or °F)
- elevation

D. Ratio
- separated by absolute differences
- absolute zero
Ex:
- precipitation
- tree growth
- income

III. Graphing procedures (univariate)
A. Frequency histogram
B. Cumulative frequency histogram

A. Frequency histogram (frequency polygon)
[Figure: frequency histogram; vertical axis: frequency (#, %); horizontal axis: values such as income or grades, 0 to 100]

B. Cumulative frequency histogram (cumulative frequency polygon)
[Figure: cumulative frequency histogram; vertical axis: cumulative frequency (#, %); horizontal axis: 0 to 100]
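The two histograms above can be sketched as counts per class interval and their running total. This is a minimal illustration, not a prescribed procedure: the grade values, the bin width of 10, and the helper name `frequency_table` are all assumptions made for the example.

```python
from itertools import accumulate


def frequency_table(values, bin_width, low, high):
    """Count values per bin over [low, high]; a value equal to high goes in the last bin.

    Assumes every value lies between low and high.
    """
    n_bins = (high - low) // bin_width
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - low) // bin_width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical grades, invented for illustration only.
grades = [55, 62, 68, 71, 74, 75, 79, 83, 88, 95]

counts = frequency_table(grades, bin_width=10, low=50, high=100)  # frequency (#)
percent = [100 * c / len(grades) for c in counts]                 # frequency (%)
cumulative = list(accumulate(counts))                             # cumulative frequency (#)

print(counts)
print(percent)
print(cumulative)
```

The last cumulative value always equals the number of observations, which is a quick check that no value fell outside the bins.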

IV. Descriptive Statistics (univariate)
- descriptive: summary of data characteristics
- inferential: extend sample to a larger population
A. Measures of Central Tendency
B. Measures of Dispersion
C. Measures of Shape

A. Measures of Central Tendency
• attempt to define the most typical value of a larger data set
1. Mode
2. Median
3. Mean (average)

1. Mode
• value that occurs most frequently
• only measure of central tendency appropriate for nominal level data
• works better for grouped data, not raw values
• many data sets will not have two identical data values

2. Median
• the middle value from a set of ranked observations
• equal number of observations on either side
• appropriate when data are heavily skewed
• ordinal, interval, or ratio level data, not nominal

3. Mean (average), .xi / n • most commonly used value of central tendency • interval or ratio level data • sensitive to outliers • most easily understood • assumptions: • unimodal • symmetric distribution
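The three measures can be compared on one small data set. The sketch below uses Python's standard `statistics` module; the income values are hypothetical and include one deliberate outlier to show why the mean is described as sensitive to outliers while the median is not.

```python
import statistics

# Hypothetical incomes (in thousands), invented for illustration; 175 is an outlier.
incomes = [20, 25, 25, 30, 35, 40, 175]

mode = statistics.mode(incomes)      # value that occurs most frequently
median = statistics.median(incomes)  # middle value of the ranked observations
mean = statistics.mean(incomes)      # sum of the values divided by n

print(mode, median, mean)
```

Here the single outlier pulls the mean well above the median, which is the situation in which the median is the more typical value.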

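[Figure: normal distribution, horizontal axis 0 to 100; mode, mean, and median coincide]

[Figure: skewed distribution, horizontal axis 0 to 100; mode, median, and mean fall at different values, in that order]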

B. Measures of Dispersion
• provide information about distribution of data
1. Range
2. Standard deviation
3. Coefficient of variation

1. Range
• difference between largest and smallest value
• simplest measure of dispersion
• easy to calculate
• can be misleading
• ignores all other values
• does not take into account clustering of data

2. Standard deviation
• the typical deviation of each value from the mean (square root of the average squared deviation)
• based on the mean
• better indicator of the dispersion of the entire sample (in comparison to the range)
• scale-dependent value

3. Coefficient of variation
• standard deviation / mean
• allows you to compare dispersion independent of scale
• should be used to make comparisons where there are differences in mean

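[Figure: distribution on a 0 to 100 axis with mean $\bar{X} = 50$ and values running from 15 to 85]
Range: 85 - 15 = 70
Std. dev.: based on the deviations $x_i - \bar{X}$
C.V. = Std. dev. / mean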

C.V. = Std. dev. / mean
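A short sketch of the three dispersion measures follows. The data values are hypothetical, chosen so that the range (85 - 15 = 70) and mean (50) match the example above. The sample standard deviation (dividing by n - 1) is an assumption, since the slides do not say which form is intended, and the helper name `dispersion` is made up for the example.

```python
import statistics


def dispersion(values):
    """Return (range, standard deviation, coefficient of variation)."""
    value_range = max(values) - min(values)
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # assumption: sample form, divides by n - 1
    cv = std_dev / mean                 # C.V. = Std. dev. / mean
    return value_range, std_dev, cv


# Hypothetical data: mean 50, smallest value 15, largest value 85.
small_scale = [15, 40, 50, 60, 85]
# The same data multiplied by 10, to show the effect of a change in scale.
large_scale = [10 * v for v in small_scale]

range_a, sd_a, cv_a = dispersion(small_scale)
range_b, sd_b, cv_b = dispersion(large_scale)

print(range_a, sd_a, cv_a)
print(range_b, sd_b, cv_b)
```

The range and standard deviation grow tenfold with the change of scale, while the coefficient of variation stays the same, which is why it suits comparisons between data sets with different means.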

C. Measures of Shape
1. Skewness
2. Kurtosis

[Figure: kurtosis; leptokurtic, mesokurtic, and platykurtic curves]

[Figure: skewness; symmetrical (bell shaped), (+) skew, and (-) skew curves]
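The slides give no formulas for skewness or kurtosis, so the sketch below assumes the common moment-based definitions: the third central moment divided by the standard deviation cubed, and the fourth central moment divided by the variance squared, minus 3. The function name and both data sets are hypothetical.

```python
def shape_measures(values):
    """Return (skewness, excess kurtosis) using population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal curve under this definition
    return skewness, excess_kurtosis


# Hypothetical data sets, invented for illustration.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]

skew_sym, kurt_sym = shape_measures(symmetric)
skew_right, kurt_right = shape_measures(right_tailed)

print(skew_sym, kurt_sym)
print(skew_right, kurt_right)
```

Under this definition a symmetric data set has skewness 0, a long right tail gives a positive value, and negative excess kurtosis indicates a flatter (platykurtic) shape.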

Mean Center

[Figure: scatter plot of seven points; horizontal axis 0 to 6, vertical axis 0 to 4]
A (2.8, 1.5)
B (1.6, 3.8)
C (3.5, 3.3)
D (4.4, 2.0)
E (4.3, 1.1)
F (5.2, 2.4)
G (4.9, 3.5)
Mean Center (3.81, 2.51)
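The mean center is the mean of the x coordinates paired with the mean of the y coordinates. The sketch below uses the seven points from the figure; the function name `mean_center` is only an illustrative choice.

```python
# Coordinates of points A to G, taken from the figure above.
points = {
    "A": (2.8, 1.5),
    "B": (1.6, 3.8),
    "C": (3.5, 3.3),
    "D": (4.4, 2.0),
    "E": (4.3, 1.1),
    "F": (5.2, 2.4),
    "G": (4.9, 3.5),
}


def mean_center(coords):
    """Average the x values and the y values separately."""
    xs, ys = zip(*coords)
    return sum(xs) / len(xs), sum(ys) / len(ys)


center_x, center_y = mean_center(points.values())
print(round(center_x, 2), round(center_y, 2))
```

Rounded to two decimals, this reproduces the mean center of (3.81, 2.51) given for these points.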

Weighted Mean Center

[Figure: the same seven points, each labeled with its weight; horizontal axis 0 to 6, vertical axis 0 to 4]
A (5)
B (20)
C (8)
D (4)
E (6)
F (5)
G (3)
Weighted Mean Center (3.10, 2.88)
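For the weighted mean center, each coordinate is multiplied by its point's weight before averaging, and the sum is divided by the total weight. The sketch below combines the coordinates from the mean center figure with the weights shown here; the function name is again an illustrative choice.

```python
# (x, y, weight) for points A to G: coordinates and weights from the two figures.
weighted_points = {
    "A": (2.8, 1.5, 5),
    "B": (1.6, 3.8, 20),
    "C": (3.5, 3.3, 8),
    "D": (4.4, 2.0, 4),
    "E": (4.3, 1.1, 6),
    "F": (5.2, 2.4, 5),
    "G": (4.9, 3.5, 3),
}


def weighted_mean_center(rows):
    """Weight each coordinate, then divide by the sum of the weights."""
    rows = list(rows)
    total_weight = sum(w for _, _, w in rows)
    wx = sum(x * w for x, _, w in rows) / total_weight
    wy = sum(y * w for _, y, w in rows) / total_weight
    return wx, wy


wx, wy = weighted_mean_center(weighted_points.values())
print(round(wx, 2), round(wy, 2))
```

Point B carries the largest weight (20), so the center shifts from (3.81, 2.51) toward B, to about (3.10, 2.88).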

Correlation - Bivariate relationship
Scattergrams
1. Direction: negative or positive
2. Strength of relationship: perfect, strong, weak, none

[Figure: scattergram, positive (direct) correlation]

[Figure: scattergram, negative (inverse) correlation]

[Figure: scattergram, perfect correlation]

[Figure: scattergram, strong correlation]

[Figure: scattergram, weak correlation]

[Figure: scattergram, no correlation]

[Figure: scattergram, controlled correlation]

[Figure: scattergram, controlled correlation (clumping)]
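Direction and strength can be summarized in a single number. The slides do not name a coefficient, so the sketch below assumes Pearson's r, one common choice: its sign gives the direction, and its magnitude (0 to 1) gives the strength. All data values are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
r_perfect_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # points on a rising line
r_perfect_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_strong_positive = pearson_r(xs, [2, 1, 4, 3, 5])    # rising trend with scatter

print(r_perfect_positive, r_perfect_negative, r_strong_positive)
```

Values near +1 or -1 correspond to the perfect and strong scattergrams, and values near 0 to the weak and no-correlation cases.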
