Chapter 9 Statistical Data Analysis

An Introduction to Scientific Research Methods in Geography
Montello and Sutton

Data Analysis
• Data analysis helps us achieve the four scientific goals of description, prediction, explanation, and control
• Statistical data analysis
  • Three primary reasons geographers treat data in a statistical fashion

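Statistical Description
• Descriptive statistics summarize the characteristics of a data set
  • Parameters: descriptive statistics computed for an entire population
• Central tendency
  • Mode
  • Median
  • Mean (arithmetic mean)
• When would you use the median or the mode instead of the mean?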
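To make the question concrete, the sketch below computes all three measures with Python's standard `statistics` module on a small, hypothetical list of household sizes that includes one unusually large household. The data are invented for illustration; watch how the single extreme value pulls the mean above the median.

```python
from statistics import mean, median, mode

# Hypothetical data: people per household on one block (illustrative only).
# The value 9 is a deliberate outlier.
household_sizes = [1, 2, 2, 2, 3, 3, 4, 5, 9]

most_common = mode(household_sizes)   # most frequent value
middle = median(household_sizes)      # middle value of the sorted data
average = mean(household_sizes)       # arithmetic mean: sum / count

print(most_common, middle, round(average, 2))
```

With these numbers the mode is 2, the median is 3, and the mean is about 3.44.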

Descriptive Statistics
• Variability
  • Range = largest value - smallest value
  • Variance
  • Standard deviation (the square root of the variance)
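A minimal sketch of the three variability measures, using the standard `statistics` module on a hypothetical list of values. It shows both the population formulas (divide by N) and the sample formulas (divide by n - 1); the slide does not say which version it intends, so treat that distinction as an added assumption.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data (illustrative only)
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)  # largest value minus smallest value

# Population versions: summed squared deviations divided by N
pop_variance = pvariance(values)
pop_sd = pstdev(values)

# Sample versions: divided by n - 1, used when estimating a population from a sample
sample_variance = variance(values)
sample_sd = stdev(values)

print(data_range, pop_variance, pop_sd)
print(round(sample_variance, 3), round(sample_sd, 3))
```

For these values the range is 7, the population variance is 4, and the population standard deviation is 2.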

Descriptive Statistics
• Form
  • Modality: the number of peaks in the distribution
  • Skewness
    • Positive: a longer tail toward high values
    • Negative: a longer tail toward low values
  • Symmetry
    • Unimodal and bell-shaped: the normal distribution
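Skewness can be quantified with the moment coefficient of skewness, one of several common formulas (the slide does not specify one). The sketch computes it directly from deviations about the mean for three hypothetical data sets: a positive value indicates a longer right tail, a negative value a longer left tail, and zero a symmetric distribution.

```python
def skewness(values: list[float]) -> float:
    """Moment coefficient of skewness: m3 / m2 ** 1.5.

    m2 and m3 are the second and third moments about the mean (dividing by n).
    Positive values mean a longer right tail; negative, a longer left tail.
    """
    n = len(values)
    center = sum(values) / n
    m2 = sum((v - center) ** 2 for v in values) / n
    m3 = sum((v - center) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical data sets (illustrative only)
right_tailed = [1, 1, 2, 2, 3, 10]
left_tailed = [-v for v in right_tailed]  # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5]

print(round(skewness(right_tailed), 3))
print(round(skewness(left_tailed), 3))
print(skewness(symmetric))  # 0.0
```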

Descriptive Statistics
• Derived scores
  • Percentile rank
    • Highest: 99th percentile
    • Where is the median?
  • Z-score
    • Standard deviation units above or below the mean
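A short sketch of both derived scores. Percentile rank has several competing definitions; the helper `percentile_rank` here (a hypothetical name) assumes the simple "percentage of scores strictly below x" version, and `z_score` uses the population standard deviation. The score list is invented.

```python
from statistics import mean, pstdev


def percentile_rank(scores, x):
    """Percentage of scores strictly below x (one of several common definitions)."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)


def z_score(scores, x):
    """Standard deviation units above (+) or below (-) the mean, using the population SD."""
    return (x - mean(scores)) / pstdev(scores)


# Hypothetical scores (illustrative only); mean 5, population SD 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]

print(percentile_rank(scores, 7))  # 75.0
print(z_score(scores, 9))          # 2.0
```

A score of 7 has a percentile rank of 75 under this definition, and a score of 9 sits 2 standard deviations above the mean of 5.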

Descriptive Statistics
• Relationship
  • Linear relationship
    • Positive
    • Negative
  • Relationship strength
    • Weak, strong, or no relationship
  • Correlation coefficient
    • Ranges from -1 to +1
    • 0: no linear relationship
  • Regression analysis
    • Criterion variable (Y)
    • Predictor variables (X)
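The correlation coefficient and a simple bivariate regression line can both be computed from the same sums of deviations. The sketch below is a minimal, hand-rolled version (the function names are hypothetical) that fits the line by ordinary least squares, the usual but not the only way to fit a regression line; the X and Y values are invented.

```python
from math import sqrt


def _deviation_sums(x, y):
    """Return (Sxy, Sxx, Syy): sums of cross-products and squared deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient, between -1 and +1."""
    sxy, sxx, syy = _deviation_sums(x, y)
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares line predicting Y from X."""
    sxy, sxx, _ = _deviation_sums(x, y)
    slope = sxy / sxx
    intercept = sum(y) / len(y) - slope * sum(x) / len(x)
    return slope, intercept


# Hypothetical predictor (X) and criterion (Y) values, illustrative only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
slope, intercept = least_squares_line(x, y)
print(f"r = {r:.3f}; Y-hat = {intercept:.2f} + {slope:.2f} * X")
```

Here r is about 0.77 (a fairly strong positive linear relationship), and the fitted line is Y-hat = 2.2 + 0.6X.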

Correlation: Causation?
• “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” (XKCD, http://xkcd.com/552/)

Statistical Inference
• Inferential statistics
  • Statistics: values computed from a sample
  • Sampling error: the chance difference between a sample statistic and the corresponding population parameter
  • From our sample statistics, we infer the population parameters
  • We assign probabilities to these inferences
• The power and difficulty of inferential statistics come from deriving probabilities about how likely it is that patterns in the sample reflect patterns in the population

Inferential Statistics
• Sampling distribution
  • Example: the sampling distribution of the mean shows the probability that a single sample would have a mean within some given RANGE of values
• Central limit theorem: as sample size increases, the sampling distribution of sample means approaches a normal distribution; its mean equals the population mean, and its standard deviation (the standard error) equals the population standard deviation divided by the square root of the sample size
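The central limit theorem can be checked empirically by simulation. The sketch assumes a strongly right-skewed exponential population (mean 1, standard deviation 1), draws 5,000 samples of size 50, and compares the mean and standard deviation of the resulting sample means with the theoretical values. The population, sample size, and seed are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is reproducible

n = 50               # size of each sample
num_samples = 5_000  # number of repeated samples

# Hypothetical population: exponential with rate 1, which is strongly
# right-skewed and has mean 1 and standard deviation 1.
pop_mean, pop_sd = 1.0, 1.0

sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print(f"mean of sample means: {mean(sample_means):.3f} (theory: {pop_mean})")
print(f"SD of sample means:   {stdev(sample_means):.3f} (theory: {pop_sd / sqrt(n):.3f})")
```

Even though the population is far from normal, the sample means cluster around 1 with a spread close to 1/√50 ≈ 0.141.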

Inferential Statistics
• Generation of sampling distributions
• Assumptions
  • Distributional assumptions
    • Nonparametric
    • Parametric
  • Normality
  • Homogeneity of variance
  • Independence of scores
  • Correct specification of models

Estimation and Hypothesis Testing
• Estimation
  • Point estimation: a single value as the best estimate of a parameter
  • Confidence interval: a range of values likely to contain the parameter
    • Usually 95%
• Hypothesis testing
  • Null hypothesis: a hypothesis about the exact (point) value of a parameter or set of parameters
  • Use sample statistics to make an inference about the probable truth of the null hypothesis
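When the population standard deviation σ is known, a confidence interval for a mean is the sample mean plus or minus a critical z value times the standard error σ/√n. The sketch assumes that known-σ case (with unknown σ and small samples, a t critical value would be used instead), borrows the numbers from the Denver television example in this chapter, and uses `statistics.NormalDist` from the Python standard library; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_crit * sigma / sqrt(n)                  # critical value times standard error
    return sample_mean - margin, sample_mean + margin


low, high = z_confidence_interval(sample_mean=22, sigma=5.6, n=50)
print(f"95% CI: {low:.2f} to {high:.2f} hours per week")
```

The resulting 95% interval runs from about 20.45 to 23.55 hours per week.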

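Hypothesis Testing
• Alternative hypothesis
  • The hypothesis that the parameter does not equal the exact value hypothesized in the null
  • A range rather than an exact value
• Modus tollens
  • If the null hypothesis were true, our sample would probably not look like this; our sample does look like this; so the null hypothesis is probably false
  • Useful for disconfirming
  • Not for confirming!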

Example
• From a recent nationwide study it is known that the typical American watches 25 hours of television per week, with a population standard deviation of 5.6 hours. Suppose 50 Denver residents are randomly sampled, with an average viewing time of 22 hours per week and a standard deviation of 4.8 hours. Are Denver television viewing habits different from nationwide viewing habits?
• Step 1: State your null and alternative hypotheses
  • H0: μ = 25 (the mean weekly viewing time in Denver equals the national mean)
  • HA: μ ≠ 25
  • What is this saying?

Example
• Step 2: Determine the appropriate test statistic and its sampling distribution, assuming the null is true
  • We are testing a sample mean, the population standard deviation is known, and n > 30, so the z distribution can be used (the sample standard deviation of 4.8 is not needed)
• Step 3: Calculate the test statistic from your sample data
  • z = (x̄ - μ) / (σ / √n) = (22 - 25) / (5.6 / √50) ≈ -3.79
• Step 4: Compare the empirically obtained test statistic to the null sampling distribution
  • P value ≈ .00015 (two-tailed)
  • OR critical value at the .05 significance level: z = ±1.96
• Decision: reject the null hypothesis
  • -3.79 is less than -1.96: reject
  • The p value is very small, less than .05 and even .01: reject
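Steps 2 through 4 can be reproduced in a few lines. This is a minimal sketch of a two-tailed one-sample z test using `statistics.NormalDist` from the Python standard library; the function name `one_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-tailed one-sample z test of H0: mu = mu0 with known population SD sigma."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))  # area in both tails beyond |z|
    return z, p_two_tailed


z, p = one_sample_z_test(sample_mean=22, mu0=25, sigma=5.6, n=50)
critical = NormalDist().inv_cdf(0.975)  # about 1.96 for alpha = .05, two-tailed
reject = abs(z) > critical

print(f"z = {z:.2f}, p = {p:.5f}, reject H0: {reject}")
```

It gives z ≈ -3.79 and a two-tailed p value well below .001, so the null hypothesis is rejected at the .05 level, matching the decision above.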

Error
• You have made either a correct inference or a mistake
• Type I error: rejecting a null hypothesis that is actually true; its probability is the significance (rejection) level, α
• Type II error: failing to reject a null hypothesis that is actually false; its probability is β
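Simulation makes the meaning of α and β concrete. The sketch assumes the setting of the Denver example (H0: μ = 25, σ = 5.6, n = 50, α = .05), repeatedly draws normal samples, and counts how often a two-tailed z test rejects. When the true mean is 25, every rejection is a Type I error; when the true mean is a hypothetical 23, every failure to reject is a Type II error. The normal samples, the alternative mean, the trial count, and the helper name `rejection_rate` are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(true_mean, mu0=25, sigma=5.6, n=50, trials=4_000, alpha=0.05, seed=1):
    """Share of simulated two-tailed z tests of H0: mu = mu0 that reject H0."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if abs((sample_mean - mu0) / se) > critical:
            rejections += 1
    return rejections / trials


# H0 true (true mean 25): every rejection is a Type I error, so this estimates alpha.
type_i_rate = rejection_rate(true_mean=25)

# H0 false (hypothetical true mean 23): every non-rejection is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mean=23)

print(f"estimated alpha: {type_i_rate:.3f}; estimated beta: {type_ii_rate:.3f}")
```

The estimated α should come out close to .05, and the estimated β close to the theoretical value of about .29 for this particular alternative.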

Data in Space and Place
• Spatiality is a central focus of geography, more so than in other disciplines
• Spatial autocorrelation
  • First Law of Geography: “Everything is related to everything else, but near things are more related than distant things”
  • Positive versus negative spatial autocorrelation: nearby values are more similar (positive) or more dissimilar (negative) than distant ones
  • A violation of the important statistical assumption of independence
  • Example: If it’s raining in my backyard, I can say with a high degree of confidence that it’s raining in my neighbor’s backyard, but my confidence that it is raining across town is lower, and 300 miles away lower still
  • Variogram
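The slide names the variogram without giving a formula. A common choice is the classical (Matheron) estimator of the semivariogram: for each distance band h, γ(h) is half the average squared difference between values at pairs of locations separated by about h. The sketch below assumes that estimator with simple nearest-lag binning; the transect, readings, and function name are hypothetical. Under positive spatial autocorrelation, semivariance is small at short lags and grows with distance.

```python
from collections import defaultdict
from math import dist


def empirical_semivariogram(points, values, lag_width, max_lag):
    """Classical (Matheron) semivariogram estimate.

    gamma(h) = sum of (z_i - z_j)**2 over pairs whose separation falls in lag bin h,
    divided by 2 * (number of such pairs). Pairs are binned to the nearest multiple
    of lag_width; pairs farther apart than max_lag are ignored.
    """
    sq_diff_sums = defaultdict(float)
    pair_counts = defaultdict(int)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d > max_lag:
                continue
            lag = round(d / lag_width) * lag_width
            sq_diff_sums[lag] += (values[i] - values[j]) ** 2
            pair_counts[lag] += 1
    return {lag: sq_diff_sums[lag] / (2 * pair_counts[lag]) for lag in sorted(pair_counts)}


# Hypothetical transect: 10 sample sites 1 km apart whose readings rise smoothly,
# so nearby sites are more alike than distant ones.
sites = [(float(x), 0.0) for x in range(10)]
readings = [float(x) for x in range(10)]

gamma = empirical_semivariogram(sites, readings, lag_width=1.0, max_lag=5.0)
for lag, semivariance in gamma.items():
    print(f"lag {lag:.0f} km: semivariance {semivariance:.1f}")
```

For this transect the semivariance rises from 0.5 at a 1 km lag to 12.5 at 5 km. Because the invented readings follow a steady trend, the curve never levels off; variograms of stationary data usually flatten at a plateau called the sill.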

Data in Space and Place
• “Spatial data are special”: a special difficulty
  • Which areal units should be used to analyze geographic data?
  • Modifiable Areal Unit Problem (MAUP)
    • Gerrymandering
• Geographic phenomena are often scale dependent
  • Must identify the scale of a phenomenon and collect and organize data in units of that size
  • Data aggregation issues
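The zoning and scale sides of the Modifiable Areal Unit Problem can be illustrated with a toy example. The sketch assumes a one-dimensional strip of hypothetical cell values (a simplification of real areal units) and averages them into zones of different sizes and boundary positions; the helper `zone_means` is hypothetical.

```python
from statistics import pvariance


def zone_means(cells, zone_size, offset=0):
    """Average consecutive cells into zones of zone_size, starting at index offset.

    Cells left over at either end that cannot fill a whole zone are dropped.
    """
    return [
        sum(cells[start:start + zone_size]) / zone_size
        for start in range(offset, len(cells) - zone_size + 1, zone_size)
    ]


# Hypothetical counts for eight adjacent cells along a strip (illustrative only)
cells = [2, 2, 8, 8, 2, 2, 8, 8]

aligned = zone_means(cells, 2)            # zones: cells 0-1, 2-3, 4-5, 6-7
shifted = zone_means(cells, 2, offset=1)  # zones: cells 1-2, 3-4, 5-6 (zoning change)
coarse = zone_means(cells, 4)             # zones: cells 0-3, 4-7 (scale change)

var_aligned = pvariance(aligned)
var_shifted = pvariance(shifted)
print(aligned, var_aligned)
print(shifted, var_shifted)
print(coarse)
```

With two-cell zones aligned at the start of the strip, the zone means alternate between 2 and 8 (variance 9); shifting the same-sized zones by one cell yields identical means of 5 and no variation at all, as does doubling the zone size. The underlying data never changed, only the zones.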

Discussion Questions
• What measure of central tendency is best for nominal data?
• When pollsters tell you that a candidate is favored by 44% of likely voters, plus or minus 3 percent, what is the 44% and what is the plus or minus 3%?
• A survey of all users of a park in 1980 found the average number of people per party to be 3.5. In a random sample of 35 parties in 2000, the average was 2.9. If you wanted to test whether the number of persons per party in 2000 was different from the number in 1980, what would your null and alternative hypotheses be?
• In the United States, we presume that an accused person is innocent. If a guilty person were found not guilty, what type of error would this be?
• A researcher finds that a particular piece of learning software has an effect on students’ test scores, when actually it does not. What type of error is this?
