GEOGRAPHICAL STATISTICS GE 2110

GEOGRAPHICAL STATISTICS GE 2110 Zakaria A. Khamis

INFERENTIAL STATISTICS • Suppose you interview 50 people in Zanzibar town and ask them how far they commute to work, then compute the MEAN of the data collected • The sample mean provides us with both a simple summary measure and our best estimate of the true average commuting distance for the entire town • What will happen if we interview another sample of 50 different people? • If we interview another 50 people, we would likely come up with a different estimate
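The point of the commuting example can be illustrated with a small simulation. This is only a sketch: the population below is invented (10,000 commuting distances drawn from an exponential distribution with a mean of about 5 km), and names such as `population`, `sample_a` and `sample_b` are hypothetical, not survey data from Zanzibar town.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of commuting distances in km (an assumption
# made for illustration; not real data).
population = [rng.expovariate(1 / 5.0) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Two independent "surveys" of 50 people each.
sample_a = rng.sample(population, 50)
sample_b = rng.sample(population, 50)
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"Population mean: {population_mean:.2f} km")
print(f"Sample A mean:   {mean_a:.2f} km")
print(f"Sample B mean:   {mean_b:.2f} km")
```

Each sample mean is an estimate of the same population mean, yet the two estimates differ from each other and from the true value.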

ESTIMATES • Samples are used to obtain estimates of numerical characteristics of populations, where it is impractical to measure or count the whole population • It is conventional to use the term ‘population parameter’ to denote a numerical characteristic of a population and ‘sample statistic’ to denote a numerical characteristic of a sample • A sample statistic generally provides the only estimate available of the corresponding population parameter • The most commonly used parameters and statistics are the mean and standard deviation

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • It is unlikely that any one sample, no matter how unbiased, will yield values of sample mean and standard deviation which will give precise estimates of the corresponding population parameters of mean and standard deviation • Indeed, if one took a large number of samples from the same population, it is probable that each would produce a different mean and standard deviation • A distribution of such sample statistics – means of samples – is known as ‘a sampling distribution’

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • If the samples are all unbiased, it is reasonable to expect that their means will be distributed symmetrically about the corresponding population mean • Some sample means will be bigger than the population mean, and a roughly equal number will be smaller • The situation is the same for other sample statistics, such as the standard deviation • In practice, however, we do not take a number of samples but only one, and we don’t know the population parameters but can only estimate them from that one sample

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • Nevertheless, by hypothesizing a sampling distribution based upon a large number of imaginary samples, mathematicians have been able to provide us with a measure of the accuracy of parameter estimates based on a single real sample • The Central Limit Theorem states that if we imagine taking all possible samples of the same size from a single population, the sampling distribution of their means will be approximately normally distributed about the population mean, whatever the character of the population distribution, provided that the samples are fairly large • The standard deviation of a sampling distribution is called the standard error of the sample statistic
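The Central Limit Theorem can be checked informally by simulation. The sketch below makes several assumptions for illustration: a deliberately skewed (exponential) population, a sample size of 50, and 2,000 repeated samples rather than all possible samples. All variable names are hypothetical.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# A strongly skewed, clearly non-normal population (illustrative assumption).
population = [rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.mean(population)

n = 50               # size of each sample
num_samples = 2_000  # number of repeated samples we imagine taking

# One mean per sample: together these approximate the sampling distribution.
sample_means = [
    statistics.mean(rng.sample(population, n)) for _ in range(num_samples)
]

centre = statistics.mean(sample_means)   # centre of the sampling distribution
spread = statistics.stdev(sample_means)  # its standard deviation (standard error)

# Share of sample means lying within one standard error of the centre.
within_one = sum(abs(m - centre) <= spread for m in sample_means) / num_samples

print(f"Population mean:            {pop_mean:.3f}")
print(f"Mean of sample means:       {centre:.3f}")
print(f"Standard error (simulated): {spread:.3f}")
print(f"Share within 1 S.E.:        {within_one:.1%}")
```

Even though the population is skewed, the sample means cluster around the population mean, and roughly two thirds of them should fall within one standard error of it, as expected for an approximately normal distribution.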

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • The standard deviation of the sampling distribution of means is given by $\sigma_{\bar{x}} = \sigma / \sqrt{n}$ • Where σ is the standard deviation of the population and n is the sample size • The standard deviation of a sampling distribution is called the Standard Error (S.E.) of the sample statistic. If the statistic is the mean, this is referred to as the S.E. of the mean
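The standard error of the mean is the population standard deviation σ divided by the square root of the sample size n. A minimal sketch of that calculation follows; the function name `standard_error_of_mean` and the value σ = 10 are assumptions chosen purely for illustration.

```python
import math


def standard_error_of_mean(sigma: float, n: int) -> float:
    """Return sigma / sqrt(n), the standard error of the sample mean."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    return sigma / math.sqrt(n)


sigma = 10.0  # illustrative population standard deviation (an assumption)
for n in (25, 100, 400):
    print(f"n = {n:3d}  S.E. = {standard_error_of_mean(sigma, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the standard error: larger samples give more precise estimates, but with diminishing returns.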

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • Standard Error is the key to accurate estimation of population parameters from sample statistics • Suppose that we wish to estimate the average income of people in the Urban-West region • We take a random sample of 100 and find the sample mean $\bar{x}$ and the sample standard deviation $s$ • The value of $\bar{x}$ gives us a rough estimate of μ, but the questions are:

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • HOW ROUGH? • WHAT IS THE LIKELIHOOD OF THE TRUE VALUE OF μ BEING AS HIGH AS 60 OR AS LOW AS 40?

SAMPLING DISTRIBUTIONS AND STANDARD ERROR • Back to the normal distribution curve: the shape of the normal curve arises from a mathematical equation that relates it to its standard deviation (standard error) • It is found that approximately • 68% of the area under a normal curve lies within 1 standard deviation (standard error) of the mean • 95% of the area lies within 2 standard deviations (standard errors) of the mean • 99.7% of the area lies within 3 standard deviations (standard errors) of the mean
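The three percentages can be verified numerically. For a normal curve, the area within k standard deviations of the mean equals erf(k / √2), where erf is the error function available in Python's standard `math` module. The helper name `area_within` is hypothetical.

```python
import math


def area_within(k: float) -> float:
    """Fraction of a normal curve lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"Within {k} standard deviation(s): {area_within(k):.2%}")
```

The computed areas are close to, but not exactly, the rounded 68%, 95% and 99.7% figures; in particular, exactly 95% of the area lies within about 1.96 standard deviations, which is why that value is often used in place of 2.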
