Statistical concepts

What students (and teachers) don’t know

Let’s start with measurement: • What do we mean by “volume” in measurement? • How do we measure volume?

We should understand both the concept and how it is measured. Volume is the space occupied by a 3D object. It can be measured by displacement or by modelling the object as a geometric solid, e.g. an ice cream cone as a cone topped with a hemisphere.
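The modelling approach can be sketched in a few lines. This is only an illustration: the function names and the dimensions (radius 3 cm, cone height 10 cm) are made up, and it assumes the scoop is an exact hemisphere with the same radius as the cone.

```python
import math

def cone_volume(radius, height):
    """Volume of a cone: one third of base area times height."""
    return math.pi * radius**2 * height / 3

def hemisphere_volume(radius):
    """Volume of a hemisphere: half of (4/3) * pi * r^3."""
    return 2 * math.pi * radius**3 / 3

def ice_cream_volume(radius, cone_height):
    """Model an ice cream as a cone topped with a hemisphere of the same radius."""
    return cone_volume(radius, cone_height) + hemisphere_volume(radius)

# Hypothetical dimensions in centimetres (not measured values).
volume_cm3 = ice_cream_volume(radius=3.0, cone_height=10.0)
print(f"Estimated volume: {volume_cm3:.1f} cm^3")
```

A displacement measurement of the same object would give a check on how good the geometric model is.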

Centre of a distribution • What do we mean by “centre”? • The centre is the one best number to describe the position of the whole group. • Mean score 50.1, median 55. • Which is better? Mean or median?
[Figure: Score playing first game of SKUNK]

Centre of a distribution • The mean is a more efficient measure than the median. • The sample mean tends to be a better estimator of the population mean than the sample median is of the population median. • This means that confidence intervals for the mean tend to be narrower than for the median.
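A small simulation can make the efficiency claim concrete. The sketch below assumes a normal population (mean 50, SD 10, both invented for illustration) and samples of size 30; for skewed or heavy-tailed populations the comparison can come out differently.

```python
import random
import statistics

def sampling_sd(estimator, sample_size, repeats, rng):
    """Standard deviation of an estimator across many simulated samples."""
    estimates = []
    for _ in range(repeats):
        # Assumed population: normal with mean 50 and SD 10 (hypothetical).
        sample = [rng.gauss(50, 10) for _ in range(sample_size)]
        estimates.append(estimator(sample))
    return statistics.stdev(estimates)

rng = random.Random(1)  # fixed seed so the run is repeatable
sd_mean = sampling_sd(statistics.mean, 30, 2000, rng)
sd_median = sampling_sd(statistics.median, 30, 2000, rng)

print(f"Variability of sample means:   {sd_mean:.2f}")
print(f"Variability of sample medians: {sd_median:.2f}")
```

Under these assumptions the sample means vary less from sample to sample than the sample medians, which is what makes the confidence interval for the mean narrower.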

Mean or median?

Comparing length of Lake Taupo trout in 1995 and 1998 – can you make a call?

Spread of a distribution • What do we mean by “spread”? • Spread describes how far the values in the group are from the centre, that is, how variable they are.
[Figure: Score playing SKUNK with a strategy]

Range is not a useful measure of spread because it is determined only by the two extreme values. • Students should not use range in any NCEA standard (except the numeracy unit standards).

IQR measures the spread of the whole distribution • IQR is calculated using the width of the middle 50%, but it is a measure of the variability of the whole group (just as SD measures the variability of the whole group).
[Figure: Score playing SKUNK at first and then with a strategy]
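The difference between range and IQR shows up as soon as one unusual value is added. The scores below are invented for illustration (they are not the SKUNK data), and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the method taught in class.

```python
import statistics

def value_range(values):
    """Range: depends only on the smallest and largest values."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: width of the middle 50% of the values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical scores, then the same scores with one extreme value added.
scores = [20, 35, 40, 45, 50, 55, 55, 60, 65, 70]
with_outlier = scores + [150]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(f"{label:13s} range = {value_range(data):5.1f}, IQR = {iqr(data):5.1f}")
```

One extra value more than doubles the range, while the IQR barely moves.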

Shift and overlap are comparisons of centre • Shift answers the question “Which is bigger?” • Overlap answers the question “How much bigger, relative to the spread?”
[Figure: Score playing SKUNK at first and then with a strategy]

Describing a sample: focusing on the key concepts • Your observation of centre. • Sample statistics confirming what you observed. • Shift and overlap. • Your observation of spread. • Sample statistics confirming what you observed. • Shape, symmetry and unusual features.

Beyond centre and spread • Statistical error is the difference between the sample statistic and the (unknown) population parameter.

What is sampling error? It depends where you ask. It is defined differently in different countries. In NZ (from Statistics NZ): • Sampling error arises due to the variability that occurs by chance because a random sample, rather than an entire population, is surveyed. • Non-sampling error is all error that is not sampling error.

Non-sampling error: all error that is not sampling error. It includes bias due to: • A sampling frame which does not represent the population • The sampling method • The sampling process • Anything else except sampling variability and choice of sample size.

Sample size • There is no statistical basis for insisting on a sample size of 30. • A sample doesn’t have to be very big to give a rough estimate of the centre of the population. • A comment that a bigger sample size would give a better estimate of the population centre would have to be justified by explaining why it would be important to have a better estimate in that context.

Do sample sizes have to be equal? Why or why not?

For comparison, sample sizes do not have to be equal • The only reason it is useful for sample sizes to be similar is to minimise wasted effort. • As in measurement, if two measurements with very different precision are in the same calculation, the extra precision of one measurement is lost in rounding. • The extra effort of making one measurement more precise would be better spent on the precision of the other.
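One way to see the "wasted effort" point is through the standard error of the difference between two independent sample means, sqrt(sd1²/n1 + sd2²/n2). The sketch below assumes both groups have the same standard deviation (10, a made-up value) and compares a few hypothetical sample-size splits.

```python
import math

def se_difference(sd1, n1, sd2, n2):
    """Standard error of the difference between two independent sample means."""
    return math.sqrt(sd1**2 / n1 + sd2**2 / n2)

SD = 10.0  # hypothetical common standard deviation for both groups

# Growing only one sample, then splitting 330 observations evenly.
for n1, n2 in [(30, 30), (300, 30), (3000, 30), (165, 165)]:
    se = se_difference(SD, n1, SD, n2)
    print(f"n1 = {n1:4d}, n2 = {n2:3d}: SE of difference = {se:.2f}")
```

Making one sample ever larger cannot push the standard error below what the smaller sample allows, whereas the even split of the same 330 observations does noticeably better than 300 and 30. Unequal sizes are still valid; they are just less efficient.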

Sample size • For the same sample size, sampling variability appears greater for proportions than for the mean or median. • Sample size needs to be fairly large (over about 200) to get a reasonable estimate of population proportions or the shape of the distribution.
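A rough feel for why proportions need larger samples comes from the usual approximate 95% margin of error for a proportion, 1.96 × sqrt(p(1 − p)/n). The sketch below assumes simple random sampling and uses p = 0.5, the worst case; the sample sizes are chosen only for illustration.

```python
import math

def margin_of_error(p, n):
    """Approximate 95% margin of error for a sample proportion."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# p = 0.5 gives the widest margin, so it is a conservative choice.
for n in [30, 200, 1000]:
    moe = margin_of_error(0.5, n)
    print(f"n = {n:4d}: margin of error is about {100 * moe:.1f} percentage points")
```

With 30 observations the estimate could easily be out by well over ten percentage points; by about 200 the margin is down to single digits.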

Population size is not relevant to sample size for large populations • A sample size of 1000 can give an estimate of proportions for a population of 1 million or 200 million. • There is no requirement that a sample be a certain percentage of the population size.
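The finite population correction shows how little the population size matters once the population is large. This sketch assumes simple random sampling without replacement and a proportion of 0.5; the function name is illustrative.

```python
import math

def margin_of_error(p, n, population_size):
    """Approximate 95% margin of error with the finite population correction."""
    se = math.sqrt(p * (1 - p) / n)
    # The correction factor is close to 1 whenever n is a small part of the population.
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return 1.96 * se * fpc

moe_1m = margin_of_error(0.5, 1000, 1_000_000)
moe_200m = margin_of_error(0.5, 1000, 200_000_000)

print(f"Population   1 million: {100 * moe_1m:.3f} percentage points")
print(f"Population 200 million: {100 * moe_200m:.3f} percentage points")
```

The two margins agree to within a few thousandths of a percentage point, so the precision is set by the sample size of 1000, not by the fraction of the population sampled.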

Two-way Venn diagrams are not good problem solving tools. Students who use two-way tables are much more successful at solving probability problems than students who use Venn diagrams.
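For reference, this is what a two-way table solution looks like for a typical problem. The scenario and all counts are hypothetical: 100 students, 60 play sport, 30 play music, 20 do both. The sketch only illustrates the method; it is not evidence for the claim about student success.

```python
from fractions import Fraction

# Hypothetical information given in the problem.
total, sport, music, both = 100, 60, 30, 20

# Fill in the four cells of the two-way table from the given totals.
table = {
    ("sport", "music"): both,
    ("sport", "no music"): sport - both,
    ("no sport", "music"): music - both,
    ("no sport", "no music"): total - sport - music + both,
}

# Every probability is now a cell (or row total) over the relevant total.
p_music_given_sport = Fraction(table[("sport", "music")], sport)
p_neither = Fraction(table[("no sport", "no music")], total)

print("P(music | sport) =", p_music_given_sport)
print("P(neither)       =", p_neither)
```

Once the table is complete, a conditional probability is read off by restricting attention to one row or column.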
