Efficient Sampling for Distance Analysis of Student Commutes

L2 Sampling Exercise: A possible solution

Selecting a sample
• I will select my sample using a systematic sampling technique.
• I will select every 4th student (136/30 ≈ 4.5, rounded down to 4), starting at student number 71, which I selected randomly using the random number function on my calculator.
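The selection can be sketched in Python. This is a minimal sketch, not necessarily how the sample was actually drawn: the function name `systematic_sample` is hypothetical, and wrapping back to the start of the list after student 136 is an assumption, since the slides do not say how counting continued past the end of the list.

```python
def systematic_sample(population_size, sample_size, interval, start):
    """Return 1-based student numbers chosen at a fixed interval.

    Assumption: counting wraps around to student 1 after the last student.
    """
    return [
        ((start - 1 + i * interval) % population_size) + 1
        for i in range(sample_size)
    ]


# Values taken from the slide: 136 students, sample of 30, every 4th, start at 71.
sample_ids = systematic_sample(population_size=136, sample_size=30, interval=4, start=71)
print(sample_ids)
```

Under the wrap-around assumption the list runs from 71 up to 135 and then continues from student 3, giving 30 different students.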

Sample Data

Justify choice of method
• I chose a systematic method because it was quite quick and easy to use.
• I know that it will give me a good spread of year groups because the data has been sorted into year groups.

Representative
• Because I used a systematic method, I have close to proportional representation of year levels (see the Evaluation).
• I seem to have selected more females than males (21 females and 9 males), so males are perhaps under-represented.

Statistics
• For my sample I got a mean of 1.7967 km and a sample standard deviation of 1.168 km.
• Using these values I would estimate a population mean of 1.8 km and a population standard deviation of 1.2 km.

Box Plot
From my box plot I can see that the data is fairly evenly spread. It is roughly symmetrical about the median, but a bit more spread out between the upper quartile and the largest value. The median distance is 1.65 km, which is just lower than the mean. This indicates a slight positive skew: most students are at the lower distances and a few larger values pull the mean up. 75% of the students in my sample live within 2.5 km of school.

Evaluation
• The sampling process is an appropriate one given the way the original data was presented. Because it was ordered according to year levels, a systematic sampling method gave me the same sort of result that a stratified method would have given me, with a lot less work.
• I noticed more females than males in my sample. However, this should not affect the estimate, assuming that a student who lives any given distance from school is just as likely to be male as female.
• Dividing 136 by 30 gives 4.53. I chose to sample every 4th person rather than every 5th, because every 5th person would have given me fewer than 30 students. This should not have biased my result in any way, as there is still a representative number chosen from each level.
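The choice between every 4th and every 5th person can be checked with a short calculation. This is a sketch under an assumption: the helper `one_pass_counts` is hypothetical and it counts students for a single pass through the list with the starting point inside the first interval, which is the textbook form of systematic sampling rather than the start at student 71 used here.

```python
POPULATION_SIZE = 136
SAMPLE_SIZE = 30


def one_pass_counts(population_size, interval):
    """Possible sample sizes from one pass through the list.

    Assumption: the start is one of the first `interval` students.
    """
    return sorted(
        {len(range(start, population_size + 1, interval)) for start in range(1, interval + 1)}
    )


print(POPULATION_SIZE / SAMPLE_SIZE)          # ideal interval, about 4.53
print(one_pass_counts(POPULATION_SIZE, 4))    # sizes available with every 4th student
print(one_pass_counts(POPULATION_SIZE, 5))    # sizes available with every 5th student
```

An interval of 4 always reaches at least 30 students in one pass, while an interval of 5 falls short of 30.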

Evaluation continued
• The mean for the 9 Year 12 students is 1.7 km, sd = 1.5 km
• The mean for the 8 Year 13 students is 1.925 km, sd = 1.35 km
• The mean for the 13 Year 11 students is 1.78 km, sd = 0.84 km
• From the statistics above it would appear that there is little difference between the distances traveled for the different year levels.
• Because the median is a bit lower than the mean, it appears that the mean has been affected by the unusually large values of 4.7 km and 3.9 km, which are considerably larger than the rest.
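As a consistency check, the year-level means above can be combined into an overall mean, weighting each by its group size. This is only a sketch using the figures quoted on the slides; the variable names are illustrative.

```python
# (year level, number of students in the sample, mean distance in km), from the slide
groups = [
    ("Year 11", 13, 1.78),
    ("Year 12", 9, 1.7),
    ("Year 13", 8, 1.925),
]

total_n = sum(n for _, n, _ in groups)
# Weighted mean: each group's mean counts in proportion to its size.
combined_mean = sum(n * mean for _, n, mean in groups) / total_n

print(total_n)                   # should match the sample size of 30
print(round(combined_mean, 2))   # compare with the reported sample mean of 1.7967 km
```

The combined value comes out at about 1.79 km, which agrees with the reported sample mean of 1.7967 km once rounding of the year-level means is allowed for.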

Conclusion
• Because the unusually large values pull the mean above the median, I would suggest that a better estimate for the average distance is closer to 1.7 km.
