Sampling

Sampling • Chapter 7 of the textbook • Pages 215-250

Introduction • Descriptive statistics allow us to describe and summarize our data • Inferential statistics allow us to infer unknown values and/or the probability of events occurring using the data we have (this lets us test hypotheses) • Probability is at the heart of inferential statistics because we base our results on samples

Sampling Error • Sampling error occurs when sample characteristics deviate from population characteristics (i.e., an unrepresentative sample) • Based on random chance • How do we know what the population characteristics are? • How can sampling error be decreased? • “Sampling error is not a ‘mistake’… All samples deviate from the population in some way; thus, sampling error is always present”

Good Points From Textbook • Uncertainty associated with sampling error is the price one pays for using a sample. • “The appeal of statistics is not that it removes uncertainty but rather that it permits inference in the presence of uncertainty.”

Sampling Bias • When the selection of the sample favors inclusion of members of a population with certain characteristics • Based on sampling procedures (a.k.a. sampling design or sampling scheme) • How can sampling bias be minimized?

Why Sample? • Samples take less time and money to collect – this is particularly true when very detailed measurements are being taken (e.g., in-depth interviews with follow-ups) • A census has error too (contains non-sampling errors) • The population may be infinite • The act of sampling may be destructive • The population may be hypothetical • Populations may change rapidly (i.e., require repeated measurement)

Steps in the Sampling Process • Critical pre-sampling step • Make sure there is a need to sample (i.e., see if available data may suit your study) • This can be challenging and can require a lot of effort, but it is usually MUCH easier than collecting a new sample • Rushing out to sample will often come back to haunt you (think before you collect!)

Steps in the Sampling Process • Step 1 – Define your population • Who or what are you interested in (or not)? • This is directly connected with the research question(s) you are asking • Example: Geography Students • Does this mean majors or any students in a geography class? • Does this mean graduate and undergraduate students? • Does this include alumni or only current students?

Steps in the Sampling Process • Step 2 – Construct a sampling frame • A sampling frame is an exhaustive list of all individuals in a population (i.e., who/what can be sampled) • A sample is only relevant to its sampling frame; it is not appropriate for making inferences beyond the defined sampling frame • Differentiate the target population and the sample population • The target population is all the individuals relevant to a study (i.e., who/what we want to include) • The sample population is who/what we actually sample • Goal is for these to be equal • The target and sample populations can differ when some of the target population can’t be sampled • Example: A list of all geography majors

Steps in the Sampling Process • Step 3 – Sampling Design • The procedures we use to select members from the sampling frame for the sample • There are many ways to do this; we’ll discuss several in the coming slides • Example: how we will go about picking a sample of 20 geography majors

Steps in the Sampling Process • Step 4 – Specify the information to be collected • Based on pre-testing (a.k.a. pilot testing) the instrument, tools, staff abilities, logistical constraints, etc. • The data collected relate directly to the research questions • This step assesses feasibility and corrects any problems before sampling begins • Example: What questions we will ask the students and about how long it will take to conduct an interview

Steps in the Sampling Process • Step 5 – Data Collection • The actual collection of the sample • Data accuracy and quality are determined during this step (i.e., non-sampling error happens here) • Careful measurement and careful recording of the measurements are key

Types of Samples • Non-Probability Sample – the likelihood of an individual being sampled is unknown • The sample may be representative or not, but the quality of the sample cannot be determined • Probability Sample – the likelihood of an individual being sampled is known • Since the probabilities are known, probability theory can be applied to make inferences

Non-Probability Samples • Judgmental – personal judgment is used to determine which individuals should be included in a sample • Convenience – sample in which only the convenient or accessible individuals are selected • Quota – data obtained from specific subgroups to avoid over- or under-representation • Volunteer – individuals self-select to take part in a study

Probability Samples • Random • Systematic • Stratified Random • Clustered

Random Samples • For finite populations – each possible sample of size n has an equal probability of being selected • For infinite populations – all observations chosen are statistically independent • Sampling with & without replacement • What would a random sample look like for spatial data when mapped?
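The difference between sampling with and without replacement can be sketched in a few lines of Python. This is only an illustration: the sampling frame of 100 ID numbers and the seed value are assumptions, not part of the textbook example.

```python
import random

# Hypothetical sampling frame: ID numbers for 100 geography majors.
frame = list(range(1, 101))

# A fixed seed (an arbitrary choice) makes the draw reproducible.
rng = random.Random(42)

# Without replacement: no individual can be selected twice.
without_replacement = rng.sample(frame, k=20)

# With replacement: every draw is made from the full frame,
# so the same individual may appear more than once.
with_replacement = rng.choices(frame, k=20)

print(sorted(without_replacement))
print(sorted(with_replacement))
```

With replacement, the number of distinct individuals can be smaller than 20; without replacement it is always exactly 20.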

Random Number Generators • This is one mechanism for actually getting a random sample • Key components • Uniform probability of a number being selected (i.e., 1/10 chance for each digit 0 to 9) • Independence (i.e., first number has no effect on second number)
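The uniformity requirement can be checked informally by generating many digits and looking at their relative frequencies. The sketch below uses Python's built-in pseudo-random generator; the seed and the number of draws are arbitrary assumptions.

```python
import random
from collections import Counter

rng = random.Random(2024)  # arbitrary seed, for reproducibility only

# Draw 100,000 digits, each from 0 to 9.
digits = [rng.randrange(10) for _ in range(100_000)]
counts = Counter(digits)

# With a uniform generator each relative frequency should be close to 1/10.
for d in range(10):
    print(d, counts[d] / len(digits))
```

This only examines uniformity. Independence would need a separate check, for example comparing each digit with the one drawn just before it.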

Systematic Samples • Selecting every kth element of a sampling frame (e.g., taking every 10th element) • This is effectively random if • A) the starting point is random • B) there is no natural periodicity (i.e. pattern) to the data • What would a systematic sample look like for spatial data when mapped?
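A minimal sketch of a systematic sample with a random starting point follows. The function name systematic_sample and the frame of 100 IDs are hypothetical, chosen only for illustration.

```python
import random

def systematic_sample(frame, k, rng):
    """Return every k-th element of frame, starting at a random offset in [0, k)."""
    start = rng.randrange(k)
    return frame[start::k]

# Hypothetical sampling frame of 100 ID numbers.
frame = list(range(1, 101))
rng = random.Random(7)  # arbitrary seed

sample = systematic_sample(frame, k=10, rng=rng)
print(sample)
```

If the frame had a natural period of 10 (for example, every 10th house being a corner lot), this design would pick only one kind of element, which is why condition B matters.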

Stratified Random Samples • The sample frame is first split into fairly homogeneous classes, from which random samples are taken • Why would we do this? • What are the drawbacks of this approach? • What might a stratified random sample look like for spatial data? • Are there other options?
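One way to sketch a stratified random sample is to group the frame by stratum and then draw a simple random sample inside each group. The student list, the strata labels, and the function name below are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample of up to n_per_stratum units from each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        label: rng.sample(members, min(n_per_stratum, len(members)))
        for label, members in strata.items()
    }

# Hypothetical frame: 80 undergraduate and 20 graduate students as (id, level).
students = [(i, "undergraduate" if i <= 80 else "graduate") for i in range(1, 101)]
rng = random.Random(1)  # arbitrary seed

picked = stratified_sample(students, lambda s: s[1], 5, rng)
for label, members in picked.items():
    print(label, sorted(members))
```

Taking the same number from each stratum over-represents the smaller stratum relative to its share of the frame; sampling in proportion to stratum size is another option.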

Cluster Samples • Data are grouped into heterogeneous clusters and a census is taken of randomly chosen clusters • Why might we use this approach, particularly for spatial data? • Why can this approach be problematic? • What would cluster samples look like on a map?
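A cluster sample can be sketched as a random choice of whole clusters followed by a census of each chosen cluster. The city blocks and households below are invented purely for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose n_clusters clusters and keep every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: 10 city blocks with 5 households each.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 6)]
    for b in range(1, 11)
}
rng = random.Random(3)  # arbitrary seed

sampled = cluster_sample(clusters, 3, rng)
for name, households in sampled.items():
    print(name, households)
```

Only the clusters are chosen at random; inside a chosen cluster nothing is left to chance.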

How do we decide which design to use? • Above all the sample should be as representative as possible • Choices made to increase efficiency, decrease cost or time, etc. should be made carefully • Many sampling designs are hybrids

Sampling Distributions • Recall from chapter 2 that there are descriptive statistics (e.g., the mean and the standard deviation) for both samples and populations • Recall from chapter 6 that a random variable (X) can be any value (x1 … xn) from a population, each with an associated probability • Therefore, random variables can be defined as functions (f) or probability distributions (e.g., a histogram or a curve) based on the values they can take on

Sampling Distributions • Now extend this concept so that our sample is a set of random variables from a population • What is the mean of the sample? • Just like you’d expect, the mean is the sum of the random variables divided by n • BUT, since each random variable is random, the mean itself is also a random variable • Therefore the sample mean can also be defined as a function or distribution

Sampling Distributions • Think of this as a new distribution (curve, function, etc.) where the graphed values relate to the mean values of the random variables (X), denoted $\bar{X}$ • Conceptually this is the histogram that you would produce using the mean values from many independent samples from the same population • The sample statistic is the random variable (in this case the mean) based on a sample of random variables • Since the sample statistic is a random variable, it has a distribution, which is known as the sampling distribution

Example • Let the population = {10,12,13,16,19,20} • The population mean (μ) = 90/6 = 15 • The population standard deviation (σ) = 3.65 • Let the sample size (n) = 4 • All possible samples (15 total combinations):

Example Continued

Example Continued • In this case, because we have all possible samples (n = 4) from our population, the mean of the sample means equals the population mean (μ = 15) • Why are the standard deviation values different? • The standard deviation of a sampling distribution is known as its standard error
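The example population is small enough to enumerate every possible sample of size 4 directly. The sketch below does this with the Python standard library; the variable names are illustrative choices, not the textbook's.

```python
from itertools import combinations
from statistics import pstdev

population = [10, 12, 13, 16, 19, 20]
n = 4

# Every possible sample of size n, drawn without replacement.
samples = list(combinations(population, n))
sample_means = [sum(s) / n for s in samples]

mu = sum(population) / len(population)   # population mean
sigma = pstdev(population)               # population standard deviation

mean_of_means = sum(sample_means) / len(sample_means)
standard_error = pstdev(sample_means)    # spread of the sample means

print(len(samples))
print(mu, round(sigma, 2))
print(mean_of_means, round(standard_error, 2))
```

The spread of the sample means is smaller than σ because averaging four values damps the extremes. Because these samples are drawn without replacement from a very small population, the spread is also smaller than σ/√n by the finite population correction factor √((N − n)/(N − 1)).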

Central Limit Theorem • For a large n (as a rule of thumb, n > 30) • The distribution of $\bar{X}$ will be approximately normal • The peak of the distribution is then the “mean of the means”, and since the distribution is normal we can estimate the actual population mean (μ) with some degree of confidence • The standard deviation of $\bar{X}$ is: $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ • As n increases the distribution of $\bar{X}$ becomes more peaked (i.e., the variability of $\bar{X}$ decreases and $\bar{X}$ more closely approximates μ)
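A small simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and compares the observed spread of the sample means with σ/√n.

```python
import random
from statistics import pstdev

def sample_means(n, trials, rng):
    """Means of `trials` independent samples of size n from an exponential(1) population."""
    return [sum(rng.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]

rng = random.Random(0)  # arbitrary seed
sigma = 1.0             # standard deviation of the assumed population

for n in (2, 10, 30, 100):
    means = sample_means(n, 5000, rng)
    observed = pstdev(means)
    expected = sigma / n ** 0.5
    print(n, round(sum(means) / len(means), 3), round(observed, 3), round(expected, 3))
```

As n grows, the observed standard deviation of the sample means should track σ/√n and the mean of the means should stay near 1, even though the population itself is far from normal.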

Central Limit Theorem • How is this theorem useful to us? • “it provides a way of deducing the results of a sample based only on a knowledge of population mean and standard deviation”… and “determine the probability that a sample mean statistics is >, <, or within a given interval” • The key to making this possible is the approximate normal distribution, for which we can easily apply z-values

Example • Height of middle school kids • μ = 60 inches • σ = 10 inches • What is the probability of having a class (n = 30) with a mean height of 70 inches or more? • Remember that $z = (\bar{X} - \mu) / (\sigma / \sqrt{n})$
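The height example can be worked through numerically. The sketch below assumes the question is about a class mean of 70 inches or more and that the sampling distribution of the mean is approximately normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 60, 10, 30   # population mean, population SD, class size
x_bar = 70                  # class mean of interest

standard_error = sigma / sqrt(n)        # standard deviation of the sample mean
z = (x_bar - mu) / standard_error       # z-value for the class mean

# Upper-tail probability under the standard normal distribution.
p_at_least = 1 - NormalDist().cdf(z)

print(round(standard_error, 3))
print(round(z, 3))
print(p_at_least)
```

The z-value comes out to about 5.48, so a class with a mean height this far above 60 inches would be extremely unlikely to occur by chance.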

Geographic Sampling • Why might we want to collect a geographic sample? • Many data are distributed spatially, but you could argue that space can be coded aspatially • However: • Recording space in addition to characteristics allows other (independent) variables to be derived as needed • Space can act as a place holder for things we don’t fully understand • What can we do with geographic samples? • Welcome to the field of spatial analysis….

Geographic Sampling • The sampling frame (i.e., all possible sample locations) is typically defined using Cartesian coordinates (X and Y values) • Sample sites are then selected by choosing X,Y pairs using some sampling procedure (e.g., random) • Common Geographic Sampling Methods (i.e., the geographic unit/object being sampled) • Quadrats • Transects (traverses) • Point sampling

Quadrats • A square, areal sample (i.e., a polygon) • Quadrat size depends on feature of interest & research question – picking the “right” size can be challenging • Arrangement of quadrats on the landscape can be any of the types previously mentioned • Random • Systematic • Stratified Random • Clustered • Quadrat orientation can also vary

Transects • Transects sample a geographic area along lines • Placing the sample lines is how randomness etc. is included in the sample • Options discussed in textbook • Random • Systematic • Stratified random • Stratified systematic • The n value can be the total length (L) or the number of transects (typically of uniform length)

Point Samples • Sampling at point locations • Locating the points can be done similarly to locating quadrats • Options discussed in textbook • Random • Systematic • Stratified random • Clustered • Stratified, systematic, unaligned sample
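Three of the point-sampling layouts can be sketched by generating X,Y pairs inside a rectangular study area. The 100 by 100 extent, the grid spacing, and the function names are assumptions for illustration. The third function places one random point in each grid cell, a simple stratified random layout and not the textbook's stratified, systematic, unaligned design.

```python
import random

def random_points(n, width, height, rng):
    """n points placed uniformly at random in the study area."""
    return [(rng.random() * width, rng.random() * height) for _ in range(n)]

def systematic_points(spacing, width, height, rng):
    """Regular grid of points with a single random origin offset."""
    x0 = rng.random() * spacing
    y0 = rng.random() * spacing
    cols = int(width // spacing)
    rows = int(height // spacing)
    return [(x0 + c * spacing, y0 + r * spacing)
            for r in range(rows) for c in range(cols)]

def stratified_random_points(cell, width, height, rng):
    """One random point inside each square cell of a regular grid."""
    cols = int(width // cell)
    rows = int(height // cell)
    return [(c * cell + rng.random() * cell, r * cell + rng.random() * cell)
            for r in range(rows) for c in range(cols)]

rng = random.Random(11)  # arbitrary seed
width, height = 100, 100

pts_random = random_points(100, width, height, rng)
pts_systematic = systematic_points(10, width, height, rng)
pts_stratified = stratified_random_points(10, width, height, rng)

print(pts_random[:3])
print(pts_systematic[:3])
print(pts_stratified[:3])
```

Plotted on a map, the random points tend to show clumps and gaps, the systematic points form an even lattice, and the stratified points cover the area evenly without lining up.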

Example • Carolina Vegetation Survey (CVS) • Paper is on blackboard (.pdf) • This is one example of a real sampling approach • In biogeography, ecology, etc. we often sample using nested quadrats

Summary • Think before you sample • Many sampling approaches (spatial or aspatial) exist; choosing the “right” one takes experience and some knowledge of the population • The Central Limit Theorem is related to the distribution of certain sample statistics (the mean in particular) and is important for inference
