Fundamentals of Sampling Method
Week 4
Research Methods & Data Analysis
Tutorials
• Thursday 30th October, 9-11, AG GL 20 (M. Mazzocchi)
• Tuesday 4th November, 11-1pm (H. Neeliah)
• You may attend:
  • One (the most convenient for you)
  • Both (it may be very useful)
  • None (not really advised…)
Lecture outline
• Key notions of statistics
• Simple random sampling
• Sampling error
• Sample size
• Other sampling methods
Distributions
• The set of values taken by a variable in a data set, together with their
  • Absolute frequencies
  • Relative frequencies (probabilities)
Relative and cumulative frequencies
$f_i = n_i / N$, where $n_i$ is the absolute frequency of value $i$ and $N$ is the number of observations
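The relative frequency formula on this slide can be illustrated with a small sketch. The data set below is hypothetical (it is not from the lecture), and the cumulative frequency is taken here to be the running sum of the relative frequencies.

```python
from collections import Counter
from itertools import accumulate


def frequency_table(values):
    """Return rows (value, n_i, f_i, F_i), sorted by value.

    n_i = absolute frequency, f_i = n_i / N, F_i = cumulative relative frequency.
    """
    counts = Counter(values)
    total = len(values)
    ordered = sorted(counts)
    absolute = [counts[v] for v in ordered]
    relative = [n_i / total for n_i in absolute]
    cumulative = list(accumulate(relative))
    return list(zip(ordered, absolute, relative, cumulative))


# Hypothetical data: lunches bought on campus in a week by 10 students
data = [0, 1, 1, 2, 2, 2, 3, 3, 5, 5]
table = frequency_table(data)

print("value  n_i    f_i    F_i")
for value, n_i, f_i, F_i in table:
    print(f"{value:>5} {n_i:>4} {f_i:>6.2f} {F_i:>6.2f}")
```

The last cumulative frequency should always equal 1, which is a quick check that no observation was lost.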
Distributions of random variables
• The distribution of possible values together with their probabilities (probability density function, p.d.f.)
The normal (Gaussian) distribution
• …is the distribution representing perfect randomness around a mean value
• In statistics, the normal distribution plays a key role in the theory of errors
• The central limit theorem implies that “averaging” almost always gives rise to a normal distribution (the error on the average is random), provided that the number of observations is large (>40 as a rule of thumb)
The normal distribution
[Figure: normal density p; 95% of values lie between $\mu - 1.96\sigma$ and $\mu + 1.96\sigma$, with probability 0.025 in each tail]
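The 95% / 1.96 relationship shown in the figure can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between mu - 1.96*sigma and mu + 1.96*sigma
central = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Probability mass in one tail beyond mu + 1.96*sigma
tail = 1 - std_normal.cdf(1.96)

# Going the other way: which z leaves 0.025 in the upper tail?
z_975 = std_normal.inv_cdf(0.975)

print(f"central mass: {central:.4f}")
print(f"one tail:     {tail:.4f}")
print(f"z for 0.975:  {z_975:.4f}")
```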
The Student-t distribution
• When a variable is normally distributed in the population (with unknown variance), the sample mean, standardised by its estimated standard error, follows a t distribution
• The t-distribution is similar to the normal distribution, apart from having heavier tails (higher tail probabilities)
• The larger the sample, the more similar the t-distribution is to the normal distribution
• For samples with more than 30-40 units, the difference between the two distributions is negligible
The t-distribution
[Figure: t density centred on the sample mean $\bar{x}$, with bounds $\bar{x} - t_{\alpha/2} s_{\bar{x}}$ and $\bar{x} + t_{\alpha/2} s_{\bar{x}}$]
$t_{\alpha/2}$ and $z_{\alpha/2}$: tabulated values
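Tabulated critical values can also be approximated numerically. The sketch below uses only the standard library: it integrates the t density with Simpson's rule and inverts it by bisection. The function names are hypothetical and the approach is illustrative, not a substitute for a statistics package; `math.gamma` overflows for very large degrees of freedom.

```python
import math
from statistics import NormalDist


def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=400):
    """P(T <= t) for t >= 0, by Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(alpha, df):
    """Two-sided critical value t_{alpha/2}, found by bisection on the CDF."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 64.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z_95 = NormalDist().inv_cdf(0.975)
print(f"z (alpha=0.05): {z_95:.3f}")
for df in (5, 10, 19, 30, 40):
    print(f"df={df:>2}  t (alpha=0.05): {t_critical(0.05, df):.3f}")
```

As the degrees of freedom grow, the t values approach the z value, which matches the statement that the two distributions are nearly identical beyond 30-40 units.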
Population parameters (in a population of N elements)
• Mean
• Variance
• Standard deviation
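The formulas for this slide are not reproduced in the text. The standard definitions, written with the symbols used in the rest of the lecture ($\mu$, $\sigma$, $N$), are presumably what was intended:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2
\qquad
\sigma = \sqrt{\sigma^2}
```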
Sampling
• A sample is a subgroup of the population selected for the study
• Sample statistics allow us to make inferences about the population parameters, through estimation and hypothesis testing
• The sample space is the complete set of all possible results of the sampling procedure
Simple random sampling
• Each element of the population has a known and equal probability of selection
• Every element is selected independently from other elements
• The probability of selecting a given sample of n elements is computable (known)
• The Central Limit Theorem guarantees that, for simple random samples with a sufficiently large sample size n (>40), the sample mean approximately follows the normal distribution
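The Central Limit Theorem claim can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a hypothetical, clearly non-normal exponential distribution with mean 1 and standard deviation 1, and draws are independent (an infinite population) rather than taken without replacement from a finite list.

```python
import random
import statistics


def sample_mean_coverage(n=50, repetitions=2000, seed=1):
    """Share of sample means falling within 1.96 standard errors of the true mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0          # mean and s.d. of an exponential(1) population
    se = sigma / n ** 0.5         # standard error of the mean for sample size n
    inside = 0
    for _ in range(repetitions):
        draws = [rng.expovariate(1.0) for _ in range(n)]
        if abs(statistics.fmean(draws) - mu) <= 1.96 * se:
            inside += 1
    return inside / repetitions


coverage = sample_mean_coverage()
print(f"share of sample means within 1.96 s.e.: {coverage:.3f}")
```

If the sample mean is close to normal, the printed share should be close to 0.95 even though the individual observations are strongly skewed.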
Sample statistics
• Sample mean
• Sample variance
• Sample standard deviation
(Callout on the slide: unbiasedness)
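The formulas for this slide are not reproduced in the text. The usual definitions for a sample of $n$ observations are given below; the "unbiasedness" callout presumably refers to the $n - 1$ divisor, which makes $s^2$ an unbiased estimator of the population variance.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
\qquad
s = \sqrt{s^2}
```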
Standard deviation and standard error
• The standard deviation measures the variability of a given variable (e.g. X) within the population or sample
• The standard error refers to the precision (variability) of a sample statistic (e.g. the mean), i.e. the error due to the fact that the statistic is computed on a sample rather than on the population (sampling error)
Basic SRS sample statistics (unknown population variance)
[Table: columns “Mean case” and “Proportion case (p)”; rows “Sample standard deviation of X” and “Standard error of the mean/proportion”, the latter measuring the ACCURACY of sample estimates]
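The cell formulas of this table are not reproduced in the text. A minimal sketch of the usual textbook versions follows, assuming $s/\sqrt{n}$ for the mean and $\sqrt{p(1-p)/n}$ for a proportion (some texts divide by $n - 1$ in the proportion case). The function names and data are hypothetical.

```python
import math
import statistics


def se_mean(sample):
    """Standard error of the sample mean: s / sqrt(n), with s using the n-1 divisor."""
    s = statistics.stdev(sample)
    return s / math.sqrt(len(sample))


def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Hypothetical lunch prices (in pounds) from five respondents
prices = [2.5, 3.0, 3.5, 4.0, 4.5]
print(f"s = {statistics.stdev(prices):.4f}, s.e. of mean = {se_mean(prices):.4f}")

# Hypothetical survey: 40% of 100 respondents bought lunch on campus
print(f"s.e. of proportion = {se_proportion(0.4, 100):.4f}")
```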
Finite population correction factor
• For finite populations (i.e. all populations in social research), when the sample is large (more than 10% of N) the usual formula tends to overestimate the standard error of the sample mean (proportion)
• To account for that, the standard error is multiplied by a finite population correction factor
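The correction formula itself is not reproduced in the text. A common form is $\sqrt{(N - n)/(N - 1)}$, which is assumed in the sketch below; some texts use $\sqrt{1 - n/N}$ instead. The function names are hypothetical.

```python
import math


def fpc(N, n):
    """Finite population correction factor, assuming the sqrt((N - n) / (N - 1)) form."""
    return math.sqrt((N - n) / (N - 1))


def corrected_se_mean(s, n, N):
    """Standard error of the mean, s / sqrt(n), multiplied by the correction factor."""
    return s / math.sqrt(n) * fpc(N, n)


# Illustrative figures: 20 units sampled out of 200, sample s.d. of 1.25
uncorrected = 1.25 / math.sqrt(20)
corrected = corrected_se_mean(1.25, 20, 200)
print(f"fpc = {fpc(200, 20):.4f}")
print(f"s.e. uncorrected = {uncorrected:.4f}, corrected = {corrected:.4f}")
```

The factor is always below 1 when n > 1, so the corrected standard error is smaller, and it approaches 1 when the sample is a negligible share of the population.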
Level of confidence α and z parameter
• The level of confidence is set through α: the confidence interval is built so that it contains the true population mean with probability 1 − α
• For the normal distribution, given a value of α, the corresponding $z_{\alpha/2}$ value is tabulated (e.g. α = 0.05 gives $z_{\alpha/2} = 1.96$)
[Figure: normal density centred on $\bar{x}$, with probability α/2 in each tail; the central region is the confidence interval for $\bar{x}$ at the chosen level of confidence]
The t-distribution
[Figure: t density centred on the sample mean $\bar{x}$, with bounds $\bar{x} - t_{\alpha/2} s_{\bar{x}}$ and $\bar{x} + t_{\alpha/2} s_{\bar{x}}$]
Confidence intervals
• Calculate the sample mean
• Decide a level of confidence (usually 95% or 99%)
• Choose whether to use the Student-t distribution or the Normal distribution
• Compute the sample standard error
• Define the lower and upper bound of the confidence interval
Exercise
• Suppose that you have interviewed 20 students out of 200 in the agricultural building, asking them how much they paid for lunch yesterday
• You get an average of £3.67
• The standard deviation is 1.25
• Compute the 95% confidence interval
• Compute the 99% confidence interval
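One possible way to work through the exercise is sketched below. It assumes the Student-t distribution (n = 20 is below the 30-40 threshold), takes the critical values for 19 degrees of freedom from a standard t table, and does not apply the finite population correction, since the sample is exactly 10% of N and the slide's rule applies to samples above 10%. These choices are assumptions, not the official solution.

```python
import math


def confidence_interval(mean, s, n, critical_value):
    """Return (lower, upper) bounds: mean -/+ critical_value * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = critical_value * se
    return mean - margin, mean + margin


# Two-sided t critical values for 19 degrees of freedom (standard t table)
T_19 = {0.95: 2.093, 0.99: 2.861}

ci95 = confidence_interval(3.67, 1.25, 20, T_19[0.95])
ci99 = confidence_interval(3.67, 1.25, 20, T_19[0.99])

print(f"standard error: {1.25 / math.sqrt(20):.4f}")
print(f"95% CI: {ci95[0]:.2f} to {ci95[1]:.2f}")
print(f"99% CI: {ci99[0]:.2f} to {ci99[1]:.2f}")
```

The 99% interval is wider than the 95% interval: a higher level of confidence is paid for with a less precise statement.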
Determining sample size
Factors influencing sample size (n):
• Size of the population (N)
• Variability of the population (σ)
• Desired level of accuracy (θ)
• Level of confidence (α)
• Budget constraint
Simple random sampling: determining sample size
• Relative sampling error (r.s.e.)
• Determining sample size for a given r.s.e. (approximate formula)
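The formulas for this slide are not reproduced in the text. The sketch below uses one common textbook approximation, which may differ from the lecture's own: the r.s.e. is taken as the half-width of the confidence interval divided by the mean, giving $n_0 = (z_{\alpha/2}\, s / (r \bar{x}))^2$, optionally adjusted for a finite population as $n = n_0 / (1 + n_0 / N)$. The function name and the example figures are hypothetical.

```python
import math


def sample_size_for_rse(mean, s, rse, z=1.96, N=None):
    """Approximate sample size for a target relative sampling error.

    mean, s : prior guesses of the mean and standard deviation (e.g. from a pilot)
    rse     : target half-width of the confidence interval as a share of the mean
    z       : normal critical value (1.96 for a 95% level of confidence)
    N       : population size; if given, a finite population adjustment is applied
    """
    n0 = (z * s / (rse * mean)) ** 2
    if N is not None:
        n0 = n0 / (1 + n0 / N)
    return math.ceil(n0)


# Pilot guesses: mean 3.67, s.d. 1.25; target r.s.e. of 10% at 95% confidence
print(sample_size_for_rse(3.67, 1.25, 0.10))          # large population
print(sample_size_for_rse(3.67, 1.25, 0.10, N=200))   # population of 200
```

Halving the target r.s.e. roughly quadruples the required sample, which is why precision has to be weighed against the budget constraint.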
The sampling design process
• Define the target population, its elements and the sampling units
• Determine the sampling frame (list)
• Select a sampling technique
  • Sampling with/without replacement
  • Probability/nonprobability sampling
• Determine the sample size
  • Precision versus costs
  • The marginal value, in terms of precision, of additional sampling units is decreasing
• Execute the sampling process
The sampling techniques
• Probabilistic samples
  • Simple random sampling
  • Systematic sampling
  • Stratified sampling
  • Cluster sampling
  • Other sampling techniques
• Nonprobabilistic samples
  • Convenience sampling
  • Judgmental sampling
  • Quota sampling
  • Snowball sampling
Representativeness
• A sample can be considered as “representative” when it is expected to exhibit the average properties of the population
Selection bias
• Improper selection of sample units (ignoring a relevant “control variable” that generates bias), so that the values observed in the sample are biased and the sample is not representative
Example: a survey is conducted to measure goat milk consumption, but the interviewers only select people in urban areas, who on average drink less goat milk.
Simple random sampling
• Each element of the population has a known and equal probability of selection
• Every element is selected independently from other elements
• The probability of selecting a given sample of n elements is computable (known)
• Statistical inference is possible
• It is easily understood
• Representative samples are large and expensive
• Standard errors are larger than in other probabilistic sampling techniques
• Sometimes it is difficult to execute a really random sampling
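Drawing a simple random sample from a sampling frame can be sketched as follows. The frame (student numbers 1 to 200) and the function name are hypothetical; `random.sample` draws without replacement, so every subset of size n is equally likely.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame with equal probability."""
    rng = random.Random(seed)   # a seed makes the draw reproducible
    return rng.sample(frame, n)


# Hypothetical sampling frame: 200 students identified by number
frame = list(range(1, 201))
sample = simple_random_sample(frame, 20, seed=42)
print(sorted(sample))
```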
Systematic sampling
• A list of N elements in the population is compiled, ordered according to a specified variable
  • Unrelated to the target variable (similar to SRS)
  • Related to the target variable (increased representativeness)
• A sample size n is chosen
• A systematic step of k = N/n is set
• A random number s between 1 and k is extracted and represents the first element to be included
• Then the other elements selected are s+k, s+2k, s+3k…
• Cheaper and easier than SRS
• More representative if the order is related to the variable of interest (monotone)
• Sampling frame not always necessary
• Less representative (biased) if the order is cyclical
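The selection steps of systematic sampling can be sketched as follows. The frame and function name are hypothetical, and the step is rounded down to an integer when N is not a multiple of n, in which case the last few units of the list can never be selected.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th unit of an ordered frame, starting from a random unit in 1..k."""
    k = len(frame) // n                 # systematic step k = N / n (integer part)
    rng = random.Random(seed)
    start = rng.randint(1, k)           # random start s, 1-based, between 1 and k
    # Units s, s + k, s + 2k, ... converted to 0-based list positions
    return [frame[start - 1 + j * k] for j in range(n)]


# Hypothetical ordered frame of 200 units, sample of 20, so k = 10
frame = list(range(1, 201))
sample = systematic_sample(frame, 20, seed=7)
print(sample)
```

Only the starting unit is random: once s is drawn, the whole sample is determined, which is why a cyclical ordering of the list can bias the result.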
Stratified sampling
• The population is partitioned into strata through control variables (stratification variables), closely related to the target variable, so that there is homogeneity within each stratum and heterogeneity between strata
• Simple random sampling is applied within each stratum of the population
• Proportionate sampling: the size of the sample from each stratum is proportional to the relative size of the stratum in the total population
• Disproportionate sampling: the size is also proportional to the standard deviation of the target variable in each stratum
• Gains in precision
• Includes all relevant subpopulations, even if small
• Stratification variables may not be easily identifiable
• Stratification can be expensive
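The two allocation rules can be sketched as follows. The disproportionate rule is implemented here as an allocation proportional to stratum size times stratum standard deviation (often called Neyman allocation), which is one reading of the slide. The strata, sizes, and standard deviations are hypothetical, and simple rounding is used, so the allocated sizes may not add up exactly to n in general.

```python
import random


def proportionate_allocation(stratum_sizes, n):
    """Sample size per stratum, proportional to the stratum's share of the population."""
    total = sum(stratum_sizes.values())
    return {h: round(n * size / total) for h, size in stratum_sizes.items()}


def disproportionate_allocation(stratum_sizes, stratum_sds, n):
    """Sample size per stratum, proportional to stratum size times its standard deviation."""
    weights = {h: stratum_sizes[h] * stratum_sds[h] for h in stratum_sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}


def stratified_sample(strata, allocation, seed=None):
    """Apply simple random sampling within each stratum, following the allocation."""
    rng = random.Random(seed)
    return {h: rng.sample(units, allocation[h]) for h, units in strata.items()}


# Hypothetical population of 1000 households in three strata
sizes = {"urban": 600, "suburban": 300, "rural": 100}
sds = {"urban": 1.0, "suburban": 2.0, "rural": 8.0}

prop = proportionate_allocation(sizes, 100)
disp = disproportionate_allocation(sizes, sds, 100)
print("proportionate:   ", prop)
print("disproportionate:", disp)

strata = {h: [f"{h}_{i}" for i in range(size)] for h, size in sizes.items()}
drawn = stratified_sample(strata, prop, seed=3)
print({h: len(units) for h, units in drawn.items()})
```

In this example the small but highly variable rural stratum receives a much larger share of the sample under the disproportionate rule.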
Cluster sampling
• The population is partitioned into clusters
• Elements within each cluster should be as heterogeneous as possible with respect to the variable of interest (e.g. area sampling)
• A random sample of clusters is extracted through SRS (possibly with probability proportional to the cluster size)
• 2a. All the elements of the cluster are selected (one-stage)
• 2b. A probabilistic sample is extracted from the cluster (two-stage cluster sampling)
• Reduced costs
• Higher feasibility
• Less precision
• Inference can be difficult
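One-stage and two-stage cluster sampling can be sketched as follows. For simplicity the clusters are drawn with equal probability; selection with probability proportional to cluster size is not implemented. The clusters (villages) and the function name are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Draw clusters by SRS, then take all units (one-stage) or an SRS of units (two-stage).

    clusters      : dict mapping cluster name -> list of units
    n_per_cluster : None for one-stage; otherwise units drawn inside each chosen cluster
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        units = clusters[name]
        if n_per_cluster is None:
            sample[name] = list(units)                       # 2a: one-stage
        else:
            size = min(n_per_cluster, len(units))
            sample[name] = rng.sample(units, size)           # 2b: two-stage
    return sample


# Hypothetical population: 10 villages of 30 households each
villages = {f"village_{i}": [f"v{i}_h{j}" for j in range(30)] for i in range(10)}

one_stage = cluster_sample(villages, 3, seed=11)
two_stage = cluster_sample(villages, 3, n_per_cluster=5, seed=11)
print({name: len(units) for name, units in one_stage.items()})
print({name: len(units) for name, units in two_stage.items()})
```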
Nonprobabilistic samples
Convenience sampling
• Only “convenient” elements enter the sample
• Cheapest method
• Quickest method
• Selection bias
• Non-representativeness
• Inference is not possible
Judgmental sampling
• Selection based on the judgment of the researcher
• Low cost
• Quick
• Non-representativeness
• Inference is not possible
• Subjective
Quota sampling
• Define control categories (quotas) for the population elements, such as sex, age…
• Apply a “restricted judgmental sampling”, so that quotas in the sample are the same as those in the population
• Cheapest method
• Quickest method
• There is no guarantee that the sample is representative (it depends on the relevance of the control characteristics chosen)
• Many sources of selection bias
• No assessment of sampling error
Snowball sampling
• A first small sample is selected randomly
• Respondents are asked to identify others who belong to the population of interest
• The referrals will have demographic and psychographic characteristics similar to the referrers
• Lower costs
• Low variability
• Useful for “rare” populations
• Inference is not possible