Design and Data Analysis in Psychology I English group (A)

School of Psychology, Dpt. Experimental Psychology
Salvador Chacón Moscoso, Susana Sanduvete Chaves, Milagrosa Sánchez Martín

Lesson 5. Sampling and sampling distribution

1. Introduction
• Statistical inference comprises two categories:
  • Estimation theory (lesson 6):
    • Given an index in the sample, the aim is to infer the value of that index in the population.
    • Two kinds of estimation:
      • Point estimation: it provides a single value.
      • Interval estimation: it provides a range of values.
  • Decision theory (lesson 8):
    • Procedure to make decisions in the field of statistical inference.

1. Introduction
[Diagram: ESTIMATION THEORY, connecting sample STATISTICS with population PARAMETERS]

2. Phases of the inferential process
• Obtain a sample randomly.
• Calculate the statistics (indexes in the sample):
  • Construct a sampling distribution (of means or proportions; the possible results that can be found when taking different samples).
  • Choose a probability model (e.g., if we throw a die, there are six possible results, and they are equiprobable). The most used in psychology is the normal distribution.
• Estimate the corresponding parameters (indexes in the population) based on the statistics.

3. Sampling error
• How close the value of the statistic is to the value of the parameter depends on how representative the studied sample is. For example, it depends on:
  • The sample size.
  • The similarity-difference between participants.
  • The sampling procedure.
• Nevertheless, there will always be some discrepancy between statistic and parameter. This is the sampling error.
• Solution:
  • The precise value of the sampling error is unknown.
  • Using inference, we will know with a certain confidence that this error does not exceed a limit.

3. Sampling error. Calculation:

                      Sample                        Population
                      Statistics (Latin letters)    Parameters (Greek letters)
Mean                  X̄                             μ
Proportion            p                             π
Standard deviation    S                             σ

3. Sampling error. Calculation:
The sampling error (e) is the difference between a statistic and its corresponding parameter:
e = statistic - parameter (for the mean, e = X̄ - μ)

3. Sampling error
• There are two main concepts related to the sampling error:
  • Accuracy: the precision with which a statistic represents the parameter.
  • Reliability: the measure of the constancy of a statistic when you calculate it for several samples of the same type and size.

3. Sampling error
• Accuracy: example.
• Which estimator is more accurate? μ = 50

3. Sampling error is more accurate.

3. Sampling error
• Reliability: example.
• Which group of means is more reliable?

3. Sampling error • Reliability: example. • The first group of means is more reliable because variation between them is lower.

3. Sampling error
• The lower the sampling error, the more probable it is that the estimator in a sample presents the same value as the parameter.

4. Sampling distribution
• Definition: it is a theoretical probability distribution that establishes a functional relation between the possible values of a statistic, based on samples of size n, and the probability associated with each of these values, for all the possible samples of size n drawn from a particular population.
• The construction of a sampling distribution has three phases:

4. Sampling distribution
PHASE 1. Collect all the samples of the same size n, extracted randomly from the population under study.
[Diagram: Population, from which samples S1, S2, S3, ..., Sk are drawn]

4. Sampling distribution
PHASE 2. Calculate the same estimator in each sample.
• S1
• S2
• S3
• ...
• Sk
We will find different values of the estimator (e.g., the mean) in the different samples.

4. Sampling distribution
PHASE 3. Group these measures in a new distribution.
[Figure label: Mean of means]
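The three phases can be sketched with a small simulation. This is only an illustration and is not part of the original slides: the population (10,000 normally distributed scores with mean 50 and standard deviation 10), the sample size n = 25 and the number of samples k = 2,000 are all assumed values, and k random samples stand in for "all the possible samples".

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population (assumed values, not taken from the slides).
population = [random.gauss(50, 10) for _ in range(10_000)]

n = 25      # sample size (assumption)
k = 2_000   # number of samples drawn (assumption; stands in for "all samples")

# Phase 1: draw k random samples of size n from the population.
samples = [random.sample(population, n) for _ in range(k)]

# Phase 2: calculate the same estimator (here, the mean) in each sample.
sample_means = [statistics.fmean(s) for s in samples]

# Phase 3: group the k means in a new distribution and summarise it.
mean_of_means = statistics.fmean(sample_means)
standard_error = statistics.pstdev(sample_means)

print(f"Population mean:       {statistics.fmean(population):.2f}")
print(f"Mean of means:         {mean_of_means:.2f}")
print(f"SD of means (std err): {standard_error:.2f}")
```

With these assumed values, the mean of means should land close to the population mean, and the standard deviation of the means should be close to 10 / √25 = 2.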

4. Sampling distribution
• In general, the sampling distribution will differ from the distribution of the population.
• The variance of the statistic provides a measure of the dispersion of the particular sample values with respect to the expected value of the statistic, considering all the possible samples of size n.
• The standard deviation of the sampling distribution is called the standard error of the estimator.
• We are only going to study the sampling distribution of two statistics:
  • 4.1. The mean.
  • 4.2. The proportion.

4.1. Sampling distribution of the mean
Mean or expected value: E(X̄) = μ
Standard error: σ_X̄ = σ / √n

4.1. Sampling distribution of the mean
[Figure: distribution of the population compared with the sampling distribution]

4.1. Sampling distribution of the mean. Characteristics
• The statistics obtained in the samples are grouped around the parameter of the population.
• The bigger n is, the closer to the parameter the statistics are.
• In large samples, the graphic representation presents the following characteristics:

4.1. Sampling distribution of the mean. Characteristics
a) It is symmetric. The central vertical axis is the parameter μ.
b) The bigger n is, the narrower the bell-shaped curve is.
c) It takes the form of the normal curve.
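Characteristic b) follows from the standard error of the mean, σ / √n, which shrinks as n grows. A minimal sketch, assuming a hypothetical population standard deviation of 10 (not a value from the slides):

```python
import math

sigma = 10  # hypothetical population standard deviation (assumption)

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

# The curve narrows as the sample size increases.
for n in (4, 25, 100, 400):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.2f}")
```

Quadrupling the sample size halves the standard error, so the gain in precision slows down as n increases.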

4.1. Sampling distribution of the mean. Characteristics
• Its mean matches the true mean of the population.
• Its variability can be larger or smaller. If the variability is small (i.e., it has a small sigma), the means differ little from each other and the estimator is very reliable.

4.1. Sampling distribution of the mean. Standardization
Sample: Z = (X - X̄) / S
Population: Z = (X - μ) / σ
Sampling distribution: Z = (X̄ - μ) / (σ / √n)

4.1. Sampling distribution of the mean. Standardization
• Standardization makes it possible to calculate probabilities (provided the probability model of the distribution is known). The sampling distribution of the mean can be considered normal when n ≥ 30.

4.1. Sampling distribution of the mean correction

4.1. Sampling distribution of the mean. Example 1
We applied a test to a population and obtained a mean (μ) of 18 points and a standard deviation (σ) of 3 points. Assuming that the variable is normally distributed in the population:
a) Which raw scores delimit the central 95% of the participants of that population?
b) Which raw scores delimit the central 99% of the average scores in samples of 225 participants, obtained randomly?

4.1. Sampling distribution of the mean. Example 1
a) Which raw scores delimit the central 95% of the participants of that population?
[Figure: normal curve with the central 95% split into two areas of 0.475; Z1 = -1.96, Z2 = 1.96]

4.1. Sampling distribution of the mean. Example 1
X = μ + Z·σ
X1 = 18 + (-1.96)(3) = 12.12
X2 = 18 + (1.96)(3) = 23.88
The raw scores that delimit the central 95% of the participants are 12.12 and 23.88.

4.1. Sampling distribution of the mean. Example 1
b) Which raw scores delimit the central 99% of the average scores in samples of 225 participants, obtained randomly?
[Figure: normal curve with the central 99% split into two areas of 0.495; Z1 = -2.58, Z2 = 2.58]

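4.1. Sampling distribution of the mean. Example 1
Standard error: σ / √n = 3 / √225 = 0.2
X̄1 = 18 + (-2.58)(0.2) = 17.484
X̄2 = 18 + (2.58)(0.2) = 18.516
17.484 and 18.516 delimit the central 99% of the average scores in samples of 225 participants.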
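Both parts of Example 1 can be checked numerically. The sketch below uses Python's standard `statistics.NormalDist` in place of the Z table; the helper `central_limits` is a hypothetical name introduced only for this illustration.

```python
from statistics import NormalDist

mu, sigma = 18, 3   # population mean and standard deviation (from the example)
n = 225             # sample size in part b)

def central_limits(center: float, spread: float, coverage: float) -> tuple[float, float]:
    """Return the two scores that delimit the central `coverage` of a normal distribution."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # e.g. 0.975 gives about 1.96
    return center - z * spread, center + z * spread

# a) Central 95% of individual scores: the spread is sigma.
x1, x2 = central_limits(mu, sigma, 0.95)

# b) Central 99% of sample means: the spread is the standard error sigma / sqrt(n).
m1, m2 = central_limits(mu, sigma / n ** 0.5, 0.99)

print(f"a) {x1:.2f} and {x2:.2f}")
print(f"b) {m1:.3f} and {m2:.3f}")
```

Part a) reproduces 12.12 and 23.88. Part b) gives about 17.485 and 18.515; the small difference from 17.484 and 18.516 arises because the code uses the unrounded quantile (about 2.576) rather than the table value 2.58.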

4.1. Sampling distribution of the mean. Example 2
Calculate the probability of extracting a sample of 81 participants with a mean equal to or lower than 42, from a population whose mean (μ) is 40 and standard deviation (σ) is 9.

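4.1. Sampling distribution of the mean. Example 2
[Figure: normal curve; the area below the mean is 0.5 and the area between the mean and 42 is the unknown (?)]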
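A minimal sketch of the calculation for Example 2, assuming the sampling distribution of the mean is normal (n = 81 ≥ 30) and using `statistics.NormalDist` instead of the Z table:

```python
from statistics import NormalDist

mu, sigma, n = 40, 9, 81   # values given in the example
sample_mean = 42

standard_error = sigma / n ** 0.5            # 9 / 9 = 1
z = (sample_mean - mu) / standard_error      # (42 - 40) / 1 = 2
probability = NormalDist().cdf(z)            # area at or below z

print(f"z = {z:.2f}, P(mean <= {sample_mean}) = {probability:.4f}")
```

The result, about 0.9772, is the 0.5 below the mean plus the area of roughly 0.4772 between Z = 0 and Z = 2.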

4.1. Sampling distribution of the mean. Example 3
In a sampling distribution of means with samples of 49 participants, the means of the central 90% of the samples are between 47 and 53 points. Calculate:
a) The raw scores that delimit the central 95% of the means.
b) The standard deviation of the population (σ).
c) The raw scores that delimit the central 95% of the means, when the sample size is 81.

4.1. Sampling distribution of the mean. Example 3
a) The raw scores that delimit the central 95% of the means.
[Figure: normal curve with the central 90% split into two areas of 0.45]

4.1. Sampling distribution of the mean. Example 3
[Figure: normal curve with the central 95% split into two areas of 0.475; Z1 = -1.96, Z2 = 1.96]

4.1. Sampling distribution of the mean. Example 3
b) The standard deviation of the population (σ).

4.1. Sampling distribution of the mean. Example 3
c) The raw scores that delimit the central 95% of the means, when the sample size is 81.
[Figure: normal curve with the central 95%; Z1 = -1.96, Z2 = 1.96]
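One way to work through Example 3 is sketched below. It is not the slides' own solution: it uses unrounded normal quantiles from `statistics.NormalDist` (about 1.645 for the central 90%), so a solution based on a rounded table value such as 1.64 or 1.65 will differ slightly in the second decimal.

```python
from statistics import NormalDist

z90 = NormalDist().inv_cdf(0.95)    # about 1.645, delimits the central 90%
z95 = NormalDist().inv_cdf(0.975)   # about 1.96, delimits the central 95%

lower90, upper90, n = 47, 53, 49    # values given in the example
mean = (lower90 + upper90) / 2      # centre of the symmetric interval: 50

# The half-width of the 90% interval equals z90 times the standard error.
se_49 = (upper90 - mean) / z90

# a) Central 95% of the means for n = 49.
a_low, a_high = mean - z95 * se_49, mean + z95 * se_49

# b) Population standard deviation: sigma = standard error * sqrt(n).
sigma = se_49 * n ** 0.5

# c) Central 95% of the means for n = 81.
se_81 = sigma / 81 ** 0.5
c_low, c_high = mean - z95 * se_81, mean + z95 * se_81

print(f"a) {a_low:.2f} and {a_high:.2f}")
print(f"b) sigma = {sigma:.2f}")
print(f"c) {c_low:.2f} and {c_high:.2f}")
```

The key step is recovering the standard error from the 90% interval; parts a) to c) then reuse it. The interval in c) is narrower than in a) because the larger sample size reduces the standard error.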

4.2. Sampling distribution of proportions
• p = x/n, where x is the number of participants that present a characteristic and n is the sample size.
• We can assume a normal distribution when nπ ≥ 5 and n(1 - π) ≥ 5.

4.2. Sampling distribution of proportions
Mean or expected value: E(p) = π
Standard error: σ_p = √(π(1 - π) / n)

4.2. Sampling distribution of proportions. Standardization
Z = (p - π) / √(π(1 - π) / n)

4.2. Sampling distribution of proportions correction

4.2. Sampling distribution of proportions. Example 1
In a population, the proportion of smokers was 0.60. If we chose a sample of n = 200 from this population, what is the probability of finding 130 or fewer smokers in that sample?

4.2. Sampling distribution of proportions. Example 1
Can we assume that these data follow a normal distribution?
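The normality check and the probability for this example can be sketched as follows. This is an illustration, not the slides' own solution: it uses `statistics.NormalDist` in place of the Z table and applies no correction, which is an assumption, so a solution that does apply a correction will differ slightly.

```python
from statistics import NormalDist

pi_pop, n, x = 0.60, 200, 130   # population proportion, sample size, smokers (from the example)

# Normality conditions from the slides: n*pi >= 5 and n*(1 - pi) >= 5.
normal_ok = n * pi_pop >= 5 and n * (1 - pi_pop) >= 5

p = x / n                                           # sample proportion: 0.65
standard_error = (pi_pop * (1 - pi_pop) / n) ** 0.5
z = (p - pi_pop) / standard_error
probability = NormalDist().cdf(z)                   # no correction applied (assumption)

print(f"Normal approximation acceptable: {normal_ok}")
print(f"z = {z:.2f}, P(p <= {p}) = {probability:.4f}")
```

Here nπ = 120 and n(1 - π) = 80, so both conditions hold; under these assumptions Z is about 1.44 and the probability is roughly 0.93.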
