Reasoning in Psychology Using Statistics
Psychology 138, 2017
Inferential statistics: used to generalize from the sample back to the population
• Hypothesis testing
  • Testing claims about populations (and the effects of variables) based on data collected from samples
• Estimation
  • Using sample statistics to estimate the population parameters
Sampling makes data collection manageable: Population → Sample → Inferential statistics → back to the Population
(Cumming, 2012, website)
• Describe the typical college student: age, hours of studying per week, hours of sleep per night, pizza consumption. For each variable, the population mean μ = ?
• If we can't measure the entire population of students, how do we get the population means for these variables?
Estimation
• If we can't measure the entire population of students, how do we get the population mean μ? Estimate it based on what we do know: information from a sample (the sample mean X̄)
• Two kinds of estimation
  • Point estimates: a single score
  • Interval estimates: a range of scores
• Describe the typical college student with point estimates and interval estimates:
  • Age: point estimate "19 yrs", interval estimate "17 to 21 yrs"
  • Hours of studying per week: "12 hrs", "2 to 21 hrs"
  • Hours of sleep per night: "8 hrs", "4 to 10 hrs"
  • Pizza consumption: "1 per wk", "0 to 8 per wk"
Estimation
• Estimate the number of people attending lecture today. How confident are you that your estimate is correct?
  • Point estimate: "50 students" (confidence: "Not real confident, maybe 20%")
  • Interval estimate: "Somewhere between 20 and 80 students" (confidence: "Fairly confident, maybe 90%")
Estimation
• Two kinds of estimation
  • Point estimates: a single score, e.g., "50 students"
    • Advantage: feels precise
    • Disadvantage: low "confidence" in the estimate. Why? Because we recognize sampling error
  • Interval estimates: a range of scores, e.g., "Somewhere between 20 and 80 students", or "The # is 50 ± 30"
    • Advantage: high "confidence" in the estimate; the true value is bound to be in there somewhere
    • Disadvantage: feels wishy-washy
Estimation
• Both kinds of estimates use the same basic procedure
• The formula is a variation of the test statistic formulas (we'll start with the z-score):
  z = (X̄ - μ) / σ_X̄
• Multiply both sides by σ_X̄, then do some adding/subtracting to isolate μ (this is what we want to estimate):
  μ = X̄ ± (z)(σ_X̄)
Estimation
• Why build the estimate around the sample mean?
  • It is often the only piece of evidence that we have, so it is our best guess
  • Most sample means will be pretty close to the population mean, so we have a good chance that our sample mean is close
Estimation
• The margin of error is the product of two pieces:
  • A test statistic value (e.g., a z-score), based on the design (z or t) and the level of confidence
  • The standard error (the difference that you'd expect by chance), based on the sample size (n) and the population standard deviation (σ)
• So the formula becomes:
  μ = X̄ ± (test statistic value)(standard error) = X̄ ± (z)(σ_X̄)
Estimation
• The standard error (the difference you'd expect by chance) for this design:
  σ_X̄ = σ / √n
  • Based on sample size (n) and population standard deviation (σ)
• Note: How you compute your standard error will depend on your design; see the sketch below for this one-sample, σ-known case
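The standard error above is simple enough to check directly. A minimal sketch in Python (the function name and values are my own illustration, not part of the course materials):

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean when the population sigma is known."""
    return sigma / math.sqrt(n)

# Using the numbers from the worked examples below (sigma = 5):
print(standard_error(5, 25))  # 1.0
print(standard_error(5, 4))   # 2.5 -- smaller n means a larger standard error
```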
Estimation
• A test statistic value (e.g., a z-score), based on the design (z or t) and the level of confidence
• In the distribution of the test statistic, a confidence interval uses the z_crit values that cut off the top and bottom tails: for 95% confidence, the upper and lower 2.5%, leaving 95% of the sample means in the middle
• A 95% CI is like using a "two-tailed" z-test with α = 0.05
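As a rough check on the "two-tailed" logic above, the critical z for a chosen confidence level can be looked up with scipy rather than the unit normal table. This is only a sketch; z_crit is my own helper name:

```python
from scipy import stats

def z_crit(confidence):
    """Critical z leaving (1 - confidence)/2 in each tail."""
    alpha = 1 - confidence
    return stats.norm.ppf(1 - alpha / 2)

print(round(z_crit(0.95), 2))  # 1.96 (2.5% in each tail)
print(round(z_crit(0.90), 2))  # 1.64 (the table value used later is 1.65)
```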
Estimation
• Finding the right test statistic for your estimate
• You begin by making a reasonable estimate of what the z (or t) value should be for your estimate
• For a point estimate, you want what? z (or t) = 0, right in the middle of the distribution (transforming raw scores to z-scores puts the mean at z = 0)
Estimation
• Finding the right test statistic (z or t)
• For a point estimate, you want z (or t) = 0, right in the middle
• For an interval, your values will depend on how confident you want to be in your estimate
  • What do I mean by "confident"? 90% confidence means that 90% of confidence interval estimates of this sample size will include the actual population mean μ (about 9 out of 10 intervals contain μ)
Estimation
• Finding the right test statistic (z or t)
  • For a point estimate, you want z (or t) = 0, right in the middle
  • For an interval, your values will depend on how confident you want to be in your estimate
• Computing the point estimate or the confidence interval:
  • Step 1: Take your "reasonable" estimate for your test statistic
  • Step 2: Put it into the formula
  • Step 3: Solve for the unknown population parameter
Estimates with z-scores
• Make a point estimate of the population mean given a sample with X̄ = 85, n = 25, and a population σ = 5
• For a point estimate, z (or t) = 0, right in the middle, so the sample mean serves as the center and the point estimate is the sample mean: μ ≈ X̄ = 85
Estimates with z-scores
• Make an interval estimate with 95% confidence of the population mean given a sample with X̄ = 85, n = 25, and a population σ = 5
• What two z-scores do 95% of the data lie between? The middle 95% leaves 2.5% in each tail
  • From the unit normal table: z = ±1.96 (tail proportion .0250)
• Standard error: σ_X̄ = σ / √n = 5 / √25 = 1
• So the 95% confidence interval is 85 ± (1.96)(1) = 85 ± 1.96, or 83.04 to 86.96
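A short sketch that reproduces this interval (assuming Python with scipy; the helper name is mine):

```python
import math
from scipy import stats

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for mu when the population sigma is known."""
    se = sigma / math.sqrt(n)
    z = stats.norm.ppf(1 - (1 - confidence) / 2)  # 1.96 for 95%
    margin = z * se
    return sample_mean - margin, sample_mean + margin

lo, hi = z_interval(sample_mean=85, sigma=5, n=25, confidence=0.95)
print(round(lo, 2), round(hi, 2))  # 83.04 86.96
```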
Estimates with z-scores
• Make an interval estimate with 90% confidence of the population mean given a sample with X̄ = 85, n = 25, and a population σ = 5
• What two z-scores do 90% of the data lie between? The middle 90% leaves 5% in each tail
  • From the unit normal table: z = ±1.65 (tail proportion .0500)
• So the 90% confidence interval is 85 ± (1.65)(1) = 85 ± 1.65, or 83.35 to 86.65
Estimates with z-scores
• Make an interval estimate with 90% confidence of the population mean given a sample with X̄ = 85, n = 4, and a population σ = 5
• Same z-scores as before, z = ±1.65 (5% in each tail), but now σ_X̄ = 5 / √4 = 2.5
• So the confidence interval is 85 ± (1.65)(2.5) = 85 ± 4.13, or 80.88 to 89.13
Estimation
• The size of the margin of error is related to (see the sketch below):
  • Sample size: as n increases, the margin of error gets narrower (it changes the standard error)
  • Level of confidence: as confidence increases (e.g., 90% -> 95%), the margin of error gets wider (it changes the critical test statistic values)
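Both effects can be seen by tabulating the z-based margin of error for a few sample sizes and confidence levels. A sketch (values are illustrative, using the σ = 5 from the examples above):

```python
import math
from scipy import stats

def margin_of_error(sigma, n, confidence):
    """z-based margin of error: critical z times the standard error."""
    z = stats.norm.ppf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

for n in (4, 25, 100):
    for confidence in (0.90, 0.95):
        moe = margin_of_error(sigma=5, n=n, confidence=confidence)
        print(f"n = {n:3d}, confidence = {confidence:.0%}, margin of error = {moe:.2f}")
# Larger n -> narrower margin; higher confidence -> wider margin
```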
Estimation in other designs
• Two kinds of estimates that use the same basic procedure
• The formula is a variation of the test statistic formulas:
  population parameter = point estimate ± (t_crit)(standard error)
  • The center/point estimate depends on the design (what is being estimated)
  • The critical value: use the t-table and your confidence level
  • How we find the standard error also depends on the design
• Different designs: estimating the mean of the population from one or two samples when we don't know σ
Estimates with t-scores
• Confidence intervals always involve ± a margin of error
• What is the t_crit needed for a 95% confidence interval?
• This is similar to a two-tailed test: 95% in the middle leaves 2.5% in each tail, and 2.5% + 2.5% = 5%, so α = 0.05
• In the t-table, always use the "proportion in two tails" heading and select the α-level corresponding to (1 - confidence level)
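If a printed t-table isn't handy, the same two-tailed critical value can be pulled from scipy. A sketch, assuming a one-sample design with df = n - 1:

```python
from scipy import stats

def t_crit(confidence, df):
    """Two-tailed critical t: split alpha = 1 - confidence across both tails."""
    alpha = 1 - confidence
    return stats.t.ppf(1 - alpha / 2, df)

print(round(t_crit(0.95, df=24), 3))  # 2.064, the value used in the example below
```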
Estimation in other designs
• One-sample t design: estimating the population mean from the sample mean when the population standard deviation is not known
  • Confidence interval: μ = X̄ ± (t_crit)(s_X̄), i.e., sample mean ± (critical t)(difference expected by chance)
Estimation in one-sample t design
• Make an interval estimate with 95% confidence of the population mean given a sample with X̄ = 85, n = 25, and a sample s = 5
• What two critical t-scores do 95% of the data lie between? 2.5% in each tail, 95% in the middle
  • From the t-table (df = 24): t_crit = ±2.064
• Estimated standard error: s_X̄ = 5 / √25 = 1
• So the confidence interval is 85 ± 2.064, or 82.94 to 87.06
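A sketch of the same one-sample t interval in Python (scipy assumed; names are illustrative):

```python
import math
from scipy import stats

def one_sample_t_interval(sample_mean, s, n, confidence=0.95):
    se = s / math.sqrt(n)                              # estimated standard error
    t = stats.t.ppf(1 - (1 - confidence) / 2, n - 1)   # two-tailed critical t
    return sample_mean - t * se, sample_mean + t * se

lo, hi = one_sample_t_interval(sample_mean=85, s=5, n=25, confidence=0.95)
print(round(lo, 2), round(hi, 2))  # 82.94 87.06
```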
Estimation in related samples design
• Estimating the difference between two population means based on two related samples
  • Confidence interval: μ_D = D̄ ± (t_crit)(s_D̄), i.e., mean difference ± (critical t)(difference expected by chance)
Estimation in related samples design
• Dr. S. Beach reported on the effectiveness of cognitive-behavioral therapy as a treatment for anorexia. He examined 12 patients, weighing each of them before and after the treatment. Estimate the average population weight gain for those undergoing the treatment with 90% confidence.
• Differences (post-treatment minus pre-treatment weights): 10, 6, 3, 23, 18, 17, 0, 4, 21, 10, -2, 10
• Related samples estimation, confidence level 90%: CI(90%) = 5.72 to 14.28
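A sketch that reproduces this interval from the difference scores listed above (Python with numpy and scipy assumed):

```python
import math
import numpy as np
from scipy import stats

diffs = np.array([10, 6, 3, 23, 18, 17, 0, 4, 21, 10, -2, 10])  # post - pre weights
mean_d = diffs.mean()                                # 10.0
se_d = diffs.std(ddof=1) / math.sqrt(len(diffs))     # estimated SE of the mean difference
t = stats.t.ppf(1 - 0.10 / 2, len(diffs) - 1)        # 90% confidence, df = 11
print(round(mean_d - t * se_d, 2), round(mean_d + t * se_d, 2))  # 5.72 14.28
```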
Estimation in independent samples design
• Estimating the difference between two population means based on two independent samples
  • Confidence interval: (μ_A - μ_B) = (X̄_A - X̄_B) ± (t_crit)(s_(X̄_A - X̄_B)), i.e., difference between sample means ± (critical t)(difference expected by chance)
Estimation in independent samples design
• Dr. Mnemonic develops a new treatment for patients with a memory disorder. He randomly assigns 8 patients to one of two samples. He then gives one sample (A) the new treatment but not the other (B) and then tests both groups with a memory test. Estimate the population difference between the two groups with 95% confidence.
• Independent samples t-test situation, confidence level 95%: CI(95%) = -8.73 to 19.73
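The raw memory scores aren't reproduced on this slide, so the following is only a sketch of the pooled-variance procedure with made-up placeholder data (every number below is hypothetical, not Dr. Mnemonic's):

```python
import math
import numpy as np
from scipy import stats

def independent_t_interval(a, b, confidence=0.95):
    """CI for mu_A - mu_B using the pooled-variance standard error."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    t = stats.t.ppf(1 - (1 - confidence) / 2, n1 + n2 - 2)
    diff = a.mean() - b.mean()
    return diff - t * se, diff + t * se

# Hypothetical memory-test scores for groups A and B
print(independent_t_interval([45, 55, 40, 60], [38, 42, 35, 41]))
```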
Relating estimates to hypothesis tests
• CI(95%) = -8.73 to 19.73. Notice that this interval includes zero.
• If we had instead done a hypothesis test with α = 0.05, what would you expect our conclusion to be?
  • H0: "there is no difference between the groups"
  • Because zero falls inside the interval: fail to reject H0
Estimation Summary
• Design / Estimation formula / (Estimated) standard error:
  • One sample, σ known: μ = X̄ ± (z_crit)(σ_X̄), with σ_X̄ = σ / √n
  • One sample, σ unknown: μ = X̄ ± (t_crit)(s_X̄), with s_X̄ = s / √n
  • Two related samples, σ unknown: μ_D = D̄ ± (t_crit)(s_D̄), with s_D̄ = s_D / √n
  • Two independent samples, σ unknown: (μ_A - μ_B) = (X̄_A - X̄_B) ± (t_crit)(s_(X̄_A - X̄_B)), with s_(X̄_A - X̄_B) = √(s_p²/n_A + s_p²/n_B)
• Things to note:
  • The design drives the formula used
  • The standard error differs depending on the design (kind of comparison)
  • Sample size plays a role in the SE formula and in the df of the critical value
  • Level of confidence comes in with the critical value used
Error bars
• Two types typically:
  • Standard Error (SE): the difference expected by chance
  • Confidence Intervals (CI): a range of plausible estimates of the population mean
    CI: μ = X̄ ± (t_crit)(difference expected by chance)
• Note: Make sure that you label your graphs, so the reader knows what your error bars represent (see the sketch below)
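One way to draw labeled CI error bars (a sketch assuming matplotlib; the group means and interval widths here are hypothetical, chosen only to illustrate the labeling advice above):

```python
import matplotlib.pyplot as plt

groups = ["Treatment", "Control"]
means = [30.5, 24.0]            # hypothetical group means (cm)
ci_half_widths = [11.5, 10.0]   # hypothetical 95% CI margins of error

plt.bar(groups, means, yerr=ci_half_widths, capsize=8)
plt.ylabel("Mean length (cm)")
plt.title("Group means; error bars show 95% CIs")  # say what the bars represent
plt.show()
```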
Error Bars: Reporting CIs
• Important point! CIs can be reported:
  • In text (APA style), for example: M = 30.5 cm, 95% CI [18.0, 42.0]
  • In graphs, as error bars
  • In tables (see more examples in the APA manual)
Error Bars: Reporting CIs. Group Differences: 95% CI Rule of Thumb
• Error bars can be informative about group differences, but you have to know what to look for
• Rule of thumb for 95% CIs*:
  • If the overlap is about half of one one-sided error bar, the difference is significant at roughly p < .05
  • If the error bars just abut, the difference is significant at roughly p < .01
• *Works if n > 10 and the error bars don't differ by more than a factor of 2 (Cumming & Finch, 2005)
Hypothesis testing with CIs
• If we had instead done a hypothesis test on 2 independent samples with α = 0.05, what would you expect our conclusion to be? H0: "there is no difference between the groups"
• MD = 2.23, 95% CI [-1.4, 5.9]: the interval includes 0, so fail to reject H0
  • Matching test result: MD = 2.23, t(34) = 1.25, p = 0.22
• MD = 3.61, 95% CI [0.6, 6.6]: the interval excludes 0, so reject H0
  • Matching test result: MD = 3.61, t(42) = 2.43, p = 0.02
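The decision rule being illustrated amounts to checking whether the null value falls inside the interval. A sketch using the two intervals on this slide:

```python
def reject_h0(ci_low, ci_high, null_value=0.0):
    """Reject H0 (at the alpha matching the CI) if the null value lies outside the CI."""
    return not (ci_low <= null_value <= ci_high)

print(reject_h0(-1.4, 5.9))  # False -> fail to reject H0 (p = 0.22)
print(reject_h0(0.6, 6.6))   # True  -> reject H0 (p = 0.02)
```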
Estimation: Why? Because CIs are more informative than p-values
• Hypothesis testing & p-values
  • Dichotomous thinking: Yes/No reject H0 (remember H0 is "no effect"), the Neyman-Pearson approach
  • Strength of evidence, the Fisher approach
• Confidence Intervals
  • Give plausible estimates of the population parameter (values outside the interval are implausible)
  • Provide information about both level and variability
  • Wide intervals can indicate low power
  • Good for emphasizing comparisons across studies (e.g., meta-analytic thinking)
  • Can also be used for Yes/No reject-H0 decisions
In labs
• Practice computing and interpreting confidence intervals
• Understanding CI: https://www.youtube.com/watch?v=tFWsuO9f74o
• Calculating CI: https://www.youtube.com/watch?v=s4SRdaTycaw
• Khan Academy:
  • CI and sample size: https://www.youtube.com/watch?v=K4KDLWENXm0
  • CI and t-test: https://www.youtube.com/watch?v=hV4pdjHCKuA
  • CI for Ind Samp: https://www.youtube.com/watch?v=hxZ6uooEJOk (pt 2)
  • CI and margin of error: https://www.youtube.com/watch?v=UogOJHgJDqs
  • HT and CI: https://www.youtube.com/watch?v=k1at8VukIbw
  • HT vs. CI rap: https://www.youtube.com/watch?v=C88fUKAHPn0
• CIs by Geoff Cumming:
  • Introduction to: https://www.youtube.com/watch?v=OK6DXfXv8BM
  • Workshop (6 part series)