Everything You Ever Wanted To Know About Statistics Chapter 2
Aims and Objectives • Know what a statistical model is and why we use them. • The Mean • Know what the ‘fit’ of a model is and why it is important. • The Standard Deviation • Distinguish models for samples and populations
Populations and Samples • Population • The collection of units (e.g., people, plants, widgets) to which we want to generalize a set of findings or a statistical model. • Our theories generate predictions about ___? • Sample • A smaller (hopefully representative) collection of units from a population, used to make inferences about that population.
A Simple Statistical Model • In statistics we fit models to our data (i.e., we use a statistical model to represent what is happening in the real world). • The mean is a hypothetical value (i.e., it is not necessarily in the data set). • As such, the mean is a simple statistical model.
The Mean • The mean is the sum of all scores divided by the number of scores. • The mean is also the value from which the (squared) scores deviate least (it has the least error).
The Mean: Example • Collect some data (# pets): 1, 3, 4, 3, 2 • Add them up: $\sum x_i = 1 + 3 + 4 + 3 + 2 = 13$ • Divide by the number of scores, n: $\bar{X} = \frac{\sum x_i}{n} = \frac{13}{5} = 2.6$
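The same calculation can be sketched in a few lines of Python. This is only an illustration of the slide's arithmetic; the variable names (`pets`, `total`, `mean_pets`) are made up for the example.

```python
# Number of pets reported by five people (data from the slide)
pets = [1, 3, 4, 3, 2]

total = sum(pets)        # add the scores up
n = len(pets)            # number of scores
mean_pets = total / n    # the mean is the sum divided by n

print(total, n, mean_pets)
```

Note that the resulting mean is not a value anyone in the data set actually reported, which is the sense in which the mean is a hypothetical value.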
The mean as a model If someone has one pet, we have error
Measuring the ‘Fit’ of the Model • The mean is a model of what happens in the real world: the typical score • It is not a perfect representation of the data • How can we assess how well the mean represents reality?
A Perfect Fit • [Figure: rating (out of 5) plotted for each rater]
Calculating Variability or ‘Error’ • The extent to which observations are clustered or spread out about the mean. • A deviation is the difference between the mean and an actual data point. • Deviations can be calculated by taking each score and subtracting the mean from it: $\text{deviation}_i = x_i - \bar{X}$
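As a rough sketch, the deviations for the pets data from the earlier example can be computed like this (the names `pets`, `mean_pets` and `deviations` are illustrative only):

```python
pets = [1, 3, 4, 3, 2]                 # pets data from the earlier slide
mean_pets = sum(pets) / len(pets)      # 2.6

# One deviation per score: the score minus the mean
deviations = [x - mean_pets for x in pets]

# Rounded for display, because floating-point subtraction is not exact
print([round(d, 1) for d in deviations])
```

The person with one pet has a deviation of about -1.6: the model (the mean) overestimates their score.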
Use the Total Error? • We could just take the error between the mean and the data and add them.
Sum of Squared Errors • We could add the deviations to find out the total error. • Deviations cancel out because some are positive and others negative. • Therefore, we square each deviation. • If we add these squared deviations we get the Sum of Squared Errors (SS).
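A minimal sketch of both ideas, using the pets data from the earlier example: the raw deviations cancel out, while the squared deviations do not. The helper `sum_of_squares` is a name invented for this illustration. The last loop also checks, for two arbitrary alternative values, the earlier claim that the mean has the least squared error.

```python
def sum_of_squares(scores, centre):
    """Sum of squared deviations of the scores about `centre`."""
    return sum((x - centre) ** 2 for x in scores)

pets = [1, 3, 4, 3, 2]
mean_pets = sum(pets) / len(pets)

# Positive and negative deviations cancel, so the total is (numerically) zero
total_error = sum(x - mean_pets for x in pets)

# Squaring removes the signs, so nothing cancels
ss_mean = sum_of_squares(pets, mean_pets)

print(abs(total_error) < 1e-9)
print(round(ss_mean, 2))

# Other candidate 'models' (2 pets, 3 pets) fit worse than the mean
for guess in (2, 3):
    print(guess, sum_of_squares(pets, guess))
```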
Variance • The sum of squares is a good measure of overall variability, but it depends on the number of scores. • We calculate the average variability by dividing by the number of scores (n for a population, n − 1 when estimating from a sample). • This value is called the variance ($s^2$). • Ian will now show you a user-friendly calculation. • Population vs. Sample
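The population vs. sample distinction can be sketched as follows, assuming the usual convention that a population variance divides the sum of squares by n while a sample-based estimate divides by n − 1. The standard-library functions `statistics.pvariance` and `statistics.variance` are used only as a cross-check; the other names are illustrative.

```python
from statistics import pvariance, variance

pets = [1, 3, 4, 3, 2]
n = len(pets)
mean_pets = sum(pets) / n
ss = sum((x - mean_pets) ** 2 for x in pets)   # sum of squared errors

pop_var = ss / n            # treat the five scores as the whole population
sample_var = ss / (n - 1)   # treat them as a sample: n - 1 degrees of freedom

print(round(pop_var, 2), round(pvariance(pets), 2))
print(round(sample_var, 2), round(variance(pets), 2))
```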
Degrees of Freedom • [Figure: Sample and Population panels; values shown: 11, 8, 7, 12, 9, 8, 15, ?]
Standard Deviation • The variance has one problem: it is measured in units squared. • This isn’t a very meaningful metric, so we take the square root. • This is the standard deviation ($s$).
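Continuing the pets example, a short sketch of the square-root step. It assumes the sample (n − 1) form of the variance; `statistics.stdev` is the standard-library equivalent and is used here as a check.

```python
from math import sqrt
from statistics import stdev

pets = [1, 3, 4, 3, 2]
n = len(pets)
mean_pets = sum(pets) / n

sample_var = sum((x - mean_pets) ** 2 for x in pets) / (n - 1)  # in pets squared
sd = sqrt(sample_var)                                           # back in pets

print(round(sd, 3), round(stdev(pets), 3))
```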
Important Things to Remember • The Sum of Squares, Variance, and Standard Deviation represent the same thing: • The ‘Fit’ of the mean to the data • The variability in the data • How well the mean represents the observed data • Error
Going beyond the data: Z-scores • Z-scores • Standardising a score with respect to the other scores in the group. • Expresses a score in terms of how many standard deviations it is away from the mean: $z = (X - \bar{X})/s$ • The distribution of z-scores has a mean of 0 and an SD of 1.
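A small sketch of standardising, again using the pets data from the earlier example. The function name `z_scores` is invented for this illustration, and it assumes the sample standard deviation is used.

```python
from statistics import mean, stdev

def z_scores(scores):
    """Express each score as a number of standard deviations from the mean."""
    m = mean(scores)
    s = stdev(scores)
    return [(x - m) / s for x in scores]

pets = [1, 3, 4, 3, 2]
z = z_scores(pets)

print([round(v, 2) for v in z])
# The standardised scores have mean 0 and SD 1 (up to rounding error)
print(abs(mean(z)) < 1e-9, abs(stdev(z) - 1) < 1e-9)
```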
Properties of z-scores 1.96 cuts off the top 2.5% of the distribution. −1.96 cuts off the bottom 2.5% of the distribution. As such, 95% of z-scores lie between −1.96 and 1.96. 99% of z-scores lie between −2.58 and 2.58, 99.9% of them lie between −3.29 and 3.29.
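These cut-offs can be checked against the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; `central_area` is a helper name made up for the example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def central_area(z):
    """Proportion of a standard normal distribution lying between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

# Area above 1.96 (the top tail)
top_tail = 1 - std_normal.cdf(1.96)
print(round(top_tail, 3))

for z in (1.96, 2.58, 3.29):
    print(z, round(central_area(z), 3))
```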
Let’s ease our way into HYPOTHESIS TESTING • If we know everything about a population, and there is no measurement error, we do not really need to test a hypothesis • we just have to look to see if it is true. • But populations are often large, and we do not know everything there is to know. • So, we collect samples from a population to make inferences about the population and test hypotheses.
Samples Vs. Populations • Sample • Mean and SD describe only the sample from which they were calculated. • Population • Mean and SD are intended to describe the entire population (very rare in psychology). • Sample to Population: • Mean and SD are obtained from a sample, but are used to estimate the mean and SD of the population (very common in psychology).
Sampling Variation Is this odd? What is the probability of this?
Central Limit Theorem • For any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for samples of size n will have mean $\mu$ and a standard deviation (aka the standard error) of $\sigma/\sqrt{n}$… • …and will approach a normal distribution as n approaches infinity. • Why is this important?
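A simulation can make the theorem concrete. The sketch below is an illustration under made-up assumptions: the population is exponential (clearly not normal) with mean 1 and SD 1, the sample size is 30, and 5000 samples are drawn. None of these numbers come from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
mu, sigma = 1.0, 1.0     # an exponential(1) population has mean 1 and SD 1
n = 30                   # size of each sample (assumed)
n_samples = 5000         # number of samples drawn (assumed)

# Draw many samples and keep only each sample's mean
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(n_samples)
]

predicted_se = sigma / sqrt(n)
print(round(mean(sample_means), 3), mu)                # close to the population mean
print(round(stdev(sample_means), 3), round(predicted_se, 3))  # close to sigma / sqrt(n)
```

Even though individual scores are heavily skewed, the sample means cluster around the population mean with a spread close to sigma divided by the square root of n.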
[Figure: population mean = 10; sample means M = 10, 9, 11, 10, 9, 8, 12, 11, 10] • Standard Error = the SD of the sampling distribution of sample means • n is taken into account: large n, small n? Same mean.
Hypothesis testing • The experiment and question • Does Ritalin increase performance on a “SPAN” measure among children diagnosed with ADHD? • Pretend we know • the population mean (4.2) and SD (5.1) • We randomly select 30 ADHD children • Eventually, we give them Ritalin for a week, then measure their X-SPAN. They have a mean of 7.6.
Types of Hypotheses • Null hypothesis, H0 • There is no effect. • The alternative hypothesis, H1 • AKA the experimental hypothesis
Types of Predictions • When we predict the sample mean will fall in a particular direction from the population mean (e.g., that it will be higher than the mean), we are conducting a directional test. • We also call this a one-tailed test, as I will discuss in a minute.
Types of Predictions • When we predict the sample mean will differ from the population mean in either direction (above or below), we are conducting a non-directional test. • We also call this a two-tailed test, as I will discuss in a minute.
Set a criterion level for our Decision: • How do we decide whether our sample mean is “different enough” from our original population mean? • How far away does the mean have to be for us to reasonably doubt that this sample came from the same population? • When are we going to say this sample is the same as the population (just sampling error), and when are we going to say it is different from the population?
α • We must establish a decision criterion, which we call ALPHA or α • Alpha is also known as the significance level… • Significance level – Predetermined probability that represents a sample result so rare or unusual that it casts doubt on the accuracy of Ho; this probability is alpha • The probability with which we are willing to reject Ho when it is true.
α • Rejection (or critical) region: the set of outcomes from an experiment that will lead us to reject Ho (conclude it is false). • Typically, choose: • α = .05 (or 5%) OR • α = .01 (or 1%)
Critical Value • Critical Value – Value of a test statistic that is the boundary separating the critical region from the rest of the distribution. • We are going to need to convert our sample mean into a z-score. • We want to know if our z-score for the sample mean is above or below our critical value. • What is our critical value? “Z critical” is the z-score with .05 in the tail proportion. What is it? • Zcritical = ??? • What would it be if we conducted a 2-tailed test (i.e., a non-directional test)? • Zcritical = ???
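One way to look up critical values without a printed table is the inverse CDF of the standard normal distribution. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library and assumes α = .05; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # mean 0, SD 1

# One-tailed (directional): all of alpha sits in one tail
z_crit_one_tailed = std_normal.inv_cdf(1 - alpha)

# Two-tailed (non-directional): alpha is split, alpha / 2 in each tail
z_crit_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

print(round(z_crit_one_tailed, 3))
print(round(z_crit_two_tailed, 3))
```

The two-tailed boundary is further from zero because each tail holds only half of α.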
What gets rejected? • Depends on whether we have a one-tailed or two-tailed hypothesis • One-tailed (directional test) – Test that rejects extreme outcomes in only one specified tail of the distribution; .05 in one end • Two-tailed – Test that rejects extreme outcomes in either tail of the distribution; .025 at each end.
Writing 1- and 2-tailed Hypotheses • Two tail (non-directional): • Ho: there is no difference, µtreatment = µpop • Ha: there is a difference, µtreatment ≠ µpop • One tail (directional): • Ho: µtreatment ≤ µpop OR µtreatment ≥ µpop • Ha: µtreatment > µpop OR µtreatment < µpop • (respectively)
Collect our data • Random sample of MSU students (remember, we are pretending you were randomly selected to take this course). • Randomly assign to condition… this class was assigned to the treatment, although other classes could have been. • µADHD = 4.2 • σ = 5.1; n = 30 • µtreatment = 7.6 • We need to convert the treatment mean to a Z • Z = (7.6 – 4.2) / (5.1/√30) = 3.65
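The conversion to Z can be sketched step by step in Python using the numbers from this slide. The comparison against a one-tailed critical value at α = .05 and the p-value are additions for illustration; the variable names are made up.

```python
from math import sqrt
from statistics import NormalDist

mu = 4.2            # population mean
sigma = 5.1         # population SD
n = 30              # sample size
sample_mean = 7.6   # mean of the treated sample

se = sigma / sqrt(n)            # standard error of the mean
z = (sample_mean - mu) / se     # how many standard errors above mu

z_crit = NormalDist().inv_cdf(0.95)    # one-tailed cut-off, alpha = .05 (assumed)
p_one_tailed = 1 - NormalDist().cdf(z)

print(round(se, 3), round(z, 2))
print(z > z_crit, p_one_tailed < 0.05)
```

The standard error comes out at roughly 0.93 and Z at roughly 3.65, well beyond the one-tailed critical value.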
Evaluate the null hypothesis • Two decisions possible: • Reject the null hypothesis (class GPA is higher than MSU GPA) • When the mean falls into the rejection region • Fail to reject the null hypothesis (class GPA is equal to or less than the MSU GPA) • Did the Z for the sample mean fall above or below our critical value?
Parameter Estimation • Point estimation • Interval estimation • Confidence/Precision tradeoff • Confidence Intervals • Interval with a given probability of containing the true (hypothetical) population mean. • Equation…whiteboard time
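The slide leaves the equation for the whiteboard. As one possible sketch, assuming the population SD is known, a z-based interval is the sample mean plus or minus a critical z times the standard error. The function name `z_confidence_interval` is hypothetical, and the inputs reuse the numbers from the hypothesis-testing example (mean 7.6, σ = 5.1, n = 30).

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval estimate for a population mean when sigma is known."""
    se = sigma / sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return sample_mean - z * se, sample_mean + z * se

low95, high95 = z_confidence_interval(7.6, 5.1, 30, confidence=0.95)
low99, high99 = z_confidence_interval(7.6, 5.1, 30, confidence=0.99)

print(round(low95, 2), round(high95, 2))
print(round(low99, 2), round(high99, 2))
```

The 99% interval is wider than the 95% one: more confidence costs precision, which is the tradeoff named on the slide.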
Confidence Intervals • Domjan et al. (1998) • ‘Conditioned’ sperm release in Japanese Quail. • True Mean • 15 million sperm • Sample Mean • 17 million sperm • Interval estimate • 12 to 22 million (contains true value) • 16 to 18 million (misses true value) • CIs constructed such that 95% contain the true value.
Test Statistics • A statistic for which the frequency of particular values is known. • Observed values can be used to test hypotheses.
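The last bullet can be illustrated by simulation: build many 95% intervals from repeated samples and count how often they contain the true mean. Only the true mean of 15 comes from the slide; the SD of 5, the sample size of 25, the normal population and the number of trials are made-up values for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(1)      # fixed seed so the run is repeatable
true_mean = 15.0            # from the slide (millions)
sigma, n = 5.0, 25          # assumed for illustration only
trials = 2000               # number of simulated studies (assumed)

z = NormalDist().inv_cdf(0.975)   # critical z for a 95% interval
se = sigma / sqrt(n)

hits = 0
for _ in range(trials):
    sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
    # Does this study's interval contain the true mean?
    if sample_mean - z * se <= true_mean <= sample_mean + z * se:
        hits += 1

coverage = hits / trials
print(round(coverage, 3))   # should land near 0.95
```

Any single interval either contains the true value or misses it; the 95% describes the long-run proportion of intervals that contain it.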
Type I and Type II Errors • Type I error • occurs when we believe that there is a genuine effect in our population, when in fact there isn’t. • The probability is the α-level (usually .05) • Type II error • occurs when we believe that there is no effect in the population when, in reality, there is. • The probability is the β-level (often .2)
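The link between α and the Type I error rate can also be checked by simulation. In this sketch the null hypothesis is true by construction: every sample is drawn from a normal population with the mean (4.2) and SD (5.1) used in the earlier example. The normality assumption, the number of trials and the variable names are additions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

rng = random.Random(2)          # fixed seed so the run is repeatable
mu, sigma, n = 4.2, 5.1, 30     # null-hypothesis population and sample size
alpha = 0.05
trials = 4000                   # number of simulated experiments (assumed)

z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed critical value
se = sigma / sqrt(n)

false_alarms = 0
for _ in range(trials):
    sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    z = (sample_mean - mu) / se
    if z > z_crit:              # we 'find' an effect that is not there
        false_alarms += 1

type_i_rate = false_alarms / trials
print(round(type_i_rate, 3))    # should land near alpha
```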
What does Statistical Significance Tell Us? • The importance of an effect? • No, significance depends on sample size. • That the null hypothesis is false? • No, it is always false. • That the null hypothesis is true? • No, it is never true.