Chapter 11 The Chi-Square Distribution
Chapter 11 Objectives The student will be able to • Perform a Goodness of Fit hypothesis test • Perform a Test of Independence hypothesis test
Chi-square distribution • The chi-square distribution is used for test statistics that answer three questions • Does our data fit a certain distribution? Goodness-of-fit test • Are two factors independent? Test of independence • Does our variance change? Test of a single variance
Chi-square distribution • Notation • new random variable Χ² ~ χ²df • µ = df and σ² = 2df (so σ = √(2df)) • Facts about chi-square • Nonsymmetrical and skewed right • The value of Χ² is always ≥ zero • The curve looks different for each number of degrees of freedom; as df gets larger the curve approaches the normal distribution (roughly when df > 90) • The mean is located to the right of the peak
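These facts can be checked numerically. Below is a minimal sketch in Python (not part of the slides) using scipy; the df values are arbitrary examples.

```python
# Minimal sketch (assumed values): verify mean = df, sd = sqrt(2*df),
# and the normal approximation for large df.
import numpy as np
from scipy.stats import chi2, norm

df = 10                                  # example degrees of freedom
print(chi2.mean(df))                     # -> 10.0  (mean = df)
print(chi2.std(df), np.sqrt(2 * df))     # -> ~4.472 for both (sd = sqrt(2*df))

# For large df (df > 90) the chi-square curve is close to a normal curve
# with the same mean and standard deviation:
big_df = 120
print(chi2.cdf(big_df, big_df))                                  # ~0.51
print(norm.cdf(big_df, loc=big_df, scale=np.sqrt(2 * big_df)))   # 0.50
```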
Goodness-of-fit • Hypothesis test steps are the same as always with the following changes • Test is always a right-tailed test • Null and alternate hypotheses are in words rather than equations • degrees of freedom = number of intervals - 1 • test statistic defined as Χ² = Σ (O - E)² / E, where O is the observed count and E is the expected count in each interval (see the sketch below)
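A minimal sketch of that statistic in Python; the observed counts here are made up, not the ones from the course example.

```python
# Goodness-of-fit sketch with hypothetical counts (uniform/fair-die expectation).
import numpy as np
from scipy.stats import chi2

observed = np.array([18, 22, 25, 15, 20, 20])    # hypothetical roll counts
expected = np.full(6, observed.sum() / 6)        # 20 expected in each interval

test_stat = np.sum((observed - expected) ** 2 / expected)    # sum of (O - E)^2 / E
df = len(observed) - 1                                       # number of intervals - 1
p_value = chi2.sf(test_stat, df)                             # right-tailed area
print(test_stat, df, p_value)
```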
Goodness-of-fit: An example A 6-sided die is rolled 120 times. The results are in the table below. Conduct a hypothesis test to determine if the die is fair.
Goodness-of-fit: An example • Contradictory hypotheses • Ho: observed data fits a uniform distribution (die is fair) • Ha: observed data does not fit a uniform distribution (die is not fair) • Determine distribution • chi-square goodness-of-fit • right-tailed test • Perform calculations to find the p-value • enter observed counts into L1 • enter expected counts into L2
Goodness-of-fit: An example • Perform calculations (cont.) • TI-83 • Access LIST, MATH, SUM • enter sum((L1 - L2)²/L2) • this is the test statistic • For our problem chi-square = 13.6 • Access DISTR and χ²cdf • syntax is χ²cdf(test statistic, 1E99, df) • generate the p-value • For our problem p-value = 0.0184 • Make decision • since α > p-value = 0.0184, reject the null hypothesis • Concluding statement • There is sufficient evidence to conclude that the observed data does not fit a uniform distribution. (The die is not fair.)
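As a cross-check (a sketch, not part of the course materials), the reported p-value can be reproduced from the test statistic and degrees of freedom:

```python
from scipy.stats import chi2

test_stat = 13.6                 # chi-square statistic from the die example
df = 6 - 1                       # six faces -> 5 degrees of freedom
print(chi2.sf(test_stat, df))    # right-tail area, approximately 0.0184
```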
Test of Independence • Hypothesis testing steps are the same with the following edits • Null and alternate hypotheses are in words • we have a contingency table • expected values are calculated from the table: expected value = (row total)(column total) / (total number surveyed) (see the sketch below) • Test statistic is the same • df = (number of columns - 1)(number of rows - 1) • always a right-tailed test
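A minimal sketch of that expected-value formula, using a made-up 2x3 contingency table (not the course example):

```python
# Expected counts under independence: (row total)(column total) / n
import numpy as np

observed = np.array([[25, 30, 45],      # hypothetical contingency table
                     [15, 20, 25]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 3)
n = observed.sum()

expected = row_totals @ col_totals / n             # shape (2, 3)
df = (observed.shape[1] - 1) * (observed.shape[0] - 1)
print(expected)
print(df)
```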
Test of Independence: An example • Conduct a hypothesis test to determine whether there is a relationship between an employee's performance in a company's training program and his/her ultimate success on the job. Use a level of significance of 1%. • Ho: Performance in training and success on the job are independent • Ha: Performance in training and success on the job are not independent (i.e., dependent).
Test of Independence: An example • Contingency table: performance on the job versus performance in training (the data table appears on the slide)
Test of Independence: An example • Determine distribution • right-tailed • chi-square • Perform calculations to find the p-value • The calculator will calculate the expected values. We must enter the contingency table as a Matrix (ack!) • Access MATRIX and edit Matrix A • Access the chi-square test • Matrix A = observed • Matrix B: the calculator places the expected values here
Test of Independence: An example • Perform calculations (cont.) • p-value = 0.0005 • Make decision • α = 0.01 > p-value = 0.0005 • reject the null hypothesis • Concluding statement • Performance in training and job success are dependent.
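The same workflow (expected values, test statistic, p-value) can be sketched with scipy's chi2_contingency. The table below is the same hypothetical one used earlier, since the slide's data table is not reproduced here, so the p-value will not match the 0.0005 above.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[25, 30, 45],      # hypothetical table (not the slide's data)
                     [15, 20, 25]])

chi2_stat, p_value, df, expected = chi2_contingency(observed)
print(chi2_stat, p_value, df)   # reject Ho when p_value < alpha (alpha = 0.01 here)
print(expected)                 # the values the calculator stores in Matrix B
```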
Chapter 12 Linear Regression and Correlation
Chapter 12 Objectives The student should be able to: • Discuss basic ideas of linear regression and correlation. • Create and interpret a line of best fit. • Calculate and interpret the correlation coefficient. • Find outliers.
Linear Regression • Method for finding the “best fit” line through a scatterplot of paired data • independent variable (x) versus dependent variable (y) • Recall from Algebra • equation of line y = a + bx • where a is the y-intercept • b is the slope of the line • if b>0, slope upward to right • if b<0, slope downward to right • if b=0, line is horizontal
Linear Regression • The eye-ball method • Draw what looks to you to be the best straight-line fit • Pick two points on the line and find the equation of the line • The calculated method • using calculus, we find the line that minimizes the sum of the squared vertical distances from each point to the line (the least-squares line of best fit) • letting the calculator do the work using LinRegTTest (a sketch follows)
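A minimal sketch of the calculated method in Python with scipy's linregress; the (x, y) pairs are made up, and the slides use the TI-83's LinRegTTest for the same job.

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical paired data: independent x, dependent y
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

fit = linregress(x, y)                     # least-squares line of best fit
a, b = fit.intercept, fit.slope            # y-hat = a + b*x
print(f"y-hat = {a:.3f} + {b:.3f}x")
print("r =", fit.rvalue)                   # sample correlation coefficient

# Predictions are only meaningful for x inside the data's domain (1 to 8 here)
print("predicted y at x = 5:", a + b * 5)
```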
The Correlation Coefficient Used to determine if the regression line is a "good fit" • ρ is the population correlation coefficient • r is the sample correlation coefficient • Formidable equation • see text • the calculator does the work • r positive - slopes upward to the right • r negative - slopes downward to the right • r zero - no correlation
The Correlation Coefficient Determining if there is a "good fit" • Gut method • if calculated r is close to 1 or -1, there's a good fit • Hypothesis test (LinRegTTest) • Ho: ρ = 0, Ha: ρ ≠ 0 • Ho means there IS NOT a significant linear relationship (correlation) between x and y in the population • Ha means there IS a significant linear relationship (correlation) between x and y in the population • To reject Ho means that there is a linear relationship between x and y in the population • Does not mean that one CAUSES the other • Comparison to critical value • Use the table at the end of the chapter • Determine degrees of freedom df = n - 2 • If r < negative critical value, then r is significant and we have a good fit • If r > positive critical value, then r is significant and we have a good fit
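A sketch of the hypothesis-test approach in Python with the same hypothetical data as before; linregress reports a two-sided p-value for Ho: ρ = 0, which plays the role of the calculator's LinRegTTest output.

```python
import numpy as np
from scipy.stats import linregress

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])                      # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

fit = linregress(x, y)
alpha = 0.05
df = len(x) - 2                   # degrees of freedom for the critical-value table

print("r =", fit.rvalue, " p-value =", fit.pvalue, " df =", df)
if fit.pvalue < alpha:
    print("reject Ho: significant linear relationship; the line is a good fit")
else:
    print("do not reject Ho: no significant linear relationship")
```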
The Regression line as a predictor • If the line is determined to be a good fit, the equation can be used to predict y values from x values (or x from y) • Plug the numbers into the equation • The equation is only valid for x values within the DOMAIN of the paired data
The Issue of Outliers Compare 1.9s to |y - yhat| for each (x, y) pair • if |y - yhat| > 1.9s, the point could be an outlier • LinRegTTest gives us s • y - yhat is put into the RESID list when the LinRegTTest is done • To see the RESID list: in the calculator type 2nd, LIST, RESID (found under NAMES), 2nd, STO>, L3 • (a sketch follows)
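A minimal sketch of the 1.9s rule in Python with hypothetical data; s is taken here as the standard deviation of the residuals, sqrt(SSE/(n - 2)).

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical data with one suspicious point at x = 6
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 18.5, 13.8, 16.1])

fit = linregress(x, y)
y_hat = fit.intercept + fit.slope * x
resid = y - y_hat                              # the calculator's RESID list

n = len(x)
s = np.sqrt(np.sum(resid ** 2) / (n - 2))      # standard deviation of the residuals
flagged = np.abs(resid) > 1.9 * s              # the 1.9s rule
print(list(zip(x[flagged], y[flagged])))       # the point (6, 18.5) gets flagged
```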
Chapter 13 F Distribution and ANOVA
Chapter 13 Objectives The student should be able to: • Interpret the F distribution as the number of groups and the sample size change. • Discuss two uses for the F distribution and ANOVA. • Conduct and interpret ANOVA
Single Factor Analysis of Variance (ANOVA) • What is it good for? • Determines the existence of statistically significant differences among several group means. • Basic assumptions • Each population from which a sample is taken is assumed to be normal. • Each sample is randomly selected and independent. • The populations are assumed to have equal standard deviations (or variances). • The factor is the categorical variable. • The response is the numerical variable. • The hypotheses • Ho: µ1 = µ2 = µ3 = … = µk • Ha: At least two of the group means are not equal • Always a right-tailed test
F Distribution • Named after Sir Ronald Fisher • F statistic is a ratio (i.e. a fraction) • two sets of degrees of freedom (numerator and denominator) • F ~ F(df(num), df(denom)) • Two estimates of variance are made • Variation between samples • Estimate of σ² based on the variance of the sample means • Variation due to treatment (i.e. explained variation) • Variation within samples • Estimate of σ² that is the average of the sample variances • Variation due to error (i.e. unexplained variation)
F Distribution Facts • Curve is skewed right. • Different curve for each set of degrees of freedom. • As the dfs for numerator and denominator get larger, the curve approximates the normal distribution • F statistic is greater than or equal to zero • Other uses • Comparing two variances • Two-Way Analysis of Variance
The F Statistic • Formula: F = MSbetween / MSwithin, where MSbetween = SSbetween / (k - 1) and MSwithin = SSwithin / (N - k) • MSbetween – mean square explained by the different groups • MSwithin – mean square that is due to chance • SSbetween – sum of squares that represents the variation among the different samples • SSwithin – sum of squares that represents the variation within samples that is due to chance • (a sketch follows)
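A minimal sketch of those pieces in Python, computed by hand for three made-up groups (not the sorority data):

```python
import numpy as np

# Hypothetical data for k = 3 groups
groups = [np.array([4.2, 3.8, 4.5, 4.0]),
          np.array([3.2, 3.5, 3.0, 3.6]),
          np.array([4.8, 4.6, 5.0, 4.7])]

k = len(groups)                                   # number of groups
N = sum(len(g) for g in groups)                   # total number of observations
grand_mean = np.concatenate(groups).mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)                 # df(num) = k - 1
ms_within = ss_within / (N - k)                   # df(denom) = N - k
print("F =", ms_between / ms_within)
```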
Thank goodness for our calculator!!! • Enter the table data by columns into L1, L2, L3, … • Do the ANOVA test – ANOVA(L1, L2, …) • What the calculator gives • F – the F statistic • p – the p-value • Factor – the "between" stuff • df = # groups – 1 = k – 1 • SSbetween • MSbetween • Error – the "within" stuff • df = total number of samples – # of groups = N – k • SSwithin • MSwithin
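The same F and p-value the calculator's ANOVA( command reports can be sketched with scipy's f_oneway, using the hypothetical groups from the previous sketch.

```python
from scipy.stats import f_oneway

# Hypothetical data for three groups (the calculator's L1, L2, L3)
L1 = [4.2, 3.8, 4.5, 4.0]
L2 = [3.2, 3.5, 3.0, 3.6]
L3 = [4.8, 4.6, 5.0, 4.7]

F, p_value = f_oneway(L1, L2, L3)
print(F, p_value)    # reject Ho (equal group means) when p_value < alpha
```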
An Example Four sororities took a random sample of sisters regarding their grade averages for the past term. The results are shown below: Using a significance level of 1%, is there a difference in grade averages among the sororities?
Review for Final Exam • What’s fair game • Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6, Chapter 7, Chapter 8, Chapter 9, Chapter 10, Chapter 11, Chapter 12 • 42 multiple choice questions • Do problems from each chapter • What to bring with you • Scantron (#2052), pencil, eraser, calculator, 2 sheets of notes (8.5 x 11 inches, both sides)
And so ends your Math 10 experience • Prepare for the Final exam • It has been a pleasure having you in class. Good luck and Godspeed with whatever path you take in life.