Hypothesis Testing

Introduction to Study Skills & Research Methods (HL10040), Dr James Betts

Lecture Outline: • What is Hypothesis Testing? • Hypothesis Formulation • Statistical Errors • Effect of Study Design • Test Procedures • Test Selection.

Statistics: Descriptive (organising, summarising & describing data); Inferential (generalising); Correlational (relationships); Significance.

Sampling Error: Effective sampling is essential to correctly generalise back to our target population. Statistics: The dependent variable can be generalised from n to N.

What is Hypothesis Testing? Null Hypothesis: A = B. Alternative Hypothesis: A ≠ B. We also need to establish: 1) How unequal are these observations? 2) Are these observations reflective of the general population?

Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? Example Hypotheses: Isometric Torque. Null Hypothesis: ♂ = ♀. Alternative Hypothesis: ♂ ≠ ♀.

Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? Example Hypotheses: Isometric Torque. Null Hypothesis (H0): There is not a significant difference in the DV between males and females. Alternative Hypothesis (HA), or experimental hypothesis (HE): There is a significant difference in the DV between males and females. n.b. These are 2-tailed hypotheses, the most common and the more recommended form.

Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? Example Hypotheses: Isometric Torque. Useful analogy: the criminal trial. Imagine you are the prosecutor. H0 = Defendant not guilty; HA = Defendant guilty. Your job is to provide sufficient evidence (i.e. ‘beyond reasonable doubt’) that the defendant is not innocent. Remember: the p-value does NOT tell us the probability they are innocent but rather the probability of finding evidence at least as strong as ours assuming they are innocent.
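
The meaning of the p-value can be illustrated with a small exact permutation test. This is a swapped-in technique, not the t-test used in the rest of the lecture, and the torque times below are hypothetical. The sketch counts how often a difference at least as large as the observed one arises when group labels are assigned arbitrarily, i.e. assuming H0 is true.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Share of all relabellings whose mean difference is at least as
    large (in absolute terms) as the observed one, assuming H0."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Hypothetical sustained torque times (seconds), not the lecture data.
females = [19, 20, 21]
males = [16, 17, 18]
print(permutation_p_value(females, males))  # 0.1
```

Only 2 of the 20 possible relabellings are as extreme as the observed split, so the evidence would arise 10% of the time even if the sexes did not differ.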

Example Hypotheses: Isometric Torque • Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? n.b. This is why effective sampling is so important... [Figure: population (N♀, N♂) and sample (n♀, n♂) distributions; x-axis: Sustained Isometric Torque (seconds), 16 to 20]

Example Hypotheses: Isometric Torque • Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? …poor/insufficient sampling can lead to errors… [Figure: population (N♀, N♂) and sample (n♀, n♂) distributions; x-axis: Sustained Isometric Torque (seconds), 16 to 20]

Statistical Errors. Type 1 Errors: rejecting H0 when it is actually true, i.e. concluding a difference when one does not actually exist. Type 2 Errors: accepting H0 when it is actually false (e.g. previous slide), i.e. concluding no difference when one does exist. Errors can occur due to biased/inadequate sampling, poor experimental design or the use of inappropriate/non-parametric tests.
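
A Type 1 error rate can be sketched by simulation. The example below assumes two samples of n = 25 drawn from the same normal population, so H0 is true by construction. The population mean and SD, the seed and the number of trials are all hypothetical choices. The value 2.011 is the standard two-tailed 0.05 critical t for df = 48.

```python
import math
import random
from statistics import mean, stdev

CRITICAL_T_DF48 = 2.011  # two-tailed, 0.05 level, df = 48


def independent_t(a, b):
    """t statistic from the two standard errors of the mean."""
    sem_a = stdev(a) / math.sqrt(len(a))
    sem_b = stdev(b) / math.sqrt(len(b))
    return (mean(a) - mean(b)) / math.sqrt(sem_a ** 2 + sem_b ** 2)


def type1_error_rate(trials=2000, n=25, seed=1):
    """Fraction of trials that wrongly reject a true H0."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(trials):
        # Both groups come from the SAME hypothetical population.
        a = [rng.gauss(18.0, 1.7) for _ in range(n)]
        b = [rng.gauss(18.0, 1.7) for _ in range(n)]
        if abs(independent_t(a, b)) > CRITICAL_T_DF48:
            false_positives += 1
    return false_positives / trials


print(type1_error_rate())  # should land close to 0.05
```

Even with unbiased sampling, roughly 1 test in 20 reports a difference that does not exist. That is what testing at the 0.05 level means.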

Back to Study Design. Independent Measures: individual scores in each data set are independent of one another. Repeated Measures: individual scores in each data set are dependent/paired/correlated.

Back to Study Design: pre-experimental designs. Independent Measures: individual scores in each data set are independent of one another (2 distinct groups). Repeated Measures: individual scores in each data set are dependent/paired/correlated (same individuals tested twice). [Design diagrams in T/P/O notation; P = placebo]

Back to Study Design: true-experimental design. Independent Measures: individual scores in each data set are independent of one another (random group assignment). Repeated Measures: cross-over design. Which applies depends on how equivalent groups were achieved. [Design diagram: R: O1 T O2; O3 P O4; P = placebo]

Example Hypotheses: Isometric Torque • Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? • So the above example is an independent measures design • Which therefore requires an independent t-test, AKA Student’s (Gosset’s) t-test.

Independent t-test: Calculation. Is this a significant effect? [Figure: sample distributions n♀ and n♂; x-axis: Sustained Isometric Torque (seconds), 16 to 20]

Independent t-test: Calculation Step 1: Calculate the Standard Error for Each Mean. SEM♀ = SD/√n = 1.74/√25 = 1.74/5 = 0.348; SEM♂ = SD/√n = 1.72/√25 = 1.72/5 = 0.344

Independent t-test: Calculation Step 2: Calculate the Standard Error for the difference in means. $\mathrm{SEM}_{diff} = \sqrt{\mathrm{SEM}_{♀}^{2} + \mathrm{SEM}_{♂}^{2}} = \sqrt{0.251} = 0.501$

Independent t-test: Calculation Step 3: Calculate the t statistic t = (Mean♀ - Mean♂) / SEMdiff = 2.00

Independent t-test: Calculation Step 4: Calculate the degrees of freedom (df) df = (n♀ - 1) + (n♂ - 1) = 48

Independent t-test: Calculation Step 5: Determine the critical value for t using a t-distribution table. n.b. Use 0.05 for a 2-tailed test.
Degrees of Freedom    Critical t-ratio
44                    2.015
46                    2.013
48                    2.011
50                    2.009

Independent t-test: Calculation Step 6 (final): Compare t calculated with t critical. Calculated t = 2.00; Critical t = 2.01. Therefore, t calculated < t critical, so the effect is not significant (n.s.).
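
The six steps can be sketched as a short function. This is a minimal illustration that assumes raw scores are available for both groups. The function names are hypothetical and the critical values are copied from the table in Step 5. The lecture's raw data are not reproduced here, so the final call simply reuses the reported t = 2.00 and df = 48.

```python
import math
from statistics import mean, stdev

# Critical t (0.05 level, 2-tailed) as listed in the t-distribution table.
CRITICAL_T_05 = {44: 2.015, 46: 2.013, 48: 2.011, 50: 2.009}


def independent_t_test(a, b):
    """Steps 1 to 4: return the t statistic and degrees of freedom."""
    sem_a = stdev(a) / math.sqrt(len(a))           # Step 1
    sem_b = stdev(b) / math.sqrt(len(b))
    sem_diff = math.sqrt(sem_a ** 2 + sem_b ** 2)  # Step 2
    t = (mean(a) - mean(b)) / sem_diff             # Step 3
    df = (len(a) - 1) + (len(b) - 1)               # Step 4
    return t, df


def is_significant(t, df, table=CRITICAL_T_05):
    """Steps 5 and 6: compare |t| with the tabled critical value."""
    return abs(t) > table[df]


print(is_significant(2.00, 48))  # False, i.e. n.s.
```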

Independent t-test: Calculation Interpretation: P > 0.05. Reject HA & Accept H0. Conclusion: There is not a significant difference in the DV between males and females.

Independent t-test: Calculation Evaluation: The wealth of available literature supports that females can sustain isometric contractions longer than males. This may suggest that the findings of the present study represent a Type 2 error. Possible solution: Increase n.
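
Why increasing n helps can be sketched numerically: with the same mean difference and SDs, a larger n shrinks each standard error and so inflates t. The mean difference, SDs and group sizes below are hypothetical, and equal group sizes are assumed.

```python
import math


def t_from_summary(mean_diff, sd_a, sd_b, n_per_group):
    """t statistic from summary values, assuming equal group sizes."""
    sem_a = sd_a / math.sqrt(n_per_group)
    sem_b = sd_b / math.sqrt(n_per_group)
    return mean_diff / math.sqrt(sem_a ** 2 + sem_b ** 2)


# Same hypothetical effect (1.0 s difference, SD = 2.0 s), two sample sizes.
for n in (16, 64):
    print(n, round(t_from_summary(1.0, 2.0, 2.0, n), 2))
# 16 1.41
# 64 2.83
```

Quadrupling n doubles t here, so an effect that is n.s. in a small sample may reach significance in a larger one.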

Independent t-test: SPSS Output (swim data from SPSS session 8). Calculated t = 2.333 (ignore the sign); df = 18, so critical t = 2.101. 2.333 > 2.101, so P < 0.05.

Repeated Measures Designs. As shown earlier, a repeated measures design implies that data in each data set can be paired or correlated with one another. An independent t-test is inappropriate to analyse such data. Instead, a paired t-test should be used…

Advantages of using Paired Data • Data from independent samples are heavily influenced by variance between subjects, i.e. these data would have a large SD (variance) in an independent t-test simply because some subjects performed better than others. HOWEVER…

Advantages of using Paired Data • Data from independent samples are heavily influenced by variance between subjects… …using the same participants on two occasions allows us to pair up the data… …now we can remove between-subject variance from subsequent analysis.
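
The benefit of pairing can be sketched with hypothetical scores in which participants differ widely from one another but each improves by a similar small amount. The paired statistic below is the mean difference divided by its standard error. All data and function names are illustrative assumptions.

```python
import math
from statistics import mean, stdev

week1 = [10, 20, 30, 40]  # hypothetical scores, occasion 1
week2 = [12, 23, 32, 43]  # same participants, occasion 2
diffs = [b - a for a, b in zip(week1, week2)]


def independent_t(a, b):
    """Treats the two occasions as if they were separate groups."""
    sem_a = stdev(a) / math.sqrt(len(a))
    sem_b = stdev(b) / math.sqrt(len(b))
    return (mean(a) - mean(b)) / math.sqrt(sem_a ** 2 + sem_b ** 2)


def paired_t(d):
    """Uses only the within-participant differences."""
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


print(round(stdev(week1), 2), round(stdev(diffs), 2))  # 12.91 0.58
print(round(independent_t(week2, week1), 2))           # 0.27
print(round(paired_t(diffs), 2))                       # 8.66
```

The raw scores have a large SD because participants differ from one another, whereas the differences have a small SD, so the same change yields a far larger t once the data are paired.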

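Paired t-test: Calculation Steps 1 & 2: Complete this table, then total the columns. ∑D = ; ∑D² = [Table: paired scores with their differences (D) and squared differences (D²)]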

Paired t-test: Calculation Step 3: Calculate the t statistic. $t = \dfrac{\sum D}{\sqrt{\dfrac{n \sum D^{2} - (\sum D)^{2}}{n - 1}}}$ ∑D = ; ∑D² =

Paired t-test: Calculation Step 3: Calculate the t statistic. $t = \dfrac{31}{\sqrt{\dfrac{8 \times 137 - (31)^{2}}{7}}} = 7.06$ ∑D = 31; ∑D² = 137
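
The Step 3 arithmetic can be checked with a few lines of code. This is a minimal sketch using the totals from the worked example (∑D = 31, ∑D² = 137, n = 8). The function name is hypothetical.

```python
import math


def paired_t_from_sums(sum_d, sum_d2, n):
    """t = sum(D) / sqrt((n * sum(D^2) - sum(D)^2) / (n - 1))."""
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))


print(round(paired_t_from_sums(31, 137, 8), 3))  # 7.059
```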

Paired t-test: Calculation Steps 4 & 5: Calculate the df and use a t-distribution table to find t critical. df = n - 1 = 7
Degrees of Freedom    Critical t-ratio (0.05 level)    Critical t-ratio (0.01 level)
1                     12.71                            63.657
2                     4.303                            9.925
3                     3.182                            5.841
4                     2.776                            4.604
5                     2.571                            4.032
6                     2.447                            3.707
7                     2.365                            3.499
8                     2.306                            3.355
9                     2.262                            3.250

Paired t-test: Calculation Step 6 (final): Compare t calculated with t critical. Calculated t = 7.06; Critical t = 3.499 (0.01 level). Therefore, t calculated > t critical, so the effect is significant (sig.).
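
Steps 4 to 6 amount to a table lookup followed by a comparison, sketched below. The dictionary holds the critical t-ratios from the table above for df 1 to 9. The function name and the returned labels are hypothetical.

```python
# df: (critical t at 0.05 level, critical t at 0.01 level)
CRITICAL_T = {
    1: (12.71, 63.657), 2: (4.303, 9.925), 3: (3.182, 5.841),
    4: (2.776, 4.604), 5: (2.571, 4.032), 6: (2.447, 3.707),
    7: (2.365, 3.499), 8: (2.306, 3.355), 9: (2.262, 3.250),
}


def significance_level(t, n_pairs):
    """Compare |t| with the tabled values for df = n - 1."""
    crit_05, crit_01 = CRITICAL_T[n_pairs - 1]
    if abs(t) > crit_01:
        return "P < 0.01"
    if abs(t) > crit_05:
        return "P < 0.05"
    return "n.s."


print(significance_level(7.06, 8))  # P < 0.01
```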

Paired t-test: Calculation Interpretation: P < 0.05 Reject H0 & Accept HA Conclusion: There is a significant difference in the DV between week 1 and week 2.

Paired t-test: SPSS Output (push-up data from lecture 3). Calculated t = 7.059 (ignore the sign); df = 7, so critical t = 2.365 (0.05) and 3.499 (0.01). 7.059 > 3.499, so P < 0.01.

Parametric versus Non-Parametric. Both the t-tests just shown are parametric tests. These examine for differences in the mean; therefore the mean must be an accurate descriptor. [Figure: normal versus non-normal distributions]

Example Hypotheses: Isometric Torque • Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? Normal distribution: the mean is appropriate, so use a t-test. [Figure: distributions with Mean A and Mean B; x-axis: Sustained Isometric Torque (seconds), 16 to 20]

Example Hypotheses: Isometric Torque • Is there any difference in the length of time that males and females can sustain an isometric muscular contraction? NON-normal distribution: the mean is INappropriate, risking a Type 2 error. [Figure: distributions with Mean A and Mean B; x-axis: Sustained Isometric Torque (seconds), 16 to 20]
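
A quick numerical illustration of why the mean can mislead for non-normal data: in the hypothetical sample below, a single extreme score drags the mean well away from where most observations lie, while the median is unaffected.

```python
from statistics import mean, median

# Hypothetical sustained torque times (seconds) with one extreme score.
skewed = [16, 17, 17, 18, 42]

print(mean(skewed))    # 22
print(median(skewed))  # 17
```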

…assumptions of parametric analyses: • All means and paired differences are normally distributed (ND); this is the main consideration • N acquired through random sampling • Data must be of at least the interval level of measurement (LOM) • Data must be continuous. …but see Norman (2010) Adv. Health Sci. Educ.

Non-Parametric Tests. These tests use the median and do not assume anything about distribution, i.e. they are ‘distribution free’. Mathematically, value is ignored (i.e. the magnitude of differences is not compared); instead, data are analysed simply according to rank.

Non-Parametric Tests. Independent Measures: Mann-Whitney Test. Repeated Measures: Wilcoxon Test. e.g. Exam grades (ordinal) from 14 students in 2 separate schools.
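
The test selection covered in this lecture reduces to two questions: is the design independent or repeated measures, and are the parametric assumptions met? A minimal sketch of that lookup follows. The function name and argument values are hypothetical.

```python
def choose_test(design, parametric):
    """Map study design and assumption check to the lecture's four tests."""
    tests = {
        ("independent", True): "independent t-test",
        ("repeated", True): "paired t-test",
        ("independent", False): "Mann-Whitney test",
        ("repeated", False): "Wilcoxon test",
    }
    return tests[(design, parametric)]


print(choose_test("independent", False))  # Mann-Whitney test
```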

Mann-Whitney U: Calculation Step 1: Rank all the data from both groups in one series, then total each.
School A                      School B
Student   Grade   Rank        Student   Grade   Rank
J. S.     B-                  T. J.     D
L. D.     B-                  M. M.     C+
H. L.     A+                  K. S.     C+
M. J.     D-                  P. S.     B-
T. M.     B+                  R. M.     E
T. S.     A-                  P. W.     C-
P. H.     F                   A. F.     A-
∑RA =                         ∑RB =
Median = B-                   Median = C+

Mann-Whitney U: Calculation Step 2: Calculate two versions of the U statistic using: $U_1 = (n_A \times n_B) + \dfrac{(n_A + 1) \times n_A}{2} - \sum R_A$ AND… $U_2 = (n_A \times n_B) + \dfrac{(n_B + 1) \times n_B}{2} - \sum R_B$

Mann-Whitney U: Calculation Step 2: Calculate two versions of the U statistic using: $U_1 = (n_A \times n_B) + \dfrac{(n_A + 1) \times n_A}{2} - \sum R_A$ …OR to save time you can calculate U1 and then U2 as follows: $U_2 = (n_A \times n_B) - U_1$

Mann-Whitney U: Calculation Step 3 (final): Select the smaller of the two U statistics (U1 = 17.5; U2 = 31.5) …now consult a table of critical values for the Mann-Whitney test.
n    0.05    0.01
6    5       2
7    8       4
8    13      7
9    17      11
Calculated U must be less than critical U to conclude a significant difference. Conclusion: Median A = Median B.
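
The Mann-Whitney steps can be sketched in code. Two assumptions are made here: the first seven grades on the slide belong to School A and the last seven to School B, and grades are ordered from F (lowest) to A+ (highest). Ranks run from 1 for the lowest grade, with tied grades sharing their average rank. The function names are hypothetical.

```python
# Assumed ordering of the grades that appear in the example, lowest first.
GRADE_ORDER = ["F", "E", "D-", "D", "C-", "C+", "B-", "B+", "A-", "A+"]


def average_ranks(values):
    """Rank from lowest (1) upwards; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1  # 1-based position of first tie
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def mann_whitney_u(group_a, group_b):
    """Return (U1, U2) using the rank-sum formulas from Step 2."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    sum_ranks_a = sum(ranks[:n_a])
    u1 = n_a * n_b + (n_a + 1) * n_a / 2 - sum_ranks_a
    u2 = n_a * n_b - u1
    return u1, u2


school_a = ["B-", "B-", "A+", "D-", "B+", "A-", "F"]  # assumed split
school_b = ["D", "C+", "C+", "B-", "E", "C-", "A-"]
u1, u2 = mann_whitney_u(
    [GRADE_ORDER.index(g) for g in school_a],
    [GRADE_ORDER.index(g) for g in school_b],
)
print(u1, u2)            # 17.5 31.5
print(min(u1, u2) < 8)   # False: not below critical U for n = 7, so n.s.
```

Under these assumptions the sketch reproduces U1 = 17.5 and U2 = 31.5 from Step 3.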

Mann-Whitney U: SPSS Output. Calculated U (lower value) = 17.5; 17.5 > 8, so P > 0.05 (n.s.).

Non-Parametric Tests. Independent Measures: Mann-Whitney Test. Repeated Measures: Wilcoxon Test. e.g. One group pre-test post-test, assumed non-normal.

Wilcoxon Signed Ranks: Calculation Step 1: Rank all the differences in one series (ignoring signs), then total each.
Athlete   Pre-training OBLA (kph)   Post-training OBLA (kph)   Diff.   Rank   Signed Rank (+)   Signed Rank (-)
J. S.     15.6                      16.1                       0.5     6      6
L. D.     17.2                      17.5                       0.3     4.5    4.5
H. L.     17.7                      16.7                       -1      -7                       -7
M. J.     16.5                      16.8                       0.3     4.5    4.5
T. M.     15.9                      16.0                       0.1     1.5    1.5
T. S.     16.7                      16.5                       -0.2    -3                       -3
P. H.     17.0                      17.1                       0.1     1.5    1.5
Medians   16.7                      16.7
∑Signed Ranks: + = 18; - = 10

Wilcoxon Signed Ranks: Calculation Step 2: The smaller of the T values is our test statistic (T+ = 18; T- = 10) …now consult a table of critical values for the Wilcoxon test.
n    0.05
6    0
7    2
8    3
9    5
Calculated T must be less than critical T to conclude a significant difference. Conclusion: Median A = Median B.
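
The Wilcoxon calculation can be sketched with the OBLA data from the table. Differences are rounded to one decimal place (an assumption matching the precision of the recorded speeds) so that tied differences are recognised as ties. Zero differences would be dropped by the usual convention, although none occur here. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank from lowest (1) upwards; tied values share the mean rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1
        rank_of[v] = first + (ordered.count(v) - 1) / 2
    return [rank_of[v] for v in values]


def wilcoxon_t(pre, post):
    """Return (T+, T-): rank sums of positive and negative differences."""
    diffs = [round(b - a, 1) for a, b in zip(pre, post)]
    diffs = [d for d in diffs if d != 0]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    t_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return t_plus, t_minus


pre = [15.6, 17.2, 17.7, 16.5, 15.9, 16.7, 17.0]   # pre-training OBLA (kph)
post = [16.1, 17.5, 16.7, 16.8, 16.0, 16.5, 17.1]  # post-training OBLA (kph)
t_plus, t_minus = wilcoxon_t(pre, post)
print(t_plus, t_minus)          # 18.0 10.0
print(min(t_plus, t_minus) < 2)  # False: not below critical T for n = 7
```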

Wilcoxon Signed Ranks: SPSS Output. Calculated T (lower value) = 10; 10 > 2, so P > 0.05 (n.s.).
