Statistical considerations

Statistical considerations Alfredo García – Arieta, PhD Training workshop: Training of BE assessors, Kiev, October 2009

Outline • Basic statistical concepts on equivalence • How to perform the statistical analysis of a 2x2 cross-over bioequivalence study • How to calculate the sample size of a 2x2 cross-over bioequivalence study • How to calculate the CV based on the 90% CI of a BE study

Basic statistical concepts

Type of studies • Superiority studies • A is better than B (A = active and B = placebo or gold-standard) • Conventional one-sided hypothesis test • Equivalence studies • A is more or less like B (A = active and B = standard) • Two-sided interval hypothesis • Non-inferiority studies • A is not worse than B (A = active and B = standard with adverse effects) • One-sided interval hypothesis

Hypothesis test • Conventional hypothesis test • H0: θ = 1 H1: θ ≠ 1 (in this case it is two-sided) • If P<0.05 we can conclude that a statistically significant difference exists • If P≥0.05 we cannot conclude • With the available power we cannot detect a difference • But it does not mean that the difference does not exist • And it does not mean that they are equivalent or equal • We only have certainty when we reject the null hypothesis • In superiority trials: H1 is for existence of differences • This conventional test is inadequate to conclude about “equalities” • In fact, it is impossible to conclude “equality”

Null vs. Alternative hypothesis • Fisher, R.A. The Design of Experiments, Oliver and Boyd, London, 1935 • “The null hypothesis is never proved or established, but is possibly disproved in the course of experimentation. Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis” • Frequent mistake: The absence of statistical significance has been interpreted incorrectly as absence of clinically relevant differences.

Equivalence • We are interested in verifying (instead of rejecting) the null hypothesis of a conventional hypothesis test • We have to redefine the alternative hypothesis as a range of values with an equivalent effect • The differences within this range are considered clinically irrelevant • Problem: it is very difficult to define the maximum difference without clinical relevance for the Cmax and AUC of each drug • Solution: 20% based on a survey among physicians

Interval hypothesis or two one-sided tests • Redefine the null hypothesis: How? • Solution: It is like swapping the null and the alternative hypotheses. • Alternative hypothesis test: Schuirmann, 1981 • H01: θ ≤ θ1 Ha1: θ1 < θ • H02: θ ≥ θ2 Ha2: θ < θ2 • This is equivalent to: • H0: θ ≤ θ1 or θ ≥ θ2 Ha: θ1 < θ < θ2 • It is called an interval hypothesis because the equivalence hypothesis is in the alternative hypothesis and it is expressed as an interval

Interval hypothesis or two one-sided tests • The new alternative hypothesis is decided with a statistic that follows a distribution that can be approximated by a t-distribution • To conclude bioequivalence a P value <0.05 has to be obtained in both one-sided tests • The hypothesis tests do not give an idea of the magnitude of equivalence (P<0.001 vs. 90% CI: 0.95 – 1.05). • That is why confidence intervals are preferred
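The two one-sided tests can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the acceptance limits 0.80 and 1.25 on the ratio scale and the function name `tost_2x2` are not taken from the slides, and the inputs are the log-scale values of the worked 2x2 example later in this deck (F, Sigma2(d), n1 = n2 = 6, tabulated t value). Each test statistic is compared with the one-sided 5% critical t value, which is the same as requiring P<0.05 in both tests.

```python
import math

def tost_2x2(log_diff, se, t_crit, lower=0.80, upper=1.25):
    """Two one-sided t-tests on the log scale (illustrative helper).

    log_diff: estimated ln(T/R); se: its standard error
    t_crit:   one-sided 5% critical value of Student's t for the error df
    lower/upper: acceptance limits on the ratio scale (assumed 0.80 and 1.25)
    """
    t_lower = (log_diff - math.log(lower)) / se   # tests H01: theta <= theta1
    t_upper = (math.log(upper) - log_diff) / se   # tests H02: theta >= theta2
    # Bioequivalence is concluded only if BOTH null hypotheses are rejected
    return t_lower, t_upper, (t_lower > t_crit and t_upper > t_crit)

# Log-scale values of the worked example in this deck
F = 0.01198                                      # LSM(T) - LSM(R)
SE = math.sqrt(0.02311406 * (1 / 6 + 1 / 6))     # sqrt(Sigma2(d) * (1/n1 + 1/n2))
T_CRIT = 1.8124611                               # t for alpha 0.1 (two-sided), 10 df

t1, t2, bioequivalent = tost_2x2(F, SE, T_CRIT)
print(f"t1 = {t1:.2f}, t2 = {t2:.2f}, bioequivalent: {bioequivalent}")
```

The yes/no answer carries no information about how wide the interval is, which is the reason the slides prefer the 90% CI.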

Point estimate of the difference • d = T − R • If T = R, d = 0: no difference • If T > R, d > 0: positive effect • If T < R, d < 0: negative effect

Estimation with confidence intervals in a superiority trial • [Figure: 90% - 95% confidence interval on an axis from d < 0 (negative effect) through d = 0 (no difference) to d > 0 (positive effect)] • It is not statistically significant, because the CI includes the d = 0 value

Estimation with confidence intervals in a superiority trial • [Figure: 90% - 95% confidence interval on an axis from d < 0 (negative effect) through d = 0 (no difference) to d > 0 (positive effect)] • It is statistically significant, because the CI does not include the d = 0 value

Estimation with confidence intervals in a superiority trial • [Figure: 90% - 95% confidence interval on an axis from d < 0 (negative effect) through d = 0 (no difference) to d > 0 (positive effect)] • It is statistically significant with P = 0.05, because the boundary of the CI touches the d = 0 value

Equivalence study • [Figure: axis from d < 0 (negative effect) through d = 0 (no difference) to d > 0 (positive effect); the region of clinical equivalence lies between −d and +d]

Equivalence vs. difference • [Figure: confidence intervals at different positions relative to the region of clinical equivalence (−d to +d) on an axis from d < 0 (negative effect) to d > 0 (positive effect); each interval is answered Yes, No or ? under “Equivalent?” and under “Different?”]

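Non-inferiority study • [Figure: confidence intervals at different positions relative to the inferiority limit (−d) on an axis from d < 0 (negative effect) to d > 0 (positive effect); each interval is answered Yes, No or ? under “Inferior?”]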

Superiority study (?) • [Figure: confidence intervals at different positions relative to the superiority limit (+d) on an axis from d < 0 (negative effect) to d > 0 (positive effect); each interval is answered under “Superior?”] • Answers shown: ? • No • No, not clinically and ? statistically • No, not clinically, but yes statistically • ?, but yes statistically • Yes, statistically and clinically • Yes, but only the point estimate

How to perform the statistical analysis of a 2x2 cross-over bioequivalence study

Statistical Analysis of BE studies • Sponsors have to use validated software • E.g. SAS, SPSS, WinNonlin, etc. • In the past, it was possible to find statistical analyses performed with incorrect software. • Calculations based on arithmetic means, instead of Least Square Means, give biased results in unbalanced studies • Unbalanced: a different number of subjects in each sequence • Calculations for replicate designs are more complex and prone to mistakes

The statistical analysis is not so complex

We don’t need to calculate an ANOVA table

With complex formulae

More complex formulae

And really complex formulae

Given the following data, it is simple

First, log-transform the data

Second, calculate the arithmetic mean of each period and sequence

Note the difference between Arithmetic Mean and Least Square Mean • The arithmetic mean (AM) of T (or R) is the mean of all observations with T (or R) irrespective of its group or sequence • All observations have the same weight • The LSM of T (or R) is the mean of the two sequence-by-period means • In balanced studies AM = LSM • In unbalanced studies, observations in the sequence with fewer subjects have more weight in the LSM • With a large imbalance between sequences due to drop-outs or withdrawals, the bias of the AM is notable
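A short Python sketch may make the difference concrete. The log-scale values below are invented for illustration (they are not the data of this deck), and the helper names are hypothetical. With 4 subjects in one sequence and 2 in the other, the AM and the LSM of the test product differ; with 2 subjects in each sequence they coincide.

```python
from statistics import mean

def arithmetic_mean(cell_seq1, cell_seq2):
    """Pool all observations of one treatment: every observation weighs the same."""
    return mean(cell_seq1 + cell_seq2)

def least_squares_mean(cell_seq1, cell_seq2):
    """Mean of the two sequence-by-period means: every cell weighs the same."""
    return (mean(cell_seq1) + mean(cell_seq2)) / 2

# Hypothetical log-transformed observations of T (illustrative values only)
t_in_seq_TR = [4.30, 4.50, 4.10, 4.40]   # T given in period 1, 4 subjects
t_in_seq_RT = [4.15, 4.10]               # T given in period 2, 2 subjects

am_unbalanced = arithmetic_mean(t_in_seq_TR, t_in_seq_RT)
lsm_unbalanced = least_squares_mean(t_in_seq_TR, t_in_seq_RT)

# Balanced case: keep only the first 2 subjects of the larger sequence
am_balanced = arithmetic_mean(t_in_seq_TR[:2], t_in_seq_RT)
lsm_balanced = least_squares_mean(t_in_seq_TR[:2], t_in_seq_RT)

print(f"unbalanced: AM = {am_unbalanced:.4f}, LSM = {lsm_unbalanced:.4f}")
print(f"balanced:   AM = {am_balanced:.4f}, LSM = {lsm_balanced:.4f}")
```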

Third, calculate the LSM of T and R • LSM of T (A) = 4.3018 • LSM of R (B) = 4.2898

Fourth, calculate the point estimate • F = LSM Test (A) – LSM Reference (B) • F = 4.30183 – 4.28985 = 0.01198 • Fifth step! Back-transform to the original scale • Point estimate = e^F = e^0.01198 = 1.01205 • Five very simple steps to calculate the point estimate!!!
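The fourth and fifth steps amount to one subtraction and one exponential. A minimal Python check with the two LSM values quoted on this slide (the variable names are illustrative):

```python
import math

lsm_test = 4.30183        # LSM of Test (A), log scale
lsm_reference = 4.28985   # LSM of Reference (B), log scale

F = lsm_test - lsm_reference     # difference on the log scale
point_estimate = math.exp(F)     # back-transformed T/R ratio

print(f"F = {F:.5f}, point estimate = {point_estimate:.5f}")
```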

Now we need to calculate the variability! • Step 1: Calculate the difference between periods for each subject and divide it by 2: (P2-P1)/2 • Step 2: Calculate the mean of these differences within each sequence to obtain 2 means: d1 and d2 • Step 3: Calculate the difference between “the difference in each subject” and “its corresponding sequence mean”, and square it • Step 4: Sum these squared differences • Step 5: Divide the sum by (n1+n2-2), where n1 and n2 are the numbers of subjects in each sequence. In this example 6+6-2 = 10. The result is Sigma2(d) • This value multiplied by 2 is the MSE • CV (%) = 100 x √(e^MSE − 1)

This can be done easily in a spreadsheet!

Step 1: Calculate the difference between periods for each subject and divide it by 2: (P2-P1)/2

Step 2: Calculate the mean of these differences within each sequence to obtain 2 means: d1 & d2

Step 3: Squared differences

Step 4: Sum these squared differences

Step 5: Divide the sum by n1+n2-2
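The five steps can also be sketched in Python instead of a spreadsheet. The subject data below are invented for illustration (3 subjects per sequence, log scale), because the data table of this deck is not reproduced in the text; the function names are hypothetical. Only the last line uses a number from the deck: the Sigma2(d) of the worked example.

```python
import math
from statistics import mean

def variance_of_half_differences(seq1, seq2):
    """seq1, seq2: per-subject (period 1, period 2) log-transformed values."""
    squared = []
    for seq in (seq1, seq2):
        half_diffs = [(p2 - p1) / 2 for p1, p2 in seq]           # step 1
        seq_mean = mean(half_diffs)                               # step 2
        squared += [(d - seq_mean) ** 2 for d in half_diffs]      # step 3
    total = sum(squared)                                          # step 4
    return total / (len(seq1) + len(seq2) - 2)                    # step 5

def cv_percent(mse):
    """CV (%) = 100 x sqrt(e^MSE - 1)."""
    return 100 * math.sqrt(math.exp(mse) - 1)

# Hypothetical data (illustrative values only)
seq_1 = [(4.30, 4.20), (4.50, 4.35), (4.10, 4.05)]
seq_2 = [(4.00, 4.15), (3.90, 4.10), (4.20, 4.25)]

sigma2_d = variance_of_half_differences(seq_1, seq_2)
mse = 2 * sigma2_d
print(f"Sigma2(d) = {sigma2_d:.6f}, MSE = {mse:.6f}, CV = {cv_percent(mse):.2f}%")

# With the Sigma2(d) of the worked example in this deck
print(f"CV of the worked example = {cv_percent(2 * 0.02311406):.2f}%")
```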

Calculate the confidence interval with point estimate and variability • Step 11: In log-scale • 90% CI: F ± t(0.1, n1+n2-2) x √(Sigma2(d) x (1/n1+1/n2)) • F has been calculated before • The t value is obtained from Student's t tables with 0.1 alpha (two-sided) and n1+n2-2 degrees of freedom • Or in MS Excel with the formula =DISTR.T.INV(0.1; n1+n2-2) (TINV in English-language Excel) • Sigma2(d) has been calculated before.

Final calculation: the 90% CI • Log-scale 90% CI: F ± t(0.1, n1+n2-2) x √(Sigma2(d) x (1/n1+1/n2)) • F = 0.01198 • t(0.1, n1+n2-2) = 1.8124611 • Sigma2(d) = 0.02311406 • 90% CI: LL = -0.14711 to UL = 0.17107 • Step 12: Back-transform the limits with e^LL and e^UL • e^LL = e^-0.14711 = 0.8632 and e^UL = e^0.17107 = 1.1866
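Steps 11 and 12 can be checked with a short Python sketch. The inputs are the values listed on this slide; the t value is entered as the tabulated constant, so no statistics library is needed. The function name `ci90_2x2` is illustrative.

```python
import math

def ci90_2x2(F, sigma2_d, n1, n2, t_value):
    """90% CI of the T/R ratio for a 2x2 cross-over.

    F:        LSM(T) - LSM(R) on the log scale
    sigma2_d: variance of the half period differences, Sigma2(d)
    t_value:  t(0.1 two-sided, n1+n2-2), taken from tables
    """
    se = math.sqrt(sigma2_d * (1 / n1 + 1 / n2))
    ll = F - t_value * se            # step 11: limits on the log scale
    ul = F + t_value * se
    return ll, ul, math.exp(ll), math.exp(ul)   # step 12: back-transform

ll, ul, lower, upper = ci90_2x2(0.01198, 0.02311406, 6, 6, 1.8124611)
print(f"log scale: {ll:.5f} to {ul:.5f}")
print(f"ratio scale: {lower:.4f} to {upper:.4f}")
```

The printed limits should agree with the slide to the number of decimals shown there.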

How to calculate the sample size of a 2x2 cross-over bioequivalence study

Reasons for a correct calculation of the sample size • Too many subjects • It is unethical to disturb more subjects than necessary • Some subjects are put at risk unnecessarily • It is an unnecessary waste of resources ($) • Too few subjects • A study unable to reach its objective is unethical • All subjects are put at risk for nothing • All resources ($) are wasted when the study is inconclusive

Frequent mistakes • Calculating the sample size required to detect a 20% difference, assuming e.g. that treatments are equal • Pocock, Clinical Trials, 1983 • Using a calculation based on data without log-transformation • Design and Analysis of Bioavailability and Bioequivalence Studies, Chow & Liu, 1992 (1st edition) and 2000 (2nd edition) • Adding too many extra subjects. Usually no more than 10% are needed, depending on tolerability • 10% proposed by Patterson et al, Eur J Clin Pharmacol 57: 663-670 (2001)

Methods to calculate the sample size • The exact value has to be obtained with power curves • Approximate values are obtained from formulae • Best approximation: iterative process (t-test) • Acceptable approximation: based on the Normal distribution • Calculations are different when we assume products are really equal and when we assume products are slightly different • Any minor deviation is masked by the extra subjects included to compensate for drop-outs and withdrawals (10%)

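Calculation assuming that treatments are equal • N = 2 x S2 x (Z(1-α) + Z(1-(β/2)))^2 / (ln 1.25)^2, with S2 = ln(CV^2 + 1) • Z(1-(β/2)) = DISTR.NORM.ESTAND.INV(0.05) for 90% power (1-β) • Z(1-(β/2)) = DISTR.NORM.ESTAND.INV(0.1) for 80% power (1-β) • Z(1-α) = DISTR.NORM.ESTAND.INV(0.05) for 5% α • DISTR.NORM.ESTAND.INV is NORMSINV in English-language Excel; these calls return negative values, but the sign disappears when the sum is squared • CV expressed as 0.3 for 30%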

Example of calculation assuming that treatments are equal • If we desire an 80% power, Z(1-(β/2)) = -1.281551566 • Consumer risk always 5%, Z(1-α) = -1.644853627 • The equation becomes: N = 343.977655 x S2 • Given a CV of 30%, S2 = 0.086177696 • Then N = 29.64 • We have to round up to the next even number: 30 • Plus e.g. 4 extra subjects in case of drop-outs

Example of calculation assuming that treatments are equal • If we desire a 90% power, Z(1-(β/2)) = -1.644853627 • Consumer risk always 5%, Z(1-α) = -1.644853627 • The equation becomes: N = 434.686167 x S2 • Given a CV of 25%, S2 = 0.06062462 • Then N = 26.35 • We have to round up to the next even number: 28 • Plus e.g. 4 extra subjects in case of drop-outs
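The two examples above can be reproduced with a Python sketch of the Normal approximation. The formula in the code is inferred from the constants on the slides (343.977655 and 434.686167 are both 2 x (sum of the two Z values)^2 / (ln 1.25)^2), so it should be read as an assumption rather than as the author's exact expression; the function names are illustrative. `statistics.NormalDist().inv_cdf` plays the role of the Excel inverse Normal function.

```python
import math
from statistics import NormalDist

def n_treatments_equal(cv, power, alpha=0.05, limit=1.25):
    """Approximate total N for a 2x2 cross-over, assuming T/R = 1.

    cv is a fraction (0.3 for 30%). Formula inferred from the slide constants.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha)               # consumer risk, 5%
    z_beta = z(1 - (1 - power) / 2)      # beta is halved when T/R = 1
    s2 = math.log(cv ** 2 + 1)           # S2 on the log scale
    return 2 * s2 * (z_alpha + z_beta) ** 2 / math.log(limit) ** 2

def round_up_to_even(n):
    """Round up to the next even number so both sequences have equal size."""
    return 2 * math.ceil(n / 2)

n_80 = n_treatments_equal(cv=0.30, power=0.80)
n_90 = n_treatments_equal(cv=0.25, power=0.90)
print(f"CV 30%, power 80%: N = {n_80:.2f} -> {round_up_to_even(n_80)}")
print(f"CV 25%, power 90%: N = {n_90:.2f} -> {round_up_to_even(n_90)}")
```

Extra subjects for drop-outs are added after rounding, as on the slides.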

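Calculation assuming that treatments are not equal • N = 2 x S2 x (Z(1-α) + Z(1-β))^2 / (ln 1.25 − ln(μT/μR))^2, with S2 = ln(CV^2 + 1) • Z(1-β) = DISTR.NORM.ESTAND.INV(0.1) for 90% power (1-β) • Z(1-β) = DISTR.NORM.ESTAND.INV(0.2) for 80% power (1-β) • Z(1-α) = DISTR.NORM.ESTAND.INV(0.05) for 5% α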

Example of calculation assuming that treatments are 5% different • If we desire a 90% power, Z(1-β) = -1.28155157 • Consumer risk always 5%, Z(1-α) = -1.644853627 • If we assume that μT/μR = 1.05 • The equation becomes: N = 563.427623 x S2 • Given a CV of 40%, S2 = 0.14842001 • Then N = 83.62 • We have to round up to the next even number: 84 • Plus e.g. 8 extra subjects in case of drop-outs
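The same kind of sketch reproduces this example. Again the formula is inferred from the slide constant (563.427623 = 2 x (sum of the two Z values)^2 / (ln 1.25 − ln 1.05)^2) and is an assumption, as is the use of the absolute value of ln(μT/μR) so that ratios below 1 are handled symmetrically; the function name is illustrative.

```python
import math
from statistics import NormalDist

def n_treatments_different(cv, power, ratio, alpha=0.05, limit=1.25):
    """Approximate total N for a 2x2 cross-over, assuming T/R = ratio (not 1).

    cv is a fraction (0.4 for 40%). Formula inferred from the slide constant.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha)      # consumer risk, 5%
    z_beta = z(power)           # beta is NOT halved when T/R differs from 1
    s2 = math.log(cv ** 2 + 1)
    # Distance on the log scale between the assumed ratio and the nearest limit
    margin = math.log(limit) - abs(math.log(ratio))
    return 2 * s2 * (z_alpha + z_beta) ** 2 / margin ** 2

n = n_treatments_different(cv=0.40, power=0.90, ratio=1.05)
n_rounded = 2 * math.ceil(n / 2)   # next even number
print(f"CV 40%, power 90%, T/R 1.05: N = {n:.2f} -> {n_rounded}")
```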
