Effect Size – Can the Effect Be Too Small

Robert J. Temple, M.D.
Advisory Committee Meeting, April 25, 2006

Effect Size – Can the Effect Be Too Small
• Legal Standards
• FD&C Act
• Legislative history
• Court cases
• Reinvention statement
• PRO document
• Data presentation: mean vs distribution

Legal Standard
Section 505(d) says that an application can be refused if there is a lack of substantial evidence that the drug will have the effect it is represented in labeling to have. This implies that a truthful description of any effect would be a basis for approval. But there are two reasons to think that is not quite so:
• Safety requirement
• Warner-Lambert v Heckler

Why Might Effect Size Matter?
1. Safety
The FD&C Act says an application can be rejected if tests of the drug show it is unsafe or fail to show that it is safe. Since all drugs have adverse effects, we often say that “safe” can only mean that benefits outweigh risks. This implies that effect size could matter, certainly for a toxic drug. Could it also mean that an effect size too puny could be outweighed by the unknown risk any drug has?
Rebuttal: the 1938 law didn’t even ask for evidence of benefit, so maybe “safe” did not mean benefit > risk (B>R), whatever we now think, but only meant that nothing too bad was seen.

Why Might Effect Size Matter?
2. Warner-Lambert v Heckler (1986)
Basically said that not just any showing of a statistically significant effect satisfies the Act. The effect shown must be clinically meaningful and not “therapeutically trivial.” The court rejected the argument that any effect claimed, if supported statistically, was sufficient and that the size of the effect is irrelevant.

On the Other Hand
Legislative history is clear in saying there is no relative effectiveness requirement. A new drug need not be better than, or even as good as, available therapy.
“The committee believes that this provision strikes a balance between the need for governmental control to assure that new drugs are not placed on the market until they have passed the relevant tests and the need to assure that governmental control does not become so rigid that the flow of new drugs to the market, and the incentive to undergo the expense involved in preparing them for the market, become stifled.”
That does not say, however, that any effect, no matter how small, is sufficient.

Reinvention Statement (FR, August 1, 1995)
Apparently to reassure the drug development community that FDA was not imposing new comparative standards, FDA announced that “FDA weighs a product’s demonstrated effectiveness against its risks and considers other factors, such as the seriousness and outcome of the disease being treated and adequacy of existing treatments.” We do not require new drugs to be more effective than existing therapies, nor do we “necessarily” require comparison with other products.
Except that, “for products to treat life-threatening diseases, diseases with irreversible morbidity, and contagious diseases that pose serious health risks to others, it is essential for public health protection that a new therapy be as effective as existing approved therapies.”
[Didn’t mean that, of course, as only superiority would show that; really referred to “non-inferiority.”]

All in All
1. We might say an effect is clinically meaningless, but more likely because of the nature of the effect (increased bile flow, suppression of gut fungus), not its size. In general, however, for a non-toxic drug and a not-serious disease we have not demanded an effect of a particular size or required comparisons with other treatment.
2. Clearly could conclude that a small effect is outweighed by toxicity (rejected at least two Alzheimer’s drugs, one for severe N+V (nausea and vomiting), the other for proximal weakness; many, many other examples). In these cases, the benefit did not outweigh risk.
3. Would consider available Rx. For a toxic drug could 1) demand superiority or 2) demand an effect in non-responders (Clozapine, bepridil). In these cases would need comparative data.

All in All (cont)
4. Demand comparative data when the disease is serious and there is existing Rx. Indeed, a comparative (non-inferiority, NI) study is all you can do. In those cases, we regularly do insist on preserving some fraction of the effect (usually 50%), indicating that loss of too much of a valuable effect would not be acceptable.
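The "preserve some fraction of effect" idea in point 4 can be illustrated with simple arithmetic. The sketch below is only an illustration under stated assumptions: the function names and the numbers are hypothetical, the control effect is taken as an already agreed (typically conservative) estimate of the active control's effect over placebo, and larger values of the outcome are assumed to be better.

```python
def ni_margin(control_effect: float, fraction_preserved: float = 0.5) -> float:
    """Largest loss of effect tolerated when `fraction_preserved` of the
    active control's effect must be retained (hypothetical helper)."""
    if control_effect <= 0:
        raise ValueError("control_effect must be positive")
    if not 0.0 < fraction_preserved <= 1.0:
        raise ValueError("fraction_preserved must be in (0, 1]")
    return (1.0 - fraction_preserved) * control_effect


def is_non_inferior(ci_lower_diff: float, margin: float) -> bool:
    """Non-inferiority is concluded when the lower confidence bound of
    (new drug minus active control) lies above -margin."""
    return ci_lower_diff > -margin


# Hypothetical numbers: the control beats placebo by 10 units, 50% preserved.
margin = ni_margin(control_effect=10.0, fraction_preserved=0.5)
print(margin)                          # 5.0
print(is_non_inferior(-3.0, margin))   # True: at most 3 units are lost
print(is_non_inferior(-6.0, margin))   # False: could lose more than half
```

With a 50% preservation requirement, a new drug whose confidence interval cannot rule out losing more than half of the control's effect would not meet the margin in this sketch.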

The PRO Document
A new effect size standard?
Mainly because of a concern that PRO methods are “too sensitive,” “it is important to consider whether the [detected] changes are meaningful.” It therefore calls for specifying a “minimum important difference” (MID) as “a benchmark for interpreting mean differences.”
This is the most explicit statement I know of that a statistically significant effect on a valid measure might not be accepted as evidence of effectiveness.
Do we really mean it? Are we going to begin to say “not good enough”? And why only for PROs? Growing sample sizes raise similar issues.
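The remark that growing sample sizes raise similar issues can be made concrete. The sketch below assumes a two-arm trial with a normally distributed outcome and a known common standard deviation, and it treats the observed difference as fixed at a small hypothetical value. The function name and all numbers are illustrative assumptions, not taken from the PRO document.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p(diff: float, sd: float, n_per_arm: int) -> float:
    """One-sided p-value for H0: true difference <= 0, using a z-test
    with known common standard deviation `sd` (an assumption)."""
    se = sd * sqrt(2.0 / n_per_arm)   # standard error of the mean difference
    z = diff / se
    return 1.0 - NormalDist().cdf(z)


# Hypothetical: the same 0.5-point difference on a scale with SD 10.
for n in (100, 1_000, 10_000):
    print(n, round(one_sided_p(diff=0.5, sd=10.0, n_per_arm=n), 4))
```

Under these assumptions the same small difference is far from significant with 100 patients per arm and clearly significant with 10,000 per arm, which is why the question of a meaningful effect size is not confined to PRO measures.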

One Last Point
1. Mean vs Distribution
We tend to look at mean effects, but individuals will have a range of effects, some larger, some smaller. Might we be missing meaningful effects in a fraction of the patients by focusing on the mean? Should we more often show both mean and distribution? The latter will show, for a drug with an effect on the mean, that there are also more people with an effect of any given size.
2. What is effect size, anyway?
Point estimate is not effect size. If we’re serious about MID, we’d need to revise the hypotheses
from H0: T ≤ 0 to H0: T ≤ MID
from Ha: T > 0 to Ha: T > MID
where the 97.5% (one-sided) lower bound would be the test.
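The shifted null hypothesis in point 2 (H0: T ≤ MID instead of H0: T ≤ 0) can be sketched as a comparison of the one-sided 97.5% lower confidence bound against the MID. This is a minimal illustration assuming a normally distributed estimate with a known standard error. The function names and numbers are hypothetical.

```python
from statistics import NormalDist


def lower_bound(estimate: float, se: float, level: float = 0.975) -> float:
    """One-sided lower confidence bound for the treatment effect T,
    assuming a normally distributed estimate (an assumption)."""
    z = NormalDist().inv_cdf(level)   # about 1.96 for level = 0.975
    return estimate - z * se


def rejects_null(estimate: float, se: float, threshold: float) -> bool:
    """Reject H0: T <= threshold when the lower bound exceeds it."""
    return lower_bound(estimate, se) > threshold


# Hypothetical trial: point estimate 3.0, standard error 1.0, MID 2.0.
est, se, mid = 3.0, 1.0, 2.0
print(round(lower_bound(est, se), 2))        # 1.04
print(rejects_null(est, se, threshold=0.0))  # True: effect exceeds zero
print(rejects_null(est, se, threshold=mid))  # False: not shown to exceed MID
```

In this example the point estimate is above the MID, yet the lower bound is not, which illustrates why the point estimate alone is not the effect size that the revised test would have to establish.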

A Few Points
3. Do we really want to specify a minimum mean effect (or minimum difference on some dichotomous measure)? Of course, the biggest question is how we would support any particular value.
