John W. Tukey’s Multiple Contributions to Statistics at Merck

Joseph F. Heyse
Merck Research Laboratories
Third International Conference on Multiple Comparisons
Bethesda, Maryland
August 5, 2002

Overview
• Professor John W. Tukey began consulting with Merck Sharp and Dohme Research Laboratories in 1953 and continued until 2000.
• Prior to 1953, John was a consultant to Merck in the area of manufacturing.
• Through the years, John made major contributions to the statistical aspects of all major research disciplines.
• His consultations led to the establishment of Merck and industry standards for several statistical approaches.

Areas of Involvement
• Safety assessment
• Clinical trials
• Laboratory quality control
• Clinical safety analyses
• Health economics
• Gene expression and microarray data
• Use of graphics

Agenda for June 1, 2000 Meeting
1. Multiple comparisons: applications of the False Discovery Rate to vaccine adverse experience data
2. Transformations for analyzing parasite count data with many zero counts
3. Use of TaqMan assay for gene expression
4. Error models for microarray data

Examples
• Trend testing in safety assessment
• Adjusting for multiplicity in rodent carcinogenicity studies
• Multiplicity applied to estimated variances

Trend Test for Dose Response (Tukey et al., 1985)
• Trend is defined as progressiveness of response with increasing dose
• Three sets of carriers form the candidate set
  • Arithmetic
  • Ordinal
  • Arithmetic-Logarithmic
• The statistical assessment for trend is taken as the most extreme P-value computed from the candidate set
• NOSTASOT Dose (No Statistical Significance of Trend Dose): the highest dose through which the test for trend is not significant (N.S.)

Properties of Trend Test
• The trend test inflates P-values slightly, in the conservative direction for safety assessment
• The adjusted trend test reported by Capizzi et al. (1992) compares favorably with other tests against ordered alternative hypotheses
• NOSTASOT is a closed sequential procedure
• Tukey et al. also proposed an adjustment procedure for multiple safety assessment parameters with unknown correlation

Example: Summary statistics for toxicity study in dogs

Dose of drug (mg/kg/day)   Number of dogs   Mean albumin   Sample variance
Control                    12               3.96           0.044
0.25                       6                3.95           0.014
1.0                        6                4.08           0.080
4.0                        6                4.21           0.024
16.0                       6                4.18           0.027

s² = 0.039 with 31 d.f.
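The pooled variance quoted with the table can be checked from the group variances. The sketch below assumes the count column gives the number of animals per group and that the variances are pooled in the usual one-way layout; the function name is hypothetical.

```python
# Group sizes and sample variances from the summary table (control first).
n = [12, 6, 6, 6, 6]
var = [0.044, 0.014, 0.080, 0.024, 0.027]

def pooled_variance(sizes, variances):
    """Return (pooled variance, error d.f.) for a one-way layout."""
    dfs = [k - 1 for k in sizes]          # d.f. contributed by each group
    total_df = sum(dfs)
    s2 = sum(d * v for d, v in zip(dfs, variances)) / total_df
    return s2, total_df

s2, df = pooled_variance(n, var)
print(round(s2, 3), df)  # 0.039 31
```

Under that reading, 36 animals in 5 groups give the 31 d.f. shown on the slide.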

Example: Trend Test Results

Carrier                  P-value
Arithmetic               0.036
Ordinal                  0.006
Arithmetic-Logarithmic   0.006

Trend Test P = 0.006
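A minimal sketch of how carrier-based trend P-values might be computed from the summary statistics of the dog study. It is not the published procedure: it uses a plain weighted regression of group means on each carrier, a two-sided t test with the pooled variance, and an assumed arithmetic-logarithmic carrier that places the control one log-step below the lowest dose. Working from rounded summary statistics, it is not expected to match the slide's P-values exactly. All function names are hypothetical.

```python
import math

# Summary statistics from the dog toxicity example (control first).
doses = [0.0, 0.25, 1.0, 4.0, 16.0]
n = [12, 6, 6, 6, 6]
means = [3.96, 3.95, 4.08, 4.21, 4.18]
s2, df = 0.039, 31

def carriers(doses):
    """Candidate carriers; the arithmetic-logarithmic spacing is an assumption."""
    logs = [math.log(d) for d in doses[1:]]
    step = logs[1] - logs[0]  # control placed one log-step below the lowest dose
    return {
        "arithmetic": list(doses),
        "ordinal": [float(i) for i in range(len(doses))],
        "arithmetic-logarithmic": [logs[0] - step] + logs,
    }

def trend_t(x, n, means, s2):
    """t statistic for the slope of group means on carrier x, weighted by n."""
    total = sum(n)
    xbar = sum(k * xi for k, xi in zip(n, x)) / total
    sxx = sum(k * (xi - xbar) ** 2 for k, xi in zip(n, x))
    sxy = sum(k * (xi - xbar) * m for k, xi, m in zip(n, x, means))
    return (sxy / sxx) / math.sqrt(s2 / sxx)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided P-value by Simpson integration of the Student t density."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    c /= math.sqrt(df * math.pi)
    f = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    area = f(0.0) + f(a)
    area += sum((4 if i % 2 else 2) * f(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * area * h / 3)

p_values = {name: t_two_sided_p(trend_t(x, n, means, s2), df)
            for name, x in carriers(doses).items()}
trend_p = min(p_values.values())  # most extreme P-value over the candidate set
```

Because the nonzero doses are equally spaced on the log scale, the ordinal and the assumed arithmetic-logarithmic carriers are linear transforms of each other and give the same P-value, which agrees with the equal entries in the slide's table.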

Example: NOSTASOT Analysis

Trend Analysis          P-value
All Dose Groups (CD4)   0.006
4 Dose Groups (CD3)     0.011
3 Dose Groups (CD2)     0.291

NOSTASOT dose is D2 = 1.0 mg/kg/day
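The step-down logic behind the NOSTASOT dose can be sketched as follows. The 0.05 significance level and the function name are assumptions; the P-values are the ones shown on the slide.

```python
def nostasot(doses, p_through, alpha=0.05):
    """Highest dose through which the trend test is not significant.

    doses      -- nonzero doses in ascending order
    p_through  -- maps a dose to the trend P-value for control through that dose
    Returns 0.0 (control) if the trend is significant through every dose.
    """
    for top in sorted(doses, reverse=True):
        if p_through[top] > alpha:   # first non-significant trend stops the search
            return top
    return 0.0

doses = [0.25, 1.0, 4.0, 16.0]
p_through = {16.0: 0.006, 4.0: 0.011, 1.0: 0.291}  # from the slide
print(nostasot(doses, p_through))  # 1.0
```

The search starts with all dose groups and drops the highest dose each time the trend remains significant, so lower-dose P-values are only needed if every earlier test rejects.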

Example: Adjusted P-value for Dunnett’s procedure

Separate Dose Analyses   P-value
C vs. D1                 0.893
C vs. D2                 0.259
C vs. D3                 0.018
C vs. D4                 0.041

S-P = 0.066

Multiple Significance Testing in Rodent Carcinogenicity Experiments
• Mantel (1980) credits Tukey with a proposal to adjust multiple P-values in carcinogenicity experiments, based on P1, the smallest observed P-value, and k1, the number of tumor types that could have attained P1
• These methods have been improved by several authors and are now commonly applied
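The adjustment credited to Tukey is usually written as P = 1 - (1 - P1)^k1 (see Mantel, 1980; Heyse and Rom, 1988). The sketch below assumes that form. The function names and the `min_attainable` input (the smallest P-value each tumor type could possibly yield given its tumor counts) are hypothetical, as are the illustrative numbers.

```python
def count_k1(p1, min_attainable):
    """Number of tumor types whose smallest attainable P-value is <= p1."""
    return sum(1 for p in min_attainable if p <= p1)

def tukey_mantel_adjust(p1, k1):
    """Adjusted P-value 1 - (1 - p1)**k1 (assumed form of the adjustment)."""
    return 1.0 - (1.0 - p1) ** k1

# Hypothetical illustration: 5 tumor types, 3 of which could reach P <= 0.01.
min_attainable = [0.002, 0.008, 0.010, 0.060, 0.250]
p1 = 0.010
k1 = count_k1(p1, min_attainable)
adjusted = tukey_mantel_adjust(p1, k1)
```

Counting only the tumor types that could have attained P1, rather than all tumor types examined, keeps rare tumors with too few cases from inflating the adjustment.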

Grouping Based on Estimated Variances
• The naïve procedure of weighting the results of different experiments inversely to their estimated variances is unsatisfactory
• Cochran (1954) introduced the idea of partial weighting, in which the ½ to ⅔ of the studies that appear less variable are assigned equal weight
• Mosteller and Tukey (1984) treated the more realistic case with the possible presence of interaction
• Ciminera et al. (1993) applied those methods in the multicenter clinical trial setting
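A minimal sketch of partial weighting, under stated assumptions: the less variable studies share one weight based on the mean of their estimated variances, and the remaining studies are weighted inversely to their own estimated variances. The slide only says that the less variable studies get equal weight, so the treatment of the rest, the function name, and the example variances are all assumptions.

```python
def partial_weights(variances, equal_fraction=2 / 3):
    """Normalized weights: equal within the less variable group, 1/variance otherwise."""
    k = len(variances)
    m = max(1, round(equal_fraction * k))           # size of the equal-weight group
    order = sorted(range(k), key=lambda i: variances[i])
    low = order[:m]
    mean_low = sum(variances[i] for i in low) / m   # shared variance for that group
    raw = [0.0] * k
    for rank, i in enumerate(order):
        raw[i] = 1 / mean_low if rank < m else 1 / variances[i]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical estimated variances for six studies.
weights = partial_weights([0.02, 0.05, 0.06, 0.09, 0.16, 0.32])
```

With these inputs the four least variable studies receive the same weight, so a study whose variance happens to be estimated as very small cannot dominate the combined result.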

Grouping of Centers Based on Estimated Variances (Ciminera et al., 1993)

Center   d.f.   Estimated Variance
7        5      0.0156
1        11     0.0167
5        10     0.0531
4        8      0.0542
15       11     0.0594
3        14     0.0609
13       13     0.0776
17       9      0.0859
12       10     0.0878
2        8      0.0879
6        10     0.0903
14       4      0.0909
8        8      0.0958
16       3      0.1038
11       4      0.1046
19       14     0.1124
10       8      0.1584
9        6      0.3188
18       8      0.3461

The grouping algorithm is based on the medians of order statistics, using the Wilson-Hilferty (1931) approximation.
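The Wilson-Hilferty approximation treats the cube root of a chi-square variable divided by its d.f. as roughly normal with mean 1 - 2/(9ν) and variance 2/(9ν). The sketch below uses it to put each center's estimated variance on an approximate z scale. This is not the grouping algorithm itself, which the slide does not spell out; using the pooled variance as the reference value is an assumption, and the function name is hypothetical.

```python
import math

# (center, d.f., estimated variance) from the table above.
centers = [
    (7, 5, 0.0156), (1, 11, 0.0167), (5, 10, 0.0531), (4, 8, 0.0542),
    (15, 11, 0.0594), (3, 14, 0.0609), (13, 13, 0.0776), (17, 9, 0.0859),
    (12, 10, 0.0878), (2, 8, 0.0879), (6, 10, 0.0903), (14, 4, 0.0909),
    (8, 8, 0.0958), (16, 3, 0.1038), (11, 4, 0.1046), (19, 14, 0.1124),
    (10, 8, 0.1584), (9, 6, 0.3188), (18, 8, 0.3461),
]

def wilson_hilferty_z(s2, df, sigma2):
    """Approximate z-score of s2/sigma2 under a chi-square(df)/df model."""
    shrink = 2.0 / (9.0 * df)
    return ((s2 / sigma2) ** (1.0 / 3.0) - (1.0 - shrink)) / math.sqrt(shrink)

# Reference variance: d.f.-weighted pooled estimate (an assumption).
total_df = sum(df for _, df, _ in centers)
pooled = sum(df * s2 for _, df, s2 in centers) / total_df

z_scores = {c: wilson_hilferty_z(s2, df, pooled) for c, df, s2 in centers}
```

On this scale the centers at the two ends of the table stand well apart from the middle of the list, which is the kind of separation a grouping rule would act on.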

Insights on Statistics
• Randomization is the only thing you can safely assume when analyzing clinical trial data
• There is no such thing as a null effect
• There is always interaction
• Having only two points is the only time you should pretend that you have a linear relationship; and in these cases, you should get more data

Insights on Character
“The best thing about being a statistician is that you get to play in other people’s backyards.” (J.W.T.)
• Remember that you are a guest and need to bring your manners and respect.
• It’s all about relationships.

What would John think of these remarks?
Thank you for the kind words, but ... you could have said them using fewer slides.

References
• Capizzi T, Survill TT, Heyse JF, and Malani H: An empirical and simulated comparison of some tests for detecting progressiveness of response with increasing doses of a compound. Biometrical Journal, 34:275-289, 1992.
• Ciminera JL, Heyse JF, Nguyen HH, and Tukey JW: Evaluation of multicentre clinical trial data using adaptations of the Mosteller-Tukey procedure. Statistics in Medicine, 12:1047-1061, 1993.
• Ciminera JL, Heyse JF, Nguyen HH, and Tukey JW: Tests for qualitative treatment-by-centre interaction using a “pushback” procedure. Statistics in Medicine, 12:1033-1045, 1993.

• Cox JL, Heyse JF, and Tukey JW: Efficacy estimates from parasite count data that include zero counts. Experimental Parasitology, 96:1-8, 2000.
• Heyse JF and Rom D: Adjusting for multiplicity of statistical tests in the analysis of carcinogenicity studies. Biometrical Journal, 30:883-896, 1988.
• Mantel N: Assessing laboratory evidence for neoplastic activity. Biometrics, 36:381-399, 1980.
• Mantel N, Tukey JW, Ciminera JL, and Heyse JF: Tumorigenicity assays, including use of the jackknife. Biometrical Journal, 24:579-596, 1982.

• Tukey JW, Ciminera JL, and Heyse JF: Testing the statistical certainty of a response to increasing doses of a drug. Biometrics, 41:295-301, 1985.
