Complex Adaptive Systems and Human Health: Statistical Approaches in Pharmacogenomics

Kim E. Zerba, Ph.D.
Bristol-Myers Squibb
FDA/Industry Statistics Workshop
Statistics: From Theory to Regulatory Acceptance
18-19 September 2003, Bethesda, Maryland
Disclaimer: The views presented are my own and do not necessarily represent those of Bristol-Myers Squibb.

Outline
• Complex Adaptive Systems and Human Health
• Approach and Some Key Statistical Issues with Genetic Polymorphisms in Pharmacogenomics
• Where Do We Go from Here?

The Genetic Paradigm
[Diagram: Gene (DNA), RNA, Protein, Phenotype (Disease)]

Non-Infectious Human Disease Load
• Complex, multifactorial: > 98%
• Simple, monogenic: < 2%

Complex Adaptive Systems and Human Health
• Each individual is a complex adaptive system and the fundamental unit of organization
• Health or disease is an emergent feature based on interactions among many agents, including genes and environments
• Agents participate in a dynamic network and are not direct causes
• Network organized hierarchically and heterarchically into fields
• Fields are domains of relational order among agents
• Stronger relationships within fields, weaker relationships among fields
• Unique genome type provides initial conditions and capacity for change
• Context and time are key to understanding influence of genetic variation
[Diagram: the INDIVIDUAL on a TIME-SPACE CONTINUUM. Labels: UNIQUE GENOME TYPE (Initial Conditions); UNIQUE ENVIRONMENTAL HISTORY; PHYSIOLOGICAL FITNESS, HEALTH (NOW); FUTURE NORM OF REACTION; Risk of Disease. Fields shown: BLOOD PRESSURE REGULATION, HAEMOSTASIS, LIPID METABOLISM, CARBOHYDRATE METABOLISM.]
See: Zerba and Sing, 1993, Current Opinion in Lipidology 4: 152-162; Zerba et al., 2000, Human Genetics 107: 466-475 for more detail.

Complex Adaptive Systems Approach to PGx
[Diagram: Genes, Biomarkers, and Endpoints, with the relationships among them numbered 1 to 3 and each marked "?"]

Some Key Statistical Issues for Pharmacogenomics Studies Using Genetic Polymorphisms
• Gene/Polymorphism Selection
• Linkage Disequilibrium
• Admixture and Population Stratification
• Invariance
• Context Dependence
• Time

Gene/Polymorphism Selection
• Genome Scan
  • Genes not identified a priori
  • Genotyping
    • 25K - 500K polymorphisms genotyped for each subject (not practical yet)
  • DNA Pooling
    • 25K - 1.5 million polymorphisms
    • Case-control allele frequency differences for each polymorphism
• Candidate Genes

Candidate Genes
[Diagram: a candidate gene region and phenotype P. The region contains F, an unknown and unmeasured functional polymorphism, and $S_i$, one of numerous non-functional polymorphisms.]
• Assume that any association of $S_i$ with phenotype, P, is because of linkage disequilibrium between F and $S_i$:
  $P_{FS} = p_F p_S + D_{FS}$
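A minimal Python sketch of the linkage disequilibrium relationship on this slide, assuming the haplotype and allele frequencies are known (in practice they are estimated from data). The function names and example frequencies are hypothetical, and the squared correlation r^2 is a common standardization of D that the slide does not mention.

```python
def ld_coefficient(p_fs, p_f, p_s):
    """Return D_FS = P_FS - p_F * p_S."""
    return p_fs - p_f * p_s


def r_squared(d_fs, p_f, p_s):
    """Squared allelic correlation between F and S (not on the slide)."""
    return d_fs ** 2 / (p_f * (1 - p_f) * p_s * (1 - p_s))


# Hypothetical frequencies, chosen only for illustration.
p_f, p_s, p_fs = 0.3, 0.4, 0.2
d = ld_coefficient(p_fs, p_f, p_s)
print(f"D_FS = {d:.3f}, r^2 = {r_squared(d, p_f, p_s):.3f}")
```

When D_FS is zero the marker carries no information about F, which is why the candidate gene approach depends on the assumption stated above.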

Admixture
[Diagram: SNP alleles (+ and -) in Population I, Population II, and the admixed population]
• Population I: $p_+ = 0.8$
• Population II: $p_+ = 0.2$
• Admixed population: a = proportion of population I = 0.5; $p_+ = 0.5$

Consider two subpopulations, I and II.
For each subpopulation, there is linkage equilibrium between a disease allele, F, and a marker allele, S:
  $P_{FISI} = p_{FI} p_{SI}$; $P_{FIISII} = p_{FII} p_{SII}$; $D_{FISI} = D_{FIISII} = 0$.
In the admixed population (I + II), there is linkage disequilibrium between F and S:
  $P_{FS} = p_F p_S + a(1-a)(p_{FI} - p_{FII})(p_{SI} - p_{SII})$
The second term is the admixture linkage disequilibrium. Its factors are:
• $a(1-a)$: subpopulation proportions
• $(p_{FI} - p_{FII})$: disease allele frequency difference
• $(p_{SI} - p_{SII})$: marker allele frequency difference
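The admixture formula can be checked numerically. The sketch below, with hypothetical function names, computes the admixed haplotype frequency directly as a mixture of the two equilibrium subpopulations and compares the resulting disequilibrium with the closed-form term. The marker frequencies (0.8 and 0.2) and a = 0.5 follow the admixture slide; the disease allele frequencies are assumed for illustration.

```python
def admixture_ld(a, p_f1, p_f2, p_s1, p_s2):
    """Closed form: a(1-a)(pFI - pFII)(pSI - pSII)."""
    return a * (1 - a) * (p_f1 - p_f2) * (p_s1 - p_s2)


def admixed_ld_direct(a, p_f1, p_f2, p_s1, p_s2):
    """D in the mixture, computed as P_FS - p_F * p_S from first principles."""
    # Within each subpopulation, equilibrium gives P_FS = p_F * p_S.
    p_fs = a * p_f1 * p_s1 + (1 - a) * p_f2 * p_s2
    p_f = a * p_f1 + (1 - a) * p_f2
    p_s = a * p_s1 + (1 - a) * p_s2
    return p_fs - p_f * p_s


# a and marker frequencies from the slides; disease frequencies assumed.
args = dict(a=0.5, p_f1=0.8, p_f2=0.2, p_s1=0.8, p_s2=0.2)
print(admixture_ld(**args), admixed_ld_direct(**args))
```

Both routes give the same value, and the term vanishes whenever either allele has the same frequency in both subpopulations.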

Admixture and Population Stratification
• Admixture linkage disequilibrium dissipates quickly in a randomly mating population
• Common clinical trial feature: > 1 ethnic group
• Population stratification
  • Ethnicity is a confounder
  • Population stratification can create linkage disequilibrium just as admixture does, but the resulting associations are spurious
  • Type I or Type II error inflation

False-Positive Endpoint Association Example
• Ethnic group not considered in analysis
• Unbalanced design
  • Unequal numbers of each group: $a_I = 0.67$
• Marker allele: $p_+ = 0.8$ in ethnic group I; $p_+ = 0.2$ in ethnic group II
• Disease risk: $p_F = 0.8$ for ethnic group I; $p_F = 0.2$ for ethnic group II
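The example can be worked through as expected cell probabilities. This is a sketch under stated assumptions: the marker is treated at the allele level, "disease risk" is read as the probability of the endpoint, and marker and endpoint are independent within each ethnic group. Function names are hypothetical.

```python
def joint_table(p_marker, p_disease):
    """2x2 cell probabilities when marker and disease are independent."""
    return {
        ("+", "D"): p_marker * p_disease,
        ("+", "N"): p_marker * (1 - p_disease),
        ("-", "D"): (1 - p_marker) * p_disease,
        ("-", "N"): (1 - p_marker) * (1 - p_disease),
    }


def pooled_table(a1, t1, t2):
    """Mix two group tables, ignoring group membership."""
    return {k: a1 * t1[k] + (1 - a1) * t2[k] for k in t1}


def odds_ratio(t):
    """Marker-disease odds ratio for a 2x2 table of probabilities."""
    return (t[("+", "D")] * t[("-", "N")]) / (t[("+", "N")] * t[("-", "D")])


group1 = joint_table(0.8, 0.8)   # ethnic group I
group2 = joint_table(0.2, 0.2)   # ethnic group II
pooled = pooled_table(0.67, group1, group2)

print("within-group odds ratios:", odds_ratio(group1), odds_ratio(group2))
print("pooled odds ratio:", round(odds_ratio(pooled), 2))
```

Within each group the odds ratio is 1, yet pooling the groups yields an odds ratio above 4, so an analysis that ignores ethnic group would report a marker-endpoint association that neither group exhibits.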

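Population Genetic Structure and the Search for Functional Mutations: Quantitative Traits
[Figure: frequency distributions of a phenotype (biomarker) for SNP genotypes AA, Aa, and aa, annotated FREQUENCY and SCALE. Is the SNP the functional mutation?]
• FREQUENCY and SCALE contribute to inferences about SNP-phenotype associations
• Analysis of variance approach: $SSR = \sum_i f_i (\bar{Y}_i - \bar{Y})^2$, with $f_i$ the frequency and $\bar{Y}_i$ the mean phenotype of genotype $i$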
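A short Python sketch of the sum of squares above shows how frequency and scale each enter. It assumes the overall mean is the frequency-weighted mean of the genotype means; the function name and the genotype means (in arbitrary units) are hypothetical.

```python
def ssr(freqs, means):
    """Sum over genotypes of f_i * (Ybar_i - Ybar)^2."""
    ybar = sum(f * y for f, y in zip(freqs, means))
    return sum(f * (y - ybar) ** 2 for f, y in zip(freqs, means))


means = (190.0, 200.0, 210.0)          # hypothetical AA, Aa, aa means

common = (0.25, 0.50, 0.25)            # Hardy-Weinberg, p_A = 0.5
rare = (0.01, 0.18, 0.81)              # Hardy-Weinberg, p_A = 0.1
print("same scale, common allele:", ssr(common, means))
print("same scale, rare allele:  ", ssr(rare, means))

wider = (180.0, 200.0, 220.0)          # doubled differences between means
print("doubled scale, common allele:", ssr(common, wider))
```

The same genotype means give a smaller sum of squares when one allele is rare, and doubling the differences between means quadruples it.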

Population Stratification and Genotype Frequencies
• Stratification can result in decreased heterozygote frequencies relative to expectation: $P_{Aa} = 2 p_A p_a - 2 D_A$ ($D_A$ positive in example)
[Figure: genotype frequencies $P_{AA}$, $P_{Aa}$, and $P_{aa}$ at a SNP with alleles A and a, shown against allele frequencies $p_A$ and $p_a$ for ethnic group I, ethnic group II, and the average genotype frequencies]
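The heterozygote deficit can be reproduced by pooling two groups that are each in Hardy-Weinberg equilibrium. In this sketch the function names, the equal group proportions, and the allele frequencies (0.8 and 0.2) are assumptions for illustration. The expression $D_A = a(1-a)(p_{A,I} - p_{A,II})^2$ for a two-group mixture is a standard result that is not stated on the slide.

```python
def hwe(p_a):
    """Hardy-Weinberg genotype frequencies (AA, Aa, aa)."""
    q = 1 - p_a
    return (p_a * p_a, 2 * p_a * q, q * q)


def pooled_genotype_frequencies(a, p1, p2):
    """Average genotype frequencies over two equilibrium groups."""
    return tuple(a * x + (1 - a) * y for x, y in zip(hwe(p1), hwe(p2)))


def disequilibrium(a, p1, p2):
    """D_A for a two-group mixture: a(1-a)(p1 - p2)^2."""
    return a * (1 - a) * (p1 - p2) ** 2


a, p1, p2 = 0.5, 0.8, 0.2              # assumed values
p_a = a * p1 + (1 - a) * p2            # pooled allele frequency
d_a = disequilibrium(a, p1, p2)
observed = pooled_genotype_frequencies(a, p1, p2)

print("pooled heterozygotes:  ", observed[1])
print("2 pA pa - 2 D_A:       ", 2 * p_a * (1 - p_a) - 2 * d_a)
print("equilibrium expectation:", hwe(p_a)[1])
```

The pooled heterozygote frequency matches $2 p_A p_a - 2 D_A$ and falls short of the equilibrium expectation computed from the pooled allele frequency.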

• Population stratification can result in overestimation of quantitative phenotypic variation associated with genetic variation relative to Hardy-Weinberg equilibrium expectation ($D_A \to +$)
[Figure: sum of squares bias (-, 0, +) plotted against $D_A$ (-, 0, +)]
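A sketch of the bias, assuming additive genotype means that are the same in every stratum and genotype frequencies of the form $P_{AA} = p_A^2 + D_A$, $P_{Aa} = 2 p_A p_a - 2 D_A$, $P_{aa} = p_a^2 + D_A$. The function names and numerical values are hypothetical.

```python
def genotype_frequencies(p_a, d_a):
    """Genotype frequencies (AA, Aa, aa) with disequilibrium D_A."""
    q = 1 - p_a
    return (p_a * p_a + d_a, 2 * p_a * q - 2 * d_a, q * q + d_a)


def ssr(freqs, means):
    """Sum over genotypes of f_i * (Ybar_i - Ybar)^2."""
    ybar = sum(f * y for f, y in zip(freqs, means))
    return sum(f * (y - ybar) ** 2 for f, y in zip(freqs, means))


def ss_bias(p_a, d_a, means):
    """SSR with disequilibrium minus SSR under Hardy-Weinberg."""
    with_d = ssr(genotype_frequencies(p_a, d_a), means)
    under_hwe = ssr(genotype_frequencies(p_a, 0.0), means)
    return with_d - under_hwe


means = (190.0, 200.0, 210.0)          # hypothetical additive means
for d_a in (-0.09, 0.0, 0.09):
    print(f"D_A = {d_a:+.2f}  bias = {ss_bias(0.5, d_a, means):+.1f}")
```

Under these assumptions the bias takes the sign of $D_A$: a heterozygote deficit inflates the genotypic sum of squares and a heterozygote excess deflates it.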

Invariance, Context and Time: An Example from Apolipoprotein E Biology
• Molecular weight: 34 kD
• Synthesized in most organs
  • liver, brain, gonads, kidney, spleen, muscle
• Key physiological role in lipid transport
  • ligand for the LDL (ApoB-E) receptor
• Structural gene on chromosome 19
  • polymorphic with three common alleles

Allele   Amino acid 112   Amino acid 158
ε2       Cys              Cys
ε3       Cys              Arg
ε4       Arg              Arg

[Diagram: the gene drawn 5' to 3' with the two SNPs marked]
Note: combination of SNPs involved

Invariance
[Figure: cholesterol (mg/dL) by ApoE allele (ε2, ε3, ε4) in five samples: Nancy, France (N = 223); Rochester, MN, USA (N = 226); Quebec, Canada (N = 201); Munster, Germany (N = 1000); Helsinki, Finland (N = 207)]
From Sing et al. (1996) Genetic architecture of common multifactorial diseases, pp. 211-232. In: Chadwick and Cardew (eds.) Variation in the human genome, Ciba Foundation Symposium 197, John Wiley & Sons, New York.

Context and Time 2 A Changes in ApoE Additive Genetic Variance with Age Rochester, MN Males, N=1035 Bootstrap Significance Tests 16 70 60 -4 12 50 8 40 Age Window Midpoint Variance x 10 (years) 30 4 20 0 10 10 20 30 40 50 60 70 10 20 30 40 50 60 70 Age Window Midpoint (years) + 0.05 > P <0.10 P < 0.05 From Zerba et al. 1996, Genetics 143: 463-478.

Where Do We Go From Here? Some Additional Statistical Challenges
• Study design in genetic setting
• Genetic stratification
• Genomic control
• Ascertainment bias correction in choice of which polymorphisms to study
• Contexts/Interactions: which ones are important?
• New analytical methods needed
• Combinations of SNPs within and among genes and environments may be involved
• Haplotype Reconstruction
• Combinatorial Partitioning
• Missing genotypes for individual polymorphisms
• Sampling vs technical variability in DNA pooling studies
• Multiplicity: p-value adjustment not a trivial problem
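To give a sense of scale for the multiplicity item, the sketch below applies two textbook adjustments, Bonferroni and Holm, neither of which is named in the slides. The function names and p-values are hypothetical. Both methods ignore the correlation among polymorphisms induced by linkage disequilibrium, so they are conservative, which is one reason the adjustment is not trivial.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: min(1, m * p)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * pvals[i])
        running_max = max(running_max, candidate)  # keep monotone
        adjusted[i] = running_max
    return adjusted


# Per-test threshold for a 500K-polymorphism scan at overall alpha = 0.05.
threshold = 0.05 / 500_000
print("per-test threshold:", threshold)

pvals = [0.01, 0.04, 0.03]             # hypothetical p-values
print("Bonferroni:", bonferroni(pvals))
print("Holm:      ", holm(pvals))
```

Holm is never more conservative than Bonferroni, but with hundreds of thousands of tests both demand very small unadjusted p-values.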
