Simplified Sequential Multiple Decision Procedures (SSMDP) For Genome Scans

Q. Zhang and M.A. Province
Division of Biostatistics, Washington University School of Medicine
JSM, Minneapolis, August 7, 2005

Multiple Comparison Problems In Genome Scan
  Large number of genetic markers in genome-wide linkage scans
  Large number of statistical tests
  Inflation of type I error
  Test-wise and experiment-wise (genome-wide) errors
  Stringent α-level or corrected p-value (Bonferroni, FDR, etc.)
  Balance between false positive and false negative rates
  New methods? SMDP
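The two corrections named on this slide can be sketched in a few lines. This is a minimal illustration, assuming "FDR" refers to the Benjamini-Hochberg step-up procedure (the slide does not say which FDR method was used); the p-values are hypothetical and are not taken from the talk.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (assumed FDR method, see lead-in)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five markers (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.60]
bon = bonferroni(pvals)
bh = benjamini_hochberg(pvals)

# Number of markers declared significant at the 0.05 level under each rule.
n_raw = sum(p <= 0.05 for p in pvals)
n_bon = sum(p <= 0.05 for p in bon)
n_bh = sum(p <= 0.05 for p in bh)
```

With these made-up values the unadjusted rule flags four markers while both corrections flag two, which is the tradeoff between false positives and false negatives that the slide points to.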

Idea 2: Simultaneous Test
  [Diagram: Independent Test (SNP1, SNP2, …, SNPn, each with its own test, Test 1 … Test n) versus Simultaneous Test (SNP1, SNP2, …, SNPn tested together)]
  SMDP divides populations into two groups and guarantees that there is a real difference between the two groups with probability P*.
  Tradeoff between false positive rate and false negative rate.

Idea 1: Sequential
  Start from a small sample size
  Increase sample size one by one
  Do analysis at each time
  Stop when conclusion is reached
  [Diagram: sample size grows n0, n0+1, n0+2, …, n0+i]
  Plan experiments in next stage and save resources
  Use residual/extra data to do validation
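The "add one observation, analyze, stop when a conclusion is reached" loop can be illustrated with Wald's sequential probability ratio test. This is a sketch of the classical SPRT for a normal mean with known variance, not the SSMDP procedure of this talk; the function name, the hypotheses and the default error rates are assumptions made for illustration.

```python
import math


def sprt_normal_mean(observations, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1, known sigma.

    Returns (decision, n_used), where n_used is the number of
    observations consumed before stopping.
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this bound
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this bound
    llr = 0.0  # cumulative log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        # Log-likelihood ratio contribution of one normal observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2.0) / sigma ** 2
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "undecided", n  # data ran out before a boundary was crossed


# Hypothetical data: every observation equals 1.0.
decision, n_used = sprt_normal_mean([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0)
```

Here the test stops after 6 of the 20 available observations, so the remaining 14 are left over, which is the kind of residual data the slide proposes to use for validation.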

History of SMDP
  Sequential Probability Ratio Test (SPRT), Wald 1947
  SMDP, Bechhofer, Kiefer, Sobel, 1968
  SMDP Haseman-Elston Method (ASP), Province 2000

Fixed-sample regression of 5 phenotypes on genotypes of 188 SNPs

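SMDP for GOTGRANTS and 86 SNPs of region 3
  Validation: 100 80 59 7 2 1 1 1
  Total: 73 19 50 8
  FDR (Reg. p, Bon., FDR, SMDP), R3SNP69 assumed the only real one: 0.960 0.928 0.950 0.672
  FDR (Reg. p, Bon., FDR, SMDP), R3SNP69, R3SNP60 and R3SNP70 assumed real: 0.879 0.788 0.851 0.052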

Abstract
First, five phenotypes (NPUBS, NDRIVEL, PCTDRIVEL, RIVALSIDE and GOTGRANTS) of 557 subjects were regressed on the genotypes of 188 SNPs from 3 regions (genotypes were coded as 1, 1.5 or 2). Regressions of GOTGRANTS on some SNPs of region 3 were significant. To get more balanced data and reduce computation time, only 86 SNPs of region 3 were used in the SMDP analyses. For comparison, fixed-sample regressions were also run for the 86 SNPs, using the unadjusted p-value, the Bonferroni-corrected p-value and the FDR-corrected p-value, respectively. R3SNP69 was found significant in all replications by all methods. If R3SNP69 is assumed to be the only truly significant SNP, the false discovery rates (the ratios of false positive findings to total positive findings) of fixed-sample regression (Reg. p, Bon. and FDR) and SMDP are 0.960, 0.928, 0.950 and 0.672, respectively. Besides R3SNP69, R3SNP60 and R3SNP70 were also detected by SMDP in most replications. Correlation analyses showed that the three SNPs are significantly correlated with each other, which might result from linkage between them. If R3SNP69, R3SNP60 and R3SNP70 are all assumed to be truly significant, the false discovery rates of fixed-sample regression (Reg. p, Bon. and FDR) and SMDP are 0.879, 0.788, 0.851 and 0.052, respectively. Under either assumption SMDP has the lowest false discovery rate of the four methods, and by far the lowest under the second. Another advantage of SMDP is that it needed an average sample size of only 325 to reach its decisions on this data set of 557 subjects.
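The false discovery rate used in the abstract (false positive findings divided by total positive findings) depends on which SNPs are assumed to be real. A minimal sketch of that calculation follows, assuming the rate is averaged over replications; the talk does not state the averaging rule. The detection lists are hypothetical, and R3SNP12 is a made-up marker name used only to stand in for a false positive.

```python
def false_discovery_proportion(detected, assumed_true):
    """False positives divided by total positives for one replication."""
    detected = set(detected)
    if not detected:
        return 0.0  # no findings, so no false findings
    false_positives = detected - set(assumed_true)
    return len(false_positives) / len(detected)


def mean_fdr(replications, assumed_true):
    """Average the false discovery proportion over replications (assumed rule)."""
    total = sum(false_discovery_proportion(d, assumed_true) for d in replications)
    return total / len(replications)


# Hypothetical detections from two replications (not the talk's data).
replications = [
    ["R3SNP69", "R3SNP60", "R3SNP12"],
    ["R3SNP69", "R3SNP70"],
]

# Scenario 1: only R3SNP69 is assumed to be real.
fdr_one = mean_fdr(replications, {"R3SNP69"})

# Scenario 2: R3SNP69, R3SNP60 and R3SNP70 are all assumed to be real.
fdr_three = mean_fdr(replications, {"R3SNP69", "R3SNP60", "R3SNP70"})
```

Treating the two correlated neighbors as real rather than false findings lowers the computed rate, which is why the abstract reports the results under both assumptions.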

Thanks!
