Explore the application of old statistical methods in new and interesting contexts: how genomics meets sample surveys, the theory connecting bootstrapping and rank statistics, and an application of stochastic geometry to cancer genetics.
“If I have seen further it is by standing on the shoulders of giants” - Isaac Newton
You are dealing with a statistical problem in a special context. You solve it by recognizing a new interpretation of an old, interesting, but uncelebrated result that was developed in a completely different context. This is one form of the past effect.
Three vignettes
V1: Genomics meets sample surveys (methodology)
V2: Bootstrapping and rank statistics (theory)
V3: Cancer genetics and stochastic geometry (application)
V1: Genomics meets sample surveys
Context: second-order gene-set enrichment analysis
Buried treasure: J.W. Tukey (1950). Some sampling simplified. J. Amer. Statist. Assoc., 45, 501-519.
Context:
D Pyeon, MA Newton, PF Lambert, JA den Boon, S Sengupta, CJ Marsit, CD Woodworth, JP Connor, TH Haugen, EM Smith, KT Kelsey, LP Turek and P Ahlquist (2007). Fundamental Differences in Cell Cycle Deregulation in Human Papillomavirus Positive and Human Papillomavirus Negative Head/Neck and Cervical Cancers. Cancer Research, 67, 4605-4619.
MA Newton, X Ma, D Sarkar, D Pyeon, and P Ahlquist (2007). Second order enrichment analysis of microarray expression data reveals gene sets with heterogeneous activation states. Submitted.
[Figure: slice of expression data from Pyeon et al. (2007); rows: a few genes; columns: tissue samples, HPV- and HPV+.]
[Figure: density of fold changes between HPV+ and HPV- samples, all genes; x-axis: log2 [HPV+ / HPV-], from -2 to 2.]
The post-processing problem: expression results + exogenous biology.
Exogenous biology: B = { c : c = {genes with a specific property} }, e.g.
- Gene Ontology (GO)
- Kyoto Encyclopedia of Genes and Genomes (KEGG)
In the HPV example, the cell-cycle gene set may be interesting: it shows excess differential expression in both directions and a large sample variance (the largest among the KEGG and GO sets).
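Expression results: gene-level statistics x_g, g = 1, ..., G (e.g., log2 fold changes between HPV+ and HPV- samples).
Gene set: c in B, containing m genes.
Gene set variance: v_c = (1/(m - 1)) Σ_{g in c} (x_g - x̄_c)², where x̄_c is the mean of x_g over the genes in c.
Standardized statistic: z_c = (v_c - centering) / scaling.
Connection: c indexes a simple random sample of genes, i.e., finite population sampling. Centering: ?? Scaling: ??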
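A minimal sketch of a standardized within-set variance, assuming the gene-level statistics sit in a NumPy array and a gene set is given as an array of gene indices. The centering uses the fact that, under simple random sampling of genes, the sample variance is unbiased for the population variance with divisor G - 1; the scaling here is a Monte Carlo estimate from random sets of the same size, swapped in for illustration in place of Tukey's exact moment calculation. The function name, its parameters and the simulated data are hypothetical.

```python
import numpy as np

def standardized_set_variance(x, gene_set, n_draws=5000, seed=None):
    """Standardize the within-set sample variance of gene-level statistics.

    x        : 1-D array, one statistic per gene (e.g. log2 fold change)
    gene_set : integer indices of the genes in the set c
    Scaling is a Monte Carlo estimate (an illustrative substitute for exact
    finite-population moments) from random sets of the same size.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    m = len(gene_set)
    v_c = np.var(x[gene_set], ddof=1)  # observed within-set sample variance

    # Centering: under simple random sampling the sample variance is unbiased
    # for the population variance computed with divisor G - 1.
    center = np.var(x, ddof=1)

    # Scaling: spread of the sample variance over random sets of size m,
    # drawn without replacement from all G genes.
    null = [np.var(rng.choice(x, size=m, replace=False), ddof=1) for _ in range(n_draws)]
    scale = np.std(null, ddof=1)
    return (v_c - center) / scale

# Hypothetical data: 2000 genes; genes 0-49 form a set with excess
# differential expression in both directions (25 up, 25 down).
rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.5, size=2000)
x[:25] += 1.5
x[25:50] -= 1.5

z = standardized_set_variance(x, np.arange(50), seed=1)
z_null = standardized_set_variance(x, np.arange(1000, 1050), seed=2)
print(round(z, 1), round(z_null, 1))
```

A set that is heterogeneous (some genes up, some down) can have a near-zero mean fold change yet a large standardized variance, which is why a second-order statistic can flag sets that a mean-based test would miss.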
Following Tukey's (1950) calculation involving "K" functions, we get set-level statistics whose expected value under simple random sampling of genes equals the same statistic computed on the whole population.
Coefficients:
b1    b2
 1     0
-3     1
-4     0
12    -2
-6     1
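The unbiasedness property can be checked by brute force on a tiny hypothetical population: simple random sampling gives every size-m set the same probability, so averaging the within-set sample variance over all sets gives its exact expected value, which should match the sample variance (divisor G - 1) of the whole population. The population values below are made up, and full enumeration is only feasible for very small G; closed-form moment calculations such as Tukey's are what make realistic G tractable.

```python
from itertools import combinations
from statistics import fmean, pvariance, variance

# Hypothetical finite population of G = 8 gene-level statistics
population = [0.3, -1.2, 2.5, 0.0, 0.8, -0.4, 1.9, -2.1]
m = 3  # gene-set size

# Within-set sample variance (divisor m - 1) for every possible set of size m
set_variances = [variance(s) for s in combinations(population, m)]

# Each size-m set is equally likely, so the exact centering and scaling are
# the mean and (population) standard deviation over all sets.
centering = fmean(set_variances)
scaling = pvariance(set_variances) ** 0.5

# The expected set-level statistic equals the same statistic on the population
print(centering, variance(population), scaling)
```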
V2: Bootstrapping and rank statistics
Context: Mason and Newton (1992). A rank statistics approach to the consistency of a general bootstrap. Ann. Statist., 20, 1611-1624.
Buried treasure: J. Hájek (1961). Some extensions of the Wald-Wolfowitz-Noether theorem. Ann. Math. Statist., 32, 506-523.
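iid data: X_1, ..., X_n with mean μ and variance σ².
CLT: √n (X̄_n - μ) ⇒ N(0, σ²).
Bootstrap mean: X̄*_n = (1/n) Σ_i M_{n,i} X_i, where the multinomials (M_{n,1}, ..., M_{n,n}) ~ Multinomial(n; 1/n, ..., 1/n) count how often each X_i is resampled.
Bootstrap CLT: conditionally on the data, √n (X̄*_n - X̄_n) ⇒ N(0, σ²) for almost every data sequence.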
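A small simulation of the ordinary bootstrap written in multinomial-weight form, assuming NumPy and hypothetical skewed data. Each row of counts is one Multinomial(n; 1/n, ..., 1/n) draw, and the bootstrap mean is the weighted average (1/n) Σ M_i X_i; the spread of √n (X̄* - X̄) should be close to the plug-in standard deviation of the data, as the bootstrap CLT suggests. The sample size, number of replicates and seed are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.exponential(scale=1.0, size=n)  # hypothetical iid data (skewed)
xbar = x.mean()

B = 4000
# counts[b, i] = number of times X_i appears in bootstrap resample b
counts = rng.multinomial(n, np.full(n, 1.0 / n), size=B)
boot_means = counts @ x / n  # (1/n) * sum_i M_i X_i for each resample

# Bootstrap CLT: given the data, sqrt(n) (boot mean - xbar) is roughly N(0, sigma_hat^2)
z = np.sqrt(n) * (boot_means - xbar)
print(z.mean(), z.std(), x.std())
```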
Generalized bootstrap: replace the multinomial counts with exchangeable weights W_{n,1}, ..., W_{n,n}. Mason and Newton asked: what is the CLT in this case?
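Consider two triangular arrays of numbers, a_{n,1}, ..., a_{n,n} and b_{n,1}, ..., b_{n,n}; a random permutation (π(1), ..., π(n)) of {1, ..., n}, uniform over all n! orderings; and the sum S_n = Σ_i a_{n,i} b_{n,π(i)}.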
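The first two moments of S_n under a uniform random permutation have a simple closed form: E S_n = n ā b̄ and Var S_n = Σ(a_i - ā)² Σ(b_j - b̄)² / (n - 1). The sketch below checks these standard permutation moments against full enumeration of all n! permutations for small, made-up arrays a and b. It does not check asymptotic normality, which is what Hájek's conditions address.

```python
from itertools import permutations
from statistics import fmean, pvariance

def linear_rank_stat(a, b, perm):
    """S_n = sum_i a_i * b_{perm(i)}."""
    return sum(ai * b[p] for ai, p in zip(a, perm))

def exact_moments(a, b):
    """Mean and variance of S_n when perm is uniform over all n! permutations."""
    n = len(a)
    a_bar, b_bar = fmean(a), fmean(b)
    mean = n * a_bar * b_bar
    ssa = sum((ai - a_bar) ** 2 for ai in a)
    ssb = sum((bj - b_bar) ** 2 for bj in b)
    return mean, ssa * ssb / (n - 1)

a = [1.0, 2.0, 4.0, 7.0, 11.0]  # hypothetical array a_{n,i}
b = [0.5, -1.0, 3.0, 2.0, 0.0]  # hypothetical array b_{n,i}

# Enumerate S_n over every permutation (each equally likely)
values = [linear_rank_stat(a, b, p) for p in permutations(range(len(a)))]
mean, var = exact_moments(a, b)
print(mean, fmean(values))
print(var, pvariance(values))
```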
Notes about S_n:
- It is a linear rank statistic, studied in nonparametrics.
- Hájek (1961) gives weak conditions for its asymptotic normality (AN).
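Back to the general bootstrap problem: now condition on both the data and the weights. Key fact: by exchangeability, (W_{n,1}, ..., W_{n,n}) has the same distribution as (W_{n,π(1)}, ..., W_{n,π(n)}) for an independent random permutation π. Conditionally, the bootstrap mean (1/n) Σ_i X_i W_{n,π(i)} is therefore precisely a linear rank statistic, and Hájek (1961) gives general conditions for its asymptotic normality.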
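The permutation representation can be illustrated numerically, assuming NumPy. As one example of exchangeable weights we use n times a Dirichlet(1, ..., 1) vector (the Bayesian bootstrap), chosen purely for illustration. Holding the data and the set of weight values fixed, the weights are attached to the data points by a uniform random permutation, so the bootstrap mean is a linear rank statistic with a_i = X_i / n and b_j = W_j, and its simulated permutation moments should match the exact formulas. Data, sample sizes and seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = rng.normal(size=n)             # hypothetical data
w = n * rng.dirichlet(np.ones(n))  # one draw of exchangeable weights summing to n

# Condition on x and on the weight values; only their assignment is random.
R = 20000
perm_means = np.array([x @ rng.permutation(w) / n for _ in range(R)])

# Exact permutation moments of the linear rank statistic (1/n) sum_i x_i w_{pi(i)}
exact_mean = x.mean() * w.mean()
exact_var = ((x - x.mean()) ** 2).sum() * ((w - w.mean()) ** 2).sum() / ((n - 1) * n**2)
print(exact_mean, perm_means.mean())
print(exact_var, perm_means.var())
```

Normalizing by these conditional moments is what turns the bootstrap CLT question into a question about the asymptotic normality of a linear rank statistic.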
V3: Cancer genetics and stochastic geometry
Context: cellular events during tumor initiation in intestinal cancer
Buried treasure: P. Armitage (1949). An overlap problem arising in particle counting. Biometrika, 36, 257-266.
Context:
AT Thliveris, RB Halberg, L Clipson, WF Dove, R Sullivan, MK Washington, S Stanhope, and MA Newton (2005). Polyclonality of familial murine adenomas: Analyses of mouse chimeras with low tumor multiplicity suggest short-range interactions. PNAS, 102, 6960-6965.
MA Newton, L Clipson, AT Thliveris and RB Halberg (2006). A statistical test of the hypothesis that polyclonal intestinal tumors arise by random collision of initiated clones. Biometrics, 62, 721-727.
MA Newton (2006). On estimating the polyclonal fraction in lineage marker studies of tumor origin. Biostatistics, 7, 503-514.
Monoclonal theory of tumor origin: a genetic defect appears in a cell; the aberrant cell divides and persists.
Aggregation chimeras provide data on clonality.
[Figure: aggregation chimera of B6 Apc^Min/+ Mom1^R/R and B6 Apc^Min/+ Mom1^R/R Rosa26/+ cells, showing a heterotypic tumor.]
Many heterotypic tumors … but why? Candidate explanations: clonal cooperation (recruitment; selection) or random collision.
Key parameters: number of initiated clones; collision distance.
Induced random variables (one mouse): number of isolated clones, number of doublets, number of triplets, ..., number of tumors.
Intractable distribution!
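When a joint distribution is intractable, simulation is a natural way to explore it. The sketch below is a toy random-collision model under stated assumptions, not the model of the cited papers: N initiated clones are points placed uniformly on a unit torus (to avoid edge effects), two clones collide when they are closer than the collision distance d, and a tumor is a cluster of clones linked by such collisions. It tallies isolated clones, doublets, triplets and tumors. On the torus with d ≤ 1/2, a clone is isolated exactly when none of the other N - 1 clones falls in its disc of area πd², so E[# isolated] = N (1 - πd²)^(N-1); this elementary first-moment check is not Armitage's result. All parameter values are hypothetical.

```python
import numpy as np
from collections import Counter

def cluster_sizes(points, d):
    """Counter mapping cluster size -> number of clusters, linking points closer than d."""
    n = len(points)
    diff = np.abs(points[:, None, :] - points[None, :, :])
    diff = np.minimum(diff, 1.0 - diff)        # wrap-around distance on the unit torus
    close = (diff ** 2).sum(axis=-1) < d ** 2
    parent = list(range(n))                    # union-find over colliding pairs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(np.triu(close, k=1))):
        parent[find(i)] = find(j)
    return Counter(Counter(find(i) for i in range(n)).values())

rng = np.random.default_rng(4)
N, d, reps = 40, 0.05, 2000  # hypothetical: clones per mouse, collision distance, mice
isolated, doublets, triplets, tumors = [], [], [], []
for _ in range(reps):
    counts = cluster_sizes(rng.random((N, 2)), d)
    isolated.append(counts[1])
    doublets.append(counts[2])
    triplets.append(counts[3])
    tumors.append(sum(counts.values()))

expected_isolated = N * (1 - np.pi * d ** 2) ** (N - 1)
print(np.mean(isolated), expected_isolated)
print(np.mean(doublets), np.mean(triplets), np.mean(tumors))
```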
But thanks to Armitage (1949), approximate expressions for the expected counts are available.
Closing the inference loop
• Lineage marking
• Unknown N's
• Extra-Poisson variation
Name              Lifespan       # citations of key paper
John Tukey        1915-2000      8
Jaroslav Hájek    1926-1974      43
Peter Armitage    1924-present   9