Buried treasures: Old statistics in new contexts
“If I have seen further it is by standing on the shoulders of giants” - Isaac Newton
One form of the past effect
You are dealing with a statistical problem in a special context. You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.
Three vignettes
V1: Genomics meets sample surveys (methodology)
V2: Bootstrapping and rank statistics (theory)
V3: Cancer genetics and stochastic geometry (application)
V1: Genomics meets sample surveys
Context: Second-order gene-set enrichment analysis
Buried treasure: J.W. Tukey, 1950, Some sampling simplified. J. Amer. Statist. Assoc., 45, 501-519.
John Tukey
Context
D Pyeon, MA Newton, PF Lambert, JA den Boon, S Sengupta, CJ Marsit, CD Woodworth, JP Connor, TH Haugen, EM Smith, KT Kelsey, LP Turek and P Ahlquist (2007). Fundamental Differences in Cell Cycle Deregulation in Human Papillomavirus Positive and Human Papillomavirus Negative Head/Neck and Cervical Cancers. Cancer Research, 67, 4605-4619.
MA Newton, X Ma, D Sarkar, D Pyeon, and P Ahlquist (2007). Second order enrichment analysis of microarray expression data reveals gene sets with heterogeneous activation states. Submitted.
[Figure: Slice of expression data from Pyeon et al. 2007. Rows: genes (a few). Columns: tissue samples, HPV- and HPV+.]
[Figure: Fold changes between HPV+ and HPV- (all genes). x-axis: log2 [ HPV+ / HPV- ], from -2 to 2. y-axis: density.]
The post-processing problem
expression results + exogenous biology
Exogenous biology
B = { c : c = {genes with specific property} }
e.g.
- Gene Ontology (GO)
- Kyoto Encyclopedia of Genes and Genomes (KEGG)
In the HPV example, cell cycle may be an interesting gene set
- Excess differential expression in both directions
- Large sample variance (largest in KEGG, GO)
Expression results:
Gene set: B
Gene set variance:
Standardized statistic:
Connection: C indexes a simple random sample of genes, i.e. finite population sampling
Centering: ??
Scaling:
We get, following Tukey's 1950 calculation involving "K" functions (set-level statistics whose expected value equals the same statistic computed on the whole population):
b1   b2
 1    0
-3    1
-4    0
12   -2
-6    1
where
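The centering and scaling idea can be sketched in code. This is a minimal illustration, not the slides' method: it assumes the gene-level scores are log2 fold changes held in a dictionary, uses the fact that under simple random sampling without replacement the sample variance is unbiased for the population variance (divisor N-1) as the centering, and replaces the closed-form scaling from Tukey's calculation with a Monte Carlo estimate. The function name and arguments are hypothetical.

```python
import random
import statistics


def standardized_set_variance(scores, gene_set, n_draws=2000, seed=0):
    """Standardize a gene set's sample variance against random sets of equal size.

    scores   : dict mapping gene id -> score (assumed: log2 fold change)
    gene_set : iterable of gene ids (the set c)
    Centering is exact: under simple random sampling without replacement,
    E[s^2] equals the population variance with divisor N-1.
    Scaling is a Monte Carlo stand-in (assumption) for the closed form.
    """
    rng = random.Random(seed)
    population = list(scores.values())
    members = [scores[g] for g in gene_set if g in scores]
    m = len(members)
    if m < 2 or m >= len(population):
        raise ValueError("gene set must have at least 2 genes and be a proper subset")

    observed = statistics.variance(members)      # set-level sample variance
    center = statistics.variance(population)     # exact null expectation
    null_draws = [
        statistics.variance(rng.sample(population, m))  # SRS without replacement
        for _ in range(n_draws)
    ]
    scale = statistics.stdev(null_draws)         # simulated null standard deviation
    return (observed - center) / scale
```

A large positive value flags a set whose members are more dispersed than a random set of the same size, which is the pattern described for the cell cycle set in the HPV example.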
V2: Bootstrapping and rank statistics
Context: Mason and Newton, 1992, A rank statistics approach to the consistency of a general bootstrap. Ann. Statist., 20, 1611-1624.
Buried treasure: J. Hajek, 1961, Some extensions of the Wald-Wolfowitz-Noether theorem. Ann. Math. Statist., 32, 506-523.
Jaroslav Hajek
iid Data:
CLT:
Bootstrap mean: (multinomials)
Bootstrap CLT:
Generalized bootstrap: exchangeable weights
Mason and Newton asked: what is the CLT for this case?
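For orientation, the standard statements behind these labels can be written out as follows. The notation is an assumption (the symbols X, M and W are not taken from the slides), and the generalized version is only one common way of writing a weighted bootstrap mean.

```latex
\begin{align*}
&\text{iid data:} && X_1,\dots,X_n \ \text{iid},\quad \mathrm{E}X_i=\mu,\quad \mathrm{Var}\,X_i=\sigma^2<\infty \\
&\text{CLT:} && \sqrt{n}\,(\bar X_n-\mu)\ \xrightarrow{d}\ N(0,\sigma^2) \\
&\text{Bootstrap mean:} && \bar X_n^{*}=\frac{1}{n}\sum_{i=1}^{n} M_{n,i}\,X_i,\quad
   (M_{n,1},\dots,M_{n,n})\sim\mathrm{Multinomial}\bigl(n;\tfrac1n,\dots,\tfrac1n\bigr) \\
&\text{Bootstrap CLT:} && \sqrt{n}\,(\bar X_n^{*}-\bar X_n)\,\big|\,X_1,\dots,X_n\ \xrightarrow{d}\ N(0,\sigma^2) \\
&\text{Generalized bootstrap:} && \bar X_n^{W}=\frac{1}{n}\sum_{i=1}^{n} W_{n,i}\,X_i,\quad
   (W_{n,1},\dots,W_{n,n})\ \text{exchangeable}
\end{align*}
```

The multinomial case is the special case in which the exchangeable weights are resampling counts.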
Consider two triangular arrays of numbers
For a random permutation
And the sum
Notes about the sum:
- Linear rank statistic; studied in nonparametrics.
- Hajek 1961 gives weak conditions for asymptotic normality (AN)
Back to the general bootstrap problem:
Now condition on both data and weights
Key fact: given these, the remaining randomness is a random permutation
This is precisely a linear rank statistic, and Hajek (1961) gives general conditions for its asymptotic normality.
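A small numerical check may make the permutation structure concrete. The sketch below assumes the linear rank statistic has the usual form S = sum_i a[i] * b[R_i] with (R_1, ..., R_n) a uniformly random permutation, and compares the well-known exact permutation mean and variance with brute-force enumeration. Reading a as the centered data and b as the observed weight values is an illustration of the conditioning argument, not the paper's notation; all function names are hypothetical.

```python
import itertools
import statistics


def permutation_moments(a, b):
    """Exact mean and variance of S = sum_i a[i] * b[R_i], R a uniform random permutation.

    mean = n * mean(a) * mean(b)
    var  = sum (a_i - mean(a))^2 * sum (b_j - mean(b))^2 / (n - 1)
    """
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    mean = n * a_bar * b_bar
    var = (
        sum((x - a_bar) ** 2 for x in a)
        * sum((y - b_bar) ** 2 for y in b)
        / (n - 1)
    )
    return mean, var


def brute_force_moments(a, b):
    """Same moments by enumerating all n! permutations (small n only)."""
    values = [
        sum(a[i] * b[r] for i, r in enumerate(perm))
        for perm in itertools.permutations(range(len(a)))
    ]
    return statistics.fmean(values), statistics.pvariance(values)


# Illustration: a = centered data, b = weight values (here resampling counts).
data = [2.0, 3.5, 1.0, 4.0, 6.5]
centered = [x - statistics.fmean(data) for x in data]
weights = [2, 0, 1, 1, 1]   # one possible multinomial draw, sums to n
exact = permutation_moments(centered, weights)
enumerated = brute_force_moments(centered, weights)
```

Because the centered data sum to zero, the permutation mean is zero here, and the variance factorizes into a data part and a weight part. That factorization is what makes conditions stated separately on the two arrays natural.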
V3: Cancer genetics and stochastic geometry
Context: Cellular events during tumor initiation, intestinal cancer
Buried treasure: P. Armitage, 1949, An overlap problem arising in particle counting. Biometrika, 36, 257-266.
Peter Armitage
Context
AT Thliveris, RB Halberg, L Clipson, WF Dove, R Sullivan, MK Washington, S Stanhope, and MA Newton (2005). Polyclonality of familial murine adenomas: Analyses of mouse chimeras with low tumor multiplicity suggest short-range interactions. PNAS, 102, 6960-6965.
MA Newton, L Clipson, AT Thliveris and RB Halberg (2006). A statistical test of the hypothesis that polyclonal intestinal tumors arise by random collision of initiated clones. Biometrics, 62, 721-7.
MA Newton (2006). On estimating the polyclonal fraction in lineage marker studies of tumor origin. Biostatistics, 7, 503-14.
Monoclonal theory of tumor origin
- genetic defect appears in a cell
- aberrant cell divides and persists
Aggregation chimeras provide data on clonality.
B6 Apc Min/+ Mom1 R/R <--> B6 Apc Min/+ Mom1 R/R Rosa26/+ Heterotypic tumor!
Many heterotypic tumors ... but why?
- clonal cooperation (recruitment; selection)
- random collision
Key parameters:
- # initiated clones
- collision distance
Induced R.V.'s (one mouse):
- # isolated clones
- # doublets
- # triplets
- # tumors
Intractable distribution!!
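Although the joint distribution of these counts is intractable, they are easy to simulate. The sketch below is an illustration under stated assumptions, not the model from the cited papers: initiated clones are placed uniformly at random in a unit square, two clones collide when their centers are closer than the collision distance, a tumor is a connected group of colliding clones, and edge effects are ignored. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def clump_sizes(points, delta):
    """Group points into clumps (connected components under distance < delta).

    Returns a Counter mapping clump size -> number of clumps of that size.
    """
    n = len(points)
    parent = list(range(n))          # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) < delta:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj         # merge the two clumps

    members_per_root = Counter(find(i) for i in range(n))
    return Counter(members_per_root.values())


def simulate_mouse(n_clones, delta, rng):
    """One simulated mouse: counts induced by n_clones and collision distance delta."""
    points = [(rng.random(), rng.random()) for _ in range(n_clones)]
    by_size = clump_sizes(points, delta)
    return {
        "isolated": by_size[1],
        "doublets": by_size[2],
        "triplets": by_size[3],
        "tumors": sum(by_size.values()),
    }


# Usage: repeat over many simulated mice to approximate the induced distribution.
rng = random.Random(1)
draws = [simulate_mouse(n_clones=30, delta=0.05, rng=rng) for _ in range(200)]
```

Simulation of this kind only approximates the moments of the induced counts; an analytical result is what allows the inference to be carried out without it.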
But thanks to Armitage, 1949, ... where ...
Closing the inference loop
• Lineage marking
• Unknown N's
• Extra Poisson variation
One form of the past effect
You are dealing with a statistical problem in a special context. You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.
# citations of key paper
John Tukey (1915-2000): 8
Jaroslav Hajek (1926-1974): 43
Peter Armitage (1924-present): 9