1. Introduction to Statistical Inference J. Verducci
MBI Summer Workshop
August, 2005
2. Recommended Text Fred L. RAMSEY and Daniel W. SCHAFER.
The Statistical Sleuth:
A Course in Methods of Data Analysis
Belmont, CA: Duxbury, 2002, xxvi + 742 pp.,
$97.95 (H + CD),
ISBN: 0-534-38670-9.
3. Outline Underlying philosophies about data
Basics of probability and random variables
Estimating a population proportion
Inferring a difference between two distributions
Assumptions about distributional forms
Normal Theory
Nonparametrics
Hypothesis Testing
t-test
Mann-Whitney-Wilcoxon test
Multiple Comparisons
Bonferroni
False Discovery Rate
4. Affymetrix MAS 5.0 Expression Set
5. Expression Data Matrix: 30,000 genes × 30 patients
6. Philosophy Frequentist: Observed data $X$ is an imperfect representation of an underlying, idealized, fixed truth $\theta$.
Law of Large Numbers: When experiments are repeated faithfully, the average of the observations comes closer to their idealization.
Bayesian: Observed data $X$ is fixed, and the unknown generating parameter $\theta$ is random.
Certainty about $\theta$ depends on both the empirical information $X$ and prior knowledge about $\theta$.
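To see the Law of Large Numbers at work under the frequentist view, here is a minimal Python simulation. It assumes a hypothetical fixed truth $\theta = 0.3$ and 0/1 (success/failure) observations; the function name `running_average` and the seed are illustrative choices, not part of the original slides.

```python
import random

def running_average(theta: float, n: int, seed: int = 0) -> float:
    """Average of the first n simulated 0/1 observations with success probability theta.

    Reusing the same seed means each larger n extends the same sequence of draws,
    so the printed values trace one running average.
    """
    rng = random.Random(seed)
    total = sum(1 if rng.random() < theta else 0 for _ in range(n))
    return total / n

theta = 0.3  # hypothetical "fixed truth", chosen only for illustration
for n in (10, 100, 10_000, 100_000):
    avg = running_average(theta, n)
    print(f"n = {n:>7,}: average = {avg:.4f}, |average - theta| = {abs(avg - theta):.4f}")
```

The gap between the average and $\theta$ shrinks, on the whole, as $n$ grows, which is exactly the idealization the frequentist view relies on.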
7. Examples: $\theta$ as a Population Percentage or Average
Parameter $\theta$
Percent of population with a particular allele
Percent of free throws made by Shaq over his entire career
Mean expression level of BRCA1 gene in breast cancer cells
Statistic x
Percent observed in a sample of 100 people
Set of Shaq's yearly free-throw percentages up to June, 2005
Sample averages from patients in Stages 1-4; patients with high and low HER2 expression
8. Key Terms Population (sample space $\Omega$): the set of all possible outcomes of an experiment
Sample (event): the subset of the population that is observed
(Generative / probability) model: a description of how samples are obtained from the population
Parameter: a feature of the population used to describe the model
Statistic: a summary of the sample that conveys information about the parameter of interest.
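A small sketch can make the four key terms concrete. It assumes a hypothetical population of 1,000 people coded 1 (carries an allele) or 0 (does not) and, as the model, simple random sampling of 100 people without replacement; all numbers are made up for illustration.

```python
import random

# Population: hypothetical 1,000 people, 1 = carries the allele, 0 = does not.
population = [1] * 230 + [0] * 770

# Parameter: a fixed feature of the population (here, the allele proportion).
theta = sum(population) / len(population)

# Model: a simple random sample of 100 people drawn without replacement.
rng = random.Random(42)
sample = rng.sample(population, k=100)

# Statistic: a summary of the sample that carries information about theta.
x_bar = sum(sample) / len(sample)

print(f"parameter theta = {theta:.2f}, statistic x_bar = {x_bar:.2f}")
```

Rerunning with a different seed changes `x_bar` but never `theta`: the statistic varies from sample to sample, while the parameter is fixed.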
9. Axioms of Probability: needed to specify sampling and modeling
Definition: A probability measure $P$ is a function from the set of all possible events into $[0,1]$ such that
$P(\emptyset) = 0$
$P(\Omega) = 1$
$P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$ for countable collections of disjoint events $\{A_i\}$
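For a finite sample space, countable additivity reduces to ordinary finite additivity, so the axioms can be checked directly by enumeration. The sketch below assumes the equally likely model for two dice used in the next example and represents events as Python frozensets; `P` is a hypothetical helper, not a library function.

```python
from fractions import Fraction
from itertools import product

# Sample space Omega for the two-dice experiment: ordered pairs (i, j).
omega = frozenset(product(range(1, 7), repeat=2))

def P(event):
    """Probability measure under the equally likely model: |A| / |Omega|."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))

# Axiom checks on this finite example.
assert P(frozenset()) == 0  # P(empty set) = 0
assert P(omega) == 1        # P(Omega) = 1

# Additivity for two disjoint events: A = {sum is 7}, B = {sum is 11}.
A = frozenset(w for w in omega if sum(w) == 7)
B = frozenset(w for w in omega if sum(w) == 11)
assert A.isdisjoint(B)
assert P(A | B) == P(A) + P(B)
print(P(A), P(B), P(A | B))  # 1/6 1/18 2/9
```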
10. Random Variables A random variable $X$ is a function from the sample space $\Omega$ into the real numbers $\mathbb{R}$:
$X: \Omega \to \mathbb{R}$
$X(\omega) = x$
The value $x$ is called a realization of the random variable $X$. It can also be thought of as a statistic, since it is a function (summary) of the sample $\{\omega\}$.
11. Example Experiment:
roll two dice (one red, one green)
$\Omega = \{(i,j) \mid i = 1,\dots,6;\ j = 1,\dots,6\}$
Probability Model (based on symmetry)
$P(\{(i,j)\}) = 1/36$ for each ordered pair $(i,j)$
Random variable $X((i,j)) = i + j$
The probability model induces a probability function $f_X$ on the possible values $x$ of $X$:
$f_X(x) = (6 - |x-7|)/36, \quad x = 2,\dots,12$
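The closed form for $f_X$ can be confirmed by brute-force enumeration of the 36 outcomes. A minimal Python sketch, using exact fractions so the comparison is not affected by floating-point rounding:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Random variable X((i, j)) = i + j evaluated on all 36 equally likely outcomes.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))
f_X = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

# Compare each enumerated probability with f_X(x) = (6 - |x - 7|) / 36.
for x, p in f_X.items():
    assert p == Fraction(6 - abs(x - 7), 36)
    print(f"f_X({x:2d}) = {p}")
```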
12. Probability Function for Sum of Two Dice
13. Independence Two random variables $Y$ and $Z$ are independent if, for all possible $y, z$,
$P(Y=y \text{ and } Z=z) = P(Y=y) \cdot P(Z=z)$
Dice Example:
Let
$Y((i,j)) = i$
$Z((i,j)) = j$
Then, for $y, z \in \{1,2,3,4,5,6\}$,
$P(Y=y \text{ and } Z=z) = 1/36 = 1/6 \cdot 1/6 = P(Y=y) \cdot P(Z=z)$
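Independence can also be checked mechanically by comparing every joint probability with the product of the marginals. The sketch below (the helpers `prob` and `independent` are hypothetical names) confirms that $Y$ and $Z$ are independent and, for contrast, that $Y$ and the sum $X = Y + Z$ are not: for example, $P(Y=1 \text{ and } X=12) = 0$ while $P(Y=1)\,P(X=12) = 1/216$.

```python
from fractions import Fraction
from itertools import product

# The 36 equally likely outcomes (i, j) of rolling a red and a green die.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the equally likely model; `event` is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def independent(U, V):
    """True if P(U=u and V=v) == P(U=u) * P(V=v) for every pair of values (u, v)."""
    for u in {U(w) for w in omega}:
        for v in {V(w) for w in omega}:
            joint = prob(lambda w: U(w) == u and V(w) == v)
            marginal_product = prob(lambda w: U(w) == u) * prob(lambda w: V(w) == v)
            if joint != marginal_product:
                return False
    return True

def Y(w):  # value shown on the red die
    return w[0]

def Z(w):  # value shown on the green die
    return w[1]

def X(w):  # sum of the two dice
    return w[0] + w[1]

print(independent(Y, Z))  # True
print(independent(Y, X))  # False
```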
14. iid Sample iid: independent, identically distributed
Model:
$X_1, X_2, \dots, X_n$ are mutually independent, that is,
$P(X_1 = x_1, \dots, X_n = x_n) = P(X_1 = x_1) \cdots P(X_n = x_n)$
$X_1, X_2, \dots, X_n$ have the same probability function $f(\cdot \mid \theta)$:
$X_i \sim f(x \mid \theta), \quad i = 1,\dots,n$
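Under the iid model, the joint probability of an observed sample is the product of the common probability function evaluated at each observation. A minimal sketch, assuming a fair six-sided die as the common $f$; the helper names `f_die` and `joint_prob_iid` are hypothetical.

```python
from fractions import Fraction
from math import prod

def f_die(x):
    """Common probability function f(x) of a fair six-sided die."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)

def joint_prob_iid(xs, f):
    """P(X1 = x1, ..., Xn = xn) for an iid sample: the product of f(xi)."""
    return prod(f(x) for x in xs)

print(joint_prob_iid([3, 5], f_die))        # 1/36, matching the two-dice model
print(joint_prob_iid([1, 2, 3, 4], f_die))  # 1/1296
```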
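15. Estimating a Population Proportion $\theta$ Code each observation as
$X_i = 1$ if the $i$th observation has the characteristic of interest,
$X_i = 0$ otherwise; $i = 1,\dots,n$.
Bernoulli Distribution:
$f_X(x \mid \theta) = \begin{cases} \theta & x = 1 \\ 1 - \theta & x = 0 \\ 0 & \text{otherwise} \end{cases}$
$X_i$ are iid Bernoulli($\theta$), $i = 1,\dots,n$.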
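Coding the observations as 0/1 and evaluating the Bernoulli probability function each take only a line or two. The sketch below uses made-up data for 10 people and an arbitrary $\theta = 0.25$; `bernoulli_pmf` and `code_observations` are hypothetical names.

```python
def bernoulli_pmf(x: int, theta: float) -> float:
    """f_X(x | theta) for a Bernoulli(theta) variable."""
    if x == 1:
        return theta
    if x == 0:
        return 1 - theta
    return 0.0

def code_observations(has_trait):
    """Code X_i = 1 if observation i has the characteristic of interest, else 0."""
    return [1 if h else 0 for h in has_trait]

# Hypothetical data: whether each of 10 sampled people has the characteristic.
observed = [True, False, False, True, False, False, False, True, False, False]
xs = code_observations(observed)
y = sum(xs)  # Y = sum of the X_i, the number of observations with the characteristic
print(xs, y)  # [1, 0, 0, 1, 0, 0, 0, 1, 0, 0] 3
print(bernoulli_pmf(1, 0.25), bernoulli_pmf(0, 0.25), bernoulli_pmf(2, 0.25))  # 0.25 0.75 0.0
```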
16. Maximum Likelihood Estimate $Y = \sum_{i=1}^{n} X_i$ has the Binomial($n, \theta$) distribution: