Introduction to Statistical Inference

Presentation Transcript


    1. Introduction to Statistical Inference. J. Verducci, MBI Summer Workshop, August 2005

    2. Recommended Text: Fred L. Ramsey and Daniel W. Schafer. The Statistical Sleuth: A Course in Methods of Data Analysis. Belmont, CA: Duxbury, 2002, xxvi + 742 pp., $97.95 (H + CD), ISBN: 0-534-38670-9.

    3. Outline
       - Underlying philosophies about data
       - Basics of probability and random variables
       - Estimating a population proportion
       - Inferring a difference between two distributions
       - Assumptions about distributional forms
           - Normal theory
           - Nonparametrics
       - Hypothesis testing
           - t-test
           - Mann-Whitney-Wilcoxon test
       - Multiple comparisons
           - Bonferroni
           - False discovery rate

    4. Affymetrix MAS 5.0 Expression Set

    5. Expression Data Matrix: 30,000 genes × 30 patients

    6. Philosophy
       - Frequentist: The observed data X are an imperfect representation of an underlying, idealized, fixed truth θ. Law of Large Numbers: when an experiment is repeated faithfully, the average of the observations comes closer and closer to its idealization.
       - Bayesian: The observed data X are fixed, and the unknown generating parameter θ is random. Certainty about θ depends on both the empirical information X and prior knowledge about θ.
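The Law of Large Numbers can be seen in a quick simulation: the running average of repeated Bernoulli observations settles near the fixed θ. This is a minimal sketch, assuming Bernoulli observations with a hypothetical true value θ = 0.3 and Python's standard `random` module; the function name `running_means` is illustrative only.

```python
import random


def running_means(theta: float, n: int, seed: int = 0) -> list[float]:
    """Running averages of n simulated Bernoulli(theta) observations."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    means = []
    for i in range(1, n + 1):
        total += 1 if rng.random() < theta else 0  # one Bernoulli(theta) draw
        means.append(total / i)                    # average of the first i draws
    return means


theta = 0.3  # hypothetical "fixed truth"
means = running_means(theta, 100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: average = {means[n - 1]:.4f}")
```

The early averages wander, while the later ones stay close to 0.3, which is the frequentist picture of the data approaching their idealization.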

    7. Examples: θ as a Population Percentage or Average
       Parameter θ → Statistic x
       - Percent of the population with a particular allele → Percent observed in a sample of 100 people
       - Percent of free throws made by Shaq over his entire career → Set of Shaq's yearly free-throw percentages up to June 2005
       - Mean expression level of the BRCA1 gene in breast cancer cells → Sample averages from patients in Stages 1-4, and from patients with high and low HER2 expression

    8. Key Terms
       - Population (sample space Ω): the set of all possible outcomes of an experiment
       - Sample: the subset of the population (an event) that is observed
       - (Generative / probability) model: a description of how samples are obtained from the population
       - Parameter: a feature of the population used to describe the model
       - Statistic: a summary of the sample that conveys information about the parameter of interest

    9. Axioms of Probability (needed to specify sampling and modeling)
       Definition: A probability measure P is a function from the set of all possible events into [0, 1] such that
       $$P(\emptyset) = 0, \qquad P(\Omega) = 1, \qquad P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i)$$
       for every countable collection of disjoint events {A_i}.

    10. Random Variables
       A random variable X is a function from the sample space Ω into the real numbers ℝ: X: Ω → ℝ, X(ω) = x. The value x is called a realization of the random variable X. It can also be thought of as a statistic, since it is a function (summary) of the sample {ω}.

    11. Example
       Experiment: roll two dice (one red, one green).
       Ω = {(i, j) | i = 1, …, 6; j = 1, …, 6}
       Probability model (based on symmetry): P({(i, j)}) = 1/36 for each ordered pair (i, j).
       Random variable: X((i, j)) = i + j.
       The probability model induces a probability function f_X on the possible values x of X:
       f_X(x) = (6 - |x - 7|) / 36, x = 2, …, 12.
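The induced probability function can be checked by brute force: enumerate the 36 equally likely outcomes, add up the probability of each value of X = i + j, and compare with (6 - |x - 7|)/36. A minimal sketch using exact fractions from the Python standard library, so no rounding hides a mismatch:

```python
from collections import Counter
from fractions import Fraction

# Sample space: all 36 ordered pairs (red, green)
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]
p_outcome = Fraction(1, 36)  # symmetry model: each ordered pair equally likely

# Induced probability function of X((i, j)) = i + j
f_X = Counter()
for i, j in omega:
    f_X[i + j] += p_outcome

# Compare the enumeration with the closed-form expression
for x in range(2, 13):
    formula = Fraction(6 - abs(x - 7), 36)
    assert f_X[x] == formula
    print(x, f_X[x])
```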

    12. Probability Function for Sum of Two Dice

    13. Independence
       Two random variables Y and Z are independent if, for all possible y, z,
       P(Y = y and Z = z) = P(Y = y) · P(Z = z).
       Dice example: let Y((i, j)) = i and Z((i, j)) = j. Then, for y, z in {1, 2, 3, 4, 5, 6},
       P(Y = y and Z = z) = 1/36 = 1/6 · 1/6 = P(Y = y) · P(Z = z).
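The same enumeration of the two-dice sample space can confirm that Y (red die) and Z (green die) are independent under the symmetry model, by checking the product rule for every pair (y, z). The helper `prob`, which takes an event written as a predicate on outcomes, is a hypothetical name chosen for this sketch.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # ordered pairs (i, j)
P = {w: Fraction(1, 36) for w in omega}       # equally likely outcomes


def prob(event) -> Fraction:
    """P(event), where event is a predicate on outcomes."""
    return sum((P[w] for w in omega if event(w)), Fraction(0))


def Y(w):
    return w[0]  # value shown by the red die


def Z(w):
    return w[1]  # value shown by the green die


# Product rule P(Y=y and Z=z) = P(Y=y) * P(Z=z) for all 36 pairs (y, z)
independent = all(
    prob(lambda w: Y(w) == y and Z(w) == z)
    == prob(lambda w: Y(w) == y) * prob(lambda w: Z(w) == z)
    for y in range(1, 7)
    for z in range(1, 7)
)
print(independent)  # True
```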

    14. iid Sample
       iid: independent, identically distributed.
       Model: X1, X2, …, Xn are mutually independent, that is,
       $$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1) \cdots P(X_n = x_n),$$
       and X1, X2, …, Xn have the same probability function f(· | θ):
       Xi ~ f(x | θ), i = 1, …, n.
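Under the iid model the joint probability of an observed sequence is just the product of the common probability function evaluated at each observation. A sketch, assuming the common f is the fair-die probability function and using a hypothetical sequence of four rolls; `fair_die_pmf` and `iid_joint_prob` are illustrative names.

```python
from fractions import Fraction
from math import prod


def fair_die_pmf(x: int) -> Fraction:
    """f(x): probability function of one fair die roll."""
    return Fraction(1, 6) if x in range(1, 7) else Fraction(0)


def iid_joint_prob(xs, pmf) -> Fraction:
    """P(X1 = x1, ..., Xn = xn) for an iid sample: the product of the marginals."""
    return prod((pmf(x) for x in xs), start=Fraction(1))


rolls = [3, 6, 1, 6]                        # hypothetical observed sequence
print(iid_joint_prob(rolls, fair_die_pmf))  # 1/1296
```

Passing the probability function as an argument keeps the "identically distributed" part explicit: every observation is scored with the same f.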

    15. Estimating a Population Proportion θ
       Code Xi = 1 if the ith observation has the characteristic of interest, and Xi = 0 otherwise; i = 1, …, n.
       Bernoulli distribution:
       $$f_X(x \mid \theta) = \begin{cases} \theta & x = 1 \\ 1 - \theta & x = 0 \\ 0 & \text{otherwise} \end{cases}$$
       The Xi are iid Bernoulli(θ), i = 1, …, n.
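Coding observations as 0/1 and evaluating the Bernoulli probability function takes only a few lines. The data below are hypothetical (whether each of five people carries an allele), θ = 0.4 is an arbitrary illustrative value, and `bernoulli_pmf` is an illustrative name, not a library function.

```python
def bernoulli_pmf(x: int, theta: float) -> float:
    """f_X(x | theta) for the Bernoulli(theta) distribution."""
    if x == 1:
        return theta
    if x == 0:
        return 1 - theta
    return 0.0  # any other value has probability zero


# Hypothetical raw observations: does each person carry the allele?
carries_allele = [True, False, False, True, False]
xs = [1 if c else 0 for c in carries_allele]  # code Xi = 1 or 0
print(xs)                                     # [1, 0, 0, 1, 0]
print([bernoulli_pmf(x, 0.4) for x in xs])
```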

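    16. Maximum Likelihood Estimate
       Y = Σ Xi has the Binomial(n, θ) distribution:
       $$f_Y(y \mid \theta) = \binom{n}{y}\, \theta^{y} (1 - \theta)^{n - y}, \qquad y = 0, 1, \ldots, n.$$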
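Viewed as a function of θ for the observed y, the binomial probability is the likelihood, and it is maximized at θ̂ = y/n, the sample proportion. The sketch below checks this numerically with a crude grid search rather than calculus; the counts (37 carriers out of 100 people) are hypothetical, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def binomial_pmf(y: int, n: int, theta: float) -> float:
    """P(Y = y) when Y ~ Binomial(n, theta)."""
    return comb(n, y) * theta**y * (1 - theta) ** (n - y)


n, y = 100, 37  # hypothetical: 37 of 100 sampled people carry the allele

# Evaluate the likelihood L(theta) = P(Y = y | theta) on a grid of theta values
grid = [k / 1000 for k in range(1001)]
theta_hat = max(grid, key=lambda t: binomial_pmf(y, n, t))
print(theta_hat, y / n)
```

A grid search is used only for transparency; setting the derivative of the log-likelihood to zero gives θ̂ = y/n exactly.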
