STAT 552 PROBABILITY AND STATISTICS II
INTRODUCTION: Short review of S551
WHAT IS STATISTICS? • Statistics is the science of collecting, organizing, and describing data and of drawing conclusions from it. That is, statistics is a way to get information from data. It is the science of uncertainty.
BASIC DEFINITIONS • POPULATION: The collection of all items of interest in a particular study. • SAMPLE: A set of data drawn from the population; a subset of the population available for observation • PARAMETER: A descriptive measure of the population, e.g., mean • STATISTIC: A descriptive measure of a sample • VARIABLE: A characteristic of interest about each element of a population or sample.
STATISTIC • A statistic (or estimator) is any function of the random sample that does not contain any unknown quantity. E.g., the sample mean and the sample maximum are statistics; quantities that involve unknown parameters, such as X̄ − μ, are not. • Any observed or particular value of an estimator is an estimate.
Sample Space • The set of all possible outcomes of an experiment is called a sample space and denoted by S. • Determining the outcomes: • Build an exhaustive list of all possible outcomes. • Make sure the listed outcomes are mutually exclusive.
RANDOM VARIABLES • Variables whose observed value is determined by chance • A r.v. is a function defined on the sample space S that associates a real number with each outcome in S. • Rvs are denoted by uppercase letters, and their observed values by lowercase letters.
DESCRIPTIVE STATISTICS • Descriptive statistics involves the arrangement, summary, and presentation of data, to enable meaningful interpretation, and to support decision making. • Descriptive statistics methods make use of • graphical techniques • numerical descriptive measures.
[Diagram: probability reasons from the POPULATION to the SAMPLE; statistical inference reasons from the SAMPLE back to the POPULATION.]
PROBABILITY: A numerical value expressing the degree of uncertainty regarding the occurrence of an event; a measure of uncertainty. • STATISTICAL INFERENCE: The science of drawing inferences about the population based only on a part of it, the sample.
Probability is a function P : S → [0, 1]; its domain is the sample space S and its range is the interval [0, 1].
THE CALCULUS OF PROBABILITIES • If P is a probability function and A is any set, then a. P(∅) = 0 b. P(A) ≤ 1 c. P(Aᶜ) = 1 − P(A)
ODDS • The odds of an event A are defined as odds(A) = P(A) / (1 − P(A)) = P(A) / P(Aᶜ). • They tell us how much more likely the occurrence of A is than its nonoccurrence.
ODDS RATIO • OR is the ratio of two odds. • Useful for comparing the odds under two different conditions or for two different groups, e.g. odds for males versus females.
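As a quick numerical illustration (numbers chosen here for the example, not taken from the slide): if P(A) = 0.8 in one group and P(A) = 0.5 in another, then
\[
\text{odds}_1 = \frac{0.8}{0.2} = 4, \qquad \text{odds}_2 = \frac{0.5}{0.5} = 1, \qquad OR = \frac{4}{1} = 4,
\]
so the odds of A in the first group are four times the odds in the second.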
CONDITIONAL PROBABILITY • (Marginal) Probability: P(A): How likely is it that an event A will occur when an experiment is performed? • Conditional Probability: P(A|B): How will the probability of event A be affected by the knowledge of the occurrence or nonoccurrence of event B? • If two events are independent, then P(A|B)=P(A)
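The defining formula (standard, requires P(B) > 0):
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\]
For instance, with illustrative numbers P(A ∩ B) = 0.1 and P(B) = 0.4, we get P(A | B) = 0.1/0.4 = 0.25.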
BAYES THEOREM • Suppose you have P(B|A), but need P(A|B).
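The standard statement of the theorem is
\[
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)},
\]
where the denominator expands P(B) by the law of total probability.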
Independence • A and B are independent iff • P(A|B) = P(A) or P(B|A) = P(B) • P(A∩B) = P(A)P(B) • A1, A2, …, An are mutually independent iff for every subset {i1, …, ik} of {1, 2, …, n}, P(Ai1 ∩ … ∩ Aik) = P(Ai1)⋯P(Aik). E.g., for n = 3, A1, A2, A3 are mutually independent iff P(A1∩A2∩A3) = P(A1)P(A2)P(A3) and P(A1∩A2) = P(A1)P(A2) and P(A1∩A3) = P(A1)P(A3) and P(A2∩A3) = P(A2)P(A3).
DISCRETE RANDOM VARIABLES • If the set of all possible values of a r.v. X is a countable set, then X is called a discrete r.v. • The function f(x) = P(X = x) for x = x1, x2, … that assigns a probability to each value x is called the probability mass function (p.m.f.), also referred to as the (discrete) probability density function (p.d.f.).
Example • Discrete Uniform distribution: f(x) = P(X = x) = 1/N for x = 1, 2, …, N. • Example: throw a fair die; P(X=1) = … = P(X=6) = 1/6.
CONTINUOUS RANDOM VARIABLES • When the sample space is uncountable (continuous), X is called a continuous r.v. • Example: Continuous Uniform(a, b), with p.d.f. f(x) = 1/(b − a) for a ≤ x ≤ b.
CUMULATIVE DISTRIBUTION FUNCTION (C.D.F.) • The CDF of a r.v. X is defined as F(x) = P(X ≤ x).
JOINT DISCRETE DISTRIBUTIONS • A function f(x1, x2, …, xk) is the joint pmf for some vector-valued rv X = (X1, X2, …, Xk) iff the following properties are satisfied: f(x1, x2, …, xk) ≥ 0 for all (x1, x2, …, xk) and ∑x1 ⋯ ∑xk f(x1, x2, …, xk) = 1.
MARGINAL DISCRETE DISTRIBUTIONS • If the pair (X1, X2) of discrete random variables has the joint pmf f(x1, x2), then the marginal pmfs of X1 and X2 are f1(x1) = ∑x2 f(x1, x2) and f2(x2) = ∑x1 f(x1, x2).
CONDITIONAL DISTRIBUTIONS • If X1 and X2 are discrete or continuous random variables with joint pdf f(x1, x2), then the conditional pdf of X2 given X1 = x1 is defined by f(x2 | x1) = f(x1, x2) / f1(x1), provided f1(x1) > 0. • For independent rvs, f(x2 | x1) = f2(x2) and f(x1 | x2) = f1(x1).
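A small worked example with an illustrative joint pmf of my own choosing: let (X1, X2) take values in {0, 1} × {0, 1} with
\[
f(0,0) = 0.2,\quad f(0,1) = 0.3,\quad f(1,0) = 0.1,\quad f(1,1) = 0.4.
\]
Then the marginals are f1(0) = 0.5, f1(1) = 0.5, f2(0) = 0.3, f2(1) = 0.7, and the conditional pmf of X2 given X1 = 0 gives f(1 | 0) = 0.3/0.5 = 0.6. Since f(0,0) = 0.2 ≠ f1(0) f2(0) = 0.15, X1 and X2 are not independent.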
EXPECTED VALUES • Let X be a rv with pdf fX(x) and g(X) be a function of X. Then the expected value (or the mean or the mathematical expectation) of g(X) is E[g(X)] = ∑x g(x) fX(x) in the discrete case and E[g(X)] = ∫ g(x) fX(x) dx in the continuous case, provided the sum or the integral exists, i.e., −∞ < E[g(X)] < ∞.
EXPECTED VALUES • E[g(X)] is finite if E[|g(X)|] is finite.
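A quick illustration of the discrete case, taking X to be a fair die and g(x) = x²:
\[
E[X^2] = \sum_{x=1}^{6} x^2 \cdot \frac{1}{6} = \frac{1 + 4 + 9 + 16 + 25 + 36}{6} = \frac{91}{6} \approx 15.17.
\]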
Laws of Expected Value and Variance • Let X be a rv and c be a constant. • Laws of Expected Value: E(c) = c, E(X + c) = E(X) + c, E(cX) = cE(X). • Laws of Variance: V(c) = 0, V(X + c) = V(X), V(cX) = c²V(X).
EXPECTED VALUE • If X and Y are independent, E(XY) = E(X)E(Y). • The covariance of X and Y is defined as Cov(X, Y) = E[(X − E(X))(Y − E(Y))] = E(XY) − E(X)E(Y).
EXPECTED VALUE • If X and Y are independent, Cov(X, Y) = 0. • The reverse is usually not correct! It is only correct under the normal distribution: if (X, Y) ~ Bivariate Normal, then X and Y are independent iff Cov(X, Y) = 0.
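A standard counterexample showing why zero covariance does not imply independence: let X take the values −1, 0, 1 each with probability 1/3 and set Y = X². Then
\[
\mathrm{Cov}(X, Y) = E(XY) - E(X)E(Y) = E(X^3) - 0 \cdot E(Y) = 0,
\]
yet Y is completely determined by X, so X and Y are clearly not independent.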
EXPECTED VALUE • If X1 and X2 are independent, Var(X1 ± X2) = Var(X1) + Var(X2).
CONDITIONAL EXPECTATION AND VARIANCE • E(X) = E[E(X | Y)] and Var(X) = E[Var(X | Y)] + Var[E(X | Y)] (the EVVE rule). • Proofs available in Casella & Berger (1990), pgs. 154 & 158.
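A small check of both rules under an illustrative hierarchical setup (my own choice): let Y ~ Bernoulli(1/2) and X | Y = y ~ Poisson(1 + y). Then E(X | Y) = Var(X | Y) = 1 + Y, so
\[
E(X) = E[E(X \mid Y)] = 1 + \tfrac{1}{2} = \tfrac{3}{2}, \qquad
\mathrm{Var}(X) = E[\mathrm{Var}(X \mid Y)] + \mathrm{Var}[E(X \mid Y)] = \tfrac{3}{2} + \tfrac{1}{4} = \tfrac{7}{4}.
\]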
SOME MATHEMATICAL EXPECTATIONS • Population Mean: μ = E(X) • Population Variance: σ² = Var(X) = E[(X − μ)²] (a measure of the deviation from the population mean) • Population Standard Deviation: σ = √σ² • Moments: the k-th moment is μk′ = E(Xᵏ) and the k-th central moment is μk = E[(X − μ)ᵏ].
The Variance • This measure reflects the dispersion of all the observations. • The variance of a population of size N, x1, x2, …, xN, whose mean is μ is defined as σ² = (1/N) ∑ᵢ (xᵢ − μ)². • The variance of a sample of n observations x1, x2, …, xn whose mean is x̄ is defined as s² = (1/(n − 1)) ∑ᵢ (xᵢ − x̄)².
MOMENT GENERATING FUNCTION • The m.g.f. of a random variable X is defined as MX(t) = E(e^{tX}) for t ∈ (−h, h) for some h > 0.
Properties of m.g.f. • M(0) = E[1] = 1 • If a r.v. X has m.g.f. MX(t), then Y = aX + b has m.g.f. MY(t) = e^{bt} MX(at). • The m.g.f. does not always exist (e.g., the Cauchy distribution).
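For example (a standard result, not on the slide): if X ~ Exponential(λ) with pdf f(x) = λe^{−λx}, x > 0, then
\[
M_X(t) = E\!\left(e^{tX}\right) = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}, \qquad t < \lambda,
\]
and differentiating gives M_X′(0) = 1/λ = E(X), illustrating how moments are recovered from the m.g.f.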
CHARACTERISTIC FUNCTION • The c.h.f. of a random variable X is defined as φX(t) = E(e^{itX}) for all real numbers t. • The c.h.f. always exists.
Uniqueness Theorem: • If two r.v.s have m.g.f.s that exist and are equal, then they have the same distribution. • If two r.v.s have the same distribution, then they have the same m.g.f. (if it exists). • Similar statements are true for the c.h.f.
SOME DISCRETE PROBABILITY DISTRIBUTIONS • Please review: Degenerate, Uniform, Bernoulli, Binomial, Poisson, Negative Binomial, Geometric, Hypergeometric, Extended Hypergeometric, Multinomial
SOME CONTINUOUS PROBABILITY DISTRIBUTIONS • Please review: Uniform, Normal (Gaussian), Exponential, Gamma, Chi-Square, Beta, Weibull, Cauchy, Log-Normal, t, F Distributions
TRANSFORMATION OF RANDOM VARIABLES • If X is an rv with pdf f(x), then Y = g(X) is also an rv. What is the pdf of Y? • If X is a discrete rv, substitute x = g⁻¹(y) wherever you see x in the pdf f(x), i.e., fY(y) = fX(g⁻¹(y)). • If X is a continuous rv, do the same thing, but also multiply by the absolute value of the Jacobian, |dx/dy|. • If the transformation is not 1-to-1, divide the range into sub-regions on which it is 1-to-1 and sum the contributions.
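A standard worked example of the continuous 1-to-1 case: let X have pdf f_X(x) and set Y = aX + b with a ≠ 0. Then x = g⁻¹(y) = (y − b)/a, the Jacobian is dx/dy = 1/a, and
\[
f_Y(y) = f_X\!\left(\frac{y - b}{a}\right) \frac{1}{|a|}.
\]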
CDF Method • To find the p.d.f. of Y = g(X), first find the c.d.f. FY(y) = P(Y ≤ y) = P(g(X) ≤ y) and then differentiate: fY(y) = FY′(y).
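As an illustration of the CDF method (an example of my own choosing, not the one from the original slide): let X ~ Uniform(0, 1) and Y = −ln X. For y > 0,
\[
F_Y(y) = P(-\ln X \le y) = P\!\left(X \ge e^{-y}\right) = 1 - e^{-y}, \qquad f_Y(y) = F_Y'(y) = e^{-y},
\]
so Y ~ Exponential(1).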
M.G.F. Method • If X1, X2, …, Xn are independent random variables with MGFs MXi(t), then the MGF of Y = X1 + X2 + ⋯ + Xn is MY(t) = ∏ᵢ MXi(t).
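For example (a standard result): if X_1, …, X_n are independent with X_i ~ Poisson(λ_i), each MGF is M_{X_i}(t) = exp{λ_i(e^t − 1)}, so
\[
M_Y(t) = \prod_{i=1}^{n} e^{\lambda_i\,(e^t - 1)} = e^{\left(\sum_{i} \lambda_i\right)(e^t - 1)},
\]
which is the MGF of a Poisson(∑ᵢ λ_i) random variable; hence Y = ∑ᵢ X_i ~ Poisson(∑ᵢ λ_i).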
THE PROBABILITY INTEGRAL TRANSFORMATION • Let X have a continuous cdf FX(x) and define the rv Y as Y = FX(X). Then Y ~ Uniform(0, 1), that is, P(Y ≤ y) = y for 0 < y < 1. • This is very commonly used, especially in random number generation procedures.
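The way this is typically exploited in random number generation (a standard argument): if U ~ Uniform(0, 1) and F is a continuous, strictly increasing cdf, then X = F⁻¹(U) has cdf F. For instance, for the Exponential(λ) distribution F(x) = 1 − e^{−λx}, so
\[
X = F^{-1}(U) = -\frac{1}{\lambda}\,\ln(1 - U)
\]
has an Exponential(λ) distribution.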
SAMPLING DISTRIBUTION • A statistic is also a random variable. Its distribution depends on the distribution of the random sample and the form of the function Y=T(X1, X2,…,Xn). The probability distribution of a statistic Y is called the sampling distribution of Y.
SAMPLING FROM THE NORMAL DISTRIBUTION • Properties of the Sample Mean and Sample Variance • Let X1, X2, …, Xn be a r.s. of size n from a N(μ, σ²) distribution. Then X̄ and S² are independent, X̄ ~ N(μ, σ²/n), and (n − 1)S²/σ² ~ χ²ₙ₋₁.
SAMPLING FROM THE NORMAL DISTRIBUTION • If the population variance is unknown, we use the sample variance: T = (X̄ − μ) / (S/√n) ~ tₙ₋₁.
SAMPLING FROM THE NORMAL DISTRIBUTION • The F distribution allows us to compare variances by giving the distribution of (S1²/σ1²) / (S2²/σ2²) ~ F(n1 − 1, n2 − 1), where S1² and S2² are the sample variances of independent random samples of sizes n1 and n2. • If X ~ F(p, q), then 1/X ~ F(q, p). • If X ~ t(q), then X² ~ F(1, q).
CENTRAL LIMIT THEOREM • If a random sample (X1, X2, X3, …, Xn) is drawn from any population, the sampling distribution of the sample mean is approximately normal for a sufficiently large sample size. The larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution. [Diagram: the distribution of the population random variable X versus the distribution of the sample mean.]
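In symbols (the standard statement): if X_1, …, X_n is a random sample from a population with mean μ and finite variance σ², then for large n
\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \;\approx\; N(0, 1), \qquad \text{equivalently} \qquad \bar{X} \approx N\!\left(\mu, \frac{\sigma^2}{n}\right).
\]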