Inference Mary M. Whiteside, Ph.D. Nonparametric Statistics
Two Sides of Inference • Parametric • Interval estimation, Xbar • Hypothesis testing, μ0 • Nonparametric • Interval estimates, EDF • Hypothesis testing, P(X<Y) > P(X>Y)
Meaning of Nonparametric • Not about parameters • Methods for non-normal distributions • Methods for ordinal data • Data Scales • Nominal, categorical, qualitative • Ordinal • Interval • Ratio - natural zero
Random Sample - Type 1 • Random sample from a finite population • Simple • Stratified • Cluster • Inferences are about the finite population • Audit comprised of a sample from a population of invoices • Public opinion polls • QC samples of delivered goods
Random Sample - Type 2 • Observations of independent and identically distributed (iid) random variables • Inferences are about the probability distributions of the random variables • Weekly average miles per gallon for your new Lexus • Chi-square tests of independence in medical treatment offered to men and women • Effect of female literacy on infant mortality worldwide
Transition from data sets to distributions • All random variables, by definition, have probability functions (pmf or pdf) and cumulative probability distributions • Random variables defined on a random sample (Type 1 or 2) are called statistics with probability distributions that are called sampling distributions
Sampling Distributions • Statistics support both sides of inference • Estimators - random variables used to create interval estimates • Test statistics - random variables used to test hypotheses
Consider Xbar - a parametric statistic • Type 1 sample - subset of invoices where X = sales tax paid on an invoice randomly selected from a finite population • Xbar is the average sales tax of n randomly selected invoices • Xbar is an estimator of μ, the average sales tax paid for the population of invoices (with standard deviation σ) • Xbar is a test statistic for testing hypotheses H0: μ = μ0 • Xbar is a random variable with sampling distribution asymptotically normal as n increases with mean μ and standard deviation σ/√n
Consider Xbar - a parametric statistic • Type 2 sample - the complete set of miles per gallon observations made by you since buying your Lexus where X = mpg for your Lexus in a given week • Xbar is the average mpg for n observations of X • Xbar is an estimator of the expected value (μX) of the RV X • Xbar is a test statistic for testing hypotheses H0: μ = μ0 • Xbar is a random variable with sampling distribution asymptotically normal as n increases with mean μX and standard deviation σX/√n
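The claim that Xbar has mean μX and standard deviation σX/√n can be checked by simulation. The sketch below is only an illustration under stated assumptions: it draws X from an Exponential(1) distribution (chosen here because it is clearly non-normal, with μX = 1 and σX = 1), and the function name `simulate_xbar`, the seed, n = 50, and 2000 replications are all hypothetical choices, not part of the original slides.

```python
import math
import random
import statistics


def simulate_xbar(n, reps, seed=0):
    """Return `reps` sample means, each computed from n iid Exponential(1) draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


n = 50
xbars = simulate_xbar(n=n, reps=2000)

mean_of_xbars = statistics.fmean(xbars)   # should be close to mu_X = 1
sd_of_xbars = statistics.stdev(xbars)     # should be close to sigma_X / sqrt(n)
theoretical_sd = 1.0 / math.sqrt(n)       # sigma_X / sqrt(n) with sigma_X = 1

print(f"mean of Xbar: {mean_of_xbars:.3f}")
print(f"sd of Xbar:   {sd_of_xbars:.3f} (theory: {theoretical_sd:.3f})")
```

A histogram of `xbars` would also look roughly bell-shaped even though the underlying X is strongly skewed, which is the asymptotic normality the slide refers to.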
X in the Type 1 sample • If X from a Type 1 sample is regarded as a random variable, then it has the discrete uniform distribution • Prob [X = x] = 1/N for all x in the population (where the N values of x are assumed to be unique)
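As a short worked step, the discrete uniform distribution ties X back to the population parameters. The indexing x_1, ..., x_N below is notation introduced here for illustration, and σ² is assumed to be the population variance with divisor N.

```latex
\begin{aligned}
E[X] &= \sum_{i=1}^{N} x_i \cdot \frac{1}{N} = \mu, \\
\operatorname{Var}(X) &= \sum_{i=1}^{N} (x_i - \mu)^2 \cdot \frac{1}{N} = \sigma^2 .
\end{aligned}
```

So the expected value of a single randomly selected invoice equals the population average, which is why Xbar is a natural estimator of μ in the Type 1 setting.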
Order statistics of rank k - a nonparametric statistic • the kth order statistic is the kth smallest observation • the first order statistic is the smallest observation in a sample • the nth order statistic is the largest • Large body of literature on sampling distributions of order statistics
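The definition of the kth order statistic can be sketched in a few lines. The function name `order_statistic` and the example values are hypothetical; k is 1-based to match the slide's wording.

```python
def order_statistic(sample, k):
    """Return the kth smallest observation (k = 1 is the minimum, k = n the maximum)."""
    n = len(sample)
    if not 1 <= k <= n:
        raise ValueError(f"k must be between 1 and {n}")
    return sorted(sample)[k - 1]


observations = [7, 2, 9, 4]  # made-up sample, n = 4
smallest = order_statistic(observations, 1)                 # first order statistic
largest = order_statistic(observations, len(observations))  # nth order statistic
second = order_statistic(observations, 2)
```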
Estimation • Definitions • EDF (empirical distribution function) • pth sample quantile • sample mean, variance, and standard deviation • unbiased estimators (S² and s²)
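Two of these definitions lend themselves to a short sketch. The EDF at x is the fraction of observations less than or equal to x. For the pth sample quantile, textbooks differ on details; the version below uses one common convention (the smallest observation whose EDF value is at least p), which is an assumption and may not match the course text exactly. The function names and data are hypothetical.

```python
import math


def edf(sample, x):
    """Empirical distribution function: fraction of observations <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)


def sample_quantile(sample, p):
    """pth sample quantile under the assumed convention:
    the smallest observation x with edf(sample, x) >= p."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    ordered = sorted(sample)
    k = max(1, math.ceil(len(ordered) * p))  # rank of the order statistic to return
    return ordered[k - 1]


data = [3, 1, 4, 1, 5]  # made-up sample, n = 5
edf_at_1 = edf(data, 1)               # two of five observations are <= 1
edf_at_3 = edf(data, 3)               # three of five observations are <= 3
median_q = sample_quantile(data, 0.5)
```

Note that the quantile returned is always an order statistic of the sample, which connects this definition to the rank-based procedures discussed earlier.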
Intervals for parameter estimation • (point estimate − q × standard error of the estimator, point estimate − r × standard error of the estimator), where r is the α/2 quantile and q is the (1 − α/2) quantile from the sampling distribution of the standardized estimator • r equals −q in symmetric distributions with mean 0 (z = ±1.96 or t = ±2.02581), giving point estimate ± q × standard error • r does not equal −q in skewed distributions such as chi-squared and F
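For the symmetric case, where r = −q, the interval reduces to the point estimate plus or minus q standard errors. The sketch below assumes a normal sampling distribution for Xbar (a large-sample assumption) and uses the sample standard deviation in the standard error; the function name `z_interval` and the data are hypothetical, and the tiny sample is for arithmetic illustration only.

```python
import math
import statistics
from statistics import NormalDist


def z_interval(sample, alpha=0.05):
    """Normal-based (1 - alpha) interval for the mean: xbar -/+ q * s / sqrt(n)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error of Xbar
    q = NormalDist().inv_cdf(1 - alpha / 2)       # (1 - alpha/2) quantile, about 1.96
    return xbar - q * se, xbar + q * se


values = [10, 12, 14, 16, 18]  # made-up observations
lower, upper = z_interval(values, alpha=0.05)
print(f"95% interval: ({lower:.3f}, {upper:.3f})")
```

With a sample this small and an unknown standard deviation, a t quantile would normally replace the z quantile; the structure of the interval stays the same.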
Sampling distribution of the estimator • Parametric procedures - Assumed normal, or approximately normal from the Central Limit Theorem when the sample size is large • Xbar is approximately normal if n is large • Standardized Xbar follows a t distribution if X is normal and σ is unknown • Xbar’s distribution is unknown if X’s distribution is unknown and n is small
Sampling distribution of the estimator • Nonparametric distribution-free procedures, i.e., the sampling distribution of the statistic (estimator or test statistic) is “free” from the distribution of X • rank order statistics • bootstrapped distributions - α/2 and 1 − α/2 quantiles
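The bootstrap bullet can be sketched as a percentile interval: resample the data with replacement many times, recompute the statistic each time, and read off the α/2 and 1 − α/2 quantiles of the resulting values. This is a minimal sketch of the basic percentile method only; the function name, the seed, the number of resamples, and the mpg figures are all hypothetical.

```python
import random
import statistics


def bootstrap_percentile_interval(sample, stat=statistics.median,
                                  alpha=0.05, reps=2000, seed=0):
    """Percentile bootstrap interval: alpha/2 and 1 - alpha/2 quantiles
    of the statistic recomputed on resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(sample)
    boot_stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    lo_idx = int((alpha / 2) * reps)
    hi_idx = min(reps - 1, int((1 - alpha / 2) * reps))
    return boot_stats[lo_idx], boot_stats[hi_idx]


# Made-up weekly mpg observations, for illustration only.
mpg = [21.4, 23.1, 22.8, 24.0, 20.9, 22.2, 23.5, 21.8,
       22.9, 23.3, 22.0, 21.1, 24.4, 22.6, 23.0]

lo, hi = bootstrap_percentile_interval(mpg)
print(f"95% bootstrap interval for the median: ({lo}, {hi})")
```

No distributional form for X is assumed here; the observed sample itself stands in for the unknown distribution.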
Parametric vs nonparametric sampling distributions • Exact distributions with approximate models • Exact distributions with exact models (but usually small samples) or • Asymptotic distributions with exact models