Statistical Inference

Statistical Inference Chapter 12/13

Statistical Inference
• Given a sample of observations from a population, statistical inference consists of estimating characteristics of the population.
• A characteristic may be estimated as:
  • A single number (point estimation)
  • A value lying within an interval (interval estimation)
COMP 5340/6340 Statistical Inference

Point Estimation

Random Sampling
• A sample is a subset of observations drawn from a population (in general, the whole population is not observable).
• For correct inference, sampling must be random.

Point Estimate
• A statistic is any function of the random variables constituting one or more samples, provided that the function does not depend on any unknown parameter values.
• A point estimate of a parameter $\theta$ is a single number that can be regarded as the most plausible value of $\theta$. A point estimate is obtained by selecting a suitable statistic and computing its value from a given sample. The selected statistic is called the point estimator of $\theta$.

Example
• We want to evaluate the packet loss rate on a given channel. 25 packets are sent. Let X = number of corrupted (lost) packets. The parameter to be estimated is p = the proportion of lost packets.
• Propose an estimator.
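
One natural candidate is the sample proportion X/n. The Python sketch below computes it; the function name and the count of 3 lost packets are illustrative assumptions, not data from the slides.

```python
def loss_rate_estimate(lost: int, sent: int) -> float:
    """Point estimate of the packet loss rate p: the sample proportion X/n."""
    if sent <= 0 or not 0 <= lost <= sent:
        raise ValueError("need sent > 0 and 0 <= lost <= sent")
    return lost / sent


# Hypothetical outcome: 3 of the 25 packets were lost.
print(loss_rate_estimate(3, 25))  # 0.12
```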

Example (2)
• We assume that the waiting time for a bus is uniformly distributed. However, we do not know the upper limit of the probability distribution. We want to estimate this parameter of the uniform probability distribution.
• Propose an estimator.
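
Several estimators are reasonable here. The sketch below compares three candidates, assuming the lower limit of the uniform distribution is 0 and calling the unknown upper limit theta; both the function name and the waiting times are hypothetical.

```python
from statistics import mean


def upper_limit_estimates(waits):
    """Return three candidate estimates of theta for Uniform(0, theta) data."""
    n = len(waits)
    largest = max(waits)
    return {
        "max": largest,                       # never exceeds theta, so it is biased low
        "scaled_max": (n + 1) / n * largest,  # rescaled, since E(max) = n*theta/(n+1)
        "twice_mean": 2 * mean(waits),        # from E(X) = theta/2
    }


waits = [4.2, 9.1, 1.7, 6.5, 8.0]  # hypothetical waiting times in minutes
print(upper_limit_estimates(waits))
```

The largest observation alone can never overshoot the true limit, which is why the rescaled version is usually preferred.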

Sampling Distribution
• Multiple samples can be drawn.
• Each sample may yield a different estimate.
• Therefore, the estimator is a random variable with a probability distribution.
• To evaluate an estimator, we want to have an idea of its:
  • Central tendency
  • Variability
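
The idea can be illustrated by simulation. The sketch below repeatedly draws samples of n packets and records the sample proportion X/n each time; the true loss rate of 0.1, the sample size, and the function name are assumptions chosen for illustration.

```python
import random
from statistics import mean, stdev


def simulate_estimates(p, n, num_samples, seed=0):
    """Draw num_samples samples of n packets and return the estimates X/n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(num_samples):
        lost = sum(rng.random() < p for _ in range(n))  # X ~ Binomial(n, p)
        estimates.append(lost / n)
    return estimates


estimates = simulate_estimates(p=0.1, n=25, num_samples=2000)
print("central tendency:", mean(estimates))  # should be close to p
print("variability:", stdev(estimates))      # should be close to sqrt(p(1-p)/n)
```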

Unbiased Estimators
• A point estimator $\hat{\theta}$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}) = \theta$ for every possible value of $\theta$. If $\hat{\theta}$ is not unbiased, the difference $E(\hat{\theta}) - \theta$ is called the bias of $\hat{\theta}$.
• Intuitively, an unbiased estimator does not systematically overestimate or underestimate the parameter: its errors average out to zero over repeated samples.

Estimators
• Sample mean
• Sample median
• Midrange: [max(sample) + min(sample)]/2
• 10% trimmed mean $\bar{X}_{tr(10)}$
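
A short sketch of these four estimators follows. It assumes the midrange is the average of the two extreme observations and that the trimmed mean drops a whole number of observations from each end; the sample is hypothetical and includes one outlier to show how differently the estimators react.

```python
from statistics import mean, median


def midrange(xs):
    """Average of the smallest and largest observations."""
    return (max(xs) + min(xs)) / 2


def trimmed_mean(xs, percent=10):
    """Mean after dropping percent% of the observations from each end.

    The number trimmed per end is rounded down to a whole observation.
    """
    xs = sorted(xs)
    k = int(len(xs) * percent / 100)
    return mean(xs[k:len(xs) - k])


sample = [12, 15, 11, 14, 13, 16, 10, 18, 17, 95]  # hypothetical, one outlier
print(mean(sample))              # 22.1
print(median(sample))            # 14.5
print(midrange(sample))          # 52.5
print(trimmed_mean(sample, 10))  # 14.5
```

The mean and the midrange are pulled toward the outlier, while the median and the trimmed mean are not.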

Some Unbiased Estimators
• If X is a binomial random variable with parameters n and p, the sample proportion $\hat{p} = X/n$ is an unbiased estimator of p.
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then the estimator $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$ is an unbiased estimator of $\sigma^2$.
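
The effect of the divisor in the sample variance can be checked by simulation. The sketch below compares the sum of squared deviations divided by n - 1 with the same sum divided by n, averaged over many small samples; the normal population with variance 4, the sample size of 5, and the function name are assumptions for illustration.

```python
import random
from statistics import mean


def variance_estimates(sample):
    """Return (estimate with divisor n-1, estimate with divisor n)."""
    n = len(sample)
    xbar = mean(sample)
    ss = sum((x - xbar) ** 2 for x in sample)
    return ss / (n - 1), ss / n


rng = random.Random(0)
# 20000 samples of size 5 from a normal population with variance 2**2 = 4.
pairs = [variance_estimates([rng.gauss(0, 2) for _ in range(5)])
         for _ in range(20000)]
print("average with n-1:", mean([p[0] for p in pairs]))  # close to 4.0
print("average with n:  ", mean([p[1] for p in pairs]))  # close to 3.2 = (4/5) * 4
```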

Some Unbiased Estimators (2)
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. Then the estimator $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is an unbiased estimator of $\mu$.
• If the distribution is continuous and symmetric, then any trimmed mean is also an unbiased estimator of $\mu$.

Desirable Properties of Estimators
• Unbiased
• Minimum variance
• The precision of an estimator is measured by the standard error of the estimator, i.e. its standard deviation $\sigma_{\hat{\theta}} = \sqrt{V(\hat{\theta})}$
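
In practice the standard error usually depends on unknown parameters and is itself estimated from the sample. The sketch below shows the two most common cases, the sample mean and the sample proportion; the function names are hypothetical.

```python
import math
from statistics import stdev


def se_mean(sample):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return stdev(sample) / math.sqrt(len(sample))


def se_proportion(successes, n):
    """Estimated standard error of the sample proportion: sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


print(se_mean([1.0, 2.0, 3.0, 4.0]))
print(se_proportion(5, 25))
```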

Methods of Point Estimation

Methods of Point Estimation
• Method of moments
• Maximum likelihood estimation (recommended for large samples)

Method of Moments
• Reminder: Definition of moments
• Definition: Let $X_1, X_2, \ldots, X_n$ be a random sample from a p.m.f. or p.d.f. $f(x)$. For $k = 1, 2, 3, \ldots$ the kth population moment, or kth moment of the distribution $f(x)$, is $E(X^k)$. The kth sample moment is $\frac{1}{n}\sum_{i=1}^{n} X_i^k$.
• Let $X_1, X_2, \ldots, X_n$ be a random sample from a p.m.f. or p.d.f. $f(x; \theta_1, \ldots, \theta_m)$, where $\theta_1, \ldots, \theta_m$ are the parameters whose values are unknown. Then the moment estimators $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are obtained by equating the first m sample moments to the corresponding first m population moments and solving for $\theta_1, \ldots, \theta_m$.

Example 1
• Let $x_1, x_2, \ldots, x_n$ represent a random sample of the service times of n customers at some facility, where the underlying distribution is assumed exponential with parameter $\lambda$.
• How many parameters need to be estimated?
• What is the 1st sample moment?
• What is the moment estimator of $\lambda$?
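
One way to work through this example: with a single parameter, only the first moment is needed. Assuming the exponential distribution is parameterized by its rate, so that E(X) = 1/lambda, equating E(X) to the sample mean gives the estimator 1/x-bar. The function name and service times below are hypothetical.

```python
from statistics import mean


def exponential_rate_mom(times):
    """Moment estimator of the rate: solve 1/lambda = sample mean."""
    return 1 / mean(times)


service_times = [2.0, 0.5, 1.5, 4.0, 2.0]  # hypothetical, in minutes
print(exponential_rate_mom(service_times))  # 0.5
```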

Example 2
• Let $x_1, x_2, \ldots, x_n$ represent a random sample from a gamma distribution with parameters $\alpha$ (shape) and $\beta$ (scale).
• Reminder: $E(X) = \alpha\beta$ and $V(X) = \alpha\beta^2$.
• How many parameters need to be estimated?
• What is the 1st sample moment?
• What is the 2nd sample moment?
• What are the estimators for $\alpha$ and $\beta$?
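
A possible solution is sketched below, assuming the shape/scale parameterization in which E(X) = alpha*beta and V(X) = alpha*beta^2, so that E(X^2) = alpha*beta^2 + (alpha*beta)^2. Equating the first two sample moments m1 and m2 to these expressions and solving gives alpha = m1^2 / (m2 - m1^2) and beta = (m2 - m1^2) / m1. The function name is hypothetical.

```python
def gamma_mom(sample):
    """Moment estimators (alpha_hat, beta_hat) for a gamma distribution.

    Assumes the shape/scale parameterization: E(X) = alpha * beta.
    """
    n = len(sample)
    m1 = sum(sample) / n                 # 1st sample moment
    m2 = sum(x * x for x in sample) / n  # 2nd sample moment
    spread = m2 - m1 ** 2                # estimates alpha * beta**2
    return m1 ** 2 / spread, spread / m1


alpha_hat, beta_hat = gamma_mom([1.0, 3.0])
print(alpha_hat, beta_hat)  # 4.0 0.5
```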

Example 3/Exercise
• Let $x_1, x_2, \ldots, x_n$ represent a random sample from a generalized negative binomial distribution with parameters r and p.
• Reminder: E(X) = ? and V(X) = ?.
• How many parameters need to be estimated?
• What is the 1st sample moment?
• What is the 2nd sample moment?
• What are the estimators for r and p?

Maximum Likelihood Estimation
• Example:
• 10 packets are sent over a lossy channel with packet loss rate p. The 2nd, 4th, and 8th are lost. The joint p.m.f. of this sample is:
• $f(x_1, x_2, \ldots, x_{10}; p) = (1-p)p(1-p)p(1-p) \ldots p(1-p)(1-p) = p^3(1-p)^7$
• For what value of p is the observed sample most likely to have occurred?
• In other words, for which value of p is $p^3(1-p)^7$ maximized?
• $f(x_1, x_2, \ldots, x_{10}; p)$ is maximized for the value of p such that $\frac{d}{dp}\left[p^3(1-p)^7\right] = 0$
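
The maximization is easiest after taking logarithms, which turns the product into a sum without moving the location of the maximum. The derivation below is a sketch for the likelihood $p^3(1-p)^7$ of this example; the symbol $\hat{p}$ for the maximizing value is notation assumed here.

```latex
\begin{aligned}
\ln\left[p^{3}(1-p)^{7}\right] &= 3\ln p + 7\ln(1-p) \\
\frac{d}{dp}\left[3\ln p + 7\ln(1-p)\right] &= \frac{3}{p} - \frac{7}{1-p} = 0 \\
3(1-p) &= 7p \quad\Longrightarrow\quad \hat{p} = \frac{3}{10}
\end{aligned}
```

The result is the observed proportion of lost packets, 3 out of 10.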

Maximum Likelihood Estimation
Definition:
- Let $X_1, X_2, \ldots, X_n$ have a joint p.m.f. or p.d.f. $f(x_1, x_2, \ldots, x_n; \theta_1, \ldots, \theta_m)$, where the parameters $\theta_1, \ldots, \theta_m$ take unknown values.
- When $x_1, \ldots, x_n$ are the observed sample values, $f(x_1, x_2, \ldots, x_n; \theta_1, \ldots, \theta_m)$ can be considered as a function of the parameters $\theta_1, \ldots, \theta_m$ and is called the likelihood function.
- The maximum likelihood estimates $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are those values of the $\theta_i$ that maximize the likelihood function.

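M.L.E Example
• Let $x_1, x_2, \ldots, x_n$ represent a random sample from an exponential distribution with parameter $\lambda$. Because of independence, the likelihood function is the product of the individual p.d.f.’s: $f(x_1, \ldots, x_n; \lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}$
• To maximize a product, it is easier to work with its natural log (ln): $\ln f(x_1, \ldots, x_n; \lambda) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i$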

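M.L.E Example (2)
• To find the value of $\lambda$ that maximizes the likelihood function, we differentiate the log-likelihood with respect to $\lambda$: $\frac{d}{d\lambda}\left[n \ln\lambda - \lambda \sum_{i=1}^{n} x_i\right] = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i$
• We solve the equation $\frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0$, which gives $\hat{\lambda} = n / \sum_{i=1}^{n} x_i = 1/\bar{x}$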
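
The closed-form result for the exponential case, n divided by the sum of the observations, can be checked numerically. The sketch below evaluates the log-likelihood n ln(lambda) - lambda * sum(x_i) at the estimate and at two nearby values; the function names and the data are hypothetical.

```python
import math


def exp_log_likelihood(lam, xs):
    """Log-likelihood of an exponential sample: n*ln(lam) - lam*sum(xs)."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def exp_mle(xs):
    """Maximum likelihood estimate of the rate: n / sum(xs)."""
    return len(xs) / sum(xs)


xs = [2.0, 0.5, 1.5, 4.0, 2.0]  # hypothetical service times
lam_hat = exp_mle(xs)
print("estimate:", lam_hat)  # 0.5
for lam in (0.8 * lam_hat, lam_hat, 1.2 * lam_hat):
    # The middle value should give the largest log-likelihood.
    print(lam, exp_log_likelihood(lam, xs))
```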

Interval Estimation (Single Sample)

Introduction to Confidence Interval
• Simple case. We are interested in the following parameter: the population mean $\mu$.
• Assume (unrealistically) that:
  • The population distribution is normal
  • The value of the population standard deviation $\sigma$ is known.

Introduction to Confidence Interval (2)
• Let $x_1, x_2, \ldots, x_n$ represent a random sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. The objective is to find a 95% confidence interval for $\mu$.
• What can we say about the random variable $Y = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$?
  • Probability distribution?
  • Mean?
  • Standard deviation?
• What is the standardized variable Z for Y?
• Using the normal distribution table: $P(-1.96 < Z < 1.96) = 0.95$

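Introduction to Confidence Interval (3)
• Manipulating $P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$
• We get $P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$
• Substituting the sample values gives the 95% confidence interval $\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)$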

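$100(1-\alpha)\%$ Confidence Interval
[Figure: normal curve centered at 0 with central area $1-\alpha$]
• Definition: a $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal population when the value of $\sigma$ is known is given by $\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$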
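
A minimal sketch of this interval, x-bar plus or minus z_{alpha/2} * sigma / sqrt(n), is shown below. It uses `statistics.NormalDist().inv_cdf` from the Python standard library in place of the normal table; the function name and the data are hypothetical.

```python
import math
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean of a normal population, sigma known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    half_width = z * sigma / math.sqrt(len(sample))
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width


sample = [9.0, 10.0, 11.0, 10.0]  # hypothetical measurements
print(z_interval(sample, sigma=2.0, confidence=0.95))
print(z_interval(sample, sigma=2.0, confidence=0.99))  # wider interval
```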

$100(1-\alpha)\%$ Confidence Interval: Common Values
• Any confidence level is achievable (using the normal distribution table)
• Common values used are 90% ($z_{\alpha/2} = 1.645$), 95% ($z_{\alpha/2} = 1.96$), and 99% ($z_{\alpha/2} = 2.576$)

Confidence, Sample Size, and Interval Length
• Increasing confidence increases the interval length (sample size fixed)
• To increase confidence without increasing interval length, one must increase sample size n
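
The trade-off can be made concrete. For a normal population with known sigma, the interval length is 2 * z_{alpha/2} * sigma / sqrt(n), so the sample size needed for a target length is the smallest integer n with n >= (2 * z_{alpha/2} * sigma / length)^2. The sketch below computes it; the function name and the numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def sample_size_for_length(sigma, length, confidence=0.95):
    """Smallest n whose z-interval (sigma known) is no longer than `length`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((2 * z * sigma / length) ** 2)


# Hypothetical: sigma = 25, target interval length 10.
print(sample_size_for_length(25, 10, confidence=0.95))  # 97
print(sample_size_for_length(25, 10, confidence=0.99))  # 166
```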

Large-Sample Confidence Intervals
• Population mean
• Proportion

Confidence Interval for the Population Mean
• If n is sufficiently large, the standardized variable $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has approximately a standard normal distribution. This implies that $\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$ is a large-sample confidence interval for $\mu$ with confidence level approximately $100(1-\alpha)\%$.
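
A sketch of the large-sample interval follows. The only change from the known-sigma case is that the sample standard deviation s replaces sigma, so no normality assumption on the population is needed when n is large. The function name, the exponential population, and the sample size of 50 are assumptions for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def large_sample_mean_interval(sample, confidence=0.95):
    """Approximate interval x_bar +/- z_{alpha/2} * s / sqrt(n) for large n."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * stdev(sample) / math.sqrt(n)
    xbar = mean(sample)
    return xbar - half_width, xbar + half_width


rng = random.Random(0)
# Hypothetical non-normal data: 50 exponential observations with mean 2.
data = [rng.expovariate(0.5) for _ in range(50)]
print(large_sample_mean_interval(data, confidence=0.95))
```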

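Confidence Interval for the Population Proportion
• A large-sample $100(1-\alpha)\%$ confidence interval for a population proportion p is $\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where $\hat{p} = x/n$, n is the sample size, and x is the number of successes.
• This interval is valid whenever $n\hat{p}$ and $n(1-\hat{p})$ are both large enough for the normal approximation to hold (a common rule of thumb is at least 10 each).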
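
The sketch below computes a large-sample interval for a proportion, assuming the traditional form p-hat plus or minus z_{alpha/2} * sqrt(p-hat * (1 - p-hat) / n) with p-hat = x/n. The function name and the counts are hypothetical, and the sketch does not check the large-sample validity condition.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Large-sample interval p_hat +/- z_{alpha/2} * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical: 30 lost packets out of 100 sent.
print(proportion_interval(30, 100, confidence=0.95))
```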

Exercise
• 1) What sample size is needed for a $100(1-\alpha)\%$ confidence interval for p to have length L? Assume p is known.
• 2) What sample size is needed for a $100(1-\alpha)\%$ confidence interval for p to have length L? Assume p is unknown.
