Lec 6, Ch.5, pp90-105: Statistics (Objectives) • Understand basic principles of statistics through reading these pages, especially… • Know the normal distribution well • Know the special characteristics of the Poisson distribution • Understand the meaning of correlation and dependence • Understand what confidence intervals mean • Learn how to estimate sample sizes for data collections • Understand the concept of hypothesis testing
What we cover in class today… Anything not covered in class, learn from reading pp. 95-105. • The normal distribution – how to read the standard normal distribution table • Central limit theorem (CLT) • The Poisson distribution – why it is relevant to traffic engineering • Correlation and dependence • Confidence bounds and their implications • Estimating sample sizes • The concept of hypothesis testing
The normal distribution • Mean µ = 55 mph, standard deviation σ = 7 mph • What's the probability the next value will be less than 65 mph? z = (x − µ)/σ = (65 − 55)/7 = 1.43 • This transforms the sample normal distribution to the standard normal distribution.
Use of the standard normal distribution table, Tab 5-1 • Look up the cumulative probability for z = 1.43 • The most popular interval is the 95% one: 95% of values fall within µ ± 1.96σ
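The table lookup above can be reproduced in code. A minimal Python sketch, using the error function from the standard library in place of Table 5-1 (the 55 mph mean and 7 mph standard deviation come from the slide; the function name is my own):

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed via the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Spot-speed example from the slide: mean 55 mph, sigma = 7 mph.
p = normal_cdf(65, mu=55, sigma=7)
print(f"z = {(65 - 55) / 7:.2f}, P(X < 65) = {p:.4f}")
```

The result agrees with the standard normal table entry near z = 1.43 (about 0.92), so roughly 92% of speeds are expected below 65 mph.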
Central limit theorem (CLT) • Definition: The population may have any unknown distribution with a mean µ and a finite variance σ². Take samples of size n from the population. As n increases, the distribution of sample means approaches a normal distribution with mean µ and variance σ²/n. • [Figure: X ~ any (µ, σ²) distribution; the distribution of sample means approaches N(µ, σ²/n) as n grows.]
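The CLT can be demonstrated by simulation. A sketch in Python, assuming an exponential population (clearly non-normal) with mean µ = 10, so σ² = 100; the sample size and trial count are arbitrary choices for illustration:

```python
import random
import statistics

random.seed(42)

# Population: exponential, mean mu = 10, variance sigma^2 = mu^2 = 100.
mu, sigma2 = 10.0, 100.0
n = 50          # size of each sample
trials = 2000   # number of sample means collected

means = [statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
         for _ in range(trials)]

# CLT prediction: sample means cluster near mu with variance ~ sigma^2 / n.
print(statistics.fmean(means), statistics.variance(means), sigma2 / n)
```

Even though each individual observation is skewed, the 2000 sample means center on µ = 10 with variance close to σ²/n = 2, as the theorem predicts.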
The Poisson distribution ("counting distribution" or "random arrival") • Mean µ = m and variance σ² = m. If this characteristic is not met, the Poisson does not apply. • The binomial distribution tends to approach the Poisson distribution with parameter m = np. (See Table 4-3) • When time headways are exponentially distributed with mean 1/λ, the number of arrivals in an interval T is Poisson distributed with mean m = λT.
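The arrival-counting use of the Poisson can be sketched directly from its probability mass function. The arrival rate and interval below are hypothetical values chosen for illustration, not from the text:

```python
import math

def poisson_pmf(n, m):
    """P(exactly n arrivals) when arrivals are Poisson with mean m."""
    return math.exp(-m) * m**n / math.factorial(n)

# Assumed example: arrival rate lam = 0.2 veh/s, counting interval T = 30 s,
# so the expected count is m = lam * T = 6 vehicles.
lam, T = 0.2, 30
m = lam * T
probs = [poisson_pmf(n, m) for n in range(5)]
print(probs)  # P(0), P(1), ..., P(4) arrivals in the interval
```

Note that the probabilities over all counts sum to 1, and that the mean and variance of the counts are both m — the defining Poisson characteristic mentioned above.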
Correlation and dependence • y = f(x): dependent variable y, independent variable x • Linear regression: y = a + bx • Non-linear regression: y = axb (example) • Correlation coefficient r (±1 = perfect fit) • Coefficient of determination r² (tells you how much of the variability can be "explained" by the independent variables.)
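Computing r from paired data makes the r² interpretation concrete. A sketch in Python with hypothetical, roughly linear data (the variables and values are invented for illustration):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations, close to a straight line.
x = [20, 30, 40, 50, 60, 70]
y = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]
r = pearson_r(x, y)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Here r is close to +1, so r² is near 1: almost all of the variability in y is "explained" by x under a linear fit.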
Confidence bounds and interval • Point estimates: A point estimate is a single-valued estimate of a population parameter made from a sample, e.g. x̄. • Interval estimates: An interval estimate is a probability statement that a population parameter lies between two computed values (bounds). • Two-sided interval estimate around the true population mean µ: from x̄ − tα·s/√n to x̄ + tα·s/√n
Confidence interval (cont) • When n gets larger (n ≥ 30), t can be replaced by z. • The probability of any random variable being within 1.96 standard deviations of the mean is 0.95, written as: P[(µ − 1.96σ) ≤ y ≤ (µ + 1.96σ)] = 0.95 • Obviously we do not know µ and σ. Hence we restate this in terms of the distribution of sample means: P[(x̄ − 1.96E) ≤ µ ≤ (x̄ + 1.96E)] = 0.95, where E = s/√n (the standard error). (Review 1, 2, 3, and 4 on page 100.)
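The large-sample interval above is a one-line computation. A sketch in Python; the spot-speed sample below is invented for illustration (n = 30, so the z = 1.96 form applies):

```python
import math
import statistics

def ci95(sample):
    """Two-sided 95% CI for the mean, using z = 1.96 (valid for n >= 30)."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    e = statistics.stdev(sample) / math.sqrt(n)   # standard error E = s / sqrt(n)
    return xbar - 1.96 * e, xbar + 1.96 * e

# Hypothetical spot-speed sample (mph), n = 30.
speeds = [52, 55, 58, 61, 49, 54, 57, 60, 53, 56,
          55, 59, 51, 62, 48, 54, 57, 55, 50, 63,
          56, 53, 58, 54, 61, 49, 55, 57, 52, 60]
lo, hi = ci95(speeds)
print(f"95% CI for the mean speed: ({lo:.1f}, {hi:.1f}) mph")
```

The interpretation: if we repeated this sampling many times, about 95% of such intervals would contain the true mean µ.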
Estimating sample sizes • For cases in which the distribution of means can be considered normal, the confidence range for 95% confidence is ±1.96 s/√n. • If this value is called the tolerance (or "precision"), and given the symbol e, then e = 1.96 s/√n can be solved for n, the desired sample size: n = (1.96 s/e)² = 3.84 s²/e² • By replacing 1.96 with z and 3.84 with z², we can use this for any level of confidence. (Review 1 and 2 on page 101.)
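The sample-size formula can be sketched as follows. The standard deviation and tolerance values are assumptions for illustration (a 7 mph speed standard deviation, tying back to the earlier normal-distribution slide):

```python
import math

def sample_size_95(s, e):
    """n = (1.96 s / e)^2 = 3.84 s^2 / e^2, rounded up to whole observations."""
    return math.ceil((1.96 * s / e) ** 2)

# Assumed values: speed std dev s = 7 mph, desired tolerance e = 1 mph.
print(sample_size_95(7, 1))   # -> 189 observations
print(sample_size_95(7, 2))   # -> 48: relaxing e cuts n roughly by 4x
```

Because e appears squared in the denominator, halving the tolerance quadruples the required sample size, which is why precision targets dominate data-collection cost.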
The concept of hypothesis testing • Two distinct choices: null hypothesis H0 and alternative hypothesis H1. • E.g. inspect 100,000 vehicles, of which 10,000 vehicles are "unsafe." This is the fact given to us. • H0: The vehicle being tested is "safe." • H1: The vehicle being tested is "unsafe." • In this inspection, 15% of the unsafe vehicles are determined to be safe → Type II error (the bad error), and 5% of the safe vehicles are determined to be unsafe → Type I error (economically bad, but safety-wise better than a Type II error).
Types of errors • We especially want to minimize Type II error. • Steps of hypothesis testing: • State the hypothesis • Select the significance level • Compute sample statistics and estimate parameters • Compute the test statistic • Determine the acceptance and critical regions of the test statistic • Reject or do not reject H0 • Decision vs. reality: Reject H0 when H0 is true → Type I error (reject a true null hypothesis); Accept H0 when H0 is true → correct; Reject H0 when H1 is true → correct; Accept H0 when H1 is true → Type II error (fail to reject a false null hypothesis). • P(Type I error) = α (level of significance); P(Type II error) = β
Dependence between α, β, and sample size n • There is a distinct relationship between the two probability values α and β and the sample size n for any hypothesis: the value of any one is found by using the test statistic and set values of the other two. • Given α and n, determine β. Usually the α and n values are the most crucial, so they are established and the β value is not controlled. • Given α and β, determine n. Set up the test statistic for α and β with the H0 value and an H1 value of the parameter and two different n values. • The t (or z) statistic is: t = (x̄ − µ0)/(s/√n). (Use an example from a stat book.)
One-sided and two-sided tests • The significance of the hypothesis test is indicated by α, the Type I error probability. α = 0.05 is most common: there is a 5% level of significance, which means that on average a Type I error (rejecting a true H0) will occur 5 in 100 times that H0 and H1 are tested. In addition, there is a 95% confidence level that the result is correct. • If H1 involves a not-equal relation, no direction is given, so the significance area is equally divided between the two tails of the testing distribution: a two-sided test, with 0.025 in each tail. • If it is known that the parameter can go in only one direction, a one-sided test is performed, so the entire significance area of 0.05 is in one tail of the distribution (e.g. one-sided upper).
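A one-sided upper test at α = 0.05 can be sketched as follows. The numbers (sample mean 57 mph, hypothesized 55 mph limit, s = 7, n = 50) are assumptions for illustration; the critical value 1.645 is the standard upper-tail z for α = 0.05:

```python
import math

def one_sided_upper_z_test(xbar, mu0, s, n):
    """Test H0: mu = mu0 against H1: mu > mu0 at alpha = 0.05.

    Rejects H0 when the z statistic exceeds the upper-tail
    critical value z_0.05 = 1.645.
    """
    z = (xbar - mu0) / (s / math.sqrt(n))
    return z, z > 1.645

# Assumed example: is the true mean speed above the 55 mph limit?
z, reject = one_sided_upper_z_test(xbar=57.0, mu0=55.0, s=7.0, n=50)
print(f"z = {z:.2f}, reject H0: {reject}")
```

Here z ≈ 2.02 lands in the single upper 0.05 tail, so H0 is rejected; a two-sided test at the same α would instead compare |z| against 1.96.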