640 likes | 665 Views
Analyzing a complex statistical study involving ANOVA, factor variables, treatments, multi-factor experiments, estimators, and sample size calculations. Explore means, variances, unbiased estimators, and confidence intervals.
E N D
Final ENM 500 Review rd
ANOVA • Factor – independent variable under investigation Price • Factor Level $100 $200 $300 Price has 3 levels • Single Factor Price • Multi-factor Price and Location on Sales • Treatment in one-way ~ level • Treatment in multi-factor ~ cell combination • Each population is N(i, 2) rd
10 B-low B-high 50 A B AB (1) 2 6 b (1) - - +A-low4 8 a + - - a 2 1 ab b - + - 50A-high 4 5 ab + + + (anova '((2 4 6 8)(2 4 1 5)) 2) Aeff = (3+3-7-3)/2 = -2Beff = (-3-3+7+3)/2 = 2 ABeff = (3-3 -7+3)/2 = -2 Source SS dfMS F p-value Coded VariablesRows A 8 1 8 2.28 0.1554 50 - ½ (50 + 10)/20 = 1 Columns B 8 1 8 2.28 0.1554 10 – ½ (50 + 10)/20 = -1RC AB 8 1 8 2.28 0.1554 20 = ½ (50 – 10)Error 14 4 3.5 SSA = 16Total 38 7 (Y-hat '((-1 -1 1 1 -1 -1 1 1)(-1 -1 -1 -1 1 1 1 1)(1 1 -1 -1 -1 -1 1 1)) '(2 4 2 4 6 8 1 5)) Y-hat = 4 - 1X1 + 1X2 - 1X1X2 4 is grand mean R2 = 24/38 = 63.15% rd
2 64 8 R1 = 5 2 1 4 5 R2 = 3 C1 = 3 C2 = 5 X-bar-bar = 4 (anova '((2 4 2 4)(6 8 1 5)) 2) Source SS df MS F p-valueRows 8 1 8 2.28 0.1554Columns 8 1 8 2.28 0.1554RC 8 1 8 2.28 0.1554Error 14 4 3.5Total 38 7 Rows = 4[5-4)2 + (3 – 4)2 = 8 RC = 2[3 - 5 – 3 + 4)2 + (7 – 5 - 5 + 4)2 +( 3 – 3 - 3 + 4)2 + (3 – 3 – 5 + 4)2 2[1 + 1 + 1 + 1] = 8 Error = (1 + 1 + 1 + 1 + 1 + 1 + 4 + 4) = 14 Cell = 2[(3 - 4)2 + (7 -4)2 + (3 – 4)2 + (5 – 4)2 = 24 rd
2 x 2 ANOVA • 1B2__ • 4 5 3 4 • A 1 6 5 7 6 • 2 5 7 10 6 • 5 3 4 4 • 1. Assume one way and compute Within variance of B3 and B4 • 2. Assume 2 x 2 and compute Between SS for A1 • 3. Assume 2 x 2 and compute Interaction SS for A1B1. • 4. Assume 2 x 2 and compute the Row effects using a contrast. rd
Sample Size • How large a sample is needed for a 95% confidence interval for with = 0.3 for an error less than 0.05? • Define the most efficient estimator. • The unbiased estimator with the minimum variance. rd
Exponential • Find P(X-bar < 0.51) given 49 random samples from f(x) = 2e-2x, x > 0. • Write the MM estimator for k. • Write the MM estimate for k given x-bar = 0.51 • Find the MME for a Poisson parameter k • Solution: • (normal 1/2 1/196 0.51) 0.55567 • k-hatMME =1/x-bar • d) k-hat = x-bar rd
Binomial • Show that p-hat = x/n is an unbiased estimator for p and write the variance and standard error of p-hat. • E(p-hat) = E(x/n) = np/n = p • V(p-hat) = V(x/n) = npq/n2 = pq/n • Sp-hat = (pq/n)1/2 rd
P-value • RV X ~ N(15, 36). A sample 36 revealed an x-bar of 20. • Find the p-value for testing H0: = 15 vs. H1: 15. • (* 2 (U-normal 15 36/36 20)) 5.960464e-7 • State sanity check. rd
Random Samples & IIDs • X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 • (swr 10 (sim-binomial 100 1/2 50)) (45 44 54 47 48 45 47 56 49 48) • (swr 10 (sim-binomial 100 1/2 50)) (53 43 48 51 39 49 41 49 51 46) • ……………………………………………………………………………………… • (swr 10 (sim-binomial 100 1/2 50)) (48 48 53 55 49 54 53 41 46 41) • (swr 10 (sim-binomial 100 1/2 50)) (51 50 58 44 54 49 50 51 50 57)(swr 10 (sim-binomial 100 1/2 50)) (50 61 36 55 50 43 55 50 50 52)(swr 10 (sim-binomial 100 1/2 50)) (49 50 52 46 55 51 46 51 50 52) rd
Estimators • Point mean median mode • Unbiased & Variance => most efficient • Interval seek short and high degree of confidence • 95% confident that is within [5 6] is better than • 99% confident that is within [2 9] • 1.96 • x-bar ks rd
Bias Estimator E( ) = => is unbiased Show that 2 is a biased estimator for µ2. E(X2) = V(X) + [E(X)]2 E( 2) = 2/n + µ2 Bias = 2/n and decreases with increasing n. rd
Bias Estimator E(theta-hat) = => theta-hat is unbiased Show that x-bar2 is a biased estimator for µ2. E(X2) = V(X) + [E(X)]2 E(x-bar2) = 2/n + µ2 Bias = 2/n and decreases with increasing n. rd
Mean Square Error • Compare relative efficiency using x-bar and a single sample from N(?, 2) • (2/n)/2 = 1/n => large samples are desirable for estimating. rd
Asymptototic Normal 1. Let RV X have density below. Find mean and variance of X, and for sample size of 36. Find P( < 5.5). X 4 5 6 7 P(X = x) 0.2 0.4 0.3 0.1 E(X) = 5.3; E(X2) = 28.9; V(X) = E(X2) – E2(X) = 0.81 ~ N(5.3, 0.81/36 = 0.0225) => P( < 5.5) = 0.91. (L-normal 5.3 81/3600 5.5) 0.9087886 rd
Sample • In a random sample of size 100 from the continuous uniform on [2, 52], find the P( < 26). • ~ N(27, 2.083 ) where V( ) = 502/(12*100) • P( < 26) =(L-normal 27 2500/1200 26) = 0.2442111 • 3. Find the probability that exceeds 5/8 from an exponential sample size of 49 with k = 2. • ~ N[1/2, 1/(4*49)] • (U-normal 1/2 (/ 1 (* 4 49)) 5/8) 0.0400592 rd
Population and Sample Moments • Population Moments Sample Moments • E(X) • E(X2) • E(X3) • ... … rd
MME for N(, 2) • Express the parameter in terms of population moments. • = E(X) Done. Substitute the sample moment for the population moment. • 2 = E(X2) – E2(X) • mu-hat = x-bar rd
MME • 4. Find the MME for given f(x) = e- x • E(X) = • E(X) = E(1/theta) = 1/ x-bar • Express the parameter of interest in terms of the population moments. Then substitute the sample moments for the population moments. rd
Maximum Likelihood Estimators Random Sample is X1, X2, …Xn Likelihood function is the joint density product iid => Independent and identically distributed Find estimator which maximizes the joint density Suppose samples are from Bernoulli densities. Then L = px1(1 – p)1-x1 * px2(1 – p)1-x2…* … = Ln L = Ln p + (n - )Ln (1 – p) (Ln L)’ = / p – (n - ) / (1 – p) = 0 => rd
6. Consider RV X with density given by f(x; ) = ( + 1)x on [0, 1]; > -1. A random sample of size 10 is: 0.89 0.33 0.92 0.95 0.86 0.68 0.56 0.91 0.53 0.90 from which x-bar = 0.753. Find both the MME and the MLE for and compute the estimates for from the data. MME: 2.05 MLE: L(xi; ) = ( + 1)n Ln L(xi; ) = n Ln ( + 1) + Ln(xi) (Ln L)’ = n / ( + 1) + Ln(xi) = 0 when (see below) = 2.27 Data taken from the density f(x) = 3x2 with = 2. rd
MLE vs. MME Compare the MME with MLE for estimating given a random sample from the density f(x) = 1 / on [0, ]. MME: E(X) = /2 => MLE: L(xi, ) = 1 / n To maximize L, we seek to make as small as possible but must be larger than all the Xi => = max {Xi}. rd
Point Estimates • E(X1) = , V(X1) = 4, E(X2) = , V(X2) = 6 • a) Compute V(X1/2 + X2/2) = 4/4 + 6/4 = 5/2 = 2.5 • b) Find p that minimizes V(pX1 + (1 – p)X2) • V(pX1 + (1 – p)X2) = 4p2 + 6(1 – p)2 • V' = 8p -12 (1 – p) = 0 when p = 3/5 rel min • V = 4(9/25) + 6(4/25) = 60/25 = 2.4 • Note that if E[pX1 + (1 – p)X2] = => • p + (1 – p) = 1 to make the estimator unbiased • c) Find relative efficiency of -hat1 to the point estimate with the smallest variance. 2.4/2.5 = 0.96 rd
Confidence Intervals z/2 < Z < z1 – /2 z/2 < < z1 – /2 ( ) rd
Interval Length = 6.0301 µ = 25 = 20 n = 20 50% Conf Intervals _______ (28.9262, 34.9563) _______ (22.2692, 28.2993) _______ (24.4427, 30.4729) _______ (17.7173, 23.7474) _______ (20.2293, 26.2594) _______ (22.8699, 28.9000) _______ (28.3861, 34.4163) _______ (21.7262, 27.7563) _______ (17.6309, 23.6610) _______ (15.1510, 21.1811) (sim-plot-ci 25 20 20 10 50) rd
Example Find a) 95% and b) 99% confidence intervals for random data taken a normal distribution with unknown mean but known variance of 4 with = 25 and n = 36. a) (25 – 1.96*2/6, 25 + 1.96 * 2/6) or (24.35, 25.65) 95% b) (25 – 2.58*2/6, 25 + 2.58 * 2/6) or (24.14, 25.86) 99% Which is longer? More confidence => longer interval rd
Sample Size n = To halve the length is to quadruple the sample size. When sampling from N(, 9) for a 95% confidence in the mean with an error of ½, ¼ &1, one needs __ samples. n = [(1.96)(3)/0.5]2 = 138.3 ~ 139samples. ½ n = [(1.96)(3)/0.25]2 = 553.2 ~ 554 samples. ¼ n = [(1.96)(3)/1]2 = 34.6 ~ 35 samples. 1 rd
T-Confidence Interval Find a 95% confidence interval for the mean of a normal distribution with unknown variance based on the following 20 samples: 7 9 3 2 3 8 4 6 2 6 4 3 8 3 2 7 9 5 8 8. = 5.35, s2 = 6.35, s = 2.52. Since the sample size is less than 30, we use a t-confidence interval for given by tn-1, /2 * S / 4.47 = 5.35 2.093 * 2.52/4.47 or (4.17, 6.53) with 95% confidence. rd
Difference 2 Means Find a 95% confidence interval for 1 - 2 given that 1 = 10, 2 = 9, 21 =9, 22 = 4, and n1 = n2 = 100 when sampling from two independent normal distributions. 1 - 2 1 - 2 z / 2 1 - 2 (10 - 9) 1.96 * = 1 0.707. rd
Proportion A study revealed 200/500 subjects benefited from a drug. Write a 99% confidence interval about the proportion p p in 0.4 2.575 = 0.4 0.0564 X in 200 2.575 = 200 28.027 rd
Confidence Interval 90% • Let x1, X2, … X11 be a random sample of size 11 from • N(?, ?). If , then find k for a 90% confidence interval for given • 12 – k , 12 + k • (inv-t 10 0.95) 1.812868 • S2 = 99/10 = 9.9; S = 3.146; 111/2 = 3.3166 • k = 1.8129 rd
Standard Normal for Hypothesis Testing Consider the structure of the standard normal for hypothesis testing. z = rd
Type Errors rd
Examples • The following 10 scores are from N(, 144). • 90 78 84 93 99 54 71 77 85 74 (x-bar = 80.5) • Test H0: = 75 vs. H1: 75 at a = 5%. p-value = 14.7% • z = (80.5 – 75)/12 = 1.4494 < 1.96 => Cannot Reject • b) Test H0: = 85 vs. H1: = 85 at a = 5%. p-value = 23.6% • z = (80.5 – 85)/12 = -1.186 > -1.96 => Cannot Reject • c) Test H0: = 88 vs. H1: = 88 at = 5%. p-value = 4.8% • z = (80.5 – 88)/12 = -1.9764 < -1.96 => Reject rd
1. A drill press depth is set at 2”. A random sample of 100 holes are drilled with mean = 2.005 and = 0.03. With a= 0.05, can the hypothesis that m = 2 be rejected? H0: = 2 vs. H1: 2 Test Statistic Z = = = 1.67 < z0975 = 1.96 => Cannot reject; p-value = 0.095 or 9.5%. 2. Given n = 36, = 17, s2 = 9, and a = 0.05 with H0: m= 16 vs. H1: m > 16, can H0 be rejected? Note one-sided test. Large random sample imply adequate test statistic is the value of Z = = = 2 > 1.645 = z0.95. => REJECT H0.p-value is 0.0228 = (U-normal 16 9/36 17) rd
6. a) Find the p-value when the number of heads from 100 flips of a fair coin is between 40 and 60 inclusive using the normal approximation with continuity correction. H0: X = 50 vs. H1: X 50 • b) State the decision if is set at 4%. • c) Find the Type II error if P(heads) = 0.6. • a) = np = 100*0.5 = 50; 2 = npq = 100*0.5*0.5 = 25 => = 5 • P(39.5 <= X <= 60.5) = (del-normal 50 25 39.5 60.5) 0.9642711 • [(60.5 - 50)/5] - [(39.5 - 50)/5] = 0.9821 - 0.0179 = 0.9643 • => p-value = 0.0357. • b) For a set at 4%, the decision is to reject. • (p = 0.6) = P(39.5 <= X <= 60.5 | p = 0.6 => = 60 and = 4.899) = (del-normal 60 24 39.5 60.5) 0.5406. • c) [(60.5 - 60) / 4.899] - [(39.5 - 60) / 4.899] • = 0.5406 - 0.000014 = 0.5406. rd
Determining P-value Degrees of freedom p-value = 0.022 for t = 2.2 with v = 15 v = 0.100 = 0.050 = 0.025 = 0.010 = 0.005 14 1.345 1.761 2.1452.624 2.977 15 1.341 1.753 2.1312.602 2.947 16 1.337 1.746 2.1202.583 2.921 Figure 7.6Partial T-table (Inv-t 15 0.01) 2.60295 rd
Sample Size • Given for testing H0: = 64 vs. H1: = 60 with = 8, find the sample size required for = 0.01 and = 0.05. • n = [8 (-2.326785 + -3.971997)/4]2 = 159 rd
Indicate the and errors in the diagram where xc marks the critical value. H0 H1 POWER xc rd
Proportion • 53% of 871 adults favored strong support. Conclude if majority are for support. • z = (0.53 – 0.5)/](0.53 * 0.47)/871)]1/2 • = 1.77 NO • Data: 3.21 2.49 2.94 4.38 4.02 3.82 3.3 2.85 3.34 3.91 • for mean = 3.5 at = 5%. Test and state assumptions. • (mu-std-err data) (3.426 0.59315) • t = 3.16(3.43 – 3.5)/0.59 = -0.3945; No rd
Chi Square Test • 1 2 3 4 5 6 1 4 9 9 2 5 • Is the die fair at = 5%? • 2 = (16 + 1 + 16 + 16 + 9 + 0)/5 • = 58/5 • = 11.6 • (U-chi-sq 5 11.6) = 0.040699 => reject • (inv-chi-sq 5 0.95) 11.045862 critical 2. rd
Contingency Table • Income Level • Low Medium High • Male 192 225 195 • Female 176 145 120 • The expected value for medium female is ______ • 2 = _____ rd
2 • Let x1, … X4 be a random sample from N(?, 9). Let • S2 = . If P(S2 < K) = 0.05, then K = ? • (n – 1)S2 is 2 or K = (* 3 (inv-chi-sq 3 0.05)) 1.056. • 9 • (chi-sq 3 (/ 1.056 3)) 0.05 rd
Regression • Consider • The levels of X 120 130 140 … • The end levels • Spacing of levels • Number of observations at each level rd
Joint Density • Let X and Y be independent RVs with each having density function f(t) = 1/(2), for - < t < . If V(XY) = 64/9, find . • Joint density is 1/(42). • E(X) = E(Y) = => E(XY) = 0 • and covariance is 0. • Variance(X) = E(X2) = • Thus 4/9 = 64/9 => 4 = 64 => 2 = 8 => = 2 rd
SSError= The sum of the squared differences between the observed Y's and the fitted Y-hat's; also called the sum of the squared residuals. (SSerrorx-data y-data) rd
Distribution of Estimators E ~ N(0, 2) by assumption Y ~ N( + x, 2 ) by assumption B ~ N(ß, 2/Sxx); E(B) = ß; V(B) = 2/Sxx Note: min var B => max Sxx A ~ N(, 2 xi2/nSxx); E(A) = Y-hat ~ N( + x, 2[1/n + (x – )2]/Sxx Ypredict ~ N( + x, 2[1 + 1/n + (x – )2]/Sxx SSError /(n – 2) = s2, an unbiased estimator for 2 rd
B = SxY / Sxx = showing that B is a sum of normal random variables Yi and thus is normal. (B-Y-coefx) returns the coefficients bi. A = – B shows that A is a sum of normal random variables and thus is normal. (A-Y-coefx) returns the coefficients ai. Note that coefficients bi and ai are derived from only the x data. rd
Regression Example 3. Given n = 20, Sxi = 23.92 , SYi = 1843.21, x-bar = 1.196, y-bar = 92.16 SYi2 = 170,044.53, Sxi2 = 29.29,SxiYi = 2,214.66 write the equation of the linear regression model. Ans. Y-hat = 74.26 + 14.97x Sxx = 29.29 - 20*(23.92/20)2 = 0.68168 SxY = 2214.66 - 20 * 23.92 * 1843.21/202 = 10.1808333 A = – B = 92.16 –14.97*1.196 = 74.26 B = 14.97 = SxY /Sxx rd
N(1, 2) N(2, 2) N(3, 2) N(4, 2) N(5, 2) N(6, 2) N(7, 2) N(8, 2) 19 24 12 33 32 19 12 1124 28 12 13 18 21 23 20 22 12 15 33 40 47 33 35 33 35 46 19 23 36 26 17 24 22 29 35 25 23 39 33 15 23 21 24 23 22 25 30 33 35 30 38 12 39 44 29 36 27 41 26 22 26 27 1531 23 16 25 33 22 26 22 39 16 26 31 26 35 20 3054 49 61 40 47 37 50 58 61 57 27 29 34 46 26 58 Ho: i =j vs. H1: i j