
Statistical Analysis Review: ANOVA, Factor Investigation, and Estimators

Analyzing a complex statistical study involving ANOVA, factor variables, treatments, multi-factor experiments, estimators, and sample size calculations. Explore means, variances, unbiased estimators, and confidence intervals.

ljoseph

Presentation Transcript


  1. Final ENM 500 Review rd

  2. ANOVA
     • Factor: the independent variable under investigation (e.g., Price)
     • Factor levels: $100, $200, $300, so Price has 3 levels
     • Single factor: Price
     • Multi-factor: Price and Location on Sales
     • Treatment in one-way ANOVA ~ a level
     • Treatment in multi-factor ANOVA ~ a cell combination
     • Each population is N(μi, σ²)

  3. 2 x 2 factorial example (low level 10, high level 50; two replicates per cell)

                   B-low       B-high
     A-low     (1)  2, 4     b   6, 8
     A-high     a   2, 4     ab  1, 5

     Contrast signs:
             A    B    AB
     (1)     -    -    +
     a       +    -    -
     b       -    +    -
     ab      +    +    +

     (anova '((2 4 6 8)(2 4 1 5)) 2)

     Using the cell means (1) = 3, a = 3, b = 7, ab = 3:
     Aeff = (3 + 3 - 7 - 3)/2 = -2
     Beff = (-3 - 3 + 7 + 3)/2 = 2
     ABeff = (3 - 3 - 7 + 3)/2 = -2

     Source        SS    df    MS     F       p-value
     Rows A         8     1    8      2.28    0.1554
     Columns B      8     1    8      2.28    0.1554
     RC AB          8     1    8      2.28    0.1554
     Error         14     4    3.5
     Total         38     7

     SSA = 16

     Coded variables: (50 - ½(50 + 10))/20 = 1 and (10 - ½(50 + 10))/20 = -1, where 20 = ½(50 - 10).

     (Y-hat '((-1 -1 1 1 -1 -1 1 1)(-1 -1 -1 -1 1 1 1 1)(1 1 -1 -1 -1 -1 1 1)) '(2 4 2 4 6 8 1 5))
     Y-hat = 4 - 1X1 + 1X2 - 1X1X2, where 4 is the grand mean.
     R² = 24/38 = 63.16%

  4. Cell data with row means R, column means C, and the grand mean:

     2, 4  |  6, 8      R1 = 5
     2, 4  |  1, 5      R2 = 3
     C1 = 3    C2 = 5    X-bar-bar = 4

     (anova '((2 4 2 4)(6 8 1 5)) 2)

     Source     SS    df    MS     F       p-value
     Rows        8     1    8      2.28    0.1554
     Columns     8     1    8      2.28    0.1554
     RC          8     1    8      2.28    0.1554
     Error      14     4    3.5
     Total      38     7

     Rows = 4[(5 - 4)² + (3 - 4)²] = 8
     RC = 2[(3 - 5 - 3 + 4)² + (7 - 5 - 5 + 4)² + (3 - 3 - 3 + 4)² + (3 - 3 - 5 + 4)²] = 2[1 + 1 + 1 + 1] = 8
     Error = (1 + 1 + 1 + 1 + 1 + 1 + 4 + 4) = 14
     Cell = 2[(3 - 4)² + (7 - 4)² + (3 - 4)² + (3 - 4)²] = 24
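
The sums of squares on this slide can be cross-checked with a short script. This is only a sketch in plain Python, not the course's `anova` function; the nested-list layout `cells[row][column]` and all variable names are assumptions made for the illustration.

```python
# Two-way ANOVA sums of squares for the 2 x 2 example, 2 replicates per cell.
# cells[i][j] holds the replicates for row i (factor A) and column j (factor B).
cells = [[[2, 4], [6, 8]],
         [[2, 4], [1, 5]]]


def mean(values):
    return sum(values) / len(values)


a, b, n = len(cells), len(cells[0]), len(cells[0][0])
all_obs = [y for row in cells for cell in row for y in cell]
grand = mean(all_obs)

row_means = [mean([y for cell in row for y in cell]) for row in cells]
col_means = [mean([y for row in cells for y in row[j]]) for j in range(b)]
cell_means = [[mean(cell) for cell in row] for row in cells]

# Each sum of squares follows the hand computation on the slide.
ss_rows = b * n * sum((m - grand) ** 2 for m in row_means)
ss_cols = a * n * sum((m - grand) ** 2 for m in col_means)
ss_inter = n * sum((cell_means[i][j] - row_means[i] - col_means[j] + grand) ** 2
                   for i in range(a) for j in range(b))
ss_error = sum((y - cell_means[i][j]) ** 2
               for i in range(a) for j in range(b) for y in cells[i][j])
ss_total = sum((y - grand) ** 2 for y in all_obs)

ms_error = ss_error / (a * b * (n - 1))   # error df = ab(n - 1) = 4
f_rows = ss_rows / ms_error

print(ss_rows, ss_cols, ss_inter, ss_error, ss_total, f_rows)
```

The four components should add up to the total sum of squares, which is a quick sanity check on any hand-built ANOVA table.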

  5. 2 x 2 ANOVA

               B1         B2
     A1      4   5      3   4
             6   5      7   6
     A2      5   7     10   6
             5   3      4   4

     1. Assume one way and compute the Within variance of B3 and B4.
     2. Assume 2 x 2 and compute the Between SS for A1.
     3. Assume 2 x 2 and compute the Interaction SS for A1B1.
     4. Assume 2 x 2 and compute the Row effects using a contrast.

  6. Sample Size
     • How large a sample is needed for a 95% confidence interval for μ with σ = 0.3 for an error less than 0.05?
     • Define the most efficient estimator.
     • The unbiased estimator with the minimum variance.

  7. Exponential
     a) Find P(x-bar < 0.51) given 49 random samples from f(x) = 2e^(-2x), x > 0.
     b) Write the MM estimator for k.
     c) Write the MM estimate for k given x-bar = 0.51.
     d) Find the MME for a Poisson parameter k.
     Solution:
     a) (normal 1/2 1/196 0.51) → 0.55567
     b) k-hat MME = 1/x-bar
     c) k-hat = 1/0.51 ≈ 1.96
     d) k-hat = x-bar
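
As a cross-check of parts a) and c), here is a minimal Python sketch that uses the standard library's `statistics.NormalDist` in place of the course's `normal` function. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

rate, n = 2, 49                      # f(x) = 2e^(-2x), sample size 49
mu = 1 / rate                        # mean of the exponential: 1/k
var_mean = (1 / rate ** 2) / n       # V(x-bar) = (1/k^2)/n = 1/196

# Central limit theorem: x-bar is approximately N(1/2, 1/196).
xbar_dist = NormalDist(mu, sqrt(var_mean))
p_below = xbar_dist.cdf(0.51)        # P(x-bar < 0.51)

# Method of moments: set E(X) = 1/k equal to x-bar, so k-hat = 1/x-bar.
k_hat = 1 / 0.51

print(p_below, k_hat)
```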

  8. Binomial
     • Show that p-hat = x/n is an unbiased estimator for p and write the variance and standard error of p-hat.
     • E(p-hat) = E(x/n) = np/n = p
     • V(p-hat) = V(x/n) = npq/n² = pq/n
     • S p-hat = (pq/n)^(1/2)

  9. P-value
     • RV X ~ N(15, 36). A sample of size 36 revealed an x-bar of 20.
     • Find the p-value for testing H0: μ = 15 vs. H1: μ ≠ 15.
     • (* 2 (U-normal 15 36/36 20)) → 5.960464e-7
     • State sanity check.

  10. Random Samples & IIDs
      •                                      X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
      • (swr 10 (sim-binomial 100 1/2 50)) → (45 44 54 47 48 45 47 56 49 48)
      • (swr 10 (sim-binomial 100 1/2 50)) → (53 43 48 51 39 49 41 49 51 46)
      • …
      • (swr 10 (sim-binomial 100 1/2 50)) → (48 48 53 55 49 54 53 41 46 41)
      • (swr 10 (sim-binomial 100 1/2 50)) → (51 50 58 44 54 49 50 51 50 57)
      • (swr 10 (sim-binomial 100 1/2 50)) → (50 61 36 55 50 43 55 50 50 52)
      • (swr 10 (sim-binomial 100 1/2 50)) → (49 50 52 46 55 51 46 51 50 52)

  11. Estimators
      • Point: mean, median, mode
      • Unbiased & minimum variance => most efficient
      • Interval: seek a short interval with a high degree of confidence
      • 95% confident that μ is within [5 6] is better than
      • 99% confident that μ is within [2 9]
      • ± 1.96
      • x-bar ± ks

  13. Bias Estimator
      E(theta-hat) = θ => theta-hat is unbiased.
      Show that x-bar² is a biased estimator for µ².
      E(X²) = V(X) + [E(X)]²
      E(x-bar²) = σ²/n + µ²
      Bias = σ²/n, which decreases with increasing n.

  14. Mean Square Error
      • Compare relative efficiency using x-bar and a single sample from N(?, σ²).
      • (σ²/n)/σ² = 1/n => large samples are desirable for estimating.

  15. Asymptotic Normal
      1. Let RV X have the density below. Find the mean and variance of X, and, for a sample size of 36, find P(x-bar < 5.5).

         X           4     5     6     7
         P(X = x)   0.2   0.4   0.3   0.1

      E(X) = 5.3; E(X²) = 28.9; V(X) = E(X²) - E²(X) = 0.81
      x-bar ~ N(5.3, 0.81/36 = 0.0225) => P(x-bar < 5.5) = 0.91
      (L-normal 5.3 81/3600 5.5) → 0.9087886
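
The moment calculations and the normal approximation above can be reproduced with a few lines of Python. This is a sketch that uses `statistics.NormalDist` rather than the course's `L-normal`; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

values = [4, 5, 6, 7]
probs = [0.2, 0.4, 0.3, 0.1]
n = 36

# First and second moments of the discrete distribution.
mean_x = sum(x * p for x, p in zip(values, probs))
second_moment = sum(x ** 2 * p for x, p in zip(values, probs))
var_x = second_moment - mean_x ** 2

# x-bar is approximately N(E(X), V(X)/n) for a sample of size n.
xbar_dist = NormalDist(mean_x, sqrt(var_x / n))
p_below = xbar_dist.cdf(5.5)

print(mean_x, var_x, p_below)
```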

  16. Sample
      • In a random sample of size 100 from the continuous uniform on [2, 52], find P(x-bar < 26).
      • x-bar ~ N(27, 2.083) where V(x-bar) = 50²/(12*100)
      • P(x-bar < 26) = (L-normal 27 2500/1200 26) = 0.2442111
      • 3. Find the probability that x-bar exceeds 5/8 from an exponential sample of size 49 with k = 2.
      • x-bar ~ N[1/2, 1/(4*49)]
      • (U-normal 1/2 (/ 1 (* 4 49)) 5/8) → 0.0400592

  17. Population and Sample Moments

      Population Moments     Sample Moments
      E(X)                   (1/n) Σxi
      E(X²)                  (1/n) Σxi²
      E(X³)                  (1/n) Σxi³
      ...                    …

  18. MME for N(μ, σ²)
      • Express the parameter μ in terms of population moments.
      • μ = E(X). Done. Substitute the sample moment for the population moment.
      • σ² = E(X²) - E²(X)
      • mu-hat = x-bar

  19. MME
      • 4. Find the MME for θ given f(x) = θe^(-θx).
      • E(X) = 1/θ
      • Setting 1/θ = x-bar gives theta-hat = 1/x-bar.
      • Express the parameter of interest in terms of the population moments. Then substitute the sample moments for the population moments.

  20. Maximum Likelihood Estimators
      The random sample is X1, X2, …, Xn.
      The likelihood function is the joint density, which is a product because the samples are iid (independent and identically distributed).
      Find the estimator which maximizes the joint density.
      Suppose the samples are from Bernoulli densities. Then
      L = p^(x1)(1 - p)^(1 - x1) * p^(x2)(1 - p)^(1 - x2) * … = p^(Σxi)(1 - p)^(n - Σxi)
      Ln L = Σxi Ln p + (n - Σxi) Ln(1 - p)
      (Ln L)' = Σxi/p - (n - Σxi)/(1 - p) = 0 => p-hat = Σxi/n = x-bar

  21. 6. Consider RV X with density given by f(x; θ) = (θ + 1)x^θ on [0, 1]; θ > -1. A random sample of size 10 is:
      0.89 0.33 0.92 0.95 0.86 0.68 0.56 0.91 0.53 0.90
      from which x-bar = 0.753. Find both the MME and the MLE for θ and compute the estimates for θ from the data.
      MME: E(X) = (θ + 1)/(θ + 2) = x-bar => theta-hat = (2 x-bar - 1)/(1 - x-bar) = 0.506/0.247 = 2.05
      MLE: L(xi; θ) = (θ + 1)^n Π xi^θ
      Ln L(xi; θ) = n Ln(θ + 1) + θ Σ Ln(xi)
      (Ln L)' = n/(θ + 1) + Σ Ln(xi) = 0 when theta-hat = -n/Σ Ln(xi) - 1 = 10/3.3107 - 1 = 2.02
      Data taken from the density f(x) = 3x² with θ = 2.
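
Both estimates can be recomputed directly from the ten data values. The sketch below assumes the closed forms that follow from E(X) = (θ + 1)/(θ + 2) and from setting the derivative of the log-likelihood to zero; the variable names are illustrative.

```python
from math import log

data = [0.89, 0.33, 0.92, 0.95, 0.86, 0.68, 0.56, 0.91, 0.53, 0.90]
n = len(data)
xbar = sum(data) / n

# MME: E(X) = (theta + 1)/(theta + 2) = x-bar
#      =>  theta = (2*x-bar - 1)/(1 - x-bar)
theta_mme = (2 * xbar - 1) / (1 - xbar)

# MLE: n/(theta + 1) + sum(ln x_i) = 0
#      =>  theta = -n / sum(ln x_i) - 1
sum_log = sum(log(x) for x in data)
theta_mle = -n / sum_log - 1

print(xbar, theta_mme, theta_mle)
```

Both estimators target θ = 2, the value used to generate the data, so comparing the two printed estimates against 2 gives a feel for their sampling error at n = 10.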

  22. MLE vs. MME
      Compare the MME with the MLE for estimating θ given a random sample from the density f(x) = 1/θ on [0, θ].
      MME: E(X) = θ/2 => theta-hat = 2 x-bar
      MLE: L(xi, θ) = 1/θ^n. To maximize L, we seek to make θ as small as possible, but θ must be at least as large as every Xi => theta-hat = max{Xi}.

  23. Point Estimates
      • E(X1) = μ, V(X1) = 4, E(X2) = μ, V(X2) = 6
      • a) Compute V(X1/2 + X2/2) = 4/4 + 6/4 = 5/2 = 2.5
      • b) Find p that minimizes V(pX1 + (1 - p)X2)
      • V(pX1 + (1 - p)X2) = 4p² + 6(1 - p)²
      • V' = 8p - 12(1 - p) = 0 when p = 3/5, a relative minimum
      • V = 4(9/25) + 6(4/25) = 60/25 = 2.4
      • Note that E[pX1 + (1 - p)X2] = μ requires the weights to sum to 1:
      • p + (1 - p) = 1 makes the estimator unbiased
      • c) Find the relative efficiency of mu-hat1 to the point estimate with the smallest variance. 2.4/2.5 = 0.96
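
The minimization in part b) can be checked with exact rational arithmetic. This sketch assumes X1 and X2 are independent, as the variance formula on the slide implies; the function and variable names are made up for the illustration.

```python
from fractions import Fraction

v1, v2 = Fraction(4), Fraction(6)    # V(X1) and V(X2)


def combo_variance(p):
    """Variance of p*X1 + (1 - p)*X2, assuming X1 and X2 are independent."""
    return p ** 2 * v1 + (1 - p) ** 2 * v2


p_equal = Fraction(1, 2)             # part a): equal weights
p_best = v2 / (v1 + v2)              # root of dV/dp = 2*p*v1 - 2*(1 - p)*v2

# Part c): minimum variance relative to the equal-weight variance.
rel_eff = combo_variance(p_best) / combo_variance(p_equal)

print(p_best, combo_variance(p_equal), combo_variance(p_best), rel_eff)
```

The optimal weight puts more mass on the lower-variance observation, which is why p exceeds 1/2 here.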

  24. Confidence Intervals
      z_{α/2} < Z < z_{1-α/2}
      z_{α/2} < (x-bar - μ)/(σ/√n) < z_{1-α/2}
      μ ∈ (x-bar - z_{1-α/2} σ/√n, x-bar + z_{1-α/2} σ/√n)

  25. 50% Confidence Intervals
      Interval length = 6.0301; µ = 25, σ = 20, n = 20
      (sim-plot-ci 25 20 20 10 50)
      (28.9262, 34.9563)
      (22.2692, 28.2993)
      (24.4427, 30.4729)
      (17.7173, 23.7474)
      (20.2293, 26.2594)
      (22.8699, 28.9000)
      (28.3861, 34.4163)
      (21.7262, 27.7563)
      (17.6309, 23.6610)
      (15.1510, 21.1811)

  26. Example
      Find a) 95% and b) 99% confidence intervals for random data taken from a normal distribution with unknown mean but known variance of 4, with x-bar = 25 and n = 36.
      a) (25 - 1.96*2/6, 25 + 1.96*2/6) or (24.35, 25.65) at 95%
      b) (25 - 2.58*2/6, 25 + 2.58*2/6) or (24.14, 25.86) at 99%
      Which is longer? More confidence => longer interval.

  27. Sample Size
      n = (z_{α/2} σ/E)², where E is the allowed error. To halve the length is to quadruple the sample size.
      When sampling from N(μ, 9) for a 95% confidence in the mean μ with an error of ½, ¼ & 1, one needs __ samples.
      Error ½: n = [(1.96)(3)/0.5]² = 138.3 ~ 139 samples.
      Error ¼: n = [(1.96)(3)/0.25]² = 553.2 ~ 554 samples.
      Error 1: n = [(1.96)(3)/1]² = 34.6 ~ 35 samples.
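
The three sample sizes follow from one formula, so a small helper makes the quadrupling pattern easy to see. This is a sketch only: `sample_size` is a hypothetical name, and the z value comes from `statistics.NormalDist` rather than a rounded table entry.

```python
from math import ceil
from statistics import NormalDist


def sample_size(sigma, error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= error for a two-sided interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / error) ** 2)


# Sampling from N(mu, 9): sigma = 3, with errors of 1/2, 1/4 and 1.
for error in (0.5, 0.25, 1):
    print(error, sample_size(3, error))
```

The same helper answers the earlier review question with σ = 0.3 and an error of 0.05, since only the ratio σ/E enters the formula.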

  28. T-Confidence Interval
      Find a 95% confidence interval for the mean of a normal distribution with unknown variance based on the following 20 samples:
      7 9 3 2 3 8 4 6 2 6 4 3 8 3 2 7 9 5 8 8
      x-bar = 5.35, s² = 6.34, s = 2.52.
      Since the variance is unknown and the sample size is less than 30, we use a t-confidence interval for μ given by
      x-bar ± t_{n-1, α/2} * S/√n = 5.35 ± 2.093 * 2.52/4.47, or μ ∈ (4.17, 6.53) with 95% confidence.
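
A quick recomputation from the raw data, as a sketch: the Python standard library has no t quantile function, so the critical value 2.093 is taken from the slide (t with 19 degrees of freedom) and hard-coded. The variable names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

data = [7, 9, 3, 2, 3, 8, 4, 6, 2, 6, 4, 3, 8, 3, 2, 7, 9, 5, 8, 8]
n = len(data)

t_crit = 2.093               # t_{19, 0.025}, taken from the slide / a t-table
xbar = mean(data)
s = stdev(data)              # sample standard deviation (divides by n - 1)

half_width = t_crit * s / sqrt(n)
interval = (xbar - half_width, xbar + half_width)

print(xbar, s ** 2, s, interval)
```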

  29. Difference of 2 Means
      Find a 95% confidence interval for μ1 - μ2 given that x-bar1 = 10, x-bar2 = 9, σ1² = 9, σ2² = 4, and n1 = n2 = 100 when sampling from two independent normal distributions.
      μ1 - μ2 ∈ (x-bar1 - x-bar2) ± z_{α/2} √(σ1²/n1 + σ2²/n2)
      μ1 - μ2 ∈ (10 - 9) ± 1.96 * √(9/100 + 4/100) = 1 ± 0.707

  30. Proportion
      A study revealed 200/500 subjects benefited from a drug. Write a 99% confidence interval about the proportion p.
      p in 0.4 ± 2.575 √(0.4 * 0.6/500) = 0.4 ± 0.0564
      X in 200 ± 2.575 √(500 * 0.4 * 0.6) = 200 ± 28.208

  31. Confidence Interval 90%
      • Let X1, X2, …, X11 be a random sample of size 11 from N(?, ?). If x-bar = 12 and Σ(xi - x-bar)² = 99, then find k for a 90% confidence interval for μ given by
      • (12 - k S/√11, 12 + k S/√11)
      • (inv-t 10 0.95) → 1.812868
      • S² = 99/10 = 9.9; S = 3.146; 11^(1/2) = 3.3166
      • k = 1.8129

  32. Standard Normal for Hypothesis Testing
      Consider the structure of the standard normal for hypothesis testing.
      z = (x-bar - μ0)/(σ/√n)

  33. Type Errors

  34. Examples
      • The following 10 scores are from N(μ, 144).
      • 90 78 84 93 99 54 71 77 85 74 (x-bar = 80.5)
      • a) Test H0: μ = 75 vs. H1: μ ≠ 75 at α = 5%. p-value = 14.7%
      • z = (80.5 - 75)/(12/√10) = 1.4494 < 1.96 => Cannot Reject
      • b) Test H0: μ = 85 vs. H1: μ ≠ 85 at α = 5%. p-value = 23.6%
      • z = (80.5 - 85)/(12/√10) = -1.186 > -1.96 => Cannot Reject
      • c) Test H0: μ = 88 vs. H1: μ ≠ 88 at α = 5%. p-value = 4.8%
      • z = (80.5 - 88)/(12/√10) = -1.9764 < -1.96 => Reject
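
The three tests share the same data and standard error, so they can be run in one loop. This is a minimal sketch with a hypothetical helper `two_sided_z_test`; it assumes the known standard deviation σ = 12 stated on the slide.

```python
from math import sqrt
from statistics import NormalDist, mean

scores = [90, 78, 84, 93, 99, 54, 71, 77, 85, 74]
sigma = 12                   # known: the variance is 144


def two_sided_z_test(data, mu0, sigma):
    """Return (z, p-value) for H0: mu = mu0 against H1: mu != mu0."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


for mu0 in (75, 85, 88):
    z, p = two_sided_z_test(scores, mu0, sigma)
    decision = "reject" if p < 0.05 else "cannot reject"
    print(mu0, round(z, 4), round(p, 3), decision)
```

Comparing the p-value with α and comparing |z| with 1.96 are the same decision rule, which the loop makes easy to confirm for each hypothesized mean.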

  35. 1. A drill press depth is set at 2”. A random sample of 100 holes is drilled with x-bar = 2.005 and σ = 0.03. With α = 0.05, can the hypothesis that μ = 2 be rejected?
      H0: μ = 2 vs. H1: μ ≠ 2
      Test statistic Z = (x-bar - μ0)/(σ/√n) = (2.005 - 2)/(0.03/10) = 1.67 < z_{0.975} = 1.96 => Cannot reject; p-value = 0.095 or 9.5%.
      2. Given n = 36, x-bar = 17, s² = 9, and α = 0.05 with H0: μ = 16 vs. H1: μ > 16, can H0 be rejected? Note the one-sided test.
      A large random sample implies that an adequate test statistic is the value of
      Z = (x-bar - μ0)/(s/√n) = (17 - 16)/(3/6) = 2 > 1.645 = z_{0.95} => REJECT H0.
      p-value is 0.0228 = (U-normal 16 9/36 17)

  36. 6. a) Find the p-value when the number of heads from 100 flips of a fair coin is between 40 and 60 inclusive using the normal approximation with continuity correction. H0: X = 50 vs. H1: X ≠ 50
      b) State the decision if α is set at 4%.
      c) Find the Type II error if P(heads) = 0.6.
      a) μ = np = 100*0.5 = 50; σ² = npq = 100*0.5*0.5 = 25 => σ = 5
         P(39.5 <= X <= 60.5) = (del-normal 50 25 39.5 60.5) → 0.9642711
         Φ[(60.5 - 50)/5] - Φ[(39.5 - 50)/5] = 0.9821 - 0.0179 = 0.9643
         => p-value = 0.0357.
      b) For α set at 4%, the decision is to reject.
      c) β(p = 0.6) = P(39.5 <= X <= 60.5 | p = 0.6 => μ = 60 and σ = 4.899) = (del-normal 60 24 39.5 60.5) → 0.5406.
         Φ[(60.5 - 60)/4.899] - Φ[(39.5 - 60)/4.899] = 0.5406 - 0.000014 = 0.5406.
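
Parts a) and c) use the same interval probability under two different values of p, which the following sketch makes explicit. It relies on `statistics.NormalDist` instead of the course's `del-normal`; the helper name `prob_between` is an assumption.

```python
from math import sqrt
from statistics import NormalDist

n = 100


def prob_between(p, lo=39.5, hi=60.5):
    """Normal approximation to P(40 <= X <= 60) with continuity correction."""
    dist = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return dist.cdf(hi) - dist.cdf(lo)


# a) p-value: probability of landing outside 40..60 when the coin is fair.
p_value = 1 - prob_between(0.5)

# c) Type II error: probability of landing inside 40..60 when p = 0.6.
beta = prob_between(0.6)

print(p_value, beta)
```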

  37. Determining P-value
      p-value = 0.022 for t = 2.2 with ν = 15 degrees of freedom

       ν    α = 0.100   α = 0.050   α = 0.025   α = 0.010   α = 0.005
      14      1.345       1.761       2.145       2.624       2.977
      15      1.341       1.753       2.131       2.602       2.947
      16      1.337       1.746       2.120       2.583       2.921

      Figure 7.6 Partial T-table
      (Inv-t 15 0.01) → 2.60295

  38. Sample Size
      • Given for testing H0: μ = 64 vs. H1: μ = 60 with σ = 8, find the sample size required for α = 0.01 and β = 0.05.
      • n = [σ(z_α + z_β)/(μ0 - μ1)]² = [8(2.3268 + 1.6449)/4]² = 63.1, so n = 64

  39. Indicate the α and β errors in the diagram where xc marks the critical value.
      [Figure: densities under H0 and H1, with the POWER region and the critical value xc marked]

  40. Proportion
      • 53% of 871 adults favored strong support. Conclude if a majority are for support.
      • z = (0.53 - 0.5)/[(0.53 * 0.47)/871]^(1/2) = 1.77 NO
      • Data: 3.21 2.49 2.94 4.38 4.02 3.82 3.3 2.85 3.34 3.91
      • Test for mean = 3.5 at α = 5% and state assumptions.
      • (mu-std-err data) → (3.426 0.59315)
      • t = 3.16(3.43 - 3.5)/0.59 = -0.3945; No

  41. Chi Square Test
      Face:    1  2  3  4  5  6
      Count:   1  4  9  9  2  5
      • Is the die fair at α = 5%?
      • χ² = (16 + 1 + 16 + 16 + 9 + 0)/5 = 58/5 = 11.6
      • (U-chi-sq 5 11.6) = 0.040699 => reject
      • (inv-chi-sq 5 0.95) → 11.045862, the critical χ².
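
The test statistic is simple enough to compute directly. In this sketch the critical value is the one reported on the slide by `inv-chi-sq`, hard-coded because the Python standard library has no chi-square quantile function; the variable names are assumptions.

```python
observed = [1, 4, 9, 9, 2, 5]                 # counts for faces 1 through 6
expected = sum(observed) / len(observed)      # fair die: 30/6 = 5 per face

# Pearson chi-square statistic with 6 - 1 = 5 degrees of freedom.
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

critical = 11.045862      # value reported on the slide by (inv-chi-sq 5 0.95)
reject = chi_sq > critical

print(chi_sq, reject)
```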

  42. Contingency Table

                    Income Level
                 Low    Medium    High
      Male       192     225      195
      Female     176     145      120

      • The expected value for medium female is ______
      • χ² = _____

  43. χ²
      • Let X1, …, X4 be a random sample from N(?, 9). Let S² = Σ(xi - x-bar)²/(n - 1). If P(S² < K) = 0.05, then K = ?
      • (n - 1)S²/9 is χ² with 3 degrees of freedom, so 3K/9 = χ²_{3, 0.05}, or K = (* 3 (inv-chi-sq 3 0.05)) → 1.056.
      • (chi-sq 3 (/ 1.056 3)) → 0.05

  44. Regression
      Consider:
      • The levels of X: 120 130 140 …
      • The end levels
      • Spacing of levels
      • Number of observations at each level

  45. Joint Density
      • Let X and Y be independent RVs with each having density function f(t) = 1/(2θ), for -θ < t < θ. If V(XY) = 64/9, find θ.
      • Joint density is 1/(4θ²).
      • E(X) = E(Y) = 0 => E(XY) = 0 and the covariance is 0.
      • Variance(X) = E(X²) = θ²/3
      • V(XY) = E(X²)E(Y²) = θ⁴/9. Thus θ⁴/9 = 64/9 => θ⁴ = 64 => θ² = 8 => θ = 2√2

  46. SSError = the sum of the squared differences between the observed Y's and the fitted Y-hat's; also called the sum of the squared residuals. (SSerror x-data y-data)

  47. Distribution of Estimators
      E ~ N(0, σ²) by assumption
      Y ~ N(α + βx, σ²) by assumption
      B ~ N(β, σ²/Sxx); E(B) = β; V(B) = σ²/Sxx. Note: min var B => max Sxx
      A ~ N(α, σ² Σxi²/(n Sxx)); E(A) = α
      Y-hat ~ N(α + βx, σ²[1/n + (x - x-bar)²/Sxx])
      Ypredict ~ N(α + βx, σ²[1 + 1/n + (x - x-bar)²/Sxx])
      SSError/(n - 2) = s², an unbiased estimator for σ²

  48. B = SxY/Sxx = Σ(xi - x-bar)Yi/Sxx = Σ bi Yi, showing that B is a sum of normal random variables Yi and thus is normal. (B-Y-coef x) returns the coefficients bi.
      A = Y-bar - B x-bar shows that A is a sum of normal random variables and thus is normal. (A-Y-coef x) returns the coefficients ai.
      Note that the coefficients bi and ai are derived from only the x data.

  49. Regression Example
      3. Given n = 20, Σxi = 23.92, ΣYi = 1843.21, x-bar = 1.196, y-bar = 92.16, ΣYi² = 170,044.53, Σxi² = 29.29, ΣxiYi = 2,214.66, write the equation of the linear regression model.
      Ans. Y-hat = 74.30 + 14.93x
      Sxx = 29.29 - 20*(23.92/20)² = 0.68168
      SxY = 2214.66 - 20 * 23.92 * 1843.21/20² = 10.1808333
      B = SxY/Sxx = 10.1808/0.68168 = 14.93
      A = y-bar - B x-bar = 92.16 - 14.93*1.196 = 74.30
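
The slope and intercept follow mechanically from the five summary statistics, so a short script is a useful check on the arithmetic. This is a sketch with illustrative variable names; it uses only the sums given on the slide.

```python
n = 20
sum_x, sum_y = 23.92, 1843.21
sum_x2, sum_xy = 29.29, 2214.66

# Corrected sums of squares and cross products.
s_xx = sum_x2 - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares estimates: B = SxY/Sxx and A = y-bar - B * x-bar.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

print(s_xx, s_xy, slope, intercept)
```

Because Sxx is small here, the slope is sensitive to rounding in the summary statistics, so it is worth carrying full precision until the final step.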

  50. N(μ1, σ²)  N(μ2, σ²)  N(μ3, σ²)  N(μ4, σ²)  N(μ5, σ²)  N(μ6, σ²)  N(μ7, σ²)  N(μ8, σ²)
      19 24 12 33 32 19 12 11
      24 28 12 13 18 21 23 20
      22 12 15 33 40 47 33 35
      33 35 46 19 23 36 26 17
      24 22 29 35 25 23 39 33
      15 23 21 24 23 22 25 30
      33 35 30 38 12 39 44 29
      36 27 41 26 22 26 27 15
      31 23 16 25 33 22 26 22
      39 16 26 31 26 35 20 30
      54 49 61 40 47 37 50 58
      61 57 27 29 34 46 26 58
      H0: μi = μj vs. H1: μi ≠ μj
