BA 555 Practical Business Analysis


Presentation Transcript


  1. BA 555 Practical Business Analysis Agenda • Review of Statistics • Confidence Interval Estimation • Hypothesis Testing • Linear Regression Analysis • Introduction • Case Study: Cost of Manufacturing Computers • Simple Linear Regression

  2. The Empirical Rule (p.5)

  3. Review Example • Suppose that the average hourly earnings of production workers over the past three years were reported to be $12.27, $12.85, and $13.39, with standard deviations of $0.15, $0.18, and $0.23, respectively. • The average hourly earnings of the production workers in your company also continued to rise over the past three years, from $12.72 in 2002 and $13.35 in 2003 to $13.95 in 2004. • Assume that the distribution of the hourly earnings for all production workers is mound-shaped. • Have the earnings in your company become less and less competitive? Why or why not?
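A quick way to frame the answer is to standardize your company's earnings against the industry mean and standard deviation for each year. A minimal Python sketch, assuming the industry figures and the company figures above refer to the same three years (2002–2004):

```python
# Standardize the company's hourly earnings against the industry figures by year.
# The numbers come from the bullets above; pairing them by year is an assumption.
industry_mean = {2002: 12.27, 2003: 12.85, 2004: 13.39}
industry_sd   = {2002: 0.15,  2003: 0.18,  2004: 0.23}
company       = {2002: 12.72, 2003: 13.35, 2004: 13.95}

for year in (2002, 2003, 2004):
    z = (company[year] - industry_mean[year]) / industry_sd[year]
    print(year, round(z, 2))   # 3.00, 2.78, 2.43: the company's position above the mean is shrinking
```

Because the distribution is mound-shaped, the Empirical Rule translates these z-scores into rough percentile positions, which is what the question about competitiveness turns on.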

  4. Review Example

  5. The Empirical Rule • Generalize the results from the empirical rule. • Justify the use of the mound-shaped distribution.

  6. Sampling Distribution (p.6) • The sampling distribution of a statistic is the probability distribution of all possible values of the statistic that result when random samples of size n are repeatedly drawn from the population. • When the sample size is large, what is the sampling distribution of the sample mean / sample proportion / the difference of two sample means / the difference of two sample proportions? → NORMAL!!!
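The "NORMAL!!!" answer can be checked by simulation. A minimal sketch; the exponential population is an arbitrary, deliberately non-normal choice used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000   # sample size and number of repeated samples

# Population: exponential with mean 1 and sd 1 (clearly not normal).
sample_means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

print(sample_means.mean())        # close to the population mean, 1
print(sample_means.std(ddof=1))   # close to sigma / sqrt(n) = 1 / sqrt(50) ≈ 0.14
# A histogram of sample_means is approximately bell-shaped: the sampling
# distribution of the sample mean is roughly normal when n is large.
```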

  7. Central Limit Theorem (CLT) (p.6)

  8. Central Limit Theorem (CLT) (p.6)

  9. Summary: Sampling Distributions • The sampling distribution of a sample mean • The sampling distribution of a sample proportion • The sampling distribution of the difference between two sample means • The sampling distribution of the difference between two sample proportions

  10. Standard Deviations

  11. Statistical Inference: Estimation • (Diagram: a sample of size n drawn from the population.) • Research question: What is the parameter value? • Tools (i.e., formulas): point estimator, interval estimator.

  12. Confidence Interval Estimation (p.7)

  13. Example 1: Estimation for the population mean • A random sample of a company’s weekly operating expenses for 48 weeks produced a sample mean of $5,474 and a standard deviation of $764. Construct a 95% confidence interval for the company’s mean weekly expenses. Example 2: Estimation for the population proportion
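A sketch of the computation for Example 1. With n = 48 the large-sample z interval applies; the t interval is shown for comparison (scipy is assumed to be available):

```python
import math
from scipy import stats

n, xbar, s = 48, 5474, 764
se = s / math.sqrt(n)                  # standard error ≈ 110.3

z = stats.norm.ppf(0.975)              # ≈ 1.96
print(xbar - z * se, xbar + z * se)    # ≈ (5258, 5690), i.e., 5474 ± 216

t = stats.t.ppf(0.975, df=n - 1)       # ≈ 2.01 with 47 degrees of freedom
print(xbar - t * se, xbar + t * se)    # slightly wider: ≈ (5252, 5696)
```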

  14. Statistical Inference: Hypothesis Testing • (Diagram: a sample of size n drawn from the population.) • Research question: Is the claim supported? • Tools (i.e., formulas): z or t statistic.

  15. Hypothesis Testing (p.9)

  16. Example • A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
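A sketch of the test: H0: μ = 120 seconds vs. Ha: μ < 120 seconds, using the stated σ = 28 seconds (so a z test):

```python
import math
from scipy import stats

n, xbar, mu0, sigma = 30, 100, 120, 28
z = (xbar - mu0) / (sigma / math.sqrt(n))   # ≈ -3.91
p_value = stats.norm.cdf(z)                 # lower-tail p-value ≈ 0.00005
print(z, p_value)                           # reject H0 at any usual alpha: the goal appears to be met
```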

  17. Setting Up the Rejection Region / Type I Error • If we reject H0 (accept Ha) when in fact H0 is true, this is a Type I error. • False alarm.

  18. The P-Value of a Test (p.11) • The p-value, or observed significance level, is the smallest value of α for which the test results are statistically significant, i.e., the smallest α at which the conclusion of rejecting H0 can be reached.

  19. Regression Analysis • A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, … Xk). • The model allows us to understand (quantify) the effect of each X on Y. • It also allows us to predict Y based on X1, X2, …. Xk.

  20. Types of Relationship • Linear Relationship • Simple Linear Relationship • Y = b0 + b1 X + e • Multiple Linear Relationship • Y = b0 + b1 X1 + b2 X2 + … + bk Xk + e • Nonlinear Relationship • Y = a0 exp(b1 X + e) • Y = b0 + b1 X1 + b2 X1² + e • … etc. • We will focus only on linear relationships.

  21. Simple Linear Regression Model • (Diagram: the true effect of X on Y in the population vs. the estimated effect of X on Y in the sample.) • Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?

  22. Least Squares Method • The least squares line: a statistical procedure for finding the “best-fitting” straight line. • It minimizes the sum of squares of the deviations of the observed values of Y from those predicted. • (Figure: a poorly fitting line vs. the least squares line, for which the deviations are minimized.)
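In simple regression the least squares slope and intercept have closed-form solutions. A minimal sketch with made-up data (the x and y values are for illustration only):

```python
import numpy as np

# Toy data, made up purely to illustrate the formulas.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope ≈ 0.97
b0 = y.mean() - b1 * x.mean()                                               # intercept ≈ 0.13
sse = np.sum((y - (b0 + b1 * x)) ** 2)   # the sum of squares that the least squares line minimizes
print(b0, b1, sse)
```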

  23. Case: Cost of Manufacturing Computers (pp.13 – 45) • A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. • The following production variables were recorded: • COST: the total weekly production cost (in $millions) • UNITS: the total number of units (in 000s) produced during the week. • LABOR: the total weekly direct labor cost (in $10K). • SWITCH: the total number of times that the production process was re-configured for different types of computers • FACTA: = 1 if the observation is from factory A; = 0 if from factory B.

  24. Raw Data (p. 14) How many possible regression models can we build?

  25. Simple Linear Regression Model (pp. 17 – 26) • Question 1: Is Labor a significant cost driver? • This question leads us to think about the following model: Cost = f(Labor) + e. Specifically, Cost = b0 + b1 Labor + e. • Question 2: How well does this model perform? (How accurately can Labor predict Cost?) • This question leads us to try other regression models and make comparisons.
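A sketch of fitting Cost = b0 + b1 Labor + e in Python with statsmodels, assuming the case data sit in a CSV with the column names from slide 23 (the file name is hypothetical):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file holding the case data: COST, UNITS, LABOR, SWITCH, FACTA.
data = pd.read_csv("computer_cost.csv")

model = smf.ols("COST ~ LABOR", data=data).fit()
print(model.summary())   # b0, b1, their standard errors, t tests, and R-squared
```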

  26. Initial Analysis (pp. 15 – 16) • Summary statistics + plots (e.g., histograms + scatter plots) + correlations. • Things to look for: • Features of the data (e.g., data range, outliers): we do not want to extrapolate outside the data range because the relationship there is unknown (or unestablished). • Summary statistics and graphs. • Is the assumption of linearity appropriate? Is there inter-dependence among the variables? Any potential problems? • Scatter plots and correlations.

  27. Correlation (p. 15) • ρ (rho): the population correlation (its value is most likely unknown). • r: the sample correlation (its value can be calculated from the sample). • Correlation is a measure of the strength of a linear relationship. • Correlation falls between –1 and 1. • There is no linear relationship if the correlation is close to 0. But, … • (Figure: scatter plots illustrating r = –1, –1 < r < 0, r = 0, 0 < r < 1, and r = 1.)

  28. Correlation (p. 15) • (Annotated output: the sample size and the p-value for testing H0: ρ = 0 vs. Ha: ρ ≠ 0.) • Is 0.9297 a ρ or an r?
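0.9297 is computed from the data, so it is a sample correlation r (ρ stays unknown). A sample correlation and the p-value for H0: ρ = 0 can be computed with scipy; a sketch continuing from the hypothetical data frame loaded above, using COST and LABOR as an example pair:

```python
from scipy import stats

# `data` is the case data frame loaded in the earlier sketch.
r, p_value = stats.pearsonr(data["LABOR"], data["COST"])
print(r, p_value)   # a small p-value is evidence of a linear relationship
```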

  29. Fitted Model (Least Squares Line) (p.18) • (Annotated output: β0 or b0? β1 or b1? The estimates b0 and b1 with their standard errors Sb0 and Sb1, and the t test of H0: β1 = 0 vs. Ha: β1 ≠ 0.) • ** Divide the p-value by 2 for a one-sided test; make sure there is at least weak evidence before doing this. • Degrees of freedom = n – k – 1, where n = sample size and k = # of Xs.

  30. Hypothesis Testing and Confidence Interval Estimation for β (pp. 19 – 20) • Q1: Does Labor have any impact on Cost? → Hypothesis testing. • Q2: If so, how large is the impact? → Confidence interval estimation. • (Annotated output: b0 and b1 with their standard errors Sb0 and Sb1.) • Degrees of freedom = n – k – 1, where k = # of independent variables.
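Both questions can be read off the fitted model sketched earlier: the t test of H0: β1 = 0 answers Q1, and the confidence interval b1 ± t(α/2, n – k – 1) × Sb1 answers Q2. A sketch continuing from that fit:

```python
from scipy import stats

print(model.conf_int(alpha=0.05))   # 95% confidence intervals for b0 and b1

# The same interval by hand: b1 ± t * Sb1, with df = n - k - 1.
b1, se_b1 = model.params["LABOR"], model.bse["LABOR"]
t_crit = stats.t.ppf(0.975, df=model.df_resid)
print(b1 - t_crit * se_b1, b1 + t_crit * se_b1)
```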

  31. Analysis of Variance (p. 21) - Not very useful in simple regression. - Useful in multiple regression.

  32. Sum of Squares (p.22) • SSE = the remaining variation that cannot be explained by the model. • Syy = the total variation in Y. • SSR = Syy – SSE = the variation in Y that has been explained by the model.

  33. Fit Statistics (pp. 23 – 24) • In simple regression, R-squared = r²: 0.45199 × 0.45199 = 0.204295.

  34. Prediction (pp. 25 – 26) • What is the predicted production cost of a given week, say, Week 21 of the year, in which Labor = 5 (i.e., $50,000)? • Point estimate: predicted cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) ≈ 1.12724 (million dollars). • Margin of error? → Prediction interval. • What is the average production cost of a typical week in which Labor = 5? • Point estimate: estimated cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) ≈ 1.12724 (million dollars). • Margin of error? → Confidence interval.
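Both margins of error are available from the fitted model sketched earlier; a sketch for Labor = 5 (i.e., $50,000):

```python
import pandas as pd

new = pd.DataFrame({"LABOR": [5]})
pred = model.get_prediction(new).summary_frame(alpha=0.05)

print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])   # confidence interval: the average week with Labor = 5
print(pred[["obs_ci_lower", "obs_ci_upper"]])             # prediction interval: a single week, always wider
```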

  35. Prediction vs. Confidence Intervals (pp. 25 – 26) • (Figure: prediction and confidence bands plotted around the fitted line.) • The variation (margin of error) at both ends seems larger. Implication?

  36. Another Simple Regression Model: Cost = b0 + b1 Units + e (p. 27) A better model? Why?

  37. Statgraphics • Simple Regression Analysis • Relate / Simple Regression • X = Independent variable, Y = dependent variable • For prediction, click on the Tabular option icon and check Forecasts. Right click to change X values. • Multiple Regression Analysis • Relate / Multiple Regression • For prediction, enter values of Xs in the Data Window and leave the corresponding Y blank. Click on the Tabular option icon and check Reports.

  38. Normal Probabilities

  39. Critical Values of t
