BA 555 Practical Business Analysis Agenda • Review of Statistics • Confidence Interval Estimation • Hypothesis Testing • Linear Regression Analysis • Introduction • Case Study: Cost of Manufacturing Computers • Simple Linear Regression
Review Example • Suppose that the average hourly earnings of production workers over the past three years were reported to be $12.27, $12.85, and $13.39, with standard deviations of $0.15, $0.18, and $0.23, respectively. • The average hourly earnings of the production workers in your company also continued to rise over the past three years, from $12.72 in 2002 to $13.35 in 2003 and $13.95 in 2004. • Assume that the distribution of the hourly earnings for all production workers is mound-shaped. • Have the earnings in your company become less and less competitive? Why or why not?
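One way to approach the Review Example is to standardize the company's average against the reported average for each year. The sketch below assumes that the three reported figures line up with 2002, 2003, and 2004 in that order (the slide does not say so explicitly); the variable names are illustrative.

```python
# Review Example: z-score of the company's average hourly earnings
# relative to the reported average for all production workers.
# ASSUMPTION: the reported figures correspond to 2002, 2003, 2004 in order.
years = [2002, 2003, 2004]
reported_mean = [12.27, 12.85, 13.39]
reported_sd = [0.15, 0.18, 0.23]
company_mean = [12.72, 13.35, 13.95]


def z_score(value, mean, sd):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd


z_scores = [
    z_score(c, m, s)
    for c, m, s in zip(company_mean, reported_mean, reported_sd)
]

for year, z in zip(years, z_scores):
    print(f"{year}: z = {z:.2f}")
```

Under these assumptions the dollar gap widens each year while the z-score shrinks (from about 3.0 to about 2.4), which is one way to argue that the company's relative standing is slipping even though its pay keeps rising.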
The Empirical Rule (p.5) • Generalize the results from the empirical rule. • Justify the use of the mound-shaped distribution.
Sampling Distribution (p.6) • The sampling distribution of a statistic is the probability distribution of all possible values of the statistic that result when random samples of size n are repeatedly drawn from the population. • When the sample size is large, what is the sampling distribution of the sample mean / the sample proportion / the difference of two sample means / the difference of two sample proportions? Approximately NORMAL!
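The large-sample claim can be checked informally by simulation. The sketch below is an illustration only: the exponential population, the sample size of 50, and the 5,000 repetitions are all assumptions chosen for the demo, not values from the course.

```python
import random
import statistics

random.seed(0)  # fixed seed so the demo is repeatable


def sample_means(n, reps):
    """Draw `reps` random samples of size n from a skewed (exponential,
    mean 1) population and return the list of sample means."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]


means = sample_means(n=50, reps=5000)
center = statistics.fmean(means)   # should be near the population mean, 1
spread = statistics.stdev(means)   # should be near 1 / sqrt(50)

# For a roughly normal distribution, about 95% of values fall within
# two standard deviations of the center (the empirical rule).
within_2 = sum(abs(m - center) <= 2 * spread for m in means) / len(means)

print(f"center = {center:.3f}, spread = {spread:.3f}, within 2 SD = {within_2:.3f}")
```

Even though the population here is strongly skewed, the simulated sample means cluster symmetrically around the population mean, which is the behavior the slide summarizes.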
Summary: Sampling Distributions • The sampling distribution of a sample mean • The sampling distribution of a sample proportion • The sampling distribution of the difference between two sample means • The sampling distribution of the difference between two sample proportions
Statistical Inference: Estimation • Population → Sample of size n • Research Question: What is the parameter value? • Tools (i.e., formulas): Point Estimator, Interval Estimator
Example 1: Estimation for the population mean • A random sample of 48 weeks of a company’s weekly operating expenses produced a sample mean of $5474 and a standard deviation of $764. Construct a 95% confidence interval for the company’s mean weekly expenses. Example 2: Estimation for the population proportion
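For Example 1, a large-sample interval can be sketched as follows. This assumes the usual z-based formula, sample mean ± z × s / √n, which is reasonable for n = 48; a t-based interval with 47 degrees of freedom would be slightly wider.

```python
from math import sqrt
from statistics import NormalDist

# Values from Example 1
n = 48
sample_mean = 5474.0
sample_sd = 764.0
confidence = 0.95

# Critical value z_{alpha/2} from the standard normal distribution
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error and interval limits (large-sample z interval)
margin = z * sample_sd / sqrt(n)
lower = sample_mean - margin
upper = sample_mean + margin

print(f"95% CI for mean weekly expenses: (${lower:.2f}, ${upper:.2f})")
```

The margin of error works out to roughly $216, so the interval runs from about $5,258 to about $5,690.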
Statistical Inference: Hypothesis Testing • Population → Sample of size n • Research Question: Is the claim supported? • Tools (i.e., formulas): z or t statistic
Example • A bank has set up a customer service goal that the mean waiting time for its customers will be less than 2 minutes. The bank randomly samples 30 customers and finds that the sample mean is 100 seconds. Assuming that the sample is from a normal distribution and the standard deviation is 28 seconds, can the bank safely conclude that the population mean waiting time is less than 2 minutes?
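The bank example can be worked as a one-sided test of H0: μ = 120 seconds against Ha: μ < 120 seconds. The sketch below treats the 28-second standard deviation as known and therefore uses a z statistic; that choice, and the 0.05 significance level, are assumptions. If 28 is a sample standard deviation, a t statistic with 29 degrees of freedom would be the closer match.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example (all times in seconds)
mu0 = 120.0      # goal: mean waiting time of 2 minutes
n = 30
sample_mean = 100.0
sigma = 28.0     # ASSUMPTION: treated as a known population SD
alpha = 0.05     # ASSUMPTION: significance level not given on the slide

# Test statistic for H0: mu = 120 vs. Ha: mu < 120
z_stat = (sample_mean - mu0) / (sigma / sqrt(n))

# Lower-tail p-value, because Ha points to smaller means
p_value = NormalDist().cdf(z_stat)

reject_h0 = p_value < alpha
print(f"z = {z_stat:.3f}, p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

The statistic is about -3.9, far into the lower tail, so under these assumptions the data support the claim that the mean waiting time is below 2 minutes.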
Setting Up the Rejection Region: Type I Error • If we reject H0 (accept Ha) when in fact H0 is true, this is a Type I error. • False Alarm.
The P-Value of a Test (p.11) • The p-value, or observed significance level, is the smallest value of α for which the test results are statistically significant, i.e., for which “the conclusion of rejecting H0 can be reached.”
Regression Analysis • A technique to examine the relationship between an outcome variable (dependent variable, Y) and a group of explanatory variables (independent variables, X1, X2, …, Xk). • The model allows us to understand (quantify) the effect of each X on Y. • It also allows us to predict Y based on X1, X2, …, Xk.
Types of Relationship • Linear Relationship • Simple Linear Relationship: Y = β0 + β1 X + ε • Multiple Linear Relationship: Y = β0 + β1 X1 + β2 X2 + … + βk Xk + ε • Nonlinear Relationship • Y = α0 exp(β1 X + ε) • Y = β0 + β1 X1 + β2 X1² + ε • … etc. • We will focus only on linear relationships.
Simple Linear Regression Model • Population: true effect of X on Y • Sample: estimated effect of X on Y • Key questions: 1. Does X have any effect on Y? 2. If yes, how large is the effect? 3. Given X, what is the estimated Y?
Least Squares Method • Least squares line: predicted Y = b0 + b1 X • It is a statistical procedure for finding the “best-fitting” straight line. • It minimizes the sum of squares of the deviations of the observed values of Y from those predicted. (Figure: a bad fit compared with the least squares line, for which the deviations are minimized.)
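The least squares idea can be sketched in a few lines using the standard closed-form estimates b1 = Sxy / Sxx and b0 = ȳ − b1 x̄. The data below are hypothetical, made up for illustration, and are not the case-study data.

```python
def least_squares(x, y):
    """Return (b0, b1) for the line that minimizes the sum of squared
    deviations between observed and predicted y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_deviations(x, y, b0, b1):
    """Sum of squared deviations of observed y from the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# HYPOTHETICAL data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(x, y)
best = sum_squared_deviations(x, y, b0, b1)
# Any other line, e.g. one with a slightly different slope, fits worse
worse = sum_squared_deviations(x, y, b0, b1 + 0.1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"sum of squares: least squares line = {best:.4f}, perturbed line = {worse:.4f}")
```

Changing either coefficient away from the least squares values can only increase the sum of squared deviations, which is what "best-fitting" means here.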
Case: Cost of Manufacturing Computers (pp.13 – 45) • A manufacturer produces computers. The goal is to quantify cost drivers and to understand the variation in production costs from week to week. • The following production variables were recorded: • COST: the total weekly production cost (in $millions) • UNITS: the total number of units (in 000s) produced during the week. • LABOR: the total weekly direct labor cost (in $10K). • SWITCH: the total number of times that the production process was re-configured for different types of computers • FACTA: = 1 if the observation is from factory A; = 0 if from factory B.
Raw Data (p. 14) How many possible regression models can we build?
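One way to count the candidate models is to enumerate every non-empty subset of the recorded explanatory variables, with COST as the response. This sketch assumes that only linear models without transformations or interaction terms are counted, which the slide does not state.

```python
from itertools import combinations

# Explanatory variables recorded in the case; COST is the response
candidates = ["UNITS", "LABOR", "SWITCH", "FACTA"]

# Every non-empty subset of the candidates defines one possible model
# (ASSUMPTION: no transformations or interaction terms are considered)
models = [
    subset
    for size in range(1, len(candidates) + 1)
    for subset in combinations(candidates, size)
]

for subset in models:
    print("COST ~ " + " + ".join(subset))

print(f"Number of candidate models: {len(models)}")
```

With four candidate variables this gives 2⁴ − 1 = 15 models: 4 simple regressions and 11 multiple regressions.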
Simple Linear Regression Model (pp. 17 – 26) • Question 1: Is Labor a significant cost driver? • This question leads us to think about the following model: Cost = f(Labor) + ε. Specifically, Cost = β0 + β1 Labor + ε • Question 2: How well does this model perform? (How accurately can Labor predict Cost?) • This question leads us to try other regression models and make comparisons.
Initial Analysis (pp. 15 – 16) • Summary statistics + Plots (e.g., histograms + scatter plots) + Correlations • Things to look for: • Features of the data (e.g., data range, outliers): we do not want to extrapolate outside the data range because the relationship there is unknown (or unestablished). Use summary statistics and graphs. • Is the assumption of linearity appropriate? • Inter-dependence among variables? Any potential problem? Use scatter plots and correlations.
Correlation (p. 15) • ρ (rho): Population correlation (its value most likely is unknown.) • r: Sample correlation (its value can be calculated from the sample.) • Correlation is a measure of the strength of linear relationship. • Correlation falls between –1 and 1. • No linear relationship if correlation is close to 0. But, …. (Figure: scatter plots illustrating r = –1, –1 < r < 0, r = 0, 0 < r < 1, and r = 1.)
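The "But, …" caveat is worth a quick illustration: a correlation near 0 rules out a linear relationship, not every relationship. The sketch below computes the sample correlation from its definition on two small hypothetical data sets, one perfectly linear and one perfectly quadratic.

```python
from math import sqrt


def sample_correlation(x, y):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data, for illustration only
x = [-2, -1, 0, 1, 2]
linear = [2 * xi + 1 for xi in x]   # exact straight line
curved = [xi ** 2 for xi in x]      # exact parabola, symmetric about 0

r_linear = sample_correlation(x, linear)
r_curved = sample_correlation(x, curved)

print(f"r for the straight line: {r_linear:.3f}")
print(f"r for the parabola:      {r_curved:.3f}")
```

The parabola is a perfect (nonlinear) relationship, yet its correlation with x is zero, which is why a scatter plot should accompany any correlation.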
Correlation (p. 15) (Output annotations: sample size; p-value for H0: ρ = 0 vs. Ha: ρ ≠ 0.) Is 0.9297 a ρ or an r?
Fitted Model (Least Squares Line) (p.18) • β0 or b0? b0. • β1 or b1? b1. • Sb0 and Sb1 are the standard errors of b0 and b1. • The reported p-value is for H0: β1 = 0 vs. Ha: β1 ≠ 0. ** Divide the p-value by 2 for a one-sided test. Make sure there is at least weak evidence for doing this step. • Degrees of freedom = n – k – 1, where n = sample size, k = # of Xs.
Hypothesis Testing and Confidence Interval Estimation for β (pp. 19 – 20) • Q1: Does Labor have any impact on Cost? → Hypothesis Testing • Q2: If so, how large is the impact? → Confidence Interval Estimation • (Output annotations: b0, b1, Sb0, Sb1.) • Degrees of freedom = n – k – 1, k = # of independent variables
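For reference, the two tools named on this slide take the following standard textbook forms, written with the slide's own symbols (b1, Sb1, n, k). The significance level α is left generic because the slides do not fix one.

```latex
% Test statistic for H_0: \beta_1 = 0 versus H_a: \beta_1 \neq 0
t = \frac{b_1 - 0}{S_{b_1}}, \qquad \text{df} = n - k - 1

% 100(1 - \alpha)\% confidence interval for \beta_1
b_1 \pm t_{\alpha/2,\; n-k-1} \, S_{b_1}
```

In simple regression k = 1, so both use n – 2 degrees of freedom.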
Analysis of Variance (p. 21) - Not very useful in simple regression. - Useful in multiple regression.
Sum of Squares (p.22) • Syy = total variation in Y. • SSE = remaining variation that cannot be explained by the model. • SSR = Syy – SSE = variation in Y that has been explained by the model.
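The decomposition Syy = SSR + SSE can be verified numerically. The sketch below fits a least squares line to hypothetical data (not the case-study data) and computes each sum of squares from its definition.

```python
# HYPOTHETICAL data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

# Sums of squares, each computed from its own definition
syy = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # explained variation

r_squared = ssr / syy  # share of the variation in Y explained by the model

print(f"Syy = {syy:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}, R-squared = {r_squared:.4f}")
```

The ratio SSR / Syy is the R-squared reported among the fit statistics.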
Fit Statistics (pp. 23 – 24) • In simple regression, R-squared is the square of the sample correlation r: 0.45199 x 0.45199 = 0.204295 (about 20.4%).
Prediction (pp. 25 – 26) • What is the predicted production cost of a given week, say, Week 21 of the year, in which Labor = 5 (i.e., $50,000)? • Point estimate: predicted cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) = 1.12724 (million dollars). The result was presumably computed from unrounded coefficients; the rounded values shown give 1.1272. • Margin of error? → Prediction Interval • What is the average production cost of a typical week in which Labor = 5? • Point estimate: estimated cost = b0 + b1 (5) = 1.0867 + 0.0081 (5) = 1.12724 (million dollars). • Margin of error? → Confidence Interval
Prediction vs. Confidence Intervals (pp. 25 – 26) (Figure: plot of the fitted model with interval limits.) Variation (margin of error) on both ends seems larger. Implication?
Another Simple Regression Model: Cost = β0 + β1 Units + ε (p. 27) A better model? Why?
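The widening at both ends follows from the standard simple-regression interval formulas, sketched here in textbook notation. The symbols x0 (the X value of interest), s, and Sxx are introduced for this note and do not appear on the slides.

```latex
% Confidence interval for the average Y at X = x_0
\hat{y}_0 \pm t_{\alpha/2,\; n-2} \; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% Prediction interval for an individual Y at X = x_0
\hat{y}_0 \pm t_{\alpha/2,\; n-2} \; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

% where
s = \sqrt{\frac{SSE}{n - 2}}, \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2
```

Both margins grow with the squared distance of x0 from the sample mean of X, so estimates near the edges of the data range are less precise, and the prediction interval is always wider than the confidence interval because of the extra 1 under the square root.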
Statgraphics • Simple Regression Analysis • Relate / Simple Regression • X = Independent variable, Y = dependent variable • For prediction, click on the Tabular option icon and check Forecasts. Right click to change X values. • Multiple Regression Analysis • Relate / Multiple Regression • For prediction, enter values of Xs in the Data Window and leave the corresponding Y blank. Click on the Tabular option icon and check Reports.