Regression Analysis [Simple Regression]
y = mx + b y = a + bx
y = a + bx where: y = dependent variable (value depends on x); a = y-intercept (value of y when x = 0); b = slope (rate of change: delta y divided by delta x); x = independent variable
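As a minimal sketch of how the equation is read, the snippet below evaluates y = a + bx for made-up coefficients. The function name `predict` and the values of `a` and `b` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return y = a + b*x for intercept a and slope b."""
    return a + b * x


# Hypothetical coefficients, chosen only to illustrate the equation.
a, b = 2.0, 0.5

y_at_zero = predict(a, b, 0.0)  # at x = 0 the prediction equals the intercept a
y_at_four = predict(a, b, 4.0)  # 2.0 + 0.5 * 4.0

# Slope as delta y divided by delta x between two points on the line.
slope_check = (predict(a, b, 4.0) - predict(a, b, 2.0)) / (4.0 - 2.0)

print(y_at_zero, y_at_four, slope_check)
```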
Assumptions • Linearity • Independence of Error • Homoscedasticity • Normality
Linearity The most fundamental assumption is that the model fits the situation [i.e., the Y variable is linearly related to the X variable].
Independence of Error The error (residual) is independent for each value of X. [Residual = observed - predicted]
Homoscedasticity The variation around the line of regression is constant for all values of X.
Normality The values of Y are normally distributed at each value of X.
Diagnostic Checking • Linearity: examine a scatter plot of residuals versus fitted values [Yhat] for evidence of nonlinearity. • Independence: plot residuals in time order and look for patterns.
Diagnostic Checking • Homoscedasticity: examine scatter plots of residuals versus fitted values [Yhat] and residuals versus time order, and look for changing scatter. • Normality: examine a histogram of residuals and look for departures from the normal curve.
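The diagnostic plots above all start from the residuals and fitted values. The sketch below shows one way to compute them with plain Python. The data are hypothetical toy numbers, not the mini-case data, and the rows are assumed to be in time order. The resulting pairs could be passed to any plotting tool.

```python
# Hypothetical toy data, made up for illustration (not the mini-case data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for y = a + bx.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

fitted = [a + b * xi for xi in x]                   # Yhat
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # observed - predicted

# Pairs to plot for the linearity and homoscedasticity checks.
resid_vs_fitted = list(zip(fitted, residuals))
# Pairs to plot for the independence check (assumes rows are in time order).
resid_vs_order = list(enumerate(residuals, start=1))

for f, r in resid_vs_fitted:
    print(f"fitted={f:.2f}  residual={r:+.2f}")
```

A histogram of `residuals` would serve the normality check in the same way.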
Goal Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variable(s).
Simple Regression A statistical model that uses one quantitative independent variable “X” to predict the quantitative dependent variable “Y.”
Mini-Case Since a new housing complex is being developed in Carmichael, management is under pressure to open a new pie restaurant. Assuming that population and annual sales are related, a study was conducted to predict expected sales.
Mini-Case • What preliminary conclusions can management draw from the data? • What could management expect sales to be if the population of the new complex is approximately 18,000 people?
Scatter Diagrams • The values are plotted on a two-dimensional graph called a “scatter diagram.” • Each value is plotted at its X and Y coordinates.
Types of Models • No relationship between X and Y • Positive linear relationship • Negative linear relationship
Method of Least Squares • The straight line that best fits the data. • Determine the straight line for which the sum of the squared differences between the actual values (Y) and the values predicted from the fitted line of regression (Y-hat) is as small as possible.
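A minimal sketch of the least-squares calculation, using the standard closed-form formulas for the slope and intercept. The helper names `fit_line` and `sum_sq_error` and the toy data are hypothetical and not taken from the slides.

```python
def fit_line(x, y):
    """Return (a, b) that minimize the sum of squared errors for y = a + bx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


def sum_sq_error(x, y, a, b):
    """Sum of squared differences between observed Y and predicted Y-hat."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data, made up for illustration (not the mini-case data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(x, y)
best = sum_sq_error(x, y, a, b)

# Nudging either coefficient away from the fitted values should not lower the error.
nudged = min(
    sum_sq_error(x, y, a + 0.1, b),
    sum_sq_error(x, y, a - 0.1, b),
    sum_sq_error(x, y, a, b + 0.1),
    sum_sq_error(x, y, a, b - 0.1),
)
print(a, b, best, nudged)
```

The comparison against nudged coefficients is only a spot check of the "as small as possible" property, not a proof.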
Measures of Variation • Explained • Unexplained • Total
Explained Variation Sum of Squares due to Regression [SSR] = Σ (Yhat - Ybar)²
Unexplained Variation Sum of Squares Error [SSE] = Σ (Yobs - Yhat)²
Total Variation Sum of Squares Total [SST] = Σ (Yobs - Ybar)²
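The three sums of squares can be computed directly from their definitions. The sketch below does so for hypothetical toy data (not the mini-case data) and checks that the explained and unexplained parts add up to the total, which holds for a least-squares line with an intercept. The function name `variation` is an illustrative choice.

```python
def variation(x, y):
    """Return (SSR, SSE, SST) for the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    fitted = [a + b * xi for xi in x]

    ssr = sum((f - y_bar) ** 2 for f in fitted)              # explained
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))     # unexplained
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total
    return ssr, sse, sst


# Hypothetical toy data, made up for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

ssr, sse, sst = variation(x, y)
r_squared = ssr / sst  # proportion of total variation explained by the line
print(ssr, sse, sst, r_squared)
```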
H0: There is no linear relationship between the dependent variable and the explanatory variable
Hypotheses H0: β = 0 H1: β ≠ 0 (β = population slope) or H0: No relationship exists H1: A relationship exists
Standard Error of the Estimate sy.x: the measure of variability around the line of regression
Relationship When the null hypothesis is rejected, there is evidence that a linear relationship between the Y and X variables exists.
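The slides do not say which test statistic is used. One common choice is the t test on the slope, sketched below under that assumption. The data are hypothetical toy numbers, and the critical value 3.182 is the two-sided 5% value of Student's t with n - 2 = 3 degrees of freedom, taken from a standard t table.

```python
import math

# Hypothetical toy data, made up for illustration (not the mini-case data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
a = y_bar - b * x_bar

# Standard error of the estimate: variability around the fitted line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s_yx = math.sqrt(sse / (n - 2))

# Standard error of the slope, and the t statistic for H0: slope = 0.
s_b = s_yx / math.sqrt(sxx)
t_stat = b / s_b

T_CRIT = 3.182  # two-sided 5% critical value, 3 degrees of freedom (t table)
reject_h0 = abs(t_stat) > T_CRIT

print(f"t = {t_stat:.4f}, reject H0: {reject_h0}")
```

With only five toy observations the statistic falls short of the critical value, so H0 is retained here even though the fitted slope is positive.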
Coefficient of Determination R² measures the proportion of variation that is explained by the independent variable in the regression model. R² = SSR / SST
Confidence interval estimates • True mean of Y at a given X • Individual Y-hat
Diagnostic Checking • H0: retain or reject {Reject if p-value ≤ 0.05} • R² (larger is “better”) • sy.x (smaller is “better”)
Coefficient of Determination R² = SSR / SST = 90.27%; thus, 90.27 percent of the variation in annual sales is explained by the population.
Standard Error of the Estimate sy.x = 13.8293 with SSE = 1,530.0
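As a rough consistency check on the reported figures, the sketch below assumes the usual definition sy.x = sqrt(SSE / (n - 2)) and the identity SST = SSR + SSE. The slides do not state the sample size, so the degrees of freedom are inferred from the reported numbers, not quoted. Because the reported R² is rounded, the implied SST and SSR are approximate.

```python
import math

sse = 1530.0             # SSE reported on the slide
s_yx_reported = 13.8293  # sy.x reported on the slide
r_squared = 0.9027       # R-squared reported for the mini-case

# Assumption: sy.x = sqrt(SSE / (n - 2)); solve for the degrees of freedom n - 2.
implied_df = sse / s_yx_reported ** 2
s_yx_check = math.sqrt(sse / round(implied_df))

# Assumption: R-squared = SSR / SST and SST = SSR + SSE, so SST = SSE / (1 - R-squared).
implied_sst = sse / (1.0 - r_squared)
implied_ssr = implied_sst - sse

print(f"implied n - 2 = {implied_df:.3f}")
print(f"sy.x recomputed = {s_yx_check:.4f}")
print(f"implied SST = {implied_sst:.1f}, implied SSR = {implied_ssr:.1f}")
```

The reported values are consistent with n - 2 of about 8, which would correspond to ten observations.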
Regression Analysis[Simple Regression] *** End of Presentation *** Questions?