Basic Review of Statistics By this point in your college career, BB students should have taken STAT 171 and perhaps DS 303/ECON 387 (core requirements for the BB degree). For BA students, deficiency MA students, and those of you who haven’t completed your statistics requirements, we will review the key topics needed for applications directly related to Econ 330.
Population Parameters vs. Sample Statistics Population Parameters: descriptive measures of the entire population that you’re interested in examining • Ex: All US households • Ex: All Illinois households with income (m) > $25,000 • In the absence of complete and detailed information on every household you are interested in, you must estimate the population parameters. The most common way is to use sample statistics. Sample Statistics: descriptive measures of a representative sample, or subset, of the population. • Ex: Instead of surveying every US household, we send surveys to a subset of the population and use that information to estimate the values for the overall population.
Measures of Central Tendency 1. Mean (or “arithmetic mean” or “average”): the sum of the numbers in the sample divided by the number of observations, n. • Ex: calculate the average cost per unit (AC) across different firms given cost data: $20.6, $40.3, $15.8, $23.7 • Typically written as: \( \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \) • Limitation of the Mean: because it is only an average, actual data will rarely coincide exactly with your estimate. If there is high variation in your data, the average may not be very useful for estimation.
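The average-cost example above can be worked through in a few lines. This is a minimal sketch in Python (the slides themselves use Excel); the variable names are illustrative only.

```python
from statistics import mean

# Average cost per unit (AC) reported by four firms, in dollars (from the example).
average_costs = [20.6, 40.3, 15.8, 23.7]

# Arithmetic mean: sum of the observations divided by the number of observations, n.
manual_mean = sum(average_costs) / len(average_costs)

# The standard library gives the same answer.
library_mean = mean(average_costs)

print(f"Mean AC: ${manual_mean:.2f}")
```

The four costs sum to $100.40, so the mean works out to $25.10 per unit.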
Measures of Central Tendency continued 2. Median: the middle observation in your data. • Half of your observations are above this value and half are below it • To find the median, rank your observations by value in ascending or descending order. The observation in the middle is the median; with an even number of observations, average the two middle values. • Ex: 40, 80, 18, 32, 50
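The ranking procedure for the median example above can be sketched as follows. This is an illustrative Python version, not part of the original slides; the even-length case is included only to show the usual convention of averaging the two middle values.

```python
from statistics import median

observations = [40, 80, 18, 32, 50]

# Step 1: rank the observations in ascending order.
ranked = sorted(observations)

# Step 2: with an odd number of observations, the median is the middle one.
middle_value = ranked[len(ranked) // 2]

# The standard library handles both odd and even sample sizes.
library_median = median(observations)
even_case_median = median([18, 32, 40, 50])  # averages 32 and 40

print(ranked, middle_value, library_median, even_case_median)
```

Ranked, the data are 18, 32, 40, 50, 80, so the median is 40.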
Measures of Central Tendency continued 3. Mode: the most frequent value in the sample. • Useful when there is little variation in the data (values tend to be continuous and close to one another, e.g. sales) • Ex: sales data of ice cream in gallons over 8 weeks: 100, 99, 100, 102, 97, 110, 100, 103 • Aids in identifying the most common value for marketing purposes, such as the color or size of an item
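For the ice cream example above, the mode can be found by counting how often each value occurs. A minimal Python sketch (variable names are illustrative):

```python
from collections import Counter
from statistics import mode

# Weekly ice cream sales in gallons over 8 weeks (from the example).
weekly_sales = [100, 99, 100, 102, 97, 110, 100, 103]

# Count how many times each value appears and take the most frequent one.
counts = Counter(weekly_sales)
most_common_value, frequency = counts.most_common(1)[0]

# The standard library offers the same result directly.
library_mode = mode(weekly_sales)

print(most_common_value, frequency, library_mode)
```

The value 100 appears three times, more than any other value, so it is the mode.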
Measures of Dispersion 1. Range: the difference between the largest and the smallest sample observation values • Ex: Our firm’s highest profit this year was $20 million, and the lowest profit this year was $12 million, so Range = $20 million − $12 million = $8 million • The larger the range, the more variation or dispersion. • Often used for “best case” and “worst case” scenario projections. • Limitation: it focuses only on the extreme values and may not be representative of the entire sample.
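A short sketch of the range calculation, assuming a hypothetical set of profit figures. Only the high ($20 million) and the low ($12 million) come from the slide; the values in between are made up to illustrate that the range ignores them.

```python
# Hypothetical profits in millions of dollars; only the maximum (20) and the
# minimum (12) are taken from the slide, the other values are invented.
profits_millions = [12, 15, 18, 20, 14, 17]

# Range = largest observation minus smallest observation.
profit_range = max(profits_millions) - min(profits_millions)

print(f"Range: ${profit_range} million")
```

Changing any of the middle values leaves the result unchanged, which is the limitation noted above.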
2. Variance and Standard Deviation: Variance (σ² or s²): the arithmetic mean of the squared deviations of each observation from the overall mean • Measures how far observation values deviate from the average value; whether they are above or below doesn’t matter, because squaring the deviations makes sure positive and negative deviations don’t cancel each other out. • \( \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n} \) • Where x is a value in your sample; μ is the population average or mean, so (x − μ) is how far your value deviates from the average; n is the number of observations. (When working from a sample, s² uses the sample mean in place of μ and is conventionally divided by n − 1.) Standard Deviation (σ or s): the square root of the variance, \( \sigma = \sqrt{\sigma^2} \) • Often used as a measure of potential risk when there is uncertainty.
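To make the definition concrete, here is a sketch that computes the variance and standard deviation step by step, reusing the average-cost figures from the mean example ($20.6, $40.3, $15.8, $23.7) as illustrative data. The n − 1 "sample" version is shown as the usual convention for sample data; it is an addition here, not something the slide states.

```python
from math import sqrt
from statistics import pstdev, pvariance, variance

average_costs = [20.6, 40.3, 15.8, 23.7]
n = len(average_costs)

# Mean of the data.
mu = sum(average_costs) / n

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mu) ** 2 for x in average_costs]

# Variance as defined on the slide: mean of the squared deviations (divide by n).
population_variance = sum(squared_deviations) / n
population_sd = sqrt(population_variance)

# Common sample convention (assumption, not from the slide): divide by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

print(f"variance (n): {population_variance:.3f}, sd: {population_sd:.3f}")
print(f"variance (n - 1): {sample_variance:.3f}")
```

The hand calculation matches `statistics.pvariance` and `statistics.pstdev`, while `statistics.variance` corresponds to the n − 1 version.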
3. Coefficient of Variation (V): compares the standard deviation to the mean. • Used often by managers because the value is unaffected by the size or the unit of measure (such as thousands of dollars vs. millions of dollars). • For example: a manager is comparing two projects, one that costs thousands of dollars and one that costs millions of dollars, and projecting profits for each. Comparing their standard deviations directly is not an apples-to-apples comparison; you need a measure that isn’t affected by the measurement unit. The Coefficient of Variation is such a measure. • \( V = \sigma / \mu \) or, for a sample, \( V = s / \bar{x} \) • The numerator is a measure of risk; the denominator is a measure of central tendency (the average outcome). • Hence, in capital budgeting it is used to compare “risk-reward” ratios for projects that differ widely in profitability or investment requirements.
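The unit-free property can be checked with a small sketch. The two projects and their projected profits below are hypothetical numbers chosen for illustration, and `coefficient_of_variation` is an illustrative helper name.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """V = standard deviation / mean (population form, sigma / mu)."""
    return pstdev(values) / mean(values)

# Hypothetical projected profits (not from the slides).
project_a_thousands = [40, 50, 60]             # in thousands of dollars
project_b_dollars = [2_000_000, 4_000_000, 6_000_000]

cv_a = coefficient_of_variation(project_a_thousands)
cv_b = coefficient_of_variation(project_b_dollars)

# Re-expressing project A in dollars changes the standard deviation
# by a factor of 1,000 but leaves V unchanged.
project_a_dollars = [x * 1_000 for x in project_a_thousands]
cv_a_dollars = coefficient_of_variation(project_a_dollars)

print(f"V(A) = {cv_a:.3f}, V(A in dollars) = {cv_a_dollars:.3f}, V(B) = {cv_b:.3f}")
```

In this made-up case project B carries more risk per dollar of expected profit, a comparison the raw standard deviations could not show because they are in different units.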
Measure of Goodness of Fit • R² or “coefficient of determination”: measures the share of the variation in the dependent variable that is explained by our independent variables. • Higher numbers mean more of the variation is explained and that deviations from the equation will be smaller • The coefficient of determination is bounded between 0 and 1
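As a rough illustration of what R² measures, the sketch below fits a one-variable line by least squares to hypothetical data and computes R² as one minus the ratio of unexplained to total variation. The data and names are assumptions made for this example.

```python
# Hypothetical data: one independent variable x and a dependent variable y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope and intercept for a single regressor.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Total variation in y, and the part the fitted line leaves unexplained.
predicted = [intercept + slope * xi for xi in x]
total_variation = sum((yi - y_bar) ** 2 for yi in y)
unexplained_variation = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

r_squared = 1 - unexplained_variation / total_variation

print(f"R^2 = {r_squared:.4f}")
```

Because these points lie almost exactly on a line, R² comes out close to 1; noisier data would push it toward 0.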
Variable Significance • t-statistics and p-values are commonly used to measure significance (whether an independent variable has a detectable influence on the dependent variable) • Excel provides both. However, p-values are more commonly used, so this is the measure we will use. • You define your research question: Is there a difference in blood pressure between those in group A (receiving a drug) and those in group B (receiving a sugar pill, i.e. no drug)? • The null hypothesis is usually a hypothesis of "no difference" • For example: no difference between blood pressures in group A and group B. • You then test this hypothesis with data, including the blood pressures of members of group A and group B.
The “p-value,” sometimes called the “calculated probability,” is the estimated probability of obtaining a result at least as extreme as the one observed if the null hypothesis (H0) of a study question is true. • Here: the probability of seeing a difference in blood pressures this large in the data when in fact there is no difference • A small p-value means such a difference would be unlikely if the null were true, so we reject the null and say there is a difference • Standard practice in the field defines a result as “statistically significant” if the p-value is below 0.05 (a smaller number such as 0.01 means greater significance)
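One way to see what a p-value means is an exact permutation test: if the drug made no difference, the group labels would be arbitrary, so we can count how often a random relabeling produces a gap in means at least as large as the one observed. This is a sketch of that idea with hypothetical blood pressure readings; it is not the t-test that Excel reports, and the function name is illustrative.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Share of all relabelings whose absolute difference in means is at
    least as large as the observed one (two-sided exact permutation test)."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))

    total = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        chosen = set(indices)
        relabeled_a = [pooled[i] for i in indices]
        relabeled_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        difference = abs(mean(relabeled_a) - mean(relabeled_b))
        if difference >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

# Hypothetical systolic blood pressures (not from the slides).
group_a_drug = [118, 122, 115, 120, 117]
group_b_placebo = [128, 131, 125, 133, 127]

p_value = permutation_p_value(group_a_drug, group_b_placebo)
print(f"p-value = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```

With these made-up readings every drug-group value is below every placebo-group value, so only 2 of the 252 possible relabelings are as extreme as the data, giving a p-value of about 0.008.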
Regression Analysis (OLS) Regression Analysis: uses data to describe how variables are related to one another. • In markets, many variables change simultaneously, and regression analysis accounts for multiple changes Example: Q = f(P, Psub, ADV, m, POP, t) • Where Q = sales of brand-name ice cream (dependent variable) P = price of brand-name ice cream Psub = price of a substitute (competing) brand ADV = advertising dollars m = income POP = population t = time (sales quarter, to show trends or seasonality) • The right-hand side variables are called “independent variables” • Using data gathered on all variables, regression analysis allows us to see the relative importance of each independent variable (price, income, etc.) for the dependent variable, sales or quantity.
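For readers curious about what a regression tool does behind the scenes, here is a minimal OLS sketch in plain Python that solves the normal equations (X'X)b = X'y. The data are hypothetical and generated without noise from an assumed relationship Q = 100 − 8P + 2ADV, so the fit should recover those numbers; the helper names are illustrative, and real work would normally use Excel or a statistics library instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def ols(rows, y):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in rows]  # add the constant term
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y_i for d, y_i in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical observations: (price P, advertising ADV).
observations = [(2.0, 10), (2.5, 12), (3.0, 9), (3.5, 15), (4.0, 11), (4.5, 14)]
# Hypothetical sales generated from Q = 100 - 8P + 2ADV (no noise).
sales = [100 - 8 * p + 2 * adv for p, adv in observations]

intercept, price_coef, adv_coef = ols(observations, sales)
print(f"Q = {intercept:.2f} + ({price_coef:.2f})P + ({adv_coef:.2f})ADV")
```

Each estimated coefficient is read as the change in Q for a one-unit change in that variable, holding the others fixed.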
Excel: Summary Stats and Regression Analysis • Show in Excel how to create summary statistics (mean, median, mode, range, etc.) • Show in Excel how to run the regression • Copy the data into Excel • Under the Data tab, use “Data Analysis” • Select Regression from the drop-down list • Select the y range of data (dependent variable Q; select only the data, not the title) • Select the x range of data (all independent variable data) • Click OK • Results appear in a new worksheet showing the coefficients for our variables
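REGRESSION OUTPUT • Regression equation (using the coefficients from the Excel output): \( Q = 647071 - 127436\,P + 5.35\,ADV + 29339\,P_{comp} + 0.3403\,m + 0.02\,POP + 4407\,t \) • Here Pcomp is the competitor’s price (written Psub earlier).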
Statistically significant variables • Changes in price have a statistically significant impact on sales (the same is true of the competitor’s price and advertising) • Note each coefficient is ∆Q/∆variable • Example: if the firm increased price by $1.00, then the estimated impact on sales is: ∆Q = −127,436 × 1.00 = −127,436, a drop of 127,436 units of Q • If asked for a $0.50 change, it would be: ∆Q = −127,436 × 0.50 = −63,718 • Income has no statistically discernible effect in this model, so predicted changes in income are treated as having zero impact on quantity.
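Since each coefficient is ∆Q/∆variable, the estimated effect of any change is just the coefficient times the size of the change. A small sketch using the coefficients from the regression equation Q = 647071 − 127436P + 5.35ADV + 29339Pcomp + 0.3403m + 0.02POP + 4407t (the dictionary and function name are illustrative):

```python
# Coefficients taken from the regression equation in the slides.
coefficients = {
    "P": -127436,      # own price
    "ADV": 5.35,       # advertising dollars
    "Pcomp": 29339,    # competitor's price
    "m": 0.3403,       # income
    "POP": 0.02,       # population
    "t": 4407,         # time (sales quarter)
}

def estimated_change_in_q(variable, change):
    """Estimated change in Q for a given change in one variable,
    holding all other variables constant."""
    return coefficients[variable] * change

one_dollar_increase = estimated_change_in_q("P", 1.00)
fifty_cent_increase = estimated_change_in_q("P", 0.50)

print(one_dollar_increase, fifty_cent_increase)
```

A $1.00 price increase lowers estimated sales by 127,436 units and a $0.50 increase by 63,718. The same function applies to the other variables, but the result is only meaningful for those that are statistically significant.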