Department of Business Administration SPRING 2009-10 by Assoc. Prof. Sami Fethi Quantitative and Qualitative Data Analysis
Quantitative data analysis
• Examining differences
• Relationship between variables
• Explaining and predicting relationship between variables
• Data reduction, structure and dimension
• Additional methods
• Characteristics of qualitative research
• Qualitative data
• Analytical procedure
• Interpretation
• Strategies for qualitative analysis
• Quantify qualitative data
• Validity in qualitative research

Examining differences
• Hypotheses about one mean
• In research we often have to make statements about the mean. When the population variance is unknown, the standard error of the mean is also unknown and must be estimated from sample data:
$SD_{\bar{X}} = SD' / \sqrt{N}$
where $SD_{\bar{X}}$ = standard error of the mean, $SD'$ = estimated standard deviation, $N$ = sample size, and
$SD' = \sqrt{\sum (X_i - \bar{X})^2 / (N-1)}$
where $N-1$ is the degrees of freedom.
• Example 1: For a supermarket chain to add a new product, at least 100 units must be sold per week. The new product is tested in ten randomly selected stores for a limited time. Apply a one-tailed t test to answer the question: will the new product sell more than 100 units per week?
a) construct the hypotheses
b) calculate the mean and standard deviation if they are not given
c) calculate the standard error of the mean
d) find the t-value

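Examining differences
a) $H_0: \mu \le 100$, $H_1: \mu > 100$
b) $\bar{X}$ and SD are given as 109.4 and 14.90 respectively.
c) $SD_{\bar{X}} = 14.90 / \sqrt{10} \approx 4.71$
d) $t = (\bar{X} - \mu) / SD_{\bar{X}} = (109.4 - 100) / 4.71 \approx 2.00$
The tabulated t-value at the 5% significance level (one-tailed, 9 degrees of freedom) is 1.83. Since the calculated t-value is larger, we reject the null hypothesis.
• Hypotheses about two means
• This is usually associated with a question such as: are the tastes in region A different from the tastes in region B?
• The test statistic is
$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_{\bar{X}_1 - \bar{X}_2}}$
where $\bar{X}_1$ = sample mean for the first sample, $\bar{X}_2$ = sample mean for the second sample,
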
Examining differences
$S_{\bar{X}_1 - \bar{X}_2}$ = the standard error of the difference in means, and $\mu_1$ and $\mu_2$ are the unknown population means.
Assuming the two population variances are equal, the common population variance can be estimated by pooling the samples. When the variances are unknown and the standard errors of the means must be estimated, t is an adequate test statistic, distributed with $v = N_1 + N_2 - 2$ degrees of freedom.
• Example 2: A manufacturer has developed a new product and wonders whether the label of the package should be red or blue. The new product with the two different labels is tested in ten randomly selected stores. The mean sales obtained are 403.0 for the red package and 390.3 for the blue package. The standard error of the difference in means is 8.15.

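Examining differences
a) construct the hypotheses
b) find the t-value
a) $H_0: (\mu_1 - \mu_2) = 0$, $H_1: (\mu_1 - \mu_2) \ne 0$ (two-tailed), or $H_0: (\mu_1 - \mu_2) \le 0$, $H_1: (\mu_1 - \mu_2) > 0$ (one-tailed)
b) $t = ((403.0 - 390.3) - 0) / 8.15 = 1.56$, with $v = 10 + 10 - 2 = 18$ degrees of freedom. At the 5% significance level with 18 degrees of freedom, the two-tailed critical value from the table is 2.101. Since 1.56 is smaller than 2.101, the null hypothesis $H_0: (\mu_1 - \mu_2) = 0$ cannot be rejected. The data give no evidence that the two unknown population means differ.
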
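
The two t statistics in Examples 1 and 2 can be recomputed from the summary figures quoted on the slides. The following is a minimal sketch, not the author's own procedure: the function names are illustrative, and the critical values are simply the tabulated ones given in the text rather than values computed from the t-distribution.

```python
import math


def one_sample_t(sample_mean, sd, n, mu0):
    """Return (standard error of the mean, t) from summary statistics."""
    se = sd / math.sqrt(n)  # estimated standard error: SD' / sqrt(N)
    t = (sample_mean - mu0) / se
    return se, t


def two_sample_t(mean1, mean2, se_diff, hypothesised_diff=0.0):
    """Return t for a difference in means, given the standard error of the difference."""
    return ((mean1 - mean2) - hypothesised_diff) / se_diff


# Example 1: ten stores, mean 109.4, SD 14.90, required sales of 100 units.
se1, t1 = one_sample_t(109.4, 14.90, 10, 100)
reject_h0_example1 = t1 > 1.83  # one-tailed critical value quoted in the text

# Example 2: red vs blue label, standard error of the difference 8.15.
t2 = two_sample_t(403.0, 390.3, 8.15)
reject_h0_example2 = abs(t2) > 2.101  # two-tailed critical value quoted in the text

print(f"Example 1: SE = {se1:.2f}, t = {t1:.2f}, reject H0: {reject_h0_example1}")
print(f"Example 2: t = {t2:.2f}, reject H0: {reject_h0_example2}")
```

Working from summary statistics keeps the sketch independent of the raw store data, which the slides do not reproduce.
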
Useful alternative tests
• In problems involving one or two population means, t-methods are usually appropriate, but non-parametric methods are often good alternatives.
• Non-parametric methods have the advantage of requiring fewer assumptions, but they are less powerful than t-methods (see Siegel and Castellan, 1988).
• The main difference between them is that t-methods deal with means while non-parametric methods are concerned with medians.
• ANOVA (analysis of variance) compares more than two groups simultaneously. The method rests on the ratio of systematic variance to unsystematic variance.
• In ANOVA, the following are computed:
  • The total variation, by comparing each observation with the grand mean.
  • The between-group variation, by comparing the treatment means with the grand mean.
  • The within-group variation, by comparing each score in the group with the group mean.
• Recall MANOVA (multivariate analysis of variance): unlike ANOVA, it has more than one dependent variable.

Comparison of more than two groups
Example 3: Three advertising campaigns were tested in 24 randomly selected cities comparable in size and demographics. The following output shows the results of the ANOVA:

Example 3
a) construct the hypotheses
b) find the F-value and whether it is significant
c) comment on the F-value
a) $H_0: G_1 = G_2 = G_3$, $H_1$: not all group means are equal. Total d.f. = 24 - 1 = 23, between groups 3 - 1 = 2, within groups 23 - 2 = 21.
b) $F_{calculated} = 24.1 / 4.17 \approx 5.78$. The critical value has $(k-1, n-k) = (3-1, 24-3) = (2, 21)$ degrees of freedom. From the F-distribution, $F_{critical}$ is 3.47.
c) Since 5.78 is greater than 3.47, we reject the null hypothesis that the group means are equal and accept the alternative hypothesis that the advertising campaigns vary in effectiveness.

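
The between-group and within-group decomposition behind the F-ratio can be sketched as follows. This is an illustrative implementation under stated assumptions: `one_way_anova` is a hypothetical helper name, and the three small groups are made-up numbers, not the 24-city data from Example 3 (the table is not reproduced here). The last lines only redo the ratio of the two mean squares quoted in the text and compare it with the tabulated critical value.

```python
def one_way_anova(groups):
    """Return the sums of squares, degrees of freedom and F-ratio for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: treatment means compared with the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group variation: each score compared with its own group mean.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within, "F": f_ratio}


# Hypothetical data for three campaigns (NOT the data from Example 3).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
toy_result = one_way_anova(toy_groups)
print(toy_result)

# Ratio of the mean squares quoted in Example 3, against the tabulated critical value.
f_from_text = 24.1 / 4.17
print(f"F = {f_from_text:.2f}, reject H0: {f_from_text > 3.47}")
```
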
Relationship between variables
• In research, we are often concerned with whether there is a relationship between two or more variables, that is, whether they covary.
• Correlation coefficient
• Based on the Pearson criterion, it measures the strength of the linear relationship between two variables, for example X and Y.
• Theoretically, the correlation coefficient can take values from -1 to 1. A correlation coefficient of 1 tells us that the two variables covary perfectly and positively, whereas -1 shows that they are perfectly inversely related. A value close to 0 indicates that the variables are linearly unrelated.
• The formula for the correlation coefficient is as follows:
$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$
• where $\bar{X}$ and $\bar{Y}$ represent the sample means of X and Y.

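
As a small illustration of the Pearson formula, the sketch below computes r from deviations about the sample means. The function name and the two short data series are assumptions made for the example only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviations about the sample means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical series: y rises exactly with x, y_inverse falls exactly with x.
x = [1, 2, 3, 4]
y = [2, 4, 6, 8]
y_inverse = [8, 6, 4, 2]

print(pearson_r(x, y))          # perfect positive covariation
print(pearson_r(x, y_inverse))  # perfect inverse relationship
```
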
Relationship between variables
• Correlation coefficient
• A correlation coefficient shows covariation between two variables. It does not show that the variables are causally related.
• The square of the correlation coefficient is the coefficient of determination.
• $R^2$ = Explained variation / Total variation
• Example 4: partial correlation
• Using the following table (Table 1), calculate the relationship between advertisement recognition, appeal and sex. In other words, is the relationship between advertisement recognition and appeal influenced by controlling for sex?

Example 4
• This is a partial correlation. The partial correlation coefficient $r_{12.3}$ between ad recognition (1) and appeal (2), controlling for sex (3), is:
$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}}$
• The result shows that, controlling for sex, the observed relationship between ad recognition and appeal is positive and strengthened.

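
A first-order partial correlation can be computed directly from the three pairwise correlations. The sketch below uses the standard formula. The input values are hypothetical, because Table 1 is not reproduced here, so the printed number is not the answer to Example 4.

```python
import math


def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 after controlling for variable 3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical pairwise correlations (NOT taken from Table 1):
# 1 = ad recognition, 2 = appeal, 3 = sex.
r_example = partial_r(r12=0.5, r13=0.5, r23=0.5)
print(round(r_example, 3))
```

When the control variable is uncorrelated with both of the others, the partial correlation reduces to the plain correlation `r12`.
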
Explaining and predicting relationship between variables
• Explaining and predicting relationships between variables are important tasks in business research. One of the most widely applied and useful approaches to examining relationships between variables is regression analysis. In regression analysis, we want to fit the model that best describes the data, which is done by applying the method of least squares. More precisely, a straight line is fitted that minimizes the sum of the squared vertical deviations from that line, as shown in the following figure.
• Simple linear regression
• $Y_i = a_0 + a_1 X_i + e_i$
• where $Y$ = the outcome variable, $X$ = the predictor variable, $a_1$ = the slope of the straight line fitted to the data, $a_0$ = the intercept of the line, and $e_i$ = the difference between the score predicted and the score actually obtained. This difference is called the residual.

Simple linear regression
Figure 1: The linear model

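
The least-squares fit of the line $Y_i = a_0 + a_1 X_i + e_i$ can be sketched in a few lines. This is a minimal illustration using the closed-form solution for one predictor: `fit_line` is a hypothetical helper name and the data are invented for the example.

```python
def fit_line(x, y):
    """Fit y = a0 + a1 * x by least squares and return (a0, a1, r_squared)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)

    a1 = s_xy / s_xx            # slope
    a0 = mean_y - a1 * mean_x   # intercept

    # Residuals: observed score minus the score predicted by the line.
    residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    ss_residual = sum(e ** 2 for e in residuals)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total  # explained / total variation
    return a0, a1, r_squared


# Hypothetical observations (not the data from the examples in this deck).
x_obs = [1, 2, 3, 4, 5, 6]
y_obs = [1.5, 2.6, 3.2, 4.5, 5.1, 6.2]
a0, a1, r2 = fit_line(x_obs, y_obs)
print(f"intercept = {a0:.3f}, slope = {a1:.3f}, R^2 = {r2:.3f}")
```
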
Simple linear regression: Example 5
• Assume that a car dealer collects data for six months on four variables: TV advertising, print advertising, competitors' advertising and sales. Y is sales. The car dealer expects car sales to be positively correlated with TV ads and print ads and negatively correlated with competitors' ads.
Table 2: Data matrix

Simple Mean Regression output: Example 5
• For the car dealer data described above, where Y is sales, comment on the estimated coefficient and t-ratio as well as $R^2$ for TV ads, based on the information below.
Table 3: Simple mean regression output

Simple Mean Regression output: Example 5, Answer
• The estimated constant term of 0.7 shows that if the dealer does not use TV ads at all (TV ads = 0), the estimated expected value of car sales is 0.7 units, that is, 7 cars.
• The estimated regression coefficient of sales on TV ads is 0.9. This coefficient shows that if the variable TV ads is increased by 1 unit, the estimated expected value of car sales increases by 0.9 units, that is, nine cars.
• The R-square of 85.3 percent means that the sample coefficient of determination is equal to 0.853. Practically speaking, the variation in the variable TV ads explains 85.3 percent of the variation in the dependent variable car sales.
• The estimated t-value for TV ads is 4.81, which is greater than 2 (the rule-of-thumb approximation to the tabulated value from the t-distribution), so the coefficient is significant at the 5% and 1% levels. This means that we can reject the null hypothesis that the corresponding population regression coefficient is equal to zero. The conclusion is that TV ads and sales are significantly related to each other: TV ads have a positive impact on sales.

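
The interpretation of the constant and the slope can be made concrete by evaluating the estimated line for a few levels of TV advertising. This is only a sketch: it assumes, following the wording of the answer, that one sales unit corresponds to ten cars, and the names used are illustrative.

```python
INTERCEPT = 0.7     # estimated constant term from the regression output
SLOPE = 0.9         # estimated coefficient on TV ads
CARS_PER_UNIT = 10  # assumption: the text equates 0.7 units with 7 cars


def predicted_sales_units(tv_ads):
    """Estimated expected car sales (in units) for a given level of TV ads."""
    return INTERCEPT + SLOPE * tv_ads


def is_significant(t_value, critical_value=2.0):
    """Compare a t-ratio with a critical value (default: the rule of thumb of 2)."""
    return abs(t_value) > critical_value


for tv_ads in (0, 1, 2):
    units = predicted_sales_units(tv_ads)
    print(f"TV ads = {tv_ads}: {units:.1f} units, about {units * CARS_PER_UNIT:.0f} cars")

print("Coefficient on TV ads significant:", is_significant(4.81))
```
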
Assumptions in Regression analysis
• The expected value of the error term is zero.
• The variance of the error term is constant for each X. This is termed homoscedasticity. If the variance of e varies with X, this is termed heteroscedasticity.
• The errors for the observations are uncorrelated.
• e should be normally distributed for each X.
• The error term should not be correlated with X: corr(e, X) = 0.
• It is also a common assumption that the regression model should be linear in its parameters.

Correlation Coefficients output: Example 6
• Using the car dealer data from Example 5 (TV advertising, print advertising, competitors' advertising and sales, where Y is sales), apply the concept of the correlation coefficient and explain the relationships between the variables under inspection, based on the information given in Table 4.
Table 4: Correlation coefficients output

Correlation Coefficients output: Example 6, Answer
• The correlations between car sales (dependent) and TV advertising, print advertising and competitors' advertising (explanatory) are expected to be high.
• The correlations among the explanatory variables (TV advertising, print advertising and competitors' advertising) are expected to be low. A high correlation coefficient between, for example, TV advertising and print advertising indicates a high degree of multicollinearity, which distorts the estimation results. To remedy this, one of the correlated variables can be dropped from the regression equation.
• For example, the correlation between sales and TV ads is 0.92, which is a reasonably high score, whereas the correlation between sales and competitors' ads is 0.155, which is a very low score.

Multiple Regression
• In multiple regression, two or more independent (explanatory) variables are used to explain or predict the dependent variable. The purpose is to make the model more realistic, control for other variables, explain more of the variance in the dependent variable and reduce the residuals. The following is a typical example of output for a multiple regression.
Table 5: Multiple regression output

Dummy Variables
• In a multiple regression, a dummy variable can be used in two ways. It can be the dependent variable, taking the value 1 or 0 (also called a dichotomous variable), or it can be an independent variable that takes the value 0 or 1.
• A dummy variable is used in an analysis when a variable does not come as numerical values. For example, season in the following table is a nominal-scaled variable that cannot be ranked, so to be used in a regression analysis the seasons need to be assigned numbers.
Table 6: Coding of dummy variable

Dummy variables: Example 7
• In the following table, there are three new variables A, B and C, and the four seasons are coded as different combinations of zeros and ones. Assume that the following regression model for sales of women's clothing, in which the price (P) is also included, has been estimated:
• Sale = 1000 - 0.5P + 100A - 20B - 50C
• a) Calculate the sales in the summer, taking the dummy variables into account (P = $200).
• b) Calculate the sales in the autumn, taking the dummy variables into account (P = $200).
• c) Compare the sales in winter and spring, keeping the same price.
Table 6: Coding of dummy variable

Dummy variables: Example 7, Answer
• Estimated model: Sale = 1000 - 0.5P + 100A - 20B - 50C
• a) Sales in the summer (P = $200):
• Sale = 1000 - 0.5(200) + 100(1) - 20(0) - 50(0) = $1000
• b) Sales in the autumn (P = $200):
• Sale = 1000 - 0.5(200) + 100(0) - 20(1) - 50(0) = $880
• c) Sales in winter and spring at the same price:
• Winter: Sale = 1000 - 0.5(200) + 100(0) - 20(0) - 50(1) = $850
• Spring: Sale = 1000 - 0.5(200) + 100(0) - 20(0) - 50(0) = $900
• At the same price, spring sales exceed winter sales by $50, which is the size of the coefficient on C.

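
The dummy coding and the estimated model can be combined in a short sketch. The coding of A, B and C below is inferred from the substitutions in the worked answer (summer A = 1, autumn B = 1, winter C = 1, spring as the baseline with all zeros), since Table 6 itself is not reproduced here. The dictionary and function names are illustrative.

```python
# Season -> (A, B, C); inferred from the worked answer, spring is the baseline.
SEASON_DUMMIES = {
    "summer": (1, 0, 0),
    "autumn": (0, 1, 0),
    "winter": (0, 0, 1),
    "spring": (0, 0, 0),
}


def predicted_sale(price, season):
    """Evaluate Sale = 1000 - 0.5P + 100A - 20B - 50C for a price and a season."""
    a, b, c = SEASON_DUMMIES[season]
    return 1000 - 0.5 * price + 100 * a - 20 * b - 50 * c


for season in SEASON_DUMMIES:
    print(f"{season}: ${predicted_sale(200, season):.0f}")
```

Only three dummies are needed for four seasons: the omitted season acts as the reference category, and each coefficient measures the difference from it at a given price.
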