Data Analysis: • Review and Practical Application using SPSS
Data of Interest • National Insurance Company • 1000 questionnaires sent • 285 respondents • Questionnaire Presentation • Copy given in class
Coding • Coding broadly refers to the set of all tasks associated with transforming edited responses into a form that is ready for analysis • Steps • Transforming responses to each question into a set of meaningful categories • Assigning numerical codes to the categories • Creating a data set suitable for computer analysis
Transforming Responses into Meaningful Categories • A structured question is pre-categorized • Responses to a nonstructured or open-ended question must be grouped into a meaningful and manageable set of categories • Q1: How many non-categorized questions are in this questionnaire?
Missing-Value Category • A missing value can stem from • A respondent's refusal to answer a question • An interviewer's failure to ask a question or record an answer or a "don't know" that does not seem legitimate • Best way to treat missing value responses • Sound questionnaire design • Tight control over fieldwork
Assigning Numerical Codes • Assign appropriate numerical codes to responses that are not already in quantified form • Codes should be assigned in a way that facilitates computer manipulation and analysis of the responses
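The coding step can be sketched in a few lines of Python. The category labels, the numerical codes, and the missing-value code 9 are illustrative assumptions, not values from the class coding sheet.

```python
# Hypothetical codebook for one structured question; labels and codes
# are assumptions for illustration only.
RECOMMEND_CODES = {"yes": 1, "no": 2}
MISSING_CODE = 9  # assumed code reserved for refusals or unrecorded answers


def code_response(raw):
    """Return the numerical code for an edited response, or MISSING_CODE."""
    if raw is None:
        return MISSING_CODE
    # Normalize spacing and case before looking up the category.
    return RECOMMEND_CODES.get(raw.strip().lower(), MISSING_CODE)


responses = ["Yes", "no", None, " YES ", "maybe"]
coded = [code_response(r) for r in responses]
print(coded)
```

Reserving one explicit code for missing values keeps them visible in the data set, so they can be declared as missing in the analysis software rather than silently treated as real answers.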
Multiple Response Question – Rank Order Question • Please rank the following insurance companies by placing a 1 beside the company you think is best overall, a 2 beside the company you think is second best, and so on. ______ Progressive ______ All State ______ National • Q2: How would you code the previous question if it were added to the questionnaire? • This question requires as many variables (and columns) as there are objects to be ranked: 3 separate variables are needed
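One possible way to code the rank-order question is sketched below: each company gets its own variable holding the rank it received. The variable names (`rank_progressive` and so on) and the validation rule are assumptions made for this sketch.

```python
COMPANIES = ("Progressive", "All State", "National")


def code_ranking(answer):
    """Map {company: rank} to a record with one rank variable per company."""
    ranks = [answer.get(company) for company in COMPANIES]
    expected = list(range(1, len(COMPANIES) + 1))
    # A valid answer uses each rank 1..3 exactly once.
    if None in ranks or sorted(ranks) != expected:
        raise ValueError("each company needs a distinct rank from 1 to 3")
    # Hypothetical variable names: rank_progressive, rank_all_state, rank_national.
    return {
        "rank_" + company.lower().replace(" ", "_"): rank
        for company, rank in zip(COMPANIES, ranks)
    }


record = code_ranking({"Progressive": 2, "All State": 3, "National": 1})
print(record)
```

Each respondent therefore contributes three columns to the data set for this single question.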
Creating a Data Set • Organized collection of data records • Each sample unit within the data set is called a Case or Observation • Structure of a Data Set • The number of observations = n • If the total number of variables embedded in the questionnaire is m, then • Data set = n x m matrix of numbers • Importance of Coding Sheet: Anybody can enter/check the data set. (Copy of coding sheet)
SPSS Data Set • 2 Views: Variable and Data • Raw Variable (labels and values) • Transformed Variable (compute and recode)
Preliminary Data Analysis: Basic Descriptive Statistics • Preliminary data analysis examines the central tendency and the dispersion of the data on each variable in the data set • Measurement level dictates what to do • Getting a feel for the data • What can we do? (Limitations on next slide.) Run descriptives. (outputs 1)
Measures of Central Tendency and Dispersion for Different Types of Variables
Why Averages May be Misleading • Researchers tested a new sauce product and found • Mean rating of the taste test was close to the middle of the scale, which had "very mild" and "very hot" as its bipolar adjectives • Researcher’s conclusion • Consumers want neither really hot nor really mild sauce
Why Averages May be Misleading (Cont’d) • Deeper examination revealed • The existence of a large proportion of consumers who wanted the sauce to be mild and an equally large proportion who wanted it to be hot • Moral of the story: • A clear understanding of the distribution of responses can help a researcher avoid erroneous inferences. Talk about Skewness and Kurtosis.
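The sauce story is easy to reproduce with made-up numbers. The ratings below are hypothetical and assume a scale from 1 (very mild) to 7 (very hot); they are not data from the study.

```python
from collections import Counter
from statistics import mean

# Hypothetical taste-test ratings: half the tasters want mild, half want hot.
ratings = [1, 1, 2, 1, 2, 7, 6, 7, 7, 6]

average = mean(ratings)
frequencies = sorted(Counter(ratings).items())

print("mean rating:", average)
print("frequency table:", frequencies)
```

The mean sits at the midpoint of the scale even though no taster chose the midpoint, which is why a frequency table should be inspected before an average is interpreted.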
Crosstabs: Occurrences under specific conditions • Used most of the time with categorical variables • Examples to run
Cross-Tabulations – Comparing Frequencies: Chi-square Contingency Test • Technique used for determining whether there is a statistically significant relationship between two categorical (nominal or ordinal) variables
Cross-Tabulation Using SPSS for National Insurance Company • One crucial issue in the customer survey of National Insurance Company was how a customer's education was associated with whether or not she or he would recommend National to a friend.
Need to Conduct Chi-square Test to Reach a Conclusion • The hypotheses are: • H0: There is no association between educational level and willingness to recommend National to a friend (the two variables are independent of each other). • Ha: There is some association between educational level and willingness to recommend National to a friend (the two variables are not independent of each other). • Let’s do it….
Conducting the Test • Test involves comparing the actual, or observed, cell frequencies in the cross-tabulation with a corresponding set of expected cell frequencies (Eij)
Expected Values • $E_{ij} = \dfrac{n_i n_j}{n}$ • where $n_i$ and $n_j$ are the marginal frequencies, that is, the total number of sample units in category i of the row variable and category j of the column variable, respectively
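Chi-square Test Statistic • $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ • where $O_{ij}$ is the observed frequency in cell (i, j), $E_{ij}$ is the corresponding expected frequency, and r and c are the number of rows and columns, respectively, in the contingency table • The number of degrees of freedom associated with this chi-square statistic is given by the product (r - 1)(c - 1)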
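As a minimal sketch, the expected frequencies (row total times column total, divided by n) and the chi-square statistic (the sum over all cells of (observed - expected)^2 / expected) can be computed as follows. The 2 x 2 counts are hypothetical and are not the National Insurance Company data.

```python
def chi_square(observed):
    """Return (statistic, degrees of freedom, expected table) for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for cell (i, j) is row total i times column total j over n.
    expected = [[ri * cj / n for cj in col_totals] for ri in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts: rows are two education groups, columns are yes/no.
table = [[30, 10], [20, 40]]
statistic, df, expected = chi_square(table)
print(expected, round(statistic, 3), df)
```

SPSS reports the same quantities in its Crosstabs output; the sketch only makes the arithmetic behind them visible.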
National Insurance Company Study • [SPSS output: computed chi-square value and p-value]
National Insurance Company Study – P-Value Significance • The actual significance level (p-value) = 0.019 • The chances of getting a chi-square value as high as 10.007 when there is no relationship between education and recommendation are about 19 in 1000. • The apparent relationship between education and recommendation revealed by the sample data is unlikely to have occurred because of chance. • We can reject the null hypothesis at the 0.05 level.
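For readers who want to see where such a p-value comes from, here is a rough check using only the standard library. It assumes 3 degrees of freedom, which the slides do not state (it would correspond, for example, to four education levels crossed with a yes/no recommendation), and it uses the closed-form upper-tail probability that exists for that particular case.

```python
import math


def chi2_upper_tail_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# 10.007 is the statistic reported above; df = 3 is an assumption.
p_value = chi2_upper_tail_df3(10.007)
print(round(p_value, 4))
print("reject H0 at the 0.05 level:", p_value < 0.05)
```

Under that assumption the result lands between 0.018 and 0.019, in line with the significance level reported by SPSS.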
Precautions in Interpreting Cross Tabulation Results • Two-way tables cannot show conclusive evidence of a causal relationship • Watch out for small cell sizes • They increase the risk of drawing erroneous inferences, especially when more than two variables are involved
Overview of Techniques for Examining Associations • Spearman Correlation Coefficient Technique • The technique is appropriate when • The degree of association between two sets of ranks (pertaining to two variables) is to be examined • Illustrative Research Question(s) This Technique Can Answer: • Is there a significant relationship between motivation levels of salespeople and the quality of their performance? • Assume that the data on motivation and quality of performance are in the form of ranks, say, 1 through 20, for 20 salespeople who were evaluated subjectively by their supervisor on each variable
Overview of Techniques for Examining Associations (Cont’d) • Pearson Correlation Coefficient Technique • This technique is appropriate when • The degree of association between two metric-scaled (interval or ratio) variables is to be examined • Illustrative Research Question(s) This Technique Can Answer: • Is there a significant relationship between customers' age (measured in actual years) and their perceptions of our company's image (measured on a scale of 1 to 7)?
Spearman Correlation Coefficient • A Spearman correlation coefficient is a measure of association between two sets of ranks • $r_s = 1 - \dfrac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$ • $d_i$ = the difference between the ith sample unit's ranks on the two variables • n = the total sample size
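A small sketch of the calculation, using the usual no-ties formula r_s = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)). The ranks for five salespeople are hypothetical, and the formula assumes there are no tied ranks.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman coefficient from two equally long lists of untied ranks."""
    n = len(ranks_x)
    # d_i is the difference between a unit's ranks on the two variables.
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical supervisor rankings of five salespeople.
motivation = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]
rs = spearman_rs(motivation, performance)
print(rs)
```

Identical rankings give a coefficient of 1 and exactly reversed rankings give -1, which is a quick sanity check on any implementation.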
Pearson Correlation Coefficient • The Pearson correlation coefficient is the degree of association between variables that are interval- or ratio-scaled. • The Pearson correlation coefficient ($r_{xy}$) between two such variables X and Y is given by • $r_{xy} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\, s_x s_y}$ • n = sample size (total number of data points) • $\bar{X}$ and $\bar{Y}$ = means • $X_i$ and $Y_i$ = values for any sample unit i • $s_x$ and $s_y$ = standard deviations
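The Pearson formula translates almost term by term into code: the sum of cross-products of deviations from the means, divided by (n - 1) times the two sample standard deviations. The age and image ratings are hypothetical values chosen for illustration.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation using sample (n - 1) standard deviations."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    cross_products = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross_products / ((n - 1) * stdev(x) * stdev(y))


# Hypothetical data: customer age in years and image rating on a 1 to 7 scale.
age = [25, 35, 45, 55, 65]
image = [3, 4, 4, 6, 7]
r = pearson_r(age, image)
print(round(r, 4))
```

`statistics.stdev` already divides by n - 1, which matches the (n - 1) in the denominator of the formula.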
National Insurance Company – Computing Pearson Correlation Among Service Quality Constructs • National Insurance Company was interested in the correlations between respondents’ overall service-quality perceptions (on the 10-point scale) and their average ratings along each of the five dimensions of Service Quality
National Insurance Company– Computing Pearson Correlation Among Service Quality Constructs Using SPSS
Interpreting Pearson Correlation Coefficients • Each of the five service-quality measures (reliability, empathy, tangibles, responsiveness, and assurance) is significantly related to the overall quality (OQ) at the .001 level of significance • Responsiveness has the strongest correlation (.8625) • Tangibles have the weakest correlation (.5038) • All the correlations are strong enough to be meaningful
Comparing Means • Mainly T-tests and ANOVAs • T-test on OQ and gender.
Independent T-tests • Independent variable with at most 2 categories • Check equality of variances (cf. output) • p = .88: if there were no true difference between the groups, a difference at least as large as the observed .04 would occur by chance about 88% of the time. Cannot reject the null hypothesis.
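For reference, the t statistic behind an independent-samples test can be sketched as below. This version assumes equal variances in the two groups (the pooled-variance form), and the ratings are hypothetical. The p-value itself needs the t distribution, which the Python standard library does not provide, so SPSS or a statistics package is still needed for that step.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical overall-quality ratings for two groups of respondents.
group_1 = [7, 8, 6, 9, 7]
group_2 = [7, 7, 8, 6, 8]
t_value, df = pooled_t(group_1, group_2)
print(round(t_value, 4), df)
```

A t value this close to zero relative to its degrees of freedom corresponds to a large p-value, the situation in which the null hypothesis cannot be rejected.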
Analysis of Variance • ANOVA is appropriate in situations where the independent variable is set at certain specific levels (called treatments in an ANOVA context) and metric measurements of the dependent variable are obtained at each of those levels
Example • 24 stores chosen randomly for the study • 8 stores randomly chosen for each treatment • Treatment 1: Store brand sold at the regular price • Treatment 2: Store brand sold at 50¢ off the regular price • Treatment 3: Store brand sold at 75¢ off the regular price • Monitor sales of the store brand for a week in each store
ANOVA – Grocery Store Hypothesis • Grocery Store Example • H0: $\mu_1 = \mu_2 = \mu_3$ • Ha: At least one $\mu$ is different from one or more of the others • Hypotheses for K Treatment groups or samples • H0: $\mu_1 = \mu_2 = \dots = \mu_k$ • Ha: At least one $\mu$ is different from one or more of the others
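The F-value that tests these hypotheses compares the variation between treatment means with the variation within treatments. The sketch below uses three hypothetical stores per treatment to keep the arithmetic readable; the sales figures are invented and are not the study's data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of treatment groups."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Between-treatment sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-treatment sum of squares: values around their own group mean.
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical weekly unit sales under the three price treatments.
sales_by_treatment = [[10, 12, 11], [14, 15, 16], [19, 20, 21]]
f_value, df_between, df_within = one_way_anova(sales_by_treatment)
print(round(f_value, 3), df_between, df_within)
```

A large F-value means the treatment means differ by much more than the spread inside each treatment would explain, which is evidence against H0.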
Exhibit 15.1 SPSS Computer Output for ANOVA Analysis (Cont’d) • If the null hypothesis were true, there would be less than a .001 probability of obtaining an F-value as high as 137.447
ANOVA • OQ recommendation and OQ, individual variable • OQ and EDUC (Graph)..and post hoc
Overview of Techniques for Examining Associations (Cont’d) • Simple Regression Analysis Technique • This technique is appropriate when • A mathematical function or equation linking two metric-scaled (interval or ratio) variables is to be constructed, under the assumption that the values of one of the two variables are dependent on the values of the other
Overview of Techniques for Examining Associations–Simple Regression Analysis (Cont’d) • Illustrative Research Question(s) this Technique Can Answer: • Are sales (measured in dollars) significantly affected by advertising expenditures (measured in dollars)? • What proportion of the variation in sales is accounted for by variation in advertising expenditures? How sensitive are sales to changes in advertising expenditures?
Overview of Techniques for Examining Associations (Cont’d) • Multiple Regression Analysis Technique • This technique is appropriate under the same conditions as simple regression analysis, except that more than two variables are involved, with one variable assumed to be dependent on the others
Overview of Techniques for Examining Associations (Cont’d) • Illustrative Research Question(s) this Technique Can Answer: • Are sales significantly affected by advertising expenditures and price (where all three variables are measured in dollars)? • What proportion of the variation in sales is accounted for by advertising and price? How sensitive are sales to changes in advertising and price?
Simple Regression Analysis • Generates a mathematical relationship (called the regression equation) between one variable designated as the dependent variable (Y) and another designated as the independent variable (X)
Independent Variable vs. Dependent Variable • Independent variable • Explanatory or predictor variable • Often presumed to be a cause of the other • Dependent variable • Criterion Variable • Influenced by the independent variable
Practical Applications of Regression Equations • The regression coefficient, or slope, can indicate how sensitive the dependent variable is to changes in the independent variable • The regression equation is a forecasting tool for predicting the value of the dependent variable for a given value of the independent variable
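Both uses can be illustrated with an ordinary least-squares fit, which is the standard way a regression equation is estimated. The advertising and sales figures are hypothetical and were chosen so that the fitted line is exact.

```python
from statistics import mean


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data in thousands of dollars.
advertising = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = fit_line(advertising, sales)
# Forecast at a value inside the observed range of advertising.
forecast = a + b * 3.5
print(a, b, forecast)
```

The slope `b` is the sensitivity of sales to advertising, and the forecast is made at 3.5, a value within the range of the data used to build the equation.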
Precautions In Using Regression Analysis • Only capable of capturing linear associations between dependent and independent variables • A significant R²-value does not necessarily imply a cause-and-effect association between the independent and dependent variables • A regression equation may not yield a trustworthy prediction of the dependent variable when the value of the independent variable at which the prediction is desired is outside the range of values used in constructing the equation
Precautions In Using Regression Analysis (Cont’d) • A regression equation based on relatively few data points cannot be trusted • The ranges of data on the dependent and independent variables can affect the meaningfulness of a regression equation
Multiple Regression Analysis • Yi = a + b1X1i + b2X2i + … + bkXki • Yi is the predicted value of the dependent variable for some unit i; • X1i, X2i, …, Xki are values on the independent variables for unit i; • b1, b2, …, bk are the regression coefficients; • a is the Y-intercept representing the prediction for Y when all independent variables are set to zero
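Once the coefficients have been estimated (by SPSS, for instance), using the equation is simple arithmetic. The intercept, coefficients, and ratings below are hypothetical numbers chosen only to show how a prediction is evaluated.

```python
def predict(intercept, coefficients, x_values):
    """Evaluate Y = a + b1*X1 + ... + bk*Xk for one unit."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one X value per regression coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical fitted equation with two independent variables.
a = 1.5
b = [0.4, 0.3]

predicted = predict(a, b, [5.0, 6.0])
at_zero = predict(a, b, [0.0, 0.0])
print(round(predicted, 2), at_zero)
```

Setting every independent variable to zero returns the intercept, which matches its definition as the prediction for Y at that point.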
National Insurance Company – Multiple Regression Using SPSS • Jill and Tom were interested in conducting a multiple regression analysis wherein overall service quality perception is the dependent variable and the average ratings along the five dimensions are the independent variables