Regression & factor analyses
Where regression can go wrong
• An example:
• A financial company wishes to ascertain what the drivers of satisfaction are for their service. They are:
  EXPERT = "experts"
  Q30A2  = "Take the time to understand who you are"
  Q30A3  = "Communicate clearly, in plain language"
  Q30A6  = "Go out of their way to tailor the best deal"
  Q30A7  = "Have the knowledge and authority to make"
  Q30A8  = "Have a positive, can-do approach"
  Q30A11 = "Understand your business and the market"
  Q30A12 = "Are proactive with ideas on how to get t"
  Q30A13 = "Are prompt and reliable in handling any"
  Q30A14 = "Treat you with respect and listen"
  Q30A15 = "Keep in regular contact to keep you updated"
  Q32A1  = "The competitiveness of their fees and rates"
  Q32A2  = "Offering a flexible range of lending/rep"
  Q32A3  = "How easy it is to take out a commercial"
  Q32A4  = "The features and benefits of their comments"
  Q32A5  = "Providing a full range of commercial product"
  Q32A6  = "Being fair and reasonable in their lending"
  Q24    = "Q3a. AMP BANKING OVERALL RATING" (NB: this is the response)
• These were all on a 10-point scale
Example
• Let's clean this data. SAS code:
  libname hold '';
  data temp;
    set hold.model;
    array new {*} Q24 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
                  Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
    /* loop over every element of the array (18 variables) */
    do i=1 to dim(new);
      /* 11 is recoded to missing */
      if new[i]=11 then new[i]=.;
    end;
    drop i;
  run;
  /* REPLACE fills each missing value with the mean of its variable */
  proc standard data=temp replace out=temp;
    var Q24 Q33 Q34 EXPERT Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
        Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
  run;
  data hold.model;
    set temp;
  run;
• The above code recodes 11s to . (missing in SAS) and then replaces each missing value with the mean of its variable
Let's look at the data:

STAFF - Experts in Commercial Finance Ma
                                        Cumulative   Cumulative
        EXPERT   Frequency   Percent    Frequency      Percent
  --------------------------------------------------------------
             1           5      1.67            5         1.67
             2           4      1.33            9         3.00
             3           5      1.67           14         4.67
             4           3      1.00           17         5.67
             5          14      4.67           31        10.33
             6          16      5.33           47        15.67
             7          22      7.33           69        23.00
   7.462890625         121     40.33          190        63.33
             8          50     16.67          240        80.00
             9          24      8.00          264        88.00
            10          36     12.00          300       100.00

Take the time to understand who you are
                                        Cumulative   Cumulative
         Q30A2   Frequency   Percent    Frequency      Percent
  --------------------------------------------------------------
             1          10      3.33           10         3.33
             2           4      1.33           14         4.67
             3          11      3.67           25         8.33
             4          10      3.33           35        11.67
             5          19      6.33           54        18.00
             6          19      6.33           73        24.33
             7          25      8.33           98        32.67
  7.4111328125          52     17.33          150        50.00
             8          48     16.00          198        66.00
             9          41     13.67          239        79.67
            10          61     20.33          300       100.00

Communicate clearly, in plain language
                                        Cumulative   Cumulative
         Q30A3   Frequency   Percent    Frequency      Percent
  --------------------------------------------------------------
             1           3      1.00            3         1.00
             2           5      1.67            8         2.67
             3           3      1.00           11         3.67
             4           6      2.00           17         5.67
             5          11      3.67           28         9.33
             6          12      4.00           40        13.33
             7          34     11.33           74        24.67
    7.98046875          33     11.00          107        35.67
             8          81     27.00          188        62.67
             9          48     16.00          236        78.67
            10          64     21.33          300       100.00
Some more code
  proc reg data=hold.model;
    model Q24 = expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
                Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
  run;
  proc corr data=hold.model;
    var Q24 expert Q30A2 Q30A3 Q30A6 Q30A7 Q30A8 Q30A11 Q30A12 Q30A13
        Q30A14 Q30A15 Q32A1 Q32A2 Q32A3 Q32A4 Q32A5 Q32A6;
  run;
Regression output

Parameter Estimates
                                                              Parameter   Standard
  Variable   Label                                        DF   Estimate      Error   t Value   Pr > |t|
  Intercept  Intercept                                     1    1.99970    0.53770      3.72     0.0002
  EXPERT     STAFF - Experts in Commercial Finance Matters 1    0.05590    0.06486      0.86     0.3895
  Q30A2      Take the time to understand who you are       1    0.01870    0.07645      0.24     0.8069
  Q30A3      Communicate clearly, in plain language        1    0.02263    0.07383      0.31     0.7595
  Q30A6      Go out of their way to tailor the best        1    0.01097    0.06114      0.18     0.8578
  Q30A7      Have the knowledge and authority to make      1    0.11831    0.06004      1.97     0.0498
  Q30A8      Have a positive, can-do approach to doing     1    0.13498    0.08037      1.68     0.0942
  Q30A11     Understand your business and the market       1   -0.06802    0.07025     -0.97     0.3338
  Q30A12     Are proactive with ideas on how to get        1    0.02511    0.05764      0.44     0.6634
  Q30A13     Are prompt and reliable in handling any       1    0.37204    0.06702      5.55     <.0001
  Q30A14     Treat you with respect and listen             1   -0.17003    0.08039     -2.12     0.0353
  Q30A15     Keep in regular contact to keep you updated   1    0.07978    0.04594      1.74     0.0835
  Q32A1      The competitiveness of their fees and rates   1    0.00392    0.06439      0.06     0.9514
  Q32A2      Offering a flexible range of lending/rep      1   -0.05496    0.07295     -0.75     0.4519
  Q32A3      How easy it is to take out a commercial       1    0.07025    0.06019      1.17     0.2442
  Q32A4      The features and benefits of their comments   1   -0.08790    0.08377     -1.05     0.2949
  Q32A5      Providing a full range of commercial prod     1    0.07440    0.05614      1.33     0.1861
  Q32A6      Being fair and reasonable in their lending    1    0.15004    0.06826      2.20     0.0288
Issues
• Note that many of these coefficients are not significant
• Even worse, some are negative when we would expect them, in the worst case, to be at least zero (>= 0)
• E.g.: Q30A14 "Treat you with respect and listen": estimate -0.17003, standard error 0.08039, t value -2.12, Pr > |t| 0.0353
• I.e.: this seems to imply that not listening and treating people disrespectfully would increase overall satisfaction!
• So what is going on?
Some Correlation output
(Extract. Columns follow the order of the VAR statement; the 1.00000 diagonal entries agree with this order. Every off-diagonal Prob > |r| in this extract is <.0001.)

             Q24    EXPERT     Q30A2     Q30A3     Q30A6     Q30A7     Q30A8    Q30A11    Q30A12
  Q30A7  0.61756   0.58737   0.61441   0.59967   0.64270   1.00000   0.71403   0.59881   0.60714   Have the knowledge and authority to make
  Q30A8  0.60261   0.58008   0.76265   0.68892   0.70250   0.71403   1.00000   0.76378   0.70638   Have a positive, can-do approach to doin
  Q30A11 0.52959   0.62118   0.81022   0.66246   0.64729   0.59881   0.76378   1.00000   0.71796   Understand your business and the market
  Q30A12 0.53925   0.55677   0.73597   0.59714   0.66199   0.60714   0.70638   0.71796   1.00000   Are proactive with ideas on how to get t
  Q30A13 0.64158   0.47558   0.63395   0.64574   0.54768   0.68501   0.68526   0.64092   0.59023   Are prompt and reliable in handling any
  Q30A14 0.47386   0.51258   0.65066   0.69404   0.55816   0.57507   0.66858   0.60788   0.51475   Treat you with respect and listen to wha
  Q30A15 0.50963   0.51407   0.67322   0.59555   0.55953   0.51578   0.54464   0.60346   0.64993   Keep in regular contact to keep you upda
  Q32A1  0.31972   0.37541   0.40878   0.40499   0.46758   0.40688   0.32594   0.38509   0.37980   The competitiveness of their fees and ra

• It appears that the explanatory variables are very highly correlated with each other.
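One way to see how correlated predictors can produce a negative coefficient is the textbook two-predictor case, where the standardized regression coefficients depend only on three correlations. The sketch below is an illustration, not a rerun of the regression above: it takes the correlations of Q30A13 and Q30A14 with the response from the extract (assuming the first column is Q24), while the correlation between the two predictors, `r_12`, is a hypothetical value that is varied, because that pair is not shown in the extract.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized OLS coefficients for y regressed on two predictors.

    r_y1, r_y2: correlation of each predictor with the response.
    r_12: correlation between the two predictors (must not be +/-1).
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_12 * r_y2) / denom
    beta2 = (r_y2 - r_12 * r_y1) / denom
    return beta1, beta2


# Correlations with the response, read from the extract (first column assumed to be Q24).
R_Y1 = 0.64158  # Q30A13, prompt and reliable
R_Y2 = 0.47386  # Q30A14, respect and listen

# r_12 is hypothetical: the extract does not show the Q30A13/Q30A14 correlation.
for r_12 in (0.3, 0.6, 0.8, 0.9):
    b1, b2 = standardized_betas(R_Y1, R_Y2, r_12)
    print(f"r_12={r_12:.1f}  beta_Q30A13={b1:+.3f}  beta_Q30A14={b2:+.3f}")
```

Both predictors are positively correlated with the response throughout, yet in this two-predictor setting the coefficient of the weaker one turns negative as soon as `r_12` exceeds `R_Y2 / R_Y1` (about 0.74 here).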
Where do we go from here?
• Clearly we have data that is multi-collinear (i.e. the variables are linearly related, and hence one variable may explain others)
• In this case, some relationships may be hidden because another variable has 'hogged' the relationship in terms of explanation
• So how do we reduce the number of variables we look at without losing the finer detail?
• The answer is ...
• (PS: let's leave this example for a while and return to it later)
Factor Analysis
• Background
• A Factor Analysis takes answers from many (maybe different) types of questions and summarises them with a smaller number of factors. It works by pulling out "common dimensions" from the input variables and grouping them together (e.g. if Income and Education were input into a Factor Analysis they would probably come out on one factor resembling Socio-Economic Status).
• The reasons for doing this are:
• to gain greater control over final solutions
• to equate the scale of variables that have been measured on different scales
• to obtain output factors that are independent of (orthogonal to) each other
Principal components vs Factor analysis
• With principal components we compute $y = \Gamma^{\top}(x - \mu)$, where $\Gamma$ is orthogonal, $\Gamma^{\top} \Sigma \Gamma = \Lambda$, and $\Lambda$ is the diagonal matrix of eigenvalues of $\Sigma$, the covariance matrix of $x$
• With factor analysis we compute $x = \mu + L f + e$, where $L$ is a matrix of factor loadings, $f$ holds the unobserved common factors and $e$ the specific errors. Here $\Sigma = L L^{\top} + \Psi$, where $\Psi$ is the covariance matrix of $e$
• PC reduces dimensionality by taking linear combinations of the x's
• FA attempts to understand correlations between observable variables in terms of underlying factors, which are themselves not directly observable (latent)
• Essentially, what the code produces is PC with 'fudge factors', so that we can investigate underlying (latent) patterns, i.e. factors
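The principal components step described above (an orthogonal matrix of eigenvectors turns the covariance matrix into a diagonal matrix of eigenvalues) can be checked by hand in the smallest possible case. The sketch below is an illustration only: it assumes just two standardized variables with correlation r = 0.81022 (the largest correlation printed in the earlier extract), for which the eigenvalues are 1 + r and 1 - r. The helper names are made up for this example.

```python
import math


def pca_2x2(r):
    """Eigenvalues and eigenvectors of the correlation matrix [[1, r], [r, 1]], r >= 0."""
    eigenvalues = [1.0 + r, 1.0 - r]          # largest first when r >= 0
    s = 1.0 / math.sqrt(2.0)
    gamma = [[s, s], [s, -s]]                 # columns are the eigenvectors
    return eigenvalues, gamma


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


r = 0.81022
sigma = [[1.0, r], [r, 1.0]]
eigenvalues, gamma = pca_2x2(r)

# gamma' * sigma * gamma should come out diagonal, with the eigenvalues on the diagonal.
diagonalised = matmul(matmul(transpose(gamma), sigma), gamma)
share_first = eigenvalues[0] / sum(eigenvalues)

print(diagonalised)
print(f"first component carries {share_first:.1%} of the variance")
```

With two variables this strongly correlated, a single component carries roughly 90% of the total variance, which is the sense in which PC reduces dimensionality.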
Problems
• Missing Values
• A Factor Analysis needs complete data: if an observation has a missing value on any of the input variables, the entire observation is omitted from the analysis. To avoid losing observations, missing values need to be filled in with the mean, median or mode, depending on the type of data.
• Variable Correlation and Factor Interpretation
• Since Factor Analysis works by grouping variables which are correlated, the correlations between the variables should be checked before performing the analysis. From the qualitative research, certain variables are expected to be correlated. This needs to be true if we are to reproduce the qualitative model. If it is not, the factors from the analysis can be hard to interpret. We need factors that make sense in order to continue with Regression Analysis or Segmentation (much later).
• Number Of Factors
• The number of factors used depends upon the individual and the job. The key point is that the factors need to be interpretable to be useful in analysis. Interpretability can make the final decision on how many factors you have.
Example The following pages are an example of a Factor Analysis from a project done for the Auckland Regional Council regarding recycling in businesses. The questions used for the following Factor Analysis example are on the next page. • What the ARC wanted was a segmentation so they could target recycling programs at businesses which would be receptive to them. They also wanted to find out which media channels would be most effective for reaching the target market.
Q6 I am now going to read a series of statements which describe how an organisation might feel about buying recycled products. Please indicate how strongly you agree or disagree that each of the following statements applies to your company on a scale of 1 to 10, where 1 means you strongly disagree, and 10 means you strongly agree. ROTATE AND READ
• My company wouldn't use recycled products because they look cheap and nasty.
• Recycled products seem to be of much lower quality than non-recycled products.
• Using recycled products results in our equipment breaking down and needing more maintenance.
• They would need to be a lot cheaper before we would consider buying them.
• If there were no other problems with recycled products we would even pay a small premium to use them.
• All recycled products cost more than non-recycled products.
• It's not worth the time and effort finding and changing suppliers just to get recycled products.
• It would be too hard to make the system changes necessary to use recycled products.
• The range of recycled products available is not wide enough to warrant using them.
• It's just too difficult to get enough people to change their routines and to use more recycled products.
• We would use recycled products if someone in our company took the responsibility to push the initiative ahead.
• Using recycled products doesn't really fit with our image.
• If quality, price and availability were the same, we would choose to buy recycled products over not recycled products whenever we could.
• Manufacturing recycled products is actually less energy efficient and more harmful to the environment.
• There are benefits to us if our customers see us as "Green".
Preliminaries
Prior to performing a Factor Analysis, a couple of preliminaries need to be worked through. First, the data used for the Factor Analysis needs to be cleaned (i.e. missing values or don't knows replaced, influential points/outliers checked, and null microtab values that result in zeros). Next, the correlations between the variables should be checked to see whether they are as the qualitative researcher (for segmentations) or client (for threshold analyses) expects.

Checking Data
First the variables in the Factor Analysis need to be checked for missing or invalid points. This can be done using a frequency table with code:
  proc freq data=hold.cards;
    table q33a1-q33a15;
  run;
This table will show all values for the listed questions and how many missing values there are. The output for one table is shown below.

The SAS System 10:40 Tuesday, February 25, 1997 11
                                   Cumulative   Cumulative
  Q33A13   Frequency   Percent     Frequency      Percent
  ----------------------------------------------------------
       1          12       4.9            12          4.9
       2           8       3.2            20          8.1
       3          16       6.5            36         14.6
       4           8       3.2            44         17.8
       5          26      10.5            70         28.3
       6          23       9.3            93         37.7
       7          31      12.6           124         50.2
       8          47      19.0           171         69.2
       9          13       5.3           184         74.5
      10          60      24.3           244         98.8
      11           3       1.2           247        100.0
Data cleaning issues
Replacing Don't Knows Or Refusals With Missings
Say the questions have a 1-10 scale for answers, with 11s as don't knows. To convert the don't knows to missings the following code can be used:
  data hold.cards;
    set hold.cards;
    /* setting up an array for the variables to be replaced */
    array new {*} q33a1-q33a15;
    /* running through that array */
    do i=1 to dim(new);
      /* replacing 11s with missings for all variables in the array */
      if new[i]=11 then new[i]=.;
    end;
    /* dropping unneeded variable i */
    drop i;
  run;
Replacing Missings With Means
Now the variables do not have any don't know answers, but they do have a heap of missing values. To replace all the missings with means the following code can be used:
  proc standard data=hold.cards replace out=hold.cards;
    var q33a1-q33a15;
  run;
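For readers without SAS to hand, the two cleaning steps (recode 11 to missing, then fill missings with the variable mean) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the deck's own code: the data are rebuilt from the Q33A13 frequency table shown earlier, and `clean_and_impute` is a made-up helper name.

```python
from statistics import mean

DONT_KNOW = 11  # code used for "don't know" on the 1-10 scale


def clean_and_impute(values, dont_know=DONT_KNOW):
    """Recode don't-know answers to missing (None), then fill missings with the mean."""
    recoded = [None if v == dont_know else v for v in values]
    observed = [v for v in recoded if v is not None]
    fill = mean(observed)  # mean of the valid answers only
    return [fill if v is None else v for v in recoded], fill


# Q33A13 answers rebuilt from the frequency table (value -> frequency).
counts = {1: 12, 2: 8, 3: 16, 4: 8, 5: 26, 6: 23, 7: 31, 8: 47, 9: 13, 10: 60, 11: 3}
q33a13 = [value for value, n in counts.items() for _ in range(n)]

cleaned, fill = clean_and_impute(q33a13)
print(len(cleaned), round(fill, 4))  # 247 6.9098
```

All 247 observations are kept, and the three don't-know answers now sit at the mean of the 244 valid ones, so the variable's mean is unchanged while its spread shrinks slightly.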
Data cleaning issues…
Replacing Missings With Other Values
However, if you want to replace missings with other values, either of the following two sets of code can be used.
To replace missings in all variables with the same value:
  data hold.cards;
    set hold.cards;
    array new {*} q33a1-q33a15;
    do i=1 to dim(new);
      if new[i]=. then new[i]=8;
    end;
    drop i;
  run;
To replace missings with a different value for each variable:
  data hold.cards;
    set hold.cards;
    if q33a1=. then q33a1=8;
    if q33a2=. then q33a2=8.25;
    if q33a3=. then q33a3=8.5;
    ...
  run;
Inspecting the data
Checking Variable Correlations
To check correlations between variables the following code can be used:
  proc corr data=hold.cards best=7;
    var q33a1-q33a15;
  run;
The output from this procedure is shown over the next 2 pages. The best= option shows the 7 most highly correlated variables with each variable in the procedure. If the correlations between variables are not as they should be you can either:
1. leave the offending variable out of the Factor Analysis, or
2. run separate Factor Analyses for different sets of variables (renaming the different sets of factors in between)

The SAS System 10:40 Tuesday, February 25, 1997 12
Correlation Analysis
15 'VAR' Variables: Q33A1 Q33A2 Q33A3 Q33A4 Q33A5 Q33A6 Q33A7 Q33A8 Q33A9 Q33A10 Q33A11 Q33A12 Q33A13 Q33A14 Q33A15

Simple Statistics
  Variable     N       Mean    Std Dev           Sum    Minimum     Maximum
  Q33A1      247   2.534694   2.031215    626.069388   1.000000   10.000000
  Q33A2      247   3.906780   2.452764    964.974576   1.000000   10.000000
  Q33A3      247   2.845000   2.042431    702.715000   1.000000   10.000000
  Q33A4      247   4.289362   2.528153   1059.472340   1.000000   10.000000
  Q33A5      247   4.608333   2.376360   1138.258333   1.000000   10.000000
  Q33A6      247   3.889952   2.238311    960.818182   1.000000   10.000000
  Q33A7      247   4.504098   2.584346   1112.512295   1.000000   10.000000
  Q33A8      247   3.144068   2.055235    776.584746   1.000000   10.000000
  Q33A9      247   4.276316   2.383541   1056.250000   1.000000   10.000000
  Q33A10     247   3.698347   2.384724    913.491736   1.000000   10.000000
  Q33A11     247   5.782427   2.800034   1428.259414   1.000000   10.000000
  ...
Inspecting the data
The SAS System 10:40 Tuesday, February 25, 1997 20
Correlation Analysis
Pearson Correlation Coefficients / Prob > |R| under Ho: Rho=0 / N = 247
(For each variable: the correlated variables, the correlations, then the p-values.)

  Q33A1    Q33A1     Q33A2     Q33A12    Q33A3     Q33A9     Q33A7     Q33A4
           1.00000   0.41681   0.41245   0.38841   0.31230   0.30666   0.29660
           0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

  Q33A2    Q33A2     Q33A3     Q33A1     Q33A15    Q33A7     Q33A9     Q33A4
           1.00000   0.48408   0.41681   0.36507   0.35198   0.33727   0.32125
           0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

  Q33A3    Q33A3     Q33A2     Q33A1     Q33A4     Q33A10    Q33A15    Q33A8
           1.00000   0.48408   0.38841   0.35555   0.34687   0.30598   0.28709
           0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

  Q33A4    Q33A4     Q33A6     Q33A3     Q33A7     Q33A2     Q33A1     Q33A8
           1.00000   0.42521   0.35555   0.32410   0.32125   0.29660   0.26938
           0.0       0.0001    0.0001    0.0001    0.0001    0.0001    0.0001

  Q33A5    Q33A5     Q33A13    Q33A14    Q33A11    Q33A12    Q33A1     Q33A7
           1.00000   0.20620   0.17659   0.11134  -0.09332   0.08699  -0.08606
           0.0       0.0011    0.0054    0.0807    0.1436    0.1729    0.1776
  ...
SAS Code
• The code for performing a factor analysis is as follows:
  proc factor data=hold.cards nfact=6 rotate=varimax out=hold.cards fuzz=.3;
    var q33a1-q33a15;
  run;
• data= input data set
• nfact= number of factors asked for
• rotate= rotation method (varimax here, an orthogonal rotation)
• out= output data set with factor values for each individual
• var variables in the Factor Analysis
• fuzz=.3 suppresses any value less than .3 in absolute value in the printed FA output; such values are shown as . (see below)
SAS output
SAS System 12:05 Monday, February 24, 1997 1
Initial Factor Method: Principal Components
Prior Communality Estimates: ONE

1. Eigenvalues of the Correlation Matrix: Total = 15 Average = 1

                   1        2        3        4        5        6        7        8
  Eigenvalue  3.9958   1.5398   1.1739   1.0474   0.9795   0.8416   0.8112   0.7429
  Difference  2.4560   0.3659   0.1265   0.0679   0.1380   0.0303   0.0683   0.0584
  Proportion  0.2664   0.1027   0.0783   0.0698   0.0653   0.0561   0.0541   0.0495
  Cumulative  0.2664   0.3690   0.4473   0.5171   0.5824   0.6385   0.6926   0.7421

                   9       10       11       12       13       14       15
  Eigenvalue  0.6845   0.6624   0.5972   0.5746   0.5072   0.4646   0.3774
  Difference  0.0221   0.0652   0.0226   0.0673   0.0426   0.0872
  Proportion  0.0456   0.0442   0.0398   0.0383   0.0338   0.0310   0.0252
  Cumulative  0.7878   0.8319   0.8717   0.9100   0.9439   0.9748   1.0000

6 factors will be retained by the NFACTOR criterion.

2. Factor Pattern

           FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  Q33A1    0.64261    0.10466    0.22909    0.00702    0.37260    0.19244
  Q33A2    0.70770    0.06589    0.02988   -0.15700    0.28081   -0.19919
  Q33A3    0.64134    0.14019   -0.13186   -0.04640    0.25951   -0.24317
  Q33A4    0.57238    0.35839   -0.29582    0.10771   -0.02757    0.27656
  Q33A5   -0.08357    0.53286    0.53208   -0.09429    0.15764   -0.23052
  Q33A6    0.43021    0.47717   -0.41209   -0.13827   -0.28342    0.05263
  Q33A7    0.60480    0.04908   -0.10888    0.33017   -0.19909    0.04189
  Q33A8    0.60941   -0.16610    0.22173    0.31312   -0.23898   -0.00199
  Q33A9    0.55460    0.10588    0.28130   -0.21724   -0.42052   -0.08247
  Q33A10   0.58595    0.04968    0.18529    0.21746   -0.27972   -0.17652
  Q33A11  -0.23996    0.50645   -0.23555    0.54390    0.32545    0.01683
  Q33A12   0.51615   -0.32144    0.32763    0.05628    0.22886    0.55261
  Q33A13  -0.24610    0.49055    0.39569    0.02863    0.00242   -0.02809
  Q33A14  -0.37558    0.46974    0.09508   -0.27936   -0.20472    0.47604
  Q33A15   0.48719   -0.00283   -0.24449   -0.54913    0.19115   -0.02302

Variance explained by each factor
   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  3.995833   1.539818   1.173885   1.047402   0.979544   0.841555
SAS output…
3. Final Communality Estimates: Total = 9.578037

     Q33A1      Q33A2      Q33A3      Q33A4      Q33A5      Q33A6      Q33A7      Q33A8
  0.652294   0.649247   0.576990   0.632425   0.660909   0.684801   0.530450   0.603293

     Q33A9     Q33A10     Q33A11     Q33A12     Q33A13     Q33A14     Q33A15
  0.628753   0.536827   0.771585   0.838002   0.459394   0.717317   0.635753

The SAS System 12:05 Monday, February 24, 1997 2
Rotation Method: Varimax

4. Orthogonal Transformation Matrix

            1          2          3          4          5          6
  1   0.60090    0.60049    0.31986   -0.14237    0.35172   -0.17902
  2  -0.04409    0.07833    0.60942    0.70061   -0.16650    0.31930
  3   0.25547   -0.14998   -0.47831    0.67364    0.37609   -0.29702
  4   0.54786   -0.28654   -0.18312   -0.10344    0.06205    0.75476
  5  -0.46699    0.57038   -0.32109    0.06755    0.37548    0.45600
  6  -0.23126   -0.45094    0.40111   -0.14078    0.74986    0.01305
SAS output…
5. Rotated Factor Pattern

           FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  Q33A1       .       0.48345       .          .       0.57939       .
  Q33A2       .       0.72062       .          .          .          .
  Q33A3       .       0.68685       .          .          .          .
  Q33A4       .          .       0.64305       .          .          .
  Q33A5       .          .          .       0.79651       .          .
  Q33A6       .          .       0.76294       .          .          .
  Q33A7    0.59762       .          .          .          .          .
  Q33A8    0.71377       .          .          .          .          .
  Q33A9    0.49688       .          .          .          .      -0.50583
  Q33A10   0.68783       .          .          .          .          .
  Q33A11      .          .          .          .          .       0.83377
  Q33A12      .          .          .          .       0.86208       .
  Q33A13      .          .          .       0.64644       .          .
  Q33A14  -0.38964   -0.45439    0.42849    0.39468       .          .
  Q33A15      .       0.60575    0.301         .          .      -0.3431

Variance explained by each factor
   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  2.095412   2.052493   1.520768   1.401878   1.318371   1.189115

Final Communality Estimates: Total = 9.578037

     Q33A1      Q33A2      Q33A3      Q33A4      Q33A5      Q33A6      Q33A7      Q33A8
  0.652294   0.649247   0.576990   0.632425   0.660909   0.684801   0.530450   0.603293

     Q33A9     Q33A10     Q33A11     Q33A12     Q33A13     Q33A14     Q33A15
  0.628753   0.536827   0.771585   0.838002   0.459394   0.717317   0.635753

Scoring Coefficients Estimated by Regression
Squared Multiple Correlations of the Variables with each Factor
   FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  1.000000   1.000000   1.000000   1.000000   1.000000   1.000000
SAS output…
6. Standardized Scoring Coefficients

           FACTOR1    FACTOR2    FACTOR3    FACTOR4    FACTOR5    FACTOR6
  Q33A1   -0.08335    0.18455   -0.03213    0.14900    0.43336    0.11645
  Q33A2   -0.05022    0.41908   -0.08899    0.09010   -0.01441   -0.01110
  Q33A3   -0.01743    0.41447   -0.03231    0.02842   -0.12090    0.11730
  Q33A4    0.00491   -0.05167    0.43022   -0.08589    0.15910    0.19259
  Q33A5    0.02684    0.18768   -0.15766    0.60951   -0.04507   -0.01852
  Q33A6    0.00968   -0.01383    0.53336   -0.04939   -0.21569   -0.04682
  Q33A7    0.32195   -0.12140    0.13970   -0.11504   -0.00638    0.15652
  Q33A8    0.42291   -0.16895   -0.08466   -0.01712    0.06781   -0.00350
  Q33A9    0.25110   -0.08846    0.10820    0.19609   -0.12006   -0.42766
  Q33A10   0.42262   -0.06087   -0.03939    0.09682   -0.14606   -0.03909
  Q33A11   0.02286    0.05149    0.08346    0.06972    0.02062    0.71907
  Q33A12  -0.07340   -0.15889   -0.04086   -0.05886    0.76862   -0.01701
  Q33A13   0.05660   -0.05396   -0.00596    0.46108    0.02966    0.03395
  Q33A14  -0.22857   -0.34256    0.45994    0.21551    0.27576   -0.19905
  Q33A15  -0.35190    0.37817    0.15988   -0.08769   -0.01491   -0.26763
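The pieces of the output above fit together: a variable's rotated loadings are its unrotated loadings multiplied by the orthogonal transformation matrix, and fuzz then blanks the small ones. The Python sketch below checks this for a single variable, Q33A2, using numbers copied from the output. It is a consistency check only, not how SAS computes the rotation, and the function names are made up for this example.

```python
# Orthogonal transformation matrix, copied from the SAS output (rows 1-6).
T = [
    [0.60090, 0.60049, 0.31986, -0.14237, 0.35172, -0.17902],
    [-0.04409, 0.07833, 0.60942, 0.70061, -0.16650, 0.31930],
    [0.25547, -0.14998, -0.47831, 0.67364, 0.37609, -0.29702],
    [0.54786, -0.28654, -0.18312, -0.10344, 0.06205, 0.75476],
    [-0.46699, 0.57038, -0.32109, 0.06755, 0.37548, 0.45600],
    [-0.23126, -0.45094, 0.40111, -0.14078, 0.74986, 0.01305],
]

# Unrotated loadings of Q33A2 on factors 1-6, from the Factor Pattern table.
UNROTATED_Q33A2 = [0.70770, 0.06589, 0.02988, -0.15700, 0.28081, -0.19919]


def rotate_row(row, transform):
    """Multiply one row of unrotated loadings by the transformation matrix."""
    n_factors = len(transform[0])
    return [sum(row[k] * transform[k][j] for k in range(len(row)))
            for j in range(n_factors)]


def apply_fuzz(loadings, fuzz=0.3):
    """Blank out (None) loadings smaller than fuzz in absolute value."""
    return [value if abs(value) >= fuzz else None for value in loadings]


rotated = rotate_row(UNROTATED_Q33A2, T)
printed = apply_fuzz(rotated)
print([None if v is None else round(v, 4) for v in printed])
```

Only the second entry survives the fuzz threshold, and it agrees with the 0.72062 shown for Q33A2 on FACTOR2 in the rotated factor pattern, up to the rounding of the printed inputs.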
SAS Output - interpretation
• The above lines of code result in the output on the last few pages. The output shows:
1. the eigenvalues for each factor (check for reasonable size). The cumulative row shows what percentage of the variance is explained in the Factor Analysis using different numbers of factors. Aim for approximately 60% or more, ultimately depending on the interpretability of the Factor Analysis.
2. the unrotated factor pattern (ignore this).
3. final communality estimates (check for any low ones). These show how much of each variable's variance is explained by the factors. It is desirable for these to be approximately 60% or better for those variables which are important in the final analysis. Any variable with a low communality is essentially NOT used in the factor solution. If an important variable has a low communality, it can be used in a segmentation as a separate variable (more later).
SAS Output - interpretation…
• The above lines of code result in the output on the last few pages. The output shows:
4. the orthogonal transformation matrix (ignore this).
5. the rotated factor pattern (the key output, examine this closely). This shows each variable's weighting on each factor. The important variables for each factor are those with weightings of around 0.5 (50%) and over in absolute value.
6. the standardised scoring coefficients (use this in FA regression).
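The two 60% checks in the list above (cumulative variance and communalities) are easy to script. The Python sketch below applies them to the eigenvalues and final communality estimates copied from the SAS output; the 0.60 thresholds are the rule of thumb from the text, and the function names are made up for this example.

```python
# Eigenvalues of the correlation matrix, copied from the SAS output.
EIGENVALUES = [3.9958, 1.5398, 1.1739, 1.0474, 0.9795, 0.8416, 0.8112, 0.7429,
               0.6845, 0.6624, 0.5972, 0.5746, 0.5072, 0.4646, 0.3774]

# Final communality estimates, copied from the SAS output.
COMMUNALITIES = {
    "Q33A1": 0.652294, "Q33A2": 0.649247, "Q33A3": 0.576990, "Q33A4": 0.632425,
    "Q33A5": 0.660909, "Q33A6": 0.684801, "Q33A7": 0.530450, "Q33A8": 0.603293,
    "Q33A9": 0.628753, "Q33A10": 0.536827, "Q33A11": 0.771585, "Q33A12": 0.838002,
    "Q33A13": 0.459394, "Q33A14": 0.717317, "Q33A15": 0.635753,
}


def cumulative_proportions(eigenvalues):
    """Share of total variance explained by the first 1, 2, ... factors."""
    total = sum(eigenvalues)
    running = 0.0
    shares = []
    for value in eigenvalues:
        running += value
        shares.append(running / total)
    return shares


def factors_needed(eigenvalues, target=0.60):
    """Smallest number of factors whose cumulative share reaches the target."""
    for k, share in enumerate(cumulative_proportions(eigenvalues), start=1):
        if share >= target:
            return k
    return len(eigenvalues)


def low_communalities(communalities, threshold=0.60):
    """Variables whose communality falls below the threshold."""
    return [name for name, value in communalities.items() if value < threshold]


cumulative = cumulative_proportions(EIGENVALUES)
n_factors = factors_needed(EIGENVALUES)
low = low_communalities(COMMUNALITIES)
print(n_factors, round(cumulative[n_factors - 1], 4), low)
```

On these numbers six factors are the first to pass 60% of the variance, and four of the fifteen variables (Q33A3, Q33A7, Q33A10, Q33A13) fall below the 60% communality guideline, so they are the ones to review if they matter to the final analysis.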
Output - meaning

           F1      F2      F3      F4      F5      F6    Statement
  Q33A1     .     0.48      .       .     0.58      .    My company wouldn't use recycled products because they look
  Q33A2     .     0.72      .       .       .       .    Recycled products seem to be of much lower quality than non-
  Q33A3     .     0.69      .       .       .       .    Using recycled products results in our equipment breaking down a
  Q33A4     .       .     0.64      .       .       .    They would need to be a lot cheaper before we would consider
  Q33A5     .       .       .     0.80      .       .    If there were no other problems with recycled products we would
  Q33A6     .       .     0.76      .       .       .    All recycled products cost more than non-recycled products.
  Q33A7   0.60      .       .       .       .       .    It's not worth the time and effort finding and changing suppliers just
  Q33A8   0.71      .       .       .       .       .    It would be too hard to make the system changes necessary to use
  Q33A9   0.50      .       .       .       .    -0.51   The range of recycled products available is not wide enough to
  Q33A10  0.69      .       .       .       .       .    It's just too difficult to get enough people to change their routines
  Q33A11    .       .       .       .       .     0.83   We would use recycled products if someone in our company took the
  Q33A12    .       .       .       .     0.86      .    Using recycled products doesn't really fit with our image.
  Q33A13    .       .       .     0.65      .       .    If quality, price and availability were the same, we would choose to
  Q33A14 -0.39   -0.45    0.43    0.39      .       .    Manufacturing recycled products is actually less energy efficient and
  Q33A15    .     0.61    0.30      .       .    -0.34   There are benefits to us if our customers see us as "Green".
Interpretation
• So this Factor Analysis explains 64% of the overall variance (from 1. above). The majority of the variables have over 60% of their variance explained (from 3. above). The final factors (from 5. above) are as follows:
• Factor 1: Hassle Factor - a combination of the performance ratings "It's not worth the time and effort finding and changing suppliers just to get recycled products", "It would be too hard to make the system changes necessary to use recycled products", "The range of recycled products available is not wide enough to warrant using them" and "It's just too difficult to get people to change their routines and to use more recycled products."
• Factor 2: Quality Factor - a combination of the performance ratings "Recycled products seem to be of much lower quality than non-recycled products", "Using recycled products results in our equipment breaking down and needing more maintenance", "There are benefits to us if our customers see us as 'Green'" and negative weighting on "Manufacturing recycled products is actually less energy efficient and more harmful to the environment."
• Factor 3: Price Factor - a combination of the performance ratings "They would need to be a lot cheaper before we would consider buying them" and "All recycled products cost more than non-recycled products."
• Factor 4: Would Use Factor - a combination of the performance ratings "If there were no other problems with recycled products we would even pay a small premium to use them" and "If quality, price and availability were the same, we would choose to buy recycled products over non-recycled products whenever we could."
• Factor 5: Image Factor - a combination of the performance ratings "My company wouldn't use recycled products because they look cheap and nasty" and "Using recycled products doesn't really fit with our image."
• Factor 6: Help Factor - the performance rating "We would use recycled products if someone in our company took the responsibility to push the initiative ahead."