Doing Quantitative Research 26E02900, 6 ECTS Cr.

Doing Quantitative Research 26E02900, 6 ECTS Cr.
Olli-Pekka Kauppila, Daria Volchek
Lecture IV - May 21, 2014

Today’s lecture
AM session
• Structural equation modeling
PM session
• Multilevel research design

Learning objectives: AM session
• Get familiar with the basic idea of structural equation modeling
• Identify special characteristics of SEM
• Develop an understanding of latent variable constructs
• Understand the process of SEM
• Develop skills to interpret the quality of models and research results obtained with SEM

Part I: Structural equation modeling (SEM) overview

Research question
What are the antecedents and consequences of the emotional exhaustion of individuals who do "people work"?
[Figure: path model linking Role ambiguity, Role conflict, Emotional exhaustion, Job satisfaction, Organizational commitment, Performance, and Intention to leave]
Babakus, E., Cravens, D. W., Johnston, M., & Moncrief, W. C. (1999). The Role of Emotional Exhaustion in Sales Force Attitude and Behavior Relationships. Journal of the Academy of Marketing Science, 27(1), 58-70.

How many regression analyses do you need to estimate this model?
[Figure: path model linking Role ambiguity, Role conflict, Emotional exhaustion, Job satisfaction, Organizational commitment, Performance, and Intention to leave]
Babakus, E., Cravens, D. W., Johnston, M., & Moncrief, W. C. (1999). The Role of Emotional Exhaustion in Sales Force Attitude and Behavior Relationships. Journal of the Academy of Marketing Science, 27(1), 58-70.

SEM can do it with one model

Generalization of regression
SEM is like regression: Y = B0 + B1X1 + B2X2 + e
With its assumptions intact, regression is beautiful, but . . .
SEM, a generalization, helps us address some of regression’s limitations

What is SEM?
A multivariate statistical technique that combines (confirmatory) factor analysis and multiple regression modeling for the purpose of analyzing hypothesized relationships among latent (i.e. unobserved or theoretical) variables measured by manifest variables (i.e. observed or empirical indicators)
• Can simultaneously test measures and structural relationships
• Tests models that are conceptually derived, a priori
• Tests if the theory fits the data

What is SEM?
SEM encompasses an entire family of models known by many names, e.g. covariance structure analysis, latent variable analysis, confirmatory factor analysis, LISREL analysis
• LISREL: LInear Structural RELationships
• Technically, LISREL is a computer program developed by Karl Jöreskog and Dag Sörbom to do covariance structure analysis
• Other software is also available for SEM: AMOS, Mplus…

What is SEM?
SEM typically consists of two parts (or sub-models):
The measurement model
• specifies how latent variables depend upon or are indicated by the observed variables
• describes the measurement properties (reliabilities and validities) of the observed variables
The structural equation model
• specifies causal relationships among the latent variables
• describes the causal effects
• assigns the explained and unexplained variance

Why SEM?
The key benefits of SEM are:
• Estimation of multiple and interrelated dependence relationships via a series of separate, but interdependent, multiple regression equations
SEM can accommodate continuous, dummy, and categorical measures.
[Figure: example model with Environmental perceptions, Managerial support, Job satisfaction, Organizational commitment, and Intention to leave]


Why SEM?
The key benefits of SEM are:
• Estimation of multiple and interrelated dependence relationships via a series of separate, but interdependent, multiple regression equations
• Ability to model both observed and unobserved (latent) variables and account for measurement error in the estimation process (parameter estimates closer to population values)
• Ability to define a model to explain the entire set of relationships simultaneously

What is a latent variable?
• A latent variable, often called a factor, is an unobserved concept (e.g. happiness, satisfaction, emotional exhaustion) that can only be approximated by observable or measurable variables.
• The observed variables, which are gathered from respondents through various data collection methods, are known as indicators or manifest variables.

Latent variables as presumed cause of item values (reflective measure)
The latent variable is viewed as an underlying construct that gives rise to something that is observed (i.e. an observed variable).
[Figure: a latent variable (factor) and three observed variables (manifest variables / indicators)]

Latent variables as summary of the measurements (formative measure)
The latent variable is viewed as a summary (weights of the relative importance) of the observed variables. Changes in the indicators cause change in the latent variable.
[Figure: a latent variable (factor) and three observed variables (manifest variables / indicators)]
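The two measurement views can be contrasted in a few lines of Python. This is only a sketch: the function names, loadings, weights, and scores are illustrative assumptions, not values from the lecture.

```python
# Hypothetical illustration of reflective vs. formative measurement.
# All names and numbers below are assumptions made for this sketch.

def reflective_indicator(latent_score, loading, error):
    """Reflective view: the latent variable gives rise to the observed item."""
    return loading * latent_score + error


def formative_composite(indicator_values, weights):
    """Formative view: the latent variable is a weighted summary of its items."""
    return sum(w * x for w, x in zip(weights, indicator_values))


# Reflective: one latent score drives all three items (errors set to 0 for clarity).
items = [reflective_indicator(2.0, lam, 0.0) for lam in (1.0, 0.8, 0.6)]

# Formative: changing a single indicator changes the composite.
weights = [0.5, 0.3, 0.2]
before = formative_composite([10.0, 4.0, 20.0], weights)
after = formative_composite([15.0, 4.0, 20.0], weights)
```

In the reflective case the arrows run from the construct to the items; in the formative case they run from the items to the construct, which is why `after` differs from `before` when only the first indicator changes.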

Classroom exercise I
In your research project you are interested in identifying the antecedents and consequences of firm international performance.
Based on your extensive literature review, you have found out that both reflective and formative measures could be used for firm performance.
Thus, you decide to operationalize both:
• Come up with measure items that you think would capture firm international performance (1) in a reflective and (2) in a formative manner

Firm international performance
Formative:
• Percent of foreign sales in total sales;
• Number of countries a firm has entered;
• Percent of foreign clients a firm has.
Reflective:
• Generally speaking, we are satisfied with our success in international markets;
• We have achieved the turnover objectives we set for internationalization;
• We have achieved the market-share objectives we set for internationalization;
• Internationalization has had a positive effect on our company’s profitability;
• Internationalization has had a positive effect on our company’s image;
• Internationalization has had a positive effect on the development of our company’s expertise;
• The investments we have made in internationalization have paid off.

Measurement error
No matter how concrete we think our variables are, they always contain some error when we try to measure them.
Measurement error is that portion of the variable which our measure is unable to capture for various reasons (systematic or random).
It is vital to consider the amount of error in our measurement, no matter how confident we are that we have "got it right".
However, in all other multivariate techniques we assume there is no error in the variables.

Measurement error
The impact of measurement error: βyx = βs × ρx
• βyx: observed regression coefficient
• βs: true structural coefficient
• ρx: reliability of the predictor variable
Unless the reliability is 100%, the observed correlation (and resulting regression coefficient) will always understate the "true" relationship.

SEM correction for measurement error
SEM "accounts for" or "corrects for" the amount of measurement error in the variables (latent constructs) and estimates what the relationship would be if there were no measurement error.
βs = βyx / ρx
Due to this correction, SEM regression coefficients are more accurate (closer to the population value) and tend to be larger than coefficients obtained with multiple regression analysis.
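The attenuation formula and its correction can be checked with a small Python sketch. The function names and the example values (a true coefficient of 0.50, a reliability of 0.80) are assumptions for illustration. SEM software estimates the measurement and structural parts jointly and does not literally divide by the reliability; the code only mirrors the two formulas on the slides.

```python
def attenuated_coefficient(beta_true, reliability_x):
    """Observed coefficient implied by beta_yx = beta_s * rho_x."""
    return beta_true * reliability_x


def corrected_coefficient(beta_observed, reliability_x):
    """Corrected coefficient implied by beta_s = beta_yx / rho_x."""
    if not 0.0 < reliability_x <= 1.0:
        raise ValueError("reliability must lie in the interval (0, 1]")
    return beta_observed / reliability_x


# Assumed example values: true coefficient 0.50, predictor reliability 0.80.
beta_observed = attenuated_coefficient(0.50, 0.80)
beta_recovered = corrected_coefficient(beta_observed, 0.80)
```

With a reliability below 1.0, `beta_observed` is smaller than the true coefficient, and dividing by the same reliability recovers it.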

Incorporating errors
Delta (δ): an error term associated with an estimated, measured x-variable
[Figure: indicators x1-x5, each with its own error term δ1-δ5; labeled construct: Role ambiguity]

Types of relationships in a SEM model
[Figure: two panels. (1) Correlation: Role ambiguity and Role conflict, with indicators x1-x6 and error terms δ1-δ6. (2) Dependence: the same two constructs together with Emotional exhaustion, measured by y1-y3 with error terms ε1-ε3]

Types of variables in a SEM model
Exogenous variables: not explained by any other construct in the model
Endogenous variables: determined by constructs within the model
[Figure: path model with Role ambiguity, Role conflict, Emotional exhaustion, Job satisfaction, Organizational commitment, and Intention to leave; indicators x1-x6 and y1-y12]

Importance of theory
A SEM model should not be developed without underlying theory!
SEM analyses should be dictated first and foremost by a strong theoretical base.
Theory implies consequences, some of which are tested against data.
Refuting any of the consequences refutes the theory (i.e. SEM is primarily a confirmatory method)

Testing theory-based models
A model implies a pattern in the covariance matrix
With the multiple regression assumptions intact, we can compare the model-implied covariance matrix with the empirical covariance matrix (i.e. the one based on the collected data)
If the difference between the covariance matrices is nonsignificant, we confirm the hypothesized theoretical relationships (χ2 test)
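To make "model-implied covariance matrix" concrete, here is a minimal Python sketch for a one-factor model with three indicators. The parameter values and the sample matrix are made up for illustration, and the code only computes residuals; it does not carry out the χ2 test, which SEM software derives from a fit function over these two matrices.

```python
def implied_covariance(loadings, factor_variance, error_variances):
    """Covariance matrix implied by a one-factor model:
    cov(x_i, x_j) = loading_i * loading_j * factor_variance,
    plus the error variance on the diagonal."""
    p = len(loadings)
    sigma = [[loadings[i] * loadings[j] * factor_variance for j in range(p)]
             for i in range(p)]
    for i in range(p):
        sigma[i][i] += error_variances[i]
    return sigma


def residual_matrix(sample_cov, model_cov):
    """Element-wise difference: empirical minus model-implied covariances."""
    return [[s - m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(sample_cov, model_cov)]


# Assumed parameter values for the sketch.
sigma = implied_covariance([1.0, 0.8, 0.6], 1.0, [0.5, 0.4, 0.7])

# Made-up empirical covariance matrix.
sample = [[1.50, 0.75, 0.60],
          [0.75, 1.04, 0.50],
          [0.60, 0.50, 1.06]]

residuals = residual_matrix(sample, sigma)
largest_residual = max(abs(r) for row in residuals for r in row)
```

Small residuals indicate that the pattern implied by the model is close to the pattern observed in the data.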

Theory-driven modeling strategy
Confirmatory modeling strategy: the researcher specifies a single model composed of a set of relationships and uses SEM to assess how well it fits the data (it either works or it doesn’t)
Competing models strategy: the estimated model is compared with alternative models (e.g. a test of competing theories)
• Equivalent models: models have the same number of parameters with different relationships between them, and the alternative model(s) fit at least as well as the proposed model
Model development strategy: although a basic framework is proposed, the purpose of modeling is to improve this framework through SEM modifications

Causation evidence in SEM
Covariation: SEM can determine significant covariation between the cause and effect constructs;
Sequence: evidence of temporal sequence can be provided through an experimental or longitudinal research design;
Nonspurious covariation: the size and nature of the relationship between the cause and the effect should not be affected by including other constructs (variables) in the model (e.g. Support => Job satisfaction and Work environment)
Theoretical support: a compelling theoretical rationale to support a cause-and-effect relationship

Input matrix: Covariance matrix
SEM differs from other multivariate techniques in that it uses only the variance-covariance or correlation matrix as its input data.
Individual observations can be input into the programs, but they are converted into one of these two types of matrices before estimation.
The focus of SEM is on the pattern of relationships across respondents.
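The conversion from individual observations to an input matrix can be sketched in plain Python. The data below are invented (four respondents, two variables), and the helper names are assumptions; SEM programs perform this step internally.

```python
from math import sqrt


def covariance_matrix(rows):
    """Sample variance-covariance matrix (n - 1 in the denominator).
    Each row is one respondent, each column one observed variable."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def correlation_matrix(cov):
    """Rescale a covariance matrix so that every variance equals 1."""
    sd = [sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


# Made-up raw data: four respondents, two observed variables.
data = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
cov = covariance_matrix(data)
corr = correlation_matrix(cov)
```

Either `cov` or `corr` could then serve as the input matrix; the individual rows are no longer needed for estimation.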

Number of observed indicators
How many indicators should be used per construct?
The minimum number of indicators for a construct is one
− but the use of only a single indicator requires the researcher to provide estimates of reliability
A construct can be represented with two indicators, but three is the preferred minimum number of indicators
− because using only two indicators increases the chances of reaching an infeasible solution
There is no upper limit in terms of the number of indicators.
− In practice, 5-7 indicators should represent most constructs

Part II: Confirmatory Factor Analysis (CFA)

What is CFA?
Confirmatory factor analysis tests the extent to which a researcher’s a-priori, theoretical pattern of factor loadings on prespecified constructs represents the actual data (i.e. confirms or rejects our preconceived theory)
• The factors are assigned based on the researcher’s prior theoretical knowledge (the statistical technique does not assign variables to factors as in Exploratory Factor Analysis)
• Each measured variable loads only on one pre-defined factor
• Cross-loadings are not assigned
• CFA provides information about the validities and reliabilities of the observed indicators

Stages in CFA
1. Developing a theoretically based model
2. Constructing a path diagram of causal relationships
3. Converting the path diagram into a measurement model
4. Choosing the input matrix type
5. Assessing the identification
6. Evaluating goodness-of-fit criteria
7. Interpreting and modifying the model (if theoretically justified)

Assumptions of path diagram
First, all causal relationships are indicated.
Theory is the basis for inclusion or omission of any relationship.
It is just as important to justify why a causal relationship does not exist between two constructs as it is to justify the existence of another relationship.
Yet it is important to remember that the objective is to model the relationships among constructs with the smallest number of causal paths or correlations among constructs that can be theoretically justified (parsimonious).

Assumptions of path diagram
Second, all causal relationships are assumed to be linear.
Nonlinear relationships cannot be directly estimated in structural equation modeling, but modified structural models can approximate nonlinear relationships.
The assumption of linearity of the relationships requires all other assumptions for multivariate analysis to hold true

Example of measurement (CFA) model
[Figure: CFA model with five latent constructs (Role ambiguity, Role conflict, Emotional exhaustion, Job satisfaction, Intention to leave); indicators x1-x6 with error terms δ1-δ6 and y1-y9 with error terms ε1-ε9]
• x1-x3 correlate, but their correlation is zero once we identify a common cause of x1-x3, i.e. "Role ambiguity"
• Indicators are unidimensional; error terms should not be correlated
• First, we test the measures (we can fix it)
• Second, we test the theory between the constructs

Translating the picture into SIMPLIS
[Figure: CFA model with five latent constructs (Role ambiguity, Role conflict, Emotional exhaustion, Job satisfaction, Intention to leave); indicators x1-x6 with error terms δ1-δ6 and y1-y9 with error terms ε1-ε9]
ROLAM1 = 1*Rolam
ROLAM2 = Rolam
ROLAM3 = Rolam
ROLCONF1 = 1*Rolconf
ROLCONF2 = Rolconf
ROLCONF3 = Rolconf
EMEXH1 = 1*Emexh
EMEXH2 = Emexh
EMEXH3 = Emexh
JOBSAT1 = 1*Jobsat
JOBSAT2 = Jobsat
JOBSAT3 = Jobsat
LEAV1 = 1*Leav
LEAV2 = Leav
LEAV3 = Leav
e.g. Rolam = W1(ROLAM1) + W2(ROLAM2) + W3(ROLAM3)

What does CFA look like?
[Figure: annotated CFA diagram. Labels: error term, observed variable, factor loading, latent construct, correlation between latent constructs]

Evaluating goodness-of-fit criteria (assessment of measurement model)
In evaluating the measurement part of the model, focus on the relationships between the latent variables and their indicators (i.e. manifest variables).
The evaluation of the measurement part of the model should precede the detailed evaluation of the structural part of the model.
The aim is to determine the validity and reliability of the measures used to represent the constructs of interest.
− validity reflects the extent to which an indicator actually measures what it is supposed to measure
− reliability refers to the consistency of measurement (i.e. the extent to which an indicator is free of random error)

Evaluating goodness-of-fit criteria: Loadings
The next step is to examine the estimated loadings and to assess the statistical significance of each one.
− critical t-values (one-tailed, large-sample values):
when α = .01, critical t-value = 2.33
when α = .05, critical t-value = 1.645
when α = .10, critical t-value = 1.282
If statistical significance is not achieved, the researcher may wish to eliminate the indicator or attempt to transform it for better fit with the construct.
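The critical values listed above match one-tailed cutoffs of the standard normal distribution, which approximates the t distribution in large samples. A short Python sketch reproduces them; the helper names and the example estimate and standard error are assumptions.

```python
from statistics import NormalDist


def one_tailed_critical_value(alpha):
    """Large-sample (normal) critical value for a one-tailed test at level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)


def loading_is_significant(estimate, standard_error, alpha=0.05):
    """Compare the t-value (estimate / standard error) with the critical value."""
    t_value = estimate / standard_error
    return t_value > one_tailed_critical_value(alpha)


critical_values = {alpha: round(one_tailed_critical_value(alpha), 3)
                   for alpha in (0.01, 0.05, 0.10)}

# Hypothetical loading: estimate 0.80 with standard error 0.20, so t = 4.0.
significant = loading_is_significant(0.80, 0.20, alpha=0.01)
```

The one-tailed form assumes the direction of the loading is hypothesized in advance; a two-tailed test would use larger cutoffs.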

Evaluating goodness-of-fit criteria: Loadings
One problem with relying on unstandardized loadings and associated t-values is that it may be difficult to compare the validity of different indicators measuring a particular construct.
• Indicators of the same construct may be measured on very different scales; if this is the case, then direct comparisons of the magnitudes of the loadings are clearly inappropriate
• It is recommended that the magnitudes of the standardized loadings are also inspected (completely standardized solution)
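A small Python sketch shows why the standardized solution is needed for comparison. It uses the standard rescaling of a loading by the standard deviations of the latent variable and the indicator; the formula is assumed here rather than taken from the slides, and all numbers are invented.

```python
from math import sqrt


def standardized_loading(unstd_loading, latent_variance, indicator_variance):
    """Rescale an unstandardized loading so that both the latent variable
    and the indicator have unit variance."""
    return unstd_loading * sqrt(latent_variance) / sqrt(indicator_variance)


# Hypothetical indicators of one construct, measured on very different scales.
# Indicator A: 7-point rating scale; indicator B: a percentage.
std_a = standardized_loading(1.0, latent_variance=0.8, indicator_variance=1.25)
std_b = standardized_loading(12.0, latent_variance=0.8, indicator_variance=200.0)
```

Indicator B has the far larger unstandardized loading (12.0 vs. 1.0), yet its standardized loading is slightly lower than that of indicator A, so the raw magnitudes say little about relative validity.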

Evaluating goodness-of-fit criteria: R2
Reliability of the indicators can be examined by looking at the squared multiple correlations (R2) of the indicators
• they show the proportion of variance in an indicator that is explained by its underlying latent variable
• the rest is due to measurement error
• a high squared multiple correlation value denotes high reliability for the indicator
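For an indicator that loads on a single latent variable, the R2 in the completely standardized solution is the squared standardized loading, and the remainder is error variance. A minimal Python sketch, with an assumed loading of 0.80:

```python
def indicator_r_squared(std_loading):
    """Proportion of indicator variance explained by its latent variable
    (indicator loading on one factor, completely standardized solution)."""
    return std_loading ** 2


def indicator_error_variance(std_loading):
    """Remaining proportion of variance, attributed to measurement error."""
    return 1.0 - std_loading ** 2


# Assumed standardized loading for the sketch.
r_squared = indicator_r_squared(0.80)
error_share = indicator_error_variance(0.80)
```

A loading of 0.80 therefore means that about 64% of the indicator's variance is explained by the construct and about 36% is measurement error.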

Evaluating goodness-of-fit criteria: Composite reliability (CR)
In addition to assessing the reliability of the individual indicators, it is possible to calculate a composite reliability value for each latent variable
• also known as construct reliability
• to do this, use information on the indicator loadings and error variances from the Completely Standardized Solution
• with single-item measures, it is not possible to empirically estimate the reliability (could be fixed at 1.0 = no error, or estimated by the researcher)

Composite reliability (CR)
Recommended threshold value is .60
The reliability for the latent construct must be computed separately for each multiple-indicator construct in the model.
LISREL does not compute them directly but provides all necessary information.
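Because the value has to be computed by hand from the program output, a short Python sketch may help. It assumes the commonly used formula CR = (sum of standardized loadings)^2 / ((sum of standardized loadings)^2 + sum of error variances); the slide's own formula is not preserved in the text, and the loadings below are invented.

```python
def composite_reliability(std_loadings, error_variances):
    """Composite (construct) reliability from the completely standardized solution.
    Assumed formula: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(std_loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))


# Hypothetical construct with three indicators.
loadings = [0.7, 0.8, 0.6]
errors = [1.0 - lam ** 2 for lam in loadings]  # error variance = 1 - loading^2

cr = composite_reliability(loadings, errors)
meets_threshold = cr > 0.60
```

For these made-up loadings the value is roughly 0.74, above the .60 threshold.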

Average variance extracted (AVE)
Another measure of reliability is the variance extracted measure
• Reflects the overall amount of variance in the indicators accounted for by the latent construct
• Higher variance extracted values occur when the indicators are truly representative of the latent construct
• The AVE measure is a complementary measure to the CR value

Average variance extracted (AVE)
AVE is quite similar to the CR measure but differs in that the standardized loadings are squared before summing them
Guidelines suggest that the variance extracted value should exceed .50 for a construct
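The difference from CR (squaring before summing) can be sketched in Python. The formula assumed here is AVE = sum of squared standardized loadings / (sum of squared standardized loadings + sum of error variances), and the loadings are invented for illustration.

```python
def average_variance_extracted(std_loadings, error_variances):
    """AVE from the completely standardized solution.
    Assumed formula: sum(loading^2) / (sum(loading^2) + sum of error variances)."""
    sum_of_squares = sum(lam ** 2 for lam in std_loadings)
    return sum_of_squares / (sum_of_squares + sum(error_variances))


# Hypothetical construct with three indicators.
loadings = [0.7, 0.8, 0.6]
errors = [1.0 - lam ** 2 for lam in loadings]  # error variance = 1 - loading^2

ave = average_variance_extracted(loadings, errors)
exceeds_guideline = ave > 0.50
```

With these loadings the AVE comes out just under .50, which illustrates that a construct can pass the CR threshold and still fall short of the AVE guideline.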


Classroom exercise II
Calculate the reliability measures (CR and AVE) for the two latent variables below:
Latent variable 1:
• CR = 0.770602 => exceeds the common threshold of 0.60
• AVE = 0.403557 => lower than 0.50
Latent variable 2:
• CR = 0.744912 => exceeds the common threshold of 0.60
• AVE = 0.423463 => lower than 0.50

Evaluating goodness-of-fit criteria
The next step is to assess overall model fit with one or more goodness-of-fit measures
Goodness-of-fit measures reflect the correspondence of the actual or observed input (covariance or correlation) matrix with that predicted from the proposed model.
There are three types of goodness-of-fit measures:
• Absolute fit measures
• Incremental fit measures
• Parsimonious fit measures
