Survey Instrument Development in OB/HRM Research
Prof. Jiing-Lih Larry Farh, HKUST
IACMR Guangzhou Workshop, July 2007
Construct and Measurement Related Problems in Manuscripts
• Too many constructs
• Constructs are poorly defined
• Measures do not match constructs
• Unreliable/invalid measures
• Level of measurement does not match the level of the theory
Fatal flaws for empirical papers!
“The construction of the measuring devices is perhaps the most important segment of any study. Many well-conceived research studies have never seen the light of day because of flawed measures.” (Schoenfeldt, 1984)
“The point is not that adequate measurement is ‘nice.’ It is necessary, crucial, etc. Without it we have nothing.” (Korman, 1974, p. 194)
“Validation is an unending process…. Most psychological measures need to be constantly evaluated and reevaluated to see if they are behaving as they should.” (Nunnally & Bernstein, 1994, p. 84)
Empirical Research Model (from Schwab, 1999)
[Diagram: at the conceptual level, X′ → Y′ (path a); at the operational level, X → Y (paths c and d); vertical paths b1 and b2 link X′ to X and Y′ to Y.]
• Independent and dependent variables are identified by X and Y, respectively.
• The prime symbol (′) designates a variable specified at the conceptual level.
• Arrows represent the direction of influence or cause.
• a = conceptual relationship; d = empirical relationship; b1, b2 = construct validity; c = internal validity
Validity in Research • Construct validity is present when there is a high correspondence between the scores obtained on a measure and the mental definition of a construct it is designed to represent. • Internal validity is present when variation in scores on a measure of an independent variable is responsible for variations in scores on a measure of a dependent variable. • External validity is present when generalizations of findings obtained in a research study, other than statistical generalization, are made appropriately.
Construct Validation • Involves procedures researchers use to develop measures and to make inferences about a measure’s construct validity • It is a continual process • No one method alone will give confidence in the construct validity of your measure
Construct Validation Steps (from Schwab, 1999)
1. Define the construct and develop conceptual meaning for it
2. Develop/choose a measure consistent with the definition (content validity)
3. Perform logical analyses and empirical tests to determine whether observations obtained on the measure conform to the conceptual definition (factor analysis; reliability; criterion-related, convergent, discriminant, and nomological validity)
Survey Instrument Development Why is it important? How to do it? What are some of the best practices?
Instrumentation in Perspective
• Selection and application of a technique that operationalizes the construct of interest
• e.g., physics = colliders
• e.g., medicine = MRI
• e.g., OB = Job Descriptive Index
• Instruments are devices with their own advantages and disadvantages; some are more precise than others, and sophistication does not guarantee validity
Survey Instruments
• 3 most common types of instrumentation in the social sciences:
• Observation
• Interview
• Survey instrumentation
• Survey instrumentation:
• Most widely used across disciplines
• Most abused technique: instruments are often designed by people with little training in the area
Why do we do surveys?
• To describe populations: What is going on?
• Theoretical reasons: Why is it going on?
• Develop and test theory
• Theory should always guide survey development and data collection
What construct does this scale measure? (1)
• Have a job which leaves you sufficient time for your personal or family life. (.86)
• Have training opportunities (to improve your skills or learn new skills). (-.82)
• Have good physical working conditions (good ventilation and lighting, adequate work space, etc.). (-.69)
• Fully use your skills and abilities on the job. (-.63)
• Have considerable freedom to adapt your own approach to the job. (.49)
• Have challenging work to do, work from which you can get a personal sense of accomplishment. (.46)
• Work with people who cooperate well with one another. (.20)
• Have a good working relationship with your manager. (.20)
Adapted from Heine et al. (2002)
What construct does this scale measure? (2)
• I would rather say “no” directly, than risk being misunderstood. (12)
• Speaking up during a class is not a problem for me. (14)
• Having a lively imagination is important to me. (12)
• I am comfortable with being singled out for praise or rewards. (13)
• I am the same person at home that I am at school. (13)
• Being able to take care of myself is a primary concern for me. (12)
• I act the same way no matter who I am with. (13)
• I prefer to be direct and forthright when dealing with people I have just met. (14)
• I enjoy being unique and different from others in many respects. (13)
• My personal identity, independent of others, is very important to me. (14)
• I value being in good health above everything. (8)
Adapted from Heine et al. (2002)
Construct Definition • Personal computer satisfaction is an emotional response resulting from an evaluation of the speed, durability, and initial price, but not the appearance of a personal computer. This evaluation is expected to depend on variation in the actual characteristics of the computer (e.g., speed) and on the expectations a participant has about those characteristics. When characteristics meet or exceed expectations, the evaluation is expected to be positive (satisfaction). When characteristics do not come up to expectations, the evaluation is expected to be negative (dissatisfaction). From Schwab (1999)
Hypothetical Computer Satisfaction Questionnaire
• Decide how satisfied or dissatisfied you are with each characteristic of your personal computer using the scale below. Circle the number that best describes your feelings for each statement.
1 = Very Dissatisfied; 2 = Dissatisfied; 3 = Neither Satisfied nor Dissatisfied; 4 = Satisfied; 5 = Very Satisfied
My satisfaction with:
Construct Validity Challenges (from Schwab, 1999)
[Diagram: observed score variance partitioned into construct-valid variance, contamination (systematic but construct-irrelevant variance), and unreliability; deficiency is the portion of construct variance the measure fails to capture.]
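One compact way to formalize the partition in this figure (the notation below is mine, not Schwab's):

```latex
\sigma^2_{\text{observed}}
  = \underbrace{\sigma^2_{\text{valid}} + \sigma^2_{\text{contamination}}}_{\text{systematic variance}}
  + \sigma^2_{\text{unreliability}},
\qquad
\text{reliability} = \frac{\sigma^2_{\text{valid}} + \sigma^2_{\text{contamination}}}{\sigma^2_{\text{observed}}}
```

Note that contamination counts toward reliability but not toward construct validity, which is why a measure can be highly reliable yet invalid; deficiency does not appear in the decomposition at all, because it is construct variance the measure never picks up.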
Scale Development Process From Hinkin (1998)
• Step 1: Item Generation
• Step 2: Questionnaire Administration
• Step 3: Initial Item Reduction
• Step 4: Confirmatory Factor Analysis
• Step 5: Convergent/Discriminant Validity
• Step 6: Replication
Step 1: Item Generation - Deductive Approach
It requires: (a) an understanding of the phenomenon to be investigated; (b) a thorough review of the literature to develop the theoretical definition of the construct under examination
From Hinkin (1998)
Step 1: Item Generation - Deductive Approach
• Advantages: with adequate construct definitions, items should capture the domain of interest, helping to assure content validity in the final scale
• Disadvantages: requires the researchers to possess working knowledge of the phenomena; may not be appropriate for exploratory studies
From Hinkin (1998)
Step 1: Item Generation - Inductive Approach
• Appropriate when the conceptual basis may not result in easily identifiable dimensions for which items can then be generated
• Frequently, researchers develop scales inductively by asking a sample of respondents to provide descriptions of their feelings about their organizations or to describe some aspect of behavior
• Responses are then classified into a number of categories by content analysis, based on key words or themes, or through a sorting process
Step 1: Item Generation - Inductive Approach
• Advantages: effective in exploratory research
• Disadvantages:
• Without a definition of the construct under examination, it is difficult to develop items that will be conceptually consistent
• Requires expertise in content analysis
• Relies on factor analysis, which does not guarantee that items loading on the same factor share the same theoretical construct
Characteristics of Good Items
• As simple and short as possible
• Language should be familiar to the target audience
• Keep items consistent in terms of perspective (e.g., assessing behaviors vs. affective responses)
• Each item should address a single issue (no double-barreled items)
• Leading questions should be avoided
• Negatively worded questions should be carefully constructed and placed in the survey
What about these items?
• I would never drink and drive for fear that I might be stopped by the police (yes or no)
• I am always furious (yes or no)
• I often lose my temper (never to always)
• 滿招損，謙受益 (“Pride invites loss; humility brings gain”)
Content Validity Assessment • Basically a judgment call • But can be supplemented statistically • Proportion of substantive agreement (Anderson & Gerbing, 1991) (see next slide) • Item re-translation (Schriesheim et al. 1990) • Content adequacy (Schriesheim et al. 1993)
Content Validation Ratio
CVR = (2 nₑ / N) − 1
where nₑ is the number of Subject Matter Experts (SMEs) rating the selection tool or skill being assessed as essential to the job (i.e., as having good coverage of the KSAs required for the job), and N is the total number of experts.
CVR = 1 when all judges believe the tool/item is essential; CVR = −1 when none of the judges believes the tool/item is essential; CVR = 0 means exactly half of the judges believe the tool/item is essential (see the sketch below).
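A minimal sketch of this computation (the function name and example figures are mine, for illustration only):

```python
def content_validity_ratio(n_essential: int, n_experts: int) -> float:
    """Lawshe-style CVR: +1 if every judge rates the item 'essential',
    -1 if no judge does, 0 for an even split."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need n_experts > 0 and 0 <= n_essential <= n_experts")
    return (2 * n_essential / n_experts) - 1

# Example: 8 of 10 SMEs rate the item essential
print(content_validity_ratio(8, 10))  # 0.6
```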
How many items per construct?
• 4 to 6 items for most constructs; for initial item generation, twice as many items should be generated
Item Scaling
• The scale used should generate sufficient variance among respondents for subsequent statistical analyses
• Likert-type scales are the most frequently used in survey questionnaires. Likert developed the scale to be composed of five equal-appearing intervals with a neutral midpoint
• Coefficient alpha reliability with Likert scales has been shown to increase up to the use of five points, but then it levels off
Step 2: Questionnaire Administration
• Sample size: recommendations for item-to-response ratios range from 1:4 to 1:10 for each set of scales to be factor analyzed
• e.g., if 30 items were retained to develop three measures, a sample size of 150 observations should be sufficient for exploratory factor analysis. For confirmatory factor analysis, a minimum sample size of 200 has been recommended (see the arithmetic sketch below)
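The ratio rule restated as a trivial helper (the name and the 1:5 default are mine; the default matches the 30-item example):

```python
def min_sample_size(n_items: int, responses_per_item: int = 5) -> int:
    """Item-to-response rule of thumb: 4-10 responses per item to be factor analyzed."""
    return n_items * responses_per_item

print(min_sample_size(30))  # 150, as in the example above
```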
Step 3: Initial Item Reduction
• Examine interitem correlations among the variables first; items with corrected item-total correlations below .40 are candidates for elimination (a computational sketch follows this slide)
• Exploratory factor analysis: retain items with loadings greater than .40 on the appropriate factor and/or loadings twice as strong on the appropriate factor as on any other factor. Eigenvalues greater than 1 and a scree test of the percentage of variance explained should also be examined
• Be aware of construct deficiency problems when deleting items
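A small pandas sketch of the corrected item-total screen (the DataFrame contents and the use of the .40 cutoff are illustrative):

```python
import pandas as pd

def corrected_item_total(items: pd.DataFrame) -> pd.Series:
    """Correlate each item with the sum of the *other* items, so an item's
    own variance does not inflate its item-total correlation."""
    total = items.sum(axis=1)
    return pd.Series({col: items[col].corr(total - items[col])
                      for col in items.columns})

# items = pd.DataFrame(...)   # rows = respondents, columns = Likert items
# cits = corrected_item_total(items)
# to_drop = cits[cits < 0.40].index   # candidates for elimination
```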
Step 3: Internal Consistency Assessment
• Reliability is the accuracy or precision of a measuring instrument and is a necessary condition for validity
• Use Cronbach's alpha to assess internal consistency; .70 should serve as the minimum for newly developed measures
Coefficient Alpha
α = [n / (n − 1)] × (1 − Σσᵢ² / σₜ²)
where n is the number of items, σᵢ² is the variance of item i across all applicants, and σₜ² is the variance of the total score t across all applicants. Coefficient alpha equals the average of all possible split-half reliabilities.
An example of coefficient alpha: with variance of the total = 7.0 and sum of the item variances = 3.67, α = [n / (n − 1)] × (1 − 3.67/7.0); with three items, for instance, α = 1.5 × .476 ≈ .71.
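The same computation in numpy (the five-respondent data matrix is invented for illustration):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix.
    alpha = n/(n-1) * (1 - sum(item variances) / variance(total score))."""
    n = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total score
    return n / (n - 1) * (1 - item_vars.sum() / total_var)

# Illustrative 5 respondents x 3 Likert items
data = np.array([[4, 5, 4], [3, 3, 2], [5, 4, 5], [2, 2, 3], [4, 4, 4]])
print(round(cronbach_alpha(data), 2))  # about .90 for this toy data
```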
How High Does Cronbach's Alpha Need to Be?
• In exploratory research, where hypothesized measures are developed for new constructs, alphas need to exceed .70
• In basic research, where you use well-established instruments for constructs, alphas need to exceed .80
• In applied research, where you need to make decisions based on the measurement outcomes, alphas need to exceed .90
Step 4: Confirmatory Factor Analysis (CFA)
• Items that load clearly in an exploratory factor analysis may demonstrate a lack of fit in a multiple-indicator measurement model due to a lack of external consistency
• It is recommended that a confirmatory factor analysis be conducted using the item variance-covariance matrix computed from data collected from an independent sample
• Then assess the goodness-of-fit indices, t-values, and chi-square (see the sketch below)
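A minimal CFA sketch in Python, assuming the semopy package; the two-factor model, variable names, and simulated data are all hypothetical stand-ins for a real holdout sample:

```python
import numpy as np
import pandas as pd
import semopy  # pip install semopy; lavaan-style model syntax

# Hypothetical measurement model: two latent constructs, three items each
model_desc = """
FactorA =~ a1 + a2 + a3
FactorB =~ b1 + b2 + b3
"""

# Simulated stand-in for the independent holdout sample
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
items = np.hstack([latent[:, [0]] + rng.normal(scale=0.5, size=(300, 3)),
                   latent[:, [1]] + rng.normal(scale=0.5, size=(300, 3))])
df = pd.DataFrame(items, columns=["a1", "a2", "a3", "b1", "b2", "b3"])

model = semopy.Model(model_desc)
model.fit(df)                    # fits the model to the item covariances
print(semopy.calc_stats(model))  # chi-square, CFI, RMSEA, and other fit indices
```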
Step 5: Convergent/Discriminant Validity
• Convergent validity: when there is a high correspondence between scores from two or more different measures of the same construct
• Discriminant validity: when scores from measures of different constructs do not converge
• Multitrait-Multimethod Matrix (MTMM)
• Nomological networks: relationships between the construct under measurement consideration and other constructs
• Criterion-related validity
Convergent Validity (from Schwab, 1999)
[Diagram: a single construct with arrows to Measure A and Measure B, indicating that two different measures of the same construct should correspond.]
Step 6: Replication
• Find an independent sample and collect more data using the measure
• The replication should include confirmatory factor analysis, assessment of internal consistency, and convergent, discriminant, and criterion-related validity assessment
A sample MTMM matrix (paper-and-pencil self test)
[Matrix: correlations among self-esteem (SE), self-disclosure (SD), and locus of control (LC), each measured by multiple methods, with the monotrait-monomethod (reliability) diagonal and the monotrait-heteromethod, heterotrait-monomethod, and heterotrait-heteromethod blocks highlighted.]
Adapted from http://www.socialresearchmethods.net/kb/mtmmmat.htm
Interpreting MTMM
• Reliability values (monotrait-monomethod) should be the highest in the matrix
• Monotrait-heteromethod correlations (convergent validity) must be greater than 0 and high
• Monotrait-heteromethod correlations (convergent validity) should exceed heterotrait-monomethod correlations, which in turn should exceed heterotrait-heteromethod correlations; i.e., agreement across methods on the same trait should outweigh correlations among different traits, supporting discriminant validity (see the toy check below)
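A toy check of this ordering on a labeled correlation matrix (the traits, methods, and correlation values are all invented):

```python
import numpy as np

# (trait, method) label for each measured variable -- invented example
labels = [("SE", "paper"), ("SD", "paper"), ("SE", "interview"), ("SD", "interview")]
R = np.array([[1.00, 0.30, 0.60, 0.15],
              [0.30, 1.00, 0.20, 0.55],
              [0.60, 0.20, 1.00, 0.25],
              [0.15, 0.55, 0.25, 1.00]])

blocks = {"mono_hetero": [], "hetero_mono": [], "hetero_hetero": []}
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        same_trait = labels[i][0] == labels[j][0]
        same_method = labels[i][1] == labels[j][1]
        if same_trait and not same_method:
            blocks["mono_hetero"].append(R[i, j])    # convergent validities
        elif same_method and not same_trait:
            blocks["hetero_mono"].append(R[i, j])
        elif not same_trait and not same_method:
            blocks["hetero_hetero"].append(R[i, j])

means = {k: float(np.mean(v)) for k, v in blocks.items()}
print(means)
# Desired ordering: mono_hetero > hetero_mono > hetero_hetero
print(means["mono_hetero"] > means["hetero_mono"] > means["hetero_hetero"])  # True here
```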
Inductive Example: Taking Charge (Morrison & Phelps, 1999, AMJ)
• Administered an open-ended survey to 148 MBAs, who described the change efforts of 152 individuals, yielding 445 statements
• Reduced the list to 180 statements by eliminating redundant and ambiguous ones, then sorted the statements into 19 groups based on similarity
• Wrote a general statement to reflect each group and compared the content of the statements with the construct, resulting in 10 prototypical activities reflecting the construct
• Pretested the items with 20 MBA students to check for clarity and suggestions for wording improvements
• Pretested the measure with a sample of 152 working MBAs to assess the internal consistency of the items and to check whether the 10 specific behaviors were extra-role activities; 77% checked six or more as extra-role
Open-ended Survey: Taking Charge (Morrison & Phelps, 1999, AMJ)
Respondents were asked:
• To think of individuals with whom they had worked who had actively tried to bring about improvement within their organization. These change efforts could be aimed at any aspect of the organization, including the person's job, how work was performed within their department, and organizational policies or procedures
• To focus on efforts that went beyond the person's formal role, i.e., efforts that were not required or formally expected
• To list specific behaviors that reflected or exemplified the person's change effort
Sample Items: Taking Charge (Morrison & Phelps, 1999, AMJ)
• Try to institute new methods that are more effective
• Try to introduce new structures, technologies, or approaches to improve efficiency
• Try to change how his/her job is executed in order to be more effective
• Try to bring about improved procedures for the work unit or department
Theoretical Model: Taking Charge (Morrison & Phelps, 1999, AMJ)
[Path diagram: top management openness, group norms, self-efficacy, felt responsibility, and expert power predicting taking charge.]
Deductive Example: Org. Justice (Colquitt, 2001, JAP)
The Dimensionality of Organizational Justice
[Diagram: organizational justice as a higher-order construct with four dimensions: distributive justice, procedural justice, interpersonal justice, and informational justice.]
Sample Items: Org. Justice (Colquitt, 2001, JAP)
• Distributive justice: “Does your outcome reflect the effort you have put into your work?” (Leventhal, 1976)
• Procedural justice: “Have you been able to express your views and feelings during those procedures?” (Thibaut & Walker, 1975)
• Interpersonal justice: “Has he/she treated you in a polite manner?” (Bies & Moag, 1986)
• Informational justice: “Has he/she communicated details in a timely manner?” (Shapiro et al., 1994)
Theoretical Model: Org. Justice (Colquitt, 2001, JAP)
[Path diagram: distributive, procedural, interpersonal, and informational justice predicting outcome satisfaction, rule compliance, leader evaluation, and collective self-esteem.]
Four Types of Scale Development Approaches in Chinese Management Research (Farh, Cannella, & Lee, 2006, MOR)