Purpose of Training Evaluation • Used to make decisions about selection, adoption, value, and modification of training programs. • Based on instructional objectives derived from needs assessment. • Blocks: • Lack of trained personnel • Resistance to evaluation • Outcomes of negative evaluations? • Evaluation not expected or rewarded. • Unrealistic criteria (e.g., turnover used as a criterion for measuring the effectiveness of communication training).
OVERVIEW • Day 1: Criterion Development • Day 2: Evaluation Designs
CRITERION DEVELOPMENT • CRITERIA are: • Standards by which training is evaluated. • Measures of training effectiveness. • Based on needs assessment and learning objectives. • Example: • PowerPoint training learning objective: “Trainees will demonstrate proficiency in use of the software.” • What does that mean? What criteria or outcomes will you use to measure “proficiency”?
Evaluation of Criteria • CRITERION RELEVANCY • Right on target: the KSAs in training & evaluation are the KSAs needed for successful job performance. • CRITERION DEFICIENCY • Did you miss the boat? Did you leave out important job KSAs in your training and/or training evaluation? • CRITERION CONTAMINATION • Did you miss the ocean? Are you evaluating training based on: • KSAs that were not covered in training? • KSAs that are not needed on the job? • Needs assessment is critical for avoiding these problems.
LEVELS OF CRITERIA (Kirkpatrick, 1960) • REACTION • ADVANTAGES: • Widely used and accepted; cheap and easy. • Early reaction measures allow for mid-training changes. • Good feedback on the way the course is taught & on trainee motivation. • DISADVANTAGES: • Little relation to learning or outcomes. • Feel-good sheets. • USE: • Anonymous answers. • Open-ended comment space. • Survey right after training and a few months later (what should have been done differently?) • Analyze by group/department (which department had the most positive reaction to training? see the sketch below).
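The group/department analysis mentioned above can be done with a simple aggregation. Below is a minimal sketch in Python; the column names and ratings are hypothetical placeholders, not part of the original material.

```python
# Minimal sketch: summarizing reaction-level survey data by department.
# Column names ("department", "reaction_score") and values are hypothetical.
import pandas as pd

reactions = pd.DataFrame({
    "department": ["Sales", "Sales", "IT", "IT", "HR"],
    "reaction_score": [4.5, 3.8, 4.9, 4.7, 3.2],   # e.g., 1-5 Likert ratings
})

# Which department reacted most positively to the training?
summary = (reactions.groupby("department")["reaction_score"]
           .agg(["mean", "count"])
           .sort_values("mean", ascending=False))
print(summary)
```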
LEARNING • ADVANTAGES: • More direct assessment of accomplishment of learning objectives. • More valid than reaction (self-report). • Objective and quantifiable. • Taps learning rather than reaction to the trainer. • DISADVANTAGES: • Time and cost. • USE: • Need pre- and post-tests to assess whether change has occurred (as sketched below). • Change due to training? Need a control group. • Format and scoring considerations: • Make sure questions are comparable. • Essay vs. multiple choice.
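A minimal sketch of a learning-level check, assuming paired pretest and posttest scores for the same trainees (the numbers here are illustrative):

```python
# Minimal sketch: did knowledge-test scores change from pretest to posttest?
# The scores below are illustrative, not real data.
from scipy import stats

pretest  = [52, 61, 48, 70, 55, 63, 58, 49]
posttest = [68, 75, 60, 82, 66, 77, 71, 62]

# Paired t-test: has a statistically reliable change occurred?
t, p = stats.ttest_rel(posttest, pretest)
print(f"t = {t:.2f}, p = {p:.4f}")
# Note: without a control group this shows only that change occurred,
# not that the change was caused by the training.
```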
BEHAVIOR • ADVANTAGES: • Measures transfer of training to the job setting. • Stronger case for the effectiveness of training. • Good feedback for future needs assessments and re-design. • DISADVANTAGES: • Need to know what constitutes successful job performance; depends on a good task analysis in the needs assessment. • Can’t control when or whether trainees will have a chance to use new skills (opportunity bias). • USE: • Involve trainees, supervisors, peers & subordinates in both pre- and post-training performance appraisals. • Before/after measures of job performance. • Collect post measures 3 months or more after training (lag effect). • Compare to a control group that did not receive training.
RESULTS • ADVANTAGES: • Cost-benefit analysis: relation of training to costs, turnover, production, and the bottom line. • Provides information for utility analyses (see the sketch below): • Does training decrease costs associated with poor selection? • How does formal training compare with on-the-job training? • DISADVANTAGES: • Outcomes are multi-determined - difficult to show a relation between training and outcomes. • Positive relationship: Is it really due to training? • No relationship: Was it too much to expect? • Negative relationship: ??? • USE: • With criteria from other levels.
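One common way to attach a dollar figure to results-level outcomes is a Brogden-Cronbach-Gleser style utility estimate. The sketch below is illustrative only; every figure in it is a hypothetical placeholder.

```python
# Minimal sketch of a utility analysis (Brogden-Cronbach-Gleser style).
# All figures are hypothetical placeholders.
def training_utility(n_trained, years, effect_size_d, sd_performance_dollars, cost_per_trainee):
    """Estimated dollar utility: delta_U = N * T * d * SDy - N * C."""
    benefit = n_trained * years * effect_size_d * sd_performance_dollars
    cost = n_trained * cost_per_trainee
    return benefit - cost

delta_u = training_utility(
    n_trained=50,                  # number of trainees
    years=2,                       # years the training effect is expected to last
    effect_size_d=0.40,            # trained vs. untrained difference in SD units
    sd_performance_dollars=10_000, # dollar value of one SD of job performance
    cost_per_trainee=1_200,
)
print(f"Estimated utility: ${delta_u:,.0f}")
```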
OTHER CRITERIA CLASSIFICATIONS • OUTCOME (Behavior/Results/Learning) vs. PROCESS (Reaction). • OBJECTIVE vs. SUBJECTIVE. • FORMATIVE (evaluates the training process/reaction) vs. SUMMATIVE (trainee change/learning, behavior, results). • TIME • Immediate (taken during training, e.g., mid-term evaluations). • Proximal (advanced training or shortly after training is over). • Distal (taken a considerable time after training: transfer). • NORM- VS. CRITERION-REFERENCED • Norm: graded on a curve. • Criterion: absolute threshold needed.
GUIDELINES FOR CRITERION DEVELOPMENT • Use MULTIPLE CRITERIA (reaction, learning, behavior & results). • Different levels give different information. • Agreement/disagreement among levels. • Criteria derived from LEARNING OBJECTIVES & NEEDS ASSESSMENT. • Ensure CRITERION RELEVANCY & RELIABILITY. • Use CRITERION-REFERENCED measures for critical outcomes (e.g., how to fly a plane; drive a car). • Use both LONG- AND SHORT-TERM measures.
EVALUATION (DAY 2) • OVERVIEW • Internal and External Validity • Threats to Validity • Research Designs
EVALUATION QUESTIONS • Based on criteria, has change occurred? • Is the change due to training? (internal validity) • Will the change occur for new trainees in the same organization? (external validity; intra-organizational validity) • Will the change occur for new trainees in other organizations? (external validity; inter-organizational validity)
INTERNAL VALIDITY • Ability to say “A causes B.” • A = independent variable; predictor; training. B = dependent variable; criterion; performance. • Causality vs. Correlation. • Need to show 2 things: • 1. Change has occurred. • 2. Rule out alternative explanations for change: • Control alternatives (control group). • Equivalency of the 2 groups (random assignment).
Examples (X = training; P = performance measure; R = random assignment) • X P: Has change occurred? No pretest, so we cannot tell. • P1 X P2: Change can be assessed, but is it due to training? • P1 X P2 / P1 P2: Adding a control group helps, but is the change due to the treatment or to pre-existing group differences? • R P1 X P2 / R P1 P2: Random assignment to the 2 groups rules out pre-existing group differences (selection threats to internal validity). See the sketch below.
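A minimal sketch of the last design (R P1 X P2 / R P1 P2), comparing gain scores in a trained group against a randomly assigned control group; all scores are made up for illustration:

```python
# Minimal sketch: pretest/posttest control group design (R P1 X P2 / R P1 P2).
# Scores are illustrative; in practice they come from your criterion measures.
from scipy import stats

trained_pre,  trained_post = [55, 60, 48, 62, 57, 51], [70, 74, 61, 78, 69, 66]
control_pre,  control_post = [54, 59, 50, 61, 56, 52], [57, 61, 52, 63, 58, 55]

# Gain scores capture change; comparing gains across randomly assigned groups
# rules out alternative explanations such as history or maturation.
trained_gain = [post - pre for pre, post in zip(trained_pre, trained_post)]
control_gain = [post - pre for pre, post in zip(control_pre, control_post)]

t, p = stats.ttest_ind(trained_gain, control_gain)
print(f"Mean gain (trained) = {sum(trained_gain)/len(trained_gain):.1f}")
print(f"Mean gain (control) = {sum(control_gain)/len(control_gain):.1f}")
print(f"t = {t:.2f}, p = {p:.4f}")
```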
THREATS TO INTERNAL VALIDITY • HISTORY: • Events that occur between the pretest and posttest and affect posttest scores. • Controlled by a control group. • MATURATION: • Participants change (grow older, become fatigued, more or less interested in training) between the pretest and posttest. • Control group and randomization. • INSTRUMENTATION: • Changes in grading standards, rater, or instrument. • Control group.
Threats (cont’d) • TESTING: • Pretest sensitization: the pretest affects posttest scores. • Threat to internal and external validity. • Control by using the Solomon 4-group design: • 1. R P1 X P2 • 2. R P1 P2 • 3. R X P • 4. R P • If the pretest has no effect: 1 = 3 and 2 = 4 (see the sketch below).
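A minimal sketch of the pretest-sensitization check described above, assuming illustrative posttest scores for the four groups:

```python
# Minimal sketch: checking for pretest sensitization in a Solomon 4-group design.
# Posttest scores below are illustrative placeholders.
from scipy import stats

group1 = [74, 70, 77, 72, 75]  # R P1 X P2  (pretest + training)
group2 = [58, 55, 60, 57, 56]  # R P1   P2  (pretest only)
group3 = [73, 69, 76, 71, 74]  # R    X P   (training only)
group4 = [57, 54, 59, 56, 55]  # R      P   (no pretest, no training)

# Per the rule on the slide: if the pretest has no effect,
# group 1 should match group 3 and group 2 should match group 4.
for label, (a, b) in {"1 vs 3": (group1, group3), "2 vs 4": (group2, group4)}.items():
    t, p = stats.ttest_ind(a, b)
    print(f"Groups {label}: t = {t:.2f}, p = {p:.4f}")
```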
Threats (cont’d) • STATISTICAL REGRESSION: • A statistical artifact that occurs when trainees are selected on the basis of extreme scores. Error variance in the instrument causes scores to regress toward the mean on the posttest (illustrated in the sketch below). • Control with random assignment to the 2 groups. • SELECTION: • Differences in characteristics between the 2 groups (e.g., women more likely than men to take training; effectiveness due to gender or a gender X treatment interaction rather than training). • Control with randomization. • SELECTION X MATURATION INTERACTION: • People who volunteer for interpersonal communication training are at different stages of maturation: already “ready” for training. • Control groups and randomization.
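The regression-toward-the-mean artifact is easy to demonstrate by simulation. The sketch below selects the lowest pretest scorers and shows their posttest mean rising even though no training effect is simulated; all numbers are made up.

```python
# Minimal sketch: regression toward the mean when selecting extreme scorers.
# No training effect is simulated, yet the selected group "improves".
import random

random.seed(0)
true_ability = [random.gauss(50, 10) for _ in range(1000)]
pretest  = [a + random.gauss(0, 8) for a in true_ability]   # ability + error
posttest = [a + random.gauss(0, 8) for a in true_ability]   # fresh error, no treatment

# Select the lowest-scoring ~10% on the pretest (e.g., "needs training most").
cutoff = sorted(pretest)[len(pretest) // 10]
selected = [i for i, score in enumerate(pretest) if score <= cutoff]

pre_mean  = sum(pretest[i]  for i in selected) / len(selected)
post_mean = sum(posttest[i] for i in selected) / len(selected)
print(f"Selected group pretest mean:  {pre_mean:.1f}")
print(f"Selected group posttest mean: {post_mean:.1f}  (higher, with no training at all)")
```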
Threats (cont’d) • MORTALITY: • Differential loss in training and control groups (due to underlying traits, demographics, needs). • Control with random assignment: groups are equally likely to have those traits and demographics at the start. • DIFFUSION - IMITATION OF TREATMENT: • Treatment revealed to the control group; its performance increases. • COMPENSATORY EQUALIZATION OF TREATMENT: Others help the control group. • COMPENSATORY RIVALRY: • Control group tries to catch up. • RESENTFUL DEMORALIZATION: • Control group performance decreases.
EXTERNAL VALIDITY • GENERALIZABILITY OF TREATMENT. • Will training work with different • People? • Places? • Times? • Internal validity is a prerequisite to external validity: need to show training “works” before being concerned with the generalizability of training.
THREATS TO EXTERNAL VALIDITY • REACTIVE EFFECTS: • Novelty & Hawthorne effects present early in training, absent later on. • PRETEST SENSITIZATION: • Use pretest in one group, not another. • INTERACTION OF SELECTION & TREATMENT: • Example: younger trainees do well with computers; training may be less effective with older trainees. • MULTIPLE TREATMENT INTERFERENCE: • Combination of training techniques critical, but not the same in future training.
Threats to External Validity (cont’d) • RANDOM SAMPLING reduces threats to external validity. • INTRA-ORGANIZATIONAL VALIDITY: • Will training work again in our organization? • Increase by re-checking the needs assessment & conducting effective evaluations. • INTER-ORGANIZATIONAL VALIDITY: • Will training work in other organizations? • Similarity of organizations & audiences are key factors. • Needs assessment is crucial.
RESEARCH DESIGNS: PREEXPERIMENTAL DESIGNS • Lack control groups: can’t show causality. • One-Group Posttest Only: X P • Can’t tell if change occurred. • Use only if it is the only option. • One-Group Pretest/Posttest: P1 X P2 • Can assess whether change occurred. • Can’t tell if it’s due to training. • Use if a control group can’t be obtained.
EXPERIMENTAL DESIGNS • Use RANDOMIZATION AND CONTROL GROUPS. • PRETEST/POSTTEST CONTROL GROUP DESIGN • R P1 X P2 / R P1 P2 • Rigorous; controls for many threats to internal validity, but not external validity: what about pretest sensitization, diffusion of treatment, compensatory rivalry, etc.? • Ethical issues in using control groups; practical issues in random assignment and control groups. • SOLOMON 4-GROUP • Includes a test for pretest sensitization. • Feasibility of 4 randomly assigned groups? • Can be used when training the entire organization.
QUASI-EXPERIMENTAL DESIGNS • Use intact groups; not randomized. • NONEQUIVALENT CONTROL GROUP • P1 X P2 / P1 P2 • Susceptible to the selection threat and its interactions; pretest sensitization. • TIME SERIES DESIGN: • P1 P2 P3 P4 X P5 P6 P7 P8 • Better than the one-group pretest/posttest design; can check maturation effects; can’t rule out history effects (see the sketch below). • MULTIPLE TIME SERIES DESIGN: • Add a control group to the above. • The best designs use random assignment, but consider feasibility.
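A minimal sketch of reasoning from a time series design, using made-up performance measures P1-P8:

```python
# Minimal sketch: interrupted time series (P1..P4, X, P5..P8) with made-up data.
# A level shift after training, against a stable pre-training baseline,
# is more convincing than a single pre/post comparison.
import numpy as np

pre  = np.array([61, 62, 60, 63])   # P1..P4 (before training)
post = np.array([70, 71, 73, 72])   # P5..P8 (after training)

# Check the pre-training trend: a flat baseline argues against maturation.
pre_slope = np.polyfit(np.arange(len(pre)), pre, 1)[0]
print(f"Pre-training slope:  {pre_slope:+.2f} per period")
print(f"Pre-training mean:   {pre.mean():.1f}")
print(f"Post-training mean:  {post.mean():.1f}")
# A history effect (an outside event coinciding with training) still can't be
# ruled out without a control series (multiple time series design).
```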
SUMMARY • USE MOST RIGOROUS DESIGN POSSIBLE. • ALL DESIGNS HAVE SOME LIMITATIONS. • ACKNOWLEDGE THREATS TO • INTERNAL VALIDITY • EXTERNAL VALIDITY • EVALUATION IS ONLY AS GOOD AS CRITERIA USED.
Exercise (Noe, 1999) • Consider this course as a training program. Identify: • 1. The types of outcomes (criteria) you would use in evaluating this course. • 2. The evaluation design you would use. • Justify your choice of a design based on minimizing threats to validity and practical considerations.