Harvey Stevens, Senior Policy Analyst Manitoba Family Services and Housing November 2003

An Approach to Using Social Assistance Administrative Data to Conduct Net Impact Analyses of Employment Training Programs Harvey Stevens, Senior Policy Analyst Manitoba Family Services and Housing November 2003

Presentation Objectives Overall • Describe how social assistance administrative data can be utilized to conduct net impact analyses of employment training programs provided to social assistance recipients.

Presentation Objectives Specific 1. Describe what net impact analyses are and the types of research designs typically used to conduct them. 2. Describe the key threats to internal validity and show how each design addresses them.

Presentation Objectives Specific 3a. Describe the ‘propensity score matching technique’ for selecting a comparison group and show how it minimizes the selection bias. 3b. Describe the ‘difference-in-difference’ estimator of program impact and show how it minimizes the selection bias.

Presentation Objectives Specific 4. Describe how a social assistance administrative data file can be used to create a matched comparison group and a difference-in-difference estimator for estimating the net impact of an employment training program.

What is a Net Impact Analysis? Definition • “A Net Impact analysis involves determining the extent to which a training intervention results in the desired outcomes.” Key Elements Include: • The determination of a program’s effectiveness by assessing the unique impact of the program.

What is a Net Impact Analysis? Gross vs. Net Impact Analyses • Gross • captures the total impact produced by all factors • measured by the score for the outcome variable for members of the program group • Net • captures the unique impact produced by the program • measured by subtracting the effect of non-program influences from the total impact

What is a Net Impact Analysis? Measuring the effect of non-program factors • The key tool for accomplishing this is the creation of a ‘counterfactual’ – that outcome which would have occurred in the absence of the program.

Non-Experimental Research Designs for Establishing Net Impact Legend • T: Treatment / Program Intervention. • History: Unobserved factors, besides T, cause all or part of an observed change over time. • Selection: Undetected differences between program and comparison groups cause a difference in outcomes. • Contamination: Undetected differences in (a) how the program was implemented or (b) divergent events or maturation occurring to the program and comparison groups cause a difference in outcomes.

Minimizing Selection Biases with Quasi-Experimental Comparison Groups Two Methods 1. Selecting comparison group members by matching on ‘propensity scores’ 2. Using ‘difference-in-difference’ estimators of net impact

Matching Via ‘Propensity Scores’ • “Propensity scores” are the predicted probability of being a member of the program group, based on the socio-demographic characteristics of the individual.

Matching Via ‘Propensity Scores’ How Propensity-to-Participate Scores are Generated • Logistic regression analysis is used • Participation in the program = dependent variable • Socio-demographic variables = independent variables • Predicted scores are calculated using the estimated regression equation • Odds ratio = exp(constant + (coeff. x var1) + … + (coeff. x varN)) • PPScore = Oddsratio / (1+Oddsratio)
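
The two formulas above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `pp_score` and the example coefficients are hypothetical, and in practice the constant and coefficients would come from the estimated regression equation. Note that the quantity the slide labels "Odds ratio" is, strictly speaking, the predicted odds of participation.

```python
import math

def pp_score(constant, coefficients, values):
    """Predicted probability of participation from a fitted logistic equation."""
    # Linear predictor: constant + (coeff. x var1) + ... + (coeff. x varN)
    linear = constant + sum(c * v for c, v in zip(coefficients, values))
    odds = math.exp(linear)        # the slide's "Oddsratio"
    return odds / (1.0 + odds)     # PPScore, strictly between 0 and 1

# Hypothetical coefficients for two predictors (illustration only).
# The linear predictor is -1.0 + 0.5*2.0 + 0.25*0.0 = 0, so the score is 0.5.
example = pp_score(-1.0, [0.5, 0.25], [2.0, 0.0])
```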

Matching Via ‘Propensity Scores’ • Sort the Combined File of Participants + Non-Participants by the PPScore. • Participant records: • retain records matched with non-participant records • delete non-matching records • Non-participant records: • retain records matched with participant records • delete non-matching records • Resulting file: • contains only matched participants and non-participant records • eliminates the bias due to mismatching
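
A minimal sketch of the retain/delete rule above, assuming each record is a hypothetical `(record_id, score)` pair and that scores have already been rounded to a common precision so that "matched" can be read as "equal score". The function name `retain_matched` is illustrative and not part of the original presentation.

```python
def retain_matched(participants, nonparticipants):
    """Keep only records whose score occurs in both groups."""
    p_scores = {score for _, score in participants}
    n_scores = {score for _, score in nonparticipants}
    common = p_scores & n_scores  # scores present on both sides
    kept_p = [rec for rec in participants if rec[1] in common]
    kept_n = [rec for rec in nonparticipants if rec[1] in common]
    return kept_p, kept_n

# Hypothetical records: P2 and C3 have no counterpart and are dropped.
kept_p, kept_n = retain_matched(
    [("P1", 412), ("P2", 655)],
    [("C1", 412), ("C2", 412), ("C3", 900)],
)
```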

Difference-In-Difference Estimators of Net Impact • As shown in slide 11 (Research Designs), the estimator is calculated as: • Diff-in-diff score = (Y2P – Y1P) – (Y2C – Y1C), where Y1 and Y2 are the pre- and post-program values of the outcome and P and C denote the program and comparison groups. • This difference-in-difference estimator eliminates the influence of any unobserved effects on the outcome variable that are fixed over time, thus removing another source of selection bias. This occurs because any fixed effect is cancelled out when it is subtracted from itself.
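
The cancellation property can be checked with a small numeric sketch. All outcome values below are hypothetical and chosen only for illustration; `diff_in_diff` is an illustrative helper name.

```python
def diff_in_diff(y1_p, y2_p, y1_c, y2_c):
    """(Y2P - Y1P) - (Y2C - Y1C)."""
    return (y2_p - y1_p) - (y2_c - y1_c)

# Hypothetical pre (Y1) and post (Y2) outcome values for each group.
base = diff_in_diff(y1_p=900, y2_p=600, y1_c=800, y2_c=700)

# Add an unobserved, time-constant effect of 150 to the program group in
# both periods: it is subtracted from itself, so the estimate is unchanged.
shifted = diff_in_diff(y1_p=900 + 150, y2_p=600 + 150, y1_c=800, y2_c=700)
```

Here both `base` and `shifted` equal -200, which is the sense in which a fixed effect drops out of the estimator.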

Using Social Assistance Data Bases to Undertake Net Impact Analyses General Considerations • Begin with the entire social assistance (SA) caseload as the population from which to select a matched comparison group. • Select a period of time in the past, for analysis, that allows for at least one year of follow-up data.

Using Social Assistance Data Bases to Undertake Net Impact Analyses Data Requirements • Program Participants • Required: program start and end time (month and year) and an SA client number or SIN; • Participants and Nonparticipants • The following monthly data from the SA data base: • SA client number or SIN; • Gross SA entitlement (may include base benefits and continuous special needs payments); • Net earnings received, work expenses paid and earnings exemptions provided; • Socio-demographic variables which could predict participation in the training program.

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 1. Trim the Sample of Nonparticipants • Exclude the following types of SA recipients from the nonparticipant file: • those types of recipients not represented at all among the participants, e.g. those living outside the areas in which the program is offered • those individuals never on social assistance during the time period covered by the analysis • those who participated in other training programs after the earliest start date of the program and those who participated in the program after the selected time period
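
The three exclusion rules above could be applied along the following lines. This is a sketch under stated assumptions: the record layout (the keys `area`, `months_on_sa`, `other_training`, `later_participant`) and the function name are hypothetical, and months are represented as `(year, month)` tuples.

```python
def trim_nonparticipants(records, program_areas, window_start, window_end):
    """Drop nonparticipant records that cannot serve as comparisons."""
    kept = []
    for rec in records:
        if rec["area"] not in program_areas:
            continue  # recipient type not represented among participants
        on_sa_in_window = any(
            window_start <= month <= window_end for month in rec["months_on_sa"]
        )
        if not on_sa_in_window:
            continue  # never on assistance during the analysis period
        if rec["other_training"] or rec["later_participant"]:
            continue  # took other training, or joined the program later
        kept.append(rec)
    return kept

# Hypothetical records: only the first one passes all three rules.
records = [
    {"id": 1, "area": "A", "months_on_sa": {(2001, 5)}, "other_training": False, "later_participant": False},
    {"id": 2, "area": "B", "months_on_sa": {(2001, 5)}, "other_training": False, "later_participant": False},
    {"id": 3, "area": "A", "months_on_sa": {(1998, 1)}, "other_training": False, "later_participant": False},
    {"id": 4, "area": "A", "months_on_sa": {(2001, 6)}, "other_training": True, "later_participant": False},
]
trimmed = trim_nonparticipants(records, {"A"}, (2001, 1), (2001, 12))
```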

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 2. Create the Propensity-to-Participate Scores • For program participants, calculate the value of the predictor variables as of the program start month and retain one record per participant. • For nonparticipants calculate the value of the predictor variables for each month they are on assistance and retain all of the monthly records occurring during the training period.

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 2. Create the Propensity-to-Participate Scores • Take a random sample of these monthly records of nonparticipants such that the sample will not be more than 10 times larger than the sample of participants. • Append the nonparticipant sample to the participant file and run the logistic regression analysis described above.
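
One way to draw such a capped sample is sketched below with Python's standard `random` module. The function name, the `ratio` parameter and the fixed `seed` are assumptions added for illustration; the presentation specifies only the 10-to-1 ceiling.

```python
import random

def sample_nonparticipant_months(monthly_records, n_participants, ratio=10, seed=0):
    """Randomly sample monthly records, capped at ratio x number of participants."""
    cap = ratio * n_participants
    if len(monthly_records) <= cap:
        return list(monthly_records)  # already small enough, keep everything
    rng = random.Random(seed)         # fixed seed so the draw is reproducible
    return rng.sample(monthly_records, cap)

# 500 hypothetical monthly records and 20 participants: at most 200 are kept.
sampled = sample_nonparticipant_months(list(range(500)), n_participants=20)
```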

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 2. Create the Propensity-to-Participate Scores • Using the final regression analysis equation, calculate the predicted propensity score and assign this to all members of the participant and nonparticipant groups. • Express the PPScore as an integer of up to 3 or 4 digits (i.e., it can take a value up to 999 or up to 9999, depending on whether the probability is multiplied by 1000 or 10000). • PPScore = int((Oddsratio / (1 + Oddsratio)) x 1000)
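
The effect of the integer conversion on matching can be seen in a short sketch. The probabilities below are hypothetical, and `integer_pp_score` is an illustrative name; the point is that a coarser scale makes more records share the same score.

```python
def integer_pp_score(probability, scale=1000):
    """Scale a 0-1 propensity score and truncate it to an integer."""
    return int(probability * scale)

# Two hypothetical probabilities that differ in the fourth decimal place.
a_1000 = integer_pp_score(0.41275)           # 412
b_1000 = integer_pp_score(0.41291)           # 412, so the two would match
a_10000 = integer_pp_score(0.41275, 10000)   # 4127
b_10000 = integer_pp_score(0.41291, 10000)   # 4129, so they would not match
```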

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 3. Select the Matched Comparison Group • Append the entire nonparticipant file to the participant file and create two additional variables which will be used to sort the file, plus a nonparticipant identifier: • a program start month variable (PSmth) = start month for program participants and actual month for nonparticipants; • a program ID number (PIDnum) = program participant’s ID number. For nonparticipants, PIDnum = missing value; • a nonparticipant ID number (CIDnum).

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 3. Select the Matched Comparison Group • Sort this merged file by PPScore, PSmth (in ascending order) and PIDnum (in descending order). • For each set of records having the same PPScore and PSmth values, create a variable called ‘Matchnum’ = PIDnum. Also create a common program start and end month variable= actual program start and end month.
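
The sort-and-assign step might look like the sketch below. Several details are assumptions rather than part of the presentation: records are hypothetical dicts, a missing PIDnum is represented by `None`, PSmth is an integer such as 200201, a Matchnum is assigned only when a set contains both a participant and a nonparticipant, and when several participants share a set the nonparticipants take the first (highest) PIDnum.

```python
from itertools import groupby

def assign_matchnum(records):
    """Sort the merged file and assign Matchnum within (PPScore, PSmth) sets."""
    def sort_key(rec):
        is_nonparticipant = rec["PIDnum"] is None
        # PIDnum descending, with missing values (nonparticipants) last.
        return (rec["PPScore"], rec["PSmth"], is_nonparticipant, -(rec["PIDnum"] or 0))

    ordered = sorted(records, key=sort_key)
    result = []
    for _, members in groupby(ordered, key=lambda r: (r["PPScore"], r["PSmth"])):
        members = list(members)
        pids = [r["PIDnum"] for r in members if r["PIDnum"] is not None]
        has_both = bool(pids) and len(pids) < len(members)
        for rec in members:
            if not has_both:
                matchnum = None           # no counterpart in this set
            elif rec["PIDnum"] is not None:
                matchnum = rec["PIDnum"]  # participants keep their own ID
            else:
                matchnum = pids[0]        # assumption: first participant in the set
            result.append({**rec, "Matchnum": matchnum})
    return result

# Hypothetical merged file: one exact match, two records with no counterpart.
merged = assign_matchnum([
    {"PPScore": 655, "PSmth": 200201, "PIDnum": 9},
    {"PPScore": 412, "PSmth": 200202, "PIDnum": None},
    {"PPScore": 412, "PSmth": 200201, "PIDnum": None},
    {"PPScore": 412, "PSmth": 200201, "PIDnum": 7},
])
```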

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 3. Select the Matched Comparison Group • Before deleting non-matching records, look at participant records with no matching nonparticipant records. • If there is a nonparticipant record with the same start month and a PPScore within ±100 of the participant PPScore, manually assign it a ‘Matchnum’ = PIDnum. • Delete records with a missing value for the Matchnum variable.
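
The presentation describes this fallback as a manual step. As a hedged sketch of how it could be automated, the helper below looks for a nonparticipant with the same start month whose PPScore is within 100 points of the participant's. Choosing the closest eligible record is an assumption added here, as are the function name and the dict layout.

```python
def nearest_within_caliper(participant, candidates, caliper=100):
    """Return the eligible nonparticipant closest in PPScore, or None."""
    eligible = [
        c for c in candidates
        if c["PSmth"] == participant["PSmth"]
        and abs(c["PPScore"] - participant["PPScore"]) <= caliper
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: abs(c["PPScore"] - participant["PPScore"]))

# Hypothetical unmatched participant and three candidate records.
unmatched = {"PIDnum": 9, "PPScore": 6550, "PSmth": 200201}
candidates = [
    {"CIDnum": 1, "PPScore": 6500, "PSmth": 200201},  # eligible, 50 away
    {"CIDnum": 2, "PPScore": 6580, "PSmth": 200201},  # eligible, 30 away
    {"CIDnum": 3, "PPScore": 6551, "PSmth": 200203},  # wrong start month
]
best = nearest_within_caliper(unmatched, candidates)
```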

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 4. Calculate the Diff-in-Diff Score • The program outcome measure should reflect the length of time on assistance and the level of employment while on assistance. • One such measure is Net SA Benefit: Gross SA Benefit + Reimbursed Work Expenses + Earnings Exempted – Net Earnings = Net SA Benefit
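
The Net SA Benefit formula translates directly into code. The dollar amounts below are hypothetical and serve only to show how earnings lower the measure.

```python
def net_sa_benefit(gross_benefit, work_expenses, earnings_exempted, net_earnings):
    """Gross SA Benefit + Reimbursed Work Expenses + Earnings Exempted - Net Earnings."""
    return gross_benefit + work_expenses + earnings_exempted - net_earnings

# A hypothetical month with no earnings, and one with 400 in net earnings.
no_earnings = net_sa_benefit(800, 0, 0, 0)
with_earnings = net_sa_benefit(800, 50, 100, 400)
```

In this illustration the measure falls from 800 to 550 once the recipient has earnings, so it reflects the level of employment while on assistance.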

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 4. Calculate the Diff-in-Diff Score • Create a separate file of participants and matched comparison group records. • Ensure both files have 3 common variables: • Matchnum • Program Start Month • End Month • Using the monthly data, calculate: • the Total Value of the Outcome Variable for the pre- and post- periods of time • the Pre-Post Difference = (Post - Pre)
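
A minimal sketch of the pre/post calculation, under assumptions that are not in the presentation: months are integer indices, the pre period is the `window` months before the program start, the post period is the `window` months after the program end, and a month with no record (off assistance) counts as zero.

```python
def pre_post_difference(monthly, start, end, window):
    """Return (pre total, post total, post - pre) for one record."""
    pre = sum(monthly.get(m, 0) for m in range(start - window, start))
    post = sum(monthly.get(m, 0) for m in range(end + 1, end + 1 + window))
    return pre, post, post - pre

# Hypothetical monthly Net SA Benefit; program runs from month 12 to month 14.
# Month 17 has no record, so it contributes zero to the post total.
monthly = {9: 800, 10: 800, 11: 800, 15: 500, 16: 400}
pre, post, diff = pre_post_difference(monthly, start=12, end=14, window=3)
```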

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 4. Calculate the Diff-in-Diff Score • If there are multiple comparison records with the same ‘Matchnum’ value, then calculate the average value of the Pre-Post Difference = Σ (Y2C - Y1C) / n • This will result in one record per distinct value of the ‘Matchnum’ variable. • Merge the participant and nonparticipant files by ‘Matchnum’.

Using Social Assistance Data Bases to Undertake Net Impact Analyses Key Steps in Conducting the Analysis 4. Calculate the Diff-in-Diff Score • For each record, calculate the Diff-in-Diff Score as: (Y2P - Y1P) - [Σ (Y2C - Y1C) / n] • Then take the average value of the Diff-in-Diff Score = Net Impact of the Program. • A negative value will indicate that the program has led to a greater reduction in SA dependency than would have occurred otherwise = positive outcome.
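
The final calculation can be sketched as follows, assuming the pre-post differences have already been computed and are keyed by Matchnum. The dict layout, the function name and the numbers are hypothetical.

```python
from statistics import mean

def net_impact(participant_diffs, comparison_diffs):
    """Per-match diff-in-diff scores and their average (the net impact).

    participant_diffs: {Matchnum: post - pre for the participant}
    comparison_diffs:  {Matchnum: [post - pre for each matched comparison]}
    """
    scores = {}
    for matchnum, p_diff in participant_diffs.items():
        c_mean = mean(comparison_diffs[matchnum])  # sum of (Y2C - Y1C) over n
        scores[matchnum] = p_diff - c_mean
    return scores, mean(list(scores.values()))

# Hypothetical differences: participant 7 has two matched comparisons.
scores, impact = net_impact(
    {7: -1500, 9: -600},
    {7: [-500, -300], 9: [-700]},
)
```

With these illustrative numbers the per-match scores are -1100 and 100, and the net impact is -500, a negative value of the kind read here as a positive program outcome.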

24-Month Pre-Post Results for the Average Change in Net EIA Benefits Paid to Families Who Did and Did Not Take an Education or Employment Training Intervention While on EIA [table values not preserved in this text] Notes: * not significant at the 0.05 level. ** calculated as: 1.96 x [(Variance/n)treatment + (Variance/n)comparison]^(1/2)
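
The footnoted formula appears to be the half-width of a 95% confidence interval for a difference between two group means. A sketch with hypothetical variances and sample sizes follows; none of these numbers come from the presentation's results.

```python
import math

def ci_half_width(var_t, n_t, var_c, n_c, z=1.96):
    """z x [(Variance/n)treatment + (Variance/n)comparison]^(1/2)."""
    return z * math.sqrt(var_t / n_t + var_c / n_c)

def is_significant(estimate, half_width):
    """An estimate is significant at the 0.05 level if the interval excludes zero."""
    return abs(estimate) > half_width

# Hypothetical inputs: 900/100 + 1600/100 = 25, square root 5, times 1.96 = 9.8.
half_width = ci_half_width(var_t=900.0, n_t=100, var_c=1600.0, n_c=100)
```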

Thank you Harvey Stevens, Senior Policy Analyst Manitoba Family Services and Housing November 2003
