Practical Introduction to PARSCALE Paul K. Crane, MD MPH Internal Medicine University of Washington
Outline • Introductory comments • Getting a dataset prepared for PARSCALE • Creating PARSCALE code • PARSCALE output • Reading theta estimates from PARSCALE • Final comments
Introductory comments • PARSCALE is not user friendly • Available from SSI for $250, plus $40 for the text (prices checked August 23, 2004): http://www.ssicentral.com/home.htm • Flexible; does many IRT applications well • Technical support (Leo Stam) is very good • This talk is not approved or sanctioned by SSI; I have no financial relationship with SSI other than owning a copy of PARSCALE
Pre-PARSCALE • PARSCALE requires an ASCII-formatted dataset • Responses to particular items need to be in specified columns • Need to deal with missing data and with 2-digit answers • PARSCALE can be told to ignore commas or any other text
Pre-PARSCALE, slide 2 • PARSCALE is VERY unhappy with empty categories (categories with zero observations; it doesn’t even try) • PARSCALE is pretty unhappy with sparsely populated categories (it may not converge on a solution) • Our rule of thumb is to combine categories until there are at least 20 observations in each category; this has always worked for us (where other things have not)
Ensure data will be in the same column locations • Make sure all id numbers have the same number of digits. For example, add a really big number (1,000,000) to all of them so that every id has the same length • Make missing values an X. In STATA: change missing values to .x now, then change .x to X in Word later • .mvencode item01 item02 item03 …, mv(1000000000) • .mvdecode item01 item02 item03 …, mv(1000000000=.x)
Item screening • For each item, we need to make sure there are no empty categories, and plan ahead for future category merging • In STATA, tabulate each item, one at a time • If the item is highly skewed, don’t export it at all (“highly skewed” means fewer than 20 observations fall outside the most common category), for example: • Category: 1, 2, 3, 4 • Number: 1, 3, 12, 3227 • If the item needs recombining, make a note of that and go ahead and export it, for example: • Category: 1, 2, 3, 4 • Number: 3, 5, 24, 3216
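A quick way to run all of the screening tabulations in one pass is sketched here; it is only a sketch, and the varlist item01-item03 is a hypothetical stand-in for your real item names:

. * tabulate every item at once, including missing values
. tab1 item01-item03, missing

Any table with an empty category, or with fewer than 20 observations outside its most common category, needs the attention described above.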
First step from STATA to PARSCALE • .mvencode item01 item02 item03 …, mv(1000000000) • .mvdecode item01 item02 item03 …, mv(1000000000=.x) • .gen newid=id+1000000 • .outfile newid item01 item02 item03 … using "D:/work/…/cogtest1.txt", wide comma • Obviously, omit items with too little variance from the item list (a consolidated sketch of this step follows below)
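Pulled together, the export step might look like the following do-file sketch. This is only a sketch: the varlist item01-item03 and the output filename are hypothetical stand-ins, and the commands assume a numeric id variable with no missing values.

* make the items' missing values visible in the exported text:
* system missing -> sentinel value -> extended missing .x
mvencode item01-item03, mv(1000000000)
mvdecode item01-item03, mv(1000000000=.x)
* give every id the same number of digits
gen newid = id + 1000000
* write a comma-delimited ASCII file; missing values appear as .x,
* which can then be replaced with X in Word
outfile newid item01-item03 using "cogtest1.txt", wide comma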
Second step from STATA to PARSCALE • Open the file in Word • Replace .x with X • Replace ,10 with ,A • ,11 with ,B • Etc., so that every 2-digit response takes up a single column • The text file should now look square to your eye
Third and final step from STATA to PARSCALE • Missing data: go to the very top of your text file and insert a new first line • That first line should read: • NPKEY X,X,X,X,X,X,X,X,X,X,X,X,X,X… • You have to tell PARSCALE that this file is where to look for the missing value code • The X’s should look like your other lines: same number of X’s as items (1 per block) • Save the dataset as a text-only file with line breaks in your PARSCALE directory • A hypothetical picture of the finished file follows below
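For orientation, here is a hypothetical picture of the top of the finished text file for a 3-item test with 5-character ids (the ids, responses, and item count are made up; the first line is the missing-value key that NFNAME points to, and a real file would have one X per item):

NPKEY X,X,X
10001,2,1,X
10002,0,2,1
10003,1,X,2

Each line has the id in its own fixed columns and then one response character after each delimiter, which is the “square” layout the format statement on the coding slides expects.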
PARSCALE code – slide 1 • Open up PARSCALE and open a new document (or, better, open an old document and modify) • TITLE: need a title, needs to be 2 lines. • 2nd line of title. No need for semicolon. • >COMMENT if you want. Needs semicolon; • >FILES DFNAME='cogtest1.txt', NFNAME='cogtest1.txt', SAVE; • This tells PARSCALE where the data file is and where the missing data key is. The SAVE command tells it you want to save some output
PARSCALE code – slide 2 • >SAVE PARM='cogtest1.par', FIT='cogtest1.fit', SCORE='cogtest1.sco'; • This tells PARSCALE to save the parameters in a file (which I recommend), as well as a separate file with the theta scores (which I also recommend). • Fit statistics are less helpful so far.
PARSCALE code – slide 3 >INPut NTEst=1, LENGTH=109, NID=5, NTO=109; (5A1,37(1X,1A1)/40(1A1,1X)/32(1A1,1X)) • First line tells PARSCALE that we have 1 test, a test length of 109 items, 5 characters of ID, and 109 items in total • Second line tells PARSCALE exactly what the data look like: 5 alphanumeric characters for the ID, followed by 37 repetitions of (skip a column, read an alphanumeric character), then a line break, 40 repetitions of (read 1, skip 1), another line break, then 32 repetitions of (read 1, skip 1) • PARSCALE is REALLY picky about syntax; note the semicolons and parentheses
Advanced comments regarding line length • PARSCALE apparently wants to read up to 5 lines of data for each person • If there are more data than that, you can increase the number of characters in a line; it can certainly handle 80 • So 80 characters per line is 40 data points (remember the commas), times 5 lines is 200 data points • This will become an issue when we are equating and/or doing intensive DIF analyses
PARSCALE code – slide 4 >TEST1 TNAme='3MS_001', ITEMS=(1(1)109), NBL=109, SLOPE; • This tells PARSCALE that there is 1 test, whose name is 3MS_001; the items are numbered 1-109 by 1’s; there are 109 blocks; and the slope parameters start from a value of 1.0
PARSCALE code – slide 5 >BLOCK001 BNAME=('byr'), ORI=(0,1,2), MOD=(1,1,2), NIT=1, NCAT=3; • The first block’s name is byr; it was originally coded 0, 1, 2; the categories are recoded into 1, 1, 2; there is 1 item; and there are 3 categories • Need to put longer names (>8 characters) in single quotes • Space these commands so they look pretty (the first thing Leo Stam does) • Write a separate block for each item
PARSCALE code – slide 6 >BLOCK002 … >BLOCK109 …; >CALIB GRADED, LOGISTIC, SCALE=1.7, NQPT=11, CYCLES=2000, CRIT=.001; • This tells PARSCALE to calibrate the data using the graded response model and the logistic formulation; use a value of 1.7 for the D parameter; use 11 quadrature points; iterate up to 2000 times; and stop when nothing changes cycle-to-cycle by more than 0.001
PARSCALE code – slide 7 >SCORE EAP; • This tells PARSCALE to use expected a posteriori scoring • Another option is SCORE MLE; which uses maximum likelihood scoring • Use EAP if you have people with perfect scores; otherwise, those people will have missing scores • If no one has a perfect score, MLE is theoretically better • That’s actually it! Not so hard, was it??
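Putting the pieces from the last several slides together, a complete command file for a hypothetical 3-item graded response run might look like the sketch below. The item names, category structures, and 5-character ids are made up for illustration; the real 109-item file would use the INPUT, TEST, and BLOCK numbers shown earlier, with one >BLOCK line per item.

EXAMPLE CALIBRATION OF A HYPOTHETICAL 3-ITEM TEST
SECOND LINE OF THE TITLE
>FILES    DFNAME='cogtest1.txt', NFNAME='cogtest1.txt', SAVE;
>SAVE     PARM='cogtest1.par', FIT='cogtest1.fit', SCORE='cogtest1.sco';
>INPUT    NTEST=1, LENGTH=3, NID=5, NTO=3;
(5A1,3(1X,1A1))
>TEST1    TNAME='SHORT3', ITEMS=(1(1)3), NBL=3, SLOPE;
>BLOCK001 BNAME=('item1'), ORI=(0,1,2), MOD=(1,2,3), NIT=1, NCAT=3;
>BLOCK002 BNAME=('item2'), ORI=(0,1,2), MOD=(1,2,3), NIT=1, NCAT=3;
>BLOCK003 BNAME=('item3'), ORI=(0,1,2), MOD=(1,1,2), NIT=1, NCAT=3;
>CALIB    GRADED, LOGISTIC, SCALE=1.7, NQPT=11, CYCLES=2000, CRIT=.001;
>SCORE    EAP;

Block 3 collapses the two lowest categories with MOD=(1,1,2), the same trick the earlier slides use for sparse response options.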
PARSCALE output • Phase 0: typing errors, syntax errors, file specification errors; reads in the first 2 data lines • Phase 1: item preparation: are there empty cells? Calculations of initial slope (suppressed) and location values • Phase 2: E-M cycles, calibrating locations and slopes; displays a summary of the parameters for all the items, plus chi-squared item fit statistics • Phase 3: scores for each person
From PARSCALE back to STATA – slide 1 • Recall we used the SAVE command to save people’s scores: • >SAVE PARM='cogtest1.par', FIT='cogtest1.fit', SCORE='cogtest1.sco'; • Within an empty session of STATA, try to open up that .sco file (it won’t let you) • .infile newid001 _skip(11) theta001 setheta001 using …/cogtest1.sco • Copy the whole file name from your error statement and paste it into the infile command line
From PARSCALE back to STATA – slide 2 • Enter the data editor in STATA, and copy newid001, theta001, and setheta001 • Paste these variables into the data editor of your master file with your covariates • Make sure the newid’s are the same • .corr newid newid001 • .drop newid001 • Rescale theta to a mean of 100 and SD of 15 • .gen cogscore001 = int((15*theta001)+100) • Rescale setheta001 to the same metric • .gen cogerror001 = (15*setheta001) • A scripted alternative to the copy-and-paste step is sketched below
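A scripted alternative to copying and pasting through the data editor is to save the scores and do a match merge on newid. This is only a sketch: master.dta, the scores filename, and the _skip(11) column layout are assumptions carried over from the slides, and merge 1:1 requires a newer Stata than the release these slides were written for.

* read the PARSCALE score file (same _skip(11) layout as on the previous slide);
* the id is read in as newid here so it can serve directly as the merge key
infile newid _skip(11) theta001 setheta001 using "cogtest1.sco", clear
sort newid
save "scores001.dta", replace
* merge the scores into the master file that holds the covariates
use "master.dta", clear
merge 1:1 newid using "scores001.dta"
tabulate _merge
drop _merge
* rescale theta to a mean of 100 and SD of 15, and the SE to the same metric
gen cogscore001 = int(15*theta001 + 100)
gen cogerror001 = 15*setheta001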
Final comments • It looks more obscure than it is • I think these tricks and tools are foolproof • At least several colleagues have been able to code in PARSCALE using these outlines • There are other tricks • Can use saved parameters to score new people • Can use other IRT models, including Rasch / Rating Scale or Partial Credit models • Customer support is good, if all else fails • I hesitate to do this: pcrane@u.washington.edu