Steps to Performing a Cluster Analysis
Rod Funk, Chestnut Health Systems, Bloomington, IL
Performing a Cluster Analysis
• The first step is deciding which variables you want to cluster on.
• Data can be continuous, counts, or dichotomous.
• Are the variables measured at one time point, or do you want to look at trajectories across time?
• If across time, the data need to be in horizontal (wide) format: one row per adolescent.
• We name variables by time with a suffix for the wave: _0 for intake, _3 for 3 months, and so on (e.g., dcs_0, dcs_3, dcs_6).
• Cluster analysis also expects data for every variable used in the analysis. If a record is missing even one variable, no cluster membership is calculated for that record.
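The wide layout and the complete-data requirement described above can be illustrated with a small Python sketch. This is only an analogue, not the SPSS procedure the slides use; the records, the `id` and `wave` field names, and the helper functions `to_wide` and `complete_cases` are hypothetical.

```python
# Hypothetical long-format records: one row per adolescent per wave.
long_rows = [
    {"id": 1, "wave": 0, "dcs": 4},
    {"id": 1, "wave": 3, "dcs": 2},
    {"id": 1, "wave": 6, "dcs": 1},
    {"id": 2, "wave": 0, "dcs": 5},
    {"id": 2, "wave": 6, "dcs": 3},  # wave 3 is missing for this adolescent
]


def to_wide(rows, measure, waves):
    """Pivot long rows into one record per id, with keys like dcs_0, dcs_3."""
    wide = {}
    for row in rows:
        # Start every record with all wave variables set to missing (None).
        rec = wide.setdefault(row["id"], {f"{measure}_{w}": None for w in waves})
        rec[f"{measure}_{row['wave']}"] = row[measure]
    return wide


def complete_cases(wide):
    """Keep only records that have a value for every variable."""
    return {i: rec for i, rec in wide.items() if None not in rec.values()}


wide = to_wide(long_rows, "dcs", waves=[0, 3, 6])
usable = complete_cases(wide)
```

Here adolescent 2 has no `dcs_3` value, so that record is dropped from `usable`, which mirrors why a record with any missing variable ends up without a cluster assignment.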
Handling Missing Data
• Scale level: for a scale that has shown good internal consistency (alpha > .7), we calculate the score from the average of the answers, as long as there are at least 3 valid answers:
    compute dcs=rnd(mean.3(l3a15d,l3a16d,l3a17d,l3a18d,l3a19d)*5).
• Item level: random replacement of missing values:
    sort cases by loc xchk1.
    rmv ms2w=median(s2w,2).
    compute ms2w=rnd(ms2w).
• This replaces a missing s2w with the median of the 4 surrounding cases (the 2 valid values before and the 2 valid values after it in the sorted file).
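The two rules above can be sketched in Python to make the arithmetic concrete. This is a hedged analogue of the SPSS `mean.3`, `rnd`, and `rmv ... median` steps, not a replacement for them; the function names are hypothetical, and leaving a value missing when fewer than `span` valid neighbours exist on either side is an assumption about the intended behaviour.

```python
import math
from statistics import mean, median


def rnd(x):
    """Round half away from zero (Python's round() rounds halves to even)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def scale_score(items, min_valid=3):
    """Mean of the valid items times the item count, if enough items are valid."""
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None  # too few answers: the scale stays missing
    return rnd(mean(valid) * len(items))


def fill_with_nearby_median(values, span=2):
    """Replace each None with the median of `span` valid values before and after it."""
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            before = [x for x in values[:i] if x is not None][-span:]
            after = [x for x in values[i + 1:] if x is not None][:span]
            # Assumption: stay missing unless both sides have `span` valid values.
            if len(before) == span and len(after) == span:
                filled[i] = rnd(median(before + after))
    return filled


score = scale_score([1, 0, None, 1, None])         # 3 valid answers out of 5
too_sparse = scale_score([1, None, None, None, 0])  # only 2 valid answers
filled = fill_with_nearby_median([2, 4, None, 6, 9])
```

The list passed to `fill_with_nearby_median` is assumed to be already in the sorted case order, as the `sort cases` step arranges in the SPSS version.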
Handling Missing Data
• Replacement of variables across time
• For scales where items were not asked: use regression to predict the scale from the other items in the cluster at that wave, along with the intake and last-wave values.
• For a missing wave of data: as long as it is not the first or last wave, interpolate using the average of the two surrounding waves.
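The interpolation rule for a missing interior wave can be written as a short Python sketch. It is illustrative only; the function name `interpolate_middle_waves` is hypothetical, and leaving a wave missing when either neighbour is also missing is an assumption, since the slides do not cover that case.

```python
def interpolate_middle_waves(waves):
    """Fill a missing interior wave with the average of its two neighbouring waves."""
    filled = list(waves)
    # Skip index 0 and the last index: first and last waves are never interpolated.
    for i in range(1, len(waves) - 1):
        prev_val, next_val = waves[i - 1], waves[i + 1]
        if waves[i] is None and prev_val is not None and next_val is not None:
            filled[i] = (prev_val + next_val) / 2
    return filled


middle_filled = interpolate_middle_waves([10, None, 6, 4])
first_missing = interpolate_middle_waves([None, 5, 3])
```

In the first call the missing second wave becomes the average of 10 and 6; in the second call the missing first wave is left alone.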
Running the Cluster Analysis
• Sample syntax
    CLUSTER Zpci_0 Zrpci_3 Zrpci_6 Zrpci_9 Zrpci_12 Zpci_30
      Zici_0 Zrici_3 Zrici_6 Zrici_9 Zrici_12 ZSco01
      Zmdci_0 Zrdci_3 Zrdci_6 Zrdci_9 Zrdci_12 ZSco02
      Zl3v_0 Zrl3d_3 Zrl3d_6 Zrl3d_9 Zrl3d_12 Zl3d_30
      Zl3w_0 Zrl3e_3 Zrl3e_6 Zrl3e_9 Zrl3e_12 Zl3e_30
      Zmaxce_0 Zrmaxce_3 Zrmaxce_6 Zrmaxce_9 Zrmaxce_12 Zmaxce_30
      /METHOD WARD
      /MEASURE= SEUCLID
      /PRINT SCHEDULE
      /PLOTS NONE
      /SAVE CLUSTER(2,12) .
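To show what this syntax asks for, here is a minimal pure-Python sketch of agglomerative clustering with Ward's criterion on squared Euclidean distances, saving the membership for a range of cluster counts in the way `/SAVE CLUSTER(2,12)` saves the 2- through 12-cluster solutions. It is not SPSS's implementation; the function name `ward_clusters`, the toy data, and the label numbering are assumptions, and the inputs are assumed to be already standardized (as the Z-prefixed variables suggest).

```python
from itertools import combinations


def ward_clusters(points, k_min, k_max):
    """Merge clusters by Ward's criterion; return {k: labels} for k_min..k_max."""
    # Each cluster is stored as (member indices, centroid).
    clusters = {i: ([i], list(p)) for i, p in enumerate(points)}
    solutions = {}
    next_id = len(points)

    def merge_cost(a, b):
        # Increase in within-cluster sum of squares if a and b were merged.
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
        return len(ma) * len(mb) / (len(ma) + len(mb)) * sq_dist

    while True:
        k = len(clusters)
        if k_min <= k <= k_max:
            # Number clusters 1..k in order of their first case.
            order = sorted(clusters, key=lambda c: min(clusters[c][0]))
            labels = [0] * len(points)
            for label, cid in enumerate(order, start=1):
                for member in clusters[cid][0]:
                    labels[member] = label
            solutions[k] = labels
        if k <= k_min or k == 1:
            break
        a, b = min(combinations(clusters, 2), key=lambda pair: merge_cost(*pair))
        (ma, ca), (mb, cb) = clusters.pop(a), clusters.pop(b)
        n = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / n for x, y in zip(ca, cb)]
        clusters[next_id] = (ma + mb, centroid)
        next_id += 1
    return solutions


# Hypothetical toy data: three tight pairs, two of them close together.
points = [(0, 0), (0, 1), (4, 0), (4, 1), (20, 0), (20, 1)]
solutions = ward_clusters(points, k_min=2, k_max=3)
```

With this toy data the 3-cluster solution keeps the three pairs separate, and the 2-cluster solution merges the two nearby pairs. The search over all cluster pairs at every step is kept simple for readability and would be slow on a large sample.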
Demonstration
• Purpose
• To show how to take the results of the cluster analysis and create a table and figures for validating and deciding on the proper number of clusters.
• Will cover pivot tables in SPSS output, pasting into Excel, and graphing in Excel.