Discriminant Analysis

Discriminant Analysis • Useful for classifying a sampling unit into one of two or more groups. • The discriminant function is a linear combination of several predictor variables. • The discriminant function maximizes the between-group variation relative to the within-group variation.
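A minimal sketch of that idea for two groups and two predictors, using Fisher's classical solution (weights proportional to the inverse within-group scatter matrix times the difference in group means). The data, the function names, and the midpoint cutoff are illustrative assumptions, not part of the slides.

```python
def mean_vector(rows):
    """Column means of a list of equal-length numeric rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, center):
    """Sum of squares and cross-products of the rows about `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                s[a][b] += d[a] * d[b]
    return s


def fisher_weights_2d(group_a, group_b):
    """Weights w = Sw^-1 (mean_a - mean_b) for exactly two predictors."""
    m_a, m_b = mean_vector(group_a), mean_vector(group_b)
    s_a, s_b = scatter_matrix(group_a, m_a), scatter_matrix(group_b, m_b)
    # Pooled within-group scatter matrix.
    sw = [[s_a[i][j] + s_b[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]


def discriminant_score(weights, row):
    """Value of the linear discriminant function for one sampling unit."""
    return sum(w * x for w, x in zip(weights, row))


# Hypothetical data: two predictors measured on four units per group.
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
group_b = [(6.0, 1.0), (7.0, 2.0), (6.5, 1.5), (7.5, 2.5)]

weights = fisher_weights_2d(group_a, group_b)
# Assumed cutoff: the score of the midpoint between the two group means.
midpoint = [(a + b) / 2 for a, b in zip(mean_vector(group_a), mean_vector(group_b))]
cutoff = discriminant_score(weights, midpoint)
```

A unit whose score lies above `cutoff` would be assigned to group A, otherwise to group B. The midpoint cutoff assumes equal group sizes and equal misclassification costs.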

A test on the discriminant function • When the discriminant model is statistically significant, we reject H0: the group means are equal.
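One common statistic for this test is Wilks’ lambda, the ratio of the determinant of the within-group scatter matrix to that of the total scatter matrix. Values near 0 point to well-separated group means. The sketch below handles two predictors only and uses Bartlett's chi-square approximation; the data and function name are assumptions made for illustration.

```python
import math


def wilks_lambda_2d(groups):
    """Return (Wilks' lambda, Bartlett chi-square, df) for two predictors."""
    p = 2
    all_rows = [row for g in groups for row in g]
    n_total, n_groups = len(all_rows), len(groups)

    def mean(rows):
        return [sum(r[j] for r in rows) / len(rows) for j in range(p)]

    def scatter(rows, center):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            d = [r[0] - center[0], r[1] - center[1]]
            for a in range(p):
                for b in range(p):
                    s[a][b] += d[a] * d[b]
        return s

    def det2(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    # Within-group scatter: each group measured about its own mean.
    within = [[0.0, 0.0], [0.0, 0.0]]
    for g in groups:
        s = scatter(g, mean(g))
        for a in range(p):
            for b in range(p):
                within[a][b] += s[a][b]

    # Total scatter: all units measured about the grand mean.
    total = scatter(all_rows, mean(all_rows))

    lam = det2(within) / det2(total)
    # Bartlett's approximation: chi-square with p * (g - 1) degrees of freedom.
    chi2 = -(n_total - 1 - (p + n_groups) / 2) * math.log(lam)
    df = p * (n_groups - 1)
    return lam, chi2, df


# Hypothetical data: two groups, two predictors.
group_a = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5)]
group_b = [(6.0, 1.0), (7.0, 2.0), (6.5, 1.5), (7.5, 2.5)]
lam, chi2, df = wilks_lambda_2d([group_a, group_b])
```

H0 would be rejected when `chi2` exceeds the chi-square critical value for `df` degrees of freedom (5.99 at the 5% level when `df` is 2).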

What else do we get out of discriminant analysis? • Which predictors differ across the groups? • Which groups differ from one another?

When is Discriminant Analysis useful? • When you want to profile the sampling units. • When you want to predict “bank failures”. • When you want to assess which customers are credit risks. • When you want to screen for women who are susceptible to breast cancer, for example. • In each case, the aim is to increase the number of correct classifications.

How does it work? • In six stages.

Stage 1: Objectives • Determine which groups are statistically different. • Identify the predictors that contribute most to the group differences. • Establish classification rules. • Develop the discriminant functions.

Stage 2: Designing the analysis • The “response variable” must be categorical. • Categories must be mutually exclusive. • Decide on the number of categories, the predictor variables, and the sample size [rule of thumb: sample size > 20 times the number of predictors, with at least 20 observations in each category]. • Divide the data into two segments: build the discriminant model using one segment and validate it using the other.
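The sample-size rule of thumb and the two-segment split can be sketched as follows. The function names, the group labels, and the 50/50 split fraction are assumptions; the slides do not specify how large each segment should be.

```python
import random


def sample_size_ok(group_counts, n_predictors, per_predictor=20, per_group=20):
    """Check the rule of thumb: N > 20 * predictors and >= 20 units per group."""
    n_total = sum(group_counts.values())
    enough_overall = n_total > per_predictor * n_predictors
    enough_per_group = min(group_counts.values()) >= per_group
    return enough_overall and enough_per_group


def split_two_segments(rows, analysis_fraction=0.5, seed=0):
    """Randomly split rows into an analysis segment and a validation segment."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * analysis_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical design: 4 predictors, 45 failed and 60 surviving banks.
design_ok = sample_size_ok({"failed": 45, "survived": 60}, n_predictors=4)

# Hypothetical row identifiers standing in for the sampling units.
analysis, validation = split_two_segments(list(range(10)))
```

With groups of very different sizes, splitting within each group separately keeps the group proportions similar in both segments.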

Stage 3: Check the validity of the assumptions • The “response variable” has at least two groups. • The data follow a multivariate normal distribution. • The unknown covariance matrices of the groups are equal [check using Box’s M test]. • Apply remedies for any violated assumption: increase the sample size, transform the data, consider a quadratic rather than a linear discriminant function, eliminate multicollinearity among the predictors, or remove outliers and overly influential observations.

Stage 4: Estimate the model and assess its fit • Use either the simultaneous method [which uses all predictors, even if some are weak] or the stepwise method [which uses only the best predictors] of estimation. • Use one of the following criteria to assess the fit: Wilks’ lambda, Hotelling’s trace, Pillai’s trace, Roy’s greatest eigenvalue, Mahalanobis distance, Rao’s V measure, or Press’s Q statistic based on the number of correct classifications [works well for large samples].
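Of these criteria, Press’s Q is the simplest to compute by hand. A sketch, assuming the usual textbook form Q = (N - nK)^2 / (N(K - 1)), where N is the sample size, n the number of correct classifications, and K the number of groups; the counts below are hypothetical.

```python
def press_q(n_total, n_correct, n_groups):
    """Press's Q: compares the observed correct classifications with chance."""
    return (n_total - n_correct * n_groups) ** 2 / (n_total * (n_groups - 1))


# Hypothetical result: 80 of 100 units classified correctly into 2 groups.
q = press_q(n_total=100, n_correct=80, n_groups=2)

# Q is compared with a chi-square critical value on 1 degree of freedom.
CHI2_CRITICAL_1DF_05 = 3.84
better_than_chance = q > CHI2_CRITICAL_1DF_05
```

Because Q grows with N, even modest hit ratios become significant in large samples, so it helps to read Q alongside the hit ratio itself.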

Stage 5: Interpretation • Identify the “best” predictors for classification using: discriminant weights, discriminant loadings [= correlation between the discriminant function and a predictor], and F values, where larger values signify greater discriminating ability. • As in factor analysis, rotate the axes when there are several discriminant functions. • Use the “potency index” to identify which predictors discriminate best across several discriminant functions. • The longer the “vector” from the origin to a discriminant loading, the more important the predictor.
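Discriminant loadings, as defined above, can be sketched as plain correlations between each predictor and the discriminant scores. The weights and data here are hypothetical, and the correlation is taken over the whole sample; some packages report pooled within-group correlations instead.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def discriminant_loadings(rows, weights):
    """Correlation of each predictor with the discriminant scores."""
    scores = [sum(w * v for w, v in zip(weights, row)) for row in rows]
    return [pearson([row[j] for row in rows], scores)
            for j in range(len(weights))]


# Hypothetical data (both groups stacked) and hypothetical weights.
rows = [(2.0, 3.0), (3.0, 3.5), (2.5, 4.0), (3.5, 4.5),
        (6.0, 1.0), (7.0, 2.0), (6.5, 1.5), (7.5, 2.5)]
weights = (-1.0, 1.0)

loadings = discriminant_loadings(rows, weights)
```

Loadings are less sensitive to multicollinearity among the predictors than the raw weights, which is one reason to inspect both.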

Stage 6: Validation methods • Split the data into two segments. • Collect new data and see how the discriminant function performs on it. • Profile the groups on several new predictors that were not used in the analysis.
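For the first two methods, performance is usually summarized by the hit ratio, the share of units classified correctly in the segment that was not used for estimation. A sketch for two groups, assuming weights and a cutoff already estimated elsewhere; all values and labels are hypothetical.

```python
def classify(row, weights, cutoff, label_above="A", label_below="B"):
    """Assign a unit to a group by comparing its score with the cutoff."""
    score = sum(w * v for w, v in zip(weights, row))
    return label_above if score > cutoff else label_below


def hit_ratio(holdout, weights, cutoff):
    """Share of hold-out units whose predicted group matches the actual group."""
    hits = sum(1 for row, actual in holdout
               if classify(row, weights, cutoff) == actual)
    return hits / len(holdout)


# Hypothetical weights and cutoff from the analysis segment.
weights = (-1.0, 1.0)
cutoff = -2.0

# Hypothetical validation segment: (predictor values, actual group).
holdout = [((2.2, 3.1), "A"), ((3.1, 4.0), "A"),
           ((6.8, 1.9), "B"), ((4.0, 2.5), "B")]

ratio = hit_ratio(holdout, weights, cutoff)
```

A hit ratio on the validation segment that is much lower than on the analysis segment suggests the function is overfitted to the data used to build it.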

What follows? • An example, • Comments, • Questions. • Thank you!!
