Model Building and Validation: An overview using the discriminant analysis technique
Assumption for this lecture
• There are several types of models, but this lecture assumes we are building one with a two-valued dependent variable.
• e.g. We want to predict who will respond to a mailing: the dependent variable has two values, responder/non-responder.
• e.g. We want to predict who is at risk for a heart attack: the dependent variable has two values, had a heart attack/did not have a heart attack.
What will it tell us?
• The model is built from past data and generates a score that predicts how likely it is that something will occur.
• (e.g. What is the probability that this person will respond to the mailing?)
The Modeling Process
• Sample Design
• Data Collection and Cleaning
• Sample Selection
• Data Aggregation
• Build Model
• Test the Model
Sample Design
• What data do you need?
• Where is it?
• How much is needed?
• What is the dependent variable?
Data Collection and Cleaning
• Read and validate the data.
• Deal with missing values.
• Delete unwanted records and variables.
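The cleaning steps above can be sketched in a few lines. This is only an illustration, not the lecture's own procedure: the field names (`age`, `income`, `responded`), the plausible age range, and the choice of median imputation for missing values are all assumptions made for the example.

```python
import statistics

# Hypothetical raw records; every field name and value here is illustrative.
raw = [
    {"id": "1", "age": "34",  "income": "52000", "responded": "1"},
    {"id": "2", "age": "",    "income": "61000", "responded": "0"},
    {"id": "3", "age": "29",  "income": "n/a",   "responded": "0"},
    {"id": "4", "age": "41",  "income": "48000", "responded": ""},
    {"id": "5", "age": "230", "income": "39000", "responded": "1"},
]

def to_number(text):
    """Return a float, or None when the field is blank or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def clean(records):
    rows = []
    for rec in records:
        target = to_number(rec["responded"])
        if target not in (0.0, 1.0):
            continue  # delete records with no usable dependent variable
        age = to_number(rec["age"])
        if age is not None and not 0 <= age <= 120:
            age = None  # validation: treat an implausible age as missing
        # The "id" field is an unwanted variable for modeling, so it is not copied.
        rows.append({"age": age,
                     "income": to_number(rec["income"]),
                     "responded": int(target)})
    # Assumed policy for missing values: replace them with the observed median.
    for field in ("age", "income"):
        median = statistics.median(r[field] for r in rows if r[field] is not None)
        for r in rows:
            if r[field] is None:
                r[field] = median
    return rows

cleaned = clean(raw)
```

Other policies (dropping incomplete records, or coding "missing" as its own class) may suit a given data set better; the median is used here only because it is simple.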
Selecting a sample
• Choose a sample to analyze.
• For 0/1 regression (the discriminant analysis equivalent), use approximately equal numbers of records of each type.
• Select twice the number of records you need to build the model, so you can set aside 50% of the data for validation.
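A minimal sketch of the sampling rule above: draw twice the required number of records from each class, then split each class in half between an analysis sample and a validation sample. The function name `balanced_split`, the `responded` field, and the toy population are hypothetical.

```python
import random

def balanced_split(records, target_key, n_per_class, seed=0):
    """Draw 2 * n_per_class records of each class and halve them
    into an analysis sample and a validation sample."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    analysis, validation = [], []
    for value in (0, 1):
        group = [r for r in records if r[target_key] == value]
        needed = 2 * n_per_class
        if len(group) < needed:
            raise ValueError(
                f"need {needed} records with {target_key}={value}, found {len(group)}"
            )
        drawn = rng.sample(group, needed)  # sampling without replacement
        analysis.extend(drawn[:n_per_class])
        validation.extend(drawn[n_per_class:])
    return analysis, validation

# Toy population: 1000 records, of which 5% are responders.
population = [{"id": i, "responded": 1 if i % 20 == 0 else 0} for i in range(1000)]
analysis, validation = balanced_split(population, "responded", n_per_class=20)
```

Each of the two samples ends up with 20 responders and 20 non-responders, even though responders are rare in the population.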
Data Aggregation
• Data from multiple sources is merged.
• This may occur as a first step before data cleaning, depending on the situation.
• New variables are defined (e.g. the ratio of satisfactory trades to total trades).
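As an illustration of merging sources and defining a new variable, the sketch below joins a hypothetical customer table to a hypothetical trade file and derives the ratio of satisfactory trades to total trades. All names and values are invented for the example.

```python
from collections import defaultdict

# Two hypothetical sources keyed by customer id.
customers = {
    101: {"age": 34, "responded": 1},
    102: {"age": 51, "responded": 0},
    103: {"age": 45, "responded": 0},
}
trades = [
    {"customer_id": 101, "satisfactory": True},
    {"customer_id": 101, "satisfactory": True},
    {"customer_id": 101, "satisfactory": True},
    {"customer_id": 101, "satisfactory": False},
    {"customer_id": 102, "satisfactory": True},
    {"customer_id": 102, "satisfactory": False},
]

def merge_with_trade_ratio(customers, trades):
    """Merge trade counts onto each customer and add the derived ratio."""
    totals = defaultdict(int)
    good = defaultdict(int)
    for trade in trades:
        totals[trade["customer_id"]] += 1
        if trade["satisfactory"]:
            good[trade["customer_id"]] += 1
    merged = {}
    for cid, info in customers.items():
        total = totals.get(cid, 0)
        # A customer with no trades has no defined ratio; keep it as missing.
        ratio = good.get(cid, 0) / total if total else None
        merged[cid] = {**info, "total_trades": total, "sat_ratio": ratio}
    return merged

merged = merge_with_trade_ratio(customers, trades)
```

Customer 103 has no trades, so the ratio is left as `None` and would be handled as a missing value during cleaning.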
Model Building
• Break up each independent variable into classes. Each class should contain roughly 2% to 10% of the observations.
• Run crosstabs of each variable against the dependent variable.
• Redefine the independent variable as multiple dummy (0/1) variables.
• Run the regression on the dummies.
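The first three steps above (classing, crosstabs, dummy coding) might look like the sketch below. The data and cut points are made up, and with only ten observations the classes are far larger than the 2% to 10% guideline; a real sample would use many more records and finer classes. Leaving out one class as the reference category is an assumption of this sketch, made so that the dummies are not collinear with the regression intercept.

```python
from bisect import bisect_right
from collections import Counter

# Toy data: one independent variable (age) and the 0/1 dependent variable.
ages      = [22, 25, 31, 38, 44, 47, 52, 58, 63, 70]
responded = [0,  0,  0,  1,  0,  1,  1,  1,  0,  1]

# Illustrative cut points: class 0 is <30, 1 is 30-44, 2 is 45-59, 3 is >=60.
cut_points = [30, 45, 60]

def assign_class(value, cuts):
    """Index of the class that value falls into (lower bound inclusive)."""
    return bisect_right(cuts, value)

classes = [assign_class(a, cut_points) for a in ages]

# Crosstab: count of observations for each (class, outcome) pair.
crosstab = Counter(zip(classes, responded))
n_classes = len(cut_points) + 1
for k in range(n_classes):
    zeros, ones = crosstab[(k, 0)], crosstab[(k, 1)]
    print(f"class {k}: 0s={zeros} 1s={ones} rate={ones / (zeros + ones):.2f}")

# Dummy coding: one 0/1 column per class, omitting class 0 as the reference.
dummies = [[1 if c == k else 0 for k in range(1, n_classes)] for c in classes]
```

The crosstab is where classes with similar response rates can be spotted and combined before the dummies are passed to the regression.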
Model Building, contd.
• Eliminate variables that are not significant, until you have a model with variables that are significant and intuitively meaningful.
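One way to sketch the elimination step is backward elimination on an ordinary least-squares 0/1 regression: refit, drop the variable with the weakest t-statistic, and repeat until every remaining variable clears a cutoff. The cutoff of |t| >= 2 (roughly the 5% level), the helper names, and the toy data are assumptions of this sketch; the lecture does not specify a procedure, and the check for intuitive meaning still has to be done by hand.

```python
import math

def invert(a):
    """Invert a full-rank square matrix by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols_t_stats(x_rows, y):
    """Fit y = b0 + sum(b_j * x_j); return (coefficients, t-statistics),
    with the intercept in position 0."""
    X = [[1.0] + list(r) for r in x_rows]
    n, k = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
           for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(beta[a] * X[i][a] for a in range(k)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - k)  # residual variance
    t = [beta[a] / math.sqrt(sigma2 * inv[a][a]) for a in range(k)]
    return beta, t

def backward_eliminate(columns, y, t_cutoff=2.0):
    """Drop the weakest variable until every remaining |t| >= t_cutoff."""
    kept = list(columns)
    while kept:
        rows = list(zip(*(columns[name] for name in kept)))
        _, t = ols_t_stats(rows, y)
        weakest = min(range(len(kept)), key=lambda j: abs(t[j + 1]))
        if abs(t[weakest + 1]) >= t_cutoff:
            break
        kept.pop(weakest)
    return kept

# Toy data: d1 drives the outcome, d2 is unrelated to it.
d1 = [1 if i < 20 else 0 for i in range(40)]
d2 = [i % 2 for i in range(40)]
y = d1[:]
for i in (0, 1, 20, 21):
    y[i] = 1 - y[i]  # flip a few outcomes so the fit is not perfect

kept = backward_eliminate({"d1": d1, "d2": d2}, y)
```

With a 0/1 dependent variable the usual t-statistics are only approximate, so the cutoff is best treated as a screening rule rather than an exact significance test.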
Testing the model
• Perform the Kolmogorov-Smirnov (K-S) test to check how well the model performs on:
  • The analysis sample
  • The validation sample
  • The total sample
• If the model separates the 0s and the 1s well in each of the three cases, you have a good model.
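The K-S statistic used here is the largest gap between the cumulative score distributions of the 0s and the 1s: 0 means no separation and 1 means perfect separation. The sketch below computes it for an analysis sample, a validation sample, and the two combined. The scores and labels are invented for the example, and `ks_statistic` is a hypothetical helper, not a library function.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions
    of the 0s and the 1s."""
    n1 = sum(labels)
    n0 = len(labels) - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    seen0 = seen1 = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all records tied at this score before measuring the gap.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                seen1 += 1
            else:
                seen0 += 1
            j += 1
        best = max(best, abs(seen0 / n0 - seen1 / n1))
        i = j
    return best

# Invented model scores and actual outcomes for the two samples.
analysis_scores   = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
analysis_labels   = [0,   0,   0,   1,   0,   1,   1,   1]
validation_scores = [0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85]
validation_labels = [0,    0,    1,    0,    0,    1,    1,    1]

ks_analysis = ks_statistic(analysis_scores, analysis_labels)
ks_validation = ks_statistic(validation_scores, validation_labels)
ks_total = ks_statistic(analysis_scores + validation_scores,
                        analysis_labels + validation_labels)
```

A K-S value on the validation sample that is much lower than on the analysis sample suggests the model has been fitted too closely to the analysis data.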