Goal: predict who survived the Titanic disaster. Hypothesis: women and children first. Get data: read the dataset into Excel, R, etc. Data management: some Age values are missing, so analyze gender only. Statistics & analysis: 74% of women survived versus 19% of men. Submit predictions: 320 / 418 = 76.5% correct.
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing, analyze gender only
Statistics & Analysis: 74% of women survived, 19% of men
Submit Predictions: 320 / 418 = 76.5%
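The gender-only step can be sketched in a few lines of Python. This is a minimal illustration, not the deck's actual workflow: the `passengers` records are hypothetical toy data rather than the real Titanic file, and the helper names `survival_rate_by_sex` and `predict` are made up for this example.

```python
# Hypothetical toy records (not the real Titanic data): (sex, survived).
passengers = [
    ("female", 1), ("female", 1), ("female", 0), ("female", 1),
    ("male", 0), ("male", 0), ("male", 1), ("male", 0), ("male", 0),
]

def survival_rate_by_sex(rows):
    """Return {sex: fraction survived} for (sex, survived) pairs."""
    totals, survivors = {}, {}
    for sex, survived in rows:
        totals[sex] = totals.get(sex, 0) + 1
        survivors[sex] = survivors.get(sex, 0) + survived
    return {sex: survivors[sex] / totals[sex] for sex in totals}

def predict(sex, rates):
    """Predict survival (1) when the group's observed rate exceeds 50%."""
    return 1 if rates.get(sex, 0.0) > 0.5 else 0

rates = survival_rate_by_sex(passengers)
predictions = [predict(sex, rates) for sex, _ in passengers]

# Accuracy is correct predictions divided by total predictions,
# the same ratio as the slide's 320 / 418.
correct = sum(p == y for p, (_, y) in zip(predictions, passengers))
accuracy = correct / len(passengers)
```

With rates like the slide's 74% and 19%, this rule reduces to "predict that every woman survived and every man did not".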
Age: All N = 891, Data N = 714, Missing N = 177
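A count like this can be reproduced with a short helper. The sketch below assumes missing ages are represented as `None`; the `ages` list is hypothetical toy data and `missing_summary` is a name invented for this example.

```python
# Hypothetical toy column: None marks a passenger whose age was not recorded.
ages = [22.0, 38.0, None, 35.0, None, 54.0, 2.0, None]

def missing_summary(values):
    """Count all rows, rows with a value, and rows where the value is missing."""
    total = len(values)
    missing = sum(v is None for v in values)
    return {"all": total, "data": total - missing, "missing": missing}

summary = missing_summary(ages)

# The slide's counts satisfy the same identity: all = data + missing.
slide_counts_consistent = (714 + 177 == 891)
```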
Decision Trees
• Dependent variable (Y)
  • Continuous
  • Categorical
• Independent variables (X's)
  • Continuous
  • Categorical
At each node, the decision tree looks for the split of the sample that produces the greatest differentiation in Y.
Decision Trees
• Splits are chosen to maximize the data likelihood (minimize deviance).
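As a rough illustration of "minimize deviance", the sketch below scores candidate thresholds on one continuous X against a binary Y and keeps the threshold whose two child nodes have the lowest total binomial deviance. It assumes a 0/1 outcome and midpoint thresholds; the function names and the toy `ages` and `survived` values are hypothetical, and real implementations differ in detail.

```python
import math

def deviance(labels):
    """Binomial deviance of a node: -2 * log-likelihood at the node's own rate."""
    n = len(labels)
    k = sum(labels)
    dev = 0.0
    for count in (k, n - k):
        if count > 0:  # treat 0 * log(0) as 0
            dev -= 2 * count * math.log(count / n)
    return dev

def best_threshold(xs, ys):
    """Try each midpoint between sorted distinct x values.

    Returns (threshold, total child deviance); threshold is None when
    no split improves on the unsplit node.
    """
    values = sorted(set(xs))
    best = (None, deviance(ys))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        total = deviance(left) + deviance(right)
        if total < best[1]:
            best = (t, total)
    return best

# Hypothetical toy data: the three youngest passengers survived.
ages = [2, 4, 8, 30, 40, 50]
survived = [1, 1, 1, 0, 0, 0]
threshold, child_deviance = best_threshold(ages, survived)
```

Here the split between ages 8 and 30 separates the two outcomes completely, so both children are pure and their deviance is zero.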
Prediction and Missing Values: Is Age correlated or associated with other variables?
Gender and Age
• The tree grows by optimizing only the split at the current node rather than optimizing the entire tree
• The tree stops growing when further splits become ineffective
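The two bullets above can be sketched as a small greedy tree builder. This is an illustrative toy, not any particular package's algorithm: it assumes numeric features, a 0/1 outcome, deviance as the split criterion, and a hypothetical `min_gain` stopping threshold. The rows are made-up `(is_male, age)` pairs.

```python
import math

def deviance(labels):
    """Binomial deviance of a node, treating 0 * log(0) as 0."""
    n, k = len(labels), sum(labels)
    dev = 0.0
    for count in (k, n - k):
        if count > 0:
            dev -= 2 * count * math.log(count / n)
    return dev

def best_split(rows, labels):
    """Search every feature and midpoint threshold; return (gain, feature, threshold)."""
    parent = deviance(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - deviance(left) - deviance(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow(rows, labels, min_gain=1e-9):
    """Greedy growth: take the locally best split, stop when no split helps."""
    gain, f, t = best_split(rows, labels)
    if f is None or gain <= min_gain:
        # Leaf: predict the majority class of the node.
        return {"predict": int(2 * sum(labels) >= len(labels))}
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], min_gain),
        "right": grow([r for r, _ in right], [y for _, y in right], min_gain),
    }

def predict(tree, row):
    """Follow the splits down to a leaf and return its prediction."""
    while "predict" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["predict"]

# Hypothetical toy rows: (is_male, age). Women and boys survive, men do not.
rows = [(0, 30), (0, 45), (0, 60), (1, 5), (1, 8), (1, 30), (1, 40), (1, 50)]
labels = [1, 1, 1, 1, 1, 0, 0, 0]
tree = grow(rows, labels)
```

On this toy data the first split is on gender and the second, among males only, is on age. Each split is chosen without looking ahead, which is what "optimizing only the split from the current node" means.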
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Some Age data missing, analyze gender only
Statistics & Analysis
Submit Predictions
Goal: Predict who survived the Titanic disaster
Hypotheses: Women and children first
Get Data: Read dataset into Excel, R, etc.
Data Management: Age + Gender
Statistics & Analysis
Submit Predictions
Decision Trees
• Popular implementations
  • CART: Classification And Regression Tree
  • CHAID: CHi-squared Automatic Interaction Detector
• CHAID allows multiple-branch splits, giving a wider tree
• CART uses binary splits
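To make the binary versus multi-branch distinction concrete, the sketch below computes a Pearson chi-squared statistic for the same hypothetical three-level predictor, once kept as three branches and once grouped into two. It shows only the statistic that gives CHAID its name. The full procedure, with its category merging and significance testing, is not reproduced here, and the counts are invented for illustration.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table (rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows are three levels of a categorical predictor,
# columns are (died, survived).
three_way = [[10, 30], [20, 20], [30, 10]]

# The same passengers with levels 2 and 3 grouped together,
# as a binary split would have to do.
binary = [[10, 30], [50, 30]]

stat_three_way = chi_squared(three_way)
stat_binary = chi_squared(binary)
```

The two statistics have different degrees of freedom, so they should be compared through their p-values rather than as raw numbers.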