Goal: Predict who survived the Titanic disaster.
• Hypotheses
• Get Data
• Data Management
• Statistics & Analysis
• Submit Predictions
Score = (Number of Passengers Whose Fate Is Correctly Predicted) / (Number of Passengers in Test Dataset)
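The score above is plain accuracy: the share of test passengers whose fate was predicted correctly. A minimal R sketch, assuming predictions and true outcomes are both coded 0 = died, 1 = survived; `score_predictions` is a hypothetical helper and the five labels are made up for illustration:

```r
# Hypothetical helper: share of passengers whose fate was predicted correctly.
# Assumes both vectors use the same coding (0 = died, 1 = survived).
score_predictions <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  sum(predicted == actual) / length(actual)
}

# Made-up labels for five passengers (not Titanic data)
predicted <- c(1, 0, 0, 1, 0)
actual    <- c(1, 0, 1, 1, 0)
score_predictions(predicted, actual)  # 0.8 (4 of 5 correct)
```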
Training and Test Data
Training Data (N = 891; 39% survived) → Develop Model
Test Data (N = 418)
All Titanic Passengers (N = 2,223)
• How similar is the test data to the training data?
• If similar, the model should do well.
• If different, the model could perform poorly.
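One way to check how similar the two sets are is to compare the share of passengers in each level of a predictor. The slides' training and test files are not bundled with R, so this sketch assumes base R's built-in `Titanic` table instead (2,201 people, crew included, with only Class, Sex, Age and Survived) and a hypothetical random split sized like the 418-passenger test set; `compare_shares` is an illustrative name:

```r
# Expand base R's Titanic contingency table to one row per person
tab <- as.data.frame(Titanic)
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Hypothetical split: 418 random people as "test", the rest as "train"
set.seed(42)
test_rows <- sample(nrow(people), 418)
train <- people[-test_rows, ]
test  <- people[test_rows, ]

# Share of each level of a predictor in the two sets, side by side
compare_shares <- function(var) {
  rbind(train = prop.table(table(train[[var]])),
        test  = prop.table(table(test[[var]])))
}
compare_shares("Class")
compare_shares("Sex")

# Chi-squared test of homogeneity: a small p-value suggests the sets differ
chisq.test(rbind(table(train$Class), table(test$Class)))
```

Because this split is random, the shares should come out close; a train/test pair that was not split at random could show much larger gaps.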
Kitchen-Sink Model (every available variable): Over-Fitting?
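Decision Tree Pruning (tree depth limited to 2)
library(rpart)
# Classification tree that may split at most two levels deep
model.6 <- rpart(survived ~ sex + age + pclass + sibsp + parch + fare + embarked,
                 data = train_data, method = "class",
                 control = rpart.control(maxdepth = 2))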
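Capping `maxdepth` stops the tree early; rpart can also prune a fully grown tree back using its cross-validated complexity table (`cptable`). A sketch, assuming base R's built-in `Titanic` table in place of the slides' `train_data`, that keeps the `cp` value with the lowest cross-validated error (`xerror`). Taking the minimum rather than, say, the one-standard-error rule is a choice made here, not necessarily the author's:

```r
library(rpart)

tab <- as.data.frame(Titanic)
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Grow with no complexity penalty (cp = 0); only the node-size limits stop it
set.seed(1)  # rpart's built-in 10-fold cross-validation uses random folds
full_tree <- rpart(Survived ~ Class + Sex + Age, data = people,
                   method = "class", control = rpart.control(cp = 0))

# Choose the complexity parameter with the lowest cross-validated error
cp_table <- full_tree$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Snip off splits whose complexity value falls below best_cp
pruned_tree <- prune(full_tree, cp = best_cp)
printcp(pruned_tree)
```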
Confusion Matrix: Random Forest vs. Decision Tree vs. Gender Model
• False Negatives
• False Positives
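A confusion matrix cross-tabulates predicted against actual outcomes, so false negatives and false positives can be read off directly. A sketch for a gender model (every female predicted to survive, every male to die), assuming base R's built-in `Titanic` table rather than the slides' data:

```r
tab <- as.data.frame(Titanic)
people <- tab[rep(seq_len(nrow(tab)), tab$Freq), c("Class", "Sex", "Age", "Survived")]

# Gender model: predict that every female survived and every male died
predicted <- factor(ifelse(people$Sex == "Female", "Yes", "No"), levels = c("No", "Yes"))
actual <- people$Survived  # factor with levels "No", "Yes"

# Rows = predictions, columns = true outcomes
cm <- table(Predicted = predicted, Actual = actual)
false_negatives <- cm["No", "Yes"]  # survived, but predicted to die
false_positives <- cm["Yes", "No"]  # died, but predicted to survive
accuracy <- sum(diag(cm)) / sum(cm)
cm
```

The same `table()` call works for any model's predictions, which makes it easy to put the random forest, decision tree and gender model side by side.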
Model Ceiling Seems Realistic
[Chart: model results out of the 418 test passengers, compared with the Gender Model]
Why a Model Ceiling?
Consider 4 pairs of passengers with very similar predictor variables: within each pair, one survived and the other did not. At some point, the data simply lacks the variables needed to make an accurate prediction.
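The ceiling can be made concrete: when several people share exactly the same predictor values, the best any model using only those variables can do is predict the majority outcome for that group. A sketch, assuming base R's built-in `Titanic` table (Class, Sex and Age only, so far coarser than the slides' variables), that counts groups with mixed outcomes and computes this upper bound on accuracy; the variable names are illustrative:

```r
# Titanic is a 4-way table: Class x Sex x Age x Survived
# For each Class/Sex/Age group, the best single guess is the majority outcome
majority_counts <- apply(Titanic, c(1, 2, 3), max)
best_possible_accuracy <- sum(majority_counts) / sum(Titanic)

# Groups where identical predictors still led to both outcomes
mixed_groups <- apply(Titanic, c(1, 2, 3), function(x) all(x > 0))
sum(mixed_groups)
best_possible_accuracy
```

This is an in-sample bound for these three variables only; extra variables can split groups further, but the ceiling remains whenever otherwise identical passengers had different fates.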