CS548 – Project 4. Skyler Whorton. April 12, 2012.
Data Processing: Problems & Solutions
• Problems
  • Outliers, e.g. New York City: crime rates and police force
  • Incomplete data: missing values for LEMAS attributes
  • High dimensionality
• Solutions
  • Scale attributes to standardize values
  • Fill missing values using the attribute's median value
  • CFS on two different target features
• Five resulting data sets:
  • Unscaled – Guessed 125 non-conflicting attributes
  • Scaled – Applied [0, 1] linear transformation
  • Z-Score-Scaled – Applied Z-score transformation
  • CFS-NonViolent – NonViolent class, CFS = 9 features
  • CFS-Violent – Violent target, CFS = 18 features
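The three per-attribute steps listed above (median fill, [0, 1] linear scaling, Z-score scaling) can be sketched in a few lines. This is only an illustration under assumptions, not the project's actual tooling: the function names are hypothetical, missing values are assumed to be encoded as `None`, and the Z-score uses the population standard deviation.

```python
from statistics import mean, median, pstdev


def fill_missing_with_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    m = median(observed)
    return [m if v is None else v for v in column]


def min_max_scale(column):
    """Linearly map a column onto [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def z_score_scale(column):
    """Center on the mean and divide by the population standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical attribute with one missing value and one large outlier.
raw = [1.0, None, 3.0, 10.0]
filled = fill_missing_with_median(raw)
scaled = min_max_scale(filled)
z_scaled = z_score_scale(filled)
```

The median is less sensitive than the mean to outliers such as the New York City records, which is one plausible reason to prefer it for filling.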
K-Means: Lowest SSE Run
• CFS Unscaled, Unsupervised – SSE: 3062.4
• CFS-NonViolent, Unsupervised, K=5 – SSE: 175.74
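For reference, SSE here is taken to mean the sum of squared Euclidean distances from each instance to its nearest centroid, and a "lowest SSE run" the best of several differently initialized runs. The sketch below is a plain Lloyd-style k-means under those assumptions; the function names and the toy points are hypothetical, and the initial centroids are passed in explicitly so the example stays deterministic.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points, centroids):
    """Sum over points of the squared distance to the nearest centroid."""
    return sum(min(squared_distance(p, c) for c in centroids) for p in points)


def kmeans(points, initial_centroids, iterations=20):
    """Alternate assignment and centroid update for a fixed number of rounds."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def lowest_sse_run(points, initializations):
    """Run k-means once per initialization and return (best SSE, its centroids)."""
    runs = [kmeans(points, init) for init in initializations]
    return min(((sse(points, c), c) for c in runs), key=lambda run: run[0])


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
inits = [
    [(0.0, 0.0), (0.0, 1.0)],   # poor start: both seeds in the same group
    [(0.0, 0.0), (10.0, 0.0)],  # one seed per group
]
best_sse, best_centroids = lowest_sse_run(points, inits)
```

Because SSE sums over attributes, values computed on data sets with different attribute counts or scales are not directly comparable.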
Hierarchical Clustering
• Unscaled – Average
• CFS-NonViolent – Average
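Assuming "Average" refers to average linkage, the distance between two clusters is the mean of all pairwise distances between their members, and the closest pair of clusters is merged at each step. A minimal, unoptimized sketch follows; the function names and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


toy_points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merged = agglomerate(toy_points, 3)
```

Recomputing every pairwise linkage on each merge is fine for a toy example but grows quickly with the number of instances.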
EM
• Unscaled – LL: -509.19
• CFS-NonViolent (sup) – LL: -10.667
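Reading "LL" as log-likelihood, the quantity EM increases at every iteration can be illustrated with a one-dimensional Gaussian mixture. This is a sketch under assumptions rather than the setup used for the numbers above: the data, the starting parameters, and the `min_sigma` floor are all hypothetical, and the log-likelihood here is the total over instances, not a per-instance average.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std sigma at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def log_likelihood(data, weights, means, sigmas):
    """Total log-likelihood of the data under the mixture."""
    return sum(
        log(sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)))
        for x in data
    )


def em_step(data, weights, means, sigmas, min_sigma=1e-3):
    """One E-step (responsibilities) followed by one M-step (parameter update)."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sigmas)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    new_weights, new_means, new_sigmas = [], [], []
    for c in range(len(weights)):
        nk = sum(r[c] for r in resp)
        mu = sum(r[c] * x for r, x in zip(resp, data)) / nk
        var = sum(r[c] * (x - mu) ** 2 for r, x in zip(resp, data)) / nk
        new_weights.append(nk / len(data))
        new_means.append(mu)
        new_sigmas.append(max(sqrt(var), min_sigma))  # floor avoids a zero std
    return new_weights, new_means, new_sigmas


data = [0.0, 0.2, 0.1, 5.0, 5.2, 4.9]
params = ([0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
history = [log_likelihood(data, *params)]
for _ in range(50):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
```

As with SSE, log-likelihoods from data sets with different attributes are not on a common scale, so the trend within one data set is more informative than a comparison across data sets.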