CSC 478: Programming Data Mining Applications
Course Summary
Bamshad Mobasher, DePaul University
What we did
• Data Mining Overview
    • The KDD Process
• Data Preprocessing and Understanding
    • Using Python and NumPy
    • Using scikit-learn modules
    • Some emphasis on visualizing and understanding characteristics of the data
• Supervised Knowledge Discovery
    • Regression Analysis
    • Classification
    • Techniques such as KNN, Ridge Regression, Decision Trees, and Bayesian classification
    • Lots of emphasis on model evaluation
        • Evaluation metrics
        • Train-test methodologies such as cross-validation
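The model-evaluation workflow summarized above can be illustrated with a minimal sketch. It assumes scikit-learn is installed and uses its bundled Iris dataset purely as a stand-in; the dataset, the choice of k = 5, the five stratified folds, and the fixed random seeds are illustrative assumptions, not the course's actual assignments.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative data only: the Iris dataset that ships with scikit-learn.
X, y = load_iris(return_X_y=True)

# 5-fold stratified cross-validation; shuffling with a fixed seed keeps runs repeatable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Three of the classifier families named in the summary (hyperparameters are assumptions).
models = {
    "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Gaussian naive Bayes": GaussianNB(),
}

scores = {}
for name, model in models.items():
    # Each call fits the model on 4 folds and scores accuracy on the held-out fold, 5 times.
    fold_accuracy = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    scores[name] = fold_accuracy
    print(f"{name}: mean accuracy {fold_accuracy.mean():.3f} "
          f"(std {fold_accuracy.std():.3f})")
```

Reporting the spread across folds alongside the mean gives a rough sense of how much the estimate depends on the particular split.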
What we did (continued)
• Unsupervised Knowledge Discovery
    • Cluster analysis
    • Using PCA and SVD for dimensionality reduction, data characterization, and noise reduction
    • Association rule discovery
    • Emphasis on using unsupervised approaches as components of larger knowledge discovery efforts
        • E.g., using PCA before clustering; using clustering as the basis for classification
• Real application domains
    • Text mining and document analysis/filtering
    • Recommender systems
    • Predictive modeling for marketing/business applications
    • Image analysis
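As one illustration of using an unsupervised method as a component of a larger effort, the "PCA before clustering" idea can be sketched as a scikit-learn pipeline. This is a sketch under assumptions: the Iris data, the two retained components, the three clusters, and the use of k-means as the clustering algorithm are all illustrative choices, not taken from the course material.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only; the class labels are discarded because clustering is unsupervised.
X, _ = load_iris(return_X_y=True)

# Standardize features, project onto 2 principal components, then cluster in the reduced space.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# Fraction of the (standardized) variance retained by the two components.
explained = pipeline.named_steps["pca"].explained_variance_ratio_
print(f"Variance retained by 2 components: {explained.sum():.3f}")
print(f"Cluster sizes: {[int((labels == c).sum()) for c in range(3)]}")
```

Standardizing first matters here because PCA is sensitive to feature scale; without it, the features with the largest raw variance would dominate the components.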
What we did not do (and you should learn later)
• Approaches for mining sequential/temporal data
    • Markov models, time series analysis, sequential pattern mining
• Ensemble and hybrid classifiers/predictors
    • Combining multiple classifiers
    • Random Forest classifiers
    • AdaBoost and meta-learners
• Support Vector Machines and kernel-based classifiers
• Topic modeling with latent factor models
    • Latent Dirichlet Allocation (LDA)
    • Non-Negative Matrix Factorization