SEG 4630 E-Commerce Data Mining: Final Review
Hong Cheng, SEEM, Chinese University of Hong Kong
www.se.cuhk.edu.hk/~hcheng
E-Commerce Data Mining
Final Time/Location
• Time: 9:30-11:30 am, Dec. 15, Tuesday
• Location: 103 John Fulton Center
• Coverage: Chapters 2 and 4-8
• You may bring two A4-size, double-sided cheat sheets
• A calculator IS needed
Chapter 2 (1)
• Calculate statistics of a data distribution
  • Mean, median, variance and standard deviation
• Calculate the distance between data objects
  • Minkowski distance
  • Distance between binary variables: symmetric and asymmetric
  • Cosine similarity
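The Chapter 2 calculations above can be checked with a short Python sketch. It assumes the population variance (divide by N; `statistics.variance` would give the n-1 version), and it uses the common textbook contingency counts q, r, s, t for binary variables; the function names are illustrative, not from the course.

```python
import math
from statistics import mean, median, pstdev, pvariance

def minkowski(x, y, h):
    """Minkowski distance of order h: h=1 is Manhattan, h=2 is Euclidean."""
    return sum(abs(a - b) ** h for a, b in zip(x, y)) ** (1 / h)

def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity of two 0/1 vectors from the 2x2 contingency counts."""
    pairs = list(zip(x, y))
    q = pairs.count((1, 1))  # both objects have value 1
    r = pairs.count((1, 0))
    s = pairs.count((0, 1))
    t = pairs.count((0, 0))  # negative matches
    if symmetric:
        return (r + s) / (q + r + s + t)
    return (r + s) / (q + r + s)  # asymmetric: negative matches t are ignored

def cosine_similarity(x, y):
    """Dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean(data), median(data), pvariance(data), pstdev(data))
print(minkowski((1, 2), (3, 5), 1), minkowski((1, 2), (3, 5), 2))
```

For the sample data the mean is 5, the median 4.5 and the population standard deviation 2; for the two binary vectors in the test, the symmetric measure counts the three 0-0 matches and the asymmetric one does not.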
Chapter 2 (2)
• Data normalization
  • Min-max normalization
  • Z-score normalization
  • Decimal scaling
• Data reduction
  • Dimensionality reduction methods
  • Sampling
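A minimal Python sketch of the three normalization methods and of simple sampling schemes follows. The z-score version assumes the population standard deviation, and the sampling helpers (`srswor`, `srswr`, `stratified`) are hypothetical names built on the standard `random` module; dimensionality reduction is not shown.

```python
import math
import random

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score normalization using the population standard deviation (assumption)."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return [(v - mu) / sigma for v in values]

def decimal_scaling(values):
    """Divide by 10**j, where j is the smallest integer making every |v'| < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j

def srswor(data, n, seed=0):
    """Simple random sample without replacement."""
    return random.Random(seed).sample(data, n)

def srswr(data, n, seed=0):
    """Simple random sample with replacement (a tuple may be drawn twice)."""
    return random.Random(seed).choices(data, k=n)

def stratified(data, stratum_of, fraction, seed=0):
    """Draw the same fraction (at least one tuple) from every stratum."""
    rng = random.Random(seed)
    strata = {}
    for item in data:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, max(1, round(len(members) * fraction))))
    return sample
```

For example, with a minimum of 12,000 and a maximum of 98,000, the value 73,600 maps to about 0.716 under min-max normalization onto [0, 1], and values in [-986, 917] need j = 3 for decimal scaling.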
Chapters 4-5 (1)
• Decision trees
  • Calculate information gain, Gini index and gain ratio
• Bayes' theorem and the naïve Bayesian classifier
  • Calculate probabilities from a training dataset
• Lazy classifiers and k-nearest neighbor
  • Classify using different values of k and different distance measures
  • Differences between eager and lazy classifiers
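The Chapters 4-5 calculations can be reproduced with the Python sketch below. The 14-tuple table is a textbook-style "buys_computer" example assumed here purely for illustration. The naïve Bayes scores use raw relative frequencies (no Laplace smoothing), the Gini split is a binary split as in CART, and the k-NN vote breaks ties in favour of the closer neighbours.

```python
import math
from collections import Counter

# Illustrative training set: (age, income, student, credit_rating, class).
DATA = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, attr):
    """Class labels grouped by the value of attribute index attr."""
    parts = {}
    for row in rows:
        parts.setdefault(row[attr], []).append(row[-1])
    return list(parts.values())

def info_gain(rows, attr):
    expected = sum(len(p) / len(rows) * entropy(p) for p in partition(rows, attr))
    return entropy([r[-1] for r in rows]) - expected

def gain_ratio(rows, attr):
    n = len(rows)
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partition(rows, attr))
    return info_gain(rows, attr) / split_info

def gini_binary_split(rows, attr, left_values):
    """Weighted Gini index of the split attr in left_values vs. the rest."""
    left = [r[-1] for r in rows if r[attr] in left_values]
    right = [r[-1] for r in rows if r[attr] not in left_values]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def naive_bayes_scores(rows, x):
    """P(C) * prod_i P(x_i | C) for every class C (no smoothing)."""
    scores = {}
    for c, n_c in Counter(r[-1] for r in rows).items():
        in_c = [r for r in rows if r[-1] == c]
        score = n_c / len(rows)
        for i, v in enumerate(x):
            score *= sum(1 for r in in_c if r[i] == v) / n_c
        scores[c] = score
    return scores

def manhattan(a, b):
    return sum(abs(p - q) for p, q in zip(a, b))

def knn_predict(train, query, k, dist=math.dist):
    """Majority vote among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

On this table the sketch gives an information gain of about 0.247 for age and a gain ratio of about 0.019 for income. For X = (youth, medium, student = yes, fair) the naïve Bayes scores are roughly 0.028 for "yes" and 0.007 for "no", so X is labelled "yes". Being lazy, `knn_predict` does no work until a query arrives, which is the eager/lazy contrast listed above.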
Chapters 4-5 (2)
• Accuracy and error measures
  • Training error vs. validation error
  • Confusion matrix
• ROC curve
  • True positive rate (TPR) and false positive rate (FPR)
  • Area under the curve (AUC)
• Evaluation methods
  • Holdout
  • Cross-validation
• Ensembles and bagging: know the principle
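The evaluation measures above are sketched below in Python for a two-class problem. The ROC construction assumes distinct scores (ties would need to be processed as a group), AUC uses the trapezoidal rule, and all function names are hypothetical. Bagging is shown only as its two ingredients: bootstrap samples and a majority vote.

```python
import random
from collections import Counter

def confusion_matrix(actual, predicted, positive=1):
    """Return (TP, FN, FP, TN)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fn, fp, tn

def rates(tp, fn, fp, tn):
    """Accuracy, true positive rate and false positive rate."""
    return (tp + tn) / (tp + fn + fp + tn), tp / (tp + fn), fp / (fp + tn)

def roc_points(actual, scores, positive=1):
    """(FPR, TPR) points from lowering the threshold one tuple at a time.
    Assumes no tied scores."""
    pos = sum(1 for a in actual if a == positive)
    neg = len(actual) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for a, _ in sorted(zip(actual, scores), key=lambda pair: pair[1], reverse=True):
        if a == positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

def holdout_split(n, train_fraction=2 / 3, seed=0):
    """Random holdout: shuffled indices split into a training and a test set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_fraction)
    return idx[:cut], idx[cut:]

def k_fold(n, k, seed=0):
    """Yield (train, test) index lists; every tuple is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def bootstrap_indices(n, rng):
    """Bagging: draw n tuples with replacement to train one base classifier."""
    return [rng.randrange(n) for _ in range(n)]

def bagged_vote(predictions):
    """Combine the base classifiers' predictions by majority vote."""
    return Counter(predictions).most_common(1)[0][0]
```

With labels [1, 1, 0, 1, 0, 0] and scores [0.9, 0.8, 0.7, 0.6, 0.55, 0.4], the curve passes through (1/3, 2/3) and (1/3, 1), and the AUC is 8/9. That equals the fraction of positive/negative pairs in which the positive tuple has the higher score.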
Chapters 6-7 (1)
• Frequent patterns and association rules
  • Support and confidence
  • Generate association rules from frequent itemsets
• Apriori algorithm
  • Candidate generation and test: self-joining, pruning, database scan
• FP-growth algorithm
  • Build the FP-tree
  • Extract conditional databases
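A compact Python sketch of Apriori and rule generation follows; FP-growth is not shown. One simplification: the self-join is written as "union of two frequent (k-1)-itemsets whose union has k items". After the prune step this yields the same candidates as the usual join on the first k-2 items. The nine transactions are a small illustrative database assumed for the example.

```python
from itertools import combinations

TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {frequent itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items
               if support_count(frozenset([i]), transactions) >= min_count}
    frequent = {s: support_count(s, transactions) for s in current}
    k = 2
    while current:
        # Self-join: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Database scan: count the surviving candidates.
        current = set()
        for c in candidates:
            n = support_count(c, transactions)
            if n >= min_count:
                current.add(c)
                frequent[c] = n
        k += 1
    return frequent

def generate_rules(frequent, n_transactions, min_conf):
    """Rules lhs => itemset - lhs with confidence = sup(itemset) / sup(lhs)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = count / frequent[lhs]  # lhs is frequent by the Apriori property
                if conf >= min_conf:
                    rules.append((set(lhs), set(itemset - lhs), count / n_transactions, conf))
    return rules
```

With a minimum support count of 2, the sketch finds 13 frequent itemsets. {I1, I2, I4} is pruned because {I1, I4} is infrequent, and the rule {I1, I5} ⇒ {I2} has confidence 100%.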
Chapters 6-7 (2)
• Closed itemsets and maximal itemsets
• Lift/interest measure
• Constraints
  • Monotonic
  • Anti-monotonic
  • Convertible
• Sequential pattern mining: know the principle
  • Max-gap, min-gap and max-span constraints
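The Python sketch below illustrates three topics from this slide: closed vs. maximal itemsets, lift, and timing constraints in sequential patterns. Frequent itemsets are enumerated by brute force, which is fine only for tiny examples. The timing check assumes each pattern element matches a single data element with one timestamp (no window size), using min_gap < t(i+1) - t(i) <= max_gap and t(last) - t(first) <= max_span. Constraint types are not coded.

```python
from itertools import combinations

TRANSACTIONS = [set("abc"), set("abc"), set("ab"), set("ac"), set("b")]

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = frozenset(combo)
            n = sum(1 for t in transactions if s <= t)
            if n >= min_count:
                result[s] = n
    return result

def closed_and_maximal(freq):
    """Closed: no proper superset with the same support.
    Maximal: no frequent proper superset."""
    closed, maximal = set(), set()
    for s, n in freq.items():
        supersets = [t for t in freq if s < t]
        if all(freq[t] < n for t in supersets):
            closed.add(s)
        if not supersets:
            maximal.add(s)
    return closed, maximal

def lift(count_ab, count_a, count_b, n):
    """P(A and B) / (P(A) P(B)); < 1 negative, > 1 positive correlation."""
    return (count_ab / n) / ((count_a / n) * (count_b / n))

def contains_with_timing(data_seq, pattern, min_gap, max_gap, max_span):
    """data_seq: time-sorted list of (timestamp, set of items).
    pattern: list of item sets, matched in order to distinct data elements."""
    def search(i, start, first_t, prev_t):
        if i == len(pattern):
            return True
        for j in range(start, len(data_seq)):
            t, items = data_seq[j]
            if not pattern[i] <= items:
                continue
            if prev_t is not None and not (min_gap < t - prev_t <= max_gap):
                continue
            if first_t is not None and t - first_t > max_span:
                continue
            if search(i + 1, j + 1, t if first_t is None else first_t, t):
                return True
        return False
    return search(0, 0, None, None)
```

On the toy transactions with a minimum count of 2, {c} is frequent but not closed, because {a, c} has the same support. Only {a, b, c} is maximal. With hypothetical counts of 4,000 joint, 6,000 and 7,500 occurrences out of 10,000, lift is about 0.89, indicating negative correlation.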
Chapter 8
• K-means clustering
  • Algorithm and calculation
  • Advantages and disadvantages
• Hierarchical clustering: MIN, MAX, group average
  • Step-wise calculation
  • Update distance matrix
  • Advantages and disadvantages
• Density-based clustering
  • Know the principle
• Evaluating clustering quality
  • SSE, silhouette, entropy, purity
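The Chapter 8 calculations are sketched below in Python; density-based clustering is not coded. The agglomerative routine recomputes each cluster-to-cluster distance from the original points rather than updating a distance matrix in place. For MIN, MAX and group average this gives the same merge distances as the matrix-update procedure. The silhouette assumes at least two clusters and gives singletons a score of 0 (a common convention). Entropy is the size-weighted average of per-cluster entropies.

```python
import math
from collections import Counter

def assign(points, centroids):
    """Put each point in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[i].append(p)
    return clusters

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = assign(points, centroids)
        new = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)

def sse(centroids, clusters):
    return sum(math.dist(p, c) ** 2 for c, cl in zip(centroids, clusters) for p in cl)

def agglomerative(points, link="min"):
    """Merge history [(cluster_a, cluster_b, distance), ...]."""
    agg = {"min": min, "max": max, "avg": lambda ds: sum(ds) / len(ds)}[link]
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = agg([math.dist(a, b) for a in clusters[i] for b in clusters[j]])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return merges

def silhouette(clusters):
    """Mean silhouette coefficient s = (b - a) / max(a, b) over all points."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for p in cluster:
            if len(cluster) == 1:
                scores.append(0.0)
                continue
            a = sum(math.dist(p, q) for q in cluster) / (len(cluster) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for cj, other in enumerate(clusters) if cj != ci)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def purity_entropy(cluster_labels):
    """cluster_labels: one list of true class labels per cluster."""
    n = sum(len(c) for c in cluster_labels)
    purity = sum(max(Counter(c).values()) for c in cluster_labels) / n
    entropy = 0.0
    for c in cluster_labels:
        e = -sum(k / len(c) * math.log2(k / len(c)) for k in Counter(c).values())
        entropy += len(c) / n * e
    return purity, entropy
```

For the 1-D points 2, 3, 4, 10, 11, 12, 20, 25, 30 with initial centroids 2 and 4, the sketch converges to centroids 7 and 25 with SSE 150. On points 0, 1, 5, 6, 15, the merge distances are 1, 1, 4, 9 for MIN; 1, 1, 6, 15 for MAX; and 1, 1, 5, 12 for group average.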
Questions?