
SEG 4630 E-Commerce Data Mining — Final Review —

SEG 4630 E-Commerce Data Mining — Final Review — Hong Cheng, SEEM, Chinese University of Hong Kong, www.se.cuhk.edu.hk/~hcheng. Final Time/Location. Time: 9:30-11:30 am, Tuesday, Dec. 15. Location: 103 John Fulton Center. Coverage: Chaps 2, 4-8.


Presentation Transcript


  1. SEG 4630 E-Commerce Data Mining — Final Review — Hong Cheng, SEEM, Chinese University of Hong Kong, www.se.cuhk.edu.hk/~hcheng E-Commerce Data Mining

  2. Final Time/Location
     • Time: 9:30-11:30 am, Tuesday, Dec. 15
     • Location: 103 John Fulton Center
     • Coverage: Chaps 2, 4-8
     • You can bring two A4-size, double-sided cheat sheets
     • A calculator IS needed.

  3. Chapter 2 (1)
     • Calculate data distribution
       • Mean, median, variance and standard deviation
     • Calculate distance between data objects
       • Minkowski distance
       • Distance between binary variables: symmetric and asymmetric
       • Cosine similarity
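The distance measures listed on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference code: the function names are made up here, and the binary measure assumes the usual contingency counts (q = both 1, r and s = mismatches, t = both 0), with t ignored in the asymmetric case.

```python
import math

def minkowski(x, y, p):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def cosine_similarity(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def binary_dissimilarity(x, y, symmetric=True):
    """Dissimilarity between two 0/1 vectors.

    symmetric:  (r + s) / (q + r + s + t)
    asymmetric: (r + s) / (q + r + s), i.e. 0/0 matches are ignored
    """
    q = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    r = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    s = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    t = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denominator = q + r + s + (t if symmetric else 0)
    return (r + s) / denominator
```

For example, `minkowski((1, 2), (4, 6), 2)` is the Euclidean distance between the two points, and the same call with `p=1` gives the Manhattan distance.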

  4. Chapter 2 (2)
     • Data normalization
       • Min-max normalization
       • Z-score normalization
       • Decimal scaling
     • Data reduction
       • Dimensionality reduction methods
       • Sampling
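The three normalization methods above can be sketched as follows. The function names are illustrative, and the z-score version assumes the population standard deviation; use whichever definition (population or sample) the course text specifies.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / std for v in values]

def decimal_scaling(values):
    """Divide by 10^j, where j is the smallest integer with max(|v|) / 10^j < 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]
```

With values ranging from -986 to 917, `decimal_scaling` divides by 1000, so -986 becomes -0.986.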

  5. Chapters 4-5 (1)
     • Decision tree
       • Calculate information gain, gini index, gain ratio
     • Bayes theorem and Naïve Bayesian
       • Calculate probabilities from training datasets
     • Lazy classifier and k-nearest neighbor
       • Calculate based on different k values and different distance measures
       • Differences between eager and lazy classifiers
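As a sketch of the decision-tree calculations named above, the functions below compute entropy, the Gini index, information gain, and gain ratio for a single categorical attribute. The names and the list-based data layout are assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_by(values, labels):
    """Group the class labels by the attribute value of each record."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in split_by(values, labels))
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the split information of the attribute.

    Undefined (division by zero) when the attribute has a single value.
    """
    return information_gain(values, labels) / entropy(values)
```

An attribute that separates the classes perfectly, such as values `a, a, b, b` against labels `y, y, n, n`, recovers the full entropy of the labels as its information gain; an attribute that leaves each partition evenly mixed has a gain of zero.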

  6. Chapters 4-5 (2)
     • Accuracy and error measures
       • Training error vs. validation error
       • Confusion matrix
     • ROC curve
       • True positive rate (TPR) and false positive rate (FPR)
       • Area under curve (AUC)
     • Evaluation methods
       • Hold out
       • Cross validation
     • Ensemble, bagging: know the principle
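The confusion-matrix quantities above reduce to four counts. The sketch below assumes a binary problem with the positive class encoded as 1; the function names are illustrative only.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification task."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def rates(tp, fp, tn, fn):
    """Accuracy, true positive rate, and false positive rate."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "tpr": tp / (tp + fn),  # fraction of actual positives that were found
        "fpr": fp / (fp + tn),  # fraction of actual negatives flagged as positive
    }
```

Each point on an ROC curve is one (FPR, TPR) pair obtained from these counts at a particular decision threshold.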

  7. Chapters 6-7 (1)
     • Frequent patterns and association rules
       • Support, confidence
       • Generate association rules from frequent itemsets
     • Apriori algorithm
       • Candidate generation and test
       • Self joining
       • Pruning
       • Database scan
     • FP-growth algorithm
       • Build FP-tree
       • Extract conditional DB
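The Apriori steps listed above (self joining, pruning, database scan) can be sketched as below. This is a simplified illustration: the join takes unions of frequent k-itemsets rather than the textbook prefix-based join, which yields the same candidates after pruning but less efficiently. Transactions are assumed to be Python sets, and `min_count` is an absolute support count.

```python
from itertools import combinations

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(transactions, min_count):
    """Return {frequent itemset: support count} for all frequent itemsets."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i]), transactions) >= min_count]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support_count(s, transactions)
        prev = set(level)
        # Self-join (simplified): unions of two frequent k-itemsets of size k+1.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        # Database scan: keep candidates that meet the minimum support count.
        level = [c for c in candidates
                 if support_count(c, transactions) >= min_count]
        k += 1
    return frequent

def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A and B together) / support(A)."""
    both = support_count(antecedent | consequent, transactions)
    return both / support_count(antecedent, transactions)
```

Rules are then generated from each frequent itemset by trying every split into antecedent and consequent and keeping those whose confidence meets the chosen threshold.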

  8. Chapters 6-7 (2)
     • Closed itemsets and maximal itemsets
     • Lift/Interest measure
     • Constraints
       • Monotonic
       • Antimonotonic
       • Convertible constraints
     • Sequence pattern mining: know the principle
       • Max-gap
       • Min-gap
       • Max-span
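Closed itemsets, maximal itemsets, and lift can be checked mechanically once the frequent itemsets and their support counts are known. The sketch below assumes a dictionary that maps each frequent itemset (a `frozenset`) to its support count, and transactions given as Python sets; the names are illustrative.

```python
def closed_itemsets(frequent):
    """Frequent itemsets with no proper superset of the same support."""
    return {s for s, count in frequent.items()
            if not any(s < t and count == t_count
                       for t, t_count in frequent.items())}

def maximal_itemsets(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

def lift(a, b, transactions):
    """lift(A, B) = P(A and B) / (P(A) * P(B)).

    A value above 1 suggests positive correlation, below 1 negative.
    """
    n = len(transactions)
    p_a = sum(1 for t in transactions if a <= t) / n
    p_b = sum(1 for t in transactions if b <= t) / n
    p_ab = sum(1 for t in transactions if (a | b) <= t) / n
    return p_ab / (p_a * p_b)
```

Every maximal itemset is also closed, but not the other way round, so the maximal set returned here is always a subset of the closed set.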

  9. Chapter 8
     • K-means clustering
       • Algorithm and calculation
       • Advantages and disadvantages
     • Hierarchical clustering: MIN, MAX, Group average
       • Step-wise calculation
       • Update distance matrix
       • Advantages and disadvantages
     • Density-based clustering
       • Know the principle
     • Evaluating clustering quality
       • SSE, silhouette, entropy, purity
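A minimal sketch of the K-means loop and the SSE measure follows. It assumes the initial centroids are supplied by the caller and that points are tuples of numbers; the function names are made up for illustration, and an empty cluster simply keeps its previous centroid.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        assignment = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment

def sse(points, centroids, assignment):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[a])
               for p, a in zip(points, assignment))
```

Running `k_means` from different initial centroids and comparing the resulting SSE values is one way to see the algorithm's sensitivity to initialization.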

  10. Questions?
