
Data Mining


Presentation Transcript


  1. Data Mining Tri Nguyen

  2. Agenda • Data Mining As Part of KDD • Decision Tree • Association Rules • Clustering • Amazon Data Mining Examples

  3. Data Mining and KDD • Putting the results in practical use

  4. What is Data Mining? • “the automated extraction of hidden predictive information from large databases” • Algorithms produce patterns, rules • Predict future trends/behavior • Used to make business decisions

  5. Classification • Items belong to classes • Given the classification of past items, predict the class of a new item • Example: Issuing credit cards • Information used: income, educational background, age, current debts • Creditworthiness: bad, good, excellent

  6. Decision Tree Classifiers • Each internal node has a predicate • Each leaf node is a class • To classify an instance • Start at the root node • At each internal node, make a decision based on its predicate • Traverse the tree until reaching a leaf node
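
The traversal described on this slide can be sketched in a few lines of Python. This is only an illustration: the `Leaf` and `Node` types, the `classify` function, and the income and debt thresholds in `credit_tree` are hypothetical and are not taken from the presentation's own credit risk tree.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str  # the class assigned to any instance that reaches this leaf


@dataclass
class Node:
    predicate: Callable[[dict], bool]  # decision made at this internal node
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, instance: dict) -> str:
    """Start at the root and follow predicates until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.if_true if node.predicate(instance) else node.if_false
    return node.label


# Hypothetical tree; the thresholds are invented for illustration only.
credit_tree = Node(
    predicate=lambda x: x["income"] >= 50_000,
    if_true=Node(lambda x: x["debt"] < 10_000, Leaf("excellent"), Leaf("good")),
    if_false=Leaf("bad"),
)

print(classify(credit_tree, {"income": 60_000, "debt": 5_000}))
```

Each internal node only stores a predicate and two subtrees, so classifying an instance costs one predicate evaluation per level of the tree.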

  7. Credit Risk Decision Tree

  8. Decision Tree Construction • Some Definitions • Purity: the more instances in a leaf that belong to a single class, the higher the purity • Best Split: the split giving the maximum information gain ratio (info gain / info content) • Choose the attribute and condition resulting in maximum purity
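
As a rough sketch of how a candidate split might be scored, the following Python computes an information gain ratio. It assumes that "info content" means the entropy of the split itself (the "split information" used by C4.5); the slide does not spell this out, so treat that reading, along with the function names `entropy` and `gain_ratio`, as assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels; 0 for a pure list."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(labels, groups):
    """Score a split of `labels` into `groups` (a list of label lists).

    info gain    = entropy before the split - weighted entropy after it
    info content = entropy of the group sizes (assumed meaning, as in C4.5)
    """
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    info_gain = entropy(labels) - remainder
    info_content = -sum(
        (len(g) / total) * math.log2(len(g) / total) for g in groups if g
    )
    return info_gain / info_content if info_content > 0 else 0.0


labels = ["good", "good", "bad", "bad"]
pure_split = [["good", "good"], ["bad", "bad"]]    # every leaf holds one class
mixed_split = [["good", "bad"], ["good", "bad"]]   # leaves are as impure as the parent

print(gain_ratio(labels, pure_split), gain_ratio(labels, mixed_split))
```

The split that produces pure leaves scores higher than the one that leaves the classes mixed, which matches the slide's rule of choosing the attribute and condition that maximize purity.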

  9. Decision Tree Construction

  10. Association Rules • antecedent → consequent • if → then • beer → diaper (Walmart) • economy bad → higher unemployment • higher unemployment → higher unemployment benefits cost • Rules associated with population, support, confidence

  11. Association Rules • Population: instances such as grocery store purchases • Support: % of the population satisfying both antecedent and consequent • Confidence: % of instances in which the consequent is true, among those in which the antecedent is true

  12. Association Rules • Population: MS, MSA, MSB, MA, MB, BA • M=Milk, S=Soda, A=Apple, B=Beer • Support (M → S) = 3/6 • (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB, BA) • Confidence (M → S) = 3/5 • (MS, MSA, MSB) / (MS, MSA, MSB, MA, MB)
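
The arithmetic on this slide can be checked with a short Python sketch. The six purchases are the ones listed above; the function names `support` and `confidence` and the set-based representation are illustrative choices, not part of the presentation.

```python
# The six purchases from the slide: M=Milk, S=Soda, A=Apple, B=Beer.
population = [set("MS"), set("MSA"), set("MSB"), set("MA"), set("MB"), set("BA")]


def support(antecedent, consequent, population):
    """Fraction of the population containing both antecedent and consequent."""
    both = antecedent | consequent
    return sum(both <= purchase for purchase in population) / len(population)


def confidence(antecedent, consequent, population):
    """Among purchases containing the antecedent, the fraction that also contain the consequent."""
    matching = [purchase for purchase in population if antecedent <= purchase]
    if not matching:
        return 0.0
    return sum(consequent <= purchase for purchase in matching) / len(matching)


print(support({"M"}, {"S"}, population))     # 3 of 6 purchases contain milk and soda
print(confidence({"M"}, {"S"}, population))  # 3 of the 5 milk purchases contain soda
```

Support is measured against the whole population, while confidence only looks at the purchases where the antecedent holds, which is why the two denominators differ (6 versus 5).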

  13. Clustering • “The process of dividing a dataset into mutually exclusive groups such that the members of each group are as "close" as possible to one another, and different groups are as "far" as possible from one another, where distance is measured with respect to all available variables.”

  14. Clustering • BIRCH algorithm • points are inserted into a multidimensional tree • items are guided to leaf nodes "near" representative internal nodes • nearby points are clustered into one leaf node

  15. Clustering • Example of Clustering • predict which new movies a person will be interested in, based on: • 1) the person’s past movie preferences • 2) others with similar preferences • 3) the preferences of those in the pool for new movies

  16. Clustering • 1) cluster people with similar movie preferences • 2) given a new movie goer, find a cluster of similar movie goers • 3) then predict the cluster's new movie preferences
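
The three steps above might look like the following in Python. Note that this sketch swaps in a small k-means loop rather than the BIRCH algorithm mentioned earlier, purely to keep the example short. The people, their ratings, the choice of two clusters, and the starting centroids are all invented for illustration.

```python
import math


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*vectors)]


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, centroids, rounds=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids."""
    for _ in range(rounds):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


# Hypothetical data: ([rating of old movie 1, rating of old movie 2], rating of the new movie)
people = {
    "ann": ([5, 1], 4.5),
    "bob": ([4, 2], 4.0),
    "cat": ([1, 5], 1.5),
    "dan": ([2, 4], 2.0),
}

# 1) cluster people with similar past movie preferences
past = [ratings for ratings, _ in people.values()]
centroids = kmeans(past, centroids=[past[0], past[2]])


def predict_new_movie(past_ratings):
    # 2) find the cluster of similar movie goers
    cluster = nearest(past_ratings, centroids)
    # 3) use that cluster's ratings of the new movie as the prediction
    peers = [new for ratings, new in people.values()
             if nearest(ratings, centroids) == cluster]
    return sum(peers) / len(peers)


print(predict_new_movie([5, 2]))
```

A new movie goer is never added to the clusters here; they are only matched to the closest existing one, which keeps the prediction step cheap.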

  17. Amazon Examples

  18. Amazon Examples

  19. Amazon Examples

  20. Amazon Examples

  21. References • http://www.thearling.com/text/dmwhite/dmwhite.htm • http://www.cse.ohio-state.edu/~srini/694Z/part1.ppt • http://www-aig.jpl.nasa.gov/public/kdd95/tutorials/IJCAI95-tutorial.html
