
A Brief History of Data Mining Society


Presentation Transcript


  1. A Brief History of Data Mining Society • 1989 IJCAI Workshop on Knowledge Discovery in Databases • Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991) • 1991-1994 Workshops on Knowledge Discovery in Databases • Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996) • 1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD’95-98) • Journal of Data Mining and Knowledge Discovery (1997)

  2. A Brief History of Data Mining Society • ACM SIGKDD conferences since 1998 and SIGKDD Explorations • More conferences on data mining • PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), etc. • ACM Transactions on KDD starting in 2007

  3. Conferences and Journals on Data Mining • KDD Conferences • ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD) • SIAM Data Mining Conf. (SDM) • (IEEE) Int. Conf. on Data Mining (ICDM) • Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD) • Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)

  4. Where to Find References? DBLP, CiteSeer, Google • Data mining and KDD (SIGKDD: CDROM) • Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc. • Journals: Data Mining and Knowledge Discovery, SIGKDD Explorations, ACM TKDD • Bioinformatics • Conferences: RECOMB, CSB, PSB, BIBE, etc. • Journals: Bioinformatics, BMC Bioinformatics, TCBB, …

  5. Top-10 Algorithms Selected at ICDM’06 • #1: C4.5 (61 votes) • #2: K-Means (60 votes) • #3: SVM (58 votes) • #4: Apriori (52 votes) • #5: EM (48 votes) • #6: PageRank (46 votes) • #7: AdaBoost (45 votes) • #8: kNN (45 votes) • #9: Naive Bayes (45 votes) • #10: CART (34 votes)

  6. Association Rules

  7. Association Rules

  8. Association Rules • For a rule X ⇒ Y: • support, s: the probability that a transaction contains X ∪ Y • confidence, c: the conditional probability that a transaction containing X also contains Y

  9. Association Rules • Let’s work through an example

  10. Association Rules • T100: {1, 2, 5} • T200: {2, 4} • T300: {2, 3} • T400: {1, 2, 4} • T500: {1, 3} • T600: {2, 3} • T700: {1, 3} • T800: {1, 2, 3, 5} • T900: {1, 2, 3}
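To make support and confidence concrete, here is a minimal Python sketch over the nine transactions above; the rule {1, 2} ⇒ {5} and the function names are chosen for illustration, not taken from the slides.

```python
# The nine transactions from the slide, keyed by transaction ID
transactions = {
    "T100": {1, 2, 5},
    "T200": {2, 4},
    "T300": {2, 3},
    "T400": {1, 2, 4},
    "T500": {1, 3},
    "T600": {2, 3},
    "T700": {1, 3},
    "T800": {1, 2, 3, 5},
    "T900": {1, 2, 3},
}

def support(itemset):
    """s: fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(X, Y):
    """c: conditional probability that a transaction containing X also
    contains Y, i.e. support(X union Y) / support(X)."""
    return support(X | Y) / support(X)

# Illustrative rule {1, 2} => {5}
print(support({1, 2, 5}))       # 2/9 ~ 0.22 (only T100 and T800 contain {1, 2, 5})
print(confidence({1, 2}, {5}))  # 2/4 = 0.50 (4 transactions contain {1, 2})
```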

  11. Association Rules

  12. Classification

  13. Classification—A Two-Step Process • Step 1 (model construction): a model is built from the training set, based on the values (class labels) of a classifying attribute • Step 2 (model usage): the model is used to classify new data • Classification predicts categorical class labels (discrete or nominal)
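A minimal sketch of the two steps, assuming scikit-learn as the library; the tiny credit-approval table is invented purely for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

# Hypothetical class-labeled training tuples: [age, income] -> approved?
X_train = [[25, 30_000], [40, 90_000], [35, 60_000], [50, 120_000], [23, 20_000]]
y_train = ["no", "yes", "yes", "yes", "no"]

# Step 1: model construction from the training set
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# Step 2: model usage -- predict the categorical class label of new data
print(model.predict([[30, 50_000]]))  # a discrete label, e.g. ['yes']
```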

  14. Classification • Typical applications • Credit approval • Target marketing • Medical diagnosis • Fraud detection • And much more

  15. Decision Tree • Decision tree induction is the learning of decision trees from class-labeled training tuples • A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute • Each branch represents an outcome of the test • Each leaf node holds a class label

  16. Decision Tree Example

  17. Decision Tree Algorithm • Basic algorithm (a greedy algorithm) • Tree is constructed in a top-down recursive divide-and-conquer manner • At start, all the training examples are at the root • Attributes are categorical (if continuous-valued, they are discretized in advance) • Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
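A compact sketch of that top-down, divide-and-conquer recursion, under the slide's assumptions (categorical attributes, a pluggable selection heuristic); the names `induce_tree` and `select` are illustrative, not from the slides:

```python
from collections import Counter

def induce_tree(tuples, attributes, select):
    """Greedy, top-down, recursive divide-and-conquer tree construction.
    `tuples` is a list of (attribute_dict, class_label) pairs and `select`
    is the attribute-selection heuristic (e.g. information gain)."""
    labels = [label for _, label in tuples]
    # Stop: all samples in one class, or no attributes left (majority vote)
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: a class label
    best = select(tuples, attributes)                 # heuristic pick
    subtree = {}
    for value in {attrs[best] for attrs, _ in tuples}:  # one branch per outcome
        subset = [(a, c) for a, c in tuples if a[best] == value]
        remaining = [a for a in attributes if a != best]
        subtree[value] = induce_tree(subset, remaining, select)
    return {best: subtree}
```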

  18. Attribute Selection Measure: Information Gain (ID3/C4.5) • Select the attribute with the highest information gain • Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D| / |D| • Expected information (entropy) needed to classify a tuple in D: Info(D) = -Σ_{i=1..m} pi log2(pi)

  19. Attribute Selection Measure: Information Gain (ID3/C4.5) • Information needed (after using A to split D into v partitions) to classify D: Info_A(D) = Σ_{j=1..v} (|Dj| / |D|) × Info(Dj) • Information gained by branching on attribute A: Gain(A) = Info(D) - Info_A(D)

  20. Decision Tree

  21. Decision Tree • The term (5/14) I(2,3) in Info_age(D) means that “age <= 30” has 5 out of 14 samples, with 2 yes’s and 3 no’s • I(2,3) = -2/5 * log2(2/5) - 3/5 * log2(3/5) ≈ 0.971 • Hence Info_age(D) = 5/14 I(2,3) + 4/14 I(4,0) + 5/14 I(3,2) = 0.694, and Gain(age) = Info(D) - Info_age(D) = 0.940 - 0.694 = 0.246

  22. Decision Tree • Similarly, we can compute • Gain(income) = 0.029 • Gain(student) = 0.151 • Gain(credit_rating) = 0.048 • Since “age” has the highest information gain (0.246), it becomes the splitting attribute at the root, and we partition the tuples by age (a verifying sketch follows below):
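The gains quoted above are exactly those of the classic AllElectronics buys_computer training set from Han & Kamber, which these slides appear to follow; assuming that table (reproduced here because the slide's figure did not survive), a short sketch recomputes all four gains:

```python
from math import log2
from collections import Counter

# AllElectronics buys_computer training set (Han & Kamber), 14 tuples:
# (age, income, student, credit_rating) -> buys_computer
data = [
    ("<=30",   "high",   "no",  "fair",      "no"),
    ("<=30",   "high",   "no",  "excellent", "no"),
    ("31..40", "high",   "no",  "fair",      "yes"),
    (">40",    "medium", "no",  "fair",      "yes"),
    (">40",    "low",    "yes", "fair",      "yes"),
    (">40",    "low",    "yes", "excellent", "no"),
    ("31..40", "low",    "yes", "excellent", "yes"),
    ("<=30",   "medium", "no",  "fair",      "no"),
    ("<=30",   "low",    "yes", "fair",      "yes"),
    (">40",    "medium", "yes", "fair",      "yes"),
    ("<=30",   "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no",  "excellent", "yes"),
    ("31..40", "high",   "yes", "fair",      "yes"),
    (">40",    "medium", "no",  "excellent", "no"),
]
ATTRS = {"age": 0, "income": 1, "student": 2, "credit_rating": 3}

def info(labels):
    """Expected information (entropy): Info(D) = -sum pi * log2(pi)."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(attr):
    """Gain(A) = Info(D) - Info_A(D)."""
    i = ATTRS[attr]
    info_a = sum(
        len(part) / len(data) * info([r[-1] for r in part])
        for v in {row[i] for row in data}
        for part in [[r for r in data if r[i] == v]]
    )
    return info([row[-1] for row in data]) - info_a

for a in ATTRS:
    print(f"Gain({a}) = {gain(a):.3f}")
# Prints Gain(age) = 0.246, Gain(income) = 0.029,
# Gain(student) = 0.151, Gain(credit_rating) = 0.048,
# confirming that age is the first splitting attribute.
```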

  23. Decision Tree

  24. Decision Tree
