Learn about data mining, the process of discovering patterns and rules from vast amounts of data. Explore the goals of data mining, including prediction, identification, classification, and optimization. Discover association rules, classification algorithms, clustering, and various approaches to other data mining problems. Understand the applications of data mining in marketing, finance, manufacturing, healthcare, and more. Explore commercial data mining tools and the potential of this field.
Chapter 27 Data Mining Concepts
Overview of Data Mining Technology • Data Mining aka Knowledge Discovery in Databases (KDD) • Discovery of new information in terms of patterns or rules from vast amounts of data • Must be carried out efficiently on large files and databases
Goals of Data Mining • Prediction • Show how certain attributes will behave in the future • Identification • Identify the existence of an item • Classification • Partition data into different categories • Optimization • Optimize the use of limited resources such as time, space, money
Association Rules • Market-Basket Model, Support, and Confidence • Apriori Algorithm • Sampling Algorithm • Frequent-Pattern Tree Algorithm • Partition Algorithm • Other Types of Association Rules • Additional Considerations for Association Rules
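The support and confidence measures listed above, and the level-wise idea behind the Apriori algorithm, can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's implementation: the transactions, the 0.5 support threshold, and the function names are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration.
TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """support(lhs union rhs) / support(lhs) for the rule lhs => rhs."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets one item at a time."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items
               if support([i], transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        known = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in known for s in combinations(c, k - 1))
                   and support(c, transactions) >= min_support]
    return frequent


if __name__ == "__main__":
    print(support({"milk", "juice"}, TRANSACTIONS))
    print(confidence({"milk"}, {"juice"}, TRANSACTIONS))
    for itemset, s in apriori(TRANSACTIONS, 0.5).items():
        print(sorted(itemset), s)
```

On this made-up data, {milk, juice} appears in 2 of 4 transactions (support 0.5), and the rule milk => juice holds in 2 of the 3 transactions that contain milk (confidence about 0.67). The prune step is what keeps the search tractable on large files: an itemset can only be frequent if all of its subsets are.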
Classification • The process of learning a model that describes different classes of data. • The classes are known in advance – the rules that describe them are not. • Mining can help determine past influential characteristics that can be used to predict future behavior.
FIGURE 27.5 Example decision tree for credit card applications.
FIGURE 27.6 Sample training data for classification algorithm.
FIGURE 27.7 Decision tree based on sample training data where the leaf nodes are represented by a set of RIDs of the partitioned records.
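One common way to build a decision tree like the ones in the figures above is to pick, at each node, the attribute whose split yields the highest information gain (the largest drop in entropy), and then partition the records by that attribute's values. The sketch below assumes that criterion; the records, attribute names, and function names are hypothetical and are not the data from Figure 27.6.

```python
from collections import Counter
from math import log2

# Hypothetical training records; attributes and values are invented.
RECORDS = [
    {"rid": 1, "married": "yes", "salary": "high", "loanworthy": "yes"},
    {"rid": 2, "married": "yes", "salary": "low",  "loanworthy": "yes"},
    {"rid": 3, "married": "yes", "salary": "high", "loanworthy": "yes"},
    {"rid": 4, "married": "no",  "salary": "high", "loanworthy": "no"},
    {"rid": 5, "married": "no",  "salary": "low",  "loanworthy": "no"},
    {"rid": 6, "married": "no",  "salary": "high", "loanworthy": "yes"},
]


def entropy(records, target):
    """Shannon entropy (in bits) of the class label in records."""
    counts = Counter(r[target] for r in records)
    total = len(records)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(records)
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r for r in records if r[attribute] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(records, target) - remainder


def best_attribute(records, attributes, target):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


def partition(records, attribute):
    """Map each attribute value to the RIDs of the records that have it."""
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r["rid"])
    return groups


if __name__ == "__main__":
    root = best_attribute(RECORDS, ["married", "salary"], "loanworthy")
    print(root, partition(RECORDS, root))
```

On this made-up data the root split is on married, and the two branches hold RIDs [1, 2, 3] and [4, 5, 6]. A full tree builder would recurse on each branch until the records in a node share one class or no attributes remain.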
Clustering • Another way of learning, where the groups are not known in advance • Puts “similar” records into groups • Example: grouping patients by their reaction to medication • The choice of similarity function is key
FIGURE 27.8 Sample 2-dimensional records for clustering example (the RID column is not considered).
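To make "similarity function" concrete, here is a minimal sketch of k-means clustering on 2-dimensional records, using Euclidean distance as the similarity measure. The slides do not name an algorithm, so k-means is an assumption here, as are the sample points (not the data in Figure 27.8) and the choice of the first k points as starting centroids.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical 2-dimensional records, invented for illustration.
POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]


def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    centroids = list(points[:k])  # assumption: seed with the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    centroids, clusters = kmeans(POINTS, k=2)
    for center, members in zip(centroids, clusters):
        print(center, members)
```

Swapping `dist` for another distance function changes which records count as "similar" and can change the resulting groups, which is why the similarity function matters so much.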
Approaches to Other Data Mining Problems • Discovery of Sequential Patterns • Discovery of Patterns in Time Series • Regression • Neural Networks • Genetic Algorithm
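Of the approaches listed above, regression is the easiest to show in a few lines. The sketch below fits a straight line y = slope * x + intercept by ordinary least squares; the function name and the sample values are hypothetical and chosen only so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical observations that lie exactly on y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope, intercept)
    # Predict the value of y for an unseen x.
    print(slope * 5 + intercept)
```

Once fitted, the line can be used for the prediction goal described earlier: plugging in a new x gives an estimate of the attribute's future value.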
Applications of Data Mining • Marketing • Finance • Manufacturing • Health Care • Probably many other decision-making contexts
Commercial Data Mining Tools • Text lists several packages and their strengths • Huge field as databases multiply • Big potential if you can come up with a way of protecting privacy as well as correcting data.
Summary • Lots of potential in this field • Seems complex, but only because of the sheer amount of data. • See Wikipedia at http://en.wikipedia.org/wiki/Data_mining