Data Mining By: Tung, Sze Ming (Leo), CS 157B
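Definition • A class of database applications that analyze data in a database using tools that look for trends or anomalies. • IBM is often credited with pioneering data mining, although the underlying techniques draw on earlier statistics and artificial intelligence research.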
Purpose • To look for hidden patterns or previously unknown relationships in a body of data that can be used to predict future behavior. • Ex: Data mining software can help retail companies find customers with common interests.
Background Information • Many of the techniques used by today's data mining tools have been around for many years, having originated in the artificial intelligence research of the 1980s and early 1990s. • Data mining tools are only now being applied to large-scale database systems.
The Need for Data Mining • The amount of raw data stored in corporate data warehouses is growing rapidly. • The volume and complexity of the data that might be relevant to a specific problem can be overwhelming. • Data mining promises to bridge the analytical gap by giving knowledge workers the tools to navigate this complex analytical space.
The Need for Data Mining, cont. • The need for information has resulted in the proliferation of data warehouses that integrate information from multiple sources to support decision making. • These warehouses often include data from external sources, such as customer demographics and household information.
Approaches to Data Mining • Association • Sequence-based analysis • Clustering • Classification
Association • Classic market-basket analysis treats the purchase of a number of items (for example, the contents of a shopping basket) as a single transaction and looks for items that tend to be bought together. • This information can be used to adjust inventories, modify floor or shelf layouts, or introduce targeted promotional activities to increase overall sales or move specific products. • Example: 80 percent of all transactions in which beer was purchased also included potato chips.
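The 80 percent figure in the example is what association-rule mining usually calls the confidence of the rule "beer → potato chips". The following is a minimal sketch of how that number could be computed; the basket data and the function name `rule_stats` are hypothetical and chosen only to reproduce the figure from the slide.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of all transactions containing both items
    confidence = share of antecedent transactions that also contain the consequent
    """
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    support = len(with_both) / len(transactions)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical baskets; each set is one point-in-time transaction.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "chips", "bread"},
    {"beer", "chips", "milk"},
    {"beer", "bread"},
    {"milk", "bread"},
    {"chips", "salsa"},
    {"milk", "eggs"},
]

support, confidence = rule_stats(baskets, "beer", "chips")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Here beer appears in five baskets and four of those also contain chips, so the confidence is 4/5. Real tools search over many candidate item sets rather than testing one rule at a time.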
Sequence-based analysis • Traditional market-basket analysis deals with a collection of items as part of a point-in-time transaction. • Sequence-based analysis instead looks at purchases ordered over time, to identify a typical set of purchases that might predict the subsequent purchase of a specific item.
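One way to picture the idea of precursor purchases is to take each customer's purchases in time order and ask how often a target item appears after a given set of earlier items. This is an illustrative sketch only; the purchase histories, item names, and the function `precursor_confidence` are assumptions, not something the slides specify.

```python
def precursor_confidence(histories, precursors, target):
    """Among customers who bought every precursor item, return the fraction
    who bought `target` at some later point in their history."""
    matched = 0   # customers who completed the precursor set
    followed = 0  # of those, customers who later bought the target
    for purchases in histories.values():
        remaining = set(precursors)
        for position, item in enumerate(purchases):
            remaining.discard(item)
            if not remaining:
                matched += 1
                # Only purchases made after the precursor set is complete count.
                if target in purchases[position + 1:]:
                    followed += 1
                break
    return followed / matched if matched else 0.0


# Hypothetical purchase histories, each list ordered oldest to newest.
histories = {
    "c1": ["tent", "sleeping bag", "camp stove"],
    "c2": ["sleeping bag", "tent", "lantern", "camp stove"],
    "c3": ["tent", "sleeping bag", "lantern"],
    "c4": ["camp stove", "tent", "sleeping bag"],
    "c5": ["tent", "lantern"],
}

rate = precursor_confidence(histories, {"tent", "sleeping bag"}, "camp stove")
print(f"{rate:.0%} of matching customers later bought a camp stove")
```

Customer c4 shows why order matters: the stove was bought before the precursor items, so it does not count as a subsequent purchase.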
Clustering • Clustering approaches address segmentation problems. • These approaches assign records with a large number of attributes into a relatively small set of groups or "segments." • Example: Buying habits of multiple population segments might be compared to determine which segments to target for a new sales campaign.
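The slides do not name a particular clustering algorithm. As one common choice, the sketch below uses plain k-means to assign records to segments. The customer records (two attributes each, to keep the example readable) and the starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    def nearest(point):
        return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point)].append(point)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    labels = [nearest(point) for point in points]
    return centroids, labels


# Hypothetical records: (store visits per month, items per visit).
customers = [(2, 1), (3, 1), (2, 2), (20, 8), (22, 9), (21, 7)]

# Assumption: start from two of the records; real tools pick starts more carefully.
centroids, labels = kmeans(customers, [customers[0], customers[3]])
print(labels)
print(centroids)
```

Each label is a segment number, so records sharing a label could then be compared or targeted as a group.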
Classification • Most commonly applied data mining technique. • The algorithm uses preclassified examples to determine the set of parameters required for proper discrimination. • Example: A classifier capable of identifying risky loans could be used to aid in the decision of whether to grant a loan to an individual.
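To make "uses preclassified examples to determine the set of parameters" concrete, here is a deliberately tiny sketch in which the only parameter learned is a single threshold. The feature (debt-to-income ratio), the labeled loans, and the function names are all hypothetical; real classifiers learn many parameters over many attributes.

```python
def fit_threshold(examples):
    """Pick the threshold on one feature that misclassifies the fewest
    training examples, predicting 'risky' when the feature exceeds it."""
    values = sorted({value for value, _ in examples})
    # Candidate thresholds: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((value > threshold) != label for value, label in examples)

    return min(candidates, key=errors)


def predict_risky(threshold, value):
    """Apply the learned parameter to a new, unlabeled record."""
    return value > threshold


# Hypothetical preclassified loans: (debt-to-income ratio, was the loan risky?)
training = [
    (0.10, False), (0.15, False), (0.22, False), (0.30, False),
    (0.35, True), (0.42, True), (0.48, True), (0.55, True),
]

threshold = fit_threshold(training)
print(f"learned threshold: {threshold:.3f}")
print(predict_risky(threshold, 0.50), predict_risky(threshold, 0.12))
```

The training step is where the preclassified examples are consumed; after that, only the learned threshold is needed to score a new applicant.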
Issues of Data Mining • Present-day tools are strong but require significant expertise to implement effectively. • Susceptibility to "dirty" or irrelevant data. • Inability to "explain" results in human terms.
Issues • Susceptibility to "dirty" or irrelevant data • Today's data mining tools simply take everything they are given as factual and draw conclusions accordingly. • Users must take the necessary precautions to ensure that the data being analyzed is "clean."
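As a small illustration of why cleaning matters, the sketch below filters out implausible records before computing a summary statistic. The field names, the sample records, and the plausibility rules in `is_clean` are assumptions; real checks depend entirely on the data set.

```python
from statistics import mean


def is_clean(record):
    """Hypothetical plausibility checks for one customer record."""
    age = record.get("age")
    spend = record.get("monthly_spend")
    return (
        isinstance(age, (int, float)) and 0 < age < 120
        and isinstance(spend, (int, float)) and spend >= 0
    )


# Hypothetical records, two of which are "dirty".
records = [
    {"age": 34, "monthly_spend": 120.0},
    {"age": 41, "monthly_spend": 95.0},
    {"age": 29, "monthly_spend": 110.0},
    {"age": 999, "monthly_spend": 105.0},  # placeholder value keyed in as an age
    {"age": 38, "monthly_spend": None},    # missing value
]

clean = [r for r in records if is_clean(r)]
raw_mean_age = mean(r["age"] for r in records)
clean_mean_age = mean(r["age"] for r in clean)
print(f"mean age, raw: {raw_mean_age:.1f}; after cleaning: {clean_mean_age:.1f}")
```

A tool that takes every value as factual would report the raw average, which a single bad entry is enough to distort.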
Issues, cont. • Inability to "explain" results in human terms • Many of the tools employed in data mining analysis use complex mathematical algorithms that are not easily mapped into human terms. • What good does the information do if you don't understand it?