Department of Computer Science, Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING. Submitted By: Osama Ghulam Mohammad (2010-CS-20), Noureen Chagani (2010-CS-11), Naveed Usman (2010-CS-23)
TABLE OF CONTENTS • What is data mining? • Data mining consists of five major elements • Why Mine Data? • Commercial Viewpoint • Scientific Viewpoint • Some of the techniques used for data mining
What is data mining? • Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. • It extracts knowledge from large, often extremely large, datasets. • The goal is useful knowledge that can improve processes.
Data mining consists of five major elements: • Extract, transform, and load transaction data onto the data warehouse system. • Store and manage the data in a multidimensional database system. • Provide data access to business analysts and information technology professionals. • Analyze the data by application software. • Present the data in a useful format, such as a graph or table.
Why Mine Data? Commercial Viewpoint • Lots of data is being collected and warehoused • Web data, e-commerce • purchases at department/grocery stores • Bank/Credit Card transactions • Computers have become cheaper and more powerful • Competitive Pressure is Strong • Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint • Data is collected and stored at enormous speeds (GB/hour): • remote sensors on a satellite • telescopes scanning the skies • microarrays generating gene expression data • scientific simulations generating terabytes of data • Traditional techniques are infeasible for raw data at this scale. • Data mining may help scientists: • in classifying and segmenting data
Some of the techniques used for data mining are: • Artificial neural networks - Non-linear predictive models that learn through training and resemble biological neural networks in structure. They are useful for pattern recognition and data classification.
Neural Network • Neural Networks map a set of input-nodes to a set of output-nodes • Number of inputs/outputs is variable • The Network itself is composed of an arbitrary number of nodes with an arbitrary topology
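The mapping from input nodes to output nodes can be sketched in a few lines. This is only an illustration, assuming a layered feed-forward topology with sigmoid nodes (the slide allows arbitrary topologies); the function names and all weights are hypothetical and untrained.

```python
import math


def sigmoid(x):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, layers):
    """Propagate input values through the network, layer by layer.

    `layers` is a list of layers; each layer is a list of nodes,
    and each node is a (weights, bias) pair.
    """
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations


# Hypothetical network: 3 input nodes -> 2 hidden nodes -> 1 output node.
hidden_layer = [([0.5, -0.4, 0.1], 0.0), ([-0.3, 0.8, 0.2], 0.1)]
output_layer = [([1.0, -1.0], 0.0)]
network = [hidden_layer, output_layer]

result = forward([1.0, 0.0, 1.0], network)
print(result)  # one value, because the network has one output node
```

Changing the number of nodes per layer changes the number of inputs and outputs; training (not shown) would adjust the weights.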
Decision tree • Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
Decision tree (data)
height  hair   eyes   class
short   blond  blue   A
tall    blond  brown  B
tall    red    blue   A
short   dark   blue   B
tall    dark   blue   B
tall    blond  blue   A
tall    dark   brown  B
short   blond  brown  B
hair
  dark  -> B
  red   -> A
  blond -> eyes
             blue  -> A
             brown -> B
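The tree for the data above can be written down as rules and checked against the records. This is a minimal sketch, assuming the tree splits on hair first and on eyes only for blond hair, which is the reading consistent with the data; the names RECORDS, TREE, and classify are illustrative.

```python
# Records from the slide: (height, hair, eyes, class).
RECORDS = [
    ("short", "blond", "blue", "A"),
    ("tall", "blond", "brown", "B"),
    ("tall", "red", "blue", "A"),
    ("short", "dark", "blue", "B"),
    ("tall", "dark", "blue", "B"),
    ("tall", "blond", "blue", "A"),
    ("tall", "dark", "brown", "B"),
    ("short", "blond", "brown", "B"),
]

# Inner nodes are dicts naming the attribute to test; leaves are class labels.
TREE = {
    "attribute": "hair",
    "branches": {
        "dark": "B",
        "red": "A",
        "blond": {
            "attribute": "eyes",
            "branches": {"blue": "A", "brown": "B"},
        },
    },
}


def classify(record, node=TREE):
    """Follow the branch matching the record's attribute value until a leaf."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node


predictions = [
    classify({"height": height, "hair": hair, "eyes": eyes})
    for height, hair, eyes, _ in RECORDS
]
correct = sum(pred == rec[3] for pred, rec in zip(predictions, RECORDS))
print(f"{correct} of {len(RECORDS)} records classified correctly")
```

Note that height never appears in the tree: hair and eyes alone separate the two classes in this dataset.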
The nearest neighbor method A classification technique that classifies each record based on the classes of the records most similar to it in a historical database.
An important technique for Data Mining is: CLUSTERING
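A rough sketch of the idea, assuming numeric attributes, Euclidean distance as the similarity measure, and a majority vote among the k closest records (the slide does not fix any of these). The historical records and the function name knn_classify are hypothetical.

```python
import math
from collections import Counter


def knn_classify(query, history, k=3):
    """Label `query` by majority vote among its k most similar records."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical historical database: ((attribute 1, attribute 2), class).
HISTORY = [
    ((25, 30), "no"),
    ((30, 35), "no"),
    ((35, 40), "no"),
    ((45, 80), "yes"),
    ((50, 90), "yes"),
    ((55, 85), "yes"),
]

print(knn_classify((48, 82), HISTORY))
print(knn_classify((28, 33), HISTORY))
```

In practice the attributes are usually rescaled first so that no single attribute dominates the distance.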
Clustering: (Definition) • Clustering can be considered the most important unsupervised learning technique; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. • Clustering is “the process of organizing objects into groups whose members are similar in some way”. • A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
Clustering The greater the similarity (or homogeneity) within a group, and the greater the difference between groups, the “better” or more distinct the clustering.
Why clustering? A few good reasons ... • Simplifications • Pattern detection
The K-Means Clustering Method Basic K-means Algorithm for finding K clusters: 1. Select K points as the initial centroids. 2. Assign all points to the closest centroid. 3. Recompute the centroid of each cluster. 4. Repeat steps 2 and 3 until the centroids don’t change.
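The four steps can be sketched directly in code. This is a minimal illustration, assuming Euclidean distance and caller-supplied initial centroids; the function name kmeans, the max_iter safety limit, and the sample points are not from the slides.

```python
import math


def kmeans(points, initial_centroids, max_iter=100):
    """Basic K-means; K is the number of initial centroids (step 1)."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[closest].append(p)
        # Step 3: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        # Step 4: stop once the centroids no longer change.
        if updated == centroids:
            break
        centroids = updated
    return centroids, clusters


# Hypothetical data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, initial_centroids=[(1, 1), (9, 8)])
print(centroids)
print(clusters)
```

The result depends on the initial centroids chosen in step 1, so the algorithm is often run several times with different starting points.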
Figure 10a shows the case when the cluster centers coincide with the circle centers. This is a global minimum. Figure 10b shows a local minimum.
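The difference between the two outcomes can be reproduced on a toy example. The sketch below is not the configuration in Figure 10; it assumes the quantity being minimized is the sum of squared distances (SSE) from each point to its centroid, and uses four hypothetical points at the corners of a 4-by-1 rectangle.

```python
import math

# Hypothetical data: corners of a rectangle that is wide and short.
POINTS = [(0, 0), (0, 1), (4, 0), (4, 1)]


def assign(points, centroids):
    """Group points by their closest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
        clusters[i].append(p)
    return clusters


def recompute(clusters):
    """Mean of each cluster (assumes no cluster is empty, true for this data)."""
    return [
        tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
        for cluster in clusters
    ]


def sse(clusters, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(
        math.dist(p, c) ** 2
        for cluster, c in zip(clusters, centroids)
        for p in cluster
    )


def run(centroids):
    """Iterate assign/recompute until the centroids stop changing."""
    while True:
        clusters = assign(POINTS, centroids)
        updated = recompute(clusters)
        if updated == centroids:
            return centroids, sse(clusters, centroids)
        centroids = updated


# Centroids start on the left and right sides: left/right split.
good_centroids, good_sse = run([(0.0, 0.5), (4.0, 0.5)])
# Centroids start in the middle of the long edges: top/bottom split.
bad_centroids, bad_sse = run([(2.0, 0.0), (2.0, 1.0)])
print(good_sse, bad_sse)
```

Both starting positions are already stable, so K-means stops immediately in each case, but the top/bottom split has a much larger SSE than the left/right split: the algorithm has settled on a worse solution that it cannot leave.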
“The key in business is to know something that nobody else knows.” - Aristotle Onassis
“To understand is to perceive patterns.” - Sir Isaiah Berlin