This Perception-Based Classification (PBC) System, led by Salvador Ledezma, explores user-computer cooperation in data mining. It integrates human perceptivity and visualization techniques to enhance classification tasks efficiently.
Perception-Based Classification (PBC) System Salvador Ledezma sledezma@uci.edu April 25, 2002
Introduction • Concepts • Demo of PBC • References: • “Towards an Effective Cooperation of the User and Computer for Classification” • “Visual Data Mining with Pixel-oriented Visualization Techniques” • “Visual Classification: An Interactive Approach to Decision Tree Construction” • Mihael Ankerst is author or coauthor of each
Data Mining • Exploration and Analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules • Part of Knowledge Discovery in Databases (KDD) process
Classification • Major task of Data Mining • Assign object to one of a set of given classes based on object attributes
Classification Algorithms • Decision Tree Classifier • Training set – set of objects whose attributes and classes are already known • Using the training set, the tree classifier determines a classification function represented by a decision tree • Model for the class attribute as a function of the values of the other attributes • Test set – validates the classification function
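As a minimal sketch of the idea above, the snippet below represents a classification function as a small decision tree and validates it against a test set. The tree, the attribute names (`age`, `income`), the thresholds, and the class labels are all hypothetical and chosen only for illustration; they do not come from the PBC papers.

```python
# Hypothetical hand-written tree. An internal node is a tuple
# (attribute, threshold, left_subtree, right_subtree); a leaf is a class label.
TREE = ("age", 30, "low_risk", ("income", 50, "high_risk", "low_risk"))


def classify(tree, obj):
    """Follow the splits until a leaf (a plain class label) is reached."""
    while isinstance(tree, tuple):
        attribute, threshold, left, right = tree
        tree = left if obj[attribute] < threshold else right
    return tree


def accuracy(tree, test_set):
    """Fraction of test objects whose predicted class matches the known class."""
    hits = sum(classify(tree, obj) == label for obj, label in test_set)
    return hits / len(test_set)


# Hypothetical test set: (attributes, known class) pairs.
TEST_SET = [
    ({"age": 25, "income": 40}, "low_risk"),
    ({"age": 45, "income": 30}, "high_risk"),
    ({"age": 50, "income": 80}, "low_risk"),
    ({"age": 35, "income": 20}, "low_risk"),  # this tree misclassifies it
]

print(accuracy(TREE, TEST_SET))
```

In a real classifier the tree would be derived from the training set rather than written by hand; here only the validation step is shown.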
Classification (cont) • Usually algorithms are black boxes with no user interaction or intervention • Reasons for user involvement in decision tree construction: • Use human pattern recognition capabilities • User will have better understanding of tree • User provides domain knowledge
Visual Data Mining • Tackle data mining tasks by • Enabling human involvement • Incorporating the perceptivity of humans
Visual Classification • Construction of decision trees is decomposed into substeps • Enables human involvement • Example: PBC • Data visualization based on 2 concepts • Each attribute of training data is visualized in a separate part of screen • Different class labels of training objects are represented by different colors
Pixel-Oriented Visualization Techniques • Represent each attribute value as a single colored pixel • Map the range of possible attribute values to a fixed color map • Maximizes the amount of information represented at one time without any overlap
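The mapping from attribute values to a fixed color map could look roughly like the sketch below. It assumes a simple linear interpolation between a dark and a light RGB color; the function name `value_to_color`, the two endpoint colors, and the sample prices are illustrative assumptions, not the color map used by PBC.

```python
def value_to_color(value, lo, hi, dark=(0, 0, 64), light=(255, 255, 192)):
    """Linearly map a value in [lo, hi] to an RGB color between dark and light."""
    if hi == lo:
        t = 0.0  # degenerate range: every value gets the dark color
    else:
        t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values that fall outside the range
    return tuple(round(d + t * (l - d)) for d, l in zip(dark, light))


# Hypothetical attribute values; each one becomes a single colored pixel.
prices = [12.0, 48.5, 30.0, 85.0]
lo, hi = min(prices), max(prices)
pixels = [value_to_color(p, lo, hi) for p in prices]
```

Because every value occupies exactly one pixel, the number of values shown at once is bounded only by the number of pixels available.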
Circle Segments Technique • Data is displayed as a circle divided into segments • Each segment represents an attribute • Each attribute value is mapped to a single colored pixel; the arrangement starts in the center and proceeds outward • Example
Light = high stock price; dark = low stock price. The figure represents 50 stocks. One circle represents the prices of different stocks at the same time.
Bar Visualization • For each attribute • Attribute values are sorted into attribute lists • Classes are defined by colors • Within a bar, sorted attribute values are mapped to pixels, line by line • Each attribute is placed in a different bar
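The bar construction described above can be sketched as follows: sort the objects by one attribute, replace each object by the color of its class, and fill the bar line by line. The class-to-color table, the `build_bar` helper, and the sample data are hypothetical.

```python
# Hypothetical class-to-color assignment.
CLASS_COLORS = {"A": "red", "B": "blue"}


def build_bar(values, labels, width):
    """Sort objects by attribute value, then fill rows of `width` pixels
    with the color of each object's class."""
    ordered = sorted(zip(values, labels), key=lambda pair: pair[0])
    colors = [CLASS_COLORS[label] for _, label in ordered]
    return [colors[i:i + width] for i in range(0, len(colors), width)]


# One attribute of a tiny hypothetical training set.
values = [5.1, 1.2, 3.3, 0.7, 4.8, 2.9]
labels = ["B", "A", "A", "A", "B", "B"]
bar = build_bar(values, labels, width=3)
# bar == [["red", "red", "blue"], ["red", "blue", "blue"]]
```

An attribute whose bar shows long runs of a single color separates the classes well, which is what makes it a visually good split candidate.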
DNA Training Data • Attribute 85 and attribute 90 are visually good candidates for splitting the tree • Algorithm picks attribute 90 as the optimal split
PBC • Uses pixel-oriented visualization • Visualizes training data in order to support interactive decision tree construction • Examples of use • Automatic • Automatic-manual (top 2 levels) • Manual-automatic • Manual • Actual use lies somewhere along this spectrum
Additional Functionality • Propose split • Look-ahead • For a hypothetical split • Expand tree • Automatic expanding and construction
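A "propose split" function could be sketched as below. The slides do not say which criterion PBC uses, so this example assumes the Gini index and binary splits on a single numeric attribute; `gini` and `propose_split` are hypothetical names.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def propose_split(values, labels):
    """Return (threshold, weighted_gini) of the best split `value < threshold`,
    or None if all values are identical."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical attribute with two cleanly separated classes.
print(propose_split([1, 2, 3, 10, 11, 12], ["A", "A", "A", "B", "B", "B"]))
```

A look-ahead for a hypothetical split could reuse the same scoring: apply the user's chosen threshold, then call `propose_split` on each resulting partition to preview the next level.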