Data Mining, Chapter 3, Output: Knowledge Representation. Kirk Scott
A summary of ways of representing knowledge, the results of mining: • Rule sets • Decision trees • Regression equations • Clusters • Deciding what kind of output you want is the first step towards picking a mining algorithm
Output can be in the form of tables • This is kind of lame • All they’re saying is that instances can be organized to form a lookup table for classification • The contact lens data can be viewed in this way • At the end they will consider another way in which the instance set itself is pretty much the result of mining
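A minimal sketch of the lookup-table idea, using a hypothetical fragment in the style of the contact lens data (the attribute values and class labels below are illustrative):

import sys

# Sketch: classification by lookup table.
# Key = (age, prescription, astigmatism, tear rate), value = recommended lens class.
lookup = {
    ("young", "myope", "no", "reduced"): "none",
    ("young", "myope", "no", "normal"): "soft",
    ("presbyopic", "hypermetrope", "yes", "reduced"): "none",
}

def classify(instance):
    # An instance is just a tuple of attribute values;
    # combinations not stored in the table are simply not covered
    return lookup.get(instance, "unknown")

print(classify(("young", "myope", "no", "normal")))   # soft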
For problems with numeric attributes you can apply statistical methods • The computer performance example was given earlier • The methods will be covered in more detail in Chapter 4 • The statistical approach can be illustrated graphically
Fitting a Line • This would be a linear equation relating cache size to computer performance • PRP = 37.06 + 2.47 CACH • This defines the straight line that best fits the instances in the data set • Figure 3.1, on the following overhead, shows both the data points and the line
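A small sketch of fitting such a line by least squares; the (CACH, PRP) pairs below are made-up stand-ins for the CPU performance data, not the book's numbers:

import numpy as np

# Hypothetical (cache size, published relative performance) pairs
cach = np.array([0, 8, 16, 32, 64, 128], dtype=float)
prp  = np.array([40, 55, 80, 120, 198, 350], dtype=float)

# Fit PRP = b0 + b1 * CACH by least squares (polyfit returns slope first)
b1, b0 = np.polyfit(cach, prp, deg=1)
print(f"PRP = {b0:.2f} + {b1:.2f} CACH")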
Finding a Boundary • A different technique will find a linear decision boundary • This linear equation in petal length and petal width will separate instances of Iris setosa and Iris versicolor • 2.0 – 0.5 PETAL_LENGTH – 0.8 PETAL_WIDTH = 0
An instance of Iris setosa should give a value >0 (above/to the right of the line) and an instance of Iris versicolor should give a value <0 • Figure 3.2, on the following overhead, shows the boundary line and the instances of the two kinds of Iris
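A sketch of using that boundary as a classifier: the sign of the linear expression decides the class (the sample measurements are illustrative values in cm):

def iris_boundary(petal_length, petal_width):
    # 2.0 - 0.5*PETAL_LENGTH - 0.8*PETAL_WIDTH = 0 is the decision boundary
    return 2.0 - 0.5 * petal_length - 0.8 * petal_width

def classify_iris(petal_length, petal_width):
    value = iris_boundary(petal_length, petal_width)
    return "Iris setosa" if value > 0 else "Iris versicolor"

print(classify_iris(1.4, 0.2))   # typical setosa-like measurement -> Iris setosa
print(classify_iris(4.5, 1.5))   # typical versicolor-like measurement -> Iris versicolor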
The book summarizes the different kinds of decisions (< , =, etc.) that might be coded for a single attribute at each node in a decision tree • Most are straightforward and don’t need to be repeated here • Several more noteworthy aspects will be addressed on the following overheads
Null Values • If nulls occur, you will have to make a decision based on them in any case • The occurrence of a null value may be one of the separate branches out of a decision tree node • At this point the value of assigning a meaning to null becomes apparent (not available, not applicable, not important…)
Approaches to dealing with uncoded/undistinguished nulls: • Keep track of the number of instances per branch and classify nulls with the most popular branch • Alternatively, keep track of the relative frequency of different branches • In the aggregate results, assign a corresponding proportion of the nulls to the different branches
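One way to picture the proportional approach, a sketch assuming we already know how many training instances went down each branch (all counts below are hypothetical):

# Sketch: distribute instances with null values across branches
# in proportion to how often each branch was taken by non-null instances.
branch_counts = {"low": 60, "medium": 30, "high": 10}   # hypothetical branch frequencies
total = sum(branch_counts.values())

nulls_to_assign = 20   # hypothetical number of instances with a null for this attribute
for branch, count in branch_counts.items():
    share = nulls_to_assign * count / total
    print(f"branch {branch}: assign ~{share:.1f} of the null instances")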
Other Kinds of Comparisons • Simple decisions compare attribute values and constants • Some decisions may compare two attributes in the same instance • Some decisions may be based on a function of >1 attribute per instance
Oblique Splits • Comparing an attribute to a constant splits data parallel to an axis • A decision function which doesn’t split parallel to an axis is called an oblique split • In effect, the boundary between the kinds of irises shown earlier is such a split
Option Nodes • A single node with alternative splits on different attributes is called an option node • Instances are classified according to each split and may appear in >1 leaf classification • The last part of analysis includes deciding what such results indicate
Weka and Hand-Made Decision Trees • The book suggests that you can get a handle on decision trees by making one yourself • The book illustrates how Weka includes tools for doing this • To me this seems out of place until chapter 11 when Weka is introduced • I will not cover it here
Regression Trees • For a problem with numeric attributes it’s possible to devise a tree-like classifier • Working from the bottom up: • The leaves contain the performance prediction • The prediction is the average of the performance of all instances that end up classified in that leaf • The internal nodes contain numeric comparisons of attribute values
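A minimal hand-built sketch of the regression-tree idea: internal nodes test numeric attributes, and each leaf predicts the average performance of the training instances that reach it (the thresholds and leaf averages below are invented):

# Sketch of a tiny regression tree for performance prediction.
# Each return value would be the mean PRP of the training instances in that leaf.
def predict_prp(cach, mmax):
    if cach <= 8.5:                 # hypothetical split on cache size
        if mmax <= 4000:            # hypothetical split on maximum main memory
            return 19.3             # mean PRP of instances reaching this leaf
        return 52.6
    return 157.0

print(predict_prp(cach=4, mmax=2000))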
Model Trees • A model tree is a hybrid of a decision tree and regression • In a model tree instances are classified into a given leaf • Once a classification reaches the leaf, the prediction is made by applying a linear equation to some subset of instance attribute values
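The same shape as the sketch above, but with a linear model at each leaf instead of a constant; the split and the coefficients are made up for illustration:

# Sketch of a model tree: the tree routes an instance to a leaf,
# and the leaf applies its own linear equation to the attribute values.
def model_tree_prp(cach, mmax, chmin):
    if chmin <= 7.5:                                   # hypothetical split
        return 4.2 + 1.3 * cach + 0.003 * mmax         # illustrative leaf model LM1
    return -20.0 + 2.9 * cach + 0.008 * mmax           # illustrative leaf model LM2

print(model_tree_prp(cach=16, mmax=8000, chmin=4))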
Figure 3.4, on the following overhead, shows (a) a linear model, (b) a regression tree, and (c) a model tree
Rule Sets from Trees • Given a decision tree, you can generate a corresponding set of rules • Start at the root and trace the path to each leaf, recording the conditions at each node • The rules in such a set are independent • Each covers a separate case
The rules don’t have to be applied in a particular order • The downside is that such a rule set is more complex than an ordered set • It is possible to prune a set derived from a tree to remove redundancy
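A sketch of reading rules off a tree: walk from the root to each leaf, collecting the test made at each node along the way. The tree here is a hypothetical nested-tuple encoding of the familiar weather example:

# Sketch: generate one rule per root-to-leaf path.
# A node is (attribute, {value: subtree, ...}); a leaf is just a class label.
tree = ("outlook", {
    "sunny":    ("humidity", {"high": "no", "normal": "yes"}),
    "overcast": "yes",
    "rainy":    ("windy", {"true": "no", "false": "yes"}),
})

def rules_from_tree(node, conditions=()):
    if isinstance(node, str):                     # reached a leaf: emit the rule
        print("If " + " and ".join(conditions) + " then play = " + node)
        return
    attribute, branches = node
    for value, subtree in branches.items():
        rules_from_tree(subtree, conditions + (f"{attribute} = {value}",))

rules_from_tree(tree)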
Trees from Rule Sets • Given a rule set, you can generate a decision tree • Now we’re interested in going in the opposite direction • Even a relatively simple rule set can lead to a messy tree
A rule set may compactly represent a limited number of explicitly known cases • The other cases may be implicit in the rule set • The implicit cases have to be spelled out in the tree
An Example • Take these rules for example: • If a and b then x • If c and d then x • The result is implicitly binary, either x or not x • The other variables are also implicitly binary (T or F)
With 4 variables, a, b, c, and d, there can be up to 4 levels in the tree • A tree for this problem is shown in Figure 3.5 on the following overhead
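A sketch contrasting the two rules with the tree they expand into; the tree has to re-test c and d on more than one branch, which is exactly the replication the figure shows:

# The compact rule set: "if a and b then x", "if c and d then x", otherwise not-x.
def classify_rules(a, b, c, d):
    if a and b:
        return "x"
    if c and d:
        return "x"
    return "not x"

# The equivalent decision tree written as nested tests:
# the (c, d) subtree appears twice -- the replicated subtree.
def classify_tree(a, b, c, d):
    if a:
        if b:
            return "x"
        if c:                    # replicated subtree, copy 1
            return "x" if d else "not x"
        return "not x"
    if c:                        # replicated subtree, copy 2
        return "x" if d else "not x"
    return "not x"

# Both forms agree on all 16 possible instances
assert all(
    classify_rules(a, b, c, d) == classify_tree(a, b, c, d)
    for a in (True, False) for b in (True, False)
    for c in (True, False) for d in (True, False)
)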
Messiness = Replicated Subtrees • The tree is messy because it contains replicated subtrees • If a = yes and b = no, you then have to test c and d • If a = no, you have to do exactly the same test on c and d • The gray leaves in the middle and the gray leaves on the right both descend from analogous branches of the tree
The book states that “decision trees cannot easily express the disjunction implied among the different rules in a set.” • Translation: • One rule deals with a and b • The other rule is disjoint from the first; it deals only with c and d • As seen above, whenever the first rule fails (a is no, or a is yes and b is no), you have to do exactly the same test on c and d
Another Example of Replicated Subtrees • Figure 3.6, on the following overhead, illustrates an exclusive or (XOR) function
Consider the graph: • (x = 1) XOR (y = 1) → a • Incidentally, note that you could also write: • (x <> y) → a, (x = y) → b
Now consider the tree: • There’s nothing surprising: First test x, then test y • The gray leaves on the left and the right at the bottom are analogous • Now consider the rule set: • In this example the rule set is not simpler • This doesn’t negate the fact that the tree has replication
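A minimal sketch of the XOR case; either form works, and the tree version has to test y in the same way on both branches of x, which is where the analogous leaves come from:

# Rule-set form of the XOR example
def classify_xor_rules(x, y):
    if x != y:
        return "a"
    return "b"

# Tree form: test x first, then test y on each branch
def classify_xor_tree(x, y):
    if x == 1:
        return "b" if y == 1 else "a"
    return "a" if y == 1 else "b"

assert all(classify_xor_rules(x, y) == classify_xor_tree(x, y)
           for x in (0, 1) for y in (0, 1))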
Yet Another Example of a Replicated Subtree • Consider Figure 3.7, shown on the following overhead
In this example there are again 4 attributes • This time they are 3-valued instead of binary • There are 2 disjoint rules, each including 2 of the variables • There is a default rule for all other cases
The replication is represented in the diagram in this way: • Each gray triangle stands for an instance of the complete subtree on the lower left which is shown in gray
The rule set would be equally complex IF there were a rule for each branch of the tree • It is less complex in this example because of the default rule
Other Issues with Rule Sets • We have not seen the data mining algorithms yet, but some do not generate rule sets in a way analogous to reading all of the cases off of a decision tree • Sets (especially those not designed to be applied in a given order) may contain conflicting rules that classify specific cases into different categories
Rule Sets that Produce Multiple Classifications • In practice you can take two approaches • Do not classify instances that fall into >1 category • Count how many times each rule is triggered by a training set and use the most popular of the classification rules when two conflict
Rule Sets that Don’t Classify Certain Cases • If a rule set doesn’t classify certain cases, there are again two alternatives: • Do not classify those instances • Classify those instances as the most frequently occurring class in the training data
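A sketch combining both fallbacks: count how often each rule fires on the training data, break conflicts in favour of the more popular rule, and fall back to the most frequent class when nothing fires (the rules, attributes, and counts below are hypothetical):

# Sketch: resolving conflicts and gaps in an unordered rule set.
# Each rule is (condition, predicted class, times triggered on the training set).
rules = [
    (lambda inst: inst["outlook"] == "sunny" and inst["humidity"] == "high", "no",  25),
    (lambda inst: inst["windy"] is False,                                    "yes", 40),
]
default_class = "yes"   # most frequently occurring class in the training data

def classify(instance):
    fired = [(count, cls) for condition, cls, count in rules if condition(instance)]
    if not fired:
        return default_class               # rule set doesn't cover this case
    return max(fired)[1]                   # conflict: prefer the most popular rule

print(classify({"outlook": "sunny", "humidity": "high", "windy": False}))   # -> yes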
The Simplest Case with Rule Sets • Suppose all variables are Boolean • I.e., suppose rules only have two possible outcomes, T/F • Suppose only rules with T outcomes are expressed • (By definition, all unexpressed cases are F)
Under the foregoing assumptions: • The rules are independent • The order of applying the rules is immaterial • The outcome is deterministic • There is no ambiguity
Reality is More Complex • In practice, there can be ambiguity • The authors state that the assumption that there are only two cases, T/F, and only T is expressed, is a form of closed world assumption • In other words, everything is binary, and anything not explicitly stated to be true is assumed to be false
As soon as this and any other simplifying assumptions are relaxed, things become messier • In other words, rules become dependent, the order of application matters, etc. • This is when you can arrive at multiple classifications or no classifications from a rule set
Association Rules • This subsection is largely repetition • Any subset of attributes may predict any other subset of attributes • Association rules are really just a generalization or superset of classification rules
This is because the classification rule (all non-class attributes) → (class attribute) is just one of many possible association rules • Because so many association rules are possible, you need criteria for defining interesting ones