Learn about the ID3 decision tree induction algorithm for machine learning, including principles, tree construction, information gain, and optimal tree selection based on credit history data. This algorithm helps in estimating an individual's credit risk using factors like credit history, debt, income, and collateral.
9.3 The ID3 Decision Tree Induction Algorithm • ID3 induces concepts from examples. • ID3 represents concepts as decision trees. • Decision tree: a representation that allows us to determine the classification of an object by testing its values for certain properties • An example problem of estimating an individual’s credit risk on the basis of credit history, current debt, collateral, and income • Table 9.1 lists a sample of individuals with known credit risks. • The decision tree of Fig. 9.13 represents the classifications in Table 9.1 Machine Learning
Data from credit history of loan applications (Table 9.1)
A decision tree for credit risk assessment (Fig. 9.13)
9.3 The ID3 Decision Tree Induction Algorithm • In a decision tree, • Each internal node represents a test on some property, such as credit history or debt • Each possible value of the property corresponds to a branch of the tree, such as high or low • Leaf nodes represent classifications, such as low or moderate risk • An individual of unknown type may be classified by traversing the decision tree. • The size of the tree necessary to classify a given set of examples varies according to the order in which properties are tested. • Fig. 9.14 shows a tree that is simpler than the one in Fig. 9.13 but also classifies the examples in Table 9.1
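Classification by traversal can be sketched in a few lines of Python. The tree below is a small hypothetical one invented for illustration (it is not a transcription of Fig. 9.13 or Fig. 9.14), and the names `TOY_TREE` and `classify` are assumptions of this sketch.

```python
# Hypothetical tree for illustration only; not the tree of Fig. 9.13 or Fig. 9.14.
# An internal node is a (property, branches) pair; a leaf is a class label string.
TOY_TREE = (
    "income",
    {
        "low": "high risk",
        "medium": ("debt", {"high": "high risk", "low": "moderate risk"}),
        "high": "low risk",
    },
)


def classify(tree, individual):
    """Walk from the root to a leaf, following the branch for each tested value."""
    while not isinstance(tree, str):
        prop, branches = tree
        tree = branches[individual[prop]]
    return tree


applicant = {"income": "medium", "debt": "low"}
print(classify(TOY_TREE, applicant))
```

Each loop iteration performs one test on one property, so the number of tests needed is bounded by the depth of the tree.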
A simplified decision tree (Fig. 9.14)
9.3 The ID3 Decision Tree Induction Algorithm • Choice of the optimal tree • measure • the greatest likelihood of correctly classifying unseen data • assumption of the ID3 algorithm • “the simplest decision tree that covers all the training examples” is the optimal tree • the rationale for this assumption is the time-honored heuristic of preferring simplicity and avoiding unnecessary assumptions • Occam’s Razor principle • “It is vain to do with more what can be done with less.... Entities should not be multiplied beyond necessity.”
9.3.1 Top-down Decision Tree Induction • ID3 algorithm • constructs the decision tree in a top-down fashion • selects a property to test at the current node of the tree • uses the property to partition the set of examples • recursively constructs a subtree for each partition • continues until all members of a partition are in the same class • Because the order of tests is critical, ID3 relies on its criterion for selecting the test • For example, ID3 constructs Fig. 9.14 from Table 9.1 • ID3 selects INCOME as the root property => Fig. 9.15 • The partition {1,4,7,11} consists entirely of high-risk individuals, so it becomes a leaf; CREDIT HISTORY further divides the partition {2,3,12,14} into {2,3}, {14}, and {12} => Fig. 9.16
Decision Tree Construction Algorithm
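The top-down construction described in 9.3.1 can be sketched as the recursive Python function below. This is a minimal sketch under stated assumptions, not the algorithm as printed on the slide: examples are dicts, the class is stored under a `target` key, the selection rule is passed in as `select_property`, and a majority-class fallback is added for the case where no properties remain. The four examples are hypothetical and are not rows of Table 9.1.

```python
from collections import Counter


def induce_tree(examples, properties, select_property, target="risk"):
    """Top-down induction: pick a property, partition, recurse on each partition."""
    classes = [ex[target] for ex in examples]
    if len(set(classes)) == 1:      # all members share one class: make a leaf
        return classes[0]
    if not properties:              # assumption: fall back to the majority class
        return Counter(classes).most_common(1)[0][0]
    prop = select_property(examples, properties)
    remaining = [p for p in properties if p != prop]
    branches = {}
    for value in sorted({ex[prop] for ex in examples}):
        partition = [ex for ex in examples if ex[prop] == value]
        branches[value] = induce_tree(partition, remaining, select_property, target)
    return (prop, branches)


def first_property(examples, properties):
    """Placeholder selector; ID3 itself chooses by information gain (9.3.2)."""
    return properties[0]


# Hypothetical examples, not the rows of Table 9.1.
examples = [
    {"income": "low", "debt": "high", "risk": "high"},
    {"income": "low", "debt": "low", "risk": "high"},
    {"income": "high", "debt": "high", "risk": "moderate"},
    {"income": "high", "debt": "low", "risk": "low"},
]
tree = induce_tree(examples, ["income", "debt"], first_property)
print(tree)
```

Passing the selector as a parameter keeps the recursion separate from the test-selection criterion, which is the part that determines how small the resulting tree is.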
A partially constructed decision tree (Fig. 9.15)
Another partially constructed decision tree (Fig. 9.16)
9.3.2 Information Theoretic Test Selection • Test selection method • strategy • using information theory to select the test (property) • procedure • measure the information gain • pick the property providing the greatest information gain • Information gain from property P
9.3.2 Information Theoretic Test Selection
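For reference, the information-theoretic quantities usually used to define information gain are sketched below. The notation is an assumption of this sketch rather than something taken from the slide: C is the set of training examples, p(c_i) is the proportion of examples in class c_i, and a property P with n values partitions C into C_1, ..., C_n.

```latex
I(C) = -\sum_{i} p(c_i)\,\log_2 p(c_i)

E(P) = \sum_{i=1}^{n} \frac{|C_i|}{|C|}\, I(C_i)

\mathrm{gain}(P) = I(C) - E(P)
```

I(C) is the information content of the whole example set, E(P) is the expected information still needed after testing P, and the gain is the difference between the two.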
9.3.2 Information Theoretic Test Selection • Because INCOME provides the greatest information gain, ID3 will select it as the root.
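Selecting the property with the greatest information gain can be sketched as follows. The function names `information`, `gain`, and `select_property` and the four-row dataset are assumptions made for illustration; the data are not taken from Table 9.1, so the numbers printed here say nothing about the gains of the real properties.

```python
from collections import Counter
from math import log2


def information(examples, target):
    """Expected information (entropy, in bits) of the class distribution."""
    counts = Counter(ex[target] for ex in examples)
    total = len(examples)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def gain(examples, prop, target):
    """Information gain of `prop`: I(C) minus the weighted information of its partitions."""
    total = len(examples)
    expected = 0.0
    for value in {ex[prop] for ex in examples}:
        part = [ex for ex in examples if ex[prop] == value]
        expected += len(part) / total * information(part, target)
    return information(examples, target) - expected


def select_property(examples, properties, target="risk"):
    """Pick the property providing the greatest information gain."""
    return max(properties, key=lambda p: gain(examples, p, target))


# Hypothetical data: income separates the classes perfectly, debt not at all.
toy = [
    {"income": "low", "debt": "high", "risk": "high"},
    {"income": "low", "debt": "low", "risk": "high"},
    {"income": "high", "debt": "high", "risk": "low"},
    {"income": "high", "debt": "low", "risk": "low"},
]
print(gain(toy, "income", "risk"), gain(toy, "debt", "risk"))
print(select_property(toy, ["debt", "income"]))
```

In this toy set, testing income leaves no uncertainty about the class while testing debt removes none, so income is selected.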
9.5 Knowledge and Learning • Similarity-based Learning • generalization is a function of similarities across training examples • biases are limited to syntactic constraints on the form of learned knowledge • Knowledge-based Learning • the need for prior knowledge • the most effective learning occurs when the learner already has considerable knowledge of the domain • arguments for the importance of knowledge • similarity-based learning techniques rely on relatively large amounts of training data; in contrast, humans can form reliable generalizations from as few as a single training instance • any set of training examples can support an unlimited number of generalizations, most of which are irrelevant or nonsensical
9.5.2 Explanation-Based Learning • EBL • uses an explicitly represented domain theory to construct an explanation of a training example • By generalizing from the explanation of the instance, EBL can • filter noise • select relevant aspects of experience, and • organize training data into a systematic and coherent structure
9.5.2 Explanation-Based Learning • Given • A target concept • general specification of a goal state • A training example • an instance of the target • A domain theory • a set of rules and facts that are used to explain how the training example is an instance of the target concept • Operationality criteria • some means of describing the form that concept definitions may take • Determine • A new schema that achieves the target concept in a general way
9.5.2 Explanation-Based Learning • Example • target concept: a rule used to infer whether an object is a cup • premise(X) -> cup(X) • domain theory • liftable(X) ^ holds_liquid(X) -> cup(X) • part(Z, W) ^ concave(W) ^ points_up(W) -> holds_liquid(Z) • light(Y) ^ part(Y, handle) -> liftable(Y) • small(A) -> light(A) • made_of(A, feathers) -> light(A) • training example: an instance of the goal concept • cup(obj1), small(obj1), part(obj1, handle), owns(bob, obj1), part(obj1, bottom), part(obj1, bowl), points_up(bowl), concave(bowl), color(obj1, red) • operationality criteria • Target concepts must be defined in terms of observable, structural properties such as part and points_up
9.5.2 Explanation-Based Learning • Algorithm • construct an explanation of why the example is indeed an instance of the target concept (Fig. 9.17) • a proof that the target concept logically follows from the example • eliminates concepts that are irrelevant to the goal, such as color(obj1, red), and captures the relevant ones • generalize the explanation to produce a concept definition • by substituting variables for constants that are part of the training instance while retaining those constants and constraints that are part of the domain theory • EBL defines a new rule whose • conclusion is the root of the tree • premise is the conjunction of the leaves • small(X) ^ part(X,handle) ^ part(X,W) ^ concave(W) ^ points_up(W) -> cup(X)
Proof that an object, X, is a cup (Fig. 9.17)
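The learned rule small(X) ^ part(X,handle) ^ part(X,W) ^ concave(W) ^ points_up(W) -> cup(X) can be applied directly to the facts of the training example. The Python sketch below only checks the learned rule; it does not construct the explanation or perform the generalization step, and the tuple encoding of facts and the name `is_cup` are assumptions of this sketch.

```python
# Facts of the training example, written as tuples (predicate, arg1[, arg2]).
FACTS = {
    ("small", "obj1"),
    ("part", "obj1", "handle"),
    ("owns", "bob", "obj1"),
    ("part", "obj1", "bottom"),
    ("part", "obj1", "bowl"),
    ("points_up", "bowl"),
    ("concave", "bowl"),
    ("color", "obj1", "red"),
}


def is_cup(x, facts):
    """Learned rule: small(X) ^ part(X,handle) ^ part(X,W) ^ concave(W) ^ points_up(W)."""
    if ("small", x) not in facts or ("part", x, "handle") not in facts:
        return False
    # Candidate bindings for W: every part of x mentioned in the facts.
    parts = [f[2] for f in facts if len(f) == 3 and f[0] == "part" and f[1] == x]
    return any(("concave", w) in facts and ("points_up", w) in facts for w in parts)


print(is_cup("obj1", FACTS))
# Facts the explanation never used (color, ownership) do not affect the result.
print(is_cup("obj1", FACTS - {("color", "obj1", "red"), ("owns", "bob", "obj1")}))
# Removing a fact that appears in the rule's premise does.
print(is_cup("obj1", FACTS - {("concave", "bowl")}))
```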
9.5.2 Explanation-Based Learning • Benefits of EBL • selects the relevant aspects of the training instance using the domain theory • forms generalizations that are relevant to specific goals and guaranteed to be logically consistent with the domain theory • learns from a single instance • hypothesizes unstated relationships between its goals and its experience by constructing an explanation
9.5.3 EBL and Knowledge-Level Learning • Issues in EBL • Objection • EBL cannot make the learner do anything new • EBL only learns rules within the deductive closure of its existing theory • the sole function of the training instance is to focus the theorem prover on relevant aspects of the problem domain • Viewed as a form of speed-up learning or knowledge base reformulation • Responses to this objection • EBL takes information implicit in a set of rules and makes it explicit • e.g., the game of chess • focus on techniques for refining incomplete theories • development of heuristics for reasoning with imperfect theories, etc. • focus on integrating EBL and SBL • EBL refines training data where the theory applies • SBL further generalizes the partially generalized data
9.6 Unsupervised Learning • Supervised vs. unsupervised learning • supervised learning • assumes the existence of a teacher, a fitness function, or some other external method of classifying training instances • unsupervised learning • eliminates the teacher • the learner forms and evaluates concepts on its own • The best examples of unsupervised learners are humans, who • propose hypotheses to explain observations • evaluate their hypotheses using such criteria as simplicity, generality, and elegance • test hypotheses through experiments of their own design
9.6.2 Conceptual Clustering • Given • a collection of unclassified objects • some means of measuring the similarity of objects • Goal • organizing the objects into classes that meet some standard of quality, such as maximizing the similarity of objects in a class • Numeric taxonomy • The oldest approach to the clustering problem • Represent an object as a collection of features (a vector of n feature values) • similarity metric: the Euclidean distance between objects • Build clusters in a bottom-up fashion
9.6.2 Conceptual Clustering • Agglomerative Clustering Algorithm • step 1 • examine all pairs of objects • select the pair with the highest degree of similarity • make the pair a cluster • step 2 • define the features of the cluster as some function of the features of the component members • replace the component objects with the cluster definition • step 3 • repeat the process on the collection of objects until all objects have been reduced to a single cluster • The result of the algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size
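The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: similarity is taken to be small Euclidean distance, the "function of the features of the component members" is chosen to be the centroid, and the five labelled points are invented for illustration.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Feature vector of a cluster: the mean of its members (one possible choice)."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def agglomerate(objects):
    """Return a binary tree (nested tuples) built by repeatedly merging the closest pair."""
    # Each entry: (tree, member feature vectors, cluster feature vector)
    clusters = [(name, [vec], vec) for name, vec in objects.items()]
    while len(clusters) > 1:
        # Step 1: examine all pairs and pick the most similar (smallest distance).
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(pair[0][2], pair[1][2]))
        # Step 2: replace the pair with a cluster described by its centroid.
        members = a[1] + b[1]
        merged = ((a[0], b[0]), members, centroid(members))
        clusters = [c for c in clusters if c is not a and c is not b] + [merged]
        # Step 3: repeat until a single cluster remains.
    return clusters[0][0]


# Hypothetical objects with two numeric features each.
objects = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (5.0, 5.0), "d": (9.0, 9.0), "e": (9.0, 10.0)}
print(agglomerate(objects))
```

The nested tuples are the binary tree described above: strings are the leaf instances and each tuple is an internal node formed by one merge.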
Hierarchical Clustering • Agglomerative approach vs. divisive approach • (Figure: agglomerative (bottom-up) clustering merges the objects a, b, c, d, e over steps 0 to 4; divisive (top-down) clustering runs the same steps in reverse.)
Agglomerative Hierarchical Clustering • Ways to measure the similarity between clusters • Single-Link • the similarity between two clusters is the similarity of their two closest members • Complete-Link • the similarity between two clusters is the similarity of their two most distant members • Group-Averaging • a “compromise” between Single-Link and Complete-Link
Agglomerative Hierarchical Clustering • Single-Link: favors good local coherence
Agglomerative Hierarchical Clustering • Complete-Link: favors good global cluster quality
Agglomerative Hierarchical Clustering • Group Averaging • Not the maximum similarity over pairs of data points, one from each cluster • Not the minimum similarity over pairs of data points, one from each cluster • The average similarity over all pairs of data points, one from each cluster • Efficiency? • Single-Link Group Averaging < Complete-Link
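The three cluster-similarity measures differ only in how they combine the pairwise similarities between members of the two clusters. The Python sketch below assumes similarity is the negated Euclidean distance (so larger values mean more alike); that choice and the function names are assumptions of this sketch.

```python
from itertools import product
from math import dist


def similarity(x, y):
    """Assumed similarity: negated Euclidean distance (larger means more alike)."""
    return -dist(x, y)


def single_link(c1, c2):
    """Similarity of the two closest members, one from each cluster."""
    return max(similarity(x, y) for x, y in product(c1, c2))


def complete_link(c1, c2):
    """Similarity of the two most distant members, one from each cluster."""
    return min(similarity(x, y) for x, y in product(c1, c2))


def group_average(c1, c2):
    """Average similarity over all cross-cluster pairs."""
    sims = [similarity(x, y) for x, y in product(c1, c2)]
    return sum(sims) / len(sims)


# Hypothetical clusters on a line.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(3.0, 0.0), (5.0, 0.0)]
print(single_link(left, right), complete_link(left, right), group_average(left, right))
```

By construction the group average always lies between the complete-link and single-link values.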
K-means Clustering • The K-means algorithm
K-means Clustering • The K-means algorithm (cont’d)
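K-means Clustering • The K-means algorithm • 1) Choose k starting points (cluster centers) at random. • 2) Assign each data point to the cluster of the nearest of the k starting points. • 3) Recompute the k starting points from the data assigned to each of them. If the starting points do not change, stop clustering. • 4) Go back to step 2).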
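The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: distance is Euclidean, the starting points are sampled from the data with a fixed seed, an empty cluster keeps its previous center, and the six data points are invented for illustration.

```python
import random
from math import dist


def mean(points):
    """Center point of a non-empty list of feature vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def k_means(data, k, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(data, k)                       # 1) random starting points
    while True:
        clusters = [[] for _ in range(k)]
        for p in data:                                  # 2) assign to the nearest center
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [mean(c) if c else centers[i]     # 3) recompute the centers
                       for i, c in enumerate(clusters)]
        if new_centers == centers:                      #    no change: stop
            return centers, clusters
        centers = new_centers                           # 4) go back to step 2


# Hypothetical data: two well-separated groups of three points.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, clusters = k_means(data, k=2)
print(centers)
print(clusters)
```

The result depends on the random starting points, which is why the seed is exposed as a parameter.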
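K-means Clustering • Characteristics of the K-means algorithm • Fast and easy to implement. • The number of points k must be chosen in advance. • Applicable only to data for which a “center point” can be computed. • With a poorly chosen k, meaningless clusters may be produced, or the clustering may fail to complete. What if k=4?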