450 likes | 639 Views
Introduction to Machine Learning and Knowledge Representation. Florian Gyarfas COMP 790-072 (Robotics). Outline. Introduction, definitions Common knowledge representations Types of learning Deductive Learning (rules of inference) Explanation-based learning
E N D
Introduction to Machine Learning and Knowledge Representation Florian Gyarfas COMP 790-072 (Robotics)
Outline • Introduction, definitions • Common knowledge representations • Types of learning • Deductive Learning (rules of inference) • Explanation-based learning • Inductive Learning – some approaches • Concept Learning • Decision-Tree Learning • Clustering • Summary, references
Introduction • What is Knowledge Representation? • Formalisms that represent knowledge (facts about the worlds) and mechanisms to manipulate such knowledge (for example, derive new facts from existing knowledge) • What is Machine Learning? • Mitchell: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” • Example: T: playing checkers P: percent of games won against opponents E: playing practice games against itself
Common Knowledge Representations • Logic • propositional • predicate • other • Structured Knowledge Representations: • Frames • Semantic nets • …
Propositional Logic • Consists of: • Constants: true/false • Set of elements called symbols, variables or atomic formulas (typically letters: a,b,c,…) • Operators (, , , , ) • Axioms • Examples: • a b a • a a true • Inference rule(s) • Modus ponens
Predicate logic • In many cases, propositional logic is too weak. • For example, how can we express something like this in propositional logic: • Every person is mortal.Tom is a person.Tom is mortal. • How can we represent these sentences in such a way that we can infer the third sentence from the first two? • Need quantifiers and predicates… • x: Person(x) Mortal(x)Person(Tom) • We can infer: Mortal(Tom) • In addition to propositional logic, predicate logic has: • Functions • Predicates • Quantifiers (,) • More axioms • One more inference rule (Generalization)
Logic: Inference Rules • Used for deductive learning • Propositional Logic: Modus Ponens is all you need • If P, then Q.PTherefore, Q. • Meta-rule, not the same as axioms • Predicate Logic: Modus Ponens and Generalization Rule:
Structured Knowledge Representations • Semantic nets • Really just graphs that represent knowledge • Nodes represent concepts • Arcs represent binary relationships between concepts • Frames • Extension of semantic nets • Entities have attributes • Class/subclass hierarchy that supports inheritance; classes inherit attributes from superclasses
Semantic Nets • Example (E. Rich, Artificial Intelligence) Furniture is-a is-part Person Chair Seat is-a is-a owner color Me My-chair Tan is-a covering Leather Brown
Types of Learning Deductive – Inductive Supervised – Unsupervised Symbolic – Non-symbolic
Deductive vs. inductive • Deductive learning • Knowledge is deduced from existing knowledge by means of truth-preserving transformations (this is nothing more than reformulation of existing knowledge). • If the premises are true, the conclusion must be true! • Example: propositional/predicate logic with rules of inference
Deductive vs. inductive • Inductive learning • Generalization from examples • Example: “All observed crows are black” “All crows are black” • process of reasoning in which the premises of an argument are believed to support the conclusion but do not ensure it
Unsupervised vs. Supervised • Supervised Learning: • There exists a “teacher” that for each training example tells the learner how it is classified (training data consists of pairs of input vectors and desired outputs) • Reinforcement Learning: • No input/output pairs; “reward function” tells agent how good its action was • Unsupervised Learning: • No a priori output; also no reward; training data just feature vectors; the system needs to form concepts (classes) by itself
Learning approaches • Deductive/Analytical • Explanation-based learning • Inductive • Supervised: • Concept Learning • Decision-Tree Learning • Neural networks • Naive Bayes classifier • Support Vector Machines • … • Unsupervised: • Clustering • Neural networks • Expectation-Maximization • …
Explanation-based learning (EBL) • Deductive • assumes prior knowledge (“domain theory”) in addition to training examples • assumes domain theory is given as a set of horn clauses • Horn clause: Disjunction of literals with at most one positive literal • Example: NOT a OR NOT b OR NOT c OR d which is the same as (a AND b AND c) d • Tries to explain training examples using the domain theory
EBL example • Consider multiple physical objects • Which are the pairs of objects such that one can be stacked safely on the other? • Target concept: • premise(x,y) SafeToStack(x,y) • where premise is a conjunctive expression containing the variables x and y. • Domain Theory: • SafeToStack(x,y) NOT Fragile(y) • SafeToStack(x,y) Lighter(x,y) • Lighter(x,y) Weight(x,wx) AND Weight(y,wy) AND LessThan(wx,wy) • Weight(x,w) Volume(x,v) AND Density(x,d) AND Equal(w,times(v,d)) • Weight(x,5) Type(x,Table) • Fragile(x) Material(x,Glass)
EBL example (2) • Training example: • On(Obj1,Obj2) • Type(Obj1,Box) • Type(Obj2,Table) • Color(Obj1,Red) • Color(Obj2,Blue) • Volume(Obj1,2) • Density(Obj1,0.3) • Material(Obj1,Cardboard) • Material(Obj2,Wood) • SafeToStack(Obj1,Obj2)
EBL example (3) • Explanation: SafeToStack(Obj1,Obj2) Lighter(Obj1,Obj2) Weight(Obj1,Obj2) Weight(Obj2,5) Density(Obj1,0.3) Type(Obj2,Table) Volume(Obj1,2) Equal(0.6,2*0.3) LessThan(0.6,5)
EBL algorithm • Explain training example • Analyze/Generalize Explanation • Add Explanation to Learned Rules (Domain Theory) • Use for example REGRESS algorithm (Mitchell, p. 318) for step (2) • In our example most general rule that can be justified by the explanation is: • SafeToStack(x,y) Volume(x,vx) AND Density(x,dx) AND Equal(wx,times(vx,dx)) AND LessThan(wx,5) AND Type(y,Table)
EBL - Remarks • Knowledge Reformulation: EBL just restates what the learner already knows • You don’t really gain new knowledge! • Why do we need it then? In principle, we can compute everything we need using just the domain theory • In practice, however, this might not work. Consider chess: Does knowing all the rules make you a perfect player? • So EBL reformulates existing knowledge into a more operational form which might be much more effective especially under certain constraints
Concept Learning • Inductive • learn general concept definition from specific training examples • search through predefined space of potential hypotheses for target concept • pick the one that best fits training examples
Concept Learning Example • Taken from Tom Mitchell’s book “Machine Learning” • Concept to learn: “Days on which Tom enjoys his favorite water sport” • Training examples D (every row is an instance, every column an attribute):
Concept Learning • X = set of all instances (instance = combination of attributes), D = set of all training examples (D X) • The “target concept” is a function (target function) c(x) that for a given instance x is either 0 or 1 (in our example 0 if EnjoySport = No, 1 if EnjoySport = Yes) • We do not know the target concept, but we can come up with hypotheses. We would like to find a hypothesis h(x) such that h(x) = c(x) at least for all x in D (D = training examples) • Hypotheses representation: • Let us assume that the target concept is expressed as a conjunction of constraints on the instance attributes. Then we can write a hypothesis like this: AirTemp = Cold Humidity = High. Or, in short: <?,Cold,High,?,?,?> where “?” means any value is acceptable for this attribute. We use “” to indicate that no value is acceptable for an attribute.
Concept Learning • Most general hypothesis: • <?,?,?,?,?,?> • Most specific hypothesis: • <, , , , , > • General-to-specific ordering of hypotheses: ≥g (more general than or equal to) defines partial order over hypothesis space H • FIND-S algorithm • finds the most specific hypothesis • For our example : <Sunny,Warm,?,Strong,?,?>
Concept Learning (FIND-S) • FIND-S algorithm • Initialize h to the most specific hypothesis in H • For each positive training instance x • For each atttribute constraint ai in h If the constraint ai is satisfied by x Then do nothing Else replace ai in h by the next more general constraint that is satisfied by x • Output hypothesis h Why can we ignore negative examples?
Concept Learning • FIND-S only computes the most specific hypothesis • Another approach to concept learning: CANDIDATE-ELIMINATION • CANDIDATE-ELIMINATION finds all hypotheses in the the version space • The version space, denoted VSH,D, with respect to hypothesis space H and training examples D, is a subset of hypotheses from H consistent with the training examples in D. VSH,D = {h H|Consistent(h,D)}
Concept Learning (Version space) • Version space for our example {<Sunny,Warm,?,Strong,?,?>} {<Sunny,?,?,Strong,?,?>} {<Sunny,Warm,?,?,?,?>} {<?,Warm,?,Strong,?,?>} {<Sunny,?,?,?,?,?>}, {<?,Warm,?,?,?,?>}
Concept Learning:Candiate-Elimination algorithm Initialize G to the set of maximally general hypotheses in H Initialize S to set of maximally specific hypotheses in H For each training example d, do • If d is a positive example • Remove from G any hypothesis inconsistent with d • For each hypothesis in s in S that is not consistent with d • Remove s from S • Add to S all minimal generalizations h of s such that • h is consistent with d, and some member of G is more general than h • Remove from S any hypothesis that is more general than another hypothesis in S
Concept Learning:Candiate-Elimination algorithm • If d is a negative example • Remove from S any hypothesis inconsistent with d • For each hypothesis in g in G that is not consistent with d • Remove g from G • Add to G all minimal specializations h of g such that • h is consistent with d, and some member of S is more specific than h • Remove from G any hypothesis that is less general than another hypothesis in G • How to use version space for classification of new instances? • Both algorithms can’t handle noisy training data, i.e. they assume none of the training examples is incorrect • For more complex Concept Learning algorithms see Mitchell – Machine Learning, Chapter 10.
Inductive bias • For both algorithms, we assumed that the target concept was contained in the hypothesis space • Our hypothesis space was the set of all hypotheses than can be expressed as a conjunction of attributes • Such an assumption is called an inductive bias • What if target concept not a conjunction of constraints? Why not consider all possible hypotheses?
Decision Tree Learning • Another inductive, supervised learning method for approximating discrete-valued target functions • Learned function represented by a tree • Leaf nodes provide classification • Each node specifies a test of some attribute of the instance
Decision Tree Learning -Example • Decision tree for the concept “PlayTennis”
Decision Tree Learning • Classification starts at the root node • Example tree corresponds to the expression: • (Outlook = Sunny AND Humidity = Normal)OR (Outlook = Overcast)OR (Outlook = Rain AND Wind = Weak) • Using the tree to classify new instances is easy, but how do we construct a decision tree from training examples?
Decision Tree Learning: ID3 Algorithm • Constructs tree top-down • Order of attributes? • Evaluate each attribute using a statistical test (see next slide) to determine how well it alone classifies the training examples • Use the attribute that best classifies the training example attribute at the root node. Create descendants for each possible value of that attribute. • Repeat process for each descendant…
Decision Tree Learning: Entropy/Gain • Given a collection S, containing positive and negative examples of some target concept, the entropy of S is: • Entropy(S) = -p+ log2p+ – p- log2p- • Information gain is then defined as: • ID3 Algorithm picks attribute with highest gain
Decision Trees - Remarks • Tree represents hypothesis; thus, ID3 determines only a single hypothesis, unlike Candidate-Elimination • Unlike both Concept Learning approaches, ID3 makes no assumptions regarding the hypothesis space; every possible hypothesis can be represented by some tree • Inductive Bias: Shorter trees are preferred over larger trees
Decision Tree - Example Gain(S, Outlook) = 0.246 Gain(S, Humidity) = 0.151 Gain(S, Wind) = 0.048 Gain(S, Temperature) = 0.029 So “Outlook” is the first attribute of the tree: Ssunny = {D1,D2,D8,D9,D11} Gain{Ssunny,Humidity} = 0.97 – (3/5)*0 – (2/5)*0 = 0.97 Gain{Ssunny,Temperature} = 0.97 – (2/5)*0 – (2/5)*1 – (1/5)*0 = 0.57 Gain{Ssunny,Wind} = 0.97 – (2/5)*1 – (3/5)*0.918 = 0.019 Humidity should be tested next Outlook Sunny Rain Overcast ? ? Yes What next?
Learning algorithms in robotics • EBL has been used for Planning Algorithms (example: PRODIGY system (Carbonell et al., 1990)) • Could use EBL, concept learning, decision tree learning for things like object recognition (CL and DTL can easily be extended to more than 2 categories) • However, while symbolic learning algorithms are simple and easy to understand, they are not very flexible and powerful • In many applications, robots need to learn to classify something they perceive (using cameras, sensors etc.) • Sensor inputs etc. are normally numeric • Non-symbolic approaches such as NN, Reinforcement Learning or HMM are better suited and thus more commonly used in practice • Combined approaches exist, for example EBNN (Explanation-based Neural Networks) (See for example paper “Explanation Based Learning for Mobile Robot Perception” by J. O’Sullivan, T. Mitchell and S. Thrun)
Unsupervised Learning • Training data only consists of feature vectors, does not include classifications • Unsupervised Learning algorithms try to find patterns in the data • Classic examples: • Clustering • Fitting Gaussian Density functions to data • Dimensionality reduction
Clustering: k-means algorithm • Cluster objects based on attributes into k partitions • Tries to minimize intra-cluster variance: • Algorithm: • Algorithm starts by partitioning input points into k initial sets, either at random or using some heuristic data. • Then it calculates the mean point, or centroid, of each set. • Constructs new partition by associating each point with the closest centroid. • Recalculates centroids for new clusters • Repeats this until convergence, which is when points no longer switch clusters.
Summary • This presentation mainly covered symbol-based learning algorithms • Deductive • EBL • Inductive • Concept Learning • Decision-Tree Learning • Unsupervised • Clustering (not necessarily symbol-based), COBWEB • Non symbol-based learning algorithms such as neural networks part of the next lecture?
References (Books) • Tom Mitchell: Machine Learning. McGraw-Hill, 1997. • Stuart Russell, Peter Norvig: Artificial Intelligence – a modern approach. Prentice Hall, 2003. • George Luger: Artificial Intelligence. Addison-Wesley, 2002. • Elaine Rich: Artificial Intelligence. McGraw-Hill, 1983.