Machine Learning CIS 479/579 Bruce R. Maxim UM-Dearborn
Machine Learning • Study of processes that lead to self-improvement of machine performance. • Some people argue that this is likely to be the only way a machine will be able to pass the Turing Test
Why study machine learning? • To gain insight into the nature of human learning processes • To solve the knowledge acquisition bottleneck that has plagued expert systems development • To use it as an information sifter to help deal with the information overload problem
Learning • More than having lots of facts available • It implies the ability to use knowledge to create new knowledge or to integrate new facts into an existing knowledge structure • Learning typically requires repetition and practice to reduce differences between actual and ideal performance; knowledge acquisition does not
Negative Features of Human Learning • It's slow (5-6 years for motor skills, 12-20 years for abstract reasoning) • It's inefficient • It's expensive • There is no copy process • There is no visible representation of learning that can be inspected • Learning strategy is often a function of the knowledge available to the learner
Essential Components • Set of rules or data structures making up the knowledge-base • Changes dynamically during program execution • Task module (performance algorithm) • Uses input and knowledge-base to solve problem at hand • Must produce quantifiable output that can be used to measure system performance
Essential Components • Feedback module or critic • Compares task output to output predicted for idealized system • Feedback from it is passed as an “error” signal to guide learner in revising the system • Learner module • Uses error signal to upgrade and modify the rules • Goal is to reduce size of discrepancy between actual and ideal performance
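Putting the four components together, a minimal sketch of the feedback loop might look like the following. The linear "knowledge base" and all names here are illustrative assumptions, not any particular system.

```python
# A toy feedback-learning loop: the task module produces output from the
# knowledge base, the critic compares it to the ideal output, and the
# learner nudges the knowledge base to shrink the error signal.

def task_module(knowledge, x):
    """Performance algorithm: apply current knowledge (a weight) to the input."""
    return knowledge["w"] * x

def critic(actual, ideal):
    """Feedback module: return an error signal (ideal minus actual)."""
    return ideal - actual

def learner(knowledge, x, error, rate=0.01):
    """Learner module: revise the knowledge base to reduce the error."""
    knowledge["w"] += rate * error * x

knowledge = {"w": 0.0}                 # knowledge base (changes during execution)
training = [(1, 3), (2, 6), (3, 9)]    # (input, ideal output) pairs

for _ in range(200):
    for x, ideal in training:
        out = task_module(knowledge, x)
        err = critic(out, ideal)       # "error" signal from the critic
        learner(knowledge, x, err)

print(knowledge["w"])                  # approaches 3.0
```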
Difficulties • Minsky’s credit assignment problem affects the critic’s ability to make revisions • The key to the success of any feedback learning system is the power and flexibility of its knowledge representation scheme • The degree to which obsolete or incorrect knowledge can be dynamically removed from the system or forgotten also affects system performance
Types of Learning • Rote learning • Memorization of facts without need to draw inferences • Similar to data retrieval type systems • Learning by instruction • Learner uses some inference strategy to transform knowledge into some internal knowledge representation applicable to problem at hand
Types of Learning • Learning by deduction • From a general rule like C = π × diameter we deduce that a circle with diameter 10 has a circumference of 31.4159 • Learning by analogy • New problem is solved using solution to similar known problem • For example, computation of simple interest is similar to computing gross pay
Types of Learning • Learning by induction • Looking at lots of specific cases and formulating a general rule • Subcategories • Learning by example • Learning by experimentation • Learning by observation • Learning by discovery
Approaches to Machine Learning • Numerical approaches • Similar to Samuel's checkers static evaluation function • Build a numeric model and fiddle with its parameters based on successes • Structural approaches • Concerned with the process of defining relationships by creating links between concepts
Learning in Samuel’s Program • Rote learning • Remembering moves returned from specific board configurations • Allowed search depths to be doubled • Learning by generalization • Adjusting coefficients in the static evaluation function through self-play (the winner’s version of the function replaces the current model)
Learning in Samuel’s Program • Learning by generalization • Signature table created as a means of indexing 250,000 board positions and the best responses • Used a 24-dimensional parameter space and computed statistical averages as a way to teach the computer the optimal moves
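As a rough illustration of the coefficient-adjustment idea (not Samuel's actual procedure), here is a sketch in which a perturbed copy of a linear evaluation function challenges the current one and the winner's coefficients replace the model. The play_match stand-in, the reference coefficients, and the feature vectors are all invented for illustration.

```python
import random

def evaluate(weights, features):
    """Linear static evaluation: a weighted sum of board features."""
    return sum(w * f for w, f in zip(weights, features))

def play_match(champion, challenger):
    """Hypothetical stand-in for self-play: the version whose evaluations
    better match a hidden reference evaluator wins the match."""
    reference = [1.0, -0.5, 2.0]                   # pretend ideal coefficients
    positions = [(1, 0, 2), (0, 3, 1), (2, 2, 0)]  # made-up feature vectors
    def loss(w):
        return sum((evaluate(w, f) - evaluate(reference, f)) ** 2 for f in positions)
    return champion if loss(champion) <= loss(challenger) else challenger

weights = [0.0, 0.0, 0.0]
for _ in range(200):
    challenger = [w + random.gauss(0, 0.1) for w in weights]  # perturb coefficients
    weights = play_match(weights, challenger)                 # winner replaces the model

print(weights)   # drifts toward the hidden reference coefficients
```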
Equilibration • Piaget argues that people learn to reduce anxiety caused by cognitive dissonance • Revising your cognitive model of a concept to remove contradictory observations relieves this anxiety
Equilibration • Quinlan’s ID3 and Michalski’s Induce programs attempt to learn classification rules by induction through the examination of sets of examples and non-examples • Non-examples are “near-misses” (i.e. examples that conform to incomplete rules of a concept but contain minor errors)
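At the heart of ID3's induction is choosing, at each step, the attribute whose split gives the highest information gain over the examples seen so far. The sketch below shows only that attribute-selection calculation on a made-up data set; the attribute names and examples are assumptions for illustration, not Quinlan's or Michalski's actual code.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """Entropy reduction obtained by splitting the examples on one attribute."""
    labels = [label for _, label in examples]
    base = entropy(labels)
    remainder = 0.0
    for v in {ex[attribute] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return base - remainder

# Positive and negative examples of some concept (e.g. "is a square").
examples = [
    ({"sides": 4, "equal_sides": True},  "yes"),
    ({"sides": 4, "equal_sides": False}, "no"),
    ({"sides": 3, "equal_sides": False}, "no"),
    ({"sides": 4, "equal_sides": True},  "yes"),
]

for attr in ("sides", "equal_sides"):
    print(attr, round(information_gain(examples, attr), 3))
# "equal_sides" wins the split here, so it would be tested first.
```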
Learning by Discovery • Lenat created two programs AM (automated mathematician) and Eurisko (Eureka heuristic) • Lenat believed that heuristic, rule-guided search is an operational model of human intelligence • AM with proper “seeding” discovered much of set theory and arithmetic
AM Algorithm • Select a concept to evaluate and generate examples of it • Check examples for regularities and if found • Update the “interestingness” of the concept • Create new concept • Create new conjectures • Propagate knowledge gained to other concepts known by the system
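A heavily simplified sketch of that agenda loop, using toy number-theoretic concepts and a toy "regularity" test in place of AM's frame-based concept representation; none of this is Lenat's actual code.

```python
# Repeatedly pick the concept currently judged most "interesting", generate
# examples of it, and, if a regularity shows up, boost its interestingness
# and conjecture a new concept from it.

concepts = {
    "even":   {"test": lambda n: n % 2 == 0, "interest": 1.0},
    "square": {"test": lambda n: int(n ** 0.5) ** 2 == n, "interest": 1.0},
}

def generate_examples(test, limit=50):
    return [n for n in range(1, limit) if test(n)]

for _ in range(3):
    # Select the concept to evaluate (highest "interestingness" first).
    name = max(concepts, key=lambda c: concepts[c]["interest"])
    concept = concepts[name]
    examples = generate_examples(concept["test"])

    # Toy regularity check: the concept has plenty of examples.
    if len(examples) >= 5:
        concept["interest"] += 0.5            # update interestingness
        # Conjecture a new concept: conjunction with another primitive concept.
        for other in list(concepts):
            if other == name or "-and-" in other:
                continue
            new_name = f"{name}-and-{other}"
            if new_name not in concepts:
                t1, t2 = concept["test"], concepts[other]["test"]
                concepts[new_name] = {
                    "test": lambda n, t1=t1, t2=t2: t1(n) and t2(n),
                    "interest": 1.0,
                }

print(sorted(concepts))    # e.g. ['even', 'even-and-square', 'square']
```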
Eurisko • Lenat added the ability to discover new heuristics as well • It turns out that heuristics themselves can be represented as concepts • Eurisko’s accomplishments • Created a model of DNA mutation • Invented a 2D integrated circuit (functioning as both a “nand” and an “or” gate) that had eluded human designers • Undisputed “Traveller” champion
Induction Learning Heuristics • Used to allow program to learn class descriptions from the presentation of positive and negative examples • Suppose you were trying to get a program to learn the concept of “square”
Induction Learning Heuristics • Require-link • Make link a “must have” link if the evolving model has link and near-miss non-example does not • Forbid-link • Make link a “must not have” link if near-miss has link and evolving model does not • Climb-tree • Follow “isa” links to search super-classes for common ancestor • Used when object in evolving model corresponds to an object in a different example
Induction Learning Heuristics • Enlarge-set • Create new class composed of evolving object and example classes • Used when two objects are not already related in the same tree • Drop-link • Remove inappropriate link in evolving model based on comparison with example • Close-interval • Add numerical value from example to range of values housed in evolving model
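To make the heuristics concrete, here is a toy representation in which the evolving model is a dict of links plus numeric ranges. The link names for "square" and the helper functions are illustrative assumptions, not Winston's actual program.

```python
def require_link(model, link):
    """Make a link a 'must have' link."""
    model["links"][link] = "must-have"

def forbid_link(model, link):
    """Make a link a 'must not have' link."""
    model["links"][link] = "must-not-have"

def drop_link(model, link):
    """Remove an inappropriate link from the evolving model."""
    model["links"].pop(link, None)

def close_interval(model, attribute, value):
    """Widen a numeric range so it covers a value seen in an example."""
    lo, hi = model["ranges"].get(attribute, (value, value))
    model["ranges"][attribute] = (min(lo, value), max(hi, value))

# Evolving model of "square", seeded from a first example.
model = {
    "links": {"has-4-sides": "present", "sides-equal": "present"},
    "ranges": {"side-length": (3, 3)},
}

require_link(model, "sides-equal")        # a near-miss lacked this link
forbid_link(model, "sides-unequal")       # a near-miss had this link
close_interval(model, "side-length", 7)   # a true example had side length 7

print(model)
```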
Procedure Specialize (makes model more restrictive) • Match evolving model to example and establish correspondence among parts • Determine whether there is a single important difference between evolving model and near-miss • If so and evolving model has link not in near-miss then use require-link • If so and near-miss has link and model does not then use forbid-link • Otherwise ignore example
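A sketch of the specialize step on a simple dict-based model, where each key is a link name; the "square" model and rectangle near-miss are invented for illustration.

```python
def specialize(model, near_miss):
    """If there is a single important link difference, apply require-link or
    forbid-link; otherwise ignore the near-miss."""
    only_in_model = set(model) - set(near_miss)
    only_in_miss = set(near_miss) - set(model)
    differences = list(only_in_model) + list(only_in_miss)
    if len(differences) != 1:
        return model                       # no single important difference: ignore
    link = differences[0]
    new_model = dict(model)
    if link in only_in_model:
        new_model[link] = "must-have"      # require-link
    else:
        new_model[link] = "must-not-have"  # forbid-link
    return new_model

square = {"has-4-sides": "present", "sides-equal": "present"}
rectangle = {"has-4-sides": "present"}     # near-miss: 4 sides, but not all equal

print(specialize(square, rectangle))
# {'has-4-sides': 'present', 'sides-equal': 'must-have'}
```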
Procedure Generalize (makes model more permissive) • Match evolving model to example and establish correspondence among parts • For each difference determine its type • if link points to class in model different from class to which link points in example then • if classes are part of a classification tree then use climb-tree • if classes are from an exhaustive set then use drop-link • otherwise use enlarge-set
Procedure Generalize (makes model more permissive) • if link is missing in example then use drop-link • if difference is a number outside the current range then use close-interval • otherwise ignore the difference • Note: identifying differences is tough; heuristics may be needed to focus the learning algorithm’s attention
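A corresponding sketch of generalize, covering climb-tree, drop-link, and close-interval (enlarge-set is omitted for brevity). The tiny "isa" hierarchy, link names, and model layout are invented purely to illustrate the procedure above.

```python
ISA = {"square": "polygon", "triangle": "polygon", "polygon": "shape"}

def common_ancestor(a, b):
    """Climb 'isa' links to find a shared super-class, if any."""
    ancestors = set()
    while a:
        ancestors.add(a)
        a = ISA.get(a)
    while b and b not in ancestors:
        b = ISA.get(b)
    return b

def generalize(model, example):
    new_model = dict(model)
    for link, value in model.items():
        if link not in example:
            del new_model[link]                             # drop-link
        elif isinstance(value, tuple):                      # numeric range
            lo, hi = value
            x = example[link]
            if not (lo <= x <= hi):
                new_model[link] = (min(lo, x), max(hi, x))  # close-interval
        elif value != example[link]:
            up = common_ancestor(value, example[link])
            if up:
                new_model[link] = up                        # climb-tree
            else:
                del new_model[link]                         # drop-link as fallback
    return new_model

model = {"kind": "square", "side-length": (3, 3), "drawn-in": "red"}
example = {"kind": "triangle", "side-length": 7}

print(generalize(model, example))
# {'kind': 'polygon', 'side-length': (3, 7)}
```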
Learning Algorithm • Let the description of the first example be the initial description (it must not be a non-example) • For all subsequent examples • If example is a near-miss then use procedure specialize • If example is a true example then use procedure generalize
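Tying it together, a sketch of the top-level loop. The one-line specialize and generalize below are placeholders for the fuller sketches above, and the training data and link names are made up.

```python
def specialize(model, near_miss):
    # Mark links missing from the near-miss as "must-have" (require-link).
    return {k: ("must-have" if k not in near_miss else v) for k, v in model.items()}

def generalize(model, example):
    # Keep only links the true example shares with the model (drop-link).
    return {k: v for k, v in model.items() if k in example}

training = [
    ({"has-4-sides": "present", "sides-equal": "present", "drawn-in": "red"}, "example"),
    ({"has-4-sides": "present", "sides-equal": "present"}, "example"),
    ({"has-4-sides": "present"}, "near-miss"),   # rectangle: sides not all equal
]

model = dict(training[0][0])        # first description must be a true example
for description, kind in training[1:]:
    if kind == "near-miss":
        model = specialize(model, description)
    else:
        model = generalize(model, description)

print(model)
# {'has-4-sides': 'present', 'sides-equal': 'must-have'}
```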