Theory Revision Chris Murphy
The Problem • Sometimes we: • Have theories for existing data that do not match new data • Do not want to repeat learning every time we update data • Believe that our rule learners could perform much better if given basic theories to build on
Two Types of Errors in Theories • Over-generalization • Theory covers negative examples • Caused by incorrect rules in theory or by existing rules missing necessary constraints • Example: uncle(A,B) :- brother(A,C). • Solution: uncle(A,B) :- brother(A,C), parent(C,B).
Two Types of Errors in Theories • Over-specialization • Theory does not cover all positive examples • Caused by rules having additional, unnecessary constraints, or by the theory missing rules that are needed to prove some examples • Example: uncle(A,B) :- brother(A,C), mother(C,B). • Solution: uncle(A,B) :- brother(A,C), parent(C,B).
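The two error types can be checked mechanically on a small example. The sketch below is an illustration only: the family facts, the labeled examples, and the helper names `covers` and `errors` are all invented here, and the three clause bodies are the uncle rules from the slides written as Python tuples.

```python
# Toy family facts; every name here is invented for illustration.
FACTS = {
    "brother": {("bob", "ann"), ("bob", "tom"), ("tom", "ann"), ("tom", "bob")},
    "mother": {("ann", "sue")},
    "father": {("tom", "joe")},
}
FACTS["parent"] = FACTS["mother"] | FACTS["father"]
PEOPLE = sorted({p for pairs in FACTS.values() for pair in pairs for p in pair})

# Labeled examples of uncle(A,B), also invented.
POSITIVES = [("bob", "sue"), ("bob", "joe"), ("tom", "sue")]
NEGATIVES = [("bob", "ann"), ("tom", "joe"), ("sue", "joe")]

# Clause bodies as lists of (predicate, arg1, arg2) over variables A, B, C.
OVER_GENERAL = [("brother", "A", "C")]
OVER_SPECIAL = [("brother", "A", "C"), ("mother", "C", "B")]
CORRECTED = [("brother", "A", "C"), ("parent", "C", "B")]


def covers(body, a, b):
    """True if some binding of C satisfies every literal, given A=a and B=b."""
    for c in PEOPLE:
        env = {"A": a, "B": b, "C": c}
        if all((env[x], env[y]) in FACTS[pred] for pred, x, y in body):
            return True
    return False


def errors(body):
    """Return (covered negatives, uncovered positives) for one clause body."""
    false_pos = [e for e in NEGATIVES if covers(body, *e)]
    false_neg = [e for e in POSITIVES if not covers(body, *e)]
    return false_pos, false_neg


for name, body in [("over-general", OVER_GENERAL),
                   ("over-special", OVER_SPECIAL),
                   ("corrected", CORRECTED)]:
    print(name, errors(body))
```

On this toy data the over-general rule covers two negatives, the over-specialized rule misses the uncle relation that runs through a father, and the corrected rule makes neither mistake.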
What is Theory Refinement? • “…learning systems that have a goal of making small changes to an original theory to account for new data.” • Combination of two processes: • Using a background theory to improve rule effectiveness and adequacy on data • Using problem detection and correction processes to make small adjustments to said theories
Basic Issues Addressed • Is there an error in the existing theory? • What part of the theory is incorrect? • What correction needs to be made?
Theory Refinement Basics • System is given a beginning theory about the domain • Can be incorrect or incomplete (and often is) • A well-refined theory will: • Be accurate with new/updated data • Make as few changes as possible to the original theory • Changes are monitored by a “Distance Metric” that keeps a count of every change made
The Distance Metric • Counts every addition, deletion, or replacement of clauses • Used to: • Measure syntactical corruptness of original theory • Determine how good a learning system is at replicating human-created theories • Drawback is that it does not recognize equivalent literals such as less(X,Y) and greq(Y,X) • Table on the right shows examples of distance between theories, as well as its relationship to accuracy
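A minimal sketch of such a count, under stated assumptions: clauses are compared as exact strings, and a deleted clause paired with an added clause that has the same head is counted as one replacement. The function names `head` and `distance` are hypothetical, and the metric used by the actual systems may count at a finer grain.

```python
from collections import Counter


def head(clause):
    """Return the text before ':-', i.e. the clause head."""
    return clause.split(":-")[0].strip()


def distance(original, revised):
    """Count clause additions, deletions, and replacements between two theories.

    Assumption: clauses are compared as exact strings, so any difference in
    spelling (or a syntactically different but equivalent literal) is a change.
    """
    original, revised = set(original), set(revised)
    deleted = original - revised
    added = revised - original
    # A deletion and an addition with the same head count as one replacement.
    deleted_heads = Counter(head(c) for c in deleted)
    added_heads = Counter(head(c) for c in added)
    replacements = sum((deleted_heads & added_heads).values())
    return len(deleted) + len(added) - replacements


old = {"uncle(A,B) :- brother(A,C)."}
new = {"uncle(A,B) :- brother(A,C), parent(C,B)."}
print(distance(old, new))  # one replaced clause

# The drawback from the slide: swapping one literal for the other is still
# counted as a change, because only the syntax is compared.
print(distance({"small(X,Y) :- less(X,Y)."}, {"small(X,Y) :- greq(Y,X)."}))
```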
Why Preserve the Original Theory? • If you understood the original theory, you’ll likely understand the new one • Similar theories will likely retain the ability to use abstract predicates from the original theory
Theory Refinement Systems • EITHER • FORTE • AUDREY II • KBANN • FOCL, KR-FOCL, A-EBL, AUDREY, and more
EITHER • Explanation-based and Inductive Theory Extension and Revision • First system with the ability to fix both over-generalization and over-specialization • Able to correct multiple faults • Uses one or more failings at a time to learn one or more corrections to a theory • Able to correct intermediate points in theories • Uses positive and negative examples • Able to learn disjunctive rules • Specialization algorithm does not allow positives to be eliminated • Generalization algorithm does not allow negatives to be admitted
FORTE • Attempts to prove all positive and negative examples using the current theory • When errors are detected: • Identify all clauses that are candidates for revision • Determine whether clause needs to be specialized or generalized • Determine what operators to test for various revisions • Best revision is determined based on its accuracy when tested on complete training set • Process repeats until system perfectly classifies the training set or until FORTE finds that no revisions improve the accuracy of the theory
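The loop described above can be sketched as greedy hill climbing. This is not FORTE's implementation: the toy facts, the three revision operators (delete a clause, drop one antecedent, add one antecedent from a fixed pool), and all function names are assumptions made for illustration.

```python
# Toy data, invented for illustration.
FACTS = {
    "brother": {("bob", "ann"), ("bob", "tom"), ("tom", "ann"), ("tom", "bob")},
    "mother": {("ann", "sue")},
    "father": {("tom", "joe")},
}
FACTS["parent"] = FACTS["mother"] | FACTS["father"]
PEOPLE = sorted({p for pairs in FACTS.values() for pair in pairs for p in pair})
POSITIVES = [("bob", "sue"), ("bob", "joe"), ("tom", "sue")]
NEGATIVES = [("bob", "ann"), ("tom", "joe"), ("sue", "joe")]
# Hypothetical pool of antecedents the specializer may add.
CANDIDATES = [("mother", "C", "B"), ("father", "C", "B"), ("parent", "C", "B")]


def clause_covers(body, a, b):
    """True if some binding of C satisfies every literal, given A=a and B=b."""
    for c in PEOPLE:
        env = {"A": a, "B": b, "C": c}
        if all((env[x], env[y]) in FACTS[pred] for pred, x, y in body):
            return True
    return False


def accuracy(theory):
    """Fraction of training examples the theory classifies correctly."""
    def proves(example):
        return any(clause_covers(body, *example) for body in theory)
    right = sum(proves(e) for e in POSITIVES)
    right += sum(not proves(e) for e in NEGATIVES)
    return right / (len(POSITIVES) + len(NEGATIVES))


def revisions(theory):
    """Yield every theory that is one edit away from the given one."""
    for i, body in enumerate(theory):
        before, after = theory[:i], theory[i + 1:]
        yield before + after  # delete the clause
        for lit in body:  # generalize: drop one antecedent
            yield before + (tuple(l for l in body if l != lit),) + after
        for lit in CANDIDATES:  # specialize: add one antecedent
            if lit not in body:
                yield before + (body + (lit,),) + after


def revise(theory):
    """Apply the most accurate single revision until nothing improves."""
    best, best_acc = theory, accuracy(theory)
    while best_acc < 1.0:
        candidate = max(revisions(best), key=accuracy, default=None)
        if candidate is None or accuracy(candidate) <= best_acc:
            break  # no revision improves accuracy
        best, best_acc = candidate, accuracy(candidate)
    return best, best_acc


start = ((("brother", "A", "C"),),)  # uncle(A,B) :- brother(A,C).
print(revise(start))
```

Starting from the over-general clause, the sketch adds parent(C,B) and stops with every training example classified correctly. Starting from the over-specialized clause with mother(C,B), it stalls, because the repair needs two edits and neither one improves accuracy on its own; this toy operator set is deliberately smaller than the one the slides describe.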
Specializing a Theory • Needs to happen when one or more negatives are covered • Ways to fix the problem: • Delete a clause: simple, just delete and retest • Add new antecedents to existing clause • More difficult • FORTE uses two methods... • Add one antecedent at a time, like FOIL, choosing the antecedent that provides the best information gain at any point • Relational Pathfinding – uses graph structures to find new relations in data
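For the FOIL-like step, a common textbook form of the gain is p1 * (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), where p0 and n0 are the positives and negatives covered before adding the antecedent and p1 and n1 are the counts after. The sketch below uses that form with example-level counts, which simplifies FOIL's binding-level counts; the candidate antecedents and their counts are hypothetical.

```python
import math


def foil_gain(p0, n0, p1, n1):
    """Simplified FOIL gain using example counts (assumes p0 > 0).

    p0, n0: positives and negatives covered before adding the antecedent.
    p1, n1: positives and negatives covered after adding it.
    """
    if p1 == 0:
        return 0.0  # an antecedent that loses every positive is worthless
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# Hypothetical counts for specializing uncle(A,B) :- brother(A,C),
# which is assumed to cover 3 positives and 2 negatives.
p0, n0 = 3, 2
candidates = {
    "mother(C,B)": (2, 0),
    "father(C,B)": (1, 0),
    "parent(C,B)": (3, 0),
}
gains = {lit: foil_gain(p0, n0, p1, n1) for lit, (p1, n1) in candidates.items()}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))
```

All three candidates remove the covered negatives, so the gain favors the one that keeps the most positives.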
Generalizing a Theory • Need to generalize when positives are not covered • Ways FORTE generalizes: • Delete antecedents from an existing clause (either singly or in groups) • Add a new clause • Copy clause identified at the revision point • Purposely over-generalize • Send over-general rule to specialization algorithm • Use inverse relation operators “identification” and “absorption” • These use intermediate rules to provide more options for alternative definitions
AUDREY II • Runs in two main phases: • First phase: initial domain theory is specialized to eliminate negative coverage • At each step, a best clause is chosen, it is specialized, and the process repeats • Best clause is the one that contributes to the most negative examples being incorrectly classified and is required by the fewest positives • If best clause covers no positives, it is deleted; otherwise, literals are added in a FOIL-like manner to eliminate covered negatives
AUDREY II • Second phase: revised theory is generalized to cover all positives (without covering any negatives) • Uncovered positive example is randomly chosen, and theory is generalized to cover the example • Process repeats until all remaining positives are covered • If assumed literals can be removed without decreasing positive coverage, that is done • If not, AUDREY II tries replacing literals with a new conjunction of literals (also uses a FOIL-type process) • If deleting and replacement fail, system uses a FOIL-like method of determining entirely new clauses for proving the literal
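The clause choice in the specialization phase can be sketched as a ranking over per-clause counts. The slides do not say how the two criteria are combined, so the lexicographic ordering here (most negatives first, fewest required positives as a tie-breaker) is an assumption, as are the field names and the example counts.

```python
def pick_best_clause(stats):
    """Pick the clause blamed for the most negatives; break ties by the
    fewest positives that require it. The ordering is an assumption."""
    return max(
        stats,
        key=lambda name: (stats[name]["neg_blamed"], -stats[name]["pos_required"]),
    )


def specialization_action(stats):
    """Return (clause, action) for one step of the specialization phase."""
    best = pick_best_clause(stats)
    if stats[best]["pos_covered"] == 0:
        return best, "delete"  # clause proves no positives, so drop it
    return best, "add_literals"  # otherwise specialize it FOIL-style


# Hypothetical counts for a three-clause theory.
stats = {
    "c1": {"neg_blamed": 4, "pos_required": 3, "pos_covered": 5},
    "c2": {"neg_blamed": 4, "pos_required": 1, "pos_covered": 2},
    "c3": {"neg_blamed": 1, "pos_required": 0, "pos_covered": 0},
}
print(specialization_action(stats))
```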
KBANN • System that takes a domain theory of Prolog-style clauses and transforms it into a knowledge-based neural network (KNN) • Uses the knowledge base (background theory) to determine topology and initial weights of KNN • Different units and links within KNN correspond to various components of the domain theory • Topologies of KNNs can differ from the topologies that we have seen in neural networks
KBANN • KNNs are trained on example data, and rules are extracted using an N of M method (saves time) • Domain theories for KBANN need not contain all intermediate theories necessary to learn certain concepts • Adding hidden units along with units specified by the domain theory allows the network to induce necessary terms not stated in background info • Problems arise when interpreting intermediate rules learned from hidden nodes • Difficult to label them based on the inputs they resulted from • In one case, programmers labeled rules based on the section of info that they were attached to in that topology
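How a rule can fix a unit's initial weights is easiest to see on one conjunctive rule. The sketch below follows the scheme commonly described for KBANN: weight ω on each positive antecedent, weight -ω on each negated antecedent, and a bias of -(P - 0.5)ω for P positive antecedents. The value ω = 4, the function names, and the example rule `c :- a, b, not d` are assumptions for illustration, not details taken from the slides.

```python
import math
from itertools import product

OMEGA = 4.0  # assumed link weight


def rule_to_unit(positive, negated, omega=OMEGA):
    """Translate a conjunctive rule into (weights, bias) for one sigmoid unit."""
    weights = {name: omega for name in positive}
    weights.update({name: -omega for name in negated})
    # The unit should fire only when all P positive antecedents are on
    # and no negated antecedent is on.
    bias = -(len(positive) - 0.5) * omega
    return weights, bias


def activate(weights, bias, inputs):
    """Sigmoid activation of the unit for a dict of 0/1 inputs."""
    net = bias + sum(w * inputs[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-net))


# Hypothetical rule: c :- a, b, not d.
weights, bias = rule_to_unit(positive=["a", "b"], negated=["d"])
for a, b, d in product([0, 1], repeat=3):
    out = activate(weights, bias, {"a": a, "b": b, "d": d})
    print(a, b, d, round(out, 3))
```

Before any training, the unit already behaves like the rule: its output exceeds 0.5 only when a and b are on and d is off. Training on examples then adjusts these weights rather than starting from random values.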
System Comparison • AUDREY II is better than FOCL at theory revision, but it still has room for improvement • Its revised theories are closer to both the original theory and the human-created correct theory
System Comparison • AUDREY II is slightly more accurate than FORTE, and its revised theories are closer to the original and correct theories • KR-FOCL addresses some issues of other systems by allowing the user to decide among changes that have the same accuracy
Applications of Theory Refinement • Used to identify different parts of both DNA and RNA sequences • Used to debug student-written basic Prolog programs • Used to maintain working theories as new data is obtained