Graph-Based Concept Learning
Jesus A. Gonzalez, Lawrence B. Holder, and Diane J. Cook
Department of Computer Science and Engineering
University of Texas at Arlington
Box 19015, Arlington, TX 76019-0015
{gonzalez,holder,cook}@cse.uta.edu
http://cygnus.uta.edu/subdue/
MOTIVATION AND GOAL
• Need for a non-logic-based relational concept learner
• Empirical and theoretical comparisons of relational learners
  • Logic-based relational learners (ILP)
    • FOIL [Quinlan et al.]
    • Progol [Muggleton et al.]
  • Graph-based relational learner
    • SUBDUE
SUBDUE KNOWLEDGE DISCOVERY SYSTEM
• SUBDUE discovers patterns (substructures) in structural data sets
• SUBDUE represents data as a labeled graph
  • Vertices represent objects or attributes
  • Edges represent relationships between objects
• Input: labeled graph
• Output: discovered patterns and their instances
SUBDUE EXAMPLE
• [Figure: the input graph contains object vertices with "shape" edges to triangle and square labels and "on" edges between objects; the output shows the discovered substructure (an object with shape triangle on an object with shape square) and its 4 instances in the input graph]
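To make the labeled-graph encoding on the previous two slides concrete, here is a minimal sketch of the example graph as plain Python data. The identifiers (o1, s1, ...) and the triple format are illustrative assumptions, not Subdue's actual input format.

```python
# A minimal sketch of the labeled-graph encoding described above (names and
# structure are illustrative).  Vertices carry labels for objects and attribute
# values; edges carry relationship labels.

# vertex id -> label
vertices = {
    "o1": "object", "o2": "object",   # one triangle-on-square instance
    "o3": "object", "o4": "object",   # a second instance (two more omitted)
    "s1": "triangle", "s2": "square",
    "s3": "triangle", "s4": "square",
}

# (source vertex, edge label, target vertex)
edges = [
    ("o1", "shape", "s1"), ("o2", "shape", "s2"), ("o1", "on", "o2"),
    ("o3", "shape", "s3"), ("o4", "shape", "s4"), ("o3", "on", "o4"),
]
```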
SUBDUE’S SEARCH
• Starts with a single vertex and repeatedly expands by one edge
• Computationally-constrained beam search
• Polynomially-constrained inexact graph matching
• Search space is all subgraphs of the input graph
• Guided by a compression heuristic
  • Minimum description length
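The search described above can be pictured as a generic beam search. The sketch below is a simplified Python rendering, with `seeds`, `extend`, and `value` standing in for Subdue's single-vertex substructures, one-edge extensions, and compression-based evaluation; those names and the `limit` cutoff are assumptions for illustration, not Subdue's actual interface.

```python
def beam_search(seeds, extend, value, beam_width=4, limit=1000):
    """Computationally-constrained beam search in the spirit of the slide.

    seeds  -- initial candidates (single-vertex substructures)
    extend -- maps a substructure to its one-edge extensions
    value  -- compression-based evaluation, higher is better
    All three are placeholders for Subdue's own components."""
    beam = sorted(seeds, key=value, reverse=True)[:beam_width]
    best = beam[0]                        # assumes at least one seed
    examined = len(beam)
    while beam and examined < limit:      # computational constraint on the search
        children = [child for sub in beam for child in extend(sub)]
        examined += len(children)
        beam = sorted(children, key=value, reverse=True)[:beam_width]
        if beam and value(beam[0]) > value(best):
            best = beam[0]
    return best
```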
EVALUATION CRITERION: MINIMUM DESCRIPTION LENGTH
• Minimum Description Length (MDL) principle
  • The best theory to describe a set of data is the one that minimizes the description length of the entire data set
• DL of the graph: the number of bits necessary to completely describe the graph
• Search for the substructure that results in the maximum compression
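In Subdue's MDL formulation this is usually stated as the value DL(G) / (DL(S) + DL(G|S)), where DL(G|S) is the description length of the graph after compressing it with the substructure. The one-line sketch below assumes the bit counts are already computed elsewhere; the exact bit-count formulas are Subdue's own.

```python
def compression_value(dl_graph, dl_sub, dl_graph_given_sub):
    """MDL-based value of a substructure S for a graph G.

    dl_graph           -- DL(G):   bits to describe the original graph
    dl_sub             -- DL(S):   bits to describe the substructure
    dl_graph_given_sub -- DL(G|S): bits to describe G with each instance of S
                          replaced by a single vertex
    Maximizing this ratio is the same as maximizing compression."""
    return dl_graph / (dl_sub + dl_graph_given_sub)
```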
CONCEPT LEARNING SUBDUE
• Modify Subdue for concept learning (SubdueCL)
• Accept positive and negative graphs as input examples
• Find substructures describing positive examples, but not negative examples
• Learn multiple rules (DNF)
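Under the usual covering reading, learning multiple rules amounts to the loop sketched below: find a substructure on the remaining positive graphs, add it as one disjunct, remove the positives it covers, and repeat. `find_best_substructure` and `covers` are hypothetical stand-ins for SubdueCL's search and graph matching, not its actual API.

```python
def subdue_cl(positives, negatives, find_best_substructure, covers):
    """Covering-style loop for learning a DNF hypothesis (a list of
    substructures, one per disjunct).  The helpers are supplied by the caller:
    find_best_substructure(pos, neg) -- best substructure for these examples
    covers(sub, graph)               -- does the substructure match the graph?"""
    hypothesis = []
    remaining = list(positives)
    while remaining:
        sub = find_best_substructure(remaining, negatives)
        if sub is None:               # no useful substructure left
            break
        hypothesis.append(sub)
        remaining = [g for g in remaining if not covers(sub, g)]
    return hypothesis
```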
CONCEPT LEARNING SUBDUE
• Evaluation criterion based on the number of positive examples covered without covering negative examples
• Substructure value = 1 - Error
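One straightforward reading of "value = 1 - Error" scores a substructure by the fraction of misclassified example graphs: positives it misses plus negatives it covers. The exact weighting SubdueCL uses may differ, so treat the sketch below as an illustrative assumption; `covers` is again a hypothetical matching helper.

```python
def substructure_value(sub, positives, negatives, covers):
    """Value = 1 - Error, with Error taken as the fraction of misclassified
    example graphs: positive graphs not covered plus negative graphs covered."""
    missed_pos = sum(1 for g in positives if not covers(sub, g))
    covered_neg = sum(1 for g in negatives if covers(sub, g))
    error = (missed_pos + covered_neg) / (len(positives) + len(negatives))
    return 1.0 - error
```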
CONCEPT LEARNING SUBDUE EXAMPLE
• Examples in graph format (chess domain): a) Board Configuration, b) Graph Representation
• [Figure legend: WK = White King, WR = White Rook, BK = Black King, lt = less than, adj = adjacent, pos = position, eq = equal]
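Since the board/graph figure itself is not reproduced here, the snippet below gives a hypothetical encoding of one board configuration using only the vocabulary from the legend. The specific vertices and edges are assumptions about what the figure shows, not the representation used in the experiments.

```python
# Hypothetical chess-domain encoding (vertex and edge labels from the legend).
# Piece vertices connect to position vertices via "pos" edges; positions are
# related by "lt" (less than), "adj" (adjacent), and "eq" (equal) edges.
vertices = {
    "wk": "WK", "wr": "WR", "bk": "BK",                     # pieces
    "p1": "position", "p2": "position", "p3": "position",   # board positions
}
edges = [
    ("wk", "pos", "p1"), ("wr", "pos", "p2"), ("bk", "pos", "p3"),
    ("p1", "adj", "p2"),     # white king adjacent to its rook
    ("p2", "lt",  "p3"),     # rook's position less than black king's
]
```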
PRELIMINARY RESULTS
• Comparison with FOIL and Progol
• Significance test p for the Vote domain
• Significance test p for the Chess domain

Vote domain:
  SUBDUE: error = 0.051163 +/- 0.044935
  FOIL:   error = 0.069767 +/- 0.054814
  PROGOL: error = 0.230233 +/- 0.066187
  SUBDUE - FOIL   = -0.018605 +/- 0.052347 (p=0.145068)
  SUBDUE - PROGOL = -0.179070 +/- 0.074394 (p=0.000016)
  FOIL - PROGOL   = -0.160465 +/- 0.067979 (p=0.000019)
  ANOVA: 0.000000

Chess domain:
  SUBDUE: error = 0.004600 +/- 0.006186
  FOIL:   error = 0.006600 +/- 0.007183
  PROGOL: error = 0.002600 +/- 0.002675
  SUBDUE - FOIL   = -0.002000 +/- 0.007542 (p=0.211723)
  SUBDUE - PROGOL =  0.002000 +/- 0.004989 (p=0.118354)
  FOIL - PROGOL   =  0.004000 +/- 0.007242 (p=0.057322)
  ANOVA: 0.306232
RELATED THEORY
• Galois lattice [reference?]
  • Subdue's search space is similar to the Galois lattice
  • Polynomial convergence results for the Galois lattice apply to Subdue
• PAC analysis of conceptual graphs [reference?]
  • Subdue's representation is a superset of conceptual graphs
  • PAC sample complexity results for conceptual graphs apply to Subdue
CONCLUSIONS
• Empirical results indicate Subdue is competitive with ILP systems
• More empirical comparisons are necessary
• Theoretical results on the Galois lattice and conceptual graphs apply to Subdue
  • Need to identify specific components of the theory directly applicable to Subdue
  • Expand theories where needed