Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University March 23, 2004
Objectives • Structural risk minimization • Two other induction principles • The scheme of the SRM induction principle
Minimum Description Length and SRM inductive principles • The idea about the Nature of Random Phenomena • Minimum Description Length Principle for the Pattern Recognition Problem • Bounds for the MDL • SRM for the Simplest Model and MDL • The Shortcoming of the MDL
The idea about the Nature of Random Phenomena • Probability theory (1930s, Kolmogorov) • Formal inference • The axiomatization does not consider the nature of randomness • The axioms take probability measures as given
The idea about the Nature of Random Phenomena • The model of randomness: Solomonoff (1965), Kolmogorov (1965), Chaitin (1966) • Algorithmic (descriptive) complexity: the length of the shortest binary computer program that describes the object • Up to an additive constant, it does not depend on the type of computer • A universal characteristic of the object
A relatively large string describing an object is random • If the algorithmic complexity of the object is high • If the given description of the object cannot be compressed significantly • MML (Wallace and Boulton, 1968) & MDL (Rissanen, 1978): algorithmic complexity as the main tool of inductive inference for learning machines
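Algorithmic complexity itself is uncomputable, but any real compressor yields a computable upper bound on it. A minimal Python sketch (not from the original slides) using zlib as such a proxy: a structured string compresses far below its raw length, while random bytes do not.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed length / original length: a crude, computable
    upper-bound proxy for the descriptive complexity of a string."""
    return len(zlib.compress(data, 9)) / len(data)

regular = b"01" * 5000        # highly structured: compresses well
random_ = os.urandom(10000)   # incompressible: ratio near (or above) 1
print(compression_ratio(regular))
print(compression_ratio(random_))
```

In the Solomonoff-Kolmogorov-Chaitin sense, the second string is "random" precisely because no description significantly shorter than the string itself exists.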
Minimum Description Length Principle for the Pattern Recognition Problem • Given l training pairs (ω1, x1), …, (ωl, xl), each containing a vector x and a binary value ω • Consider two strings: the binary string ω1, …, ωl (146) and the string of vectors x1, …, xl (147)
Question • Q: Given (147), is the string (146) a random object? • A: Analyze the complexity of the string (146) in the spirit of the Solomonoff-Kolmogorov-Chaitin ideas
Compress its description • Since the ωi, i = 1, …, l, are binary values, the string (146) is described by l bits • Since the training pairs were drawn randomly and independently, the value ωi depends on the vector xi but not on the vector xj • Hence one may try to compress the string using a table (decision rule) that recovers ωi from xi
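A hedged sketch of one common coding scheme for this compression (assumed here for illustration; the exact scheme and constants vary): instead of the raw l bits, transmit the index of the chosen table in a code book of N tables, plus the positions of the d training errors that table makes.

```python
import math

def description_length_bits(N: int, l: int, d: int) -> float:
    """Bits to transmit omega_1..omega_l when the receiver already has
    x_1..x_l and the code book: index of the chosen table among N tables,
    plus the positions of its d errors (log2 C(l, d) bits).
    Small correction terms of the full scheme are omitted."""
    return math.log2(N) + math.log2(math.comb(l, d))

def compression_coefficient(N: int, l: int, d: int) -> float:
    # The raw string costs l bits, so K(T) < 1 means real compression.
    return description_length_bits(N, l, d) / l

print(compression_coefficient(N=1024, l=1000, d=10))
```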
Bounds for the MDL • Q: Does the compression coefficient K(T) determine the probability of the test error in classifying (decoding) vectors x by the table T? • A: Yes
The power of the compression coefficient • To obtain a bound on the probability of error, only the compression coefficient K(T) needs to be known
The power of the compression coefficient • To obtain the bound, we need not know: • How many examples we used • How the structure of the code books was organized • Which code book was used and how many tables were in it • How many errors were made by the table from the code book we used
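A sketch of how such a bound might be evaluated, assuming the form 2(K(T) ln 2 − ln η / l) that Vapnik reports for this setting; the exact constants depend on the coding scheme, so the function below is illustrative, not definitive.

```python
import math

def mdl_error_bound(K: float, l: int, eta: float = 0.05) -> float:
    """With probability at least 1 - eta, the test error of the table is
    below 2 * (K * ln 2 - ln(eta) / l) -- the form of the MDL bound
    reported by Vapnik; constants depend on the coding scheme."""
    return 2.0 * (K * math.log(2) - math.log(eta) / l)

# Only K(T), l, and the confidence level eta enter the bound.
print(mdl_error_bound(K=0.2, l=1000))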
MDL principle • To minimize the probability of error, one has to minimize the compression coefficient
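A toy sketch of the resulting model selection rule (all numbers below are hypothetical): among several code books, pick the one whose best table yields the smallest compression coefficient. A richer code book lowers the error term but raises the log2 N term, so minimizing K(T) trades table complexity against training errors.

```python
import math

def K(N: int, l: int, d: int) -> float:
    """Compression coefficient: (table index + error positions) / l raw bits."""
    return (math.log2(N) + math.log2(math.comb(l, d))) / l

l = 1000
# Hypothetical code books: (number of tables N, errors d of its best table).
code_books = [(16, 40), (256, 22), (4096, 15), (65536, 14)]
best = min(code_books, key=lambda nd: K(nd[0], l, nd[1]))
print("chosen code book:", best, "K =", round(K(best[0], l, best[1]), 4))
```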
The shortcoming of the MDL • The MDL principle uses code books with a finite number of tables • If a set of functions depends continuously on its parameters, one has to first quantize that set to make the tables
Quantization • How do we make a 'smart' quantization for a given number of observations? • For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?
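A toy illustration (synthetic data, hypothetical numbers) of the simplest quantization: the continuously parameterized family of threshold rules f_t(x) = 1{x > t} on [0, 1] is replaced by N evenly spaced thresholds, each defining one table of a finite code book.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)              # training inputs x_1..x_l
noise = (rng.random(200) < 0.05).astype(int)     # 5% label noise
omega = (x > 0.37).astype(int) ^ noise           # binary labels omega_1..omega_l

# Quantize the continuous threshold family into N tables.
N = 32
thresholds = np.linspace(0.0, 1.0, N)
errors = [int((omega != (x > t)).sum()) for t in thresholds]
best_i = int(np.argmin(errors))
print(f"best of {N} tables: t = {thresholds[best_i]:.3f}, "
      f"training errors = {errors[best_i]}")
```

The hard open question on this slide is how to choose N (and the placement of the quantized values) cleverly: too coarse a grid loses approximation ability, too fine a grid inflates log2 N and destroys the compression.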
The shortcoming of the MDL • Finding a good quantization is extremely difficult, and this is the main shortcoming of the MDL principle • The MDL principle works well when the problem of constructing reasonable code books has a good solution
Consistency of the SRM principle and asymptotic bounds on the rate of convergence • Q: Is the SRM principle consistent? • What is the bound on the (asymptotic) rate of convergence?