Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University March 23, 2004
Objectives • Structural risk minimization • Two other induction principles • The scheme of the SRM induction principle
Minimum Description Length and SRM inductive principles • The idea about the Nature of Random Phenomena • Minimum Description Length Principle for the Pattern Recognition Problem • Bounds for the MDL • SRM for the Simplest Model and MDL • The Shortcoming of the MDL
The idea about the Nature of Random Phenomena • Probability theory (1930s, Kolmogorov) • Formal inference • The axiomatization does not consider the nature of randomness • The axioms take probability measures as given
The idea about the Nature of Random Phenomena • The model of randomness: Solomonoff (1965), Kolmogorov (1965), Chaitin (1966) • Algorithmic (descriptive) complexity: the length of the shortest binary computer program that describes the object • Up to an additive constant, it does not depend on the type of computer • A universal characteristic of the object
A relatively large string describing an object is random • If the algorithmic complexity of the object is high • If the given description of the object cannot be compressed significantly • MML (Wallace and Boulton, 1968) & MDL (Rissanen, 1978): algorithmic complexity as the main tool of inductive inference for learning machines
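Algorithmic complexity itself is uncomputable, but any real compressor yields a computable upper bound on it. A minimal Python sketch (not from the original slides) using zlib as such a proxy: a structured string compresses far below its raw length, while random bytes do not.

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed length / original length: a crude, computable
    upper-bound proxy for the descriptive complexity of a string."""
    return len(zlib.compress(data, 9)) / len(data)

regular = b"01" * 5000        # highly structured: compresses well
random_ = os.urandom(10000)   # incompressible: ratio near (or above) 1
print(compression_ratio(regular))
print(compression_ratio(random_))
```

In the Solomonoff-Kolmogorov-Chaitin sense, the second string is "random" precisely because no description significantly shorter than the string itself exists.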
Minimum Description Length Principle for the Pattern Recognition Problem • Given l training pairs (ω1, x1), …, (ωl, xl), each containing a vector x and a binary value ω • Consider two strings: the binary string ω1, …, ωl (146) and the string of vectors x1, …, xl (147)
Question • Q: Given (147), is the string (146) a random object? • A: Analyze the complexity of the string (146) in the spirit of the Solomonoff-Kolmogorov-Chaitin ideas
Compress its description • Since the ωi, i = 1, …, l, are binary values, the string (146) is described by l bits • Since the training pairs were drawn randomly and independently, the value ωi depends on the vector xi but not on the vector xj • Hence one may try to compress the string using a table (decision rule) that recovers ωi from xi
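A hedged sketch of one common coding scheme for this compression (assumed here for illustration; the exact scheme and constants vary): instead of the raw l bits, transmit the index of the chosen table in a code book of N tables, plus the positions of the d training errors that table makes.

```python
import math

def description_length_bits(N: int, l: int, d: int) -> float:
    """Bits to transmit omega_1..omega_l when the receiver already has
    x_1..x_l and the code book: index of the chosen table among N tables,
    plus the positions of its d errors (log2 C(l, d) bits).
    Small correction terms of the full scheme are omitted."""
    return math.log2(N) + math.log2(math.comb(l, d))

def compression_coefficient(N: int, l: int, d: int) -> float:
    # The raw string costs l bits, so K(T) < 1 means real compression.
    return description_length_bits(N, l, d) / l

print(compression_coefficient(N=1024, l=1000, d=10))
```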
Bounds for the MDL • Q: Does the compression coefficient K(T) determine the probability of the test error in classifying (decoding) vectors x by the table T? • A: Yes
The power of the compression coefficient • To obtain a bound on the probability of error, only the compression coefficient K(T) needs to be known
The power of the compression coefficient • To obtain the bound, we need not know: • How many examples we used • How the structure of the code books was organized • Which code book was used and how many tables were in it • How many errors were made by the table from the code book we used
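A sketch of how such a bound might be evaluated, assuming the form 2(K(T) ln 2 − ln η / l) that Vapnik reports for this setting; the exact constants depend on the coding scheme, so the function below is illustrative, not definitive.

```python
import math

def mdl_error_bound(K: float, l: int, eta: float = 0.05) -> float:
    """With probability at least 1 - eta, the test error of the table is
    below 2 * (K * ln 2 - ln(eta) / l) -- the form of the MDL bound
    reported by Vapnik; constants depend on the coding scheme."""
    return 2.0 * (K * math.log(2) - math.log(eta) / l)

# Only K(T), l, and the confidence level eta enter the bound.
print(mdl_error_bound(K=0.2, l=1000))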
MDL principle • To minimize the probability of error, one has to minimize the compression coefficient
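A toy sketch of the resulting model selection rule (all numbers below are hypothetical): among several code books, pick the one whose best table yields the smallest compression coefficient. A richer code book lowers the error term but raises the log2 N term, so minimizing K(T) trades table complexity against training errors.

```python
import math

def K(N: int, l: int, d: int) -> float:
    """Compression coefficient: (table index + error positions) / l raw bits."""
    return (math.log2(N) + math.log2(math.comb(l, d))) / l

l = 1000
# Hypothetical code books: (number of tables N, errors d of its best table).
code_books = [(16, 40), (256, 22), (4096, 15), (65536, 14)]
best = min(code_books, key=lambda nd: K(nd[0], l, nd[1]))
print("chosen code book:", best, "K =", round(K(best[0], l, best[1]), 4))
```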
The shortcoming of the MDL • The MDL principle uses code books with a finite number of tables • If a set of functions depends continuously on its parameters, one has to first quantize that set to make the tables
Quantization • How do we make a 'smart' quantization for a given number of observations? • For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?
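A toy illustration (synthetic data, hypothetical numbers) of the simplest quantization: the continuously parameterized family of threshold rules f_t(x) = 1{x > t} on [0, 1] is replaced by N evenly spaced thresholds, each defining one table of a finite code book.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)              # training inputs x_1..x_l
noise = (rng.random(200) < 0.05).astype(int)     # 5% label noise
omega = (x > 0.37).astype(int) ^ noise           # binary labels omega_1..omega_l

# Quantize the continuous threshold family into N tables.
N = 32
thresholds = np.linspace(0.0, 1.0, N)
errors = [int((omega != (x > t)).sum()) for t in thresholds]
best_i = int(np.argmin(errors))
print(f"best of {N} tables: t = {thresholds[best_i]:.3f}, "
      f"training errors = {errors[best_i]}")
```

The hard open question on this slide is how to choose N (and the placement of the quantized values) cleverly: too coarse a grid loses approximation ability, too fine a grid inflates log2 N and destroys the compression.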
The shortcoming of the MDL • Finding a good quantization is extremely difficult, and this is the main shortcoming of the MDL principle • The MDL principle works well when the problem of constructing reasonable code books has a good solution
Consistency of the SRM principle and asymptotic bounds on the rate of convergence • Q: Is the SRM principle consistent? • What is the bound on the (asymptotic) rate of convergence?