Basis Expansion and Regularization
Presenters: Hongliang Fei, Brian Quanz
Date: July 03, 2008
Contents • Introduction • Piecewise Polynomials and Splines • Filtering and Feature Extraction • Smoothing Splines • Automatic Smoothing Parameter Selection
1. Introduction • Basis: In linear algebra, a basis is a set of vectors satisfying two conditions: • every vector in the given vector space can be represented as a linear combination of the basis vectors; • no element of the set can be represented as a linear combination of the others.
In a function space, the notion of a basis generalizes to a set of basis functions; • each function in the function space can be represented as a linear combination of the basis functions. • Example: the quadratic polynomial basis {1, t, t^2}.
What is Basis Expansion? • Given input data $X$ and transformations $h_m(X): \mathbb{R}^p \to \mathbb{R}$, $m = 1, \ldots, M$, we model $f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$, a linear basis expansion in $X$, where each $h_m$ is a basis function.
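As a concrete illustration, here is a minimal sketch of fitting a linear basis expansion by least squares, assuming NumPy; the quadratic basis and the synthetic data are illustrative choices, not from the slides:

import numpy as np

def basis_expand(x):
    # h_1(x) = 1, h_2(x) = x, h_3(x) = x^2
    return np.column_stack([np.ones_like(x), x, x ** 2])

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

H = basis_expand(x)                           # N x M basis matrix
beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # least-squares coefficients beta_m
f_hat = H @ beta                              # f(x) = sum_m beta_m h_m(x)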
Why Basis Expansion? • In regression problems, $f(X)$ is typically nonlinear in $X$; • a linear model is convenient and easy to interpret; • when the sample size is small but the number of attributes is large, a linear model may be all we can afford to fit without overfitting.
2. Piecewise Polynomials and Splines • Spline: • In mathematics, a spline is a special function defined piecewise by polynomials; • in computer science, the term spline more frequently refers to a piecewise polynomial (parametric) curve. • Splines are popular because of their simple construction, ease and accuracy of evaluation, and capacity to approximate complex shapes through curve fitting and interactive curve design.
Example of a Spline http://en.wikipedia.org/wiki/Image:BezierInterpolation.gif
Assume a spline with four knots (two boundary knots and two interior knots $\xi_1 < \xi_2$), and that $X$ is one-dimensional. • Piecewise constant basis: $h_1(X) = I(X < \xi_1)$, $h_2(X) = I(\xi_1 \le X < \xi_2)$, $h_3(X) = I(\xi_2 \le X)$. • Piecewise linear basis: add $h_{m+3}(X) = h_m(X)\,X$, $m = 1, 2, 3$.
Basis functions: • These six functions together span a six-dimensional linear space of piecewise linear functions.
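A minimal sketch of these piecewise constant and piecewise linear bases, assuming NumPy; the knot locations are illustrative:

import numpy as np

def piecewise_linear_basis(x, knots=(0.33, 0.66)):
    k1, k2 = knots
    const = np.column_stack([x < k1,
                             (k1 <= x) & (x < k2),
                             k2 <= x]).astype(float)  # h_1..h_3: region indicators
    return np.hstack([const, const * x[:, None]])     # h_4..h_6: h_m(x) * x

x = np.linspace(0.0, 1.0, 100)
H = piecewise_linear_basis(x)  # 100 x 6 basis matrix spanning the six-dim space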
An order-$M$ spline with knots $\xi_l$, $l = 1, \ldots, K$, has continuous derivatives up to order $M-2$. The general form of the truncated-power basis set is: $h_j(X) = X^{j-1}$, $j = 1, \ldots, M$; $h_{M+l}(X) = (X - \xi_l)_+^{M-1}$, $l = 1, \ldots, K$.
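A minimal sketch of the truncated-power basis for a cubic spline (order $M = 4$), assuming NumPy; the knots are illustrative:

import numpy as np

def truncated_power_basis(x, knots, M=4):
    cols = [x ** j for j in range(M)]  # h_j(x) = x^(j-1), j = 1..M
    cols += [np.maximum(x - xi, 0.0) ** (M - 1) for xi in knots]  # (x - xi_l)_+^(M-1)
    return np.column_stack(cols)

x = np.linspace(0.0, 1.0, 100)
B = truncated_power_basis(x, knots=[0.25, 0.5, 0.75])  # 100 x (M + K) matrix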
Natural cubic spline • A natural cubic spline adds additional constraints: the function is linear beyond the boundary knots. • A natural cubic spline with K knots is represented by K basis functions. • One can start from a basis for cubic splines and derive the reduced basis by imposing the boundary constraints.
Example of a natural cubic spline basis • Starting from the truncated power series basis, we arrive at: $N_1(X) = 1$, $N_2(X) = X$, $N_{k+2}(X) = d_k(X) - d_{K-1}(X)$, where $d_k(X) = \dfrac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}$.
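A minimal sketch of this reduced basis, assuming NumPy; the knots are illustrative:

import numpy as np

def natural_cubic_basis(x, knots):
    knots = np.asarray(knots, dtype=float)
    K = len(knots)
    def d(k):
        # d_k(x) = ((x - xi_k)_+^3 - (x - xi_K)_+^3) / (xi_K - xi_k)
        return (np.maximum(x - knots[k], 0.0) ** 3
                - np.maximum(x - knots[-1], 0.0) ** 3) / (knots[-1] - knots[k])
    cols = [np.ones_like(x), x]                      # N_1(x) = 1, N_2(x) = x
    cols += [d(k) - d(K - 2) for k in range(K - 2)]  # N_{k+2} = d_k - d_{K-1}
    return np.column_stack(cols)                     # one column per basis function

x = np.linspace(0.0, 1.0, 100)
N = natural_cubic_basis(x, knots=np.linspace(0.1, 0.9, 5))  # 100 x K matrix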
Example: phoneme recognition • Data: 1000 samples drawn from a pool of 695 “aa”s and 1022 “ao”s, each with a feature vector of length 256. • Goal: use these data to classify the spoken phoneme. • The fitted coefficients can be plotted as a function of frequency.
Fitting via maximum likelihood alone, the coefficient curve is very rough; • fitting through natural cubic splines: rewrite the coefficient function as an expansion in splines, $\beta(f) = \sum_{m=1}^{M} h_m(f)\,\theta_m$, i.e. $\beta = H\theta$, where $H$ is a $p \times M$ basis matrix of natural cubic splines. • Since $x^T \beta = x^T H \theta = (H^T x)^T \theta$, we replace the input features $x$ by the filtered version $x^* = H^T x$. • Fit via linear logistic regression on $x^*$. • The final result is the smooth coefficient curve $\hat{\beta} = H\hat{\theta}$.
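A minimal sketch of this filtering step, assuming NumPy and scikit-learn; the data here are random stand-ins for the phoneme features, and natural_cubic_basis is the function from the earlier sketch:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 256))  # stand-in for N x p phoneme features
y = rng.integers(0, 2, 100)          # stand-in for "aa" vs. "ao" labels

freq = np.linspace(0.0, 1.0, 256)
H = natural_cubic_basis(freq, np.linspace(0.05, 0.95, 12))  # p x M spline basis

X_star = X @ H                            # filtered features x* = H^T x
clf = LogisticRegression().fit(X_star, y)
beta_hat = H @ clf.coef_.ravel()          # smooth coefficients beta = H * theta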
3. Filtering and Feature Extraction • Preprocessing high-dimensional features is a powerful method for improving the performance of a learning algorithm. • The previous example used a filtering approach, $x^* = H^T x$, to transform the features; • the transformations need not be linear, but can take the general form $x^* = g(x)$. Another example is the wavelet transform; see Section 5.9.
4. Smoothing Splines • Purpose: avoid the complexity of the knot selection problem by using a maximal set of knots. • Complexity is controlled via regularization. • Consider this problem: among all functions $f(x)$ with two continuous derivatives, minimize the penalized residual sum of squares $\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(t) \right)^2 dt$.
Though RSS is defined on an infinite-dimensional function space, it has an explicit, finite-dimensional, unique minimizer: a natural cubic spline with knots at the unique values of the $x_i$, $i = 1, \ldots, N$. • The penalty term translates to a penalty on the spline coefficients.
Rewrite the solution as $f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j$, where the $N_j(x)$ are an $N$-dimensional set of basis functions representing the family of natural splines. • Criterion in matrix form: $\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\,\theta^T \Omega_N \theta$, where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t)\,N_k''(t)\,dt$. • By the ridge regression result, the solution is $\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y$. • The fitted smoothing spline is given by $\hat{f}(x) = \sum_{j=1}^{N} N_j(x)\,\hat{\theta}_j$.
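A minimal sketch of this generalized ridge solve, assuming NumPy and the natural_cubic_basis function from the earlier sketch; the penalty matrix $\Omega_N$ is approximated numerically here rather than computed in closed form:

import numpy as np

def smoothing_spline_fit(x, y, lam):
    knots = np.unique(x)               # knots at the unique values of x_i
    N = natural_cubic_basis(x, knots)  # {N}_{ij} = N_j(x_i)
    # Numerical approximation of Omega_jk = int N_j''(t) N_k''(t) dt
    t = np.linspace(x.min(), x.max(), 2000)
    B = natural_cubic_basis(t, knots)
    d2 = np.gradient(np.gradient(B, t, axis=0), t, axis=0)  # second derivatives
    Omega = d2.T @ d2 * (t[1] - t[0])
    theta = np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
    return N @ theta                   # fitted values f_hat(x_i) = S_lambda y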
Degrees of freedom and the smoother matrix • A smoothing spline with prechosen $\lambda$ is a linear operator. • Let $\hat{f}$ be the $N$-vector of fitted values at the training predictors $x_i$: $\hat{f} = N(N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y$. Here $S_\lambda$ is called the smoother matrix. It depends only on the $x_i$ and $\lambda$.
Suppose $B_\xi$ is an $N \times M$ matrix of $M$ cubic spline basis functions evaluated at the $N$ training points $x_i$, with knot sequence $\xi$ and $M \ll N$. The fitted spline values are given by $\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y = H_\xi y$. Here the linear operator $H_\xi$ is a projection operator, known as the hat matrix in statistics.
Similarities and differences between $H_\xi$ and $S_\lambda$ • Both are symmetric, positive semi-definite matrices. • $H_\xi$ is idempotent ($H_\xi H_\xi = H_\xi$), while $S_\lambda S_\lambda \preceq S_\lambda$. • $\mathrm{rank}(S_\lambda) = N$, $\mathrm{rank}(H_\xi) = M$. • The trace of $H_\xi$ gives the dimension of the projection space (the number of basis functions).
Define the effective degrees of freedom as $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda)$. By specifying $\mathrm{df}_\lambda$, we can solve for $\lambda$. • Since $S_\lambda$ is symmetric and positive semi-definite, we can rewrite it as $S_\lambda = (I + \lambda K)^{-1}$; then $\hat{f} = S_\lambda y$ is the solution of $\min_f\, (y - f)^T (y - f) + \lambda\, f^T K f$, where $K$ is known as the penalty matrix.
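A minimal sketch, assuming NumPy and SciPy, of solving for $\lambda$ given a target $\mathrm{df}_\lambda$; N and Omega are the basis and penalty matrices from the smoothing-spline sketch:

import numpy as np
from scipy.optimize import brentq

def df_lambda(lam, N, Omega):
    S = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)  # smoother matrix S_lambda
    return np.trace(S)

def lambda_for_df(target_df, N, Omega):
    # trace(S_lambda) decreases monotonically in lambda, so bracket and root-find
    return brentq(lambda lam: df_lambda(lam, N, Omega) - target_df, 1e-10, 1e6)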
The eigen-decomposition of $S_\lambda$ is given by $S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T$, with $\rho_k(\lambda) = \dfrac{1}{1 + \lambda d_k}$, where $d_k$ and $u_k$ are the eigenvalues and eigenvectors of $K$.
Highlights of the eigen-decomposition • The eigenvectors are not affected by changes in $\lambda$. • Shrinking nature: $S_\lambda y = \sum_{k=1}^{N} u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$, so each component of $y$ is shrunk by the factor $\rho_k(\lambda)$. • The eigenvector sequence, ordered by decreasing $\rho_k(\lambda)$, appears to increase in complexity. • The first two eigenvalues are always 1, since $d_1 = d_2 = 0$, showing that linear functions are not penalized.
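A minimal sketch, assuming NumPy, of inspecting $\rho_k(\lambda)$ and $u_k$ for a smoother matrix built as in the earlier sketches:

import numpy as np

def smoother_eigen(N, Omega, lam):
    S = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)
    rho, U = np.linalg.eigh((S + S.T) / 2.0)  # symmetrize for numerical safety
    order = np.argsort(rho)[::-1]             # order by decreasing rho_k(lambda)
    return rho[order], U[:, order]

# The first two eigenvalues should be (numerically) 1, since linear
# functions are not penalized (d_1 = d_2 = 0).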
5. Automatic Selection of the Smoothing Parameters • Selecting the placement and number of knots for regression splines can be a combinatorially complex task; • for smoothing splines, only the smoothing parameter $\lambda$ must be selected. • Method: fix the degrees of freedom and solve for $\lambda$ from $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda)$. • Criterion: the bias-variance tradeoff.
The Bias-Variance Tradeoff • Integrated squared prediction error: $\mathrm{EPE}(\hat{f}_\lambda) = \mathrm{E}\,(Y - \hat{f}_\lambda(X))^2$. • Leave-one-out cross-validation: $\mathrm{CV}(\hat{f}_\lambda) = \dfrac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{f}_\lambda^{(-i)}(x_i) \right)^2 = \dfrac{1}{N} \sum_{i=1}^{N} \left( \dfrac{y_i - \hat{f}_\lambda(x_i)}{1 - S_\lambda(i,i)} \right)^2$.
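A minimal sketch, assuming NumPy, of the leave-one-out CV shortcut above, which needs only a single fit plus the diagonal of the smoother matrix:

import numpy as np

def loocv_score(y, N, Omega, lam):
    S = N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)  # smoother matrix S_lambda
    f_hat = S @ y
    return np.mean(((y - f_hat) / (1.0 - np.diag(S))) ** 2)

# Usage: sweep lambda on a log grid and keep the minimizer, e.g.
#   lams = np.logspace(-6, 2, 50)
#   best = min(lams, key=lambda lam: loocv_score(y, N, Omega, lam))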