Model Assessment & Selection Prof. Liqing Zhang Dept. Computer Science & Engineering, Shanghai Jiaotong University
Outline • Bias, Variance and Model Complexity • The Bias-Variance Decomposition • Optimism of the Training Error Rate • Estimates of In-Sample Prediction Error • The Effective Number of Parameters • The Bayesian Approach and BIC • Minimum Description Length • Vapnik-Chervonenkis Dimension • Cross-Validation • Bootstrap Methods
Bias, Variance & Model Complexity
Bias, Variance & Model Complexity • Model assessment is concerned with the generalization performance of a learning method: its prediction capability on independent test data • Model: target variable $Y$, inputs $X$, and a prediction model $\hat{f}(X)$ estimated from a training set $\mathcal{T}$ • Loss function: $L(Y, \hat{f}(X))$ measures the error between $Y$ and $\hat{f}(X)$
Bias, Variance & Model Complexity • Errors: training error $\overline{\mathrm{err}} = \frac{1}{N}\sum_{i=1}^{N} L(y_i, \hat{f}(x_i))$; generalization (test) error $\mathrm{Err}_{\mathcal{T}} = \mathrm{E}[L(Y, \hat{f}(X)) \mid \mathcal{T}]$ • Typical loss functions: squared error $L(Y, \hat{f}(X)) = (Y - \hat{f}(X))^2$ and absolute error $L(Y, \hat{f}(X)) = |Y - \hat{f}(X)|$
Bias-Variance Decomposition • Basic model: $Y = f(X) + \varepsilon$, with $\mathrm{E}(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$ • The expected prediction error of a regression fit $\hat{f}(X)$ at a point $X = x_0$: $\mathrm{Err}(x_0) = \mathrm{E}[(Y - \hat{f}(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + [\mathrm{E}\hat{f}(x_0) - f(x_0)]^2 + \mathrm{E}[\hat{f}(x_0) - \mathrm{E}\hat{f}(x_0)]^2 = \text{Irreducible Error} + \mathrm{Bias}^2 + \mathrm{Variance}$ • The more complex the model, the lower the (squared) bias but the higher the variance.
Bias-Variance Decomposition • For the k-NN regression fit, the prediction error is $\mathrm{Err}(x_0) = \sigma_\varepsilon^2 + \big[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\big]^2 + \frac{\sigma_\varepsilon^2}{k}$, where $x_{(1)}, \dots, x_{(k)}$ are the k nearest neighbors of $x_0$ • For the linear model fit $\hat{f}_p(x) = x^T\hat{\beta}$ with p parameters: $\mathrm{Err}(x_0) = \sigma_\varepsilon^2 + [f(x_0) - \mathrm{E}\hat{f}_p(x_0)]^2 + \|\mathbf{h}(x_0)\|^2 \sigma_\varepsilon^2$, where $\mathbf{h}(x_0) = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}x_0$
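The decomposition above can be checked numerically. Below is a minimal Monte Carlo sketch, not from the slides: the true function $f(x)=\sin(x)$, the noise level, and all variable names are illustrative assumptions. It holds the training inputs fixed, redraws the noise many times, refits a k-NN regression at a test point $x_0$, and compares the simulated bias² and variance with the formula above (the variance term should come out near $\sigma_\varepsilon^2/k$).

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                      # assumed true regression function
sigma = 0.5                     # assumed noise standard deviation
N, k, x0, B = 200, 10, 1.0, 2000

x = rng.uniform(-3, 3, N)                    # fixed training inputs
nearest = np.argsort(np.abs(x - x0))[:k]     # indices of the k nearest neighbours of x0

preds = np.empty(B)
for b in range(B):
    y = f(x) + sigma * rng.normal(size=N)    # fresh noisy responses each round
    preds[b] = y[nearest].mean()             # k-NN regression fit at x0

bias2_sim, var_sim = (preds.mean() - f(x0)) ** 2, preds.var()
bias2_formula = (f(x0) - f(x[nearest]).mean()) ** 2
print("bias^2: simulated %.4f  formula %.4f" % (bias2_sim, bias2_formula))
print("var   : simulated %.4f  formula sigma^2/k = %.4f" % (var_sim, sigma**2 / k))
print("Err(x0) estimate: %.4f" % (sigma**2 + bias2_sim + var_sim))
```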
Bias-Variance Decomposition • The in-sample error of the linear model, averaged over the training inputs: $\frac{1}{N}\sum_{i=1}^{N}\mathrm{Err}(x_i) = \sigma_\varepsilon^2 + \frac{1}{N}\sum_{i=1}^{N}[f(x_i) - \mathrm{E}\hat{f}(x_i)]^2 + \frac{p}{N}\sigma_\varepsilon^2$ • The model complexity is directly related to the number of parameters p • For ridge regression the squared bias splits into an average model bias (between the best-fitting linear approximation and the true function) and an average estimation bias (between the average ridge estimate and that best-fitting linear approximation)
Bias-Variance Decomposition • Schematic of the behavior of bias and variance [figure: Truth, Closest fit in population, Closest fit, Realization, and Regularized fit, shown within the MODEL SPACE and a RESTRICTED MODEL SPACE; arrows mark Model bias, Estimation bias, and Estimation variance]
Optimism of the Training Error Rate • The training error is typically less than the true error, because the same data are used both to fit and to assess the method • $\mathrm{Err}_{\mathcal{T}} = \mathrm{E}[L(Y, \hat{f}(X)) \mid \mathcal{T}]$ is extra-sample error, since the test inputs need not coincide with the training inputs • The in-sample error: $\mathrm{Err}_{\mathrm{in}} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{E}_{Y^0}[L(Y_i^0, \hat{f}(x_i)) \mid \mathcal{T}]$, where $Y^0$ denotes new responses observed at the training inputs $x_i$ • Optimism: $\mathrm{op} = \mathrm{Err}_{\mathrm{in}} - \overline{\mathrm{err}}$, with average optimism $\omega = \mathrm{E}_{\mathbf{y}}(\mathrm{op})$
Optimism of the Training Error Rate • For squared error, 0-1, and other loss functions: $\omega = \frac{2}{N}\sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i)$ • If $\hat{y}_i$ is obtained by a linear fit with d inputs or basis functions, a simplification is $\sum_{i=1}^{N}\mathrm{Cov}(\hat{y}_i, y_i) = d\,\sigma_\varepsilon^2$, so $\omega = 2\,\frac{d}{N}\,\sigma_\varepsilon^2$ • The optimism grows as the number of inputs or basis functions increases • The optimism shrinks as the number of training samples increases
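A short simulation can illustrate $\omega = 2\,\frac{d}{N}\,\sigma_\varepsilon^2$. This sketch is illustrative (the design matrix, true coefficients, and sample sizes are assumptions, not from the slides): it repeatedly draws responses at a fixed design, fits ordinary least squares, and compares training error with in-sample error measured on fresh responses at the same inputs.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d, sigma = 100, 5, 1.0
X = rng.normal(size=(N, d))          # fixed design matrix with d inputs
beta = rng.normal(size=d)            # assumed true coefficients

gaps = []
for _ in range(5000):
    y = X @ beta + sigma * rng.normal(size=N)        # training responses
    y_new = X @ beta + sigma * rng.normal(size=N)    # fresh responses at the same inputs
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares fit
    yhat = X @ beta_hat
    gaps.append(np.mean((y_new - yhat) ** 2) - np.mean((y - yhat) ** 2))

print("simulated average optimism: %.4f" % np.mean(gaps))
print("2*d*sigma^2/N             : %.4f" % (2 * d * sigma**2 / N))
```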
Estimates of In-sample Prediction Error • The general form of the in-sample estimates is $\widehat{\mathrm{Err}}_{\mathrm{in}} = \overline{\mathrm{err}} + \hat{\omega}$ • When d parameters are fit under squared error loss this gives the $C_p$ statistic: $C_p = \overline{\mathrm{err}} + 2\,\frac{d}{N}\,\hat{\sigma}_\varepsilon^2$ • With a log-likelihood loss, $-2\,\mathrm{E}[\log \Pr_{\hat{\theta}}(Y)] \approx -\frac{2}{N}\,\mathrm{E}[\mathrm{loglik}] + 2\,\frac{d}{N}$ as $N \to \infty$ • This relationship introduces the Akaike Information Criterion
Akaike Information Criterion • The Akaike Information Criterion, $\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}$, is a similar but more generally applicable estimate of $\mathrm{Err}_{\mathrm{in}}$ when a log-likelihood loss is used • For a set of models $f_\alpha(x)$ indexed by a tuning parameter $\alpha$, with training error $\overline{\mathrm{err}}(\alpha)$ and $d(\alpha)$ effective parameters: $\mathrm{AIC}(\alpha) = \overline{\mathrm{err}}(\alpha) + 2\,\frac{d(\alpha)}{N}\,\hat{\sigma}_\varepsilon^2$ • $\mathrm{AIC}(\alpha)$ provides an estimate of the test error curve, and we choose the tuning parameter $\hat{\alpha}$ that minimizes it
Akaike Information Criterion • For the logistic regression model, using the binomial log-likelihood: $\mathrm{AIC} = -\frac{2}{N}\,\mathrm{loglik} + 2\,\frac{d}{N}$ • For the Gaussian model (with the variance $\sigma_\varepsilon^2 = \hat{\sigma}_\varepsilon^2$ assumed known), the AIC statistic is equivalent to the $C_p$ statistic
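A minimal sketch of AIC/$C_p$ model selection under squared-error loss follows. The data-generating model (a cubic), the candidate degrees, and the convention of estimating $\hat{\sigma}_\varepsilon^2$ from the most complex candidate are all illustrative assumptions; the point is only that the criterion trades training error against $2\,d/N$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
x = rng.uniform(-1, 1, N)
y = 1 + 2 * x - 1.5 * x**3 + 0.3 * rng.normal(size=N)   # assumed true model: cubic

degrees = range(1, 9)
# estimate sigma^2 from the most flexible candidate (one common convention)
coef_big = np.polyfit(x, y, max(degrees))
sigma2_hat = np.mean((y - np.polyval(coef_big, x)) ** 2)

for deg in degrees:
    coef = np.polyfit(x, y, deg)                         # least-squares polynomial fit
    err_bar = np.mean((y - np.polyval(coef, x)) ** 2)    # training error
    d = deg + 1                                          # number of parameters
    cp = err_bar + 2 * d / N * sigma2_hat                # Cp (Gaussian AIC)
    print("degree %d  train err %.4f  Cp %.4f" % (deg, err_bar, cp))
```

The training error keeps falling with the degree, while $C_p$ typically bottoms out near the true complexity.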
Akaike Information Criterion • Example: phoneme recognition
Effective Number of Parameters • A linear fitting method: $\hat{\mathbf{y}} = \mathbf{S}\mathbf{y}$, where $\mathbf{S}$ is an $N \times N$ matrix depending on the input vectors $x_i$ but not on the $y_i$ • Effective number of parameters: $\mathrm{df}(\mathbf{S}) = \mathrm{trace}(\mathbf{S})$ • If $\mathbf{S}$ is an orthogonal projection matrix onto a basis set spanned by $M$ features, then $\mathrm{trace}(\mathbf{S}) = M$ • $\mathrm{trace}(\mathbf{S})$ is the correct quantity to replace $d$ in the $C_p$ statistic
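For ridge regression the smoother is $\mathbf{S}_\lambda = \mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T$, so $\mathrm{df}(\lambda) = \mathrm{trace}(\mathbf{S}_\lambda)$. The sketch below (the helper name `ridge_df` and the random design are my own illustrative choices) computes this trace and shows that $\lambda = 0$ recovers the projection case $\mathrm{df} = p$, while larger $\lambda$ shrinks the effective number of parameters.

```python
import numpy as np

rng = np.random.default_rng(3)
N, p = 50, 10
X = rng.normal(size=(N, p))

def ridge_df(X, lam):
    """Effective number of parameters trace(S) for the ridge smoother
    S = X (X'X + lam*I)^{-1} X'."""
    XtX = X.T @ X
    S = X @ np.linalg.solve(XtX + lam * np.eye(X.shape[1]), X.T)
    return np.trace(S)

for lam in [0.0, 1.0, 10.0, 100.0]:
    print("lambda=%6.1f  df=%.3f" % (lam, ridge_df(X, lam)))
# lambda = 0 is an orthogonal projection onto the column space: df = p
```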
Bayesian Approach & BIC • The Bayesian Information Criterion (BIC): $\mathrm{BIC} = -2\,\mathrm{loglik} + (\log N)\,d$ • For the Gaussian model with known variance $\sigma_\varepsilon^2$, $-2\,\mathrm{loglik} = \sum_i (y_i - \hat{f}(x_i))^2 / \sigma_\varepsilon^2$ up to a constant • Then $\mathrm{BIC} = \frac{N}{\sigma_\varepsilon^2}\big[\overline{\mathrm{err}} + (\log N)\,\frac{d}{N}\,\sigma_\varepsilon^2\big]$ • So BIC is proportional to AIC ($C_p$), with the factor 2 replaced by $\log N$ • BIC therefore tends to favor simpler models and penalizes complex models more heavily
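The following sketch contrasts the two penalties. The log-likelihood values and parameter counts are assumed numbers, chosen only so the two criteria disagree; the helper names `aic` and `bic` are hypothetical.

```python
import numpy as np

def aic(loglik, d, N):
    """AIC in the -(2/N) loglik + 2 d/N form used above."""
    return -2.0 / N * loglik + 2.0 * d / N

def bic(loglik, d, N):
    """BIC = -2 loglik + (log N) d."""
    return -2.0 * loglik + np.log(N) * d

# illustrative numbers (assumed, not from the slides): two candidate models
N = 200
for name, loglik, d in [("simple model", -310.0, 3), ("complex model", -304.0, 8)]:
    print("%-13s AIC=%.3f  BIC=%.1f" % (name, aic(loglik, d, N), bic(loglik, d, N)))
# AIC (barely) prefers the complex model, while BIC's log(200) ~ 5.3 penalty
# per parameter picks the simple one.
```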
Bayesian Model Selection • BIC can be derived from a Bayesian approach to model selection • Candidate models $\mathcal{M}_m$, $m = 1, \dots, M$, with model parameters $\theta_m$ and prior distributions $\Pr(\theta_m \mid \mathcal{M}_m)$ • Posterior probability: $\Pr(\mathcal{M}_m \mid \mathbf{Z}) \propto \Pr(\mathcal{M}_m)\,\Pr(\mathbf{Z} \mid \mathcal{M}_m)$, where $\mathbf{Z}$ denotes the training data $\{x_i, y_i\}_1^N$
Bayesian Model Selection • To compare two models $\mathcal{M}_m$ and $\mathcal{M}_\ell$, form the posterior odds $\frac{\Pr(\mathcal{M}_m \mid \mathbf{Z})}{\Pr(\mathcal{M}_\ell \mid \mathbf{Z})} = \frac{\Pr(\mathcal{M}_m)}{\Pr(\mathcal{M}_\ell)} \cdot \frac{\Pr(\mathbf{Z} \mid \mathcal{M}_m)}{\Pr(\mathbf{Z} \mid \mathcal{M}_\ell)}$ • If the odds are greater than 1 we choose model $m$, otherwise model $\ell$ • Bayes factor: $\mathrm{BF}(\mathbf{Z}) = \frac{\Pr(\mathbf{Z} \mid \mathcal{M}_m)}{\Pr(\mathbf{Z} \mid \mathcal{M}_\ell)}$, the contribution of the data to the posterior odds
Bayesian Model Selection • If the model prior is uniform, i.e. $\Pr(\mathcal{M}_m)$ is constant, then the model with minimum BIC is equivalent to the model with maximum (approximate) posterior probability • Advantage: if the set of candidate models contains the true model, the probability that BIC selects the correct model tends to one as the sample size tends to infinity
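Under a uniform model prior, the BIC values also give approximate posterior model probabilities, $\Pr(\mathcal{M}_m \mid \mathbf{Z}) \approx e^{-\frac{1}{2}\mathrm{BIC}_m} / \sum_\ell e^{-\frac{1}{2}\mathrm{BIC}_\ell}$. A minimal sketch (the BIC values are assumed numbers, and `posterior_from_bic` is a hypothetical helper):

```python
import numpy as np

def posterior_from_bic(bic_values):
    """Approximate posterior model probabilities under a uniform model prior:
    Pr(M_m | Z) ~= exp(-BIC_m / 2) / sum_l exp(-BIC_l / 2)."""
    b = np.asarray(bic_values, dtype=float)
    w = np.exp(-0.5 * (b - b.min()))   # subtract the minimum for numerical stability
    return w / w.sum()

# illustrative BIC values for three candidate models (assumed numbers)
print(posterior_from_bic([635.9, 650.4, 644.0]))
```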
Minimum Description Length (MDL) • Origin: optimal coding • Messages: z1 z2 z3 z4 • Code 1: 0 10 110 111 • Code 2: 110 10 111 0 • Principle: the most frequent messages should get the shortest codes • Suppose message $z_i$ is sent with probability $\Pr(z_i)$ • Shannon's theorem says to use code lengths $l_i = -\log_2 \Pr(z_i)$, so the average message length satisfies $\mathrm{E}(\text{length}) \geq -\sum_i \Pr(z_i)\log_2 \Pr(z_i)$ (the entropy)
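A tiny numerical check of the coding argument, assuming (for illustration) message probabilities 1/2, 1/4, 1/8, 1/8, which make Code 1's lengths optimal:

```python
import numpy as np

# Assumed probabilities for messages z1..z4 (illustrative choice that makes
# Code 1's lengths 1, 2, 3, 3 the Shannon-optimal ones).
p = np.array([1/2, 1/4, 1/8, 1/8])

lengths = -np.log2(p)              # Shannon code lengths l_i = -log2 Pr(z_i)
entropy = np.sum(p * lengths)      # lower bound on the average message length

print("code lengths:", lengths)    # [1. 2. 3. 3.] -> matches Code 1 (0, 10, 110, 111)
print("entropy (bits/message): %.3f" % entropy)
# Code 2 gives the most frequent message a 3-bit code, so its average
# length (2.5 bits) exceeds the 1.75-bit optimum achieved by Code 1.
```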
Minimum Description Length (MDL)
Model Selection with MDL
Model Selection with MDL • The MDL principle: choose the model that minimizes the total description length, i.e. the length needed to encode the data under the model plus the length needed to encode the model parameters, $\text{length} = -\log \Pr(\mathbf{y} \mid \theta, M, \mathbf{X}) - \log \Pr(\theta \mid M)$
Vapnik-Chervonenkis Dimension • Question: how should we choose the number of model parameters d? • This quantity represents the complexity of the model • The VC dimension is an important measure of model complexity
VC Dimension • The VC dimension of a class $\{f(x, \alpha)\}$ is defined as the largest number of points (in some configuration) that can be shattered by members of the class • The class of lines in the plane has VC dimension 3 • The class $\sin(\alpha x)$ has infinite VC dimension
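As a sanity check on "lines in the plane have VC dimension 3", here is a minimal sketch (not part of the lecture; the linear-programming feasibility test and all names are my own illustrative choices). It verifies that every labeling of 3 non-collinear points is linearly separable, while the XOR labeling of 4 points is not.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def separable(points, labels):
    """Check whether some line w.x + b realizes the given +/-1 labeling,
    via the feasibility LP:  y_i (w.x_i + b) >= 1 for all i."""
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    # variables are (w1, w2, b); constraint rows: -y_i * (x_i1, x_i2, 1) <= -1
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(y), 1))])
    b_ub = -np.ones(len(y))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3)
    return res.status == 0   # status 0 means a feasible (w, b) was found

three = [(0, 0), (1, 0), (0, 1)]              # non-collinear points
print(all(separable(three, labs)              # every dichotomy of 3 points works
          for labs in itertools.product([-1, 1], repeat=3)))   # True

four_xor = [(0, 0), (1, 1), (0, 1), (1, 0)]
print(separable(four_xor, [1, 1, -1, -1]))    # False: the XOR labeling is not separable
```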
VC Dimension • The VC dimension of a class of real-valued functions $\{g(x, \alpha)\}$ is defined as the VC dimension of the indicator class $\{\mathbb{1}[g(x, \alpha) - \beta > 0]\}$ • The VC dimension provides a bound on the generalization error • If the VC dimension of the class is h and the number of samples is N, then with high probability the test error is bounded by the training error plus a complexity term that grows with h and shrinks with N
Cross-Validation • Split the training data into K roughly equal parts; for each part k, fit the model on the other K−1 parts and evaluate the prediction error on part k • The cross-validation estimate of prediction error averages over all parts: $\mathrm{CV}(\hat{f}) = \frac{1}{N}\sum_{i=1}^{N} L\big(y_i, \hat{f}^{-\kappa(i)}(x_i)\big)$, where $\kappa(i)$ is the fold containing observation i
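A minimal K-fold cross-validation sketch using only numpy; the least-squares model, the simulated data, and the helper name `kfold_cv_error` are illustrative assumptions rather than the lecture's own code.

```python
import numpy as np

def kfold_cv_error(X, y, K=5, seed=0):
    """K-fold cross-validation estimate of squared prediction error
    for an ordinary least-squares fit (intercept included)."""
    N = len(y)
    idx = np.random.default_rng(seed).permutation(N)
    folds = np.array_split(idx, K)                 # K roughly equal parts
    Xb = np.column_stack([np.ones(N), X])          # add intercept column
    errors = []
    for k in range(K):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(K) if j != k])
        beta = np.linalg.lstsq(Xb[train], y[train], rcond=None)[0]
        errors.append((y[test] - Xb[test] @ beta) ** 2)
    return np.mean(np.concatenate(errors))

# illustrative data (assumed): linear signal plus noise
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
print("5-fold CV error: %.3f" % kfold_cv_error(X, y))
```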
Bootstrap Methods • Basic idea: draw datasets by sampling with replacement from the training data, each dataset the same size as the original training set • Repeating this B times produces B bootstrap datasets • How can these datasets be used to estimate prediction error?
Bootstrap Methods • Schematic of the bootstrap process [figure: the training sample is resampled into bootstrap samples, and the analysis is repeated on each replication]
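One simple answer to the question above is sketched here: fit the model on each bootstrap sample and average its error on the original training set. The data, the least-squares model, and the helper name `bootstrap_error` are assumptions for illustration; note this simple estimate is optimistic, and the leave-one-out and .632 refinements address that.

```python
import numpy as np

def bootstrap_error(X, y, B=200, seed=0):
    """Simple bootstrap estimate of prediction error: fit OLS on each
    bootstrap sample and evaluate on the original training set.
    (Optimistic, because bootstrap samples overlap with the evaluation
    set; leave-one-out bootstrap variants correct for this.)"""
    rng = np.random.default_rng(seed)
    N = len(y)
    Xb = np.column_stack([np.ones(N), X])           # add intercept column
    errs = []
    for _ in range(B):
        idx = rng.integers(0, N, size=N)            # sample N indices with replacement
        beta = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)[0]
        errs.append(np.mean((y - Xb @ beta) ** 2))  # error on the original data
    return np.mean(errs)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=100)
print("bootstrap error estimate: %.3f" % bootstrap_error(X, y))
```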
Summary • Bias, Variance and Model Complexity • The Bias-Variance Decomposition • Optimism of the Training Error Rate • Estimates of In-Sample Prediction Error • The Effective Number of Parameters • The Bayesian Approach and BIC • Minimum Description Length • Vapnik-Chervonenkis Dimension • Cross-Validation • Bootstrap Methods