200 likes | 385 Views
Data Modeling and Least Squares Fitting 2. COS 323. Nonlinear Least Squares. Some problems can be rewritten to linear Fit data points (x i , log y i ) to a * +bx, a = e a* Big problem: this no longer minimizes squared error!. Nonlinear Least Squares.
E N D
Nonlinear Least Squares • Some problems can be rewritten to linear • Fit data points (xi, log yi) to a*+bx, a = ea* • Big problem: this no longer minimizessquared error!
Nonlinear Least Squares • Can write error function, minimize directly • For the exponential, no analytic solution for a, b:
Newton’s Method • Apply Newton’s method for minimization:where H is Hessian (matrix of all 2nd derivatives) and G is gradient (vector of all 1st derivatives)
Newton’s Method for Least Squares • Gradient has 1st derivatives of f, Hessian 2nd
Gauss-Newton Iteration • Consider 1 term of Hessian: • If close to answer, first term close to 0 • Gauss-Newtion method: ignore first term! • Eliminates requirement to calculate 2nd derivatives of f • Surprising fact: still superlinear convergence if“close enough” to answer
Levenberg-Marquardt • Newton (and Gauss-Newton) work well when close to answer, terribly when far away • Steepest descent safe when far away • Levenberg-Marquardt idea: let’s do both Steepestdescent Gauss-Newton
Levenberg-Marquardt • Trade off between constants depending on how far away you are… • Clever way of doing this: • If is small, mostly like Gauss-Newton • If is big, matrix becomes mostly diagonal,behaves like steepest descent
Levenberg-Marquardt • Final bit of cleverness: adjust depending on how well we’re doing • Start with some , e.g. 0.001 • If last iteration decreased error, accept the step and decrease to /10 • If last iteration increased error, reject the step and increase to 10 • Result: fairly stable algorithm, not too painful (no 2nd derivatives), used a lot
Outliers • A lot of derivations assume Gaussian distribution for errors • Unfortunately, nature (and experimenters)sometimes don’t cooperate • Outliers: points with extremely low probability of occurrence (according to Gaussian statistics) • Can have strong influence on least squares probability Gaussian Non-Gaussian
Robust Estimation • Goal: develop parameter estimation methods insensitive to small numbers of large errors • General approach: try to give large deviations less weight • M-estimators: minimize some function other than square of y – f(x,a,b,…)
Least Absolute Value Fitting • Minimizeinstead of • Points far away from trend get comparativelyless influence
Example: Constant • For constant function y = a,minimizing (y–a)2 gave a = mean • Minimizing |y–a| gives a = median
Doing Robust Fitting • In general case, nasty function:discontinuous derivative • Simplex method often a good choice
Iteratively Reweighted Least Squares • Sometimes-used approximation:convert to iterated weighted least squareswith wi based on previous iteration
Iteratively Reweighted Least Squares • Different options for weights • Avoid problems with infinities • Give even less weight to outliers
Iteratively Reweighted Least Squares • Danger! This is not guaranteed to convergeto the right answer! • Needs good starting point, which is available ifinitial least squares estimator is reasonable • In general, works OK if few outliers, not too far off
Outlier Detection and Rejection • Special case of IRWLS: set weight = 0 if outlier, 1 otherwise • Detecting outliers: (yi–f(xi))2 > threshold • One choice: multiple of mean squared difference • Better choice: multiple of median squared difference • Can iterate… • As before, not guaranteed to do anything reasonable, tends to work OK if only a few outliers
RANSAC • RANdom SAmple Consensus: desgined forbad data (in best case, up to 50% outliers) • Take many random subsets of data • Compute least squares fit for each sample • See how many points agree: (yi–f(xi))2 < threshold • Threshold user-specified or estimated from more trials • At end, use fit that agreed with most points • Can do one final least squares with all inliers