SYSTEMS Identification. Ali Karimpour, Assistant Professor, Ferdowsi University of Mashhad. Reference: “System Identification: Theory for the User”, Lennart Ljung.
Lecture 10 Computing the estimate Topics to be covered include: • Linear Regression and Least Squares. • Numerical Solution by Iterative Search Method. • Computing Gradients. • Two-Stage and Multistage Method. • Local Solutions and Initial Values. • Subspace Methods for Estimating State Space Models.
Introduction. In previous chapters three basic parameter estimation methods were considered: 1- The prediction-error approach, in which a certain function $V_N(\theta, Z^N)$ is minimized with respect to $\theta$. 2- The correlation approach, in which a certain equation $f_N(\theta, Z^N) = 0$ is solved for $\theta$. 3- The subspace approach to estimating state space models. In previous chapters we also studied • Convergence • Asymptotic distribution of parameter estimators. In this chapter we discuss how these problems are best solved numerically.
Linear Regression and Least Squares.
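Linear Regression and Least Squares. For linear regression we have $\hat{y}(t\mid\theta) = \varphi^T(t)\theta$. The least-squares criterion $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[y(t) - \varphi^T(t)\theta\right]^2$ leads to $\hat\theta_N^{LS} = \left[\frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$. An alternative form is the normal equations $R(N)\hat\theta_N = f(N)$ (I), with $R(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)\varphi^T(t)$ and $f(N) = \frac{1}{N}\sum_{t=1}^{N}\varphi(t)y(t)$. Remember that the basic equation of the IV method, $\left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t)\varphi^T(t)\right]\hat\theta_N = \frac{1}{N}\sum_{t=1}^{N}\zeta(t)y(t)$, is quite analogous, so most of what is said in this section about the LS method also applies to the IV method.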
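As a minimal sketch of the normal equations (Python with NumPy assumed; the function name `ls_normal_equations` and the first-order ARX test system are hypothetical), $R(N)$ and $f(N)$ can be formed explicitly and solved. This is the approach whose conditioning problems the next slides address.

```python
import numpy as np

def ls_normal_equations(Phi, Y):
    """Solve R(N) theta = f(N) with R(N) = (1/N) sum phi phi^T and f(N) = (1/N) sum phi y.

    Phi: (N, d) array whose rows are phi^T(t); Y: (N,) array of y(t).
    """
    N = Phi.shape[0]
    R = Phi.T @ Phi / N          # R(N), formed explicitly
    f = Phi.T @ Y / N            # f(N)
    return np.linalg.solve(R, f), R

# Hypothetical first-order ARX system: y(t) + a y(t-1) = b u(t-1) + e(t)
rng = np.random.default_rng(0)
N, a, b = 500, -0.7, 2.0
u = rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = -a * y[t - 1] + b * u[t - 1] + 0.1 * rng.standard_normal()

# phi(t) = [-y(t-1), u(t-1)]^T; the sum starts at the second sample
Phi = np.column_stack([-y[:-1], u[:-1]])
theta_hat, R = ls_normal_equations(Phi, y[1:])
print(theta_hat)   # should be close to the true [a, b]
```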
Linear Regression and Least Squares. Normal equations: $R(N)$ may be ill-conditioned, especially when its dimension is high. The underlying idea in the methods below is that the matrix $R(N)$ should not be formed; instead a matrix $R$ is constructed with the property $R^T R = R(N)$. This class of methods is commonly known as “square-root algorithms”, but the term “quadratic methods” is more appropriate. How to derive $R$? • Householder transformations • Gram-Schmidt procedure • Björck and Cholesky decomposition • QR decomposition
Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. The QR factorization of an $n \times d$ matrix $A$ is defined as $A = QR$, $Q^T Q = Q Q^T = I$. Here $Q$ is a unitary $n \times n$ matrix and $R$ is an $n \times d$ upper triangular matrix.
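Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. Let us define $Y = [y(1) \; \cdots \; y(N)]^T$ and $\Phi = [\varphi(1) \; \cdots \; \varphi(N)]^T$, so that the LS criterion is, up to a constant factor, $\left|Y - \Phi\theta\right|^2$. Let $Q$ be a unitary $N \times N$ matrix; then $\left|Y - \Phi\theta\right|^2 = \left|Q^T(Y - \Phi\theta)\right|^2$.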
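Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. Now introduce the QR factorization $[\Phi \;\; Y] = Q\begin{bmatrix} R_0 \\ 0 \end{bmatrix}$, $R_0 = \begin{bmatrix} R_1 & R_2 \\ 0 & R_3 \end{bmatrix}$, where $R_0$ is $(d+1) \times (d+1)$ upper triangular, $R_1$ is $d \times d$, $R_2$ is $d \times 1$ and $R_3$ is a scalar. This means that $\left|Y - \Phi\theta\right|^2 = \left|Q^T Y - Q^T\Phi\theta\right|^2 = \left|R_2 - R_1\theta\right|^2 + \left|R_3\right|^2$, which clearly is minimized for $R_1\hat\theta_N = R_2$ (II), with minimum value $\left|R_3\right|^2$.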
Linear Regression and Least Squares. Solving for the LS estimates by QR factorization. There are three important advantages with this way of solving for the LS estimate: 1- The condition number of $R_1$ is the square root of the condition number of $R(N)$, since $R_1^T R_1 = \Phi^T\Phi = N R(N)$. Therefore $R_1$ is much better conditioned than $R(N)$. 2- $R_1$ is a triangular matrix, so the equation $R_1\hat\theta_N = R_2$ is easy to solve by back-substitution. 3- If the QR factorization is performed for a regressor size $d^*$, then the solutions for all models with fewer parameters (using the first $d \le d^*$ regressors) are easily obtained from $R_0$. Remark 1: If one has found the solution for a regressor size $d^*$, then the solutions for models with more parameters are easily obtained from the Levinson algorithm. Remark 2: Note that the big matrix $Q$ never needs to be formed; all the information is contained in the “small” matrix $R_0$.
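The three advantages can be checked numerically with a short sketch (Python with NumPy assumed; `back_substitution`, `ls_by_qr` and the polynomial test regressors are hypothetical). It uses `np.linalg.qr(..., mode="r")`, which returns only the triangular factor $R_0$, solves $R_1\hat\theta_N = R_2$ by back-substitution, and reads off a smaller nested model from the leading blocks of the same $R_0$.

```python
import numpy as np

def back_substitution(U, b):
    """Solve U x = b for an upper triangular U."""
    n = len(b)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def ls_by_qr(Phi, Y):
    """Return R1, R2, R3 from the triangular factor R0 of [Phi Y]; Q is never formed."""
    d = Phi.shape[1]
    R0 = np.linalg.qr(np.column_stack([Phi, Y]), mode="r")   # (d+1) x (d+1) upper triangular
    return R0[:d, :d], R0[:d, d], R0[d, d]

# Hypothetical nested regressors 1, t, t^2, t^3 and noise-free data y = 1 + 2t + 3t^2
t = np.linspace(0.0, 1.0, 50)
Phi = np.column_stack([t ** k for k in range(4)])
Y = 1.0 + 2.0 * t + 3.0 * t ** 2
R1, R2, R3 = ls_by_qr(Phi, Y)

theta_full = back_substitution(R1, R2)            # all d* = 4 parameters
theta_3 = back_substitution(R1[:3, :3], R2[:3])   # model using only the first 3 regressors

cond_R1 = np.linalg.cond(R1)
cond_RN = np.linalg.cond(Phi.T @ Phi / len(Y))    # condition number of R(N)
print(cond_R1 ** 2, cond_RN)                      # the two numbers agree
```

The nested-model step works because the first $k$ columns of $Q$ span the first $k$ regressors, so the leading $k \times k$ block of $R_1$ and the first $k$ entries of $R_2$ define the smaller LS problem.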
Linear Regression and Least Squares. Levinson Algorithm.
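As a hedged illustration, the sketch below implements one standard form of the Levinson-Durbin recursion for AR models $y(t) + a_1 y(t-1) + \dots + a_n y(t-n) = e(t)$, assuming the normal equations have the Toeplitz (Yule-Walker) form built from autocovariances $r(0), r(1), \dots$. It obtains the order-$(n+1)$ solution from the order-$n$ one, which is the "more parameters" direction of Remark 1. The function name and the numeric autocovariances are hypothetical, and this is not necessarily the exact variant on the original slide.

```python
import numpy as np

def levinson_durbin(r, max_order):
    """AR coefficients [a_1..a_n] and prediction-error variances for n = 1..max_order.

    Model: y(t) + a_1 y(t-1) + ... + a_n y(t-n) = e(t); r[k] is the autocovariance at lag k.
    """
    a = []                # order-0 model: no coefficients
    sigma = r[0]          # prediction-error variance of the order-0 model
    coeffs, variances = [], []
    for n in range(max_order):
        # Reflection coefficient for the step n -> n+1
        k = -(r[n + 1] + sum(a[i] * r[n - i] for i in range(n))) / sigma
        # a_i^(n+1) = a_i^(n) + k a_(n+1-i)^(n), and a_(n+1)^(n+1) = k
        a = [a[i] + k * a[n - 1 - i] for i in range(n)] + [k]
        sigma = sigma * (1.0 - k * k)
        coeffs.append(a)
        variances.append(sigma)
    return coeffs, variances

# Check against a direct solution of the order-3 Yule-Walker equations
r = [2.0, 1.2, 0.5, 0.1]
coeffs, variances = levinson_durbin(r, 3)
T = np.array([[r[abs(i - j)] for j in range(3)] for i in range(3)])
a3 = np.linalg.solve(T, -np.array(r[1:4]))
print(coeffs[2], a3)
```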
Linear Regression and Least Squares. Exercise 1: Suppose that for t = 1 to 11 the values of u and y are given. Consider the simple model for the system. 1) Derive $\hat\theta_N$ from eq. (I) and find the condition number of $R(N)$. 2) Derive $\hat\theta_N$ from eq. (II) and find the condition number of $R_1$.
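Linear Regression and Least Squares. Initial condition: “windowed” data. The regression vector is $\varphi(t) = \left[z^T(t-1) \;\; z^T(t-2) \;\; \cdots \;\; z^T(t-n)\right]^T$. Here $z(t-1)$ is an $r$-dimensional vector. For example, for the ARX model $z(t) = [-y(t) \;\; u(t)]^T$ ($r = 2$), and for the AR model $z(t) = -y(t)$ ($r = 1$). $R(N)$ will be the block matrix with blocks $R_{ij}(N) = \frac{1}{N}\sum_{t=1}^{N} z(t-i)z^T(t-j)$, $i, j = 1, \dots, n$.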
Linear Regression and Least Squares. Initial condition: “windowed” data. If we have knowledge of $z(t)$ only for $1 \le t \le N$, the question arises of how to deal with the unknown initial conditions $z(0), z(-1), \dots, z(1-n)$ that enter $R(N)$. Two common choices are: 1- Start the summation at $t = n+1$ rather than $t = 1$. 2- Replace the unknown initial conditions by zeros.
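The two choices can be sketched for the AR case ($z(t) = -y(t)$) as follows (Python with NumPy assumed; `ar_regression` is a hypothetical helper). Python index 0 holds $y(1)$, and the `zero_init` flag switches between starting the sum at $t = n+1$ and zero-filling $y(0), \dots, y(1-n)$.

```python
import numpy as np

def ar_regression(y, n, zero_init):
    """Rows phi^T(t) = [-y(t-1), ..., -y(t-n)] for the AR model y(t) + a_1 y(t-1) + ... = e(t).

    zero_init=False: choice 1, use only t = n+1, ..., N (no unknown initial values needed).
    zero_init=True:  choice 2, use t = 1, ..., N and replace y(0), ..., y(1-n) by zeros.
    Returns (Phi, Y) with Y approximately equal to Phi @ [a_1, ..., a_n].
    """
    y = np.asarray(y, dtype=float)
    N = len(y)
    y_pad = np.concatenate([np.zeros(n), y])     # n zeros stand in for the unknown past
    Phi = -np.column_stack([y_pad[n - k:n - k + N] for k in range(1, n + 1)])
    if zero_init:
        return Phi, y
    return Phi[n:], y[n:]

Phi_zero, Y_zero = ar_regression([1.0, 2.0, 3.0, 4.0], 2, zero_init=True)
Phi_skip, Y_skip = ar_regression([1.0, 2.0, 3.0, 4.0], 2, zero_init=False)
R_zero = Phi_zero.T @ Phi_zero / len(Y_zero)     # R(N) with zero-filled initial values
print(Phi_zero)
print(Phi_skip)
```

Choice 1 uses fewer samples but no invented data; choice 2 keeps all $N$ terms at the price of a small bias from the artificial zeros, which matters mostly for short records.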
Numerical Solution by Iterative Search Method.
Numerical Solution by Iterative Search Method. Numerical minimization. In general, the function $V_N(\theta, Z^N)$ cannot be minimized, nor can the equation $f_N(\theta, Z^N) = 0$ be solved, by analytical methods. Methods for numerical minimization of a function $V(\theta)$ update the minimizing point iteratively by $\hat\theta^{(i+1)} = \hat\theta^{(i)} + \alpha f^{(i)}$, where $f^{(i)}$ is a search direction based on information about $V(\theta)$ and $\alpha$ is a positive constant. Depending on the information used to determine $f^{(i)}$, there are three groups: 1- Methods using function values only. 2- Methods using values of the function as well as of its gradient. 3- Methods using values of the function, its gradient and its Hessian.
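Numerical Solution by Iterative Search Method. Depending on the information used to determine $f^{(i)}$, there are three groups: • Methods using values of the function, its gradient and its Hessian: Newton algorithms, $\hat\theta^{(i+1)} = \hat\theta^{(i)} - \left[V''(\hat\theta^{(i)})\right]^{-1}V'(\hat\theta^{(i)})$. • Methods using values of the function $V$ as well as of its gradient: an estimate of the Hessian is formed and used in place of $V''$ in the Newton step; these are quasi-Newton algorithms. • Methods using function values only: an estimate of the gradient is formed (for example, by finite differences) and then a quasi-Newton algorithm is applied.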
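Numerical Solution by Iterative Search Method. In general, consider the function $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\ell(\varepsilon(t,\theta), \theta)$. The gradient is $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\left[\psi(t,\theta)\,\ell_\varepsilon'(\varepsilon(t,\theta),\theta) - \ell_\theta'(\varepsilon(t,\theta),\theta)\right]$. Here $\psi(t,\theta)$ is the gradient of the prediction with respect to $\theta$: $\psi(t,\theta) = \frac{d}{d\theta}\hat{y}(t\mid\theta)$.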
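Numerical Solution by Iterative Search Method. Some explicit search schemes. Consider the special case $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta)$. The gradient is $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\varepsilon(t,\theta)$. A general family of search routines is given by $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat\theta_N^{(i)}, Z^N)$, where $R_N^{(i)}$ is a $d \times d$ matrix that modifies the search direction and $\mu_N^{(i)} > 0$ is the step size, chosen so that the criterion decreases.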
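Numerical Solution by Iterative Search Method. Some explicit search schemes. For the quadratic criterion, let $R_N^{(i)} = I$; then we have $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}V_N'(\hat\theta_N^{(i)}, Z^N)$. This is the gradient or steepest-descent method. This method is fairly inefficient close to the minimum, since there the gradient tends to zero and convergence is typically slow (linear), especially when the Hessian is ill-conditioned.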
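A minimal sketch of the steepest-descent step with a fixed $\mu$ (Python with NumPy assumed; the two-parameter LS problem is hypothetical and deliberately badly scaled) shows the inefficiency: the well-scaled direction converges in one step, while the error in the poorly scaled direction shrinks only by a factor $0.99$ per iteration.

```python
import numpy as np

def steepest_descent(grad, theta0, mu, iters):
    """theta^(i+1) = theta^(i) - mu * V'(theta^(i)), i.e. R_N^(i) = I with a fixed step mu."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        theta = theta - mu * grad(theta)
    return theta

# Hypothetical LS problem whose Hessian is diag(0.5, 0.005)
Phi = np.array([[1.0, 0.0], [0.0, 0.1]])
Y = Phi @ np.array([1.0, 1.0])                       # minimizer theta = [1, 1]
N = len(Y)
grad = lambda th: -Phi.T @ (Y - Phi @ th) / N        # V'(theta) = -(1/N) sum phi eps

theta = steepest_descent(grad, [0.0, 0.0], mu=2.0, iters=100)
print(theta)   # first entry is 1.0; second is 1 - 0.99**100, still far from 1
```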
Numerical Solution by Iterative Search Method. Newton-Raphson (tangent-line) method for solving $f(x) = 0$. • Make an initial guess $x_0$. • Draw the tangent line at $x_0$; its equation is $y = f(x_0) + f'(x_0)(x - x_0)$. • Let $x_1$ be the x-intercept of the tangent line; this intercept is given by the formula $x_1 = x_0 - f(x_0)/f'(x_0)$. • Now repeat with $x_1$ as the initial guess, giving $x_2, x_3, \dots$
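Numerical Solution by Iterative Search Method. Newton-Raphson (tangent-line) method for solving $f(x) = 0$. Some difficulties of this method: • Zero derivatives: if $f'(x_k) = 0$ the tangent line is horizontal and has no x-intercept. • Divergence: for a poor initial guess the iterates $x_1, x_2, \dots$ may move away from the root.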
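A small pure-Python sketch of the tangent-line iteration (the function name and tolerances are hypothetical) that guards against both difficulties listed above: a horizontal tangent raises an error, and an iteration cap catches runs that do not settle.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Tangent-line iteration x_{k+1} = x_k - f(x_k) / f'(x_k) for solving f(x) = 0."""
    x = x0
    for _ in range(max_iter):
        slope = df(x)
        if slope == 0.0:
            # Difficulty 1: horizontal tangent line, no x-intercept
            raise ZeroDivisionError(f"zero derivative at x = {x}")
        x_new = x - f(x) / slope          # x-intercept of the tangent line at x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    # Difficulty 2: no convergence within max_iter (for example, diverging iterates)
    raise RuntimeError("Newton-Raphson did not converge")

root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)   # approximately 1.4142135623730951
```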
Numerical Solution by Iterative Search Method. Gradient or steepest-descent method for finding the minimum of f(x).
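Numerical Solution by Iterative Search Method. Some explicit search schemes. Consider the special case $V_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\theta)$. The gradient or steepest-descent method is fairly inefficient close to the minimum. The gradient and the Hessian of $V_N$ are $V_N'(\theta, Z^N) = -\frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\varepsilon(t,\theta)$ and $V_N''(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\psi^T(t,\theta) - \frac{1}{N}\sum_{t=1}^{N}\psi'(t,\theta)\varepsilon(t,\theta)$, where $\psi'(t,\theta)$ is the $d \times d$ matrix of second derivatives of $\hat{y}(t\mid\theta)$. Let $R_N^{(i)} = V_N''(\hat\theta_N^{(i)}, Z^N)$; then we have $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}\left[V_N''(\hat\theta_N^{(i)}, Z^N)\right]^{-1}V_N'(\hat\theta_N^{(i)}, Z^N)$. This is the Newton method. But it is not an easy task to compute the Hessian, because of the second-derivative term $\psi'(t,\theta)$.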
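Numerical Solution by Iterative Search Method. Newton method. Suppose that there is a value $\theta_0$ such that the prediction errors $\varepsilon(t,\theta_0) = e_0(t)$ are independent zero-mean random variables. Since $\psi'(t,\theta_0)$ is formed from past data only, it is independent of $e_0(t)$, so $\frac{1}{N}\sum_{t=1}^{N}\psi'(t,\theta_0)\varepsilon(t,\theta_0) \approx E\,\psi'(t,\theta_0)e_0(t) = 0$, and hence $V_N''(\theta, Z^N) \approx \frac{1}{N}\sum_{t=1}^{N}\psi(t,\theta)\psi^T(t,\theta)$ for $\theta$ close to $\theta_0$.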
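Numerical Solution by Iterative Search Method. Newton method. So the choice $R_N^{(i)} = H_N(\hat\theta_N^{(i)}) = \frac{1}{N}\sum_{t=1}^{N}\psi(t,\hat\theta_N^{(i)})\psi^T(t,\hat\theta_N^{(i)})$ is, in the vicinity of the minimum, a good estimate of the Hessian. This gives $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}\left[H_N(\hat\theta_N^{(i)})\right]^{-1}V_N'(\hat\theta_N^{(i)}, Z^N)$, which is known as the Gauss-Newton method. In the statistical literature it is called the “method of scoring”. In the control literature the terms “modified Newton-Raphson” and “quasilinearization” have also been used.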
Numerical Solution by Iterative Search Method. Newton method. Dennis and Schnabel reserve the term “Gauss-Newton” for $\mu_N^{(i)} = 1$, and for $\mu_N^{(i)} < 1$ (chosen so that the criterion decreases) the term “damped Gauss-Newton” has been used.
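A minimal sketch of the damped Gauss-Newton iteration (Python with NumPy assumed; the function names and the exponential test model $\hat{y}(t\mid\theta) = \theta_1 e^{-\theta_2 t}$ are hypothetical). The step starts at $\mu = 1$, the plain Gauss-Newton step, and is halved until $V_N$ decreases; the simple halving rule is one possible damping strategy, not necessarily the one used in any particular toolbox.

```python
import numpy as np

def damped_gauss_newton(residual, psi_rows, theta0, iters=20):
    """Damped Gauss-Newton for V(theta) = (1/N) sum 0.5 eps(t, theta)^2.

    residual(theta) -> eps, shape (N,); psi_rows(theta) -> rows psi^T(t, theta), shape (N, d).
    """
    theta = np.asarray(theta0, dtype=float)
    V = lambda th: 0.5 * np.mean(residual(th) ** 2)
    for _ in range(iters):
        eps, Psi = residual(theta), psi_rows(theta)
        N = len(eps)
        grad = -Psi.T @ eps / N              # V'(theta)
        H = Psi.T @ Psi / N                  # Gauss-Newton approximation of the Hessian
        step = np.linalg.solve(H, grad)
        mu, V_old = 1.0, V(theta)
        while V(theta - mu * step) > V_old and mu > 1e-10:
            mu /= 2.0                        # damping: shorten the step until V decreases
        if V(theta - mu * step) > V_old:
            break                            # no decrease found: stop
        theta = theta - mu * step
    return theta

# Hypothetical noise-free data from yhat(t | theta) = theta_1 * exp(-theta_2 * t)
t = np.linspace(0.0, 4.0, 40)
y = 2.0 * np.exp(-0.8 * t)
residual = lambda th: y - th[0] * np.exp(-th[1] * t)
psi_rows = lambda th: np.column_stack([np.exp(-th[1] * t), -th[0] * t * np.exp(-th[1] * t)])
theta_hat = damped_gauss_newton(residual, psi_rows, [1.5, 0.5])
print(theta_hat)   # should approach [2.0, 0.8]
```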
Numerical Solution by Iterative Search Method. Newton method. Even though $R_N^{(i)} = H_N(\hat\theta_N^{(i)})$ is assured to be positive semidefinite, it may be singular or close to singular (for example, if the model is over-parameterized or the data are not informative enough). Various ways to overcome this problem exist and are known as “regularization techniques”. Goldfeld, Quandt and Trotter suggest $R_N^{(i)} = H_N(\hat\theta_N^{(i)}) + \delta I$ with a small $\delta > 0$. Levenberg and Marquardt suggest $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \left[H_N(\hat\theta_N^{(i)}) + \lambda I\right]^{-1}V_N'(\hat\theta_N^{(i)}, Z^N)$. With $\lambda = 0$ we have the Gauss-Newton case; increasing $\lambda$ means that the step size is decreased and the search direction is turned towards the gradient.
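As a hedged sketch of the Levenberg-Marquardt idea (Python with NumPy assumed; names, the divide/multiply-by-10 rule for $\lambda$ and the test model are hypothetical choices), the example uses a deliberately over-parameterized model $\hat{y}(t) = (\theta_1 + \theta_2)u(t)$, for which $H_N$ is singular and a plain Gauss-Newton solve would fail, while $H_N + \lambda I$ stays invertible.

```python
import numpy as np

def levenberg_marquardt(residual, psi_rows, theta0, lam=1e-2, iters=100):
    """theta^(i+1) = theta^(i) - [H + lam I]^{-1} V'(theta^(i)), with lam adapted each iteration."""
    theta = np.asarray(theta0, dtype=float)
    V = lambda th: 0.5 * np.mean(residual(th) ** 2)
    for _ in range(iters):
        eps, Psi = residual(theta), psi_rows(theta)
        N = len(eps)
        grad = -Psi.T @ eps / N
        if np.linalg.norm(grad) < 1e-12:
            break                                   # stationary point reached
        H = Psi.T @ Psi / N                         # may be singular
        step = np.linalg.solve(H + lam * np.eye(theta.size), grad)
        if V(theta - step) < V(theta):
            theta = theta - step
            lam = max(lam / 10.0, 1e-9)             # success: move towards Gauss-Newton
        else:
            lam *= 10.0                             # failure: shorter step, closer to the gradient
    return theta

# Hypothetical over-parameterized model yhat(t) = (theta_1 + theta_2) u(t)
u = np.linspace(-1.0, 1.0, 21)
y = 3.0 * u
residual = lambda th: y - (th[0] + th[1]) * u
psi_rows = lambda th: np.column_stack([u, u])
theta_hat = levenberg_marquardt(residual, psi_rows, [0.0, 0.0])
print(theta_hat)   # the sum theta_1 + theta_2 approaches 3
```

Only the sum $\theta_1 + \theta_2$ is determined by the data; starting from zero, the regularized steps keep the two parameters equal.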
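Numerical Solution by Iterative Search Method. Remember that we want to minimize $V_N(\theta, Z^N)$ (I) or to solve $f_N(\theta, Z^N) = 0$ (II), for example with $f_N(\theta, Z^N) = \frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\varepsilon(t,\theta)$. The Newton method for (I) amounts to solving $V_N'(\theta, Z^N) = 0$ iteratively. The correlation equation (II) has the same form, so solving (II) is quite analogous to the minimization of (I): • Newton-Raphson method to solve (II): $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}\left[f_N'(\hat\theta_N^{(i)}, Z^N)\right]^{-1}f_N(\hat\theta_N^{(i)}, Z^N)$. • Substitution method to solve (II): $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} + \mu_N^{(i)}f_N(\hat\theta_N^{(i)}, Z^N)$, which for $\zeta(t,\theta) = \psi(t,\theta)$ coincides with the gradient method, since then $f_N = -V_N'$.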
Computing Gradients.
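Computing Gradients. The amount of work required to compute $\psi(t,\theta)$ is highly dependent on the model structure, and sometimes one may have to resort to numerical differentiation. Example 10.1: Consider the ARMAX model $A(q)y(t) = B(q)u(t) + C(q)e(t)$. The predictor is given by $C(q)\hat{y}(t\mid\theta) = B(q)u(t) + \left[C(q) - A(q)\right]y(t)$. Differentiation with respect to $a_k$ gives $C(q)\frac{\partial}{\partial a_k}\hat{y}(t\mid\theta) = -y(t-k)$; similarly $C(q)\frac{\partial}{\partial b_k}\hat{y}(t\mid\theta) = u(t-k)$ and $C(q)\frac{\partial}{\partial c_k}\hat{y}(t\mid\theta) = y(t-k) - \hat{y}(t-k\mid\theta) = \varepsilon(t-k,\theta)$. Now, with $\varphi(t,\theta) = \left[-y(t-1) \; \cdots \; -y(t-n_a) \;\; u(t-1) \; \cdots \; u(t-n_b) \;\; \varepsilon(t-1,\theta) \; \cdots \; \varepsilon(t-n_c,\theta)\right]^T$, we obtain $\psi(t,\theta) = \frac{1}{C(q)}\varphi(t,\theta)$.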
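The result $\psi(t,\theta) = \varphi(t,\theta)/C(q)$ can be implemented by filtering the regressors, as in the sketch below (Python with NumPy assumed; `armax_eps_psi` is hypothetical, and all signals before the first sample are taken as zero). The usage compares it with central finite differences of $\hat{y} = y - \varepsilon$, the numerical-differentiation fallback mentioned above.

```python
import numpy as np

def armax_eps_psi(theta, y, u, na, nb, nc):
    """Prediction errors eps(t, theta) and gradients psi(t, theta) for an ARMAX model.

    A(q) = 1 + a_1 q^-1 + ..., B(q) = b_1 q^-1 + ..., C(q) = 1 + c_1 q^-1 + ...;
    theta = [a_1..a_na, b_1..b_nb, c_1..c_nc]; values before t = 0 are assumed zero.
    """
    c = theta[na + nb:]
    N, d = len(y), na + nb + nc
    eps, phi, psi = np.zeros(N), np.zeros((N, d)), np.zeros((N, d))
    for t in range(N):
        # Regressor phi(t, theta) = [-y(t-k), u(t-k), eps(t-k, theta)]
        for k in range(1, na + 1):
            if t - k >= 0:
                phi[t, k - 1] = -y[t - k]
        for k in range(1, nb + 1):
            if t - k >= 0:
                phi[t, na + k - 1] = u[t - k]
        for k in range(1, nc + 1):
            if t - k >= 0:
                phi[t, na + nb + k - 1] = eps[t - k]
        # C(q) eps = A(q) y - B(q) u is equivalent to eps(t) = y(t) - phi^T(t, theta) theta
        eps[t] = y[t] - phi[t] @ theta
        # psi(t) = phi(t) / C(q), i.e. psi(t) = phi(t) - sum_k c_k psi(t-k)
        psi[t] = phi[t]
        for k in range(1, nc + 1):
            if t - k >= 0:
                psi[t] -= c[k - 1] * psi[t - k]
    return eps, psi

# Finite-difference check on arbitrary data (na = 1, nb = 2, nc = 1)
rng = np.random.default_rng(1)
u, y = rng.standard_normal(60), rng.standard_normal(60)
theta = np.array([-0.5, 1.0, 0.3, 0.4])
eps, psi = armax_eps_psi(theta, y, u, 1, 2, 1)
h = 1e-6
fd = np.column_stack([
    -(armax_eps_psi(theta + h * e, y, u, 1, 2, 1)[0]
      - armax_eps_psi(theta - h * e, y, u, 1, 2, 1)[0]) / (2 * h)   # psi = -d eps / d theta
    for e in np.eye(4)])
print(np.max(np.abs(psi - fd)))   # small: filtered regressors match numerical derivatives
```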
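Computing Gradients. SISO black-box model. The general model structure is $A(q)y(t) = \frac{B(q)}{F(q)}u(t) + \frac{C(q)}{D(q)}e(t)$ and its predictor is $\hat{y}(t\mid\theta) = \frac{D(q)B(q)}{C(q)F(q)}u(t) + \left[1 - \frac{D(q)A(q)}{C(q)}\right]y(t)$. With $w(t,\theta) = \frac{B(q)}{F(q)}u(t)$ and $v(t,\theta) = A(q)y(t) - w(t,\theta)$, so that $\varepsilon(t,\theta) = \frac{D(q)}{C(q)}v(t,\theta)$, we have $\frac{\partial\hat{y}(t\mid\theta)}{\partial a_k} = -\frac{D(q)}{C(q)}y(t-k)$, $\frac{\partial\hat{y}(t\mid\theta)}{\partial b_k} = \frac{D(q)}{C(q)F(q)}u(t-k)$, $\frac{\partial\hat{y}(t\mid\theta)}{\partial f_k} = -\frac{D(q)}{C(q)F(q)}w(t-k,\theta)$, $\frac{\partial\hat{y}(t\mid\theta)}{\partial c_k} = \frac{1}{C(q)}\varepsilon(t-k,\theta)$ and $\frac{\partial\hat{y}(t\mid\theta)}{\partial d_k} = -\frac{1}{C(q)}v(t-k,\theta)$.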
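Computing Gradients. SISO black-box model. As a special case, consider the OE model $y(t) = \frac{B(q)}{F(q)}u(t) + e(t)$, whose predictor is $\hat{y}(t\mid\theta) = \frac{B(q)}{F(q)}u(t)$. Now $\frac{\partial}{\partial b_k}\hat{y}(t\mid\theta) = \frac{1}{F(q)}u(t-k)$ and $\frac{\partial}{\partial f_k}\hat{y}(t\mid\theta) = -\frac{1}{F(q)}\hat{y}(t-k\mid\theta)$.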
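For the OE case, each gradient component is one more pass through the filter $1/F(q)$. A sketch under the assumptions of zero initial conditions and $B(q) = b_1 q^{-1} + \dots$ (Python with NumPy assumed; `delay`, `inv_F` and `oe_predictor_and_psi` are hypothetical helpers), checked against central finite differences:

```python
import numpy as np

def delay(x, k):
    """x(t - k), with zeros before the start of the record."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def inv_F(s, f):
    """Filter s through 1/F(q), F(q) = 1 + f_1 q^-1 + ... (zero initial conditions)."""
    x = np.zeros(len(s))
    for t in range(len(s)):
        x[t] = s[t] - sum(f[k - 1] * x[t - k] for k in range(1, len(f) + 1) if t - k >= 0)
    return x

def oe_predictor_and_psi(b, f, u):
    """yhat = B(q)/F(q) u and psi with columns ordered [b_1.., f_1..]."""
    yhat = inv_F(sum(bk * delay(u, k) for k, bk in enumerate(b, start=1)), f)
    psi_b = [inv_F(delay(u, k), f) for k in range(1, len(b) + 1)]       # (1/F) u(t-k)
    psi_f = [-inv_F(delay(yhat, k), f) for k in range(1, len(f) + 1)]   # -(1/F) yhat(t-k)
    return yhat, np.column_stack(psi_b + psi_f)

rng = np.random.default_rng(2)
u = rng.standard_normal(40)
b, f = np.array([1.0, 0.5]), np.array([-0.6])
yhat, psi = oe_predictor_and_psi(b, f, u)

theta, h = np.concatenate([b, f]), 1e-6
fd = np.column_stack([
    (oe_predictor_and_psi((theta + h * e)[:2], (theta + h * e)[2:], u)[0]
     - oe_predictor_and_psi((theta - h * e)[:2], (theta - h * e)[2:], u)[0]) / (2 * h)
    for e in np.eye(3)])
print(np.max(np.abs(psi - fd)))
```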
Two-Stage and Multistage Method.
Two-Stage and Multistage Method. Two-stage and multistage methods try to combine the advantages of: • Numerical solution by iterative search: guaranteed convergence to a local minimum, efficiency, and applicability to general model structures. • Linear regression and least squares: efficient methods with an analytic solution. The idea is that two or several LS (IV) stages are applied to different substructures of the model.
Two-Stage and Multistage Method. Why are we interested in this topic? • It helps in understanding the identification literature. • It is useful for providing initial estimates for iterative methods. Some important two-stage or multistage methods: 1- Bootstrap methods. 2- Bilinear parameterization. 3- Separate least squares. 4- High-order AR(X) models. 5- Separating dynamics and noise models. 6- Determining ARMA models. 7- Subspace methods for estimating state space models.
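Two-Stage and Multistage Method. Bootstrap methods. Consider the correlation formulation $\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\theta)\left[y(t) - \varphi^T(t,\theta)\theta\right] = 0$, where the predictor is written in the pseudo-linear form $\hat{y}(t\mid\theta) = \varphi^T(t,\theta)\theta$. This formulation contains a number of common situations: • IV (instrumental variable) methods, with $\zeta(t,\theta)$ chosen as instruments. • PLR (pseudo-linear regression) methods, with $\zeta(t,\theta) = \varphi(t,\theta)$. • Minimizing the quadratic criterion, with $\zeta(t,\theta) = \psi(t,\theta)$.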
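Two-Stage and Multistage Method. Bootstrap methods. Consider again the correlation formulation. With an estimate $\hat\theta_N^{(i)}$ at hand, it is natural to determine the next one by solving $\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\hat\theta_N^{(i)})\left[y(t) - \varphi^T(t,\hat\theta_N^{(i)})\theta\right] = 0$ for $\theta$. This equation is linear in $\theta$, so $\hat\theta_N^{(i+1)} = \left[\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\hat\theta_N^{(i)})\varphi^T(t,\hat\theta_N^{(i)})\right]^{-1}\frac{1}{N}\sum_{t=1}^{N}\zeta(t,\hat\theta_N^{(i)})y(t)$. It is called a bootstrap method since it alternates between computing $\zeta(t,\hat\theta_N^{(i)})$ and $\varphi(t,\hat\theta_N^{(i)})$ from the current estimate and solving the linear equation for the next estimate. It does not necessarily converge to a solution; a convergence analysis is given by Stoica and Söderström (1981b) and Stoica et al. (1985).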
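A hedged sketch of the bootstrap idea for the PLR choice $\zeta = \varphi$ on a hypothetical first-order ARMAX system $y(t) + a y(t-1) = b u(t-1) + e(t) + c e(t-1)$ (Python with NumPy assumed; the function name, the ARX initial estimate and the simulated system are illustrative assumptions). Each pass rebuilds $\varphi(t,\hat\theta^{(i)})$ from the current prediction errors and then solves the linear equation.

```python
import numpy as np

def plr_bootstrap(y, u, iters=30):
    """Bootstrap PLR for y(t) + a y(t-1) = b u(t-1) + e(t) + c e(t-1); returns [a, b, c].

    phi(t, theta) = [-y(t-1), u(t-1), eps(t-1, theta)]^T and zeta = phi (PLR choice).
    """
    N = len(y)
    # Initial estimate: ARX least squares (c = 0); its residuals play the role of eps
    Phi0 = np.column_stack([-y[:-1], u[:-1]])
    ab = np.linalg.lstsq(Phi0, y[1:], rcond=None)[0]
    eps = np.concatenate([[0.0], y[1:] - Phi0 @ ab])
    theta = np.array([ab[0], ab[1], 0.0])
    for _ in range(iters):
        # Step 1: phi(t, theta^(i)) from the current prediction errors
        Phi = np.column_stack([-y[:-1], u[:-1], eps[:-1]])
        # Step 2: solve the linear equation sum zeta [y - phi^T theta] = 0 for theta^(i+1)
        theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y[1:])
        a, b, c = theta
        # Recompute eps(t, theta^(i+1)) = y(t) + a y(t-1) - b u(t-1) - c eps(t-1)
        eps = np.zeros(N)
        for t in range(1, N):
            eps[t] = y[t] + a * y[t - 1] - b * u[t - 1] - c * eps[t - 1]
    return theta

# Hypothetical data: a = -0.7, b = 1.0, c = 0.3
rng = np.random.default_rng(3)
N = 5000
u, e = rng.standard_normal(N), 0.5 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.7 * y[t - 1] + 1.0 * u[t - 1] + e[t] + 0.3 * e[t - 1]
theta_hat = plr_bootstrap(y, u)
print(theta_hat)
```

For this mildly colored noise the iteration is expected to settle near the true parameters, but, as stated above, convergence is not guaranteed in general.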
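Two-Stage and Multistage Method. Bilinear parameterization. For some models the predictor is bilinear in the parameters. For example, consider the ARARX model $A(q)y(t) = B(q)u(t) + \frac{1}{D(q)}e(t)$. Its predictor is $\hat{y}(t\mid\theta) = D(q)B(q)u(t) + \left[1 - D(q)A(q)\right]y(t)$, so the prediction error is $\varepsilon(t,\theta) = D(q)\left[A(q)y(t) - B(q)u(t)\right]$. Let $\rho = [a_1 \; \cdots \; a_{n_a} \;\; b_1 \; \cdots \; b_{n_b}]^T$ and $\eta = [d_1 \; \cdots \; d_{n_d}]^T$. Bilinear means that $\hat{y}(t\mid\theta)$ is linear in $\rho$ for fixed $\eta$ and linear in $\eta$ for fixed $\rho$.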
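Two-Stage and Multistage Method. Bilinear parameterization. In the ARARX model, a natural way of minimizing $V_N(\rho,\eta) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\varepsilon^2(t,\rho,\eta)$ is to treat it as a sequence of LS problems: let $\hat\rho^{(i+1)} = \arg\min_{\rho} V_N(\rho, \hat\eta^{(i)})$ and $\hat\eta^{(i+1)} = \arg\min_{\eta} V_N(\hat\rho^{(i+1)}, \eta)$. Exercise 2 (Exercise 10T.3): Show that this minimization procedure is a special case of the general family of search routines $\hat\theta_N^{(i+1)} = \hat\theta_N^{(i)} - \mu_N^{(i)}\left[R_N^{(i)}\right]^{-1}V_N'(\hat\theta_N^{(i)}, Z^N)$. According to Exercise 10T.3, the bilinear parameterization approach is thus indeed a descent method, and it converges to a local minimum.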
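The alternating LS scheme can be sketched as follows (Python with NumPy assumed; helper names and the simulated ARARX system are hypothetical, and all signals before the first sample are taken as zero). Step 1 regresses the $D$-filtered data for $\rho$; step 2 regresses $w = A(q)y - B(q)u$ for $\eta$. Because each step exactly minimizes the same criterion over one block of parameters, the recorded values of $V_N$ cannot increase, which is the descent property stated above.

```python
import numpy as np

def delay(x, k):
    """x(t - k) with zeros before the start of the record."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def fir(x, coeffs):
    """(1 + c_1 q^-1 + ... + c_n q^-n) x with zero initial conditions."""
    return x + sum(ck * delay(x, k) for k, ck in enumerate(coeffs, start=1))

def ararx_alternating_ls(y, u, na, nb, nd, iters=20):
    a, b, d = np.zeros(na), np.zeros(nb), np.zeros(nd)
    history = []
    for _ in range(iters):
        # Step 1: eta (D) fixed, so eps = A (D y) - B (D u) is linear in rho = [a, b]
        yf, uf = fir(y, d), fir(u, d)
        Phi = np.column_stack([-delay(yf, k) for k in range(1, na + 1)] +
                              [delay(uf, k) for k in range(1, nb + 1)])
        rho = np.linalg.lstsq(Phi, yf, rcond=None)[0]
        a, b = rho[:na], rho[na:]
        # Step 2: rho fixed, so eps = D w with w = A y - B u is linear in eta = d
        w = fir(y, a) - sum(bk * delay(u, k) for k, bk in enumerate(b, start=1))
        Psi = np.column_stack([-delay(w, k) for k in range(1, nd + 1)])
        d = np.linalg.lstsq(Psi, w, rcond=None)[0]
        history.append(0.5 * np.mean(fir(w, d) ** 2))   # V_N after the full iteration
    return a, b, d, history

# Hypothetical ARARX data: A = 1 - 0.7 q^-1, B = q^-1, D = 1 - 0.5 q^-1
rng = np.random.default_rng(4)
N = 1000
u, e = rng.standard_normal(N), 0.3 * rng.standard_normal(N)
v, y = np.zeros(N), np.zeros(N)
for t in range(1, N):
    v[t] = 0.5 * v[t - 1] + e[t]                  # v = e / D(q)
    y[t] = 0.7 * y[t - 1] + 1.0 * u[t - 1] + v[t]
a_hat, b_hat, d_hat, history = ararx_alternating_ls(y, u, na=1, nb=1, nd=1)
print(a_hat, b_hat, d_hat)
```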
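Two-Stage and Multistage Method. Separate least squares. A more general situation than the bilinear case is when one set of parameters, $\theta$, enters linearly and another set, $\eta$, nonlinearly in the predictor: $\hat{y}(t\mid\theta,\eta) = \theta^T\varphi(t,\eta)$. The identification criterion then becomes $V_N(\theta,\eta) = \frac{1}{N}\sum_{t=1}^{N}\frac{1}{2}\left[y(t) - \theta^T\varphi(t,\eta)\right]^2$. For given $\eta$ this criterion is an LS criterion and is minimized with respect to $\theta$ by $\hat\theta(\eta) = \left[\sum_{t=1}^{N}\varphi(t,\eta)\varphi^T(t,\eta)\right]^{-1}\sum_{t=1}^{N}\varphi(t,\eta)y(t)$. We can thus insert it into $V_N$ and define the problem as $\hat\eta_N = \arg\min_{\eta} V_N(\hat\theta(\eta), \eta)$.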
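Two-Stage and Multistage Method. Separate least squares. In matrix form, with $Y = [y(1) \; \cdots \; y(N)]^T$ and $\Phi(\eta) = [\varphi(1,\eta) \; \cdots \; \varphi(N,\eta)]^T$, the identification criterion then becomes $\bar{V}_N(\eta) = \frac{1}{2N}\left|\left(I - P(\eta)\right)Y\right|^2$, where $P(\eta) = \Phi(\eta)\left[\Phi^T(\eta)\Phi(\eta)\right]^{-1}\Phi^T(\eta)$ is the projection onto the columns of $\Phi(\eta)$. The estimate is computed in two steps: 1- minimize $\bar{V}_N(\eta)$ to obtain $\hat\eta_N$; 2- compute $\hat\theta_N = \hat\theta(\hat\eta_N)$. The method is called separate least squares since the LS part has been separated out, and the problem is reduced to a minimization problem of lower dimension. Separate least squares is known to give numerically well-conditioned calculations, but it does not necessarily give faster convergence than applying a damped Gauss-Newton method to $V_N(\theta,\eta)$ without utilizing the particular structure.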
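A minimal sketch of the two steps (Python with NumPy assumed; the model $\hat{y}(t) = \theta_1 e^{-\eta t} + \theta_2$, the helper names and the data are hypothetical). A plain grid search over the single nonlinear parameter $\eta$ stands in for a proper one-dimensional minimizer; the linear parameters are never searched over, only recomputed by LS.

```python
import numpy as np

def theta_given_eta(eta, t, Y):
    """LS solution theta_hat(eta) for phi(t, eta) = [exp(-eta t), 1]."""
    Phi = np.column_stack([np.exp(-eta * t), np.ones_like(t)])
    theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return theta, Phi

def reduced_criterion(eta, t, Y):
    """V_bar(eta) = (1/2N) |(I - P(eta)) Y|^2."""
    theta, Phi = theta_given_eta(eta, t, Y)
    r = Y - Phi @ theta              # equals (I - P(eta)) Y
    return 0.5 * np.mean(r ** 2)

# Hypothetical noise-free data with eta = 0.5 and theta = [2, 1]
t = np.linspace(0.0, 5.0, 60)
Y = 2.0 * np.exp(-0.5 * t) + 1.0

grid = np.linspace(0.1, 2.0, 191)                     # step 1: search over eta only
V_bar = [reduced_criterion(eta, t, Y) for eta in grid]
eta_hat = grid[int(np.argmin(V_bar))]
theta_hat, _ = theta_given_eta(eta_hat, t, Y)         # step 2: theta_hat(eta_hat)
print(eta_hat, theta_hat)
```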
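Two-Stage and Multistage Method. High-order AR(X) models. Suppose the true system is $y(t) = G_0(q)u(t) + H_0(q)e_0(t)$, and an ARX structure of order $M$, $A_M(q)y(t) = B_M(q)u(t) + e(t)$, is used. Hannan and Kavalieris, and Ljung and Wahlberg, show that as $N \to \infty$ and $M \to \infty$ (with $M$ growing suitably slowly relative to $N$), $\frac{\hat{B}_M(q)}{\hat{A}_M(q)} \to G_0(q)$ and $\frac{1}{\hat{A}_M(q)} \to H_0(q)$. So a high-order ARX model is capable of approximating any linear system arbitrarily well.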
Two-Stage and Multistage Method. High-order AR(X) models. It is of course desirable to reduce such a high-order model to more tractable, lower-order versions.
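As a hedged illustration of the approximation property (Python with NumPy assumed; the OE test system, $N = 5000$ and $M = 20$ are hypothetical choices), a high-order ARX model is fitted by ordinary LS and the first impulse-response coefficients of $\hat{B}_M/\hat{A}_M$ are compared with those of $G_0(q) = 0.5q^{-1}/(1 - 0.8q^{-1})$, even though the true system is not of ARX form.

```python
import numpy as np

def delay(x, k):
    """x(t - k) with zeros before the start of the record."""
    return np.concatenate([np.zeros(k), x[:len(x) - k]])

def fit_arx(y, u, M):
    """LS fit of A_M(q) y = B_M(q) u + e with na = nb = M; returns (a, b)."""
    Phi = np.column_stack([-delay(y, k) for k in range(1, M + 1)] +
                          [delay(u, k) for k in range(1, M + 1)])
    theta = np.linalg.lstsq(Phi, y, rcond=None)[0]
    return theta[:M], theta[M:]

def impulse_response(a, b, n):
    """First n impulse-response coefficients g_1..g_n of B(q)/A(q)."""
    imp = np.zeros(n + 1)
    imp[0] = 1.0
    x = np.zeros(n + 1)
    for t in range(n + 1):
        s = sum(b[k - 1] * imp[t - k] for k in range(1, len(b) + 1) if t - k >= 0)
        x[t] = s - sum(a[k - 1] * x[t - k] for k in range(1, len(a) + 1) if t - k >= 0)
    return x[1:]

# Hypothetical output-error system: G0 = 0.5 q^-1 / (1 - 0.8 q^-1), H0 = 1
rng = np.random.default_rng(5)
N, M = 5000, 20
u = rng.standard_normal(N)
w = np.zeros(N)
for t in range(1, N):
    w[t] = 0.8 * w[t - 1] + 0.5 * u[t - 1]
y = w + 0.1 * rng.standard_normal(N)

a_hat, b_hat = fit_arx(y, u, M)
g_hat = impulse_response(a_hat, b_hat, 10)
g_true = 0.5 * 0.8 ** np.arange(10)
print(np.max(np.abs(g_hat - g_true)))
```

A model-reduction step would then replace the 40-parameter ARX model by a low-order model matching $\hat{B}_M/\hat{A}_M$ and $1/\hat{A}_M$ as closely as possible.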
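Two-Stage and Multistage Method. Separating dynamics and noise models. The general model structure is $y(t) = G(q,\theta)u(t) + H(q,\theta)e(t)$. Use an IV method to determine the dynamic part $G(q,\hat\theta_N)$ from $u$ to $y$. We can then determine the noise estimate $\hat{v}(t) = y(t) - G(q,\hat\theta_N)u(t)$. This noise estimate is a measured signal, so an ARMA model for it can be estimated as a separate step. How?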