Computacion Inteligente Least-Square Methods for System Identification
Contents • System Identification: an Introduction • Least-Squares Estimators • Statistical Properties of Least-Squares Estimators • Maximum Likelihood (ML) Estimator • Maximum Likelihood Estimator for the Linear Model • LSE for Nonlinear Models • Developing Dynamic Models from Data • Example: Tank Level Modeling
System Identification: Introduction • Goal • Determine a mathematical model for an unknown system (or target system) by observing its input-output data pairs
System Identification: Introduction • Purposes • To predict a system’s behavior, • As in time series prediction & weather forecasting • To explain the interactions & relationships between inputs & outputs of a system
System Identification: Introduction • Context example • To design a controller based on the model of a system, • such as aircraft or ship control • To simulate the system under control once the model is known
Why cover System Identification? • System Identification • It is a well-established and easy-to-use technique for modeling a real-life system. • It will be needed for the section on fuzzy-neural networks.
Spring Example • Experimental data • What will the length be when the force is 5.0 newtons?
Components of System Identification • There are two main steps involved • Structure identification • Parameter identification
Structure identification • Structure identification • Apply a priori knowledge about the target system to determine a class of models within which the search for the most suitable model is to be conducted • This class of models is denoted by a function y = f(u, θ), where: • y is the model output • u is the input vector • θ is the parameter vector
Structure identification • Structure identification • The form of f(u, θ) depends on • the problem at hand • the designer's experience • the laws of nature governing the target system
Parameter identification • Training data is used as input to both the system and the model. • The difference between the target system output, yi, and the mathematical model output, ŷi, is used to update the parameter vector θ.
Parameter identification • Parameter identification • The structure of the model is known; however, we need to apply optimization techniques • in order to determine the parameter vector θ such that the resulting model ŷ = f(u, θ̂) describes the system appropriately
System Identification Process • The data set composed of m desired input-output pairs (ui, yi) (i = 1, …, m) is called the training data • System identification needs to do both structure & parameter identification repeatedly until a satisfactory model is found
System Identification: Steps • Specify & parameterize a class of mathematical models representing the system to be identified • Perform parameter identification to choose the parameters that best fit the training data set • Run a validation test to see whether the model identified responds correctly to an unseen data set • Terminate the procedure once the results of the validation test are satisfactory; otherwise, select another class of models & repeat steps 2 to 4 (a minimal sketch of this loop follows below)
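The loop below is a minimal Python sketch of these four steps. The model classes with fit/predict methods, the tolerance, and the train/validation split are all hypothetical placeholders, not a real library API.

```python
import numpy as np

def identify(model_classes, u_train, y_train, u_val, y_val, tol=0.1):
    """Try candidate model classes in order; return the first that validates."""
    for model_class in model_classes:
        # Step 2: parameter identification on the training data
        theta = model_class.fit(u_train, y_train)
        # Step 3: validation test on unseen data
        y_hat = model_class.predict(u_val, theta)
        val_error = np.mean((y_val - y_hat) ** 2)
        # Step 4: terminate if satisfactory, otherwise try the next class
        if val_error < tol:
            return model_class, theta
    raise RuntimeError("no model class passed the validation test")
```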
System Identification Process Structure and parameter identification may need to be done repeatedly
Objective of Linear Least-Squares Fitting • Given a training data set {(ui, yi), i = 1, …, m} and a model of the general form ŷ = θ1 f1(u) + … + θn fn(u) • Find the parameters θ1, …, θn such that the model outputs ŷ(ui) match the observed outputs yi as closely as possible
The linear model • The linear model: y = θ1 f1(u) + θ2 f2(u) + … + θn fn(u) = θ^T f(u) where: • u = (u1, …, up)^T is the model input vector • f1, …, fn are known functions of u • θ1, …, θn are unknown parameters to be estimated
Least-Squares Estimators • The task of fitting data using a linear model is referred to as linear regression, where: • u = (u1, …, up)^T is the input vector • f1(u), …, fn(u) are the regressors • θ = (θ1, …, θn)^T is the parameter vector
Least-Squares Estimators • We collect a training data set {(ui, yi), i = 1, …, m} • The system's equations become: θ1 f1(ui) + θ2 f2(ui) + … + θn fn(ui) = yi, for i = 1, …, m • which is equivalent to: Aθ = y
Least-Squares Estimators • In Aθ = y: • A is the m×n matrix of regressor values, with Aij = fj(ui) • θ is the n×1 vector of unknown parameters • y is the m×1 vector of observed outputs • If A were square & nonsingular, the solution would simply be θ = A^-1 y
Least-Squares Estimators • We have • m outputs, and • n fitting parameters to find • Or • m equations, and • n unknown variables • Usually m is greater than n
Least-Squares Estimators • Since • the model is just an approximation of the target system & • the data observed might be corrupted, • an exact solution is not always possible! • To overcome this inherent conceptual problem, an error vector e is added to compensate: Aθ + e = y
Least-Squares Estimators • Our goal now consists of finding the estimate θ̂ that reduces the error between y and Aθ̂ • The problem: find the θ̂ that minimizes the sum of squared errors ‖y − Aθ‖²
Least-Squares Estimators • If e = y − Aθ, then the sum of squared errors is E(θ) = e^T e = (y − Aθ)^T (y − Aθ) • We need to compute the θ that minimizes E(θ)
Least-Squares Estimators • Theorem [least-squares estimator] The squared error E(θ) is minimized when θ̂ satisfies the normal equation A^T A θ̂ = A^T y • If A^T A is nonsingular, θ̂ is unique & is given by θ̂ = (A^T A)^-1 A^T y • θ̂ is called the least-squares estimator, LSE
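A small numerical sketch of the LSE, assuming made-up training data and the regressors f1(u) = 1, f2(u) = u. It solves the normal equation directly and also uses NumPy's lstsq, which is the numerically safer route in practice.

```python
import numpy as np

# Made-up training data (ui, yi), roughly following y = 2u
u = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.9, 4.1, 5.8, 8.2, 9.9])

# m x n regressor matrix A, with columns f1(u) = 1 and f2(u) = u
A = np.column_stack([np.ones_like(u), u])

# Normal equation: A^T A theta_hat = A^T y
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)

# Equivalent but numerically more stable least-squares solve
theta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(theta_hat, theta_lstsq)   # both ≈ [-0.05, 2.01]
```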
Spring Example • Structure Identification can be done using domain knowledge. • The change in length of a spring is proportional to the force applied. • Hooke’s law length = k0 + k1*force
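A short sketch of the spring fit. Since the slide's data table is not included here, the force/length measurements below are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical measurements: force in newtons, length in cm
force  = np.array([1.0, 2.0, 3.0, 4.0])
length = np.array([5.2, 6.1, 7.0, 7.9])

# Hooke's-law model: length = k0 + k1 * force
A = np.column_stack([np.ones_like(force), force])
(k0, k1), *_ = np.linalg.lstsq(A, length, rcond=None)

# Answer the slide's question: predicted length at 5.0 N (≈ 8.8 cm here)
print(k0 + k1 * 5.0)
```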
Statistical qualities of LSE • Definition [unbiased estimator] An estimator θ̂ of the parameter θ is unbiased if E[θ̂] = θ, where E[·] is the statistical expectation
Statistical qualities of LSE • Definition [minimum variance] • An estimator θ̂ is a minimum-variance estimator if, for any other estimator θ*: Cov(θ̂) ≤ Cov(θ*), where Cov(θ) is the covariance matrix of the random vector θ
Statistical qualities of LSE • Theorem [Gauss-Markov]: • Gauss-Markov conditions: • The error vector e is a vector of m uncorrelated random variables, each with zero mean & the same variance σ² • This means that: E[e] = 0 and Cov(e) = E[e e^T] = σ²I
Statistical qualities of LSE • Theorem [Gauss-Markov] Under these conditions, the LSE is unbiased & has minimum variance among linear unbiased estimators. • Proof of unbiasedness: E[θ̂] = (A^T A)^-1 A^T E[y] = (A^T A)^-1 A^T Aθ = θ
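Unbiasedness is also easy to check by simulation. In the sketch below, the true parameters, noise level, and inputs are made up; the average of the LSE over many noisy data sets should land on the true parameter vector.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = np.array([4.3, 0.9])            # made-up true parameters
u = np.linspace(0.0, 5.0, 20)
A = np.column_stack([np.ones_like(u), u])

estimates = []
for _ in range(5000):
    # Gauss-Markov conditions: zero-mean, equal-variance, uncorrelated errors
    e = rng.normal(0.0, 0.5, size=u.shape)
    y = A @ theta_true + e
    theta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimates.append(theta_hat)

print(np.mean(estimates, axis=0))            # ≈ theta_true: the LSE is unbiased
```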
Maximum likelihood (ML) estimator • The problem • Suppose we observe m independent samples x1, x2, …, xm, • coming from a probability density function with parameters θ = (θ1, …, θr)
Maximum likelihood (ML) estimator • The criterion for choosing θ is: • Choose the parameters that maximize the probability of the observed data
Maximum likelihood (ML) estimator • Likelihood function definition: • For a sample of m observations x1, x2, …, xm, • each with independent probability density function f(x; θ), • the likelihood function L is defined by L(θ) = f(x1; θ) f(x2; θ) ⋯ f(xm; θ) • L is the joint probability density of the sample, viewed as a function of θ
Maximum likelihood (ML) estimator • The ML estimator is defined as the value of θ which maximizes L: θ̂_ML = arg max L(θ) • or equivalently: θ̂_ML = arg max log L(θ)
Maximum likelihood (ML) estimator • Example: ML estimation for the normal distribution • Suppose we have m independent samples x1, x2, …, xm, coming from a Gaussian distribution with parameters μ and σ². What is the MLE for μ and σ²?
Maximum likelihood (ML) estimator • Example: ML estimation for the normal distribution • For m observations x1, x2, …, xm, the log-likelihood is log L(μ, σ²) = −(m/2) log(2πσ²) − (1/(2σ²)) Σi (xi − μ)² • Setting the derivatives with respect to μ and σ² to zero gives μ̂ = (1/m) Σi xi and σ̂² = (1/m) Σi (xi − μ̂)²
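These closed-form estimates can be verified numerically on a synthetic sample; the mean, standard deviation, and sample size below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic Gaussian sample

mu_hat = np.mean(x)                    # MLE of the mean
var_hat = np.mean((x - mu_hat) ** 2)   # MLE of the variance (divides by m, not m-1)

print(mu_hat, var_hat)                 # ≈ 2.0 and ≈ 2.25
```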
Maximum likelihood estimator for linear model • Let a linear model be given as y = Aθ + e • where the errors ei are independent, each with probability density pe • Then the likelihood function is given by L(θ) = ∏i pe(yi − f^T(ui) θ)
Maximum likelihood estimator for linear model • Assume a regression model where the errors are normally distributed with zero mean • The likelihood function is then given by L(θ, σ²) = (2πσ²)^(−m/2) exp(−(1/(2σ²)) ‖y − Aθ‖²)
Maximum likelihood estimator for linear model • The maximum likelihood model • Any algorithm that maximizes L(θ) • gives the maximum likelihood model with respect to a given family of possible models
Maximum likelihood estimator for linear model • Maximizing L is the same as maximizing log L(θ, σ²) = −(m/2) log(2πσ²) − (1/(2σ²)) Σi (yi − f^T(ui) θ)² • which, for θ, is the same as minimizing the sum of squared errors Σi (yi − f^T(ui) θ)² (see the sketch below)
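A sketch of this equivalence for a one-parameter model y = θu with an assumed known noise level: the minimizer of the sum of squared errors and the maximizer of the Gaussian likelihood coincide.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
u = np.linspace(0.0, 1.0, 50)
sigma = 0.2                                          # assumed known noise level
y = 3.0 * u + rng.normal(0.0, sigma, size=u.shape)   # model: y = theta * u + e

def sse(t):
    """Sum of squared errors for candidate parameter t."""
    return np.sum((y - t * u) ** 2)

def neg_log_likelihood(t):
    """Negative Gaussian log-likelihood: sse/(2*sigma^2) plus a constant in t."""
    return sse(t) / (2 * sigma**2) + 0.5 * len(u) * np.log(2 * np.pi * sigma**2)

t_ls = minimize_scalar(sse).x
t_ml = minimize_scalar(neg_log_likelihood).x
print(t_ls, t_ml)   # the two estimates coincide (≈ 3.0)
```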
Connection to Least Squares • Conclusion • The least-squares fitting criterion can be understood as emerging from the use of the maximum likelihood principle for estimating a regression model where errors are distributed normally. • The applicability of the least-squares method is, however, not limited to the normality assumption.
LSE for Nonlinear Models • Nonlinear models are divided into two families • Intrinsically linear • Intrinsically nonlinear • Through appropriate transformations of the input-output variables & fitting parameters, an intrinsically linear model can be turned into a linear model • After this transformation, LSE can be used to optimize the unknown parameters
LSE for Nonlinear Models • Typical examples of intrinsically linear systems: • y = a e^(bu), linearized by ln y = ln a + b u • y = a u^b, linearized by ln y = ln a + b ln u • y = 1/(a + b u), linearized by 1/y = a + b u (one such fit is sketched below)
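As a sketch of the transformation idea, the code below fits the intrinsically linear model y = a·e^(bu) by applying LSE to ln y; the synthetic data and true values a = 2, b = 1.3 are invented for the demo.

```python
import numpy as np

rng = np.random.default_rng(3)
u = np.linspace(0.1, 2.0, 30)
# Synthetic data from y = 2.0 * exp(1.3 * u), with multiplicative noise
y = 2.0 * np.exp(1.3 * u) * np.exp(rng.normal(0.0, 0.05, size=u.shape))

# Transformation: ln y = ln a + b*u is linear in (ln a, b), so LSE applies
A = np.column_stack([np.ones_like(u), u])
(ln_a, b), *_ = np.linalg.lstsq(A, np.log(y), rcond=None)

print(np.exp(ln_a), b)   # ≈ 2.0 and ≈ 1.3
```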