ENGN 2226 ENGINEERING SYSTEMS ANALYSIS Time series, non-linear models, Estimation and High Dimensional Models. Estimation and Prediction. Sampling Errors for y. An estimator of variance. Confidence/Prediction Intervals. Time series.
ENGN 2226 ENGINEERING SYSTEMS ANALYSIS: Time series, non-linear models, Estimation and High Dimensional Models
Time series
A time series is a set of measurements made at regular time intervals from the same system over a period of time. A time series should be plotted as a time plot, that is, each variable plotted against time.
• Trend: a persistent long-term rise or fall in the data over time.
• Seasonal variation: a bias pattern that repeats itself cyclically over time.
Both trend and seasonal variation are deterministic disturbances in the data.
Retail petrol pricing in the US
Note the strong linear trend in the retail price.
Time series data and trends
Residuals distributed as N(0, σ)
Residual time-series plot
A residual time plot is a plot of the values of the time series after they have been corrected for trend and seasonal variation.
Detrending data
[Figures: detrended time series; histogram of the time series with linear trend removed]
The process of removing an underlying relationship in observed data is known as detrending the data. The goal of detrending is to expose the underlying stochastic noise process. Data can also be detrended for seasonal variation or non-linear relationships.
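As a minimal sketch of linear detrending, the Python below fits a straight line by least squares and subtracts it to leave the residual series. The data are synthetic and made up for illustration (they are not the petrol price data), and the helper names `fit_line` and `detrend` are hypothetical.

```python
import random


def fit_line(t, y):
    """Ordinary least squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def detrend(t, y):
    """Subtract the fitted linear trend and return the residuals."""
    a, b = fit_line(t, y)
    return [yi - (a + b * ti) for ti, yi in zip(t, y)]


# Synthetic series (assumption): trend 2.0 + 0.5*t plus Gaussian noise.
random.seed(0)
t = list(range(100))
y = [2.0 + 0.5 * ti + random.gauss(0.0, 1.0) for ti in t]

a, b = fit_line(t, y)
residuals = detrend(t, y)
print(f"intercept = {a:.3f}, slope = {b:.3f}")
print(f"mean residual = {sum(residuals) / len(residuals):.2e}")
```

The residuals are what a residual time plot or histogram would then display. Seasonal variation could be handled the same way by adding periodic regressors, which this sketch does not attempt.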
Inference for non-normal distributions
• If you have sufficient data (n ≥ 40), you should be able to use t-procedures regardless of the distribution.
• For smaller amounts of data, you can look for a non-linear transformation that rescales the data into something closer to a normal distribution.
You should always undertake some exploratory data analysis before applying statistical tests. In particular, with t-procedures (when n < 40) you should always check the normality of the data with normal-quantile plots and look for outliers.
Engine pollutants data
The engine pollutants data are an example of a non-normal distribution. Consider the data for the CO emissions. We will find that the distribution is non-normal, but the transformation Y_k = log(CO_k) will make the data appear normal.
Normal-Quantile plot for raw data
[Figure axes: normal quantiles, data quantiles]
The plot shows a light tail. Be careful with this figure: note that the axes are swapped.
Normal-Quantile plot for transformed data
[Figure axes: normal quantiles, data quantiles]
The data are transformed by taking the log of the CO emissions: Y_k = log(CO_k). The transformed data show good correspondence to a normal distribution.
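The effect of the log transformation on a normal-quantile plot can be checked numerically. The sketch below pairs the sorted data with standard normal quantiles and uses the correlation of those pairs as a rough straightness measure. The sample is synthetic log-normal data (an assumption, not the engine pollutants data), and the plotting positions (i + 0.5)/n are one common choice among several.

```python
import math
import random
from statistics import NormalDist


def normal_quantile_pairs(data):
    """Return (normal quantile, data quantile) pairs for a normal-quantile plot."""
    n = len(data)
    std_normal = NormalDist()
    ordered = sorted(data)
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def correlation(pairs):
    """Pearson correlation of the pairs; values near 1 indicate a straight line."""
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    syy = sum((p[1] - my) ** 2 for p in pairs)
    return sxy / math.sqrt(sxx * syy)


# Synthetic stand-in for CO emissions (assumption): log-normal sample.
random.seed(1)
co = [random.lognormvariate(1.0, 1.0) for _ in range(200)]
log_co = [math.log(x) for x in co]  # Y_k = log(CO_k)

r_raw = correlation(normal_quantile_pairs(co))
r_log = correlation(normal_quantile_pairs(log_co))
print(f"raw data:         r = {r_raw:.3f}")
print(f"log-transformed:  r = {r_log:.3f}")
```

A correlation closer to 1 for the transformed data corresponds to the straighter normal-quantile plot. This is a quick check only and does not replace looking at the plot for outliers.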
Dynamic Models
Another very important class of time-dependent data is that of dynamic systems. Dynamic system models are used to describe the dynamics of machines, aircraft, electricity networks, etc. So far we have looked at this model as a black box. Let's examine what is going on inside the system model and how this relates to linear regression.
[Figure: block diagram, Input → System model → Output]
Filtering vs Regression
• In the previous model there is no noise process in the system model (i.e. no system noise). If there were, we would need a filter to attenuate that noise. We will not cover this, so we assume that all system models are completely deterministic (no noise).
• When we only have measurement noise, we are able (in principle) to use regression.
• These systems are continuous, so they have an infinite number of data points. Can we still use the least squares estimator we have been working with?
Linear regression for a time series
• Everything we have discussed for least squares estimation still works here.
• System identification is a major area of research and a major application of linear regression.
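To illustrate how least squares carries over to a dynamic system, here is a minimal sketch under stated assumptions. It takes a hypothetical first-order model dx/dt = a·x + b·u, simulated with a forward-Euler step, where x and u are known exactly and only the measured rate dx/dt carries noise. The model, the parameter values and the function name `least_squares_2` are illustrative choices, not taken from the slides.

```python
import random


def least_squares_2(x1, x2, y):
    """Solve the 2x2 normal equations for y ~ p1*x1 + p2*x2 (no intercept)."""
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    r1 = sum(a * c for a, c in zip(x1, y))
    r2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    p1 = (r1 * s22 - r2 * s12) / det
    p2 = (s11 * r2 - s12 * r1) / det
    return p1, p2


# Hypothetical system parameters and sampling interval (assumptions).
a_true, b_true = -0.8, 1.5
dt = 0.05

random.seed(2)
x = 0.0
xs, us, rates = [], [], []
for k in range(400):
    u = 1.0 if (k // 50) % 2 == 0 else -1.0      # square-wave input
    rate = a_true * x + b_true * u               # noise-free system model
    xs.append(x)
    us.append(u)
    rates.append(rate + random.gauss(0.0, 0.1))  # measurement noise on the rate
    x += dt * rate                               # forward-Euler step

a_hat, b_hat = least_squares_2(xs, us, rates)
print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")
```

The continuous system is sampled at a finite number of instants, so the regression uses a finite data set. If x itself were measured with noise, the regressors would be noisy and the plain least squares estimate would generally be biased.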
Discussion Task
• Talk to the person next to you and:
  • Find an example of a simple dynamic system that might require some system estimation.
  • In your example, is it physically possible to measure the rate of change of the variable of interest?