Error Estimation

Definition of the “Error” • The measurement error can be defined as the difference between the measured value and the actual value (truth).

Types of Measurement Errors • Types of measurement errors: - Systematic errors. - Random errors. - Personal errors.

Systematic Errors • Systematic errors are biases in measurement so that the mean of many separate measurements deviates from the truth. • Can be constant (offset error), related to the measured value (scale-factor error), or anything else (e.g. nonlinear behaviour). • Possible sources: - imperfect calibration of the measurement system. - shortcoming in the principle of measurement. - (unaccounted-for) environmental interference. - aging of the instrument (drift). • Can be removed by careful calibration.

Random Errors • Random errors are errors in measurement that make the measured values inconsistent when repeated measurements of a constant attribute or quantity are taken. • All measurements are prone to random error. • “Variance” (or standard deviation) is used to quantify them. • Possible sources: - unpredictable fluctuations in the readings of the measurement system (e.g. electronic noise). - interpretation of instrumental readings. - (unpredictable) environmental interference. - deterioration of the instrument conditions. • Averaging repeated measurements of the same quantity reduces random errors.
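
The last bullet can be illustrated with a small simulation. This is a minimal sketch that assumes independent, zero-mean Gaussian noise (an assumption, not something the slide states); the function names are hypothetical. Under that assumption the spread of the averaged reading falls roughly as sigma / sqrt(n), while a systematic error would be left untouched.

```python
import random
import statistics

def averaged_reading(truth, sigma, n, rng):
    """Mean of n readings of a constant truth with zero-mean Gaussian noise."""
    return statistics.fmean([truth + rng.gauss(0.0, sigma) for _ in range(n)])

def spread_of_average(truth, sigma, n, trials=2000, seed=0):
    """Standard deviation of the averaged reading over many repeated trials."""
    rng = random.Random(seed)
    means = [averaged_reading(truth, sigma, n, rng) for _ in range(trials)]
    return statistics.pstdev(means)

for n in (1, 4, 16, 64):
    # The spread should shrink roughly as sigma / sqrt(n).
    print(n, round(spread_of_average(truth=10.0, sigma=1.0, n=n), 3))
```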

Personal Errors • Personal error is the error that results from applying a faulty procedure. • Possible sources: - faulty procedures. - faulty readings/recordings. - faulty interpretations. • These are human “mistakes” and are usually difficult to deal with.

Measurement Resolution • Measurement resolution is the smallest change in the underlying physical quantity that produces a response in the measurement. • Truncation error. • Limitation of the instrument.
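
As a rough illustration of resolution and truncation error, the sketch below assumes an instrument that truncates (rather than rounds) each reading to a multiple of its resolution. The function name and the numbers are hypothetical.

```python
import math

def truncate_to_resolution(value, resolution):
    """Reading of an instrument that truncates to a multiple of its resolution."""
    return math.floor(value / resolution) * resolution

resolution = 0.5
for true_value in (2.37, 2.49, 2.51):
    reading = truncate_to_resolution(true_value, resolution)
    # The truncation error lies between 0 and the resolution.
    print(true_value, reading, round(true_value - reading, 2))
```

Changes in the underlying quantity that stay within one resolution step (2.37 to 2.49 here) produce no change in the reading.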

Accuracy and Precision • Accuracy of a measurement system is the degree of closeness of measurements of a quantity to the actual value (truth) of that quantity. • Precision of a measurement system is the degree to which repeated measurements under unchanged conditions show the same results.

Accuracy vs Precision (1) • Figure: two panels, each marked with the truth (T). One is accurate but not precise; the other is precise but not accurate.

Accuracy vs Precision (2) • Truth = 30.1°C. • Th. 1: μ = 30.0°C, σ ~ 3.5°C. Accurate but not precise. • Th. 2: μ = 23.5°C, σ ≤ 0.2°C. Precise but not accurate.

Notes • Accuracy is related to the systematic errors. • Precision is related to the random errors. • Accuracy, precision and resolution are not the same.
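
The two instruments in the example (Th. 1 and Th. 2) can be mimicked numerically. This is only a sketch: the Gaussian distribution of the readings, the sample size and the helper names are assumptions, and σ = 0.2°C is used for Th. 2 although the slide gives it as an upper bound.

```python
import random
import statistics

TRUTH = 30.1  # deg C, as quoted in the example

def simulate(mean, sigma, n=5000, seed=0):
    """Draw n readings from an instrument with the given mean and spread."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(n)]

def describe(readings, truth=TRUTH):
    """Return (offset of the mean from the truth, spread of the readings)."""
    return statistics.fmean(readings) - truth, statistics.pstdev(readings)

th1 = describe(simulate(30.0, 3.5, seed=1))  # small offset, large spread
th2 = describe(simulate(23.5, 0.2, seed=2))  # large offset, small spread
print("Th. 1: offset %.2f, spread %.2f" % th1)
print("Th. 2: offset %.2f, spread %.2f" % th2)
```

The offset reflects the systematic error (accuracy) and the spread reflects the random error (precision).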

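Error Estimation when the truth is known! (1) • If we know the truth (T), error estimation of a measurement system (x) is straightforward. • For N measurements, $x_i$ ($i = 1, \dots, N$), of the truth, $T_i$, using a given measurement system, the following statistics can be computed.

Error Estimation when the truth is known! (2) • Systematic error (bias): $\text{Bias} = \frac{1}{N}\sum_{i=1}^{N}(x_i - T_i)$. • Root mean square error (RMSE): $\text{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - T_i)^2}$. • Standard deviation of the difference (SDD): $\text{SDD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - T_i - \text{Bias})^2}$. • Scatter index (SI): the SDD normalised by the mean of the truth, $\text{SI} = \text{SDD}/\bar{T}$. • Note that RMSE includes both systematic and random errors ($\text{RMSE}^2 = \text{Bias}^2 + \text{SDD}^2$) while SDD is the proper measure for random errors.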
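
A minimal sketch of these four statistics follows. It assumes the conventional definitions: bias as the mean difference, SDD as the standard deviation of the difference, and the scatter index as SDD normalised by the mean of the truth. The normalisation used for SI is an assumption here, and the function names and sample values are hypothetical.

```python
import math
import statistics

def differences(x, truth):
    return [xi - ti for xi, ti in zip(x, truth)]

def bias(x, truth):
    """Mean difference between the measurements and the truth."""
    return statistics.fmean(differences(x, truth))

def rmse(x, truth):
    """Root mean square difference; contains both bias and random error."""
    return math.sqrt(statistics.fmean([d * d for d in differences(x, truth)]))

def sdd(x, truth):
    """Standard deviation of the difference; the bias is removed."""
    return statistics.pstdev(differences(x, truth))

def scatter_index(x, truth):
    """SDD normalised by the mean of the truth (assumed convention)."""
    return sdd(x, truth) / statistics.fmean(truth)

x = [1.0, 2.0, 3.0, 4.0]
truth = [1.0, 1.0, 3.0, 3.0]
print(bias(x, truth), rmse(x, truth), sdd(x, truth), scatter_index(x, truth))
```

With these definitions RMSE² = Bias² + SDD², which is why RMSE mixes systematic and random errors.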

BUT…. usually we do not know the truth! • It is not possible to make any error estimation using one measurement system. • Usually, with 2 independent measurement systems, it is possible to estimate relative errors. • Absolute error estimation requires at least 3 independent measurement systems.

Our measurement systems (for surface wind and waves) • Buoy = In situ instruments mounted on buoys or platforms. • Model = Numerical atmospheric/wave models like IFS & WAM. • Satellite = Satellite measurements using altimeters (wave height and wind speed), scatterometers (wind velocity) and SAR (‘truncated’ wave spectra). Here, only altimeter measurements are used. • Cross comparison between any two types of measurements → relative errors!

In-Situ Measurements • Usually mounted on buoys and platforms. (Pressure gauges can be mounted on the seabed in shallow waters.) • Ground truth. (Is it so?) • Usually very close to the coast. • A lot of practical issues. • Limited coverage (in space and time). • Mainly available in the Northern Hemisphere.

Locations of buoys available through GTS (Jan. 2011)

Radar Altimeters • Global coverage every few days/weeks. • Data continuously available since 1991 (more than two decades) from two series of satellites (some data from the 1980s as well): - European Space Agency (ESA): ERS-1, ERS-2, Envisat. - French-US collaboration: TOPEX/Poseidon, Jason-1, Jason-2. • May not be available when or where needed. • Not suitable for coastal areas (yet). • May not be suitable for climate studies.

Typical Daily Coverage of Envisat Ground Tracks

Models • Global coverage, as frequent as every few minutes/hours. • Produce forecasts, which are crucial for operational use. • Ability to make a “hindcast” (or “reanalysis”). • Suitable for climate studies (e.g. ECMWF ERA-Interim and future ERA-CLIM). • Modelling issues: parameterizations, resolution, etc.

Relative Errors • In practice, the truth is unknown. • Systematic error (bias) cannot be found in an absolute sense. A reference is always required. • Traditionally, estimation of the random error is done against a reference, e.g. buoy measurements → relative error. • Example: Comparison of significant wave height from 3 Altimeters (Envisat, Jason-1, Jason-2) against the ECMWF wave model (WAM).

Comparison between Altimeter and WAM FG SWH • Global comparison over a year (02 Feb. 2010 to 01 Feb. 2011). • ECMWF WAM model First Guess (FG) significant wave height (SWH) is used as a reference.

Warning! It is “Difference” Not “Error” • For two systems (X and Y) measuring the same truth at the same location and time, it is assumed that: $\text{Error variance} = N^{-1}\sum_{i}(X_i - Y_i)^2 - \text{Bias}^2$. • But this is just the “difference”, not the “error”, unless system Y is “error-free” (which is highly unlikely). • Using 3 (or more) systems instead of 2 solves this problem: the “Triple Collocation Technique”.
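
The warning can be checked with synthetic data. In this sketch (Gaussian errors, sample size and error levels are all assumed for illustration), system X has an error standard deviation of 0.3 and system Y of 0.4. The "error variance" computed from the X minus Y difference comes out near 0.3² + 0.4² = 0.25 rather than near the 0.09 that belongs to X alone.

```python
import random
import statistics

def difference_variance(a, b):
    """N^-1 * sum((a_i - b_i)^2) - Bias^2 for two collocated series."""
    d = [ai - bi for ai, bi in zip(a, b)]
    mean_d = statistics.fmean(d)
    return statistics.fmean([di * di for di in d]) - mean_d ** 2

rng = random.Random(1)
truth = [rng.uniform(1.0, 5.0) for _ in range(20000)]
x = [t + rng.gauss(0.0, 0.3) for t in truth]  # error variance 0.09
y = [t + rng.gauss(0.0, 0.4) for t in truth]  # error variance 0.16

print("X against Y:    ", round(difference_variance(x, y), 3))
print("X against truth:", round(difference_variance(x, truth), 3))
```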

Error model • Assume that the errors are linear additives to the true value (the “truth”). • For any measurement, $X_i$, we assume that: $X_i = \alpha + \beta T_i + e_i$, where: $\alpha$ is a fixed bias in the measurement system (accuracy); $\beta$ is a calibration constant of the measurement system (a bias that depends on the truth); $T_i$ is the truth; $e_i$ is the random error, which is assumed to have zero mean. • Except for the measurement $X_i$, all the variables are unknown.

Triple Collocation (1) • Given measurements from 3 (or more) independent measuring systems ($X_{pi}$, $p = 1, 2, 3$) measuring the same truth (at the same location and instant, $i$), the error model for each measurement ($i = 1, \dots, N$) from each system is: $X_{pi} = \alpha_p + \beta_p T_i + e_{pi}$; $i = 1, \dots, N$; $p = 1, 2, 3$. • For wind and waves, the 3 systems are usually a satellite-borne instrument (Altimeter), an in-situ instrument (buoy) and a model.

Triple Collocation (2) • It is not possible to estimate the biases, $\alpha_p$, in an absolute sense. • The way out is to remove the bias of each system from all its measurements → all measurements will have zero bias. • The calibration constants ($\beta_p$) will be found by iteration; for the time being, let’s divide each term by the corresponding $\beta_p$. • The error model for the 3 systems: $X'_{1i} = T_i + e'_{1i}$; $X'_{2i} = T_i + e'_{2i}$; $X'_{3i} = T_i + e'_{3i}$. • Change of variables for clarity: $X'_{1i} = X_{1i}/\beta_1 = X_i$, $e'_{1i} = e_{1i}/\beta_1 = e_{xi}$; $X'_{2i} = X_{2i}/\beta_2 = Y_i$, $e'_{2i} = e_{2i}/\beta_2 = e_{yi}$; $X'_{3i} = X_{3i}/\beta_3 = Z_i$, $e'_{3i} = e_{3i}/\beta_3 = e_{zi}$. • The error model then reads: $X_i = T_i + e_{xi}$; $Y_i = T_i + e_{yi}$; $Z_i = T_i + e_{zi}$.

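Triple Collocation (3) • To get rid of $T_i$, take the differences between each pair of collocated measurements: $X_i - Y_i = e_{xi} - e_{yi}$; $X_i - Z_i = e_{xi} - e_{zi}$; $Y_i - Z_i = e_{yi} - e_{zi}$. • Find the expected values (denoted by angle brackets $\langle\cdot\rangle$) of the products of these differences: $\langle (X_i - Y_i)(X_i - Z_i)\rangle = \langle e_{xi}^2\rangle - \langle e_{xi}e_{yi}\rangle - \langle e_{xi}e_{zi}\rangle + \langle e_{yi}e_{zi}\rangle$; $\langle (Y_i - X_i)(Y_i - Z_i)\rangle = \langle e_{yi}^2\rangle - \langle e_{xi}e_{yi}\rangle - \langle e_{yi}e_{zi}\rangle + \langle e_{xi}e_{zi}\rangle$; $\langle (Z_i - X_i)(Z_i - Y_i)\rangle = \langle e_{zi}^2\rangle - \langle e_{xi}e_{zi}\rangle - \langle e_{yi}e_{zi}\rangle + \langle e_{xi}e_{yi}\rangle$. • All quantities on the LHS are known (from measurements). • $\langle e_x^2\rangle$, $\langle e_y^2\rangle$ and $\langle e_z^2\rangle$ are the sought error variances. • $\langle e_xe_y\rangle$, $\langle e_xe_z\rangle$ and $\langle e_ye_z\rangle$ are the error covariances/correlations.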

Triple Collocation (4) • If the 3 measurement systems are truly independent, their errors are uncorrelated: $\langle e_xe_y\rangle = \langle e_xe_z\rangle = \langle e_ye_z\rangle = 0$. • Notice that we are dropping the subscript $i$ from the error (co)variances just for ease of writing. • Now we can find the error variances as: $\langle e_x^2\rangle = \langle (X_i - Y_i)(X_i - Z_i)\rangle$; $\langle e_y^2\rangle = \langle (Y_i - X_i)(Y_i - Z_i)\rangle$; $\langle e_z^2\rangle = \langle (Z_i - X_i)(Z_i - Y_i)\rangle$.
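
The three expressions above translate directly into code. The sketch below uses synthetic, already calibrated and bias-free data with independent Gaussian errors (all assumptions made for illustration; the function names are hypothetical) and recovers the three error variances without ever using the truth.

```python
import random
import statistics

def cross_moment(a, b, c):
    """<(a - b)(a - c)>: the error variance of a when errors are uncorrelated."""
    return statistics.fmean([(p - q) * (p - r) for p, q, r in zip(a, b, c)])

def error_variances(x, y, z):
    """Return (<e_x^2>, <e_y^2>, <e_z^2>) for three collocated series."""
    return cross_moment(x, y, z), cross_moment(y, x, z), cross_moment(z, x, y)

rng = random.Random(7)
truth = [rng.uniform(1.0, 5.0) for _ in range(50000)]
x = [t + rng.gauss(0.0, 0.2) for t in truth]  # true error variance 0.04
y = [t + rng.gauss(0.0, 0.3) for t in truth]  # true error variance 0.09
z = [t + rng.gauss(0.0, 0.4) for t in truth]  # true error variance 0.16

ex2, ey2, ez2 = error_variances(x, y, z)
print(round(ex2, 3), round(ey2, 3), round(ez2, 3))
```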

Triple Collocation (5) • We assume that one of the systems (say the first one, X) is calibrated (i.e. $\beta_x = 1$), and we calibrate the other two systems (i.e. find $\beta_y$ and $\beta_z$) accordingly. • Neutral regression is used for that. (Conventional regression is not suitable as it assumes that one of the measurement systems is error-free.) • $\beta_y = [-B + (B^2 - 4AC)^{1/2}] / (2A)$, where: $A = \gamma\langle X_iY_i\rangle$; $B = \langle X_i^2\rangle - \gamma\langle Y_i^2\rangle$; $C = -\langle X_iY_i\rangle$; $\gamma = \langle e_x^2\rangle / \langle e_y^2\rangle$. • Similarly, $\beta_z$ can be found by replacing Y above with Z.
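
A small sketch of the neutral-regression slope follows. It reads the formula with A = γ⟨XY⟩, B = ⟨X²⟩ – γ⟨Y²⟩, C = –⟨XY⟩ and γ = ⟨e²x⟩/⟨e²y⟩; the symbol γ and the function names are assumptions made for this illustration. Both series are assumed to have had their means removed.

```python
import math
import statistics

def mean_product(a, b):
    return statistics.fmean([p * q for p, q in zip(a, b)])

def neutral_regression_slope(x, y, gamma):
    """Slope beta of y against x when both carry errors.

    gamma is the ratio of error variances <e_x^2> / <e_y^2>.
    """
    A = gamma * mean_product(x, y)
    B = mean_product(x, x) - gamma * mean_product(y, y)
    C = -mean_product(x, y)
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)

# Noise-free check: y is exactly 1.5 * x, so any gamma should return 1.5.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1.5 * v for v in x]
print(neutral_regression_slope(x, y, gamma=1.0))
print(neutral_regression_slope(x, y, gamma=0.25))
```

With noisy data the choice of γ matters, which is why the error variances and the calibration constants have to be found together by iteration.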

Triple Collocation (6) • The procedure: 1. Collocate 3 (or more) independent sets of measurements. 2. Remove the bias from each data set independently. 3. Compute the variances and covariances of the data sets: $\langle X_i^2\rangle$; $\langle Y_i^2\rangle$; $\langle Z_i^2\rangle$; $\langle X_iY_i\rangle$; $\langle X_iZ_i\rangle$; $\langle Y_iZ_i\rangle$. 4. Assume that $\beta_x = \beta_y = \beta_z = 1$. 5. Solve for $\langle e_x^2\rangle$; $\langle e_y^2\rangle$; $\langle e_z^2\rangle$. 6. Solve for $\beta_y$ and $\beta_z$ using neutral regression. 7. Adjust the measurements of each data set using its calibration constant, $\beta$. 8. Repeat starting from step 3 until convergence.
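
The procedure can be sketched end to end as below. This is one possible reading of the steps, not necessarily the operational implementation: the calibration constants are accumulated as products of the slopes found in successive iterations, X is taken as the calibrated system, and the synthetic data, tolerances and function names are assumptions.

```python
import math
import random
import statistics

def remove_mean(series):
    m = statistics.fmean(series)
    return [v - m for v in series]

def mean_product(a, b):
    return statistics.fmean([p * q for p, q in zip(a, b)])

def error_variance(a, b, c):
    """<(a - b)(a - c)>, valid when the errors are uncorrelated."""
    return statistics.fmean([(p - q) * (p - r) for p, q, r in zip(a, b, c)])

def neutral_slope(x, y, gamma):
    """Neutral-regression slope of y against x, gamma = <e_x^2> / <e_y^2>."""
    A = gamma * mean_product(x, y)
    B = mean_product(x, x) - gamma * mean_product(y, y)
    C = -mean_product(x, y)
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)

def triple_collocation(x, y, z, tol=1e-8, max_iter=100):
    """Return (beta_y, beta_z, (ex2, ey2, ez2)) with X taken as calibrated.

    The error variances refer to the calibrated (rescaled) data sets.
    """
    x, y, z = remove_mean(x), remove_mean(y), remove_mean(z)   # step 2
    beta_y = beta_z = 1.0                                      # step 4
    for _ in range(max_iter):
        ys = [v / beta_y for v in y]                           # step 7
        zs = [v / beta_z for v in z]
        ex2 = error_variance(x, ys, zs)                        # steps 3 and 5
        ey2 = error_variance(ys, x, zs)
        ez2 = error_variance(zs, x, ys)
        new_beta_y = beta_y * neutral_slope(x, ys, ex2 / ey2)  # step 6
        new_beta_z = beta_z * neutral_slope(x, zs, ex2 / ez2)
        converged = (abs(new_beta_y - beta_y) < tol
                     and abs(new_beta_z - beta_z) < tol)
        beta_y, beta_z = new_beta_y, new_beta_z
        if converged:                                          # step 8
            break
    return beta_y, beta_z, (ex2, ey2, ez2)

# Synthetic test case: Y and Z have their own bias, scale factor and noise.
rng = random.Random(42)
truth = [rng.gauss(0.0, 1.0) for _ in range(20000)]
x = [t + rng.gauss(0.0, 0.2) for t in truth]
y = [0.5 + 1.2 * t + rng.gauss(0.0, 0.3) for t in truth]
z = [-0.3 + 0.9 * t + rng.gauss(0.0, 0.4) for t in truth]

beta_y, beta_z, (ex2, ey2, ez2) = triple_collocation(x, y, z)
print(round(beta_y, 3), round(beta_z, 3))
print(round(ex2, 4), round(ey2, 4), round(ez2, 4))
```

For this synthetic case the calibration constants should come out close to 1.2 and 0.9, and the error variances close to 0.04, 0.09/1.2² and 0.16/0.9², because the errors of Y and Z are expressed in the units of the calibrated system X.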

Practical Considerations – Correlated Errors • The errors of different systems can be correlated due to: - Data assimilation. - Sharing the same principle of measurement. - Use of data from one system in the retrieval algorithms of another system. • Note: we are talking about “error correlation”, not “measurement correlation”. • Possible treatment: - Use model forecasts or run models without data assimilation. - Evaluate/estimate the correlation ($\langle e_xe_y\rangle$, $\langle e_xe_z\rangle$ or $\langle e_ye_z\rangle$). - Additional measurement system(s) → quadruple/quintuple collocation.

Practical Considerations – Collocation Distance/Time • Ideally, the measurements from the 3 systems are collocated at exactly the same location and the same instant. • This is not practical, as the number of collocations drops dramatically as the permitted collocation distance shrinks. • For wind and wave collocations, we allow up to 100-200 km and up to 2 hours. • We need to account for the collocation distance: increase in model error & decrease in buoy and Altimeter errors.
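
A collocation test with these limits might look like the sketch below. The great-circle (haversine) distance on a spherical Earth, the tuple layout of an observation and the function names are all assumptions made for illustration. The 200 km and 2 hour defaults are the upper limits quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_collocated(obs_a, obs_b, max_km=200.0, max_hours=2.0):
    """obs = (lat_deg, lon_deg, time_hours); True if within both limits."""
    close = great_circle_km(obs_a[0], obs_a[1], obs_b[0], obs_b[1]) <= max_km
    simultaneous = abs(obs_a[2] - obs_b[2]) <= max_hours
    return close and simultaneous

buoy = (50.0, -10.0, 12.0)
altimeter = (51.0, -10.5, 13.5)
print(is_collocated(buoy, altimeter))
```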

Practical Considerations – Homogeneity • Due to the collocation distance, sometimes the buoy and the Altimeter are measuring two different truths (e.g. islands, fronts). • Those cases need to be detected and removed. • The model is used as a filter; the model values at the buoy and Altimeter locations must agree to within 5%: Model@buoy – Model@Altimeter ≤ 5%
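
One way to code such a filter is sketched below. The slide does not say how the 5% is normalised, so taking the absolute difference relative to the mean of the two model values is an assumption, as is the function name.

```python
def is_homogeneous(model_at_buoy, model_at_altimeter, max_relative_difference=0.05):
    """True when the two model values differ by no more than 5% (assumed form)."""
    reference = 0.5 * (abs(model_at_buoy) + abs(model_at_altimeter))
    if reference == 0.0:
        return True  # both values are zero, so there is no difference
    difference = abs(model_at_buoy - model_at_altimeter)
    return difference / reference <= max_relative_difference

print(is_homogeneous(2.0, 2.05))  # model SWH nearly identical at both locations
print(is_homogeneous(2.0, 2.5))   # model sees different conditions: reject
```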

Example • Quality control of buoy and Altimeter data. • Triple collocation of significant wave height (SWH) & surface wind speed (U10) between 1 August 2009 and 31 July 2010: - Model Forecast (8 FC steps), ENVISAT, Buoys. - Model Forecast (8 FC steps), Jason-2, Buoys. - Model Forecast (8 FC steps), Jason-1, Buoys. (i.e. 24 “different” data sets). • A collocation is rejected in the case of: - Obviously erroneous data. - Inhomogeneous conditions at buoy and Altimeter locations.

Example – Results for Wind Speed

Example – Results for Significant Wave Height

Other Methods • Janssen et al. (2007) introduced a method, based on along-track averaging, to estimate the significant wave height error in ERS-2 Altimeter measurements and in the model. The correlation length between 8 individual Altimeter measurements was found. • Figure annotations: $\sigma^2 = \langle e_{mod}^2\rangle + N^{-1}\sigma_a^2$, shown once with N = number of Altimeter observations and once with N = effective number. • Other methods utilise some known features of the measured quantity, e.g.: - probability distributions; - spectral properties.

Further Details: • Janssen, P.A.E.M. (2004), The interaction of ocean waves and wind, Cambridge, UK: Cambridge University Press. • Janssen, P.A.E.M., Abdalla, S., Hersbach, H. and Bidlot, J.-R. (2007), “Error estimation of buoy, satellite, and model wave height data”, J. Atmos. Oceanic Technol., 24, 1665-1677. • Abdalla, S., Janssen, P.A.E.M. and Bidlot, J.-R. (2011), “Altimeter near real time wind and wave products: Random error estimation”, Mar. Geod., 34(3-4), 393-406. • Zwieback, S., Scipal, K., Dorigo, W. and Wagner, W. (2012), “Structural and statistical properties of the collocation technique for error characterization”, Nonlin. Proc. Geophys., 19, 69-80. • Stoffelen, A. (1998), “Towards the true near-surface wind speed: Error modeling and calibration using triple collocation”, J. Geophys. Res., 103, 7755-7766.
