
The Detection of Outliers

Mr. Faiz Alsuhail, Statistics Finland, faiz.alsuhail@stat.fi. 1.1.2020.


Presentation Transcript


  1. The Detection of Outliers. Mr. Faiz Alsuhail, Statistics Finland, faiz.alsuhail@stat.fi

  2. Outline of the presentation
     • The setup of the problem
     • Description of the method
     • Some results with Finnish turnover data
     • Discussion

  3. Setup
     • A large number of time series are collected to produce turnover indices.
     • At the moment, data validation requires a large amount of resources, both labour and time.
     • Good data quality is vital for producing good indices, so data validation is an important step in index production.
     • The goal is to make validation less time-consuming while keeping it accurate.

  4. About the method
     • A long history of observations is available for each company that reports its turnover to Statistics Finland.
     • Hence, we can model each time series in order to forecast its future values.
     • If a company reports a figure that differs significantly from the forecast, we may suspect that the observation is an outlier.

  5. The first step
     • Create a time series model for each time series and forecast the next value.
     • Compare the reported observation with the forecast by using a t-test.
     • Rank the observations according to the p-values of their t-tests.
     • 1 minus the p-value is used as a score of how suspicious the observation is: the closer it is to 1, the more strongly the observation deviates from the forecast.
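
The first step can be sketched in a few lines of Python. The slides do not say which time series model is used, so this sketch assumes the simplest possible stand-in, a linear trend fitted by ordinary least squares, and tests the reported value against the one-step-ahead forecast with n - 2 degrees of freedom. The function names `t_cdf` and `forecast_test` and the sample figures are hypothetical, and the t-distribution is integrated numerically only to keep the example free of external libraries.

```python
import math

def t_cdf(x, df, steps=2000):
    """Student t CDF, obtained by Simpson's rule on the density (stdlib only)."""
    if x < 0:
        return 1.0 - t_cdf(-x, df, steps)
    if x == 0:
        return 0.5
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(u):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    h = x / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return min(1.0, 0.5 + total * h / 3)

def forecast_test(history, reported):
    """Fit y = a + b*t by OLS (an assumed stand-in model), forecast the next
    period and t-test the reported value against that forecast."""
    n = len(history)
    t_bar = (n - 1) / 2
    y_bar = sum(history) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (y - y_bar) for t, y in enumerate(history)) / sxx
    a = y_bar - b * t_bar
    rss = sum((y - (a + b * t)) ** 2 for t, y in enumerate(history))
    s = math.sqrt(rss / (n - 2))  # residual standard deviation
    forecast = a + b * n          # one step beyond the last observed period
    # Standard error of a new observation at t = n (prediction error).
    se = s * math.sqrt(1 + 1 / n + (n - t_bar) ** 2 / sxx)
    t_stat = (reported - forecast) / se
    p_value = 2 * (1 - t_cdf(abs(t_stat), n - 2))  # two-sided
    return forecast, t_stat, p_value

# Hypothetical turnover history for one company.
history = [100, 102, 101, 104, 106, 105, 108, 110, 109, 112]
for reported in (113, 150):
    forecast, t_stat, p = forecast_test(history, reported)
    print(f"reported={reported} forecast={forecast:.1f} t={t_stat:.2f} 1-p={1 - p:.4f}")
```

A reported value close to the trend gives a large p-value (low suspicion score), while a value far from it gives a p-value near zero. A seasonal or ARIMA-type model would replace the trend fit in practice; only the forecast and its standard error change.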

  6. Then what?
     • If an observation is suspicious, one may wish to contact the company that reported the figure.
     • However, there is a huge number of companies. It would therefore be desirable to know which observations are (potentially) most harmful to the index.
     • Hence the t-test by itself is not enough.

  7. The potential harm
     • We multiply the forecast error by (1 - p-value of the t-test) and by the company's share in the aggregate index.
     • Multiplying the forecast error by (1 - p-value of the t-test) gives the potential error of one observation.
     • Multiplying this by the company's share in the total turnover gives the observation's potential harm to Finland's turnover index.
     • One can then rank the observations by their potential harm.

  8. In terms of mathematical formulas
     What we want to compute is:
     (forecast error) × (1 - p-value of the t-test) × (share in the aggregate turnover)
     where:
     • (forecast error) requires the use of a time series model
     • (1 - p-value of the t-test) is obtained from a t-test
     • (share in the aggregate turnover) is straightforward to calculate.
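
As a small illustration of the product above, the following Python sketch scores a handful of observations and sorts them by potential harm. The company names and figures are made up, and taking the absolute value of the forecast error is an assumption of this sketch (the slides do not say how the sign is handled); the function names are hypothetical as well.

```python
def potential_harm(forecast_error, p_value, share):
    """(forecast error) x (1 - p-value of the t-test) x (share in aggregate turnover).

    The absolute error is used so that over- and under-reporting rank alike;
    this is an assumption, not something stated in the slides.
    """
    return abs(forecast_error) * (1.0 - p_value) * share

def rank_by_harm(records):
    """records: iterable of (name, forecast_error, p_value, share) tuples.
    Returns (name, harm) pairs, most harmful first."""
    scored = [(name, potential_harm(err, p, share))
              for name, err, p, share in records]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical observations: forecast error in %, p-value, share of turnover.
records = [
    ("A", 10.0, 0.01, 0.02),  # very suspicious, but a tiny company
    ("B", -5.0, 0.50, 0.30),  # mildly suspicious, large company
    ("C", 40.0, 0.90, 0.10),  # large error that the model finds unsurprising
]
for name, harm in rank_by_harm(records):
    print(f"{name}: {harm:.3f}")
```

In this made-up example company B ranks first even though its t-test is the least striking of the clearly deviating cases, because its share in the aggregate is large. That is the point of weighting by share rather than ranking on p-values alone.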

  9. Results: Manufacture of food products and beverages, potential error (%).

  10. Results: Manufacture of fabricated metal products, except machinery and equipment, potential error (%).

  11. Benefits
     • The method can rank observations according to their suspiciousness. This tells the statistician which observations to check first if time is limited.
     • The method is computationally quite simple and can be integrated into the index production system.
     • The time series approach can take into account the seasonal nature of the series.

  12. Challenges
     • Special expertise is needed to build the time series models and to run their diagnostic tests.
     • Time series models perform poorly if the future does not behave like the past.
     • The method only tells which observations are potentially harmful; it does not identify the outliers themselves.
     • The statistician must still use his or her own insight to judge whether an observation is suspicious or not.

  13. More challenges
     • There should be enough time and resources to model the time series.
     • The models must be updated on a regular basis, at least once a year.
     • If one wants to go through all the companies, the time series must be modelled automatically, for example with the help of the BIC criterion.
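
Automatic model choice by BIC can be illustrated with a deliberately small Python sketch. It assumes three hypothetical candidate models that all have closed-form least-squares fits (constant mean, linear trend, seasonal means) and picks the one with the lowest BIC, computed as n·ln(RSS/n) + k·ln(n). The candidate set, the function names and the test series are assumptions for illustration; a production system would more likely search over ARIMA-type specifications.

```python
import math

def bic(rss, n, k):
    """Gaussian BIC up to an additive constant; k counts mean parameters only.
    The floor on rss avoids log(0) for a perfect fit."""
    return n * math.log(max(rss, 1e-12) / n) + k * math.log(n)

def rss_mean(y):
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y)

def rss_trend(y):
    n = len(y)
    t_bar, y_bar = (n - 1) / 2, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    b = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y)) / sxx
    a = y_bar - b * t_bar
    return sum((v - (a + b * t)) ** 2 for t, v in enumerate(y))

def rss_seasonal(y, period):
    # One mean per position in the seasonal cycle.
    groups = [y[i::period] for i in range(period)]
    return sum(rss_mean(g) for g in groups if g)

def select_model(y, period=4):
    """Return (name of the lowest-BIC candidate, all BIC scores)."""
    n = len(y)
    candidates = {
        "mean": (rss_mean(y), 1),
        "trend": (rss_trend(y), 2),
        "seasonal": (rss_seasonal(y, period), period),
    }
    scores = {name: bic(rss, n, k) for name, (rss, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical quarterly series: one trending, one purely seasonal.
trending = [10 + 2 * t + (-1) ** t * 0.5 for t in range(24)]
seasonal = [100 + [0, 20, -10, 5][t % 4] + (-1) ** (t // 4) * 0.5 for t in range(24)]
print(select_model(trending)[0], select_model(seasonal)[0])
```

Because the penalty term k·ln(n) grows with the number of parameters, the richer seasonal model is chosen only when it reduces the residual sum of squares enough to pay for its extra parameters.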

  14. Questions to discuss
     • How do you tackle the problem of outlier detection in your statistical offices?
     • Do you believe that modelling a time series with a linear model is a good starting point for data validation?
     • If yes, should one model the time series on a two-, three- or four-digit level?
     • Are there more challenges or benefits that were not listed in this presentation?

  15. Thank you for your attention
