Changing Outlier Methodology for a Financial Survey Gareth Morgan gareth.morgan@ons.gsi.gov.uk
Outline 1. Motivation 2. Foreign Direct Investments (FDI) Survey 3. Outlier Detection & Treatment Methods 4. Analysis & Results 5. Recommendations
Motivation • Updated International Regulations • IMF’s Balance of Payments & International Investment Position Manual • OECD’s Benchmark Definition of FDI • New methodology for the FDI survey • New short questionnaire • Changes to stratification and sample design • Changes to Editing & Imputation methodology • Changes to Estimation & Outlier methodology
Glossary • Immediate Parent • Directly above affiliates in ownership chain • Affiliate • Part owned by parent company (at least 10%)
Foreign Direct Investments Survey (FDI) Collects information on the financial relationships between parents and affiliates. [Diagram: ownership chains linking Foreign Parent 1, Foreign Parent 2, UK Parent, UK Affiliate, Foreign Affiliate 1 and Foreign Affiliate 2] The FDI survey is a key contributor to the UK’s balance of payments and international investment position
FDI - Survey Details Split into two surveys: • Inwards – UK affiliates, foreign parents • Outwards – UK parents, foreign affiliates Both surveys collect quarterly & annual data Estimate for three sectors: Oil, Finance, Other Each sector contains a number of strata. • Stratified by size measures
FDI - Survey Data • Financial Survey – More than 35 questions • Most >= 0 • Four questions containing Positive & Negative values • Large proportion of zeros • ‘Subsidiary Profit’ – 40% returned zeros in 2010 (Annual Inwards) • Small sample size - ~2500 returned questionnaires (2010 Annual Inwards)
Outlier Methods – What are Outliers? Outliers: extreme values, unlike the rest of the sample, which without special treatment could lead to over-estimates. Representative Outliers: A sample element with a value that has been correctly recorded and that cannot be regarded as unique. -Chambers (1986)
Outlier Methods – Why are Outliers important? Estimated stratum totals (simplified): $\hat{Y} = \frac{N}{n} \sum_{i=1}^{n} y_i = N \bar{y}$, where N = population size, n = sample size. Outliers give an inflated stratum mean, leading to over-estimates.
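A minimal sketch of the simplified expansion estimator (population size times the sample mean), showing how one extreme value inflates the estimated stratum total. The function name and the numbers are illustrative assumptions, not survey data.

```python
def stratum_total(sample_values, population_size):
    """Simplified expansion estimate: (N / n) * sum of sampled values."""
    n = len(sample_values)
    if n == 0:
        raise ValueError("sample must not be empty")
    return population_size / n * sum(sample_values)


# Hypothetical stratum with N = 20 units, of which n = 4 are sampled.
typical = [4, 5, 6, 5]
with_outlier = [4, 5, 6, 85]  # one extreme value replaces a typical one

print(stratum_total(typical, 20))       # estimate from typical values
print(stratum_total(with_outlier, 20))  # same stratum, inflated by the outlier
```

Because each sampled value is multiplied by N / n, a single extreme value is counted as if several unsampled units shared it.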
Outlier Methods – Aims Aims of this work: • Compare 3 different outlier detection and treatment methods • Test all 3 in a simulation study, based on real survey data
Outlier Methods – Current Method (Trim) Positive values only: top 2% of values removed. Positive & negative: bottom 2% of values are also removed. The remaining values are used to calculate the mean. Example: before trimming, mean = 20.8; after trimming, mean = 5.
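A minimal sketch of percentage trimming. The rounding rule (round the number of trimmed values up) is an assumption, since the slide does not say how 2% is applied to a small stratum. The sample values are hypothetical, chosen only so that the means match the 20.8 and 5 quoted on the slide.

```python
import math


def trim(values, proportion=0.02, two_sided=False):
    """Drop the top `proportion` of values; also the bottom if two_sided."""
    ordered = sorted(values)
    k = math.ceil(proportion * len(ordered))  # assumption: round up
    lower = k if two_sided else 0
    return ordered[lower:len(ordered) - k]


def mean(values):
    return sum(values) / len(values)


sample = [3, 4, 4, 5, 5, 5, 6, 6, 7, 163]  # hypothetical values

print(mean(sample))                        # mean before trimming
print(mean(trim(sample)))                  # mean after one-sided trimming
print(trim(sample, two_sided=True))        # both tails trimmed
```

Trimming always removes a fixed share of values, whether or not they are genuinely extreme.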
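Outlier Methods – Distance from the Mean (Dist) Each value y is tested against a condition on its distance from the mean. If this does not hold for y, then y is an outlier and is removed. The remaining values are used to calculate the mean. Example: before DIST, mean = 20.8; after DIST, mean = 5.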
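The exact distance condition is not given in the text of this slide. As a rough sketch only, the code below assumes a common form of the rule: keep y when |y − mean| ≤ k × (sample standard deviation). Both the form of the rule and the multiplier k = 2 are assumptions, not the survey's actual settings, and the sample values are hypothetical.

```python
import statistics


def remove_distant(values, k=2.0):
    """Keep values within k sample standard deviations of the mean.

    Assumed rule for illustration; the survey's real condition may differ.
    """
    centre = statistics.mean(values)
    spread = statistics.stdev(values)
    return [y for y in values if abs(y - centre) <= k * spread]


sample = [3, 4, 4, 5, 5, 5, 6, 6, 7, 163]  # hypothetical values

kept = remove_distant(sample)
print(kept)
print(statistics.mean(kept))
```

Unlike trimming, a rule of this kind removes a value only when it is far from the rest, so the number of outliers detected varies from sample to sample.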
Outlier Methods – One-sided Winsorisation (Win) Reduces large values which are considered outliers. To determine and treat outliers, use the ‘L-value’ parameter (L), the design weight and the value mean. The treated value y* replaces y when calculating the mean. Example: before WIN, mean = 20.8; after WIN, mean = 12.5.
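The slide's formula did not survive extraction. The sketch below assumes the widely used one-sided form: a cutoff K = mean + L / (w − 1), and a treated value y* = y / w + (1 − 1 / w) × K whenever y > K, where w is the design weight. The symbols w and K, the function name and all numbers are assumptions for illustration.

```python
def winsorise(values, design_weight, l_value, value_mean):
    """One-sided Winsorisation under the assumed cutoff K = mean + L / (w - 1)."""
    if design_weight <= 1:
        # A fully enumerated stratum has nothing to pull back towards.
        return list(values)
    cutoff = value_mean + l_value / (design_weight - 1)
    weight_share = 1 / design_weight
    return [
        y * weight_share + (1 - weight_share) * cutoff if y > cutoff else y
        for y in values
    ]


# Hypothetical stratum: design weight 2, L-value 10, value mean 5.
treated = winsorise([4, 6, 100], design_weight=2, l_value=10, value_mean=5)
print(treated)
```

The outlier is pulled towards the cutoff rather than dropped, so it still contributes to the mean; a larger L-value raises the cutoff and treats fewer values.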
Winsorisation – Negative Values Questions containing negatives: One-sided Winsorisation will not work. Solution: Create two new variables
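The definitions of the two new variables are not shown on the slide. One plausible reading, sketched here as an assumption, is to split each value into a non-negative "positive part" and a non-negative "negative part", so that a one-sided treatment can be applied to each before recombining. The function names are hypothetical.

```python
def split_signed(values):
    """Split values into two non-negative variables (assumed construction)."""
    positive_part = [max(y, 0) for y in values]
    negative_part = [max(-y, 0) for y in values]
    return positive_part, negative_part


def recombine(positive_part, negative_part):
    """Rebuild signed values after each part has been treated separately."""
    return [p - n for p, n in zip(positive_part, negative_part)]


pos, neg = split_signed([5, -3, 0, -120])
print(pos)
print(neg)
print(recombine(pos, neg))
```

Large negative values appear as large positive values in the second variable, so the same upper-tail cutoff logic can be reused for both tails.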
Analysis • Take our returned sample data as the ‘population’ • Sample and apply outlier detection methods • Calculate Bias Ratio & MSE over 10,000 independent samples • Results created for the Finance & Other sectors. Notation: l = 1, 2, 3, ..., L; $Y$ = stratum population total; $\hat{Y}$ = stratum total (sample estimate)
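The simulation steps above can be sketched as follows. This is a simplified, single-stratum illustration under stated assumptions: simple random sampling without replacement, the expansion estimator N × mean, and a hypothetical `treat` hook for the outlier method. The slide's Bias Ratio formula is not recoverable from the text, so only bias, variance and MSE are computed.

```python
import random
import statistics


def simulate(population, sample_size, n_samples, treat=lambda s: s, seed=0):
    """Repeatedly sample, treat outliers, estimate the total; return bias, variance, MSE."""
    rng = random.Random(seed)
    true_total = sum(population)
    pop_size = len(population)
    estimates = []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)
        treated = treat(sample)
        estimates.append(pop_size * statistics.mean(treated))
    bias = statistics.mean(estimates) - true_total
    variance = statistics.pvariance(estimates)
    mse = statistics.mean([(e - true_total) ** 2 for e in estimates])
    return bias, variance, mse


population = list(range(1, 21))  # hypothetical 'population' of returned values


def drop_largest(sample):
    """Crude stand-in for an outlier method: remove the largest value."""
    return sorted(sample)[:-1]


print(simulate(population, sample_size=5, n_samples=200))
print(simulate(population, sample_size=5, n_samples=200, treat=drop_largest))
```

Using the same seed for every method means each one is evaluated on identical samples, so differences in bias and MSE come from the treatment rather than from sampling noise.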
Results – MSE (Unquoted Equity Cap & Reserves) Finance Sector [Figure: MSE against iteration; x-axis: number of independent samples; y-axis: MSE]
Results – MSE (Unquoted Equity Cap & Reserves) Other Sector [Figure: MSE against iteration; x-axis: number of independent samples; y-axis: MSE]
Results – Bias, Variance & MSE • Other sector: very large variance due to an unrepresentative outlier, a limitation of the small population
Results – MSE & Bias Ratio • RRMSE – scaled version of MSE • All 3 methods similar in RRMSE • All 3 methods under-predict at sector level
Results – Outliers Detected • Other Sector – DIST detects 2x as many outliers as TRIM, and almost 16x as many as WIN • Gives reduced variance
Conclusions • Compared to trimming (current method): • DIST – gives more stable results, but has larger biases • Winsorisation – higher variance, but consistent Bias Ratio. Best MSE • Sampling caveats • Small population – hard to generalize to total population • Can cause problems with non-representative outliers • Changes to sampling rates require different L-values
Recommendations • Overall Winsorisation is the recommended method • Best in terms of RRMSE & gives good Bias ratios • Uses treated outliers, rather than removing them (good for small amounts of data) • Due to be implemented in 2013 • Further work • Attempt simulation study with a pseudo-population • Apply simulation study to other surveys
Any Questions? gareth.morgan@ons.gsi.gov.uk