
Methodology of Allocating Generic Field to its Details

Jessica Andrews, Nathalie Hamel, François Brisebois. ICES-III, June 19, 2007.


Presentation Transcript


  1. Methodology of Allocating Generic Field to its Details Jessica Andrews Nathalie Hamel François Brisebois ICES-III - June 19, 2007

  2. Outline • Background Information on Tax Data • Objective • Current Methodology • Other Methodologies Considered • Comparison of the Methodologies • Future Work and Conclusions

  3. Tax Data • Statistics Canada receives annual data from the Canada Revenue Agency (CRA) on incorporated (T2) businesses • Tax data: • Balance Sheet • Income Statement • 88 different schedules

  4. Tax Data • About 700 different fields to report • Most companies provide only 30-40 fields • Only 8 fields are actually required by CRA (section totals) • Non-farm revenue • Non-farm expenses • Farm revenue • Farm expenses • Assets • Liabilities • Shareholder Equity • Net Income/Loss

  5. Objective • To impute the missing detail variables • Why? • Tax data users need detailed data (tax replacement project, TRP) • Tax and survey data use different concepts and definitions • A subset of details linked to the same generic can be mapped to different survey variables (Chart of Accounts)

  6. Challenges to meet • Methodology must • Work well for a large number of details • Be capable of dealing with details which are rarely reported and those which are frequently reported • Give good micro results for tax replacement, but also give good macro results when examined at the NAICS or full database level

  7. First attempt to complete Tax Data • Edit rules • Outlier detection within a record • Deterministic edits (to ensure the record balances within each section) • Review and manual corrections • Overlap between fiscal periods • Negative values • Consistency edits between tax variables • Outlier detection between records (Hidiroglou-Berthelot) • CORTAX balancing edits • Deterministic imputation of key variables • Inventories • Depreciation • Salaries and wages
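The Hidiroglou-Berthelot edit listed above flags records whose ratio between two values (for example, the same variable in two fiscal periods) is unusual compared with the ratios of the other records. A minimal Python sketch of the standard HB ratio test follows, assuming strictly positive values; the defaults for the tuning parameters `U`, `A` and `C` are illustrative and are not the values used in the production edits.

```python
import statistics


def hb_outliers(prev, curr, U=0.5, A=0.05, C=4.0):
    """Return indices of records flagged by the Hidiroglou-Berthelot ratio edit.

    prev, curr: strictly positive values of one variable for the same records
    in two periods. U, A and C are tuning parameters (illustrative defaults).
    """
    ratios = [c / p for p, c in zip(prev, curr)]
    r_med = statistics.median(ratios)
    # Symmetric transformation: ratios below and above the median become
    # negative and positive deviations on a comparable scale.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Size effect: deviations of large records count more (0 <= U <= 1).
    e = [si * max(p, c) ** U for si, p, c in zip(s, prev, curr)]
    q1, _, q3 = statistics.quantiles(e, n=4)
    e_med = statistics.median(e)
    # A * |E_M| keeps the acceptance interval from collapsing when quartiles are close.
    d_low = max(e_med - q1, abs(A * e_med))
    d_high = max(q3 - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_low, e_med + C * d_high
    return [i for i, ei in enumerate(e) if ei < lower or ei > upper]
```

For example, with ten records at 100 in the previous period and `[100, 102, 98, 101, 99, 103, 97, 100, 101, 500]` in the current one, only the last record (index 9) is flagged.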

  8. GDA Concepts • Corporations can use either generic or detail fields to report their results

  9. GDA Concepts • Block is defined by a generic and its details • Generic field is not a total • Goal is to impute the most significant detail variables when a generic amount has been reported • GDA: Generic to detail allocation

  10. Current method • Uses imputation classes based on industry code and company size • First 2 digits of NAICS (about 25 industries) • Three revenue size classes (boundaries at 5 million and 25 million) • Calculates ratios within imputation classes for each block • Uses all non-zero and non-missing details • Uses only details reported at least 10% of the time (5% for block General Farm Expense) • Assigns ratios to businesses with a generic
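The current method can be sketched in Python as follows. This is an illustration under stated assumptions, not the production code: the function names are hypothetical, revenue is assumed to be in dollars with the 5 and 25 million boundaries treated as lower-inclusive, the 10% reporting threshold is assumed to be applied within each imputation class, and ratios are computed as each detail's share of the class total.

```python
from collections import defaultdict


def class_key(naics, revenue):
    """Imputation class: first 2 digits of NAICS and one of three revenue sizes."""
    if revenue < 5_000_000:
        size = "small"
    elif revenue < 25_000_000:
        size = "medium"
    else:
        size = "large"
    return (str(naics)[:2], size)


def class_ratios(donors, min_rate=0.10):
    """Distribution ratios for one block, per imputation class.

    donors: (naics, revenue, {detail: amount}) for businesses that reported details.
    Details reported (non-zero, non-missing) by fewer than min_rate of a class's
    donors are left out of that class's ratios.
    """
    groups = defaultdict(list)
    for naics, revenue, details in donors:
        groups[class_key(naics, revenue)].append(details)
    ratios = {}
    for key, records in groups.items():
        counts, totals = defaultdict(int), defaultdict(float)
        for details in records:
            for name, amount in details.items():
                if amount:  # skips zero and missing (None) values
                    counts[name] += 1
                    totals[name] += amount
        kept = {d: t for d, t in totals.items() if counts[d] / len(records) >= min_rate}
        block_total = sum(kept.values())
        if block_total > 0:
            ratios[key] = {d: t / block_total for d, t in kept.items()}
    return ratios


def allocate(generic, naics, revenue, ratios):
    """Split a reported generic amount using the ratios of the business's class."""
    shares = ratios.get(class_key(naics, revenue), {})
    return {d: generic * r for d, r in shares.items()}
```

Computing ratios from class totals, rather than averaging each business's own shares, weights larger businesses more heavily; this is one reasonable choice for a method originally aimed at good macro results.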

  11. Current method • Originally proposed as a solution with good macro (aggregate) results • Now need good micro (business) level results for TRP • Problems • Imputation classes are frequently not homogeneous in terms of distribution • A large number of small imputation classes

  12. Other methods considered • Historic imputation method • Scores method • Cluster method

  13. Historic imputation method • Assumes distributions of details are the same from one year to the next • Problems • A change in business strategy or characteristics is not captured by this method • Most businesses that reported details in the previous year also report them in the current year, leaving few businesses that can be imputed with this method (~5% on all blocks tested) • Requires another method for the remaining businesses
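A minimal sketch of historic imputation for one business and one block, assuming the business's previous-year details are available as a dictionary. The function name is hypothetical, and `fallback` is a hypothetical hook standing in for whichever method handles businesses without usable history.

```python
def historic_allocation(generic, prev_details, fallback=None):
    """Split this year's generic amount using the business's own year n-1 details.

    prev_details: {detail: amount} reported by the same business last year.
    fallback: callable(generic) -> {detail: amount} for businesses without history.
    """
    # Keep only positive, non-missing amounts from last year.
    usable = {d: a for d, a in prev_details.items() if a is not None and a > 0}
    total = sum(usable.values())
    if total > 0:
        return {d: generic * a / total for d, a in usable.items()}
    if fallback is None:
        raise ValueError("no usable previous-year details and no fallback supplied")
    return fallback(generic)
```

For example, `historic_allocation(200, {"rent": 30, "interest": 10})` returns `{"rent": 150.0, "interest": 50.0}`.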

  14. Scores method • Uses response/non response models for each detail • Groups businesses into imputation classes on the basis of percentiles of response probability • Calculates ratios within imputation classes • Assigns ratios to businesses with a generic
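The grouping and ratio steps of the scores method can be sketched as below. The response probabilities are assumed to come from an already fitted per-detail response model (for example, a logistic regression, not shown), and the number of percentile classes is a hypothetical parameter; the presentation specifies neither.

```python
import bisect
import statistics
from collections import defaultdict


def score_classes(scores, n_classes=5):
    """Map each response probability to a percentile class 0 .. n_classes - 1."""
    cuts = statistics.quantiles(scores, n=n_classes)
    return [bisect.bisect_right(cuts, s) for s in scores]


def ratios_by_class(classes, details, is_donor):
    """Detail ratios computed from the donors (detail reporters) in each class."""
    totals = defaultdict(lambda: defaultdict(float))
    for cls, record, donor in zip(classes, details, is_donor):
        if donor:
            for name, amount in record.items():
                totals[cls][name] += amount
    ratios = {}
    for cls, by_detail in totals.items():
        block_total = sum(by_detail.values())
        if block_total > 0:
            ratios[cls] = {name: t / block_total for name, t in by_detail.items()}
    return ratios
```

Each business with only a generic amount would then receive the ratios of its score class.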

  15. Scores method • Problems • Need to create a model for each detail • Unclear how to handle blocks with many details (5 or more) that are frequently reported • This method was excluded because of its difficulty in coping with blocks with a moderate to large number of details

  16. Cluster method • Divides businesses into imputation classes on the basis of response patterns to details • Uses clustering or dominant detail method • Uses discriminatory models (parametric or not) to assign businesses with generic to imputation classes • Calculates ratios within imputation classes • Assigns ratios to businesses with a generic

  17. Cluster method • Problems • For certain blocks it can be difficult to find good variables on which to discriminate • Issue of how often clustering method and models should be reviewed

  18. Comparing the methods • Estimate distributions of known data for year n from ratios calculated for year n-1 • Create a benchmark file • Reported details in years n-1 and n • Put all details into generic fields in year n • Calculate ratios from businesses in year n-1 for all methods • Assign ratios to businesses in year n • Compare the results to the reported fields
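The benchmark procedure above can be sketched as a small harness that accepts any method as a pair of functions: one that fits ratios on year n-1 and one that assigns them to a business. The pooled-ratio method and `macro_totals` are hypothetical placeholders for illustration; they are not the macro or micro statistics used in the study.

```python
from collections import defaultdict


def pooled_ratios(year_prev):
    """Example ratio model: one set of ratios pooled over all year n-1 businesses."""
    totals = defaultdict(float)
    for details in year_prev.values():
        for name, amount in details.items():
            totals[name] += amount
    grand = sum(totals.values())
    return {name: t / grand for name, t in totals.items()}


def assign_pooled(model, business, generic):
    """Example assignment rule: apply the same ratios to every business."""
    return {name: generic * r for name, r in model.items()}


def run_benchmark(year_prev, year_curr, fit_ratios, assign):
    """Fit a method on year n-1, then re-impute year-n details collapsed into a generic."""
    # Keep only businesses that reported details in both years.
    common = sorted(year_prev.keys() & year_curr.keys())
    model = fit_ratios({b: year_prev[b] for b in common})
    estimates, truth = {}, {}
    for b in common:
        generic = sum(year_curr[b].values())  # put all year-n details into the generic
        estimates[b] = assign(model, b, generic)
        truth[b] = year_curr[b]
    return estimates, truth


def macro_totals(per_business):
    """Aggregate detail amounts over businesses for a macro-level comparison."""
    totals = defaultdict(float)
    for details in per_business.values():
        for name, amount in details.items():
            totals[name] += amount
    return dict(totals)
```

Comparing `estimates` with `truth` business by business gives the micro view, and comparing their `macro_totals` gives the aggregate view.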

  19. Comparing the methods • Compare the results at the micro (businesses) and the macro (aggregate) levels • Compare true and estimated distributions

  20. Comparing the methods • Macro statistics for the jth detail in the block

  21. Comparing the methods • Micro Statistics • Median Pseudo CV for the jth detail and ith business in the block

  22. Comparing the methods • Micro Statistics • Median Pearson Contingency Coefficient for the jth detail and ith business in the block • f values represent the marginal distributions • d² represents the degree of dependency (depends on n, r and c)
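The slide's formula did not survive extraction. For reference, the textbook Pearson contingency coefficient for an r × c table with cell counts $f_{ij}$, marginal totals $f_{i\cdot}$ and $f_{\cdot j}$, and grand total $n$ is $C = \sqrt{\chi^2 / (\chi^2 + n)}$ with $\chi^2 = \sum_{i,j} (f_{ij} - f_{i\cdot} f_{\cdot j}/n)^2 / (f_{i\cdot} f_{\cdot j}/n)$. The sketch below assumes the slide's d² plays the role of $\chi^2$, which does depend on n, r and c; how the table is built for a given detail and business is not recoverable from the transcript.

```python
import math


def pearson_contingency(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n)) for an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]        # f_i. (row marginals)
    col_totals = [sum(col) for col in zip(*table)]  # f_.j (column marginals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            if expected > 0:
                chi2 += (f - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))
```

C is 0 under independence and never reaches 1; for a k × k table its maximum is √((k − 1)/k), so values from tables of different sizes are not directly comparable.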

  23. Comparing the methods • We show results for Block 8230: Other Revenue • This block has 20 details covering revenue distribution • Important for clients, as it is used in many surveys • The scores method is not shown, as it is difficult to implement with this many details

  24. Comparing the methods

  25. Results

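  26. Cluster methodology • Most blocks use dominant-detail (attractor x) clusters to define the imputation classes • A business i belongs to cluster j of attractor x, where x > 50, if $100\, y_{ij} / \sum_{k} y_{ik} \ge x$, where $y_{ij}$ is the total value reported by business i in detail j; that is, detail j accounts for at least x% of the business's reported details. If this condition holds for no detail, the business is assigned to an additional cluster J + 1, where J is the number of details in the block.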
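The dominant-detail rule can be sketched as follows, assuming non-negative detail amounts and that "at least x%" is the intended comparison (the exact inequality was lost in extraction); the function name and the default `x=60` are hypothetical. Requiring x > 50 guarantees that at most one detail can qualify, so every business lands in exactly one cluster.

```python
def dominant_cluster(details, order, x=60):
    """Return the 1-based cluster of a business under the dominant-detail rule.

    details: {detail: amount} reported by the business (amounts assumed >= 0).
    order: the block's J detail names; cluster j means detail j is dominant.
    x: attractor threshold in percent; must exceed 50.
    Returns J + 1 when no detail reaches x% of the business's reported total.
    """
    if x <= 50:
        raise ValueError("x must exceed 50 so that at most one detail can dominate")
    total = sum(amount or 0 for amount in details.values())
    if total > 0:
        for j, name in enumerate(order, start=1):
            if 100 * (details.get(name) or 0) / total >= x:
                return j
    return len(order) + 1
```

With `order = ["rent", "interest", "royalties"]`, a business reporting rent 80 and interest 20 falls in cluster 1, while one reporting 50 and 50 falls in the extra cluster 4.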

  27. Cluster methodology • Distribution ratios to details are calculated for each cluster • Discriminatory models are then created (nonparametric for most blocks) to assign businesses with a generic • Use variables on industry (NAICS), location (province), size (revenue, log revenue), details and totals of details in other blocks
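The presentation does not say which nonparametric model assigns generic-only businesses to clusters. As one possible example, a k-nearest-neighbour vote is sketched below; the function name is hypothetical, the feature vectors (such as log revenue or totals from other blocks) are assumed to be numeric and already scaled, and categorical variables like NAICS or province would need an encoding not shown here.

```python
import math
from collections import Counter


def knn_assign(train, query, k=3):
    """Assign a business with only a generic amount to a cluster by majority vote
    of its k nearest detail-reporting neighbours.

    train: list of (feature_vector, cluster) for businesses whose cluster is known.
    query: feature vector of the business to classify.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(cluster for _, cluster in nearest)
    return votes.most_common(1)[0][0]
```

Once a cluster is chosen, the business receives that cluster's distribution ratios.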

  28. Cluster methodology • Generic amounts are assigned to details in the following 3 ways • If a generic amount but no details are reported, the ratios are assigned as calculated • If a generic amount and all details with a ratio greater than 0% are reported, the ratios are assigned as calculated • If a generic amount and some, but not all, details are reported, the ratios are pro-rated and the generic is assigned only to the details that were not reported
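The three assignment rules can be sketched as below, assuming the cluster's ratios sum to 1, that `reported` holds the names of the details the business reported, and that the returned amounts are added to whatever the business reported; the function name is hypothetical. In every case the amounts returned sum to the generic amount.

```python
def allocate_generic(generic, ratios, reported):
    """Distribute a generic amount over a block's details following the three cases.

    ratios: {detail: share} for the business's cluster (shares sum to 1).
    reported: set of detail names the business reported alongside the generic.
    Returns {detail: amount added from the generic}.
    """
    positive = {d for d, r in ratios.items() if r > 0}
    if not reported or positive <= reported:
        # Cases 1 and 2: no details reported, or every detail with a positive ratio reported.
        return {d: generic * ratios[d] for d in positive}
    # Case 3: pro-rate the ratios over the unreported details only.
    missing = positive - reported
    scale = sum(ratios[d] for d in missing)
    return {d: generic * ratios[d] / scale for d in missing}
```

For example, with ratios of 0.5, 0.3 and 0.2 and only the first detail reported, a generic of 100 is split 60 and 40 between the two unreported details.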

  29. Cluster methodology • Gives better micro results • Improved data for tax replacement • Macro results remain similar to current methodology • Micro results are consistent year to year

  30. Future work and conclusions • The cluster methodology will be implemented for reference year 2006 for the Income Statement • Model fitting and implementation for Balance Sheet will follow • Review of models and clustering methods as deemed appropriate

  31. Contact Information • Jessica.andrews@statcan.ca • Francois.brisebois@statcan.ca • Nathalie.hamel@statcan.ca
