Improving the Quality of Tax Statistics: Recent Innovations in Editing and Imputation Techniques at the Statistics of Income Division of the U.S. Internal Revenue Service Scott Hollenbeck – Scott.M.Hollenbeck@irs.gov Barry Johnson – Barry.W.Johnson@irs.gov Melissa Ludlum – Melissa.R.Ludlum@irs.gov
Today’s Presentation • Overview of Statistics of Income (SOI) • Dealing with Missing Data • Recent Innovations • Future Plans
What Does SOI Do? • Primary source of U.S. tax data • Data from 110 tax returns and information documents • Test and correct data collected during administrative processing (IRS Masterfile) • Collect extensive additional data from forms, schedules and attachments • Most projects collect data from samples • Products • Micro data files for U.S. Treasury Department & Congress • Public-use files • Tables and analysis (www.irs.gov/taxstats)
SOI Data Collection Systems • Maintains computer network separate from main IRS processing • Data collection takes place in IRS Submissions Processing Centers • Graphical User Interface (GUI) systems based in ORACLE • Data tested for internal consistency • Post-edit processing overseen by headquarters staff
Three Major SOI Programs • Individual Income Tax • Filed by individuals and married couples to report most forms of personal income • 133 million returns filed in 2006 • Corporation Income Tax • Filed by incorporated businesses to report income from parent corporation and subsidiaries • 2.5 million returns filed in 2006 • Tax-exempt Organizations • Annual information returns report assets, income, expenses • 833,000 returns filed in 2006
Missing Data – Unit Nonresponse • Causes • Extensions/late-filed returns • Tax evasion • Strategies • Update values from prior year using survey responses • Utilize records for recent prior years filed during the selection period
Missing Data – Item Nonresponse • Causes • Taxpayer neglects to provide attachments • Paper return is being used by another IRS function • Strategies • Use IRS Masterfile data for key values • Impute values based on existing data and information provided on prior and/or subsequent return • Surveys and direct contact with preparers
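The item-nonresponse strategies above can be read as a fallback order: use the value reported on the return, then the IRS Masterfile value, then a value from a prior-year return, and otherwise refer the item for follow-up with the preparer. The sketch below illustrates that order only. The function name, field names, and dollar amounts are hypothetical, and carrying a prior-year value forward unchanged is a simplifying assumption rather than SOI's actual imputation rule.

```python
def impute_item(field, current, masterfile, prior_year):
    """Return (value, source) for one field of a sampled return.

    Sources are tried in a fixed order; the first non-missing value wins.
    All record layouts here are hypothetical.
    """
    candidates = (
        ("reported", current),
        ("masterfile", masterfile),
        ("prior_year", prior_year),
    )
    for source, record in candidates:
        value = record.get(field)
        if value is not None:
            return value, source
    # Nothing usable on file: leave the item missing and flag it for contact.
    return None, "follow_up"


# Hypothetical records for one taxpayer.
current = {"wages": 52000, "total_income": None}
masterfile = {"total_income": 61000}
prior_year = {"total_income": 58000, "charitable_gifts": 1200}

for name in ("wages", "total_income", "charitable_gifts", "interest"):
    print(name, impute_item(name, current, masterfile, prior_year))
```

Returning the source alongside the value keeps imputed items distinguishable from reported ones in later analysis.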
What’s New? • Digital images of tax returns • Electronic filing • Automated error correction/imputation routines
Digital Return Images • In 1998 SOI began scanning operations • Images stored in Tagged Image File Format (TIFF) • In 2006, imaged more than 71.5 million pages from 30 different tax and information returns • Many users: • SOI headquarters staff • SOI edit operations • IRS Functions • General Public (tax-exempt organizations only)
Split-Screen Edit Systems • Combines scanned image and GUI edit system on a single 24-inch wide-aspect monitor • Image displayed using Adobe Acrobat or specially adapted ORACLE programs • Image and edit systems are synchronized • Online access to instructions, dictionaries, other tools
Split-Screen Edit Systems • Positive feedback from editors • Slight overall improvement in productivity and quality • Images available to geographically dispersed workforce • Reduced storage of paper documents • Reduced impact on other IRS functions
Electronic Filing of Tax Returns • 2004 – Modernized electronic filing (MeF) began • Uses Extensible Markup Language (XML) to capture: • Numeric and character strings supplied by taxpayer • Information tags • 2005 – Mandatory e-file for large businesses and tax-exempt organizations • 20.5% SOI sample of corporate income taxes • 13.5% SOI sample of tax-exempt organizations
SOI Use of MeF Data • In 2006, SOI developed programs to render digital images from XML data • Edit returns using split-screen applications • In 2007, SOI will populate ORACLE data tables directly with XML data • Editors will validate data, supply codes and allocate certain data items
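As a rough illustration of populating a data table directly from tagged XML, the sketch below parses one return into a flat row, with each tag becoming a column. The tag names, the sample document, and the `NUMERIC_FIELDS` set are hypothetical; real MeF schemas are far larger and use XML namespaces, and loading the row into ORACLE is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified return; not an actual MeF schema.
SAMPLE_XML = """<Return>
  <EIN>000000000</EIN>
  <TaxYear>2006</TaxYear>
  <TotalAssets>1500000</TotalAssets>
  <TotalRevenue>230000</TotalRevenue>
  <ProgramServiceExpenses></ProgramServiceExpenses>
</Return>"""

# Tags to store as integers; everything else stays a character string.
NUMERIC_FIELDS = {"TaxYear", "TotalAssets", "TotalRevenue", "ProgramServiceExpenses"}


def xml_to_row(xml_text):
    """Flatten the direct children of the root element into a dict."""
    root = ET.fromstring(xml_text)
    row = {}
    for element in root:
        text = (element.text or "").strip()
        if text == "":
            row[element.tag] = None  # item left blank by the filer
        elif element.tag in NUMERIC_FIELDS:
            row[element.tag] = int(text)
        else:
            row[element.tag] = text
    return row


row = xml_to_row(SAMPLE_XML)
print(row)
```

Because the tags identify each value, no transcription step is needed; blank items surface as `None` and can be passed to editors for validation or allocation.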
Electronic Filing of Tax Returns • Individual income tax returns • 1986 – E-file through paid preparers • 1992 – E-file from home computers allowed • 1994 – 98% of all filers eligible to e-file • 2006 – 73 million returns, or 54%, e-filed • Data stored in Tax Return Database (TRDB) • ASCII data, not tagged XML • 2010 – Scheduled for conversion to MeF
SOI Individual Income Tax Program • Sample of returns processed differently depending on certain criteria • Edited returns • “Missing returns” • Forced closed returns
Individual Processing Programs • Online editing system – editors transcribe, code and review any potential data discrepancies • Post Edit Reconciliation Process (PERP) – automated computer program which validates and adjusts data
Edited Returns • Edited returns are processed through the online editing system by an editor, then reviewed using the PERP program • Prior to Tax Year 2004, all sampled returns which were not “missing” were manually edited • Currently only paper returns and electronically filed returns with specific characteristics are edited through the online system
“Missing Returns” • Each year, approximately 250 paper returns selected for the sample are not located • Limited IRS Masterfile data available • PERP program used to impute missing details of forms and schedules
Forced Closed Returns • Automated processing of certain e-filed returns in the SOI sample • These returns bypass the online editing system and are processed through the PERP program • Returns with possible discrepancies are reviewed by a National Office analyst • Returns that pass all tests are considered “forced closed” and added to the final data file
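The routing described above can be sketched as a small decision function: paper returns and e-filed returns outside the eligibility criteria go to the online editing system, eligible e-filed returns are tested automatically, and only those passing every test are forced closed. The two consistency tests, the field names, and the `eligible` flag are hypothetical stand-ins; the actual PERP tests and eligibility criteria are not given in this presentation.

```python
def consistency_failures(ret):
    """Return a list of failed tests for one return (hypothetical tests)."""
    failures = []
    components = ret["wages"] + ret["interest"] + ret["other_income"]
    if components != ret["total_income"]:
        failures.append("income components do not sum to total income")
    if ret["tax"] < 0:
        failures.append("negative tax")
    return failures


def route_return(ret):
    """Decide how a sampled return is processed."""
    if not ret["e_filed"] or not ret["eligible"]:
        return "online_edit"      # edited manually, then reviewed with PERP
    if consistency_failures(ret):
        return "analyst_review"   # possible discrepancy found
    return "forced_closed"        # passed all tests; goes to the final file


# Hypothetical e-filed return that satisfies both tests.
clean = {
    "e_filed": True, "eligible": True,
    "wages": 40000, "interest": 500, "other_income": 0,
    "total_income": 40500, "tax": 4200,
}
print(route_return(clean))
print(route_return({**clean, "total_income": 41000}))
print(route_return({**clean, "e_filed": False}))
```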
Results from Forced Closing Returns • Tax Year 2004 – First year using automated closing of selected electronically filed returns • Total sample size – 200,295 returns • Electronically filed – 64,670 returns • “Forced Closed” – 18,193 returns • Editing hours saved – 1,400 hours
Results from Forced Closing Returns • Tax Year 2005 – Second year of program, expanded criteria for returns eligible to be “forced closed” • Total sample size – 292,837 returns • Electronically filed – 114,897 returns • “Forced Closed” – 47,753 returns • Editing hours saved – 4,100 hours
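To compare the two years, it helps to express the counts above as shares. The short calculation below uses only the figures reported on the two results slides; the derived ratios (share of the sample e-filed, share of e-filed returns forced closed, and editing minutes saved per forced-closed return) are simple arithmetic added here, not statistics reported by the authors.

```python
# Figures as reported for Tax Years 2004 and 2005.
results = {
    2004: {"sample": 200_295, "e_filed": 64_670, "forced_closed": 18_193, "hours_saved": 1_400},
    2005: {"sample": 292_837, "e_filed": 114_897, "forced_closed": 47_753, "hours_saved": 4_100},
}


def summarize(year_data):
    """Derive shares and per-return time savings from the raw counts."""
    return {
        "e_filed_share_of_sample": year_data["e_filed"] / year_data["sample"],
        "forced_closed_share_of_e_filed": year_data["forced_closed"] / year_data["e_filed"],
        "forced_closed_share_of_sample": year_data["forced_closed"] / year_data["sample"],
        "minutes_saved_per_forced_closed": 60 * year_data["hours_saved"] / year_data["forced_closed"],
    }


summary = {year: summarize(data) for year, data in results.items()}
for year, stats in summary.items():
    print(year)
    for name, value in stats.items():
        print(f"  {name}: {value:.3f}")
```

On these figures, about 28% of e-filed sample returns were forced closed in Tax Year 2004 and about 42% in Tax Year 2005, consistent with the expanded eligibility criteria in the second year.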
The Future – Data • More returns and information documents will be filed electronically • Optical Character Recognition or Intelligent Character Recognition will be used to capture data from paper-filed returns • Data will be available in real time • Enable larger sample sizes and increased use of population files
The Future – Field Operations • Increased resources dedicated to resolving data inconsistencies as opposed to data transcription • Paperless environment – use of electronic data or digital images created from paper returns • Increased use of prior year data to identify and correct data anomalies
The Future – Products • Improvements in technology and increased use of electronic filing will allow SOI to produce more data, more quickly and more efficiently • Increased sample sizes will allow small area estimates • Population files will allow for creation of ad hoc panels, linkage of data items across tax form types and research on infrequent data items