This article discusses the background and challenges of compiling administrative data, with a focus on taxation. It explores the data semantics of register data and proposes a taxation metadata definition. The article also presents some practical steps for the future.
CONCEPTUAL MODELLING OF ADMINISTRATIVE REGISTER INFORMATION AND XML - TAXATION METADATA AS AN EXAMPLE Heikki Rouhuvirta, Statistical Methodology R&D heikki.rouhuvirta@stat.fi Ottawa, 16-18 May 2005
Contents
• Background
• The Challenge
• Primary Questions
• Test Case – Finnish Taxation
• Data Semantics of Register Data
• Taxation Metadata Definition
• Some Results
• The Future
• Some Practical Steps on the Way
Background
• Present state of compilation of administrative data
  • as the challenge
• CoSSI
  • as the methodological framework for data semantics of registers
• Codacmos
  • as the organizational base for concept testing
Present state of compilation of administrative data
[Figure: data flow from the administrative data source to the destination NSI. Labels recoverable from the diagram: Administrative Data Source (operational systems or data warehouses, e.g. RDB; Handbook of Taxation etc.); data gathering with tailor-made programs or ETL products (e.g. SQL; e.g. Informatica, Oracle); data transmission as a sequential/flat file over physical media (CD-ROM, magnetic tape) or a data communication network (internet, WAN; FTP + VPN); Destination NSI: data extraction/transformation/loading into a relational DB data store with tailor-made programs or ETL products; data combining of Statistical Register Data and Survey Data; Statistical Application; statistician; Statistical Information.]
CoSSI
• Common Structure of Statistical Information – CoSSI
  • covers different ways of organizing statistical data (statistical data matrix and statistical table)
  • includes a model for defining the content information of statistics
  • includes a model for defining the methodology used in statistics (e.g. measuring and classification)
  • manages the complexity of statistical information (e.g. nested variable structures)
  • includes definitions for all types of statistical information: data, metadata for files, statistical metadata, quality declarations, charts
  • the main objective was to organise statistical data so that they also contain statistical metadata (describing both the structure and the logic of statistical metadata at the same time)
• Definition descriptions are available on the web at: http://www.stat.fi/org/tut/dthemes/drafts/cossi_definition_descriptions_v_09_2003.pdf
• On statistical metadata, see also: http://www.stat.fi/org/tut/dthemes/papers/alternative_approach_to_metadata_codacmos_2004.pdf
Codacmos
• Cluster of Data Collection Integration & Metadata Systems for Official Statistics
• EU Project 2003-2004 (IST-2001-38636)
• Consortium:
  • Italian National Statistical Institute, Statistics Finland, University Of Edinburgh, National Statistical Service of Greece, DESAN Research Solutions, Statistical Division Of Municipality Of Milan, The Finnish Tax Administration, University Of Patras, Institute Of Informatics And Statistics, University Of Athens, National Social Security Institute, Tietokarhu Ltd, Statistics Norway
• http://www.codacmos.eu.org
• TAXATION METADATA Partners: Statistics Finland, The Finnish Tax Administration and Tietokarhu Ltd
The Challenge: how can the present process, in which the description of administrative data can mostly be read only from the authorities' administrative handbooks, be transformed so that the content description of the data is present and usable both for statistics producers in the production process and for users of statistics in the distribution of statistical information?
Primary Questions
• what are the metadata of administrative data?
• how to process the metadata that specify the interpretation and use of administrative data collections and register data?
• how to combine the original data description (e.g. concept definitions of register fields) with the variable descriptions and measurement information of statistics?
• can accumulating interpretive metadata be “transported” through the processing of information and, if so, how?
Test Case – Finnish Taxation (Finnish taxation on the web at: http://www.vero.fi)
Taxation: Types and Sources of income
Income tax deductions
Data Semantics of Register Data
• Modelling methodology:
  • the starting point is to distinguish between
    • the substance concept model and
    • the information model whereby the concepts are described
• Information organizing method:
  • any which doesn't lose information
• Technology:
  • any, without restrictions
• Result:
  • Taxation metadata definition (taxmeta.dtd)
Basic Substance Concept
• Tax type: e.g. personal taxation
• Type of income: e.g. earned income, capital income
• A) Income: e.g. salary, pension
• Type of tax deduction
• B) Deduction
Description Information
• Income: e.g. salary, pension
• Deduction
• Law: reference to a section of law 1)
• Law case: reference to a law case
• Formula: how the tax is calculated 2)
• Internal instruction: instruction on a specific income and deduction area 3)
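The two layers above (substance concepts and the description information attached to them) could be serialised roughly as follows. This is a minimal sketch using Python's standard `xml.etree.ElementTree`; the element and attribute names (`taxType`, `incomeType`, `income`, `description`, `law`, `formula`, `instruction`) are hypothetical illustrations and are not taken from taxmeta.dtd, and the text values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_income_concept() -> ET.Element:
    """Build a tiny concept tree: tax type -> type of income -> income."""
    # All element and attribute names are hypothetical, not those of taxmeta.dtd.
    tax_type = ET.Element("taxType", name="Personal taxation")
    income_type = ET.SubElement(tax_type, "incomeType", name="earned income")
    income = ET.SubElement(income_type, "income", name="salary")

    # Description information attached to the substance concept (placeholders).
    desc = ET.SubElement(income, "description")
    ET.SubElement(desc, "law").text = "(reference to a section of law)"
    ET.SubElement(desc, "formula").text = "(how the tax is calculated)"
    ET.SubElement(desc, "instruction").text = "(internal instruction)"
    return tax_type


root = build_income_concept()
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Keeping the description elements as children of the concept they describe is one way to make sure the description travels with the concept; the actual DTD may organise this differently.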
Taxation Metadata Definition (taxmeta.dtd) Available on the web at: http://www.stat.fi/org/tut/dthemes/drafts/taxmeta_dtd_v_01.txt
Taxation Metadata - Logical Concept Model (I)
Taxation Metadata - Logical Concept Model (II)
… result from register standpoint
The Demonstration Report is available on the web at: http://www.stat.fi/org/tut/dthemes/papers/demoreport_on_taxation_metadata_codacmos_2004.pdf
Taxation register view
[Figure: a taxpayer's tax register record shown in two panels, "Structure view" and "Metadata view". Callouts: tax type code used in the register; plain-language code (derived or column name); value in euro; metadata.]
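The register view pairs each coded field in a taxpayer's record with its metadata. A minimal sketch of that lookup follows; the tax type codes (`T001`, `D010`, `X999`), the metadata texts, and the names `FieldMetadata` and `metadata_view` are all hypothetical and do not come from the Finnish tax register or from taxmeta.dtd.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMetadata:
    plain_name: str   # plain-language code (derived or column name)
    definition: str   # concept definition taken from the register metadata


# Hypothetical codes and descriptions, for illustration only.
REGISTER_METADATA = {
    "T001": FieldMetadata("salary", "Earned income paid by an employer"),
    "D010": FieldMetadata("deduction", "A deduction made from income"),
}


def metadata_view(record: dict[str, float]) -> list[tuple[str, str, float]]:
    """Return (code, plain-language name, value in euro) for each field."""
    rows = []
    for code, value in record.items():
        meta = REGISTER_METADATA.get(code)
        # Fields without metadata stay visible instead of being dropped.
        name = meta.plain_name if meta else "(no metadata)"
        rows.append((code, name, value))
    return rows


record = {"T001": 30000.0, "D010": 620.0, "X999": 5.0}
view = metadata_view(record)
for row in view:
    print(row)
```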
… and result from statistics standpoint
Income distribution statistics – statistical metadata
Income distribution statistics – taxation register metadata (I)
[Figure panels: statistical metadata; register metadata]
Income distribution statistics – taxation register metadata (II)
[Figure panels: statistical metadata; register metadata]
The Future
• Could it be …
  • integrated register metadata
  • a genuinely metadata-driven statistical production process
  • rich metadata present and available in all production stages, including editing as well as the transformation of register concepts into statistical concepts
  • metadata that accumulates as the process advances, without losing old metadata
  • rich metadata also available to users during the dissemination of statistical information
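The idea that metadata accumulates as the process advances, without older metadata being lost, can be sketched as an append-only history attached to a variable. This is an illustration only: the class names, stage names and texts are hypothetical and are not part of CoSSI or taxmeta.dtd.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetadataEntry:
    stage: str   # production stage that added the entry
    text: str    # the description added at that stage


@dataclass
class Variable:
    name: str
    history: list[MetadataEntry] = field(default_factory=list)

    def annotate(self, stage: str, text: str) -> None:
        """Append new metadata; earlier entries are never overwritten."""
        self.history.append(MetadataEntry(stage, text))


income = Variable("salary")
income.annotate("register", "Register concept definition of the field")
income.annotate("editing", "Values checked during data editing")
income.annotate("conceptual formation", "Mapped to a statistical concept")

stages = [entry.stage for entry in income.history]
print(stages)
```

Because entries are only appended, a user at the dissemination stage can still read the original register definition alongside everything added later.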
XML based metadata-driven statistical production
[Figure: production flow. Labels recoverable from the diagram: collection routines; transaction based data storage (RDB); Handbook of Register; XMLDB; units based data report with meta; Register Metadata (xml); 1° aggregation; Questionnaires (xml); data gathering; data transmission; xml based production system; data combining; statistical metadata based on CoSSI; units and variable based data organisation; combined data; collected data matrix based on CoSSI; data editing (checked values, new metadata); conceptual formation (new variables).]
The data matrix in the figure has one row per statistical unit $a_i$ ($i = 1, \dots, n$) and one column per variable $x_j$ ($j = 1, \dots, p$):
$$
\begin{pmatrix}
x_{11} & x_{12} & \dots & x_{1j} & \dots & x_{1p} \\
x_{21} & x_{22} & \dots & x_{2j} & \dots & x_{2p} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
x_{i1} & x_{i2} & \dots & x_{ij} & \dots & x_{ip} \\
\vdots & \vdots &       & \vdots &       & \vdots \\
x_{n1} & x_{n2} & \dots & x_{nj} & \dots & x_{np}
\end{pmatrix}
$$
Some Practical Steps on the Way
• Plan to apply this scheme to the metadata of other registers (e.g. the population register)
• Integration of the structured statistical metadata system with statistical software packages (e.g. SAS, SuperStar) for simultaneous use