Role of the IMDB in the CBA and IM Strategy. Presented to Information Management Committee. Standards Division. June 7 2010. Current content IMDB and the GSBPM Questions and questionnaires Variables Quality indicators Quality review of the IMDB Links to datawarehouses
Role of the IMDB in the CBA and IM Strategy
Presented to Information Management Committee
Standards Division
June 7, 2010
Outline
- Current content
- IMDB and the GSBPM
- Questions and questionnaires
- Variables
- Quality indicators
- Quality review of the IMDB
- Links to datawarehouses
- Role of the IMDB in the CBA
- Mapping to other metadata standards
Content in IMDB
- Inventory of statistical metadata for all surveys and statistical programs
- Inventory of all questionnaires (XHTML, PDF) – Information for Survey Participants, DLI/PUMF
- Inventory of all variables, statistical units and classifications used for collection and dissemination
- Potentially inventory of all questions, response choices, interviewer notes
- Inventory of many documents for surveys and statistical programs – to support DDI and SDMX documentation and other international reporting
  - IMF’s DQAF
- http://f6dwcsql/en.stcwiki/index.php?title=Portal:IMDB
Types of statistical metadata in and not in the IMDB
- Definitional metadata (or structural metadata – SDMX term)
  - Description of statistical data
  - Variables (statistical units, property and representation), their definitions and related classifications
  - Questions, record layouts
- Reference metadata
  - Describes statistical datasets and processes
  - Data sources, data collection, survey methodology, imputation, estimation
  - Not part of “work flow”
- Operational metadata
  - Measures of data accuracy – response rates, CVs, sample size, limited statistical release information
- Metadata not included in IMDB
  - Other operational metadata (edit failures, quality metrics, sign-offs)
  - Systems metadata (edit rules, derivation rules, coding rules, imputation and estimation rules)
  - Dataset metadata (structure, footnotes, titles), but IMDB does contain links to some record layouts, disseminated products and master data files
- Paradata (not included in IMDB)
  - Information related to statistical data and production process linked to a person, business or organization (i.e., unit in sample, unit has responded, number of attempts to reach unit)
- Labels on slide: “PASSIVE METADATA”, “ACTIVE METADATA”
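The split described on this slide can be written down as a small lookup table. The following is only an illustrative sketch: the type names and example items come from the slide, while `MetadataType`, `CATALOGUE`, `in_imdb` and `items_of_type` are hypothetical names invented for the example and do not describe how the IMDB itself is implemented.

```python
from enum import Enum
from typing import List


class MetadataType(Enum):
    """Metadata types named on the slide."""
    DEFINITIONAL = "definitional (structural)"
    REFERENCE = "reference"
    OPERATIONAL = "operational"
    SYSTEMS = "systems"
    DATASET = "dataset"
    PARADATA = "paradata"


# Example items from the slide. The boolean is True when the slide lists
# the item as held in the IMDB, False when it lists it as not included.
CATALOGUE = {
    "variable definitions": (MetadataType.DEFINITIONAL, True),
    "classifications": (MetadataType.DEFINITIONAL, True),
    "survey methodology": (MetadataType.REFERENCE, True),
    "response rates": (MetadataType.OPERATIONAL, True),
    "edit failures": (MetadataType.OPERATIONAL, False),
    "edit rules": (MetadataType.SYSTEMS, False),
    "dataset footnotes": (MetadataType.DATASET, False),
    "number of contact attempts": (MetadataType.PARADATA, False),
}


def in_imdb(item: str) -> bool:
    """Return True if the slide lists this item as stored in the IMDB."""
    return CATALOGUE[item][1]


def items_of_type(kind: MetadataType) -> List[str]:
    """Return the example items of one metadata type, sorted by name."""
    return sorted(name for name, (k, _) in CATALOGUE.items() if k is kind)
```

Note that operational metadata appears on both sides of the line: accuracy measures such as response rates are in the IMDB, while edit failures and sign-offs are not.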
IMDB in the survey life cycle
[Diagram. Labels: Quality management and metadata management; IMDB metadata/paradata]
- Phases: 1 Specify needs, 2 Design, 3 Build, 4 Collect, 5 Process, 6 Analyze, 7 Disseminate, 8 Archive
- Datastores: input data, micro-data, confidential aggregate data, public output data
- Operational data stores: registers, survey data, administrative data, operational data
Questions and questionnaires
- Many uses:
  - Harmonized content – approved questions and variables are stored in the IMDB (STCwiki and STCwebsite)
  - IMDB-generated questionnaires: of the 467 questionnaires on the Internet, 45% are CLF2 compliant
  - DDI 3 (DLI) – pull questions, interviewer notes and other metadata from IMDB (in RDC through Oracle forms)
  - Question inventory could be used by QDRC for testing and quality
  - Reuse of concepts and questions (CBA)
Variables
- Comprehensive inventory of variables and related classifications
- Systematically evaluating 1,400 active CANSIM tables to build variables and classifications
- To date, variables and classifications have been developed for prices, 37 out of 42 tables produced by the UES survey programs and 33 out of 230 SNA tables
- Annual Survey of Services
- Variables for harmonized content – part of HSS
- Resource intensive – need to validate variables with SMOs for economic and social statistics – what model should StatCan follow?
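The progress figures quoted on this slide are easier to compare as percentages. A minimal sketch of the arithmetic, using only the table counts given above (the `coverage` helper and the `PROGRESS` dictionary are names made up for this example):

```python
def coverage(done: int, total: int) -> float:
    """Percentage of tables with variables developed, to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * done / total, 1)


# Counts taken from the slide: (tables done, tables in scope).
PROGRESS = {
    "UES survey programs": (37, 42),
    "SNA": (33, 230),
}

for program, (done, total) in PROGRESS.items():
    print(f"{program}: {done}/{total} tables ({coverage(done, total)}%)")
# UES survey programs: 37/42 tables (88.1%)
# SNA: 33/230 tables (14.3%)
```

On these counts the UES tables are close to complete, while the SNA tables account for most of the remaining effort among the two groups quoted.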
Variables
- Pilot project – Statistics by variable
- Prototype developed with 5 variables from harmonized content and links to CANSIM
- Working with Client Services and Dissemination
- Usability testing completed – March 2010
- Expand to 30 variables on Analysts and researchers portal (June 2010)
- Present to Dissemination and Communication Committee for approval
- ‘Under construction’ approach – incrementally populate variable portal
Data quality indicators
- Need to improve coherence across surveys and statistical programs in data accuracy section of IMDB
- Integrate DPR indicators (accuracy, relevance, organizational efficiency) and indicators from 2009 Quality Guidelines – see 2010 METIS paper
- Approach – ABS has Quality declarations for each survey and ISTAT has indicators for GSBPM processes
- Crosscutting – Corporate Planning, Quality Secretariat and Standards Division work together
- Are there additional DQ indicators required from an IM perspective?
Data quality indicators
- Quality declaration
Links to datawarehouses (“data centres”)
- T2 TAX DW uses IMDB classifications (value domains) and its value domain loader tool:
  - Enterprise Complexity Categories TDD
  - Processing Environment TDD
  - Name of Geographic Location and Geography – Canada, Region, Province TDD
  - Imputation Categories TDD
  - NAICS Groups
  - Reference Year TDD
  - Survey Universe Categories TDD
- All warehouses have a metadata menu option and links to IMDB content via STCwiki, but these are not used all the time
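To make the idea of a warehouse reusing an IMDB value domain concrete, here is a minimal sketch of a value domain as a set of permissible codes with labels. It is not the value domain loader tool mentioned above, whose interface is not described in the deck. The `ValueDomain` class and the codes `CAN`, `REG` and `PROV` are hypothetical; only the three labels come from the slide.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ValueDomain:
    """A named set of permissible values, stored as code -> label."""
    name: str
    permissible_values: Dict[str, str]

    def is_valid(self, code: str) -> bool:
        """Return True if the code belongs to this value domain."""
        return code in self.permissible_values

    def label(self, code: str) -> str:
        """Return the label for a code; raises KeyError if the code is unknown."""
        return self.permissible_values[code]

    def invalid_codes(self, codes: Iterable[str]) -> List[str]:
        """Return the codes that are not permissible, in input order."""
        return [c for c in codes if not self.is_valid(c)]


# Labels are from the slide; the codes are invented for illustration.
GEOGRAPHY_LEVEL = ValueDomain(
    name="Geography level",
    permissible_values={"CAN": "Canada", "REG": "Region", "PROV": "Province"},
)
```

A warehouse load step could then call `GEOGRAPHY_LEVEL.invalid_codes(...)` on an incoming column and reject or flag any code that the shared value domain does not define.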
Links to datawarehouses (“data centres”)
- Involved with the prototype of information sharing between the ASM and SNA datawarehouses (CBA)
- SNA is the lead on this project
- Connectivity between data centres and processes both upstream and downstream
- Pull passive metadata from the IMDB (June 2010)
- May require other types of metadata from other systems, or expansion of the IMDB
- Gain practical understanding of how to organize data centres and make recommendations to the IMC and CBA Management Committee
Role of the IMDB in CBA
- Report circulated to ARB and CBA Task Force recommends that:
  - The IMDB be the authoritative source of definitional and reference metadata for Statistics Canada
  - Metadata be “reused” to support GSBPM phases
  - Other metadata/metainformation systems ‘pull’ existing metadata from the IMDB
  - International metadata standards be used for ‘meta-driven systems’
- Review of IMDB architecture and other meta-information systems commissioned by ARB/Classification Systems Branch
- Preliminary results presented to IM Committee
- Social survey processing environment will be linked to the IMDB
Metadata Environment: TO-BE
[Diagram]
Statistics Canada • Statistique Canada
Mapping IMDB to other metadata standards
[Diagram relating the IMDB metamodel to other standards]
- Standards shown: DDI, ISO11179 CMR, XBRL, SDMX, CWM
- Uses shown: data collection; data dissemination; data transfer between organizations and organizational units; database interoperability; thesauri/search resources
- Data shown: social statistics (Data Liberation Initiative (DLI), STC microdata files); National Accounts, BOP, Trade in Services; financial data from businesses; Tax, Health and SNA datawarehouses
Mapping IMDB to other metadata standards
- DDI (Data Documentation Initiative) already used in STC
- PUMF and “analytic” microdata files are DDI-XML tagged by DLI – STC and universities, and CRDCN (only until 2012)
- CRDCN has asked STC to continue DDI tagging after 2012
- WG on Social Survey Metadata Environment is ensuring the ability to generate DDI output from generalized processing systems
- Should the IMC recommend that DDI be a standard output of microdata files?
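As a rough illustration of what generating DDI output from a processing system could involve, the sketch below serializes one variable and its question text into a simplified, DDI Codebook-style XML fragment. Several assumptions apply: the element names (`codeBook`, `dataDscr`, `var`, `labl`, `qstn`, `qstnLit`) follow DDI Codebook 2.x as commonly documented, the fragment omits namespaces and required study-level sections and is not validated against the schema, and the function name and the sample variable are hypothetical. The deck does not say which DDI version or structure the working group targets.

```python
import xml.etree.ElementTree as ET


def variable_to_ddi(name: str, label: str, question: str) -> str:
    """Return a simplified DDI Codebook-style fragment for one variable."""
    code_book = ET.Element("codeBook")
    data_dscr = ET.SubElement(code_book, "dataDscr")
    var = ET.SubElement(data_dscr, "var", {"name": name})
    ET.SubElement(var, "labl").text = label          # variable label
    qstn = ET.SubElement(var, "qstn")                # question block
    ET.SubElement(qstn, "qstnLit").text = question   # literal question text
    return ET.tostring(code_book, encoding="unicode")


# Hypothetical variable, used only to show the shape of the output.
print(variable_to_ddi("AGE", "Age of respondent", "What is your age?"))
```

The point of the sketch is that the variable name, label and question text are exactly the kinds of definitional metadata the IMDB already holds, so DDI output could in principle be produced from stored metadata without re-keying it.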
IMDB systems development
- 3-year work plan in place
- One interface – METAWEB – for entering and updating metadata
- Metadata entry done by Standards and only some divisions
- Can the responsibility for entering and updating metadata (and variables) be pushed out to SM divisions or the IM Secretariat?