StatLine 4 metadata implementation
Edwin de Jonge
Statistics Netherlands
What is StatLine?
• StatLine is the online output database of Statistics Netherlands.
• Primary output channel
• Contains all published data
• Current size: 1500 data cubes, 2 billion data cells, over 150 million facts
• Contains much functionality, including a very good search engine
StatLine in Business Architecture
• StatLine in the statistical process
What is StatLine 4?
• Redesign of the current StatLine 3 dissemination software
• Reasons for the redesign:
  • Improve coherence
  • Changing publication policy
  • Handle time dependence
  • Archiving
  • Many new features
StatLine coherence
• Ideally: StatLine is coherent and consistent
• Currently (StatLine 3):
  • 1500 independent data cubes
• StatLine 4:
  • Data cubes share metadata: centrally moderated, quality improvement
  • Data cubes share data: each fact is stored once
StatLine 4 metadata management
• Metadata management is centralized:
  • What? Conceptual metadata:
    • Classifications
    • Variables
  • By whom? Two organizational units:
    • Coordination: maintaining the structure and meaning of classifications
    • Dissemination: textual editing and translations
• Data producers own the data, but not the metadata.
• Result: every fact in StatLine 4 uses central classifications.
Classification status
• In StatLine 4 each classification has a status:
  • (Inter)national standard
  • Coordinated: within Statistics Netherlands
  • Shared: shared but not coordinated
  • Private: can only be used by 1 data cube; only during conversion
• This status is used for coordination purposes.
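The four statuses can be pictured as a small enumeration. The sketch below is only an illustration: the names `ClassificationStatus` and `may_attach` are hypothetical, and the rule that a private classification is limited to exactly one data cube while all other statuses are unrestricted is an assumption read from the slide, not the actual StatLine 4 implementation.

```python
from enum import Enum


class ClassificationStatus(Enum):
    """Coordination status of a classification (labels follow the slide)."""
    STANDARD = "(inter)national standard"
    COORDINATED = "coordinated within Statistics Netherlands"
    SHARED = "shared but not coordinated"
    PRIVATE = "private to one data cube"


def may_attach(status: ClassificationStatus, cubes_already_using: int) -> bool:
    """Return True if one more data cube may use the classification.

    Assumption: a private classification serves a single cube only;
    every other status may be used by any number of cubes.
    """
    if status is ClassificationStatus.PRIVATE:
        return cubes_already_using == 0
    return True
```

A coordinator could use such a check to flag private classifications that a second cube tries to reuse, which is a signal that the classification should be promoted to shared or coordinated.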
Cristal model
• StatLine 4 uses the Cristal model
  • Model for classifications and variables (Van Bracht et al.)
  • Focus on the conceptual and value domain (ISO 11179)
• Model elements:
  • Category (value):
    • Value of a variable; creates a subpopulation, e.g. male (gender: male)
    • Can be part of another category (partial order)
  • Level:
    • Set of disjoint categories
    • Equals a “flat” classification
Cristal model (2)
• Hierarchy:
  • Sequence of levels (total order) with contained categories
  • Every category in a hierarchy has 1 parent in the higher level
  • Equals a “hierarchical” classification
• Classification:
  • Set of hierarchies with contained levels and categories
  • Equals a family of hierarchical classifications
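The model elements above (category, level, hierarchy, classification) can be sketched as plain data classes. This is a minimal illustration, not the Cristal schema itself: all class and field names are hypothetical, the region codes are invented, and "1 parent in the higher level" is interpreted here as "a parent in the level directly above".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    code: str
    label: str
    parent: Optional["Category"] = None  # category in the level above, if any


@dataclass
class Level:
    """A set of disjoint categories, i.e. a flat classification."""
    name: str
    categories: List[Category] = field(default_factory=list)


@dataclass
class Hierarchy:
    """An ordered sequence of levels, highest level first."""
    name: str
    levels: List[Level] = field(default_factory=list)

    def is_valid(self) -> bool:
        """Check that each category below the top level has its parent
        in the level directly above it (assumed reading of the slide)."""
        for upper, lower in zip(self.levels, self.levels[1:]):
            upper_codes = {c.code for c in upper.categories}
            for cat in lower.categories:
                if cat.parent is None or cat.parent.code not in upper_codes:
                    return False
        return True


@dataclass
class Classification:
    """A family of hierarchies over shared levels and categories."""
    name: str
    hierarchies: List[Hierarchy] = field(default_factory=list)


# Hypothetical example: a two-level regional hierarchy.
country = Category("NL", "Netherlands")
provinces = [Category("UT", "Utrecht", parent=country),
             Category("GR", "Groningen", parent=country)]
regions = Hierarchy("country/province",
                    [Level("country", [country]), Level("province", provinces)])
region_classification = Classification("Region", [regions])
```

Disjointness of the categories within a level is a property of the populations they describe, so this sketch does not try to verify it.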
Cristal model (3)
• Classification versioning:
  • Each metadata object has a lifetime (begin and end date)
  • Each metadata object can have a predecessor and a successor
  • Models versions of categories, levels and hierarchies
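Lifetimes plus predecessor and successor links form a chain of versions that can be searched by date. The sketch below assumes an inclusive end date and an open-ended lifetime when the end date is missing; `VersionedObject`, `link` and `version_on` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(eq=False)  # identity comparison; the links below are cyclic
class VersionedObject:
    code: str
    label: str
    begin: date
    end: Optional[date] = None  # None means "still valid" (assumption)
    predecessor: Optional["VersionedObject"] = None
    successor: Optional["VersionedObject"] = None

    def valid_on(self, day: date) -> bool:
        """True if `day` falls within the lifetime (end date inclusive)."""
        return self.begin <= day and (self.end is None or day <= self.end)


def link(old: VersionedObject, new: VersionedObject) -> None:
    """Record that `new` succeeds `old`."""
    old.successor = new
    new.predecessor = old


def version_on(obj: VersionedObject, day: date) -> Optional[VersionedObject]:
    """Find the version in obj's chain that is valid on `day`, if any."""
    current = obj
    while current.predecessor is not None:  # rewind to the oldest version
        current = current.predecessor
    while current is not None:              # walk forward through successors
        if current.valid_on(day):
            return current
        current = current.successor
    return None
```

With this shape, a table for a past reference period can resolve each category to the version that was valid at that time, which is one way to handle the time dependence mentioned earlier.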
Cristal model (4)
• Multilingual:
  • All textual properties are multilingual
    • E.g. Mannelijk (Dutch) -> Male (English)
  • All metadata and tables can be shown in each defined language
• All textual properties have popular versions
  • E.g. Consumer Price Index -> Inflation
  • All metadata and tables can be shown in “popular” or “expert” mode
• Object class:
  • Is stored, but not coordinated (yet)
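A textual property that is both multilingual and available in "popular" and "expert" versions can be thought of as a lookup keyed by language and mode. The sketch below is an assumption about how such a lookup might behave: the class `MultilingualText`, the language codes, and the fallback from a missing popular text to the expert text are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class MultilingualText:
    # (language, mode) -> text; mode is "expert" or "popular"
    texts: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def add(self, language: str, text: str, mode: str = "expert") -> None:
        self.texts[(language, mode)] = text

    def get(self, language: str, mode: str = "expert") -> str:
        """Return the text for the language and mode.

        Assumption: if no popular text exists, the expert text is shown.
        """
        for key in ((language, mode), (language, "expert")):
            if key in self.texts:
                return self.texts[key]
        raise KeyError(f"no text for language {language!r}")


# Examples taken from the slide.
male = MultilingualText()
male.add("nl", "Mannelijk")
male.add("en", "Male")

cpi = MultilingualText()
cpi.add("en", "Consumer Price Index")
cpi.add("en", "Inflation", mode="popular")
```

Keeping the mode in the key means a table can be rendered in any defined language and either mode without changing the underlying metadata object.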
StatLine 4 conversion
• All content of the current StatLine must be converted
  • From 1500 independent cubes
  • To 1500 coordinated cubes
• Conversion means coordination!
  • Total coordination -> very long conversion
  • No coordination -> no added value
• Ergo: partial classification coordination
Conversion strategy (1)
• Strategy:
  • Coordinate standardized metadata
  • Allow non-standard metadata for a 2-year period
• Phased conversion:
  • Preparation, conversion, coordination
Conversion strategy (2)
• Preparation phase: until June 2006
  • Collect and store standard classifications
    • E.g. Time, Region (50 versions), Age, Marital status, Sex, NACE
    • Including variations (disclosure control)
  • For each data cube:
    • Check usage of standard classifications
    • Non-standard classifications are marked “private”
    • Define the StatLine 4 structure
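The per-cube check in the preparation phase can be sketched as a simple matching step. The slides do not say how a cube's classification is compared with the standard ones, so the rule used here (a classification counts as standard when all its category codes occur in one stored standard classification) is purely an assumption, as are the function name `check_cube` and the sample codes.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def check_cube(
    cube_classifications: Dict[str, Iterable[str]],
    standards: Dict[str, FrozenSet[str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each classification of a cube to ("standard", name) or ("private", name).

    Assumed rule: a non-empty set of category codes that is contained in a
    stored standard classification uses that standard; anything else is
    marked private.
    """
    result: Dict[str, Tuple[str, str]] = {}
    for local_name, codes in cube_classifications.items():
        code_set = frozenset(codes)
        match = next(
            (name for name, std in standards.items()
             if code_set and code_set <= std),
            None,
        )
        if match is not None:
            result[local_name] = ("standard", match)
        else:
            result[local_name] = ("private", local_name)
    return result


# Hypothetical data: one stored standard and a cube with two classifications.
standards = {"Sex": frozenset({"M", "F", "T"})}
cube = {"Geslacht": ["M", "F"], "Own grouping": ["A", "B"]}
report = check_cube(cube, standards)
```

The resulting report lists which classifications of the cube can be linked to central metadata immediately and which ones enter StatLine 4 as private, to be coordinated later.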
Conversion strategy (3)
• Conversion phase (June 2006):
  • Convert data cube
  • Add missing metadata to the metadata server
  • Check conversion
• Coordination phase (November 2006):
  • After conversion, StatLine 4 contains coordinated and private metadata
  • Within two years all private metadata must be replaced with coordinated metadata
Benefits of StatLine 4 metadata
• Coordinated classifications and variables
• Uniform naming and description
• Standard/coordinated metadata can be downloaded
• Better comparability of data
• Better search results
Future improvements
• StatLine 4.1
  • Centralize population (object class) management:
    • E.g. person, enterprise
    • Model populations and subpopulations
• Statistical process
  • Centralize:
    • Process metadata
    • Quality metadata