Explore the importance of classifications in harmonization processes within statistical metadata systems, including differences between classifications and code lists, and guidelines for harmonization. Learn how to distinguish between standard classifications and code lists.
CLASSIFICATIONS: a key element in the process of harmonization
Isabel Valente (isabel.valente@ine.pt)
Statistics Portugal / Metadata Unit
Work Session on Quality management systems (Q2010)
Helsinki, 3-6 May 2010
Fig. 1. Macro architecture of the Statistical Metadata System (1)
(1) In: Morgado, Isabel, “Metadata and survey documentation Portuguese NSI experience”, European Conference on Quality and Methodology in Official Statistics (Q2004), 24-26 May 2004, Mainz, Germany.
Integrated System of Statistical Classifications (SINE)
Conceptual model developed by the Neuchâtel group
SINE main phases
2002-2004
- development of the consultation application
- replacement of the existing information on classifications in the Portal
2004-2005
- enlargement of the information made available
- start of the gradual incorporation of code lists
- start of the development of the management application
SINE main phases
2006-2007
- consolidation of the management application
- small adjustments and improvements in the consultation application
Current phase (2008)
- consolidation and improvement of the existing model
- harmonization of the existing information
SINE main purposes
1. be a reference base for national, Community and international classifications used for statistical purposes
2. be a reference instrument for the management of classifications
3. be an instrument for the harmonization and coordination of classifications
SINE structure
Family > Classification > Version > Level > Item
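The five-level structure can be pictured as nested containers. The sketch below is only an illustration of that nesting, not SINE's actual data model: every class, field name and sample value (including the version identifier `V99999` and the item codes) is a hypothetical choice made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    code: str
    label: str


@dataclass
class Level:
    number: int
    items: List[Item] = field(default_factory=list)


@dataclass
class Version:
    version_id: str  # hypothetical identifier in the "Vnnnnn" style
    name: str
    levels: List[Level] = field(default_factory=list)


@dataclass
class Classification:
    name: str
    versions: List[Version] = field(default_factory=list)


@dataclass
class Family:
    name: str
    classifications: List[Classification] = field(default_factory=list)


def count_items(family: Family) -> int:
    """Count every item reachable from a family, across all versions and levels."""
    return sum(
        len(level.items)
        for classification in family.classifications
        for version in classification.versions
        for level in version.levels
    )


# Hypothetical sample content, for illustration only.
sex_version = Version(
    version_id="V99999",
    name="Sex",
    levels=[Level(number=1, items=[Item("1", "Male"), Item("2", "Female")])],
)
family = Family(
    name="Demographic characteristics",
    classifications=[Classification(name="Sex", versions=[sex_version])],
)
print(count_items(family))
```

Each container holds only the level directly below it, so a version can be replaced or added without touching the family or classification above it.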
• Classifications
• Code lists for observation
• Code lists for dissemination
What’s the difference between a classification and a code list?
General ideas
Classifications
• more conceptual
• have a formal base
• complex structures
• large in size
• have a system of codification
• have formalized rules about revisions and changes
• versions are defined
Code lists
• less conceptual
• do not have a formal base
• simple structures
• small in size
• may or may not have a system of codification
• do not have formalized rules about revisions and changes
• are not based on the idea of version
• operational lists for internal use in the institution
• Marital status
• Sex
• Ranks of turnover
• Size classes of persons employed
• Degree of relationship with the representative of the household
What to do? Should those cases be considered classifications or code lists?
Classifications
Structures that are based on:
• Community or national regulations
• methodological manuals
• Community or international recommendations
These are treated as reference structures.
Consequence
The remaining structures (code lists) were, whenever possible, aligned with those structures.
Problem encountered
• Access came first to the code lists used for the dissemination of data
• Access to the classification structure that is part of a recommendation or regulation came only second
Another problem
How to distinguish standard classifications, or reference structures, from those code lists?
Solution
• Look for distinctive elements in the version names
• Norms for the writing of names (naming convention)
Constituent elements of the version name
General form
Main part [+ “,” + formal qualifier] [+ “(” + informal qualifier + “)”] [+ “-” + variant n]
Examples:
• Nomenclature of territorial units for statistics, 2002 version
• International standard classification of education, 1997 (levels of education)
• Types of dwellings (4)
Specific form: variant
The variant is always the last part of the name and is formed by: “-” + the word “variant” + the variant number
Examples:
• CAE Rev.2 (sections C to E) - variant 1
• Classes of net monthly wages (IEFA, €) - variant 1
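The general form above can be read as a small composition rule. The following is a minimal sketch of it, assuming the separators are a comma plus space before the formal qualifier, parentheses around the informal qualifier, and " - variant N" as the suffix; the function name `build_version_name` and its parameters are hypothetical.

```python
from typing import Optional


def build_version_name(
    main: str,
    formal: Optional[str] = None,
    informal: Optional[str] = None,
    variant: Optional[int] = None,
) -> str:
    """Compose a version name: main part, then the optional parts in fixed order."""
    name = main
    if formal:
        name += f", {formal}"              # formal qualifier follows a comma
    if informal:
        name += f" ({informal})"           # informal qualifier goes in parentheses
    if variant is not None:
        name += f" - variant {variant}"    # the variant is always the last part
    return name


print(build_version_name(
    "International standard classification of education",
    formal="1997",
    informal="levels of education",
))
print(build_version_name("CPA 2008", informal="legal services", variant=7))
```

Keeping the optional parts in a fixed order is what lets a reader tell a reference structure from a derived code list by the name alone.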
Rules for the writing of names
Reference structures
• keep the original and official name
• keep the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish national classifications from Community ones
Code lists
• may or may not keep the original name
• must not have the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish the code lists
• if they are variants of a reference structure, they keep the name or acronym of that classification
• the names should be general
Another problem
Lack of harmonization in the way classifications and code lists are written, and also in their contents
1. Harmonization of the names of
• classifications
• versions
• item labels
Internal SINE rules for the writing of classification and version names
• Names start with a capital letter, followed by lower case. Exceptions: acronyms, proper names, or words that follow a full stop.
Examples:
V00011 - Statistical classification of products by activity in the European Economic Community, 2002 version
V00021 - International standard industrial classification of all economic activities, revision 3.1
Internal SINE rules for the writing of classification and version names
• The names of code lists should use the plural form.
Example: V01610 - Types of primary and lower secondary education
• Code lists derived from a standard classification have to keep the acronym or name of the standard classification in their own name.
Examples:
V01675 - CAE Rev. 3 (total, sections C to N) - variant 2
V01717 - CPA 2008 (legal services) - variant 7
Internal SINE rules for the writing of classification and version names
• Those code lists have to include the word “variant” in their name.
Example: V02023 - Activity status (IEFA) - variant 4
• Cumulative structures have to include the expression “cumulative” in their name.
Example: V02069 - Countries (cumulative - air transport companies)
Internal SINE rules for the writing of classification and version names
• Item labels should be written in full. Abbreviations should be avoided. Exceptions: acronyms or proper names.
• Item labels start with a capital letter, followed by lower case.
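Some of the rules for code list names lend themselves to an automatic check. The sketch below covers only four of them (capital first letter, no “nomenclature” or “classification”, the variant suffix, and the “cumulative” expression); the plural-form and acronym rules are left out because they need human judgement. The function name, its flags and the message texts are hypothetical, and this is not SINE's own validation logic.

```python
import re
from typing import List

# Words reserved for reference structures; code lists must not use them.
FORBIDDEN_WORDS = re.compile(r"\b(nomenclature|classification)\b", re.IGNORECASE)
# Variant names end with " - variant N", as in "Activity status (IEFA) - variant 4".
VARIANT_SUFFIX = re.compile(r" - variant \d+$")


def check_code_list_name(
    name: str, is_variant: bool = False, is_cumulative: bool = False
) -> List[str]:
    """Return the list of naming problems found; an empty list means no problem."""
    problems = []
    if not name[:1].isupper():
        problems.append("name should start with a capital letter")
    if FORBIDDEN_WORDS.search(name):
        problems.append("code list names must not contain 'nomenclature' or 'classification'")
    if is_variant and not VARIANT_SUFFIX.search(name):
        problems.append("variant names should end with ' - variant N'")
    if is_cumulative and "cumulative" not in name.lower():
        problems.append("cumulative structures should include 'cumulative'")
    return problems


print(check_code_list_name("Activity status (IEFA) - variant 4", is_variant=True))
print(check_code_list_name("classification of dwellings"))
```

Returning a list of messages rather than a single true/false value lets one pass over a name report every rule it breaks.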
Problems with the names
People give different names to the same things, depending on the perspective adopted.
We should harmonize the expressions used and avoid naming the same thing in different ways.
Problems with the names
• Type of rail freight traffic
• Type of movement in port
• Type of traffic on the enterprise
• Types of flow
Problems with the names
However, when we have too many versions of the same classification, we need elements to distinguish between them.
2. Harmonization of contents How to do that?
Lists of countries
• compulsory harmonization of the codes and labels of the items according to the ISO alpha-2 standard.
• the names of countries in Portuguese must be in accordance with the version approved by the Statistical Council.
• groupings of countries used in code lists have been created and managed centrally in order to establish a consistent and harmonized reference base for this purpose.
• codes are always independent of the language used, so they remain unchanged in translations.
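The last two points, language-independent codes and centrally managed groupings, can be illustrated with two lookup tables. This is only a sketch: the table names, the helper functions and the group code `G_IBERIA` are hypothetical, and the three countries are sample entries rather than SINE content.

```python
from typing import Dict, List, Tuple

# Labels are stored per language; the ISO alpha-2 code is the stable key.
COUNTRY_LABELS: Dict[str, Dict[str, str]] = {
    "PT": {"en": "Portugal", "pt": "Portugal"},
    "ES": {"en": "Spain", "pt": "Espanha"},
    "FI": {"en": "Finland", "pt": "Finlândia"},
}

# Groupings are defined once, centrally, and reused by every code list.
COUNTRY_GROUPS: Dict[str, List[str]] = {
    "G_IBERIA": ["PT", "ES"],  # hypothetical group code
}


def expand_group(group_code: str) -> List[str]:
    """Return the member country codes of a centrally defined grouping."""
    return list(COUNTRY_GROUPS[group_code])


def labels_for(codes: List[str], language: str) -> List[Tuple[str, str]]:
    """Pair each code with its label in the requested language."""
    return [(code, COUNTRY_LABELS[code][language]) for code in codes]


print(labels_for(expand_group("G_IBERIA"), "en"))
print(labels_for(expand_group("G_IBERIA"), "pt"))
```

Switching the language changes only the labels; the codes in both outputs are identical, which is the property the slide asks for.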
Activities or products code lists
• code lists derived from standard classifications have to keep the same codes and labels as the standard classification for categories that are equal.
• categories that are different should have different codes and labels.
• for the aggregation of consecutive categories, codes are connected by a hyphen (e.g. C-D).
• for the aggregation of non-consecutive categories, codes are connected by “+” (e.g. A+C).
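The two aggregation rules can be sketched as a function that looks at where the selected categories sit in the reference classification. The function name and its arguments are hypothetical. The slide only gives the pure cases (C-D and A+C); combining both in one code, such as A+C-E, is an assumption of this sketch and not a rule stated in the source.

```python
from typing import List, Sequence


def aggregate_code(selected: Sequence[str], reference_order: Sequence[str]) -> str:
    """Build an aggregate code: hyphen for consecutive runs, '+' between runs."""
    order = list(reference_order)
    # Positions of the selected codes in the reference classification.
    # list.index raises ValueError for a code that is not in the reference.
    positions = sorted(order.index(code) for code in set(selected))

    runs: List[List[int]] = []
    for pos in positions:
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos        # extend the current consecutive run
        else:
            runs.append([pos, pos])  # start a new run

    parts = []
    for start, end in runs:
        if start == end:
            parts.append(order[start])
        else:
            parts.append(f"{order[start]}-{order[end]}")
    return "+".join(parts)


sections = ["A", "B", "C", "D", "E", "F", "G"]
print(aggregate_code(["C", "D"], sections))
print(aggregate_code(["A", "C"], sections))
```

Deriving the code from positions in the reference order, rather than from alphabetical order, keeps the rule valid for classifications whose codes are not simple letters.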
Other code lists
• For code lists that belong to the same classification and have no standard classification for reference, we try to find the most comprehensive structure.
• Once found, that structure becomes the reference structure. New code lists that appear are aligned with it.
Other code lists
• For other code lists where it is not possible to find a standard and in which the categories vary little, the codes and labels of the categories that remain unchanged are kept unchanged.
Other code lists
• Use of certain codes for certain situations in code lists:
  • total coded with T
  • residual values preferably coded with 9, or with a code ending in 9
• The use of codes and labels of structures already in SINE is promoted over new codings and formulations.
Age groups
• UN, Standard international age classification
• five-year and ten-year age groups, with the boundaries generally beginning at multiples of five and ten and ending at four and nine
• ages separated by a hyphen, preceded and followed by a space, thus simplifying the use of particles and making the labels more general
Other size classes
• consecutive classes should be unambiguous, so they should not repeat the same values in different classes
• every item should state explicitly what is being quantified (e.g. years, euro, persons)
• minimum and maximum thresholds should use standardized expressions:
  In the lowest class, “Less than” (e.g. Less than 30 years).
  In the highest class, “and more”, immediately following the last value used (e.g. 65 and more years).
• the signs “<”, “>”, “≤” and “≥” should not be used
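These labelling rules can be turned into a small generator. The sketch below assumes whole-number quantities (so the class before a boundary of 65 ends at 64) and borrows the " - " separator from the age-group convention; both are assumptions of this example, as is the function name `size_class_labels`.

```python
from typing import List, Sequence


def size_class_labels(lower_bounds: Sequence[int], unit: str) -> List[str]:
    """Build non-overlapping class labels from the lower bound of each class."""
    bounds = list(lower_bounds)
    if not bounds or any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("lower bounds must be non-empty and strictly increasing")

    labels = [f"Less than {bounds[0]} {unit}"]      # lowest class
    for low, next_low in zip(bounds, bounds[1:]):
        # The class ends one unit before the next begins, so no value repeats.
        labels.append(f"{low} - {next_low - 1} {unit}")
    labels.append(f"{bounds[-1]} and more {unit}")  # highest class
    return labels


for label in size_class_labels([30, 65], "years"):
    print(label)
```

The unit is written into every label, which satisfies the rule that each item states what is being quantified.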
Other size classes
• numerical values above one thousand have to be separated by a space to make it easier to read hundreds, thousands, tens of thousands, millions, etc. (10 000 000); alternatively, powers of 10 may be used instead (10^6)
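The digit-grouping rule is easy to apply mechanically. A minimal sketch, assuming integer values; the function name `group_thousands` is hypothetical. It relies on Python's standard `,` format specifier and then swaps the comma for a space.

```python
def group_thousands(value: int) -> str:
    """Format an integer with a space between groups of three digits."""
    # The "," format option inserts a comma every three digits;
    # the rule asks for a space instead, so the comma is replaced.
    return f"{value:,}".replace(",", " ")


print(group_thousands(10000000))
print(group_thousands(999))
```

Values below one thousand come out unchanged, so the function can be applied to every threshold in a size class without special cases.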
Conclusions
SINE
• made known what exists about classifications
• widened the term to include code lists
• makes classification structures available:
  in a standardized format
  in an easy way
  at any time
  in accordance with users' needs
Conclusions
Because of that it was possible:
• to detect and correct writing errors
• to harmonize the written form of codes and labels
• to implement some harmonization procedures and rules
• to improve the clarity and precision of the terms used
• to improve the integration between code lists and standard classifications
• to harmonize codes and labels between code lists
• to reduce the number of code lists needed, through the creation of generic and cross-cutting structures
Time savings
Greater integration between the different metadata subsystems