Statistical file systems and archive statistics – Experiences from Statistics Sweden. Contribution to the Nordbotten Seminar at the Nordic Statistical Meeting in Copenhagen 2010 Bo Sundgren bo.sundgren@gmail.com http://sites.google.com/site/bosundgren/
"Special thanks to Statistics Denmark for having sponsored my participation in this meeting"
Archive-statistical principles (not technology-dependent, by and large)
• Reuse existing raw data from administrative and statistical sources for statistical purposes
• Continuous inflow of data (more or less)
• Organise data in a systematic way: statistical file system, databases, data warehouse
• Ad hoc production of statistics
• Systematic descriptions and definitions of data:
  • data and table definition languages; Nordbotten (1967)
  • metadata; Sundgren (1973)
• Standardised definitions and identifiers enabling flexible integration and combination of data: registers, classifications, standard variables
• Generalised software
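The principle of standardised identifiers can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two registers, the identifier `person_id`, the variables `region` and `income`, and the helper names `integrate` and `mean_by` are hypothetical and do not come from Statistics Sweden's systems. The sketch only shows why a shared identifier and a standard classification make ad hoc combination of sources straightforward.

```python
from collections import defaultdict

# Hypothetical register extracts; all identifiers and values are invented.
population_register = [
    {"person_id": "P001", "region": "01"},
    {"person_id": "P002", "region": "01"},
    {"person_id": "P003", "region": "03"},
]
income_register = [
    {"person_id": "P001", "income": 200000},
    {"person_id": "P002", "income": 300000},
    {"person_id": "P003", "income": 180000},
]


def integrate(base, other, key):
    """Combine two sources row by row on a shared, standardised identifier."""
    lookup = {row[key]: row for row in other}
    return [{**row, **lookup[row[key]]} for row in base if row[key] in lookup]


def mean_by(rows, group_var, value_var):
    """Ad hoc statistic: mean of value_var within each class of group_var."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_var]].append(row[value_var])
    return {group: sum(values) / len(values) for group, values in groups.items()}


combined = integrate(population_register, income_register, key="person_id")
mean_income = mean_by(combined, "region", "income")
```

Because both sources use the same identifier, a new table (here, mean income by region) needs no new data collection, only a new combination of existing data.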
Benefits
• Much lower costs for collection of raw data: up to 99% cost reduction (Statistics Netherlands)
• Reduced response burden
• Faster data collection
• Faster and more flexible response to new demands
• More coherent data and statistics: statistical systems
• Potential for better quality – but some new quality problems have to be tackled, or maybe rather the same old problems in new shapes
The 1960s
• Seminal papers by Svein Nordbotten from 1960 (Helsinki) and onwards (1963, 1965, 1966, 1967a,b,c, 1968)
• Enthusiastic interest by Ingvar Ohlsson and Lennart Fastbom, Director General and Deputy Director General of Statistics Sweden
• I was recruited in 1968 and started to work with Christer Arvas, new head of a new unit created for this development
• Svein Nordbotten and Börje Langefors, founder of information systems (informatics) as an academic discipline in Sweden
Seminal Nordbotten papers
• Nordbotten, S. (1960). Elektronmaskinene og statistikkens utforming i årene framover. De Nordiske Statistikermøter i Helsingfors 1960, Helsinki 1961, pp. 135-141. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1963). Automatic Editing of Individual Statistical Observations. Statistical Standards and Studies, No. 3, United Nations. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1965). The Efficiency of Automatic Detection and Correction of Errors in Individual Observations as Compared with other Means of Improving the Quality of Statistics. Proceedings from the 35th Session of the International Statistical Institute, Beograd 1965. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1966). A Statistical File System. Statistisk Tidskrift, Stockholm. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1967a). On Statistical File System II. Statistisk Tidskrift, Stockholm. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1967b). Automatic Files in Statistical Systems. Statistical Standards and Studies, Handbook No. 9, United Nations, N.Y. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1967c). Purposes, Problems and Ideas Related to Statistical File Systems. Proceedings from the 36th Session of the International Statistical Institute, Sydney. Invited paper. Available for free downloading from www.nordbotten.com.
• Nordbotten, S. (1968). Konfidensiell behandling av data, informasjonsnytte og klassifisering av data. Statistisk Tidskrift, Nr. 5, Stockholm 1968. In Norwegian.
Nordbotten "data space" and Langefors "e-message"
• data space, e-message: <object, property, time>
• <<population, object instance>, <variable, value>, <time scale, time>>
• Cf. also the relational data model: time missing
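The structure above can be sketched as a small Python data type. This is an illustrative reading, not Nordbotten's or Langefors' own formalisation: it assumes that the nested form is the triple <object, property, time> with each component expanded, and all class names, field names, and example values are hypothetical. The helper `relational_key` mimics a plain relational tuple that carries no time attribute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    population: str   # e.g. "persons"
    instance: str     # identifier of one object in the population


@dataclass(frozen=True)
class Property:
    variable: str
    value: object


@dataclass(frozen=True)
class TimeRef:
    time_scale: str   # e.g. "year"
    time: str         # a point or period on that scale


@dataclass(frozen=True)
class EMessage:
    """One elementary message: <object, property, time>."""
    obj: ObjectRef
    prop: Property
    time: TimeRef


def relational_key(msg: EMessage) -> tuple:
    """Identify a fact the way a tuple without a time attribute would."""
    return (msg.obj.population, msg.obj.instance, msg.prop.variable)


# Invented example: the same person observed in two different years.
m1 = EMessage(ObjectRef("persons", "P001"),
              Property("municipality", "Stockholm"),
              TimeRef("year", "1970"))
m2 = EMessage(ObjectRef("persons", "P001"),
              Property("municipality", "Uppsala"),
              TimeRef("year", "1971"))

# With the time component the two messages are distinct facts;
# without it they compete for the same key.
same_key_without_time = relational_key(m1) == relational_key(m2)
```

The two messages differ as e-messages, but once time is dropped they address the same object and variable with conflicting values, which is one way to read the remark that time is missing from the relational model unless it is added as an explicit attribute.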
ARKSY development projects 1968-1974
• TAB68, a non-procedural language for easy, fast, and flexible production of statistical tables
• VARKAT, a metadata system for documentation of variables, classifications, and populations
• ARKDABA, a microdatabase prototype
• RSDB and TSDB, multidimensional macrodatabases based on the αβγτ-model and the metadata-driven software AXIS
• Ongoing development of statistical (base) registers
• Planning a reorganisation of Statistics Sweden based on archive-statistical principles (including a data warehouse) and input-thruput-output
Major obstacles
• The privacy debate provoked by FoB70: development of microdatabases was stopped, resources redirected towards protection of statistical confidentiality
• Internal resistance against documentation: protection of the information monopoly of survey owners
• Internal resistance against the proposed new organisation based on a centralised data warehouse, separation of input from output, and dismantling of the traditional stovepipes
Reorientation after 1974
• Leaving the subject matter organisation as it was
• Merging the programming centre and the database centre into a new systems department
• Standardising data structures (flat files and relational databases)
• Maximum use of generalised software, including AXIS and the TAB68 software family, interfacing standardised data
• A model for systems development based on archive-statistical and infological principles
• Metadata-driven systems: AXIS, the CONDUCTOR
• SCBDOK (1991)
• Steadily growing use of administrative data (97-99%)
• Introduction of microcomputers (1980s) and the Internet (1990s)
2007: A new attempt to reorganise
• 2006: Kjell Jansson new Director General
• Focus on customers, processes, and architecture
• The Lotta project
• A new process department responsible for standardising the processes
• A standardised architecture based on SOA
• Customers, process owners, architects
• Proposed outsourcing of most IT people (IT operations successfully outsourced already in the early 1990s)
• 2008: Kjell Jansson leaves Statistics Sweden
Some possible future developments with Svein Nordbotten
• New data sources: the Internet
• Participative design of statistical systems: tackling the problem of reconciling conflicting interests "within" and between stakeholders in production of official statistics
References
• Most papers referred to in my paper, even the oldest ones, are available for free downloading from:
  • www.nordbotten.com
  • http://sites.google.com/site/bosundgren/