Statistical data editing near the source using cloud computing concepts
George Pongas, Christine Wirtz (Eurostat)
MSIS 2011, 23-25 May 2011, Luxembourg
Editing near the source
• Accelerates final delivery to users and institutions
• Checks and imputations are performed near the respondent
• Data knowledge is frequently more profound in the primary collector institutions
• Logical proximity is better than physical proximity: data and application sharing
Cloud and SOA in a few lines
• Cloud: separates ownership from usage of data storage, computing power, and application development and execution
• Cloud variants are IaaS, PaaS, SaaS
• Cloud architectures are public, private, mixed, or community
• SOA: based on web technologies and independent software components that interlink on demand
Data editing in Eurostat
• High volume of arrivals (more than 60,000 per year)
• Format heterogeneity
• Data checking absorbs a substantial volume of human resources
• Erroneous data require communication with Member States (MS)
• As a rule, Eurostat does not impute…
• Interest in having a common distributed solution
Eurostat's web-enabled system for editing: Editing Building Block (EBB)
• Completely metadata driven
• Exists in 2 versions: PC version and web-based version
• Technologies used: ANTLR, Java, Tomcat or WebLogic, Hibernate, Postgres or Oracle
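
As an illustration of what "metadata driven" can mean for editing, the sketch below keeps the checks as data rather than code, so adding a rule means editing metadata only. It is a minimal Python sketch under stated assumptions: the rule schema, the `RULES` list, and the `check_record` function are hypothetical and do not reproduce EBB's actual rule language or its Java implementation.

```python
# Hypothetical rule metadata: one entry per variable to be checked.
RULES = [
    {"variable": "age", "type": int, "min": 0, "max": 120},
    {"variable": "country", "type": str, "allowed": {"LU", "BE", "FR"}},
]


def check_record(record, rules):
    """Apply every metadata rule to one record and return a list of error messages."""
    errors = []
    for rule in rules:
        name = rule["variable"]
        value = record.get(name)
        if value is None:
            errors.append(f"{name}: missing")
            continue
        if not isinstance(value, rule["type"]):
            errors.append(f"{name}: expected {rule['type'].__name__}")
            continue
        # Range and code-list checks run only when the rule declares them.
        if "min" in rule and value < rule["min"]:
            errors.append(f"{name}: below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            errors.append(f"{name}: above maximum {rule['max']}")
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{name}: value not in code list")
    return errors
```

Calling `check_record({"age": 130, "country": "LU"}, RULES)` reports the out-of-range age. The checking function itself never changes when a new variable or limit is added, which is the point of driving the checks from metadata.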
Implementation details
EBB is written using a set of web services of the following types:
• Administration
• Program
• Job
EBB functionalities
• Support of categorical, text and numeric variables
• Separation of programmer and user interfaces
• Conditional and unconditional rules
• Multi-record rules
• Deterministic imputation
• Use of auxiliary data
• File operations
• Special functions (unicity, duplication checks ...)
• Outliers (HB, Sigma Gap, Terror)
• Input/output of data/metadata
• Reporting
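
To make the "Sigma Gap" outlier item concrete, here is a minimal Python sketch of one common formulation of a sigma-gap rule: sort the values, walk outward from the median, and flag every value beyond the first gap between neighbours that exceeds `k` standard deviations. The slides do not describe EBB's exact variant, so the function name `sigma_gap_outliers`, the default `k=2.0`, and the use of the population standard deviation are all assumptions.

```python
from statistics import median, pstdev


def sigma_gap_outliers(values, k=2.0):
    """Return the sorted values flagged as outliers by a simple sigma-gap rule."""
    data = sorted(values)
    if len(data) < 3:
        return []
    threshold = k * pstdev(data)  # a gap larger than this separates outliers
    mid = median(data)
    flagged = []

    # Upper side: walk upward from the median and stop at the first large gap.
    upper = [v for v in data if v >= mid]
    for i in range(1, len(upper)):
        if upper[i] - upper[i - 1] > threshold:
            flagged.extend(upper[i:])
            break

    # Lower side: walk downward from the median in the same way.
    lower = [v for v in reversed(data) if v <= mid]
    for i in range(1, len(lower)):
        if lower[i - 1] - lower[i] > threshold:
            flagged.extend(lower[i:])
            break

    return sorted(flagged)
```

For example, in `[10, 11, 12, 13, 14, 15, 100]` the gap between 15 and 100 exceeds twice the standard deviation, so only 100 is flagged. The two sides are scanned separately, so the gap straddling the median is never tested when the median falls between two data points.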
Usage until now
• Embedded in SAS (for microdata editing)
• Distributed to data providers as a standalone version
• FDI (foreign direct investments)
• ITS (international trade in services)
• SBS (structural business statistics)
• CVTS (continuous vocational training survey)
• AES (adult education survey)