Meeting on the Management of Statistical Information Systems (MSIS 2012). The Web-based Data Collection in the Italian Population and Housing Census Leonardo Tininini and Antonino Virgillito ISTAT. Washington DC - May 21-23, 2012. The Census Web-based Information System.
Meeting on the Management of Statistical Information Systems (MSIS 2012)
The Web-based Data Collection in the Italian Population and Housing Census
Leonardo Tininini and Antonino Virgillito, ISTAT
Washington DC - May 21-23, 2012
The Census Web-based Information System
• SGR: the Census management system
  • assignment of households to enumerators
  • monitoring of collection activities, particularly of questionnaires collected in the various possible ways (online, municipal collection centers, post offices, enumerators)
  • visualization of some key indicators (a kind of data warehouse on the collection process)
  • Census to Local Population Registries comparison and re-alignment
  • …
• RETE: the online documentation for operators
• QPOP: the online questionnaire
  • the main topic of this presentation...
Tininini and Virgillito - The Web-based Data Collection in the Italian Population and Housing Census - MSIS 2012
QPOP: the main requirements
• To be used by both citizens (self-compilation) and operators (online data entry)
• Tight integration with the SGR Census Management System, in particular with its workflow
• Easy to use, fast and scalable
• Assisting users in following the correct compilation rules (without bothering them)
• Multi-language (Italian, German and Slovenian)
• Immediate coding of open questions (textual in the paper version)
Re-using already available applications: almost impossible
The application design
[Diagram: application layers GUI, Actions, Services, DAOs, Entities; frameworks Struts2, Spring, Hibernate]
• GUI: JSP pages implementing the graphical user interface. They can be forms for sending data to the server, processed by an action, and/or results of an action execution;
• Actions: Java classes whose execution is triggered by an HTTP call, activated by a form submission on the GUI. They receive data from the HTTP request and execute some server-side processing by calling Services;
• Services: Java classes that implement database transactions, realized through sequences of calls to DAOs;
• Data Access Objects (DAOs): Java classes that implement so-called CRUD (Create-Read-Update-Delete) database operations related to one or more domain objects;
• Entities: Java classes representing records of one database table.
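The layering above can be illustrated with a minimal sketch in plain Java. It is not the QPOP code: the class names (`AnswerEntity`, `AnswerDao`, `AnswerService`, `SaveAnswersAction`) are hypothetical, an in-memory map stands in for the database, and a plain `Map` stands in for the HTTP request, so that the example runs without Struts2, Spring or Hibernate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Entity: one record of a (hypothetical) ANSWER table.
final class AnswerEntity {
    final String questionnaireId;
    final String questionId;
    final String value;

    AnswerEntity(String questionnaireId, String questionId, String value) {
        this.questionnaireId = questionnaireId;
        this.questionId = questionId;
        this.value = value;
    }
}

// DAO: CRUD-style operations on AnswerEntity. The in-memory map is an
// assumption standing in for a real database table.
final class AnswerDao {
    private final Map<String, AnswerEntity> table = new HashMap<>();

    private static String key(String questionnaireId, String questionId) {
        return questionnaireId + "/" + questionId;
    }

    void saveOrUpdate(AnswerEntity e) {
        table.put(key(e.questionnaireId, e.questionId), e);
    }

    Optional<AnswerEntity> find(String questionnaireId, String questionId) {
        return Optional.ofNullable(table.get(key(questionnaireId, questionId)));
    }
}

// Service: groups several DAO calls into one unit of work
// (this would be a database transaction in a real stack).
final class AnswerService {
    private final AnswerDao dao;

    AnswerService(AnswerDao dao) {
        this.dao = dao;
    }

    void saveAnswers(String questionnaireId, Map<String, String> answers) {
        for (Map.Entry<String, String> a : answers.entrySet()) {
            dao.saveOrUpdate(new AnswerEntity(questionnaireId, a.getKey(), a.getValue()));
        }
    }
}

// Action: receives the submitted parameters, delegates to the service and
// returns a result name that selects the next page.
final class SaveAnswersAction {
    private final AnswerService service;

    SaveAnswersAction(AnswerService service) {
        this.service = service;
    }

    String execute(String questionnaireId, Map<String, String> requestParams) {
        if (requestParams.isEmpty()) {
            return "input"; // nothing submitted: show the form again
        }
        service.saveAnswers(questionnaireId, requestParams);
        return "success";
    }
}
```

Each layer only talks to the one directly below it, so the Action never touches the storage and the DAO never sees request data.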
A metadata-driven application
• The leading principle: write more metadata, write less (more generalized) programming code
• Metadata to specify the type (single choice, multi-response, textual input, data, etc.) of a question
  • Questions sharing the same type are handled by the same pieces of Java code (templates)
  • The whole processing chain from HTML forms down to DB records (and vice versa) is automatically handled
• Metadata to specify (multi-language) texts in all GUI fragments
BUT ALSO
• Metadata to specify question routing
  • Based on the concept of Questionnaire Graph
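As a rough illustration of the "same type, same template" idea, the sketch below dispatches on a question type read from metadata. All names (`QuestionType`, `QuestionMeta`, `Template`, `TemplateRegistry`) and the validation rules are assumptions made for the example; the slides do not show the actual metadata schema or template code.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Question types as they might appear in the metadata (illustrative names).
enum QuestionType { SINGLE_CHOICE, MULTI_RESPONSE, TEXT }

// Metadata describing one question: no question-specific code is needed.
final class QuestionMeta {
    final String id;
    final QuestionType type;
    final List<String> options; // empty for TEXT questions

    QuestionMeta(String id, QuestionType type, List<String> options) {
        this.id = id;
        this.type = type;
        this.options = options;
    }
}

// A template turns the raw form values of a question into the value to store.
interface Template {
    String toStoredValue(QuestionMeta meta, String[] rawValues);
}

// One template per question type; every question of that type reuses it.
final class TemplateRegistry {
    private final Map<QuestionType, Template> templates = new EnumMap<>(QuestionType.class);

    TemplateRegistry() {
        templates.put(QuestionType.SINGLE_CHOICE, (meta, raw) -> {
            if (raw.length != 1 || !meta.options.contains(raw[0])) {
                throw new IllegalArgumentException("invalid choice for question " + meta.id);
            }
            return raw[0];
        });
        templates.put(QuestionType.MULTI_RESPONSE, (meta, raw) -> {
            if (!meta.options.containsAll(Arrays.asList(raw))) {
                throw new IllegalArgumentException("invalid option for question " + meta.id);
            }
            return String.join(",", raw);
        });
        templates.put(QuestionType.TEXT, (meta, raw) -> raw.length == 0 ? "" : raw[0].trim());
    }

    // Looks up the template for the question's type and applies it.
    String process(QuestionMeta meta, String... rawValues) {
        return templates.get(meta.type).toStoredValue(meta, rawValues);
    }
}
```

Adding a new question then means adding a metadata record, while adding a new question type means adding one template.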
The Questionnaire Graph (QG)
• The basic idea: formally model the structure of the questionnaire and the correct set and sequence of questions to be filled in by respondents
• A Questionnaire Graph (QG) in QPOP is a Directed Acyclic Graph (DAG), such that:
  • Nodes Ni are in 1-1 correspondence with the questionnaire fragments (mainly questions, but not only);
  • Node types correspond to templates (which in turn determine appearance and behavior);
  • Edge labels represent conditions on questions (e.g. “Has the respondent checked option 2 of question X?”).
• A (directed) labeled edge from node (question) Ni to node (question) Nj means that the user has to answer question Nj after answering question Ni, if the condition expressed on the edge label is true
• The QG is used by the application (on both client and server side) to enable and disable questions on the web page and to validate the user’s input before saving the answers in the microdata tables, i.e. to enforce consistency
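A minimal sketch of how such a graph could drive both enabling and validation is given below. It is one possible reading of the slide, not the QPOP implementation: the class `QuestionnaireGraph`, its methods, and the rule "a node is enabled when it is the start node or is reached through an edge whose source is enabled and answered and whose condition holds" are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class QuestionnaireGraph {
    // A labeled edge: move to 'target' when 'condition' holds on the answers.
    private static final class Edge {
        final String target;
        final Predicate<Map<String, String>> condition;

        Edge(String target, Predicate<Map<String, String>> condition) {
            this.target = target;
            this.condition = condition;
        }
    }

    // Assumption: nodes are inserted in a topological order of the DAG,
    // so a single pass in insertion order is enough.
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();
    private final String start;

    QuestionnaireGraph(String start) {
        this.start = start;
        addNode(start);
    }

    void addNode(String id) {
        outgoing.putIfAbsent(id, new ArrayList<>());
    }

    void addEdge(String from, String to, Predicate<Map<String, String>> condition) {
        addNode(from);
        addNode(to);
        outgoing.get(from).add(new Edge(to, condition));
    }

    // Nodes the respondent may currently fill in, given the answers so far.
    Set<String> enabledNodes(Map<String, String> answers) {
        Set<String> enabled = new LinkedHashSet<>();
        enabled.add(start);
        for (Map.Entry<String, List<Edge>> node : outgoing.entrySet()) {
            String id = node.getKey();
            if (!enabled.contains(id) || !answers.containsKey(id)) {
                continue; // edges are followed only from enabled, answered nodes
            }
            for (Edge edge : node.getValue()) {
                if (edge.condition.test(answers)) {
                    enabled.add(edge.target);
                }
            }
        }
        return enabled;
    }

    // Server-side check before saving: no answer may belong to a disabled node.
    boolean isConsistent(Map<String, String> answers) {
        return enabledNodes(answers).containsAll(answers.keySet());
    }
}
```

For example, with edges `Q1 → Q2` (if `Q1` is "1"), `Q1 → Q3` (if `Q1` is "2") and `Q2 → Q3` (always), answering "2" to `Q1` enables `Q3` and leaves `Q2` disabled, so a submission that also contains an answer to `Q2` is rejected as inconsistent.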
From the questionnaire to the QG
En(Dis-)abling questions by updating QG node states (1)
En(Dis-)abling questions by updating QG node states (2)
En(Dis-)abling questions by updating QG node states (3)
The search engine for assisted coding
[Figure: example search with input "MATHEAMTICS DEGREE" and resulting code 72001003]
Reference dictionary pre-processing
• Character normalization: accented letters are replaced with the corresponding unaccented version, uppercase letters with lowercase ones, other characters like punctuation marks are removed
• Stopword removal: “useless” words are removed from the character-normalized version of the items, produced in the previous step. Both “general” (like conjunctions, articles, etc.) and “context-specific” stopwords (e.g. the word “degree”, when considering a list of academic degrees) are removed
• Search terms extraction and weighting: the single terms (words) constituting the normalized items produced by the previous two steps are extracted and stored in the search engine DB tables. A weight is also assigned to each term, depending on its relative frequency inside the dictionary
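The three pre-processing steps can be sketched as follows. The class `DictionaryPreprocessor` is hypothetical, and two choices are assumptions rather than facts from the slides: punctuation is replaced by a space (so that words joined by an apostrophe are split instead of glued together), and the weight is an IDF-style formula, `log(N / df) + 1`, which gives rarer terms a higher weight. The slides only say that the weight depends on the term's relative frequency in the dictionary.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class DictionaryPreprocessor {
    private final Set<String> stopwords; // general and context-specific stopwords

    DictionaryPreprocessor(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Step 1: character normalization (accents stripped, lowercase,
    // anything that is not a letter or digit replaced by a single space).
    static String normalize(String item) {
        String decomposed = Normalizer.normalize(item, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
    }

    // Step 2: split the normalized item into terms and drop stopwords.
    List<String> terms(String item) {
        List<String> result = new ArrayList<>();
        for (String t : normalize(item).split(" ")) {
            if (!t.isEmpty() && !stopwords.contains(t)) {
                result.add(t);
            }
        }
        return result;
    }

    // Step 3: weight each term by how rare it is across the dictionary items.
    // The formula log(N / df) + 1 is an assumption chosen for illustration.
    Map<String, Double> termWeights(List<String> items) {
        Map<String, Integer> itemFrequency = new HashMap<>();
        for (String item : items) {
            for (String t : new HashSet<>(terms(item))) {
                itemFrequency.merge(t, 1, Integer::sum);
            }
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : itemFrequency.entrySet()) {
            weights.put(e.getKey(), Math.log((double) items.size() / e.getValue()) + 1.0);
        }
        return weights;
    }
}
```

In a real deployment the extracted terms and weights would be stored in the search engine tables once, when the dictionary is loaded, rather than recomputed per request.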
Search string processing
• Similarity search: each (normalized) term to be searched is compared with those in the database; the terms that produce a similarity above a given (relatively high) threshold are passed to the following step
• Extraction of the dictionary items: the dictionary items containing one or more terms obtained in the previous step are extracted from the DB. At the same time, for each item extracted, some values are either read or computed, which will be used in the following step
• Dictionary item sorting: by using the values extracted/computed in the previous step, the score of each item in the result set is computed and the list is sorted accordingly in descending order. This sorted list is proposed to the respondent
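The search pipeline can be sketched in a few lines, under explicit assumptions. The slides do not name the similarity measure or the scoring formula, so this example substitutes a normalized Levenshtein similarity and scores each item as the sum, over query terms, of the best `similarity × weight` among its matching terms. The class `CodingSearch` and the in-memory maps replacing the DB tables are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CodingSearch {
    private final Map<String, List<String>> itemTerms; // item code -> normalized terms
    private final Map<String, Double> termWeights;     // term -> weight
    private final double threshold;                    // minimum similarity to match

    CodingSearch(Map<String, List<String>> itemTerms,
                 Map<String, Double> termWeights,
                 double threshold) {
        this.itemTerms = itemTerms;
        this.termWeights = termWeights;
        this.threshold = threshold;
    }

    // Levenshtein edit distance, computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means identical strings.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / max;
    }

    // Returns the codes of the matching items, best score first.
    List<String> search(List<String> queryTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> item : itemTerms.entrySet()) {
            double score = 0.0;
            for (String q : queryTerms) {
                double best = 0.0;
                for (String t : item.getValue()) {
                    double s = similarity(q, t);
                    if (s >= threshold) {
                        best = Math.max(best, s * termWeights.getOrDefault(t, 1.0));
                    }
                }
                score += best;
            }
            if (score > 0.0) {
                scores.put(item.getKey(), score); // item contains at least one matched term
            }
        }
        List<String> codes = new ArrayList<>(scores.keySet());
        codes.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return codes;
    }
}
```

With this measure, a transposition typo such as "matheamtics" is at edit distance 2 from "mathematics", giving a similarity of about 0.82, so it still matches under a threshold of 0.8.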
QPOP in (a few) figures
Future (current) work
• Questionnaires for the Industry and Services Census comprising:
  • Businesses (2 “fairly similar” questionnaires implemented as one with “special” routing conditions)
  • Non-profit institutions
• More general question templates
• More general checks and routing conditions (support for existential and universal quantifications, as well as counting)