
The Web-based Data Collection in the Italian Population and Housing Census

Meeting on the Management of Statistical Information Systems (MSIS 2012), Washington DC, May 21-23, 2012. Leonardo Tininini and Antonino Virgillito, ISTAT.


Presentation Transcript


  1. Meeting on the Management of Statistical Information Systems (MSIS 2012). The Web-based Data Collection in the Italian Population and Housing Census. Leonardo Tininini and Antonino Virgillito, ISTAT. Washington DC, May 21-23, 2012

  2. The Census Web-based Information System
     • SGR: the Census management system
       • assignment of households to enumerators
       • monitoring of collection activities, in particular of the questionnaires collected through the various channels (online, municipal collection centers, post offices, enumerators)
       • visualization of some key indicators (a kind of data warehouse on the collection process)
       • comparison and re-alignment between the Census and the Local Population Registries
       • …
     • RETE: the online documentation for operators
     • QPOP: the online questionnaire (the main topic of this presentation)
     Tininini and Virgillito - The Web-based Data Collection in the Italian Population and Housing Census - MSIS 2012

  3. QPOP: the main requirements
     • To be used by both citizens (self-compilation) and operators (online data entry)
     • Tight integration with the SGR Census Management System, in particular with its workflow
     • Easy to use, fast and scalable
     • Assisting users in following the correct compilation rules (without bothering them)
     • Multi-language (Italian, German and Slovenian)
     • Immediate coding of open questions (textual in the paper version)

  4. QPOP: the main requirements (same list as slide 3)
     • Conclusion: with these requirements, re-using already available applications was almost impossible

  5. The application design
     Diagram: layers GUI, Actions, Services, DAOs, Entities; frameworks Struts2, Spring, Hibernate
     • GUI: JSP pages implementing the graphical user interface. They can be forms for sending data to the server, processed by an action, and/or results of an action execution
     • Actions: Java classes whose execution is triggered by an HTTP call, activated by a form submission on the GUI. They receive data from the HTTP request and execute some server-side processing by calling Services
     • Services: Java classes that implement database transactions, realized through sequences of calls to DAOs
     • Data Access Objects (DAOs): Java classes that implement the so-called CRUD (Create-Read-Update-Delete) database operations related to one or more domain objects
     • Entities: Java classes representing records of one database table
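
The layering on this slide can be sketched in plain Java. This is a minimal illustration only, not the QPOP code: the names `Answer`, `AnswerDao`, `AnswerService` and `SaveAnswersAction` are hypothetical, an in-memory map stands in for the database, and no Struts2, Spring or Hibernate classes are used, so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Entity: one record of a (hypothetical) ANSWER table.
final class Answer {
    final String questionId;
    final String value;

    Answer(String questionId, String value) {
        this.questionId = questionId;
        this.value = value;
    }
}

// DAO: CRUD operations on Answer. An in-memory map stands in for the database.
final class AnswerDao {
    private final Map<String, Answer> table = new HashMap<>();

    void createOrUpdate(Answer a) { table.put(a.questionId, a); }

    Optional<Answer> read(String questionId) {
        return Optional.ofNullable(table.get(questionId));
    }

    void delete(String questionId) { table.remove(questionId); }
}

// Service: groups a sequence of DAO calls into one logical unit of work.
final class AnswerService {
    private final AnswerDao dao;

    AnswerService(AnswerDao dao) { this.dao = dao; }

    void saveAll(Map<String, String> answers) {
        answers.forEach((q, v) -> dao.createOrUpdate(new Answer(q, v)));
    }

    Optional<String> valueOf(String questionId) {
        return dao.read(questionId).map(a -> a.value);
    }
}

// Action: receives the request data (a plain map here) and delegates to the Service.
final class SaveAnswersAction {
    private final AnswerService service;

    SaveAnswersAction(AnswerService service) { this.service = service; }

    // Returns a result name that the GUI layer would map to a page.
    String execute(Map<String, String> requestParams) {
        if (requestParams.isEmpty()) {
            return "input";   // nothing submitted: show the form again
        }
        service.saveAll(requestParams);
        return "success";
    }
}
```

Each layer only talks to the one directly below it, so the GUI never touches a DAO and a change to the storage code stays inside the DAO and Entity classes.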

  6. A metadata-driven application
     • The leading principle: write more metadata, write less (and more generalized) programming code
     • Metadata to specify the type of a question (single choice, multi-response, textual input, date, etc.)
       • Questions sharing the same type are handled by the same pieces of Java code (templates)
       • The whole processing chain from HTML forms down to DB records (and vice versa) is handled automatically
     • Metadata to specify the (multi-language) texts in all GUI fragments
     BUT ALSO
     • Metadata to specify question routing
       • Based on the concept of Questionnaire Graph
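
A rough sketch of the "same type, same template" idea follows. Everything here is an assumption made for illustration: the `QuestionMeta` fields, the three question types shown, the validation rules and the fallback to Italian texts are not taken from QPOP.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical question types; the slide lists more.
enum QuestionType { SINGLE_CHOICE, MULTI_RESPONSE, TEXT }

// Metadata describing one question: its type, texts per language, allowed options.
final class QuestionMeta {
    final String id;
    final QuestionType type;
    final Map<String, String> textByLanguage;
    final List<String> options;

    QuestionMeta(String id, QuestionType type,
                 Map<String, String> textByLanguage, List<String> options) {
        this.id = id;
        this.type = type;
        this.textByLanguage = textByLanguage;
        this.options = options;
    }

    // Text in the requested language, falling back to Italian (an assumption).
    String text(String language) {
        return textByLanguage.getOrDefault(language, textByLanguage.get("it"));
    }
}

// One validation routine per question type, shared by all questions of that type.
final class Templates {
    private static final Map<QuestionType, BiPredicate<QuestionMeta, List<String>>> VALIDATORS =
            new EnumMap<>(QuestionType.class);

    static {
        VALIDATORS.put(QuestionType.SINGLE_CHOICE,
                (q, a) -> a.size() == 1 && q.options.contains(a.get(0)));
        VALIDATORS.put(QuestionType.MULTI_RESPONSE,
                (q, a) -> !a.isEmpty() && q.options.containsAll(a));
        VALIDATORS.put(QuestionType.TEXT,
                (q, a) -> a.size() == 1 && !a.get(0).trim().isEmpty());
    }

    static boolean isValid(QuestionMeta question, List<String> answer) {
        return VALIDATORS.get(question.type).test(question, answer);
    }
}
```

Adding a question of an existing type then means adding one metadata record and no Java code; new code is needed only for a new question type.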

  7. The Questionnaire Graph (QG)
     • The basic idea: formally model the structure of the questionnaire and the correct set and sequence of questions to be filled in by respondents
     • A Questionnaire Graph (QG) in QPOP is a Directed Acyclic Graph (DAG) such that:
       • Nodes Ni are in one-to-one correspondence with the questionnaire fragments (mainly questions, but not only)
       • Node types correspond to templates (which in turn determine appearance and behavior)
       • Edge labels represent conditions on questions (e.g. “Has the respondent checked option 2 of question X?”)
       • A directed labeled edge from node (question) Ni to node (question) Nj means that the user has to respond to question Nj after responding to Ni, provided that the condition on the edge label is true
     • The QG is used by the application (on both the client and the server side) to enable and disable questions on the web page and to validate the user’s input before saving the answers in the microdata tables, i.e. to enforce consistency
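
The graph and its use for enabling questions can be sketched as follows. The slides do not give the exact rules, so this sketch assumes that a node with no incoming edge is always enabled, and that any other node is enabled when at least one incoming edge starts from an enabled, answered node and its condition holds. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class QuestionnaireGraph {
    // Directed edge labeled with a condition on the answers given so far.
    private static final class Edge {
        final String from;
        final String to;
        final Predicate<Map<String, String>> condition;

        Edge(String from, String to, Predicate<Map<String, String>> condition) {
            this.from = from;
            this.to = to;
            this.condition = condition;
        }
    }

    // Nodes must be added in topological order (sources before targets).
    private final Set<String> nodes = new LinkedHashSet<>();
    private final List<Edge> edges = new ArrayList<>();

    QuestionnaireGraph node(String id) {
        nodes.add(id);
        return this;
    }

    QuestionnaireGraph edge(String from, String to, Predicate<Map<String, String>> condition) {
        edges.add(new Edge(from, to, condition));
        return this;
    }

    // Computes which questions the respondent should currently see as enabled.
    Set<String> enabledNodes(Map<String, String> answers) {
        Set<String> enabled = new LinkedHashSet<>();
        for (String n : nodes) {
            boolean hasIncoming = false;
            boolean reached = false;
            for (Edge e : edges) {
                if (!e.to.equals(n)) continue;
                hasIncoming = true;
                if (enabled.contains(e.from) && answers.containsKey(e.from)
                        && e.condition.test(answers)) {
                    reached = true;
                }
            }
            if (!hasIncoming || reached) enabled.add(n);
        }
        return enabled;
    }

    // Validation before saving: only enabled questions may carry an answer.
    boolean isConsistent(Map<String, String> answers) {
        return enabledNodes(answers).containsAll(answers.keySet());
    }
}
```

For example, with edges Q1 → Q2 labeled "Q1 = 1", Q1 → Q3 labeled "Q1 = 2" and an unconditional edge Q2 → Q3, answering 2 to Q1 enables Q3 and leaves Q2 disabled, so a submission that also contains an answer to Q2 fails `isConsistent`.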

  8. From the questionnaire to the QG

  9. En(Dis-)abling questions by updating QG node states (1)

  10. En(Dis-)abling questions by updating QG node states (2)

  11. En(Dis-)abling questions by updating QG node states (3)

  12. The search engine for assisted coding
     Example on the slide: “MATHEAMTICS DEGREE” → 72001003

  13. Reference dictionary pre-processing
     • Character normalization: accented letters are replaced with the corresponding unaccented version, uppercase letters with lowercase ones, and other characters such as punctuation marks are removed
     • Stopword removal: “useless” words are removed from the character-normalized version of the items produced in the previous step. Both “general” stopwords (conjunctions, articles, etc.) and “context-specific” stopwords (e.g. the word “degree” in a list of academic degrees) are removed
     • Search term extraction and weighting: the single terms (words) making up the normalized items produced by the previous two steps are extracted and stored in the search engine DB tables. A weight is also assigned to each term, depending on its relative frequency inside the dictionary
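
The three pre-processing steps could look roughly like this. The slide does not give the weighting formula, so the logarithmic, inverse-frequency weight below is an assumption, as are the class and method names. Characters outside a-z and 0-9 are turned into spaces here rather than deleted, which is a simplification.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class DictionaryPreprocessor {
    private final Set<String> stopwords;

    DictionaryPreprocessor(Set<String> stopwords) {
        this.stopwords = stopwords;
    }

    // Step 1: strip accents, lowercase, turn every other character into a space.
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "")      // drop combining accent marks
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9 ]", " ")          // punctuation and other symbols
                .trim()
                .replaceAll("\\s+", " ");
    }

    // Steps 2 and 3a: remove stopwords and extract the remaining terms.
    List<String> terms(String item) {
        List<String> result = new ArrayList<>();
        for (String term : normalize(item).split(" ")) {
            if (!term.isEmpty() && !stopwords.contains(term)) {
                result.add(term);
            }
        }
        return result;
    }

    // Step 3b: rarer terms get a higher weight (assumed formula, not from the slides).
    Map<String, Double> weights(List<String> items) {
        Map<String, Integer> itemsContaining = new HashMap<>();
        for (String item : items) {
            for (String term : new HashSet<>(terms(item))) {
                itemsContaining.merge(term, 1, Integer::sum);
            }
        }
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : itemsContaining.entrySet()) {
            weights.put(e.getKey(), Math.log(1.0 + (double) items.size() / e.getValue()));
        }
        return weights;
    }
}
```

With the stopwords "in" and "degree", the item "Degree in Mathematics" is reduced to the single term "mathematics"; a term that occurs in many dictionary items ends up with a lower weight than one that occurs in a single item.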

  14. Search string processing
     • Similarity search: each (normalized) term to be searched is compared with those in the database; the terms whose similarity is above a given (relatively high) threshold are passed to the following step
     • Extraction of the dictionary items: the dictionary items containing one or more of the terms obtained in the previous step are extracted from the DB. At the same time, for each extracted item, some values are either read or computed for use in the following step
     • Dictionary item sorting: using the values extracted or computed in the previous step, the score of each item in the result set is computed and the list is sorted by score in descending order. This sorted list is proposed to the respondent
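
A minimal sketch of these three steps is given below. The slides name neither the similarity measure nor the scoring formula, so both are assumptions: similarity is a normalized Levenshtein distance, and an item's score is the similarity-weighted share of its term weights covered by the query. The class name and the in-memory maps (standing in for the search engine DB tables) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AssistedCodingSearch {
    private final Map<String, List<String>> itemTerms;  // item code -> normalized terms
    private final Map<String, Double> termWeights;      // term -> weight
    private final double threshold;

    AssistedCodingSearch(Map<String, List<String>> itemTerms,
                         Map<String, Double> termWeights, double threshold) {
        this.itemTerms = itemTerms;
        this.termWeights = termWeights;
        this.threshold = threshold;
    }

    // Classic edit distance (insertions, deletions, substitutions), two-row version.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1 means identical strings.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Returns the codes of the matching items, best score first.
    List<String> search(List<String> queryTerms) {
        // Step 1: dictionary terms similar enough to some query term.
        Map<String, Double> matched = new HashMap<>();
        for (String q : queryTerms) {
            for (String t : termWeights.keySet()) {
                double s = similarity(q, t);
                if (s >= threshold) matched.merge(t, s, Double::max);
            }
        }
        // Step 2: items containing a matched term, with the values needed for scoring.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<String>> item : itemTerms.entrySet()) {
            double matchedWeight = 0.0;
            double totalWeight = 0.0;
            for (String t : item.getValue()) {
                double w = termWeights.getOrDefault(t, 0.0);
                totalWeight += w;
                Double s = matched.get(t);
                if (s != null) matchedWeight += s * w;
            }
            if (matchedWeight > 0.0) scores.put(item.getKey(), matchedWeight / totalWeight);
        }
        // Step 3: sort by score, descending.
        List<String> result = new ArrayList<>(scores.keySet());
        result.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
        return result;
    }
}
```

With a threshold of 0.8, the misspelled term "matheamtics" from slide 12 still matches "mathematics" (edit distance 2 over 11 characters), so an item whose only term is "mathematics" is ranked ahead of an item that also contains unrelated terms.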

  15. QPOP in (a few) figures

  16. Future (current) work
     • Questionnaires for the Industry and Services Census, comprising:
       • Businesses (2 “fairly similar” questionnaires implemented as one with “special” routing conditions)
       • Non-profit institutions
     • More general question templates
     • More general checks and routing conditions (support for existential and universal quantification, as well as counting)
