Data storage, maintenance and security strategy: Ghana’s experience
Presentation at the United Nations Regional Seminar on Census Data Archiving for Africa, Addis Ababa, Ethiopia, 20-23 September 2011
By KB Danso-Manu, Data Processing Manager, Ghana Statistical Service
Outline
• Ghana’s 2010 Census
• Archiving Strategy & Anonymization
• Data Storage
• Data Security and Backup
• Data Access Policy
The 2010 Census of Ghana
• The 2010 Population and Housing Census was conducted with 26 September 2010 as the reference point (Census Night).
• Enumeration continued until December 2010 (mop-up).
• Data processing for the Summary Sheets started in January 2011.
• Provisional figures were released in February 2011, giving the population of Ghana as 24.2 million (24,223,431).
• Males form 48.7% of the population and females 51.3%.
The 2010 Census of Ghana (cont’d)
• Form preparation of the census questionnaires started in March 2011.
• Production scanning of the census questionnaires started in July 2011.
• ICR/OCR scanning technology is being used to capture the census forms, with TeleForm as the form processing software.
• The Meridio Record Management System is being used to store the images of the census forms.
• The final results of the Ghana 2010 Census are expected to be released by the end of June 2012.
Archiving Strategy and Management
• The GSS has set up the National Data Archive, which has adopted the Data Documentation Initiative (DDI) and Dublin Core (DCMI) international metadata standards since 2008.
• Census micro data is archived by:
  • Adopting the International Household Survey Network (IHSN)’s standard procedures and recommendations for data archiving.
  • Anonymizing the data by altering or suppressing variables that could potentially identify a respondent or establishment.
• Some challenges faced are:
  • Unavailability of census-related documents (questionnaires, manuals, codebook, etc.) at a centralized location.
  • Lack of consistent or harmonized definitions, categorizations and classifications of variables among different censuses and surveys.
Typical Anonymized Dataset
• Var-J: numbers the households in the district serially (using the required number of digits).
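The two anonymization steps described here, suppressing identifying variables and numbering households serially within each district, can be illustrated with a minimal sketch. The field names (`name`, `address`, `phone`, `district`, `hh_serial`) and the zero-padding are assumptions made for illustration; the slides do not list the actual variables or the procedure GSS uses.

```python
from collections import defaultdict

# Hypothetical direct identifiers; the real variable list is not given in the slides.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def anonymize(records, width=5):
    """Drop direct identifiers and add a serial household number that
    restarts at 1 within each district, zero-padded to `width` digits."""
    counters = defaultdict(int)
    out = []
    for rec in records:
        counters[rec["district"]] += 1
        clean = {k: v for k, v in rec.items() if k not in DIRECT_IDENTIFIERS}
        clean["hh_serial"] = str(counters[rec["district"]]).zfill(width)
        out.append(clean)
    return out

sample = [
    {"district": "0101", "name": "A", "address": "x", "hh_size": 4},
    {"district": "0101", "name": "B", "address": "y", "hh_size": 2},
    {"district": "0102", "name": "C", "address": "z", "hh_size": 5},
]
anonymized = anonymize(sample, width=4)
```

Suppression alone does not guarantee that a respondent cannot be re-identified from combinations of the remaining variables, so a sketch like this would only be a first step.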
Our Storage Technology Mapping
• A storage system that is aware of the virtualization environment.
• A storage system that is application-aware and can determine how data is being accessed.
• Storage data protection, de-duplication and archiving that assess the quality of duplicated data by understanding the data.
• Hardware support: we have three years of hardware replacement support from our suppliers.
GSS Server and Storage Infrastructure
• A two-node Hyper-V cluster with eight virtual servers for image capturing and archiving applications.
• All physical servers in the cluster can access the EMC storage system in the datacenter.
• Two virtual networks in our datacenter: the iSCSI SAN network, dedicated to storage and server data traffic, and the GSS LAN, for user data-processing traffic.
• The total storage capacity of the primary storage system is 24 TB raw.
• 16 TB is usable after the redundancy and fault-tolerance configuration of the 24 TB raw capacity.
• 4 TB is deployed for application data storage for each virtual machine.
• 12 TB is deployed on the SAN and mapped to appear as local drives on the virtual content server, to store captured images and processed data.
• 4 TB is used for instant backup and recovery using snapshot technologies.
• A 2 TB external drive is used for offsite storage of system state and configurations.
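The capacity figures on this slide can be checked with simple arithmetic. The sketch below reads the 4 TB application-data figure as a single pool shared by the virtual machines, which is an assumption; the slide does not say how the 4 TB snapshot reserve relates to the 16 TB usable figure, so it is kept separate here.

```python
RAW_TB = 24        # total raw capacity of the primary storage system
USABLE_TB = 16     # usable capacity after redundancy and fault tolerance

overhead_tb = RAW_TB - USABLE_TB          # capacity consumed by redundancy
usable_fraction = USABLE_TB / RAW_TB      # share of raw capacity that is usable

# Assumed reading: both figures are totals carved out of the usable pool.
allocations_tb = {
    "application data (virtual machines)": 4,
    "SAN volumes for captured images and processed data": 12,
}
allocated_tb = sum(allocations_tb.values())

# Listed on the slide without stating which pool it is drawn from.
snapshot_reserve_tb = 4

print(f"overhead: {overhead_tb} TB, usable: {usable_fraction:.0%}")
print(f"allocated: {allocated_tb} TB of {USABLE_TB} TB usable")
```

Under this reading, one third of the raw capacity goes to redundancy, and the application and image volumes together account for the full 16 TB of usable space.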
Data Security
• The basic security issues include physical security (e.g. stolen laptops), internal security (e.g. file backups), external security (e.g. Internet security), and integrity (e.g. audit trails).
• Should smartphones be banned from data centers?
• To address some of these security issues, national statistical offices (NSOs) may adopt:
  • record-keeping mechanisms;
  • passwords, encryption, off-site storage, firewalls, authorization, and authentication;
  • or some other method to block unauthorized users from gaining access.
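As an illustration of the integrity point (audit trails), the sketch below keeps a tamper-evident log in which each entry carries an HMAC over its content and the previous entry's HMAC. This is one possible record-keeping mechanism, not the one GSS uses; the key, user names and actions are hypothetical.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept out of the log

def _mac(body):
    """HMAC-SHA256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def append_entry(log, user, action):
    """Append an entry chained to the previous entry's MAC."""
    prev_mac = log[-1]["mac"] if log else ""
    body = {"user": user, "action": action, "prev": prev_mac}
    log.append({**body, "mac": _mac(body)})

def verify(log):
    """Return True only if no entry was altered, removed or reordered."""
    prev_mac = ""
    for entry in log:
        body = {"user": entry["user"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev_mac or not hmac.compare_digest(entry["mac"], _mac(body)):
            return False
        prev_mac = entry["mac"]
    return True

log = []
append_entry(log, "analyst1", "opened census extract")
append_entry(log, "analyst2", "exported table")
log_is_valid = verify(log)
```

Because each entry depends on the one before it, editing or deleting an earlier record invalidates every later check, provided the key is stored separately from the log.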
Backup and Disaster Recovery (DR)
• Daily backups of our virtual machines and an encrypted copy of the process export are kept on an external drive stored at the datacenter of the Ministry of Finance.
• The eGovernment Network Infrastructure will be used as our backbone network to link to remote branches.
• In addition, plans are being made to use the DR site of the eGovernment Datacenter in Kumasi, a city about 400 km from the national capital.
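A daily copy to an external drive could be sketched as follows. The slides do not describe the backup tooling, so the dated folder layout, the file name `export.enc` and the SHA-256 check are all assumptions; the file is taken to be encrypted already, since encryption itself is outside the Python standard library.

```python
import hashlib
import shutil
import tempfile
from datetime import date
from pathlib import Path

def sha256_of(path):
    """Hash a file in 1 MiB chunks so large exports do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def daily_backup(source, backup_root, day=None):
    """Copy `source` into a folder named after the date and verify the copy."""
    day = day or date.today()
    target_dir = Path(backup_root) / day.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(source).name
    shutil.copy2(source, target)
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"checksum mismatch for {target}")
    return target

# Demonstration in a temporary directory standing in for the external drive.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "export.enc"
    src.write_bytes(b"already-encrypted export bytes")
    copied = daily_backup(src, Path(tmp) / "external_drive", day=date(2011, 9, 20))
    backup_ok = copied.exists()
    backup_folder = copied.parent.name
```

Verifying the checksum right after the copy catches a failing drive at backup time rather than at restore time.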
Data Access Policy
• The Ghana Statistical Service, as a public institution, has an obligation to promote data dissemination to facilitate governance and national development.
• There are three levels of access to archived census or survey micro-data:
  • Public use files: free from the internet
  • Licensed datasets: signed agreement
  • Datasets only accessible on location
• Online information products are free: www.statsghana.gov.gh
• Published reports (hard copy) are available at the front desk/Information section at a nominal price.
• Customized tables attract a token fee.
Data Access Policy (cont’d)
• For raw data from surveys/census:
  • Make a formal request;
  • Fill in the agreement form at www.statsghana.gov.gh;
  • Pay the processing fee;
  • The dataset is picked up or delivered through mail/post.
• Only 1% of census data is given out.
• 100% of survey data may be requested by researchers.
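The slides do not say how the 1% of census data is selected. As one possible approach, the sketch below draws a systematic sample: every 100th record after a random start. The function name, the seed and the record count are illustrative only.

```python
import random

def systematic_sample(n_records, fraction=0.01, seed=None):
    """Return indices of a systematic sample: every k-th record after a
    random start, where k = 1 / fraction."""
    step = round(1 / fraction)
    rng = random.Random(seed)
    start = rng.randrange(step)      # random start in [0, step)
    return list(range(start, n_records, step))

# Example with a hypothetical file of 10,000 records.
indices = systematic_sample(10_000, fraction=0.01, seed=2010)
```

A systematic sample taken from a file sorted by geography spreads the selected records across all areas, which is one reason it is often preferred to a simple random draw for census extracts.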
References
• www.statsghana.gov.gh\nada
• Yaw Antwi-Adjei (Info Builders (Ghana) Limited), www.infobuildersgh.com
• Frenck Gyamfi, CSS & Partners, www.css-partners.com
• Virtual Statistical System, www.virtualstatisticalsystem.org