270 likes | 383 Views
Data Management and Information Delivery (DMID) Case Study Sibongile Madonsela Matile Malimabe Bubele Vakalisa Statistics South Africa UNECE Workshop on the Common Metadata Framework Vienna, Austria, July 4-7, 2007. Introduction. Programme Providing Frame for Stats SA Projects
E N D
Data Management and Information Delivery (DMID) Case Study Sibongile Madonsela Matile Malimabe Bubele Vakalisa Statistics South Africa UNECE Workshop on the Common Metadata Framework Vienna, Austria, July 4-7,2007
Introduction • Programme Providing Frame for Stats SA Projects • Providing Relevant Statistical Information to meet user Needs • Enhancing the Quality of Products and Services • Developing and Promoting Statistical Coordination and Partnerships • Building Human Capacity This project is aimed at supporting the strategic theme “Enhancing the Quality of Products and Services”. Within the DMID, the metadata management system addresses this strategic theme • Overall Project Objective • Metadata management system forms part of the organisation’s broader objective to continuously improve the quality of its products • The Survey metadata tool consist of elements for providing the overall description of a statistical survey • The survey metadata component is fashioned along the lines of Statistics Canada’s Integrated Metadata Database (IMDB) Metastat
Organisation Chart Stats SA and current projects The Data Management and Information Delivery (DMID) project (magenta shaded box) is located within the Data Management and Technology Division The yellow shaded boxes indicate some of the ongoing projects that are concurrent with the DMID project.
Organisation Chart DMID project, including supplier’s resources • Prescient Business Technologies (PBT) - the supplier to the DMID project, developing the ESDMF System • ESDMF – End to end Statistical Data Management Facility • PM – Project Manager
DMID Project Structure at a high level • Standards Development and Implementation • Led by Chief Standards Officer • Develop policies, standards and procedures before components of the facility can be implemented, using the Standard Lifecycle • For Phase One, the policies for Data Quality and Metadata were implemented • For future phases, related policies will be developed • End to end Statistical Data Management Facility (ESDMF) • Led by Technical Lead/Project Manager • Uses policies developed by the standard team to generate requirements for the system by using software technologies to implement the system
Standards Life Cycle • Develop policies, standards and procedures before components of the facility can be implemented, using the Standard Lifecycle
Conceptual components of the ESDMF • End to end Statistical Data Management Facility (ESDMF) Uses policies developed by the standard team to generate requirements for the system by using software technologies to implement the system
Statistical Metadata in Each Phase of the Statistical Cycle • Description • Metadata is used during various stages of statistical production as essential input to production processes • The production processes in turn, produce metadata • Metadata is also important in documenting the trail of activities during the statistical production process • List of Metadata Groups (Categories of Metadata) • Survey Metadata (Dataset Metadata) • Used to describe, access and update dataset, data structures • Called survey rather than dataset metadata because some of the metadata, such as information about “the population which the data describe”, refer to the broader aspects of the survey, and not only the dataset • Definitional Metadata • Describes the concepts used in producing statistical data • These concepts are often encapsulated into measurement variables used to collect statistical data • Methodological Metadata • Relate to the procedures by which data are collected and processed. • These include Sampling, Collection methods, Editing processes, etc. • System Metadata • Refers to active metadata used to drive automated operations • Some of the examples are: file size, access methods to databases, etc. • Operational Metadata • Metadata arising from and summarising the results of implementing the procedures • Examples include: Respondent burden, Response rates, Edit failure rates, Costs and other quality and performance indicators, etc.
Detailed Process Model Scheme • Need • Understand the need for the required statistics, i.e., what the required statistics are going to be used for in concrete terms by their users. • Design • Preparing ground for the execution of a statistical production project. • For example, questionnaire design, Capturing tool design, Tabulation plans, etc. • Build • The build phase puts together all the pieces of the infrastructure for a statistical production project • E.g. the data capturing and scanning tools are developed, tested and implemented • Collect • Refers to both direct and administrative methods of data collection • The direct collection method refers to data collection in which Stats SA sources data directly from the respondents • In administrative collection, data are drawn from databases of other organizations which in turn source them from their respondents • Process • Includes capturing collected data into databases so that data processing may be done • Analyse • After data have been cleaned during the Process phase, it is now ready for manipulation using analytical tools • Disseminate • Publications are created from the datasets produced by the analysis phase • Disseminated in various forms, e.g. electronic, printed output and compact disks
How the Stats SA process model map to the METIS Metadata cycles • Post Survey Evaluation This is currently done outside the statistical cycle. It is performed only for the large surveys such as the population census and the community survey
How Metadata Fits into Other Stats SA Systems • ESDMF Core (Metadata Subsystem is one part thereof) • This consists of all the components that make up the ESDMF system (only a few shown) • Integration Layer • This facilitates access to the ESDMF functionality to the existing systems with Stats SA
Metadata Description • SurveyMetadata Capture Tool • This is the principal mechanism by which metadata is documented • The tool is based on an approved Survey Metadata Standard Template • The standard is made up of groups of metadata elements • e.g. Overview, Generic Information, Methodology, Data Quality Report, Documentation and Contacts) which describe certain characteristics about the survey data • Survey • Metadata is organized around an entity known as the survey • A survey can be: • a direct survey: data is collected directly from the respondents. This could be a sample or a census • administrative: data is sourced from another organization, which had collected the data for their own purposes. • derived: a statistical program uses administrative data or a data integration activity is done • Series Metadata • The group of metadata topics that remains constant for a period of time or do not change as frequently (e.g. history, objective or abstract of the survey etc.) • Instance Metadata • This is the metadata that is compiled and produced frequently • The frequency of production of this metadata set is every time a release is produced • Storage • Metadata captured is stored in a database and can be saved and viewed anytime • Benefits • Once captured, users can always access their metadata from a centalised storage location (centralisation of metadata) • Users across the organisation also use the same mechanism to capture metadata (quality of metadata)
Survey Metadata Capture Tool The implemented Survey Metadata Capture Tool of the ESDMF captures the following metadata: • Overview • The Overview section comprises the following items: Objective, Abstract, History, etc. • Generic Information • provides generic information about the survey time frames, e.g. frequency, collection start and end dates • Primary Data Source • External data inputs to the survey, e.g. external or internal data sources • Methodology • The activities conducted and the methods and processes used which are specific to the survey, e.g. survey population, instrument design, sample design, etc. • Data Quality Report • Comprises the quality dimensions of the data, e.g. relevance, accuracy, accessibility, etc. • Documentation • Links to additional documentation related to the survey • Contact • Contact person who will manage enquiries related to the data or information produced by the survey • Active Metadata Sets • The file identifier and status of the current/active metadata set is displayed immediately under this section. In other words, the metadata set that the user is currently capturing, editing or viewing. • Loaded Metadata Sets • Lists the file identifiers and statuses of metadata sets created by the current user • Enables the current user to switch between metadata sets
Survey Metadata Capture Tool User Interface – Activity Selection • Instance Metadata – Create an instance metadata • Create a Report – Create a report in PDF format • View (metadata) – View approved metadata • Approve (metadata) – Approve metadata for use in a survey • Series Metadata – Create a series metadata
Survey Metadata Capture Tool User Interface – Navigation • Active Metadata Sets • Overview • Generic Information • Primary Data Source • Methodology • Data Quality Report • Documentation • Contact • Loaded Metadata Sets
IT Infrastructure Specifications • Operating Systems • Desktops are in Microsoft Windows • The application is deployed in an Open Source operating system (Novell SuSe Linux) • Networks • The network architecture is based on open protocols and industry standards • Allows remote access to some employees • Supports both local area (LAN) and wide area (WAN) networks • Servers • The system is developed as a client-server application • This means that there is a need for powerful computer servers capable of handling intensive processing • Data Storage • Storage management is via the Storage Area Network (SAN) • Environments • Three environments: • Application Development • User Acceptance Testing (UAT) • Production
IT Infrastructure Specifications - Details
IT Infrastructure Specifications - Diagrams
Components of Metadata Management Application • User Interface • The user interfaces for all the metadata management system applications is web-based • Client workstations only need to have a web-browser to access server based applications • The main supported web-browsers are Microsoft Internet Explorer and Firefox • Database • The application is supported by a relational database management system (RDBMS) • The RDBMS engine of choice for this project is Sybase • The project is currently using the open source RDBMS, MySQL • Business Logic • The business logic controlling the interaction between the UI and the underlying database is coded using Java server side scripting • There is also business logic coded using stored procedures. This mostly performs housekeeping within the database • Application/Web Server • The application is served to the client via Tomcat, which processes Java code. • Tomcat also handles HTTP calls from the web browser
Partnerships and Cooperation between Agencies • Latvia • Metadata model is also based on Bo Sundgren’s model • Their outsourced supplier took a while to understand the business of the statistical organization • Ireland • Issues regarding communication between the customer and the supplier • Project took longer than planned • Slovenia • Metadata model is also based on Bo Sundgren’s model, with some modifications • New Zealand • Adopted their business process model, called it Statistical Value Chain in Stats SA • We also adopted their broke down of metadata into five categories • Australia • For a successful data warehouse project, there is a need to develop policies and standards • Sweden • Advise us on various aspects of metadata and statistical production processes • Better idea on how to develop a data quality template, as well as how data quality should be reported on • Canada • We applied that knowledge of their Metastat (IMDB) during the development of our Survey Metadata Capturing Tool. • Consultants from Canada come to help in other projects within Stats SA, including us. • United States • We used the Corporate Metadata Repository (CMR) model by Dan Gillman, from the US Bureau of Statistics in our understanding the metadata model
Organizational and Cultural Issues • Climate and Culture Assessment • A key challenge to Stats SA is to focus the organisation on the strategic importance of the DMID project • A Climate and Culture Assessment was done by holding focus groups as well as running an online survey via Stats SA intranet website • Change Readiness Assessment • A Change Readiness Assessment was conducted to determine the current capacity of Stats SA to change, and to identify areas of resistance towards DMID requiring Organisation Change Management (OCM) interventions • The following ‘change readiness dimensions’ formed the basis of the Change Readiness Assessment: • Clear vision • Effective leadership • Positive experience with past change initiatives • Motivation to do the project • Effective communication • Adequate project team resources • What is Change Readiness? • The Change Readiness Assessment is a process used to determine the levels of understanding, acceptance and commitment likely to affect the success of the planned change.
Change Commitment Curve • Commitment • As the DMID project phases roll out, different stakeholders will need to be at specific levels of commitment • The level of commitment required will be dependent on the role they play in the DMID project and their ability to influence the program • Framework • The Change Commitment Curve will provide a framework for understanding and tracking the requisite levels of commitment that stakeholders need to be facilitated through so that OCM interventions can be developed accordingly
Climate and Culture plus Change Readiness Assessments • The following we the findings from the assessments: • Executive Management does not have the same understanding of the DMID project • Lack of communication between management and sub-ordinates; this makes it difficult for sub-ordinates to understand the purpose of the project and the impact it has on their working lives • Lack of support from Executive management will result to resistance and difficult success of the project • If management does not communicate, does not understand and does not promote the project, it will result in difficulty to deliver the message and get buy-in from staff in the organisation • Next Steps from the Findings • The findings of the assessments resulted in identifying where some of the key staff members belonged on the Change Commitment Curve. • In general, most were in the “Setting the Scene” and “Achieving Acceptance” area bounded by in time by “Contact” (“I know something is changing”) and “Understanding” (“I know the implications for me”) • Obviously, a lot of effort is needed in order to move from that area to “Achieving Commitment” demonstrated by “Internalisation” wherein staff can claim that “This is the way I do things” • Another outcome of these assessments was to organize a Leadership Alignment workshop. • In this workshop, the Executive Committee was given a presentation of the findings and the path forward • The path forward is to ensure that the leadership understands the goals of the project and how they line up with the vision of Stats SA • The leadership was also instructed on how to communicate the same message about the project
Lessons Learned • The business of Stats SA • The supplier had a difficult time understanding the business of Stats SA, which is statistical production processes • Skills Transfer Plan • Under pressure of meeting the deliverables, the supplier ignored the Skills Transfer Plan, with the result that the Stats SA developers were not involved in the final design and development of the phase 1 deliverable. • Breakdown of deliverables • Each phase was planned to be three months long in duration. Also, each phase was planned to be a complete deliverable in its own right, even though the next phase was planned to build on the previous phases. • The first phase was delivered late mainly due to the lack of understanding that the supplier demonstrated. • The first deliverable did not meet the stated business objectives of data quality initially
END Thank You Contact information: Sibongile Madonsela (SibongileMA@statssa.gov.za) Matile Malimabe (MatileM@statssa.gov.za) Bubele Vakalisa (BubeleV@statssa.gov.za) Ashwell Jenneker (ashwellj@statssa.gov.za) all from Statistics South Africa (www.statssa.gov.za)