SIMPII – Workshop on Information Technology, Day 3: Methodology system support unit and writing specifications. Statistics Canada, November 30th, 2011
Objectives This module discusses a very specific type of system development: systems that perform statistical functions • It is the Methodologists who are experts in mathematical statistics • The Methodologists work with the Subject Matter client’s data to determine the best methods and algorithms to use • Therefore, in the development process for these functions, the needs of the clients are represented by the Methodologists • This module describes the development process for systems that perform statistical functions, and the roles of Methodologists and System Engineers Statistics Canada • Statistique Canada
Outline • Overview of generalized systems to perform statistical functions at Statistics Canada • Guiding principles • Roles and responsibilities • The development process • Support activities and tools
Context • The focus of this presentation is on systems built specifically to perform complex statistical operations, including but not limited to the following: • Probabilistic matching, sample allocation, coordinated sampling, nearest neighbour imputation, calibration weighting, design-based variance estimation, time series and benchmarking, disclosure avoidance • These are a “special case” of generalized systems • The development process is very similar to that of other generalized systems, but differs in a few ways because of the nature of these systems
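To make one of these operations concrete, here is a minimal Python sketch of nearest neighbour donor imputation: each record with a missing value receives the value of the complete record (the donor) that is closest on a set of auxiliary variables. The record layout, the function name `nearest_neighbour_impute` and the use of plain Euclidean distance are assumptions made for illustration; the sketch does not reproduce the donor-selection rules of any production system such as Banff.

```python
import math


def nearest_neighbour_impute(records, aux_fields, target):
    """Fill missing values of `target` from the nearest complete record.

    records    : list of dicts; a missing value is represented by None
    aux_fields : names of numeric auxiliary variables used to measure closeness
    target     : name of the variable to impute
    Returns new dicts; the input records are not modified.
    """
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no complete records available as donors")

    def distance(a, b):
        # Plain Euclidean distance on the auxiliary variables (an assumption)
        return math.dist([a[f] for f in aux_fields], [b[f] for f in aux_fields])

    result = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[target] is None:
            # min() keeps the first donor found in case of ties
            donor = min(donors, key=lambda d: distance(rec, d))
            new_rec[target] = donor[target]
        result.append(new_rec)
    return result


# Hypothetical example data: revenue is missing for records 3 and 4
records = [
    {"id": 1, "employees": 10, "revenue": 500.0},
    {"id": 2, "employees": 50, "revenue": 2600.0},
    {"id": 3, "employees": 12, "revenue": None},
    {"id": 4, "employees": 45, "revenue": None},
]
filled = nearest_neighbour_impute(records, ["employees"], "revenue")
print([r["revenue"] for r in filled])  # → [500.0, 2600.0, 500.0, 2600.0]
```

Copying an observed value from a real donor, rather than computing a fitted value, keeps imputed values plausible, which is one reason donor methods are popular for survey data.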
Statistics Canada’s Generalized Systems • SAS based, using statistical methods: • GSAM – sampling • Banff – imputation • GES – weighting & estimation • G-Series – time series • G-Confid – disclosure control • Not SAS based: • LogiPlus – editing • G-Code – auto coding • G-Link – record linkage • Under development: • G-Tab – tabulation • G-Export – dissemination • G-Sam – sampling • G-Est – weighting & estimation
At Statistics Canada … • The development of statistical generalized systems cannot be achieved in a year or two • Rather, it is the result of decades of effort, starting with theoretical research, feasibility studies, and the development of prototypes built for specific surveys • Next, several years of fine-tuning and evaluation are required before a generalized version can be considered • Even then, not all prototypes will become generalized systems
At Statistics Canada … • We build our own statistical generalized systems in-house, because they are not available for purchase • Development is done by permanent staff, not contractors • Partnership between Methodology (experts in statistical methods) and Informatics (system engineers) • The client is Subject Matter; however, the client’s interests are represented by Methodology • Building a new system takes approximately 5 years, using a team of 2-3 methodologists and 2-3 system engineers
Guiding Principles • Include only sound, well-understood, defensible statistical methods • Build modules that each perform an individual statistical method • For example, a stratification module would only do stratification, but would offer several different stratification methods • Use a common, well-supported foundation software (usually SAS or C#) • Build the systems in a flexible manner so that more modules can be added later • Build the modules in a flexible manner so that the user can alter the constraints and assumptions (parameter driven) • Use metadata (parameters) to “drive” the modules • Build modules that are appropriate for business and social surveys …
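The principles of one module per statistical method, several methods inside each module, and parameters that drive the module can be illustrated with a small sketch. This is a hypothetical Python illustration, not the design of any Statistics Canada system: `StratificationParams`, `stratify`, and the method names `equal_width` and `equal_count` are invented for the example, and the two boundary rules are deliberately simple.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class StratificationParams:
    """Parameters (metadata) that drive the module; names are hypothetical."""
    method: str     # which stratification method to apply
    n_strata: int   # number of strata to create


def _equal_width_bounds(values, k):
    # k - 1 boundaries splitting [min, max] into k intervals of equal width
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def _equal_count_bounds(values, k):
    # k - 1 boundaries taken at empirical quantiles, giving roughly equal counts
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


# Registry of available methods: adding a method does not change the interface
METHODS = {
    "equal_width": _equal_width_bounds,
    "equal_count": _equal_count_bounds,
}


def stratify(values, params):
    """Return a stratum index (0 .. n_strata - 1) for each value."""
    if params.method not in METHODS:
        raise ValueError(f"unknown stratification method: {params.method!r}")
    if not 1 <= params.n_strata <= len(values):
        raise ValueError("n_strata must be between 1 and the number of units")
    bounds = METHODS[params.method](values, params.n_strata)
    # A value's stratum is the number of boundaries that are <= the value
    return [bisect_right(bounds, v) for v in values]


sizes = [3, 8, 15, 22, 40, 41, 57, 90, 120, 300]
print(stratify(sizes, StratificationParams(method="equal_count", n_strata=2)))
# → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(stratify(sizes, StratificationParams(method="equal_width", n_strata=2)))
# → [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

The same call switches between methods purely through its parameters, and a new method is added by registering one more function in `METHODS`, which matches the goal of adding functionality later without changing how users call the module.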
Business versus Social Surveys • The sampling methods for social surveys at Statistics Canada are often more complex than those for business surveys (for example, multi-stage as well as multi-phase) • To date, we have had more success building statistical generalized systems that meet the needs of business surveys • Looking forward, our goal is to also incorporate methods that meet the needs of social surveys
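Part of what makes multi-stage and multi-phase designs harder to generalize is that a unit's design weight depends on its selection probability at every stage or phase. Under the standard result that the overall design weight is the product of the inverses of the conditional selection probabilities, a sketch that accepts any number of stages or phases might look like this. The function name `design_weight` is hypothetical, and variance estimation for such designs is not addressed.

```python
import math


def design_weight(conditional_probs):
    """Design weight of a unit selected through any number of stages or phases.

    conditional_probs[i] is the probability that the unit (or the cluster
    containing it) is selected at stage/phase i, given the selections made
    at the earlier stages/phases. The weight is the product of the inverses.
    """
    if not conditional_probs:
        raise ValueError("at least one stage or phase is required")
    for p in conditional_probs:
        if not 0 < p <= 1:
            raise ValueError(f"selection probability out of range: {p}")
    return 1.0 / math.prod(conditional_probs)


# Two phases: 1 in 4 selected at phase 1, then 1 in 2 of those at phase 2
print(design_weight([0.25, 0.5]))  # → 8.0
# Three stages: the same function handles any number without change
print(design_weight([0.1, 0.5, 0.25]))
```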
Development Process [Diagram: development process, with Methodology and System Engineers]
Development Roles • Statistical researchers • Research and develop methodology • Interpret the needs of the clients • Build prototypes • Write the business requirements • Methodology developers • Analyze the user requirements • Generalize the methodology • Write detailed specifications • System engineers • Analyze the specifications • Investigate implementation options • Determine the system architecture • Perform programming, document and maintain code
Ongoing Responsibilities • Methodology • Marketing, communication with the user community, and training • Manage the expectations of the user community • Find solutions to user problems that relate to methodology • Communicate additional needs to the researchers • Systems • Solve user problems that are related to the software itself • Enhance the software as requirements evolve • Upgrade the software as platforms and standards evolve • Ensure the architecture remains compatible with Statistics Canada’s enterprise architecture
Developing the Methodology • One or more survey statisticians recognize the need for a particular methodology in a statistical application • The statistical researchers develop the idea, expressing it in terms of concepts, algorithms and algebraic expressions • Management gives its support for the use of this methodology, and therefore for its development as a generalized module that serves global (across several programs) rather than local needs
How do we choose which methods to include?
Examples of How to Choose Methods • Some are obvious, for example stratified simple random sampling, design-based variance estimation and donor imputation • Two-phase sampling is quite common • Should we implement two-phase sampling, or three-phase, or “generalize” it to 4, 5, or an unlimited number of phases? • The Labour Force Survey has a complex sample design, based partly on geography and partly on controlling response burden. Should we try to generalize this, or should it be a “one of a kind” system?
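For the “obvious” methods the formulas are standard and well documented, which is part of what makes them easy candidates. As an illustration, the sketch below computes the textbook estimator of a population total under stratified simple random sampling without replacement, together with its design-based variance estimator (formulas in the docstring). The function name and the input format are assumptions made for the example.

```python
from statistics import mean, variance


def stratified_srs_total(samples, pop_sizes):
    """Total and variance estimates under stratified SRS without replacement.

    samples   : dict stratum -> list of sampled y-values (at least 2 per stratum)
    pop_sizes : dict stratum -> population size N_h
    Estimator : Y_hat = sum_h N_h * ybar_h
    Variance  : V_hat = sum_h N_h^2 * (1 - n_h / N_h) * s_h^2 / n_h
    """
    total, var = 0.0, 0.0
    for h, ys in samples.items():
        n_h, N_h = len(ys), pop_sizes[h]
        if not 2 <= n_h <= N_h:
            raise ValueError(f"stratum {h!r}: need 2 <= n_h <= N_h")
        total += N_h * mean(ys)
        # variance() uses the n_h - 1 denominator; (1 - n_h/N_h) is the
        # finite population correction
        var += N_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
    return total, var


# Hypothetical example: two strata
samples = {"A": [2, 4, 6], "B": [10, 20]}
pop_sizes = {"A": 30, "B": 10}
total, var = stratified_srs_total(samples, pop_sizes)
print(total, var)  # → 270.0 3080.0
```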
More Examples … • Research continues in many domains, such as sample coordination and disclosure control. At what point do we consider a method sound enough, and applicable to enough different survey programs, that we should incorporate it in a generalized system? • Subject matter clients use the generalized systems and often want to “push the limits”, asking for additional functionality and flexibility
Project Management • Methodology developers consider the work done by statistical researchers and the needs of the subject matter clients, and recommend to Management which methods should be included • Often we compromise, balancing available resources against demand
Building a Prototype • One or more statistical researchers, working as a team, build a prototype, as proof that the concept will work • The prototype is tested to make sure it produces the expected results • Accurate results using real data • Acceptable performance under realistic conditions • Typical data • Extreme cases • The builders of the prototype document what it does, the theory behind it and how it works
Building a Prototype (continued) • In creating the prototype, some highly specialized and sophisticated modules of code are written. This code is also well documented, to help in later stages of the development • The prototype and its documentation are given to the methodology developers, who • Use the prototype in the testing phase • Use the documentation in writing specifications
User Interface • Also documented at this phase is how users will interact with the modules, what parameters are needed and what the inputs and outputs look like • Some users will be very comfortable using the modules as they are • Other users will need to be guided in how to use the modules, through some sort of interface
Writing Specifications • At this point the statistical researchers pass the methodology and the prototype to another team of one or more methodology developers, who write the specifications and act as the link between the researchers and the system engineers • These methodologists understand how the prototype works and how it will be used; they understand the overall concepts and are skilled at writing detailed specifications • These specifications are more detailed, and more mathematical in nature, than business requirements
Deliverables include: • Business requirements (the big picture) • Description of all functionality envisaged, not just what is included in the current development phase • Indication of where additional functionality could be added at a later date • Description of all the parameters, inputs and outputs • Graphical representation of the interactions between the modules, and of how the user will interact with the modules • Detailed specifications of the methodology itself; depending on the complexity, this could require more background information (for example, summations are easy, while sparse matrix manipulation is more complicated)
Example of Business Requirements Estimation requirements • Description in words of the assumptions and methodology • No (or very few) formulas • Bullet form, some tables • Clarify what is needed, not how to do it • For example, “first phase sampling weight”
Example of a Specification Benchmarking specification • Description in words of the inputs and outputs • Mathematical formulas as well as description in words of what manipulations to do on the inputs in order to get the desired outputs • Not pseudo-code • Clarify what to do in mathematical terms, not how to program it
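To illustrate this style (formulas plus words, not pseudo-code), here is how a very simple benchmarking rule, pro-rating, might be written in a specification. The choice of pro-rating and the notation are assumptions made for this example; they are not taken from the G-Series specification.

```latex
\text{Inputs: } s_t,\ t = 1,\dots,T \ \text{(sub-annual series)};\quad
a_y \ \text{(annual benchmark for year } y\text{)};\quad
P_y \ \text{(set of periods } t \text{ falling in year } y\text{)}

\tilde{s}_t = s_t \, \frac{a_y}{\sum_{j \in P_y} s_j}, \qquad t \in P_y,
\qquad \text{provided } \sum_{j \in P_y} s_j \neq 0

\text{Output property: } \sum_{t \in P_y} \tilde{s}_t = a_y \ \text{for every year } y
```

In words: each sub-annual value in year y is multiplied by the same factor, namely the ratio of the annual benchmark to the sum of the unadjusted sub-annual values for that year, so the adjusted values add up to the benchmark. The specification states what must hold, and leaves the choice of data structures and loops to the system engineers.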
Analysis of Specifications • Methodologists give the specifications to the system engineers • The two groups meet regularly to discuss, clarify and revise the specifications • This step is complete only when both groups agree on the meaning of the specifications
Designing the System • Using the fully analyzed specifications, the system architecture is designed • The foundation software is chosen • Existing tools (utility subroutines, built-in functions in the foundation software) are identified • The most appropriate user interface technology is identified • The complexity is analyzed and a detailed schedule for programming and testing is created
Programming the System • Build modules one at a time • Pass modules to the Methodology developers as soon as they are built and tested • Follow programming protocols and standards • Document the code • Dialogue between Methodology and System Engineers continues throughout this phase • It is an iterative process; continuous communication is extremely important
Unit and Functional Testing • The System Engineers test each module (unit testing) to ensure that it performs according to their interpretation of the specifications • The Methodology Developers test each module (functional testing) to ensure that it performs according to their expectations • Develop a set of test cases to exercise all possible scenarios • Use real data whenever possible • Explore what happens in “normal” cases, and also in extreme cases • Compare to the prototype • Explore what happens when the user makes a “typical” mistake (mis-specifies a parameter, omits an input, …)
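Some of these functional test cases can be automated. The sketch below uses Python's standard `unittest` module; the module under test (`weighted_total`) and the stand-in prototype (`prototype_total`) are hypothetical placeholders. The point is the shape of the test cases: agreement with the prototype on typical data, extreme cases, and typical user mistakes.

```python
import math
import unittest


def prototype_total(values, weights):
    """Hypothetical stand-in for the researchers' prototype."""
    return sum(v * w for v, w in zip(values, weights))


def weighted_total(values, weights):
    """Hypothetical stand-in for the production module under test."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    return math.fsum(v * w for v, w in zip(values, weights))


class FunctionalTests(unittest.TestCase):
    def test_matches_prototype_on_typical_data(self):
        values, weights = [12.5, 40.0, 7.25], [10.0, 4.0, 20.0]
        self.assertAlmostEqual(weighted_total(values, weights),
                               prototype_total(values, weights))

    def test_extreme_case_single_unit(self):
        self.assertEqual(weighted_total([5.0], [1.0]), 5.0)

    def test_extreme_case_cancelling_magnitudes(self):
        # fsum keeps the small term that naive summation could lose
        self.assertEqual(weighted_total([1e16, 1.0, -1e16], [1.0, 1.0, 1.0]), 1.0)

    def test_user_mistake_mismatched_inputs(self):
        with self.assertRaises(ValueError):
            weighted_total([1.0, 2.0], [1.0])

    def test_user_mistake_non_positive_weight(self):
        with self.assertRaises(ValueError):
            weighted_total([1.0], [0.0])
```

Saved in a file whose name starts with `test`, these cases can be run with `python -m unittest` from the same directory. Note that the stand-in prototype would silently ignore the extra value in the mismatched-inputs case, while the module rejects it; this is the kind of difference the comparison with the prototype is meant to surface.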
Integrated Testing • System Engineers integrate the new module into the statistical generalized system • Test that the connections perform as expected • Inputs and outputs • Metadata
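Part of integration testing, namely checking that a module's outputs agree with the metadata that describes them, can be automated. The sketch below assumes a hypothetical metadata format (a mapping from column name to expected Python type) and output rows represented as dictionaries; a real system would use its own metadata structures.

```python
def check_against_metadata(rows, metadata):
    """List discrepancies between output rows and the metadata describing them.

    rows     : list of dicts produced by a module (hypothetical layout)
    metadata : dict column name -> expected Python type (hypothetical format)
    None is accepted in any declared column, representing a missing value.
    """
    problems = []
    for i, row in enumerate(rows):
        missing = metadata.keys() - row.keys()
        undeclared = row.keys() - metadata.keys()
        if missing:
            problems.append(f"row {i}: missing columns {sorted(missing)}")
        if undeclared:
            problems.append(f"row {i}: undeclared columns {sorted(undeclared)}")
        for col, expected in metadata.items():
            value = row.get(col)
            if value is not None and not isinstance(value, expected):
                problems.append(
                    f"row {i}: column {col!r} is {type(value).__name__}, "
                    f"expected {expected.__name__}"
                )
    return problems


metadata = {"stratum": int, "weight": float}
output = [
    {"stratum": 1, "weight": 2.5},
    {"stratum": "2", "weight": 3.0},
    {"stratum": 3},
]
for problem in check_against_metadata(output, metadata):
    print(problem)
# row 1: column 'stratum' is str, expected int
# row 2: missing columns ['weight']
```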
Beta Testing • Methodology developers give the module to a set of users for Beta testing: • Experienced methodologists who anticipate using the modules once they are finalized • Test the modules in a “real” environment • Provide feedback to the methodology developers • Methodology developers relay the results of their own testing, as well as the results of the Beta testing, back to the system engineers
Acceptance Testing / Certification • The testing phase can be repeated several times • The methodology developers certify the completeness of the module, and it is formally released to the user community
Methodology Support Activities
Marketing and communications • Maintain two-way communication with the user community throughout the entire development cycle • With support from Upper Management, promote the use of the Generalized Systems over other systems built in-house • Give seminars to increase awareness of what modules are available and what they do • Provide training courses and workshops for hands-on experience
Respond to user requests for assistance and problem solving related to the methodology • Establish and follow protocols governing which team member responds to which type of question, the maximum time allowed before an answer is provided, etc. • Maintain a log of questions and answers • Becomes a reference guide • Facilitates speedy response • Change requests are initiated when the support team indicates that users are asking for additional functionality
Methodology support tools
User guide • Intended audience is the end user (methodologist or subject matter) • Describes how to make the module work • Tutorial • Intended audience is the end user • Self-guided learning through examples • Methodology documentation • Intended audience is methodologists • Describes the statistical theory • More detailed than the user guide • Software demonstration • Combination of presentations and training exercises • Functionality of the module is demonstrated using data from the Tutorial
Systems Support Activities • Respond to user requests for assistance and problem solving related to the system itself • Establish and follow protocols governing which team member responds to which type of question, the maximum time allowed before an answer is provided, etc. • Maintain a log of questions and answers to ensure that the same solution is given when a problem arises more than once • System maintenance • Change requests are initiated when system upgrades and redesigns are needed to stay compatible with evolving operating systems and programming languages
Conclusions • A strong partnership between Methodology and Informatics is the key to success in developing generalized systems to perform complex statistical functions • It is a long process to develop the methodology, write specifications for generalized functions, do the programming and carry out thorough testing • The benefits are long term: more efficient use of resources, and more robust tools that can be maintained and extended over time
Thank you • Laurie Reedman, Chief of Generalized Systems and Quality Assurance, Business Survey Methods Division • (613) 951-7301 • Laurie.Reedman@statcan.gc.ca