The Continuing Evolution of Generalized Systems at Statistics Canada for Business Survey Processing Chris Mohl Statistics Canada
Outline • Why Generalize? • Factors Influencing the Evolution • The Systems • Development, Support and Maintenance • Lessons Learned • Possible Future Activities • Conclusions
Why Generalize Systems? • Fully researched methods • Thoroughly tested • Complete documentation • Expert support team • Minimal user programming required – improves timeliness • Coherent methods across surveys
Factors Influencing the Evolution • Changes in technology • Mainframe to PC/UNIX processing • Some underlying software no longer supported • Statistics Canada’s SAS site license • Need for new or more sophisticated methods
The Systems • Can be classified into three groupings • Mature Systems: no new development • Redesigned Systems: reengineering of old systems • New Development Systems: new methodologies
Mature Systems • The longest surviving generalized systems • No new functionality being added – only maintenance • SAS macros • Interface built with SAS/AF • Can be run in batch mode (macro call within SAS program) or via interface • PC or UNIX
Mature Systems • Generalized Sampling (GSAM) • Performs functions related to sample selection for ongoing and ad hoc surveys • Stratification, Allocation, Sampling, Frame Maintenance • Generalized Estimation System (GES) • Performs functions related to weighting and estimation • One-stage element and cluster, two-phase element designs • Mostly design based, some synthetic, jackknife
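The allocation, sampling and estimation functions listed above can be illustrated with a small sketch. It is not GSAM or GES code (those are SAS macros); it is a minimal Python illustration, assuming a stratified simple random sample, Neyman allocation and a design-weighted (expansion) estimate of a total. The function names, the frame and the minimum of two units per stratum are all hypothetical choices made for the example.

```python
import random
from statistics import pstdev


def neyman_allocation(strata, n_total):
    """Split n_total across strata in proportion to N_h * S_h.

    Assumes at least one stratum has some variation. Because of rounding
    and the bounds below, the allocated sizes may not sum exactly to n_total.
    """
    weights = {h: len(units) * pstdev(units) for h, units in strata.items()}
    total = sum(weights.values())
    # Hypothetical rule: at least 2 units per stratum, never more than it holds.
    return {
        h: min(len(strata[h]), max(2, round(n_total * w / total)))
        for h, w in weights.items()
    }


def stratified_estimate(strata, allocation, seed=0):
    """Estimate the population total from a stratified simple random sample."""
    rng = random.Random(seed)
    estimate = 0.0
    for h, units in strata.items():
        sample = rng.sample(units, allocation[h])
        design_weight = len(units) / len(sample)  # N_h / n_h
        estimate += design_weight * sum(sample)
    return estimate


# Hypothetical frame: revenue values grouped into two size strata.
frame = {
    "small": [10, 12, 11, 9, 13, 10, 12, 11],
    "large": [100, 250, 180, 400],
}
allocation = neyman_allocation(frame, n_total=6)
estimated_total = stratified_estimate(frame, allocation)
```

In this example the highly variable "large" stratum ends up fully enumerated, which is the usual effect of Neyman allocation on skewed business populations.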
Redesigned Systems • Generalized systems previously existed that performed similar functions but needed replacement • Why? • Often due to outdated architecture – mainframe, obsolete software • New capabilities in SAS • New methodologies couldn’t be integrated into previous system
Redesigned Systems • Banff (replaces the Oracle-based GEIS) • Performs edit and imputation of numeric continuous data • Nine custom-built SAS procedures • SAS Enterprise Guide-based “interface” (Banff wizards)
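To make "edit and imputation" concrete, here is a minimal Python sketch of the general idea: records are checked against edit rules, and values that fail are replaced by ratio imputation from the records that pass. This does not reproduce any of Banff's SAS procedures; the rule, the variable names and the choice of ratio imputation are assumptions made purely for illustration.

```python
def failed_edits(record, edits):
    """Return the names of the edit rules that the record fails."""
    return [name for name, rule in edits.items() if not rule(record)]


def ratio_impute(records, target, auxiliary, needs_imputation):
    """Replace target in flagged records with R * auxiliary.

    R is the ratio of the target total to the auxiliary total among the
    records that are not flagged (the donors).
    """
    donors = [r for r in records if not needs_imputation(r)]
    ratio = sum(r[target] for r in donors) / sum(r[auxiliary] for r in donors)
    return [
        {**r, target: ratio * r[auxiliary], "imputed": True}
        if needs_imputation(r)
        else {**r, "imputed": False}
        for r in records
    ]


# Hypothetical edit rule: revenue must be reported and non-negative.
edits = {
    "revenue_nonnegative": lambda r: r["revenue"] is not None and r["revenue"] >= 0,
}

records = [
    {"id": 1, "employees": 10, "revenue": 500.0},
    {"id": 2, "employees": 20, "revenue": 1100.0},
    {"id": 3, "employees": 5, "revenue": -40.0},   # fails the edit
    {"id": 4, "employees": 8, "revenue": None},    # missing value
]


def flagged(record):
    return bool(failed_edits(record, edits))


cleaned = ratio_impute(records, "revenue", "employees", flagged)
```

Keeping an "imputed" flag on every record, as done here, lets later estimation steps account for the imputed values.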
Redesigned Systems • New CONFID • Performs protection of tabular economic data • SAS-based custom-built procedures (like Banff) and macros for PC and UNIX • Jasper (replacement for ACTR) • Performs automated coding of character strings • Retains interface-based processing, but SAS-based custom-built procedures may be built later
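As an illustration of what protecting tabular economic data can involve, the sketch below flags sensitive cells with an (n, k) dominance rule and suppresses them. The slides do not say which sensitivity rules New CONFID applies, so the rule, its parameters and the function names are assumptions; complementary suppression, which a real system also needs, is not attempted here.

```python
def is_sensitive(contributions, n=1, k=0.8):
    """(n, k) dominance rule.

    A cell is flagged when its n largest contributions account for more
    than a share k of the cell total.
    """
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n / total > k


def suppress(table, n=1, k=0.8):
    """Return cell totals, with None in place of each sensitive cell."""
    return {
        cell: None if is_sensitive(values, n, k) else sum(values)
        for cell, values in table.items()
    }


# Hypothetical table: each cell lists the contributions of its respondents.
table = {
    "industry_A": [900, 50, 50],     # one respondent dominates the cell
    "industry_B": [300, 300, 400],
}
published = suppress(table)
```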
New Development Systems • Fill needs for functionality not already available in other generalized systems • Replace customized programs that may already exist
New Development Systems • Statistical Macro Extensions (StatMx) • New functionality not available in GES / GSAM • Multi-stage design estimation, Lavallée-Hidiroglou allocation, extended synthetic estimation • SAS macros, no interface • Forillon • Time Series processing • Benchmarking sub-annual series, Raking to retain additivity, trend computations, variance calculations, analytical tools • SAS-based procedures and Enterprise Guide “interface”
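"Raking to retain additivity" can be sketched as iterative proportional fitting: the cells of a table are rescaled in turn so that rows and columns add up to their required totals. The Python below is a minimal illustration under that assumption, not Forillon's SAS implementation; the function name, the tolerance and the example figures are hypothetical.

```python
def rake(table, row_targets, col_targets, tol=1e-9, max_iter=100):
    """Rescale a table of positive cells so rows and columns hit their targets.

    Assumes sum(row_targets) equals sum(col_targets) and that every row and
    column of the input has a positive sum.
    """
    cells = [row[:] for row in table]  # work on a copy
    for _ in range(max_iter):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            cells[i] = [value * target / row_sum for value in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            for row in cells:
                row[j] *= target / col_sum
        # Columns now match; stop once the rows match as well.
        if all(abs(sum(row) - t) <= tol for row, t in zip(cells, row_targets)):
            break
    return cells


# Hypothetical example: regional series (rows) by industry (columns) that
# must add up to separately published margins.
preliminary = [[40.0, 30.0], [35.0, 50.0]]
raked = rake(preliminary, row_targets=[80.0, 90.0], col_targets=[70.0, 100.0])
```

The input table is left unchanged, so the preliminary and raked figures can be compared to see how large the adjustments were.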
Development, Support and Maintenance • Most systems developed and maintained by teams of individuals from two groups • Mathematical statisticians (Methodology Branch) • Programmers (Informatics Branch) • Certain projects are the sole responsibility of one group • Moving away from such situations
Development • Methodologists review mathematical needs • Consultation with potential users, literature searches, research into mathematical methods • Programmers review informatics needs • Methodologists write specifications • Programmers produce new version • Methodologists do final certification • Documentation is written
Support • Team members are not directly responsible for implementing the systems; they assist users • Mathematical questions go to methodologists, informatics questions to programmers • Amount of support depends upon number of users, complexity of the methods, “newness” of the system
Maintenance • May consist of bug fixes or adding new functionality • Needs may be identified by the users or by team members • Team members work together to decide whether a change merits attention, then implement and certify it
Costs • Generalized systems require a very significant outlay of resources • Varies significantly from project to project • Development of a large project • 2-3 methodologists, 2-3 programmers over several years • Support and maintenance • 1 methodologist, 1 programmer per year
Lessons Learned • Reduce Software Diversity • Emphasis put on SAS, reduce reliance on different programming languages • Easier to move people from one project to another • Users only need to know one language • Learning SAS is part of staff’s early training
Lessons Learned • Traditional interfaces are expensive – there are alternatives • Interface development can cost as much as the mathematical functionality • Changes can be difficult • The interface often does not upgrade as well as the rest of the system • Most users prefer batch processing for production • An interface can be necessary when the tool is used by non-technical personnel • SAS Enterprise Guide is being used successfully
Lessons Learned • People like things they are familiar with • Customized SAS procedures (Banff, Forillon) have been favorably received • Centralization of resources is beneficial • People can take ideas used in one project and apply them to others • Examples: Enterprise Guide interfaces, customized SAS procedures
Lessons Learned • Modularity and flexibility are important • Some early systems were too rigid – successful ones had more flexibility • Users often want only pieces of certain systems • Reduce custom-built systems, put functionality in generalized systems • People often “borrow” other programs and don’t understand all the implications • Support is a problem when a person leaves the project • However, timing sometimes makes custom-built systems necessary
Lessons Learned • Buy when possible, but don’t get cornered • No need to build certain components, e.g. a linear programming function • Ensure that changing to an alternate component is not difficult • Make sure that the support is there • Stay up to date on technology • Don’t wait too long to react to advances • E.g. mainframe → PC in the 1990s, Linux
Possible Future Activities • Current Systems • Banff – categorical data capabilities • New CONFID – additional functionality • Jasper – review of methodology used • Forillon – additional functionality • StatMx – advanced variance calculations?
Possible Future Activities • General avenues • Continue movement towards SAS based procedures and Enterprise Guide interfaces • Buy components when possible – free up programming resources for specialized tasks • Metadata table-based processor
Conclusions • Generalized Systems have become a critical part of business survey processing • Given the investments made in development, we have to keep them relevant • Moving towards a more standardized look and feel • Use what we have learned in the past to help shape the future
For more information, please contact Chris Mohl: Chris.Mohl@statcan.ca