Learn about the objectives, systems, and tools for processing and disseminating aggregate data. Explore statistical databases, commercial developments, and manipulation and presentation functionality.
Dissemination and use of aggregate data: structures and functionality
Andrew Westlake
Survey & Statistical Computing
ssc@count.com
www.sasc.co.uk
Meta-data & Functionality
Aggregate data: structures and functionality
• What are the objectives?
  • Systems to support the preparation, processing and dissemination of statistics in the form of aggregated data
  • Appropriate tool set
  • Automation of production processes
  • Dynamic access and ‘analysis’
• Developments on the database side
  • Statistical Database proposals from Computer Science
  • Commercial development of Data Warehouses (OLAP)
• Requirements
  • Structure
  • Functionality - Manipulation, Dissemination
Processing Aggregate Data
Aggregated Results, as Multi-way Table
[Figure: three-way data cube; each dimension is labelled with its hierarchy levels, and each cell holds the measures]
• Period: Year, Week/Month, Day
• Location: Country, Region, District
• Disease Classification (ICD): Major Group, Minor Group, Detail
• Measures (in each cell): Reports received, Population at risk, Estimated incidence rate, SD of incidence rate
This example has three dimensions (so that it can be visualised). In reality, for this application, we would need at least two more, Age and Gender.
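A multi-way table of this kind can be sketched as a mapping from one category per dimension to a set of measures. The following is a minimal illustration only, not the author's system: the districts, weeks, disease names, figures, and the helper names `incidence_rate` and `slice_by` are all made up for the example.

```python
# Hypothetical cells of a three-way table keyed by (district, week, disease).
# Each cell stores the additive measures; all values are invented.
cells = {
    ("District A", "2000-W01", "Measles"): {"reports": 4, "population": 10_000},
    ("District A", "2000-W02", "Measles"): {"reports": 6, "population": 10_000},
    ("District B", "2000-W01", "Measles"): {"reports": 1, "population": 5_000},
    ("District B", "2000-W01", "Mumps"): {"reports": 2, "population": 5_000},
}


def incidence_rate(cell, per=100_000):
    """Derived measure computed within a single cell (reports per `per` at risk)."""
    return cell["reports"] * per / cell["population"]


def slice_by(table, axis, value):
    """Select the cells whose category on dimension `axis` equals `value`."""
    return {key: cell for key, cell in table.items() if key[axis] == value}


rate = incidence_rate(cells[("District A", "2000-W01", "Measles")])
week1 = slice_by(cells, axis=1, value="2000-W01")
```

Storing only the additive measures (reports, population) and deriving the rate on demand keeps every cell regroupable; the rate itself could not simply be summed across cells.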
Statistical Databases
• SSDBM conferences, from early ’80s
  • STORM model, Rafanelli & Shoshani, ’90
  • Summarizability, Lenz & Shoshani, ’97
• National Statistical Offices
• Research Projects, particularly Eurostat
  • Idaresa, Addsia, Rainbow, IMIM
• Concern for concepts, structure, rules, validity
• No Money
Commercial developments
• Data Warehouse
  • DB with emphasis on performance with fixed data, no transactional requirements
  • Star schema for multi-way tables, Data Cubes
  • Products from mainstream DB vendors, and specialists
• OLAP (On-Line Analytical Processing)
  • Term invented by Codd
  • Emphasis on exploration of aggregate structure, selection of sub-groups, change of focus between detail and broad groups
• Lots of Money
• Products
  • DB vendors, e.g. Oracle Express, Pivot tables in MS Excel 2000, Informix Red Brick
  • Specialists, e.g. Beyond 20/20, Super-Star
• Standardisation proposals
Aggregation Functionality
• Store information with minimal aggregation
  • Maximum detail in classifications
  • Further aggregation (to less detail) on demand (may pre-compute for efficiency)
• Algebra for aggregating classifications and measures is basically straightforward
• Aggregation of Measures
  • Everything based on summation can be regrouped (cf. updating algorithms, sufficient statistics)
  • Some others, e.g. Range
  • Special issues for time, aggregate or cross-sectional measures
• All aggregated tables are proper tables
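The point about summation-based measures can be illustrated with a small sketch. It assumes each detailed cell stores the sufficient statistics n, sum(x) and sum(x²), plus min and max so that the range can also be regrouped. The district-to-region mapping, the data, and the function names are hypothetical.

```python
import math

# Hypothetical mapping from detailed categories (districts) to broader ones (regions).
REGION_OF = {"D1": "North", "D2": "North", "D3": "South"}

# Per-district sufficient statistics for a quantity x, plus min/max for the range.
detail = {
    "D1": {"n": 2, "sx": 6.0, "sxx": 20.0, "min": 2.0, "max": 4.0},    # x = 2, 4
    "D2": {"n": 2, "sx": 14.0, "sxx": 100.0, "min": 6.0, "max": 8.0},  # x = 6, 8
    "D3": {"n": 1, "sx": 5.0, "sxx": 25.0, "min": 5.0, "max": 5.0},    # x = 5
}


def roll_up(table, group_of):
    """Aggregate to less detail: sums add up, min/max combine."""
    out = {}
    for key, cell in table.items():
        group = group_of[key]
        if group not in out:
            out[group] = dict(cell)  # copy so the detail table is left untouched
            continue
        total = out[group]
        for name in ("n", "sx", "sxx"):
            total[name] += cell[name]
        total["min"] = min(total["min"], cell["min"])
        total["max"] = max(total["max"], cell["max"])
    return out


def summarise(cell):
    """Derive mean, sample SD and range from the regrouped statistics."""
    n, sx, sxx = cell["n"], cell["sx"], cell["sxx"]
    variance = (sxx - sx * sx / n) / (n - 1) if n > 1 else float("nan")
    return {"mean": sx / n, "sd": math.sqrt(variance), "range": cell["max"] - cell["min"]}


regions = roll_up(detail, REGION_OF)
north = summarise(regions["North"])
```

The regrouped mean and SD for North match what would be computed directly from the four underlying values, which is why storing the sums rather than the derived statistics keeps further aggregation possible.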
Manipulation Functionality - for Processing
• Manipulation of Measures
  • Introduce measures from other tables with similar structure
  • Derive measures within cells
  • Not all combinations are meaningful
• Combination of two tables
  • Find common dimensions and classifications (may require some aggregation or mapping)
  • Choose one table as the detail table
  • Aggregate all non-common dimensions out of the 2nd table
  • Transfer measures from 2nd table, repeating values over missing classifications
• Meta-data to control validity of operations
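The steps for combining two tables can be sketched as follows, under simplifying assumptions: both tables already use the same district classification, the detail table is reports by (district, disease), and the second table is population by (district, sex). All names and figures are hypothetical.

```python
from collections import defaultdict

# Detail table: reports by (district, disease).
reports = {("D1", "Measles"): 4, ("D1", "Mumps"): 2, ("D2", "Measles"): 1}

# Second table: population by (district, sex). Only "district" is common.
population = {("D1", "F"): 5100, ("D1", "M"): 4900, ("D2", "F"): 2600, ("D2", "M"): 2400}


def aggregate_out(table, keep):
    """Sum an additive measure over every dimension whose index is not in `keep`."""
    out = defaultdict(int)
    for key, value in table.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)


# Aggregate the non-common dimension (sex) out of the second table.
pop_by_district = aggregate_out(population, keep=[0])

# Transfer the measure into the detail table, repeating the district value
# over the classification the second table lacks (disease).
combined = {
    key: {"reports": count, "population": pop_by_district[(key[0],)]}
    for key, count in reports.items()
}
```

Because the population value is repeated across diseases, it must not be summed over disease later; this is the kind of validity rule the slide assigns to meta-data.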
Rules for proper table structure
• Table
  • Well-defined base population from which measures are computed
  • May include a selection rule w.r.t. a wider population
• Classification
  • Categories must be exclusive and exhaustive w.r.t. the base population
  • Cannot have its own selection rule (but might have a residual category)
• Measure
  • May have a selection rule (e.g. count with a property)
• Care is sometimes needed to distinguish between classifications and measures
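The rule that categories be exclusive and exhaustive with respect to the base population lends itself to a mechanical check. The sketch below assumes unit-level membership is available as sets of identifiers; the function name `check_classification` and the example data are hypothetical.

```python
def check_classification(base_population, categories):
    """Check that `categories` (name -> set of unit ids) partition the base population."""
    assigned = [unit for members in categories.values() for unit in members]
    overlaps = {unit for unit in assigned if assigned.count(unit) > 1}
    missing = set(base_population) - set(assigned)
    return {
        "exclusive": not overlaps,   # no unit falls in two categories
        "exhaustive": not missing,   # every unit falls in some category
        "overlaps": overlaps,
        "missing": missing,
    }


base = {1, 2, 3, 4, 5}

# A proper classification: every unit is in exactly one age group.
age_check = check_classification(base, {"under 30": {1, 2}, "30 and over": {3, 4, 5}})

# An improper one: unit 2 is in two subjects, and units 4 and 5 are in none.
subject_check = check_classification(base, {"Maths": {1, 2}, "Physics": {2, 3}})
```

A residual category such as "none" would repair the missing units in the second example, but not the overlap.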
Confusion between classification and measure
• Wrong: Subject classification is not exclusive if students can register for more than one course
• Correct: Counts selected by subject are different measures
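The difference can be made concrete with a tiny example. The student records are invented, and the choice of year of study as the exclusive classification is an assumption for illustration only.

```python
# Hypothetical registrations; a student may take more than one subject.
students = [
    {"id": 1, "year": 1, "subjects": {"Maths"}},
    {"id": 2, "year": 1, "subjects": {"Maths", "Physics"}},
    {"id": 3, "year": 2, "subjects": {"Physics"}},
]

# Wrong: subject treated as a classification. Student 2 lands in two
# categories, so the cells add up to more than the base population.
by_subject = {}
for student in students:
    for subject in student["subjects"]:
        by_subject[subject] = by_subject.get(subject, 0) + 1
wrong_total = sum(by_subject.values())

# Correct: classify by something exclusive (year) and carry one selected
# count per subject as a separate measure in each cell.
table = {}
for student in students:
    cell = table.setdefault(
        student["year"], {"students": 0, "taking_maths": 0, "taking_physics": 0}
    )
    cell["students"] += 1
    cell["taking_maths"] += "Maths" in student["subjects"]      # True counts as 1
    cell["taking_physics"] += "Physics" in student["subjects"]
```

In the second form the `students` measure still sums to the base population, while the per-subject counts are free to overlap.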
Presentation Functionality
• Layout
  • Mapping from dimensions to Rows, Columns, Pages
• Improper table combinations
  • Combination of dissimilar dimensions, e.g. Age groups by (SEG + Housing)
  • Distinction between Classification and Measure is less important for presentation
• Medium
  • Paper, Web, often with analysis (commentary)
  • Machine readable (take away, not linked)
  • Dynamic, for local or remote manipulation
• Associated material
  • Generation of descriptions, footnotes, indexes, content lists
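The mapping from dimensions to rows, columns and pages can be sketched for a three-way table as below. The function name `layout`, the cell keys and the values are hypothetical, and empty cells are simply shown as `None`.

```python
# Hypothetical three-way table of counts keyed by (district, week, disease).
cells = {
    ("D1", "W1", "Measles"): 4,
    ("D1", "W2", "Measles"): 6,
    ("D2", "W1", "Measles"): 1,
    ("D2", "W1", "Mumps"): 2,
}


def layout(table, rows, cols, pages):
    """Map three dimension indexes onto rows, columns and pages of 2-D grids."""
    row_labels = sorted({key[rows] for key in table})
    col_labels = sorted({key[cols] for key in table})
    grids = {}
    for page in sorted({key[pages] for key in table}):
        grid = [[None] * len(col_labels) for _ in row_labels]
        for key, value in table.items():
            if key[pages] == page:
                r = row_labels.index(key[rows])
                c = col_labels.index(key[cols])
                grid[r][c] = value
        grids[page] = grid
    return row_labels, col_labels, grids


# Districts down the side, weeks across the top, one page per disease.
row_labels, col_labels, grids = layout(cells, rows=0, cols=1, pages=2)
```

Swapping the `rows`, `cols` and `pages` arguments gives a different presentation of the same cells without touching the stored table.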
Manipulation Functionality - for Exploration
• Dynamic viewing, linked to source aggregations
• Selection
  • Subset of classification cells, and of measures
• Dynamic regrouping
  • Roll up to combine existing groups to next level
  • Drill down to get more detail in groups at lower level
  • Operate independently, i.e. not all parts of a classification at the same level
  • User-defined groupings
• All derivation and presentation facilities
• Specialist browsers, available for local data or over the Internet
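Roll up and drill down operating independently on different parts of a classification might look like the sketch below. It assumes a two-level hierarchy (district within region) and an additive count; `PARENT`, `counts` and `view` are hypothetical names.

```python
# Hypothetical two-level hierarchy: district -> region.
PARENT = {"D1": "North", "D2": "North", "D3": "South", "D4": "South"}

# Counts stored at the most detailed level.
counts = {"D1": 4, "D2": 6, "D3": 1, "D4": 2}


def view(detail, parent, expanded):
    """Show regions in `expanded` at district level and roll up all the others."""
    out = {}
    for district, n in detail.items():
        region = parent[district]
        label = district if region in expanded else region
        out[label] = out.get(label, 0) + n
    return out


rolled_up = view(counts, PARENT, expanded=set())   # every region rolled up
mixed = view(counts, PARENT, expanded={"South"})   # drill down into South only
```

Both views are computed from the same stored detail, so the grand total is unchanged whichever groups are expanded.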
Discovery through Meta-data
• Generic descriptions
  • Population, Classifications, Measures linked to concept definitions for searching
• Specific topics
  • Formal definitions of standard components: selection rules, standard classifications, measure types
  • Specific descriptions of substantive content: source variable definitions, questionnaire structure, etc.
• Accessibility
  • Information must be available to search engines and users
Conclusions
• Good analysis of structural and functionality requirements can produce good products for automated and individual use
• Further academic work on structures and functionality needed
• Commercial products are useful but lack many obvious features - we should demand more
• Commercially driven standards concentrate on basic functionality and overlook statistical and practical validity - we should get more involved