Aggregate Data Dissemination and Functionality

Learn about the objectives, systems, and tools for processing and disseminating aggregate data. Explore statistical databases, commercial developments, and manipulation and presentation functionality.


Presentation Transcript


  1. Dissemination and use of aggregate data: structures and functionality
     Andrew Westlake, Survey & Statistical Computing
     ssc@count.com, www.sasc.co.uk
     Meta-data & Functionality

  2. Aggregate data: structures and functionality
     • What are the objectives
     • Systems to support the preparation, processing and dissemination of statistics in the form of aggregated data
     • Appropriate tool set
     • Automation of production processes
     • Dynamic access and ‘analysis’
     • Developments on the Database side
     • Statistical Database proposals from Computer Science
     • Commercial development of Data Warehouses (OLAP)
     • Requirements
     • Structure
     • Functionality: Manipulation, Dissemination

  3. Processing Aggregate Data

  4. Aggregated Results, as Multi-way Table
     • Period: Year, Week / Month, Day
     • Location: Country, Region, District
     • Disease Classification (ICD): Major Group, Minor Group, Detail
     • Measures: Reports received, Population at risk, Estimated Incidence rate, SD of Incidence rate
     This example has three dimensions (so that it can be visualised). In reality, for this application, we would need at least two more, Age and Gender.
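A minimal sketch of how such a multi-way table might be held in memory. The key layout, the `Cell` class, the sample values and the "per 100,000" scaling are illustrative assumptions, not part of the presentation.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Measures stored in one cell of the multi-way table (hypothetical layout)."""
    reports: int      # reports received
    population: int   # population at risk

    @property
    def incidence_rate(self):
        # Derived measure; the per-100,000 scaling is an assumption.
        return 100_000 * self.reports / self.population


# One entry per combination of (period, location, disease) at the finest level.
# All values below are invented for illustration.
table = {
    ("2001-W01", "District A", "A00"): Cell(reports=3, population=50_000),
    ("2001-W01", "District B", "A00"): Cell(reports=1, population=20_000),
}

cell = table[("2001-W01", "District A", "A00")]
rate = cell.incidence_rate
```

Adding Age and Gender would simply lengthen the key tuple; the measures in each cell stay the same.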

  5. Statistical Databases
     • SSDBM conferences, from early ’80s
     • STORM model, Rafanelli & Shoshani, ’90
     • Summarizability, Lenz & Shoshani, ’97
     • National Statistical Offices
     • Research Projects, particularly Eurostat
     • Idaresa, Addsia, Rainbow, IMIM
     • Concern for concepts, structure, rules, validity
     • No Money

  6. Commercial developments
     • Data Warehouse
     • DB with emphasis on performance with fixed data, no transactional requirements
     • Star schema for multi-way tables, Data Cubes
     • Products from mainstream DB vendors, and specialists
     • OLAP (On-Line Analytical Processing)
     • Term invented by Codd
     • Emphasis on exploration of aggregate structure, selection of sub-groups, change of focus between detail and broad groups
     • Lots of Money
     • Products
     • DB Vendors, e.g. Oracle Express, Pivot tables in MS Excel 2000, Informix Red Brick
     • Specialists, e.g. Beyond 20/20, Super-Star
     • Standardisation proposals

  7. Aggregation Functionality
     • Store information with minimal aggregation
     • Maximum detail in classifications
     • Further aggregation (to less detail) on demand (may pre-compute for efficiency)
     • Algebra for aggregating classifications and measures is basically straightforward
     • Aggregation of Measures
     • Everything based on summation can be regrouped (cf. updating algorithms, sufficient statistics)
     • Some others, e.g. Range
     • Special issues for time, aggregate or cross-sectional measures
     • All aggregated tables are proper tables
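The regrouping point can be sketched as follows: if each cell keeps the count, sum and sum of squares (sufficient statistics for mean and SD) plus the minimum and maximum (for Range), cells can be merged to a coarser level without going back to the raw data. The `Summary` class and the sample values are hypothetical, and the sample (n - 1) form of the SD is an assumption.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    """Summation-based statistics for one cell, plus min/max for the range."""
    n: int
    total: float
    total_sq: float
    lo: float
    hi: float

    @classmethod
    def from_values(cls, values):
        values = list(values)
        return cls(len(values), sum(values), sum(v * v for v in values),
                   min(values), max(values))

    def merge(self, other):
        # Sums add; min and max combine. No raw data is needed.
        return Summary(self.n + other.n,
                       self.total + other.total,
                       self.total_sq + other.total_sq,
                       min(self.lo, other.lo),
                       max(self.hi, other.hi))

    @property
    def mean(self):
        return self.total / self.n

    @property
    def sd(self):
        # Sample standard deviation (n - 1 denominator); requires n > 1.
        var = (self.total_sq - self.total ** 2 / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))

    @property
    def range(self):
        return self.hi - self.lo


# Two detailed cells (invented values) merged into one coarser cell.
district_a = Summary.from_values([2, 4, 6])
district_b = Summary.from_values([8, 10])
region = district_a.merge(district_b)
```

A measure such as the median has no comparable fixed-size summary, so it cannot be regrouped this way.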

  8. Manipulation Functionality - for Processing
     • Manipulation of Measures
     • Introduce measures from other tables with similar structure
     • Derive measures within cells
     • Not all combinations are meaningful
     • Combination of two tables
     • Find common dimensions and classifications (may require some aggregation or mapping)
     • Choose one table as the detail table
     • Aggregate all non-common dimensions out of the 2nd table
     • Transfer measures from 2nd table, repeating values over missing classifications
     • Meta-data to control validity of operations
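The four steps for combining two tables can be sketched as below. The function names, the tuple-keyed dictionaries and the sample figures are assumptions made for illustration; the sketch also assumes the second table's measure is additive, so that summation is a valid way to aggregate a dimension out.

```python
from collections import defaultdict


def aggregate_out(table, dims, keep):
    """Sum an additive measure over every dimension not listed in `keep`."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(float)
    for key, value in table.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def combine(detail, detail_dims, second, second_dims):
    """Attach the second table's measure to every cell of the detail table."""
    # Step 1: find the common dimensions.
    common = [d for d in detail_dims if d in second_dims]
    # Step 3: aggregate the non-common dimensions out of the second table.
    reduced = aggregate_out(second, second_dims, common)
    # Step 4: transfer the measure, repeating it over the dimensions
    # that exist only in the detail table.
    idx = [detail_dims.index(d) for d in common]
    return {key: (value, reduced[tuple(key[i] for i in idx)])
            for key, value in detail.items()}


# Step 2: the reports table is chosen as the detail table (invented values).
reports = {("North", "flu"): 30, ("North", "measles"): 5, ("South", "flu"): 12}
population = {("North", "0-17"): 1000, ("North", "18+"): 4000,
              ("South", "0-17"): 500, ("South", "18+"): 1500}

combined = combine(reports, ("region", "disease"),
                   population, ("region", "age"))
```

Here the regional population is repeated across diseases, which is why it must not be summed again over the disease dimension; this is the kind of validity rule that meta-data would have to carry.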

  9. Rules for proper table structure
     • Table
     • Well-defined base population from which measures are computed
     • May include a selection rule w.r.t. a wider population
     • Classification
     • Categories must be exclusive and exhaustive w.r.t. the base population
     • Cannot have its own selection rule (but might have a residual category)
     • Measure
     • May have a selection rule (e.g. count with a property)
     • Care is sometimes needed to distinguish between classifications and measures
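The "exclusive and exhaustive" rule lends itself to a mechanical check. The sketch below is one possible form, assuming each category is given as a predicate over units of the base population; the function name and the age bands are hypothetical.

```python
def check_classification(categories, population):
    """Return (unclassified, overlapping) units for a proposed classification.

    `categories` maps a label to a predicate. A proper classification
    yields two empty lists: every unit falls in exactly one category.
    """
    unclassified, overlapping = [], []
    for unit in population:
        hits = [label for label, test in categories.items() if test(unit)]
        if not hits:
            unclassified.append(unit)            # not exhaustive
        elif len(hits) > 1:
            overlapping.append((unit, hits))     # not exclusive
    return unclassified, overlapping


ages = [3, 17, 18, 40, 65, 90]  # invented base population

proper_bands = {
    "0-17": lambda a: 0 <= a <= 17,
    "18-64": lambda a: 18 <= a <= 64,
    "65+": lambda a: a >= 65,   # acts as the residual category
}
faulty_bands = {
    "0-18": lambda a: 0 <= a <= 18,
    "18-64": lambda a: 18 <= a <= 64,
}

ok = check_classification(proper_bands, ages)
bad = check_classification(faulty_bands, ages)
```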

  10. Confusion between classification and measure
     • Wrong: Subject classification is not exclusive if students can register for more than one course
     • Correct: Counts selected by subject are different measures

  11. Presentation Functionality
     • Layout
     • Mapping from dimensions to Rows, Columns, Pages
     • Improper table combinations
     • Combination of dissimilar dimensions, e.g. Age groups by (SEG + Housing)
     • Distinction between Classification and Measure is less important for presentation
     • Medium
     • Paper, Web, often with analysis (commentary)
     • Machine readable (take away, not linked)
     • Dynamic, for local or remote manipulation
     • Associated material
     • Generation of descriptions, footnotes, indexes, content lists
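The mapping from dimensions to Rows, Columns and Pages can be illustrated with a small sketch. It assumes a three-dimensional table stored as a dictionary keyed by tuples; the `layout` function and the sample data are hypothetical.

```python
def layout(cells, dims, rows, cols, pages):
    """Arrange tuple-keyed cells as {page: {row: {col: value}}}.

    Assumes exactly three dimensions, one assigned to each role.
    """
    r, c, p = dims.index(rows), dims.index(cols), dims.index(pages)
    out = {}
    for key, value in cells.items():
        out.setdefault(key[p], {}).setdefault(key[r], {})[key[c]] = value
    return out


dims = ("period", "location", "disease")
cells = {  # invented counts
    ("2001", "North", "flu"): 30,
    ("2001", "North", "measles"): 5,
    ("2001", "South", "flu"): 12,
    ("2002", "North", "flu"): 25,
}

# One page per period, locations down the side, diseases across the top.
pages = layout(cells, dims, rows="location", cols="disease", pages="period")
```

Changing the presentation is then only a matter of passing different dimension names; the stored table is untouched.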

  12. Manipulation Functionality - for Exploration
     • Dynamic viewing, linked to source aggregations
     • Selection
     • Subset of classification cells, and of measures
     • Dynamic regrouping
     • Roll up to combine existing groups to next level
     • Drill down to get more detail in groups at lower level
     • Operate independently, i.e. not all parts of a classification at the same level
     • User-defined groupings
     • All derivation and presentation facilities
     • Specialist browsers, available for local data or over the Internet
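Roll up and drill down over a classification hierarchy can be sketched as follows, including the case where only part of the classification is rolled up. The `PARENT` mapping, the function names and the counts are illustrative assumptions, and the measure is assumed to be an additive count.

```python
# Hypothetical two-level hierarchy: districts grouped into regions.
PARENT = {"D1": "R1", "D2": "R1", "D3": "R2", "D4": "R2"}


def roll_up(counts, parent, groups=None):
    """Combine child cells into their parent group.

    If `groups` is given, only children of those parents are rolled up,
    so different parts of the classification can sit at different levels.
    """
    out = {}
    for cell, value in counts.items():
        target = parent.get(cell, cell)
        if groups is not None and target not in groups:
            target = cell  # leave this part at the detailed level
        out[target] = out.get(target, 0) + value
    return out


def drill_down(source, parent, group):
    """Return the detailed cells that make up one group."""
    return {cell: value for cell, value in source.items()
            if parent.get(cell) == group}


counts = {"D1": 5, "D2": 7, "D3": 2, "D4": 1}  # invented detailed counts

by_region = roll_up(counts, PARENT)
mixed = roll_up(counts, PARENT, groups={"R1"})
detail_r2 = drill_down(counts, PARENT, "R2")
```

Drill down reads from the detailed source table, which is why the view has to stay linked to the source aggregations.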

  13. Discovery through Meta-data
     • Generic descriptions
     • Population, Classifications, Measures linked to concept definitions for searching
     • Specific topics
     • Formal definitions of standard components: selection rules, standard classifications, measure types
     • Specific descriptions of substantive content: source variable definitions, questionnaire structure, etc.
     • Accessibility
     • Information must be available to search engines and users

  14. Conclusions
     • Good analysis of structural and functionality requirements can produce good products for automated and individual use
     • Further academic work on structures and functionality needed
     • Commercial products are useful but lack many obvious features - we should demand more
     • Commercially driven standards concentrate on basic functionality and overlook statistical and practical validity - we should get more involved