Monitoring of Aggregation Levels in Distributed Component Based Data Production Systems

BTW 2003, Leipzig, 27.02.2003 • Anja Schanzenberger • GfK Marketing Services, Nürnberg, and University of Middlesex, London • Colin Tully, Dave Lawrence • University of Middlesex, London

Agenda • 1 Application Area • The General Business of GfK Marketing Services • 2 Application • The Basic Idea of Data Production System • The Planning, Controlling and Monitoring System • 3 Monitoring of Aggregation Levels • Single Record Tracking • The Tubing System • Reconstructing Aggregation Levels

1 Application Area

The GfK Group: Key Features • Total revenue: anticipated EUR 568 million in 2002 (previous year: EUR 506 million; increase on the previous year: +12%) • Employees: more than 4,800 full-time staff, 70% of whom abroad • Services: integrated systems using standardised instruments throughout Europe and beyond • Network: over 130 subsidiaries, branches and participations in 50 countries on five continents

Four Complementary Business Divisions • Consumer Tracking: consumer and retail panel based Business Information Solutions for manufacturers and retailers, for consumer packaged goods and service companies • Non-Food Tracking: retail panel based marketing information for manufacturers and retailers in consumer technology industries • Ad Hoc Research: interview and test market based support information for new product development and brand management across a wide range of industries • Media: interview and panel based audience and readership measurement and consumer response testing for TV, print, radio and Internet

GfK Business Divisions: Share of Total Performance • Consumer Tracking: 16.4% • Non-Food Tracking: 23.6% • Media: 12.1% • Ad Hoc Research: 41.5% • Other: 6.4%

Non-Food Tracking: Key Services • Retail panel: information services on consumer durables, in particular for the consumer electronics, photographic, information technology, telecommunications, software, domestic appliances and equipment markets • Periodical monitoring • Key services: information services in 44 countries on marketing, sales, logistics in retail and industry for companies operating in consumer technology markets • The advantage for clients: direct access to databases and/or transmission of standardized analyses to support, monitor and manage short, medium and long term decisions on product and pricing policy, advertising, distribution, sales and logistics • Positioning: market leader in the regions Europe and Asia and Pacific as well as in the Arab countries; together with partner NPD Intelect, market leader in North America

2 Application

StarTrack Working Areas (diagram labels) • Retailers • Data-IN • Data-Preparation • MDM • IDAS • Data Warehouse (Extrapolation, Reports) • DWH • Clients • Creating value through knowledge

Data Production System (diagram labels) • Local client • Local server • Central server • Identification (WebTAS) • General Interface Manager (GIM) • Separation • Central IDAS output pool • DWH • Projection system • Data receipt • Local Output • Mainframe • Planning – Controlling – Monitoring System

PCMS Dimensions for the Data Production System: PLANNING, CONTROLLING, MONITORING • current state: predefined process steps; manual state checking; manual error tracking • envisioned state: dynamic production process configuration; production planning and monitoring; proactive error handling

3 Monitoring of Aggregation Levels

Definitions • aggregate functions: SUM, AVG, ... • aggregation levels: GROUP BY, multiple groupings • aggregation: an instruction with many data sets as input and one data set as output • separation (disaggregation): an instruction with one data set as input and many data sets as output
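The definitions above can be illustrated with a minimal Python sketch. The record layout, the field names and the helper `group_sum` are assumptions made for this illustration, not part of the GfK system; the point is only that each additional grouping yields a coarser aggregation level.

```python
from collections import defaultdict

# Hypothetical input records; field names are illustrative assumptions.
rows = [
    {"item": "A", "retailer": "Vobis", "week": 4, "volume": 6},
    {"item": "A", "retailer": "Vobis", "week": 5, "volume": 4},
    {"item": "B", "retailer": "Vobis", "week": 4, "volume": 9},
]

def group_sum(records, keys):
    """Emulate SELECT keys, SUM(volume) ... GROUP BY keys."""
    totals = defaultdict(int)
    for record in records:
        totals[tuple(record[k] for k in keys)] += record["volume"]
    return dict(totals)

# Two aggregation levels over the same input (multiple groupings).
level_1 = group_sum(rows, ("item", "retailer"))
level_2 = group_sum(rows, ("retailer",))
```

Each level discards detail that the previous one still carried, which is why reconstructing the inputs of a higher level needs extra information.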

Aspects of the Monitoring of Aggregation Levels • Summaries: after significant process steps; summaries of operating figures • Single Record Tracking: tracking of single retailer items up to the customer report; simulation of planned production cycles (ETL tools)

Example - Single Record Tracking • Legend: R: retailer; CW: calendar week; DP: delivery period; RP: reporting period • pool A (input of component X): Item A, R: Vobis, DP: CW 04/2002, sales volume: 6; Item A, R: Vobis, DP: CW 05/2002, sales volume: 4; Item B, R: Vobis, DP: CW 04/2002, sales volume: 9 • pool B (output of component X): Item A, R: Vobis, RP: Jan 2002, sales volume: 10; Item B, R: Vobis, RP: CW 04/2002, sales volume: 9
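The example on this slide can be replayed in a short Python sketch. The class names, the `sources` field and the delivery-to-reporting-period lookup are assumptions introduced here; the slide does not say how component X derives reporting periods. Only the five records and their sales volumes are taken from the slide.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Delivery:          # a record in pool A
    item: str
    retailer: str
    delivery_period: str
    sales_volume: int

@dataclass
class Reported:          # a record in pool B, with its source records attached
    item: str
    retailer: str
    reporting_period: str
    sales_volume: int = 0
    sources: list = field(default_factory=list)

def component_x(pool_a, reporting_period_of):
    """Aggregate pool A into pool B while remembering the contributing records."""
    pool_b = {}
    for rec in pool_a:
        rp = reporting_period_of[(rec.item, rec.delivery_period)]
        key = (rec.item, rec.retailer, rp)
        out = pool_b.setdefault(key, Reported(rec.item, rec.retailer, rp))
        out.sales_volume += rec.sales_volume
        out.sources.append(rec)
    return pool_b

pool_a = [
    Delivery("Item A", "Vobis", "CW 04/2002", 6),
    Delivery("Item A", "Vobis", "CW 05/2002", 4),
    Delivery("Item B", "Vobis", "CW 04/2002", 9),
]
# Assumed lookup: Item A is reported monthly, Item B weekly.
reporting_period_of = {
    ("Item A", "CW 04/2002"): "Jan 2002",
    ("Item A", "CW 05/2002"): "Jan 2002",
    ("Item B", "CW 04/2002"): "CW 04/2002",
}
pool_b = component_x(pool_a, reporting_period_of)
item_a = pool_b[("Item A", "Vobis", "Jan 2002")]
```

Here `item_a.sources` answers the tracking question directly: which delivered records ended up in the reported figure.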

Strategies of Tracing Aggregation Levels • Tubing System • the complete workflow cycle • error situations

Characteristics of Monitoring

Possibilities to Reconstruct Aggregation Levels (1) • Static Volumes of Data • Legend: DP: delivery period; CW: calendar week • job parameters: item, retailer, reporting period • pool A: all items, all retailers, all delivery periods • component X instruction: SELECT ... WHERE DP1 = CW 6, DP2 = CW 7, DP3 = CW 8, DP4 = CW 9 GROUP BY Vobis, item • pool B: item, retailer: Vobis, reporting period: Feb/2002
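A minimal sketch of this first possibility, assuming pool A is kept unchanged after processing: the source records are recovered by re-running the selection with the stored job parameters. The table name `pool_a`, its columns, the sample volumes and the function `reconstruct` are hypothetical; only the mapping of Feb/2002 to calendar weeks 6 to 9 follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pool_a (item TEXT, retailer TEXT, delivery_week INTEGER, volume INTEGER)"
)
# Hypothetical static data; the volumes are invented for the illustration.
conn.executemany(
    "INSERT INTO pool_a VALUES (?, ?, ?, ?)",
    [
        ("Item A", "Vobis", 6, 3),
        ("Item A", "Vobis", 7, 2),
        ("Item A", "Vobis", 10, 5),   # outside the reporting period
        ("Item A", "Other", 6, 7),    # different retailer
    ],
)

def reconstruct(conn, item, retailer, weeks):
    """Re-select the records that fed one aggregate, using only job parameters."""
    placeholders = ",".join("?" * len(weeks))
    query = (
        "SELECT item, retailer, delivery_week, volume FROM pool_a "
        f"WHERE item = ? AND retailer = ? AND delivery_week IN ({placeholders}) "
        "ORDER BY delivery_week"
    )
    return conn.execute(query, (item, retailer, *weeks)).fetchall()

# Job parameters: item, retailer, reporting period Feb/2002 (CW 6 to CW 9).
sources = reconstruct(conn, "Item A", "Vobis", (6, 7, 8, 9))
total = sum(row[3] for row in sources)
```

The sketch only works because nothing in `pool_a` changed between processing and tracking, which is exactly the precondition this approach depends on.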

(1) Static Volumes of Data • Advantages • no additional storage required • historically stored data allows stepwise tracking • Disadvantages • historically stored data requires increased storage facilities • this approach is only significant for a small (historically stored) quantity of data • all job parameters are required • increasing the quantity of data in storage slows down the control system as well as the controlled system • requires additional administration effort

Possibility (2) • Single Record Logging • job parameters: job_id, item, retailer, reporting period • pool A: all items, all retailers, all delivery periods • component X log: timestamp, job parameters, records of A (item, retailer, delivery period, facts, price) • pool B: item, retailer: Vobis, reporting period: Feb/2002
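Single record logging might look like the following sketch, in which the component writes a full copy of every record it consumed to a log entry alongside a timestamp and the job parameters. The function name, the dictionary layout, the sample values and the `delivery_periods` expansion of the reporting period are assumptions for illustration.

```python
import copy
import time

def component_x_logged(pool_a, job, log):
    """Aggregate the records matching the job and log full copies of them."""
    used = [
        r for r in pool_a
        if r["item"] == job["item"]
        and r["retailer"] == job["retailer"]
        and r["delivery_period"] in job["delivery_periods"]
    ]
    log.append({
        "timestamp": time.time(),
        "job": copy.deepcopy(job),
        "records": copy.deepcopy(used),   # full copies: roughly doubles storage
    })
    return sum(r["sales_volume"] for r in used)

# Hypothetical pool A and job parameters.
pool_a = [
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 06/2002",
     "sales_volume": 3, "price": 10.0},
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 07/2002",
     "sales_volume": 2, "price": 10.0},
]
job = {
    "job_id": 1, "item": "Item A", "retailer": "Vobis",
    "reporting_period": "Feb/2002",
    "delivery_periods": ["CW 06/2002", "CW 07/2002", "CW 08/2002", "CW 09/2002"],
}
log = []
total = component_x_logged(pool_a, job, log)

pool_a[0]["sales_volume"] = 99            # pool A changes after processing
logged_total = sum(r["sales_volume"] for r in log[0]["records"])
```

Because the log holds copies, later changes to pool A do not affect what can be reconstructed, at the price of storing every consumed record a second time.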

(2) Single Record Logging • Advantages • no policies needed • no static volumes of data • Disadvantages • additional job parameters are needed • at least twice the storage requirement • additional administration effort • slowdown of systems

Possibility (3) • Primary Key Logging • most important attributes • job parameters needed • logging: item, retailer, delivery period • reduction at GfK: ~1/5 • Advantages • storage requirement (approach 3) < storage requirement (approach 2) • no policies needed • no static data volumes • Disadvantages • no deleting of records, but new attribute values for the same records • additional administration effort • slowdown of systems
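The trade-off of primary key logging can be sketched as follows: only the key attributes (item, retailer, delivery period) are logged, and the records are looked up in pool A at tracking time. The helper names and sample data are assumptions. The sketch also shows why deletions are a problem for this approach while changed attribute values are still resolved.

```python
def key_of(record):
    """Primary key as listed on the slide: item, retailer, delivery period."""
    return (record["item"], record["retailer"], record["delivery_period"])

def log_primary_keys(used_records):
    """Log only the keys of the records a job consumed."""
    return [key_of(r) for r in used_records]

def resolve(keys, pool_a):
    """Look the logged keys up in the current pool A; None if a record is gone."""
    index = {key_of(r): r for r in pool_a}
    return [index.get(k) for k in keys]

# Hypothetical pool A.
pool_a = [
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 06/2002", "sales_volume": 3},
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 07/2002", "sales_volume": 2},
]
key_log = log_primary_keys(pool_a)

pool_a[0]["sales_volume"] = 4             # new attribute value, same key
after_update = resolve(key_log, pool_a)

del pool_a[1]                             # deletion breaks the lookup
after_delete = resolve(key_log, pool_a)
```

Note that the lookup returns the current attribute values, not those seen at processing time.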

Possibility (4) • Data Evaluation • job parameters: item, retailer, reporting period • processing time: pool A1 (all) -> component X instruction -> pool B1 (item, retailer: Vobis, reporting period: Feb/2002) • tracking time: pool A2 (all) -> instruction -> pool B2 (item, retailer: Vobis, reporting period: Feb/2002)
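One possible reading of data evaluation is sketched below: the instruction is executed a second time at tracking time on the then-current pool (A2), and the outcome (B2) is compared with the result stored at processing time (B1). The function names, the record layout and the sample values are assumptions; the sketch is meant to show why the answer is only an estimate when the pool has changed in between.

```python
def instruction(pool, job):
    """The component's selection and aggregation, as a reusable function."""
    used = [
        r for r in pool
        if r["item"] == job["item"]
        and r["retailer"] == job["retailer"]
        and r["delivery_period"] in job["delivery_periods"]
    ]
    return used, sum(r["sales_volume"] for r in used)

def evaluate(pool_now, job, stored_total):
    """Re-run the instruction at tracking time and compare with the stored result."""
    used, recomputed = instruction(pool_now, job)
    return {"sources": used, "recomputed": recomputed,
            "exact": recomputed == stored_total}

# Hypothetical job parameters and pools.
job = {"item": "Item A", "retailer": "Vobis",
       "delivery_periods": ["CW 06/2002", "CW 07/2002"]}
pool_a1 = [
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 06/2002", "sales_volume": 3},
    {"item": "Item A", "retailer": "Vobis", "delivery_period": "CW 07/2002", "sales_volume": 2},
]
_, b1_total = instruction(pool_a1, job)              # processing time

# Tracking time: a late correction has arrived in the pool.
pool_a2 = [dict(r) for r in pool_a1]
pool_a2[1]["sales_volume"] = 5
result = evaluate(pool_a2, job, b1_total)
```

In this run `result["exact"]` is false, which signals that the reconstructed source records no longer explain the figure produced at processing time.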

(4) Data Evaluation • Advantages • no additional logging • no additional storage required • alterations of records allowed • no static data volumes • Disadvantages • policies are needed (program extension) • job parameters are needed • only an imprecise estimate (processing time ≠ tracking time) • double execution time of component

Conclusion (I) • Static Volumes of Data • environments: (historical) static data volumes • least logging effort • best approach, but often not applicable • Single Record Logging • environments: at least 2x storage required and slowdown acceptable • suitable when gathered amount of data >> processed amount of data (e.g. ad-hoc reports) • Primary Key Logging • environments: fewer manipulations acceptable • deleting of records is not allowed • additional logging effort

Conclusion (II) • Data Evaluation • environments: level of imprecision acceptable • no additional logging effort • no additional implementation work • no additional storage required • system load increases -> recommended for slack times • Support from Database Tools? • … more information: anja.schanzenberger@gfk.de
