Can a Divorced MOM and DAD take care of the CHILD?
• MOM – Message Oriented Middleware.
• DAD – Direct Access to Data (DBMSs).
• CHILD – Correlating Historical or In-transit Large-scale Data-streams.
In this talk …
• Introduce CHILD – Correlating Historical or In-transit Large-scale Data-streams.
• Compare CHILD with current Stream Processing Engines.
• How can DAD and MOM work together to help?
• Summary.
The Supply Chain Example
Some funny DAD characteristics:
• DADs are corporate custodians of truth.
• DADs generally maintain a single version of truth: the most recent truth.
• DADs are optimized to answer questions against a single version of truth.
• The truths can be atomically evaluated to answer the questions.
• There is only one answer to each question.
• DADs do not remember the answers they gave to previously asked questions.
Supply Chain Evolves to Accommodate Emerging Business Practices
Some characteristics of MOM:
• Allows asynchronous communication between disconnected systems within and across organizations.
• Provides message filtering and message correlation.
• Provides persistence and guaranteed-delivery mechanisms.
• Message enrichment can be achieved by referencing static datasets during routing.
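MOM-style filtering and enrichment against a static dataset can be sketched as a generator that routes messages. This is a minimal illustration, not any particular MOM product's API: the `SUPPLIERS` table, the `supplier_id` field and the `route` function are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical static reference dataset consulted while routing.
SUPPLIERS: Dict[str, Dict[str, str]] = {
    "S1": {"name": "Acme", "region": "EU"},
    "S2": {"name": "Globex", "region": "US"},
}

def route(messages: Iterable[Dict[str, Any]], region: str) -> Iterator[Dict[str, Any]]:
    """Filter messages to one region and enrich each one with the supplier
    name looked up in the static dataset. Messages whose supplier is unknown
    are dropped (an assumed policy)."""
    for msg in messages:
        supplier = SUPPLIERS.get(msg.get("supplier_id"))
        if supplier is None or supplier["region"] != region:
            continue  # filtering step
        yield {**msg, "supplier_name": supplier["name"]}  # enrichment step
```

The lookup table is never modified while messages flow, which matches the slide's point that enrichment references *static* data during routing.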
Proactive Supply Chain Management
In a proactive case:
• Each system creates its own view of the state of interest and receives information about changes to that state.
• There may not be a complete truth; facts may arrive over a period of time.
• The answers to the questions change as new facts become available.
• The aim is to reduce the time needed to recompute the most recent answers.
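One way to reduce recomputation time is to update each answer incrementally as facts arrive. The sketch below assumes, hypothetically, that facts are `(order_id, quantity_shipped)` pairs and that the question is "how much of each order has shipped so far?".

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Tuple

def incremental_answers(facts: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Yield (key, revised_answer) after every fact.

    The running total for a key is updated in O(1) per fact instead of being
    recomputed from all facts seen so far. Each yielded answer is provisional:
    a later fact for the same key revises it."""
    totals: DefaultDict[str, int] = defaultdict(int)
    for key, quantity in facts:
        totals[key] += quantity
        yield key, totals[key]
```

For example, the facts `("order-1", 5)`, `("order-2", 3)`, `("order-1", 2)` produce the answers 5, 3 and then a revised 7 for `order-1`.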
Scenario: London Congestion Charging (+ security)
[Diagram components: Sensor Reading; DB; Real-time processing; Retrospective processing; Command & Control; Billing; Security/fraud alerts]
Charging (and security) rules:
• vehicle license plate, owner, owner residency, fee paid?
• entry and exit times of vehicle, time of day, day of week, charging, residency
• reentry within 3 hours is free
• fraud: enters zone and not seen; security: grouped tanker trucks
• 100,000s of vehicle observations per hour
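Two of the charging rules can be sketched over a batch of observations. Several readings here are assumptions rather than the slide's definitions: "reentry within 3 hours is free" is taken to mean within 3 hours of the plate's last *charged* entry, "enters zone and not seen" is taken to mean the latest sighting is an entry older than a timeout, and `FEE`, `UNSEEN_TIMEOUT` and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List

FEE = 10.0                            # hypothetical flat charge per charged entry
FREE_REENTRY = timedelta(hours=3)     # slide rule: reentry within 3 hours is free
UNSEEN_TIMEOUT = timedelta(hours=12)  # hypothetical threshold for the fraud rule

@dataclass(frozen=True)
class Observation:
    plate: str
    time: datetime
    kind: str  # "entry" or "exit"

def charges(observations: Iterable[Observation]) -> Dict[str, float]:
    """Total fee per plate. An entry is charged only if it comes more than
    FREE_REENTRY after that plate's last charged entry (assumed reading)."""
    last_charged: Dict[str, datetime] = {}
    totals: Dict[str, float] = {}
    for obs in sorted(observations, key=lambda o: o.time):
        if obs.kind != "entry":
            continue
        totals.setdefault(obs.plate, 0.0)
        prev = last_charged.get(obs.plate)
        if prev is None or obs.time - prev > FREE_REENTRY:
            totals[obs.plate] += FEE
            last_charged[obs.plate] = obs.time
    return totals

def entered_but_not_seen(observations: Iterable[Observation], now: datetime) -> List[str]:
    """Plates whose latest observation is an entry older than UNSEEN_TIMEOUT,
    i.e. the vehicle entered the zone and has not been seen since."""
    latest: Dict[str, Observation] = {}
    for obs in sorted(observations, key=lambda o: o.time):
        latest[obs.plate] = obs
    return sorted(
        plate for plate, obs in latest.items()
        if obs.kind == "entry" and now - obs.time > UNSEEN_TIMEOUT
    )
```

The first function is a retrospective (billing) query; the second is a negation query ("something was *not* observed"), which is the kind of question that needs retained state rather than a per-tuple filter.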
Examples of CHILD Applications
• RFID
• Sensor Networks
• Stock Quotes
• Database Notification
• Content Routing Networks
• RSS Aggregators
CHILD – Correlating Historical or In-transit Large-Scale Data Streams
Characteristics:
• Append-only data.
• Push paradigm – stream of data (truths), static set of queries (questions).
• Continuous processing requirements.
• Correlation requirements.
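The push paradigm inverts the usual DBMS arrangement: queries stay fixed and data moves through them. A minimal sketch, assuming queries are plain Python predicates over dictionary tuples; `ContinuousQueryEngine` and its methods are hypothetical names, not an existing engine's API.

```python
from typing import Any, Callable, Dict, List, Tuple

Event = Dict[str, Any]
Predicate = Callable[[Event], bool]

class ContinuousQueryEngine:
    """Hypothetical push-based engine: a static set of standing queries is
    registered once, and each arriving tuple is pushed through all of them."""

    def __init__(self) -> None:
        self._queries: List[Tuple[str, Predicate]] = []
        self.results: Dict[str, List[Event]] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        # Queries (questions) are static; they are not re-issued per tuple.
        self._queries.append((name, predicate))
        self.results[name] = []

    def push(self, event: Event) -> None:
        # Data (truths) is append-only: each event is evaluated once, on
        # arrival, against every standing query.
        for name, predicate in self._queries:
            if predicate(event):
                self.results[name].append(event)


engine = ContinuousQueryEngine()
engine.register("hot", lambda e: e["temp"] > 30)
engine.register("sensor_a", lambda e: e["id"] == "a")
for e in [{"id": "a", "temp": 25}, {"id": "b", "temp": 35}, {"id": "a", "temp": 40}]:
    engine.push(e)
```

Evaluating every query for every tuple is the naive design; it is linear in the number of standing queries, which is why indexing subscriptions becomes important at scale.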
CHILD – Correlating Historical or In-transit Large-Scale Data Streams - 2
[Figure: states S’, S’’, S’’’, S* and the changes ∆’, ∆’’, ∆’’’, ∆* between them]
All queries have an associated time constraint specified in terms of windowing functions.
• Query Type 1: Query when states S’, S’’, S’’’, S* are reached. (DB Notification)
• Query Type 2: Query when S’’’ is reached after S’ and S’’. (Sensor Networks)
• Query Type 3: Query when S* is reached within 2 transitions from S’. (BI)
• Query Type 4: Get an aggregate of ∆. (Sensor Networks)
• Query Type 5: Query when S’, S’’ were observed in the past N time windows. (Fraud Detection Networks)
• Query Type 6: Query when ∆’, ∆’’, ∆’’’ resulted in exactly the changes from S’ to S’’ to S’’’. (ESB)
• Query Type 7: Query when S’, S’’, S’’’ … ∆’, ∆’’, ∆’’’ … were not observed. (Fraud Detection)
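Query Type 2 combined with a window can be sketched as an in-order subsequence match: report each time the last state of a pattern is seen after the earlier states, with the whole match spanning at most `window` time units. The semantics are assumptions: other states may occur in between, events arrive sorted by time, and the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

def sequence_within_window(
    events: Iterable[Tuple[float, str]],
    pattern: Sequence[str],
    window: float,
) -> List[float]:
    """Times at which pattern[-1] is observed after pattern[0], ...,
    pattern[-2] in that order, with the match spanning <= window."""
    # latest[i]: most recent start time of any in-order match of pattern[:i+1].
    # Keeping only the most recent start is enough, because it is the start
    # most likely to satisfy the window constraint.
    latest: List[Optional[float]] = [None] * len(pattern)
    hits: List[float] = []
    for t, state in events:
        # Walk the pattern backwards so one event cannot fill two slots.
        for i in range(len(pattern) - 1, -1, -1):
            if state != pattern[i]:
                continue
            start = t if i == 0 else latest[i - 1]
            if start is None:
                continue  # no earlier prefix match to extend
            latest[i] = start
            if i == len(pattern) - 1 and t - start <= window:
                hits.append(t)
    return hits
```

State per pattern is constant (one timestamp per prefix), so the cost per event is proportional to the pattern length, not to the amount of history seen.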
CHILD – Correlating Historical or In-transit Large-Scale Data Streams - 3
[Figure: two streams, with states S’, S’’, S’’’, S* and changes ∆’, ∆’’, ∆’’’, ∆*, and states P’, P’’, P’’’, P* and changes δ’, δ’’, δ’’’, δ*]
All queries have an associated time constraint specified in terms of windowing functions.
• Query Type 8: Evaluate a join over S and P states. (Almost all use cases)
• Query Type 9: Co-evaluate a filter on S, P, …. (Almost all use cases)
• Query Type 10: Evaluate a join/filter on S(t) and S(t-T). (Sensor Networks, BI)
• Query Type 11: Evaluate P between states S’ and S’’’. (Sensor Networks, Stock Ticks)
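Query Type 10 joins a stream with a lagged copy of itself. The following is a minimal sketch, assuming readings are `(time, value)` pairs sorted by time and that S(t-T) means the most recent reading taken at or before t-T. The function name and tuple layout are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

def lagged_self_join(
    readings: Sequence[Tuple[float, float]], lag: float
) -> List[Tuple[float, float, float, float]]:
    """Join each reading S(t) with S(t - lag) and return
    (t, S(t), S(t - lag), difference) tuples. Readings with no lagged
    partner (too early in the stream) are skipped."""
    times = [t for t, _ in readings]
    joined = []
    for t, value in readings:
        # Index of the last reading whose time is <= t - lag, or -1 if none.
        j = bisect.bisect_right(times, t - lag) - 1
        if j >= 0:
            lagged_value = readings[j][1]
            joined.append((t, value, lagged_value, value - lagged_value))
    return joined
```

The filter half of the query can then be applied to the joined tuples, for example keeping only rows whose difference exceeds a threshold.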
Stream Systems – Academic Projects
• AURORA
• BOREALIS
• STREAMDB
• TELEGRAPHCQ
• NIAGARACQ
CHILD and Stream Processing – Some Observations
• The temporal dimension is not always the predominant one.
• For business processing, all facts are retained.
• An event is in the eye of the beholder: every tuple is a message until it is observed in a context, so queries need to have context.
• Being “Turing complete”, SQL allows one to specify arbitrary data manipulations; the trade-off is how much state we retain vs. resource usage vs. throughput.
• A declarative stream manipulation language needs to be developed.
• The focus should be a conceptual data model for manipulating append-only data, not only the engineering aspects of the systems.
• Additionally, smart summarization techniques are required for correlating and mining historic data.
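The state-versus-resources trade-off and the call for summarization can be illustrated with a constant-size summary of an unbounded stream. Welford's online algorithm is used here as one well-known example of such a technique. It is not something the slides prescribe, and `RunningSummary` is a hypothetical name.

```python
from dataclasses import dataclass

@dataclass
class RunningSummary:
    """Constant-size summary (count, mean, sample variance) of a stream,
    updated with Welford's online algorithm instead of retaining every fact."""
    count: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 when fewer than two values are seen.
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0
```

The summary answers "mean and spread so far" in O(1) memory, but it can no longer answer questions about individual past facts. That is exactly the retention trade-off the observation describes.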
CHILD and Stream Processing – Observations contd.
• Real-time performance is critical ONLY in some cases.
• Providing a common abstraction for sequence analysis on the data items appearing within a stream and across streams remains critical.
• Typical stream systems are restricted to 20-to-30 operators and require resource augmentation to handle higher workloads, which in turn requires capabilities similar to those of MOMs.
• To handle queries over historic data and correlation with historic data, CHILD requires capabilities equivalent to those of DADs.
[Diagram: SPE, STREAM, DAD (optional links), SPE-1 to SPE-5]
Is this not MOM with Content Routing Operators?
[Diagram: SPE-1 to SPE-5]
What is missing?
• The ability to create an ad hoc network of content routers given a list of streams and queries.
• The ability to describe and support smart subscriptions.
• The ability to scale the simultaneous evaluation of multiple expressions.
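Scaling the simultaneous evaluation of many subscriptions usually means *not* testing each one against every message. A minimal sketch of a counting-style matcher follows. It assumes that subscriptions are conjunctions of attribute = value conditions and that message values are hashable. `SubscriptionIndex` and the example subscriptions are hypothetical.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Set, Tuple

class SubscriptionIndex:
    """Hypothetical content router: subscriptions are indexed by their
    (attribute, value) pairs, so a message only touches subscriptions that
    share at least one pair with it. A subscription matches when all of its
    conditions were hit (counting-style matching)."""

    def __init__(self) -> None:
        self._by_pair: DefaultDict[Tuple[str, Any], Set[str]] = defaultdict(set)
        self._size: Dict[str, int] = {}

    def subscribe(self, sub_id: str, conditions: Dict[str, Any]) -> None:
        # Assumes each sub_id is registered once, with at least one condition.
        self._size[sub_id] = len(conditions)
        for pair in conditions.items():
            self._by_pair[pair].add(sub_id)

    def match(self, message: Dict[str, Any]) -> List[str]:
        counts: DefaultDict[str, int] = defaultdict(int)
        for pair in message.items():
            for sub_id in self._by_pair.get(pair, ()):
                counts[sub_id] += 1
        return sorted(s for s, c in counts.items() if c == self._size[s])
```

Range predicates or sequence conditions would need other index structures. This sketch only covers the equality case.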
Summary
• Stream processing is just one aspect of the emerging paradigm of processing append-only data with support for continuous queries.
• These systems need a new representational model; SQL or SQL extensions are not sufficient.
• If we are not careful, we may redevelop parts of MOM and DAD in the process of creating support for CHILD.
• DAD: Well, I can provide you with triggers if you want?
• CHILD: Ahhh!!! As if they scale.
• MOM: Well, I can talk with other MOMs and enrich the contents on the fly.
• CHILD: Oh, is it!! Can you also enrich it on the fly? Or tell me when three red marbles are followed by four green ones?
• MOM: Only if I know what marbles are. Maybe with my content-routing hat on I can do that.
• CHILD: Yeah, right!!!
• CHILD: Can Uncle Active (Database) help?
• MOM: Oh no, he suffers from the Rule Termination Problem.
• DAD: Well, if you ask my temporally aware brother, he can help you relate things in the past.
• CHILD: But DAD, time is just one axis; I also consider the value axis. I want to purchase MOBIL OIL stock only when the fuel price has risen after a REFINERY BLOWUP. It's not the time but the context that matters.
• MOM: You know, my sister STREAM PROCESSING ENGINE can help.
• CHILD: Oh sure, with the ability to provide 20-30 operators, in-memory operations only, optional recovery, undefined semantics and a NON-DECLARATIVE interface, I will be in great hands!!! YUCK!!
• MOM: Oh, we need to provide him with a mix, or else he will replicate our behaviors.
• DAD: DOH!!!
• DAD: There is one and only one truth that I know. For previous versions of the truth, see my log…
• MOM: I do not need to know the truth, I just GOSSIP. I GOSSIP about facts!!!
• CHILD: But MOM, DAD, I do not need to know the complete truth. I want to make decisions now; I will correct them when I know more.
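CHILD's marble question is a value-based sequence pattern rather than a time-based one. Over a finite, buffered slice of the stream, it can be sketched with a regular expression. The one-character color encoding is a hypothetical choice, and "followed by" is assumed to mean *immediately* followed. A real stream engine would match incrementally instead of re-scanning a buffer.

```python
import re
from typing import List, Sequence

# Hypothetical encoding: one character per marble color, so a pattern over
# the buffered sequence can be written as a regular expression.
COLOR_CODES = {"red": "R", "green": "G", "blue": "B"}
RED3_GREEN4 = re.compile(r"R{3}G{4}")

def red_then_green_positions(marbles: Sequence[str]) -> List[int]:
    """Start indices where three consecutive red marbles are immediately
    followed by four consecutive green marbles."""
    encoded = "".join(COLOR_CODES[m] for m in marbles)
    return [match.start() for match in RED3_GREEN4.finditer(encoded)]
```

With a run of four reds, the reported position is that of the last three reds, because the pattern does not require the run to start at a non-red marble.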