Results on Data Delivery (WP3) DBGlobe IST-2001-32645 1st Review Paphos, January 31, 2003 Proactive initiative on: Global Computing (GC) Future and Emerging Technologies (FET) The roots of innovation
WP3 Outline • Co-ordination/Data Delivery: • Task 3.1 Data delivery among the system components. Derive adaptive data delivery mechanisms considering various modes of delivery such as • push (transmission of data without an explicit request) and pull, • periodic and aperiodic, • multicast and unicast delivery. • Task 3.2 Model the co-ordination of the mobile entities using workflow management (and transactional workflows) and techniques used in the multi-agent community. DBGlobe, 1st Annual Review Paphos, Jan 2003
Timeline … (figure: Gantt chart of the WP3 tasks 3.1 Data Delivery, 3.2 Coordination and 3.3 Performance across Year 1 and Year 2, months 3 to 24) Deliverables • D8: Data Delivery Mechanisms (Oct 2002) • D9: Modeling Coordination Through Workflows (April 2003) • D10: Data Delivery and Querying (August 2003)
Outcomes of WP3 so far: • D8: Data Delivery Mechanisms, comprising a taxonomy of mechanisms and an outline of their potential use within the DBGlobe architecture • A number of specific results in data delivery: • Coherent Push-based Data Delivery • Adaptive Multi-version Broadcast Data Delivery • Efficient Publish-Subscribe Data Delivery
In this presentation: Just a note on the different modes Summary of technical results 1. Coherent Data Delivery 2. Adaptive Multi-version Broadcast Data Delivery 3. Efficient Publish-Subscribe Data Delivery
D8: Taxonomy of Different Modes of Data Delivery. Data Delivery Modes • Client Pull vs. Server Push • pull-based: the transfer of information is initiated by the client • push-based: server-initiated; servers send information to clients without any specific request • push is scalable, but clients may receive irrelevant data • hybrid scheme: hot data are pushed and cold data are pulled • Aperiodic vs. Periodic • aperiodic delivery: usually event-driven; a data request (for pull) or a transmission (for push) is triggered by an event (e.g., a user action for pull or a data update for push) • periodic delivery: performed according to some pre-arranged schedule
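To make the hybrid scheme concrete, here is a minimal sketch under stated assumptions: "hot" is taken to mean the most frequently requested items, and the broadcast channel has room for a fixed number of items (`broadcast_capacity`). The names `split_hot_cold` and `deliver` are hypothetical, not part of DBGlobe.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple


def split_hot_cold(access_log: Iterable[Hashable], broadcast_capacity: int) -> Tuple[Set, Set]:
    """Assumption: hotness = request frequency; the top items are pushed, the rest pulled."""
    ranked = [item for item, _ in Counter(access_log).most_common()]
    hot = set(ranked[:broadcast_capacity])
    cold = set(ranked[broadcast_capacity:])
    return hot, cold


def deliver(item, hot: Set, broadcast_channel: Dict, pull_from_server: Callable):
    """Hot items are read off the pushed broadcast; cold items are requested explicitly (pull)."""
    if item in hot:
        return "push", broadcast_channel[item]
    return "pull", pull_from_server(item)


if __name__ == "__main__":
    hot, cold = split_hot_cold(["a", "a", "a", "b", "b", "c"], broadcast_capacity=1)
    print(hot, cold)
    print(deliver("c", hot, {"a": 1}, lambda key: f"fetched {key}"))
```

Choosing the split by frequency is only one policy; any rule that keeps the broadcast small and popular fits the hybrid idea.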
D8: Taxonomy of Different Modes of Data Delivery • Unicast vs. 1-N • Unicast: from a data source (server) to a single client • 1-to-N: data sent are received by multiple clients (multicast and broadcast) • Data vs. Query Shipping • Distinguished by the unit of interaction between clients and data sources • Depends on whether the data sources have data processing capabilities • Query shipping may reduce the communication load, since only the relevant data sets are delivered to the client.
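A toy comparison of the two shipping modes, assuming a data source that can evaluate a selection predicate (the capability that query shipping depends on). The relation, predicate and function names are illustrative only; "cost" here is simply the number of tuples sent over the network.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def data_shipping(relation: List[Row], predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Ship every tuple to the client, which then filters locally."""
    shipped = list(relation)
    return [row for row in shipped if predicate(row)], len(shipped)


def query_shipping(relation: List[Row], predicate: Callable[[Row], bool]) -> Tuple[List[Row], int]:
    """Ship the query to the source, which returns only the matching tuples."""
    shipped = [row for row in relation if predicate(row)]
    return shipped, len(shipped)


readings = [{"sensor": i, "temp": 18.0 + i} for i in range(10)]
too_hot = lambda row: row["temp"] > 25.0

rows_d, cost_d = data_shipping(readings, too_hot)
rows_q, cost_q = query_shipping(readings, too_hot)
assert rows_d == rows_q  # same answer either way
print(cost_d, cost_q)    # prints: 10 2
```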
D8: Taxonomy of Different Modes of Data Delivery (figure: example mechanisms placed along the push/pull, periodic/aperiodic and unicast/1-N broadcast axes) • Publish/subscribe (aperiodic, 1-N, push) • Email list (aperiodic, unicast, push) • Polling (periodic, unicast, pull) • Request/Response (aperiodic, unicast, pull)
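The three axes of the taxonomy can be captured as a small data structure. This sketch encodes the four example mechanisms exactly as classified in the figure; the enum and helper names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Initiator(Enum):
    PULL = "pull"  # client-initiated
    PUSH = "push"  # server-initiated


class Timing(Enum):
    PERIODIC = "periodic"
    APERIODIC = "aperiodic"


class Fanout(Enum):
    UNICAST = "unicast"
    ONE_TO_N = "1-N"


@dataclass(frozen=True)
class DeliveryMode:
    name: str
    timing: Timing
    fanout: Fanout
    initiator: Initiator


EXAMPLES = [
    DeliveryMode("Publish/subscribe", Timing.APERIODIC, Fanout.ONE_TO_N, Initiator.PUSH),
    DeliveryMode("Email list", Timing.APERIODIC, Fanout.UNICAST, Initiator.PUSH),
    DeliveryMode("Polling", Timing.PERIODIC, Fanout.UNICAST, Initiator.PULL),
    DeliveryMode("Request/Response", Timing.APERIODIC, Fanout.UNICAST, Initiator.PULL),
]


def modes_with(**criteria):
    """Names of the example mechanisms matching every given axis value."""
    return [m.name for m in EXAMPLES
            if all(getattr(m, axis) == value for axis, value in criteria.items())]


print(modes_with(initiator=Initiator.PULL))  # ['Polling', 'Request/Response']
```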
Outline: A note on the different modes Summary of technical results 1. Coherent Push-based Data Delivery 2. Adaptive Multi-version Broadcast Data Delivery 3. Efficient Publish-Subscribe Data Delivery
Coherent Data Delivery: The Data Broadcast Push Model (figure: a server sending to clients over a broadcast channel) • The server broadcasts data from a database to a large number of clients • push mode with no direct communication with the server (stateless server, e.g., sensors) • hence "client-side" protocols • Data are updated at the server • The values on the channel are updated periodically • An efficient way to disseminate information to large client populations with similar interests • Physical support in wireless networks (satellite, cellular) • Various other applications: sensor networks, data streams
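The model does not fix how the server orders items on the channel. Assuming the simplest option, a flat program that sends every item once per cycle in a fixed order with unit-length slots, the sketch below generates the schedule and computes, purely on the client side, when a client tuning in at time `now` next sees a given item. Function names and the slot model are hypothetical.

```python
from itertools import islice
from typing import Iterator, List, Tuple


def broadcast_schedule(items: List[str], slot_length: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (time, item) forever: every item once per cycle, in a fixed order."""
    t = 0
    while True:
        for item in items:
            yield t, item
            t += slot_length


def next_appearance(items: List[str], item: str, now: int, slot_length: int = 1) -> int:
    """Earliest time >= now at which `item` is on the channel (no server contact needed)."""
    cycle = len(items) * slot_length
    offset = items.index(item) * slot_length
    cycles_ahead = max(0, -(-(now - offset) // cycle))  # ceiling division
    return offset + cycles_ahead * cycle


print(list(islice(broadcast_schedule(["x", "y", "z"]), 4)))  # [(0, 'x'), (1, 'y'), (2, 'z'), (3, 'x')]
print(next_appearance(["x", "y", "z"], "y", now=2))          # 4
```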
Coherent Data Delivery: Our Goal • Ensure that clients receive temporally coherent (e.g., current) and semantically coherent (transaction-wise) data • Provide a model for temporal and semantic coherency • Show what type of coherency we get if no additional protocols are used • Show what type of coherency is achieved by a number of protocols proposed in the literature (and their extensions)
Temporal Coherency: Model. The currency properties of the readset (the set of items read and their values) are based on the currency of the individual items in the readset. (Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the time instance the value of x read by R was stored in the database and ce is the time instance of the next change of this value in the database. If the value read by R has not changed since, ce is infinity. • Based on CI(x, R), we define two types of currency for the readset of a transaction R: • Overlapping • Oldest-value
Temporal Coherency: Model • , say [cb, ce)overlapping current,with overlapping currency, Overlap(R) =ce- (if ce is not infinity), current_time (otherwise) (x, u) RS(R) CI(x, R) there is an interval of time that is included in the currency interval of all tems in R's readset In general, oldest value currency of a transaction R, denoted OV (R), = ce-, where ce is the smallest among the endpoints of the CI(x, R), for every x, (x, u) RS(R). If R is overlapping current, Overlap(R) = OV(R)
Temporal Coherency: Model. If R is not overlapping current, we want to measure the discrepancy among the database states seen by the transaction: its temporal spread. (Temporal Spread of a Readset) Let min_ce be the smallest among the endpoints and max_cb the largest among the begin points of CI(x, R) for x in the readset of a transaction R. Then
$\mathit{temporal\_spread}(R) = \begin{cases} \mathit{max\_cb} - \mathit{min\_ce} & \text{if } \mathit{max\_cb} > \mathit{min\_ce} \\ 0 & \text{otherwise.} \end{cases}$
For an overlapping current transaction, the temporal spread is zero.
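A direct transcription of the temporal spread definition, using hypothetical (cb, ce) pairs for currency intervals; the sample intervals are made up for illustration.

```python
from typing import List, Tuple


def temporal_spread(cis: List[Tuple[float, float]]) -> float:
    """max_cb - min_ce when the newest begin point lies after the oldest end point, else 0."""
    max_cb = max(cb for cb, _ in cis)
    min_ce = min(ce for _, ce in cis)
    return max_cb - min_ce if max_cb > min_ce else 0


print(temporal_spread([(1, 8), (3, 12)]))  # overlapping intervals: 0
print(temporal_spread([(1, 8), (9, 15)]))  # 9 - 8 = 1
```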
Temporal Coherency: Model Example. R1 reads x1, x2, x3, x4 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20). R1 is overlapping current with Overlap(R1) = 8 and temporal_spread(R1) = 0.
Temporal Coherency: Model Example. R1 reads x1, x2, x3, x4 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20, marking the oldest value read (min_ce) and the most current value read (max_cb)). R1 is not overlapping current, but OV(R1) = 8 and temporal_spread(R1) = 9 – 8 = 1.
Temporal Coherency: Model Example. R1 reads x1, x2, x3, x4 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20, marking the oldest value read (min_ce) and the most current value read (max_cb)). R1 is not overlapping current, but OV(R1) = 8 and temporal_spread(R1) = 15 – 8 = 7.
Temporal Coherency: Model. Besides discrepancy, we measure currency: how old the values seen are. (Transaction-Relative Currency) R is relative overlapping current with respect to time instance t if $t \in CI(x, R)$ for every x read by R. R is relative oldest-value current with respect to time instance t if t ≤ OV(R). (Temporal Lag) Let tc be the largest t ≤ tcommit_R with respect to which R is relative (overlapping or oldest-value) current; then temporal_lag(R) = tcommit_R - tc. The smaller the temporal lag and the temporal spread, the higher the temporal coherency of a read transaction. The best temporal coherency is achieved when R is relative overlapping current with respect to tcommit_R: both the temporal lag and the temporal spread are then zero.
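Continuing with the same hypothetical (cb, ce) representation, this sketch computes the temporal lag for either notion of relative currency. As in the example slides, ce stands in for the instant just before it, so a lag of tcommit_R - ce is reported. With made-up intervals whose smallest endpoint is 8, commit times 12 and 19 give lags of 4 and 11, the same arithmetic as in the examples.

```python
import math
from typing import List, Optional, Tuple


def temporal_lag(cis: List[Tuple[float, float]], t_commit: float,
                 mode: str = "oldest-value") -> Optional[float]:
    """temporal_lag(R) = t_commit - tc, where tc is the largest t <= t_commit with respect
    to which R is relative current; returns None if no such t exists."""
    max_cb = max(cb for cb, _ in cis)
    min_ce = min(ce for _, ce in cis)
    if mode == "overlapping":
        # t must lie in every CI(x, R), i.e. in [max_cb, min_ce), and not after t_commit
        if max_cb >= min_ce or max_cb > t_commit:
            return None
    elif mode != "oldest-value":
        raise ValueError(f"unknown mode: {mode}")
    # In both modes the candidate t is bounded above by min_ce (i.e. OV(R)) and by t_commit.
    tc = min(t_commit, min_ce)
    return t_commit - tc


cis = [(1, 8), (3, 12), (5, math.inf), (2, 10)]  # made-up intervals, min_ce = 8
print(temporal_lag(cis, 12), temporal_lag(cis, 19, mode="overlapping"))  # 4 11
```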
Temporal Coherency: Model Example. R1 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20). R1 is overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0 and temporal_lag(R1) = 0.
Temporal Coherency: Model Example. R1 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20). R1 is overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0 and temporal_lag(R1) = 12 – 8 = 4.
Temporal Coherency: Model Example. R1 (figure: CI(x1, R1) through CI(x4, R1) on a time axis from 2 to 20). R1 is overlapping current with Overlap(R1) = 8, temporal_spread(R1) = 0 and temporal_lag(R1) = 19 – 8 = 11.
Temporal Coherency: Protocols • What is the coherency of R (temporal lag and spread) if R just reads items from the broadcast? • Let tlastread_R be the time instance at which R performs its last read. • temporal_lag(R) ≤ tcommit_R - begin_cycle(tbegin_R) and temporal_spread(R) ≤ tlastread_R - begin_cycle(tbegin_R) • These bounds are tight: there are cases in which we get the worst lag and spread • If pu = 0 (immediate updates), best (worst) lag and spread • If all items are read from the same cycle, the spread is 0 and lag = pu
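Plugging numbers into these bounds: a sketch that assumes update periods aligned at multiples of pu starting from time 0 and pu > 0, so the pu = 0 case is outside it. `coherency_bounds` is a hypothetical helper, and the times are made up.

```python
import math


def begin_cycle(t: float, pu: float) -> float:
    """Start of the update period containing t; periods are assumed to start at time 0."""
    if pu <= 0:
        raise ValueError("this sketch assumes a positive update period pu")
    return math.floor(t / pu) * pu


def coherency_bounds(t_begin: float, t_last_read: float, t_commit: float, pu: float) -> dict:
    """Upper bounds on lag and spread for a transaction that just reads from the broadcast."""
    start = begin_cycle(t_begin, pu)
    return {"max_lag": t_commit - start, "max_spread": t_last_read - start}


print(coherency_bounds(t_begin=13, t_last_read=27, t_commit=30, pu=10))  # {'max_lag': 20, 'max_spread': 17}
```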
Temporal Coherency: Protocols. Basic Techniques • Protocols fall into two broad categories: • invalidation (which corresponds to broadcasting the endpoints (ces) of the currency interval of each item) • versioning (which corresponds to broadcasting the begin points (cbs) of the currency interval of each item) • plus a hybrid protocol that combines versioning and invalidation
Temporal Coherency: Protocols. Invalidation • Periodically broadcast an invalidation report (IR): a list of the items that have been updated since the broadcast of the previous IR • The paper gives variations that yield transactions with different values of temporal spread and lag. Versioning • With each item, broadcast a timestamp (version) of when its value was created • Again, the paper gives variations that yield transactions with different values of temporal lag (the spread is always 0)
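A minimal sketch of the invalidation idea, not one of the paper's variations: the server collects the items updated since the previous IR, and a read-only client transaction conservatively aborts as soon as an IR lists an item it has already read, since that item's currency interval has closed. Class and method names are hypothetical.

```python
class InvalidationServer:
    """Tracks which items changed since the last invalidation report (IR)."""

    def __init__(self):
        self._updated = set()

    def update(self, item):
        self._updated.add(item)

    def make_ir(self):
        """Return the items updated since the previous IR and start a new period."""
        report = frozenset(self._updated)
        self._updated.clear()
        return report


class ReadOnlyClientTxn:
    """Client transaction that aborts if an IR invalidates anything it has already read."""

    def __init__(self):
        self.readset = {}
        self.aborted = False

    def read(self, item, value):
        if self.aborted:
            raise RuntimeError("transaction already aborted")
        self.readset[item] = value

    def on_ir(self, report):
        # A listed item's currency interval has closed, so values read later
        # might not overlap with it; this conservative variant simply aborts.
        if not report.isdisjoint(self.readset):
            self.aborted = True
```

Aborting is stricter than necessary: a later read whose cb precedes the invalidated item's ce would still overlap, which is the kind of refinement the protocol variations trade off.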
Semantic Coherency: Model. Definitions of Semantic Coherency (Consistency) • C0 • C1: $RS(R) \subseteq DS$, where DS is a consistent database state (R reads a subset of a consistent database state) • C2: R is serializable with the set of server transactions from which R reads values (directly or indirectly) • C3: R is serializable with all server transactions • C4: R is serializable with all server transactions, and the serialization order of the server transactions that R observes is consistent with the commit order of these transactions at the server • Rigorous schedules: the commit order is compatible with the serialization order
Relating Semantic and Temporal Coherency. (Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the commit time of the transaction that wrote the value of x read by R, and ce is the commit time of the transaction that next updated x, or infinity if there is none.
Semantic Coherency: Protocols. Reading from a single cycle • If transaction R reads all items from the same cycle, it is C1 but not necessarily C2 • If the server schedule is rigorous and R reads all items from the same cycle, it is C4
Semantic Coherency: Protocols. Read Test Theorem: It suffices to check for violations of C2, C3, and C4 by a client transaction R when R reads a data item if and only if the server schedule is rigorous. The paper gives various read tests (based on testing the serialization graph) for attaining the various Ci consistency degrees, and relates them to approaches proposed in the literature.
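The slides do not spell out the read tests. As a generic illustration of serialization-graph testing at read time, not the paper's specific tests, here is a sketch under hypothetical assumptions: the client has the server's serialization graph (for example, as broadcast control information), and for each value it reads it knows the writing transaction and any server transactions that later overwrote that value. Reading adds writer → R, each later overwrite forces R → overwriter, and the read is rejected if these edges create a cycle.

```python
from collections import defaultdict


class SerializationGraph:
    """Directed graph over transactions; an edge a -> b means a must precede b."""

    def __init__(self):
        self.succ = defaultdict(set)

    def add_edge(self, a, b):
        self.succ[a].add(b)

    def reaches(self, src, dst):
        """Iterative DFS: is there a path from src to dst?"""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.succ[node])
        return False


def try_read(graph, reader, writer, overwriters):
    """Record that `reader` read a value written by `writer` and later overwritten by each
    transaction in `overwriters`; keep the edges only if the graph stays acyclic."""
    wanted = [(writer, reader)] + [(reader, t) for t in overwriters]
    new_edges = [(a, b) for a, b in wanted if b not in graph.succ[a]]
    for a, b in new_edges:
        graph.add_edge(a, b)
    # The graph was acyclic before, so any new cycle uses a new edge and passes through `reader`.
    if any(graph.reaches(s, reader) for s in graph.succ[reader]):
        for a, b in new_edges:
            graph.succ[a].discard(b)
        return False
    return True


g = SerializationGraph()
g.add_edge("T1", "T2")                 # T1 precedes T2 at the server
print(try_read(g, "R", "T2", []))      # True: R reads a value written by T2
print(try_read(g, "R", "T1", ["T2"]))  # False: an older value that T2 overwrote
```

In this example R would have seen T2's update of one item but T1's older value of an item that T2 overwrote; no serial order fits both, so the second read is rejected.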
Coherency in Broadcast-Based Dissemination. Future Work • Multiple servers: what semantic and temporal coherency does the client get? • Performance evaluation of the various types of coherency. Reference: E. Pitoura, P. K. Chrysanthis and K. Ramamritham. “Characterizing the Temporal and Semantic Coherency of Broadcast-based Data Dissemination”. Proc. of the 9th International Conference on Database Theory (ICDT03), January 2003, Siena, Italy.
Outline: A note on the different modes Summary of technical results 1. Coherent Push-based Data Delivery 2. Adaptive Multi-version Broadcast Data Delivery 3. Efficient Publish-Subscribe Data Delivery
Multi-version Broadcast. Similar model, but at each cycle the server (data source) sends not just one value per item but multiple versions of each item. Applications: • Multiple data servers share the channel (multi-sensor networks) • Enhanced consistency at the server (similar to multi-version schemes in traditional client-server systems)
Multi-version Broadcast Issues How should the broadcast be organized? What are appropriate client-cache protocols? • Adaptability • Performance depends on client access patterns • Historical queries • Random queries
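The organization question is left open here. As an illustration only, the sketch below builds two possible layouts for a cycle carrying several versions per item and counts how many slots a client tuned in at the start of the cycle waits for a given version. The layout names, the (version, value) lists ordered newest first, and the one-slot-per-version assumption are hypothetical, not taken from the cited papers.

```python
from typing import Callable, Dict, List, Optional, Tuple

Slot = Tuple[str, int, str]  # (item, version, value)
Versions = Dict[str, List[Tuple[int, str]]]  # item -> [(version, value)], newest first


def item_major(versions: Versions) -> List[Slot]:
    """All versions of one item are broadcast back to back."""
    return [(item, v, val) for item in sorted(versions) for v, val in versions[item]]


def version_major(versions: Versions) -> List[Slot]:
    """Newest version of every item first, then the next-older versions, and so on."""
    depth = max(len(vs) for vs in versions.values())
    return [(item, *versions[item][d])
            for d in range(depth)
            for item in sorted(versions)
            if d < len(versions[item])]


def slots_until(program: List[Slot], wanted: Callable[[Slot], bool]) -> Optional[int]:
    """Slots a client listens to, from the start of the cycle, until `wanted` holds."""
    for i, slot in enumerate(program):
        if wanted(slot):
            return i + 1
    return None


versions = {"x": [(3, "x3"), (2, "x2")], "y": [(5, "y5"), (4, "y4")]}
newest_y = lambda s: s[0] == "y" and s[1] == 5
print(slots_until(item_major(versions), newest_y), slots_until(version_major(versions), newest_y))  # 3 2
```

Under these assumptions a query for current values finishes sooner with the version-major layout, while a historical query over all versions of one item reads a contiguous run in the item-major layout, which illustrates why performance depends on the client access pattern.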
Multi-version Broadcast. References • E. Pitoura and P. K. Chrysanthis. “Multiversion Data Broadcast”. IEEE Transactions on Computers 51(10):1224-1230, October 2002. • O. Shigiltchoff, P. K. Chrysanthis and E. Pitoura. “Multi-version Data Broadcast Organizations”. In Proc. of the 6th East European Conference on Advances in Databases and Information Systems (ADBIS), September 2002, Bratislava, Slovakia. • O. Shigiltchoff, P. K. Chrysanthis and E. Pitoura. “Adaptive Multi-version Data Broadcast Organizations”. In preparation for journal publication.
Outline: A note on the different modes Summary of technical results 1. Coherent Push-based Data Delivery 2. Adaptive Multi-version Broadcast Data Delivery 3. Efficient Publish-Subscribe Data Delivery
Coherent Data Delivery: The Model (figure: server, clients and broadcast channel) • The server repeatedly pushes data from a database to a large number of clients • sequential client access • asymmetry: • large number of clients • transmission capabilities • Client-side protocols • The server is stateless • Data are updated at the server
Coherency in Broadcast-Based Dissemination. Updates • Data are updated at the server: what value is broadcast at time instance t? • We assume periodic updates with an update period pu: the value placed on the broadcast at time t is the value of the item at the beginning of the update period, denoted begin_cycle(t) • For periodic broadcast, pu is usually equal to the broadcast period
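The update rule can be sketched as a lookup into an item's commit history. Assumptions, all hypothetical: update periods start at multiples of pu from time 0, and `history` lists (commit_time, value) pairs for one item sorted by time.

```python
import bisect
import math
from typing import List, Tuple


def begin_cycle(t: float, pu: float) -> float:
    """Start of the update period containing t (periods assumed to start at time 0)."""
    return math.floor(t / pu) * pu


def value_at(history: List[Tuple[float, str]], t: float) -> str:
    """Database value at time t; history holds (commit_time, value) pairs sorted by time."""
    times = [commit for commit, _ in history]
    i = bisect.bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("no committed value at this time")
    return history[i][1]


def broadcast_value(history: List[Tuple[float, str]], t: float, pu: float) -> str:
    """Value placed on the channel at time t: the value as of begin_cycle(t)."""
    return value_at(history, begin_cycle(t, pu))


history = [(0, "a"), (12, "b"), (25, "c")]
print(value_at(history, 15), broadcast_value(history, 15, pu=10))  # b a
```

At t = 15 with pu = 10 the channel still carries "a" although "b" committed at 12, a gap of the kind the temporal-lag metric quantifies.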
Coherent Data Delivery. Preliminary Definitions • Database state: a set of (data item, value) pairs • Readset of a transaction R, RS(R): the set of (data item, value) pairs that R read • BSc: the content of the broadcast in the cycle that starts at time instance c (again, a set of (data item, value) pairs) • R may read items from different broadcast cycles, so the items in RS(R) may correspond to different database states
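A readset can be compared against the broadcast contents directly. This sketch, with hypothetical names and dictionaries standing in for sets of (data item, value) pairs, returns the cycles c with RS(R) ⊆ BSc; an empty result means the values R read do not all come from a single broadcast state.

```python
from typing import Dict, Hashable, List

State = Dict[Hashable, object]  # data item -> value


def is_subset_of_state(readset: State, state: State) -> bool:
    """RS(R) ⊆ BSc: every (item, value) pair read appears in the broadcast state."""
    return all(item in state and state[item] == value for item, value in readset.items())


def matching_cycles(readset: State, broadcasts: Dict[int, State]) -> List[int]:
    """Cycle start times c whose broadcast content BSc contains the whole readset."""
    return [c for c, state in sorted(broadcasts.items()) if is_subset_of_state(readset, state)]


bs = {0: {"x": 1, "y": 1}, 10: {"x": 2, "y": 1}}
print(matching_cycles({"x": 1, "y": 1}, bs))  # [0]
print(matching_cycles({"x": 1, "y": 2}, bs))  # []
```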
Semantic Coherency: Model. Variations: instead of a single client transaction, a set S of client transactions. Examples: • C3-site: all transactions of a client are serializable with all server transactions • C3-All
Relating Semantic and Temporal Coherency • Assumptions: • server schedules are serializable • only committed values are broadcast • If R is overlapping current, then it is C1 consistent
Relating Semantic and Temporal Coherency. (Currency Interval of an Item) CI(x, R), the currency interval of x in the readset of R, is [cb, ce), where cb is the commit time of the transaction that wrote the value of x read by R, and ce is the commit time of the transaction that next updated x, or infinity if there is none. • Note: • overlapping currency is similar to vintage transactions (server schedules are serializable): ce-vintage • semantic currency is similar to t-bound: if OV(R) = t0, then R is t0-bound
Coherency in Broadcast-Based Dissemination. Previous Work • Cache consistency (e.g., [Barbara&Imielinski, SIGMOD95], [Acharya et al, VLDB1996]) • Datacycle [Bowen et al, CACM92]: hardware for detecting changes • Extended for multiple servers [Banerjee&Li, JCI94] • Certification reports [Barbara, ICDCS97] • F-Matrix (for update (C2) consistency) [Shanmugasundaram, SIGMOD99] • SGT graph (for serializability) [Pitoura, ER-Workshop98], [Pitoura, DEXA-Workshop98], [Pitoura&Chrysanthis, ICDCS99] • Multiple versions [Pitoura&Chrysanthis, VLDB99], [Pitoura&Chrysanthis, IEEE TOC 2003]