DCV: A Causality Detection Approach for Large-scale Dynamic Collaboration Environments • Ning Gu, Qi-Wei Zhang, Jiang-Ming Yang and Wei Ye, Fudan University • Jiang-Ming Yang, Microsoft Research Asia
Agenda • Introduction • Direct Causal Vector Timestamp (DCV) • Causality Detection Algorithms • Discussion • Future Work
Group Editors • Enable a group of users to view and edit the same document simultaneously from geographically dispersed sites connected by communication networks
Wiki • A wiki is a website where every visitor can add new pages, remove existing pages, or otherwise edit the content of existing pages • Wikis are becoming increasingly popular as a novel and convenient collaboration medium • Examples: Wikipedia, WikiNews, WikiTravel, etc. • Adapting existing single-user wiki page editors into fully replicated group editors
Realtime group editing in Wiki: The Problems • Collaboration environments in wikis are typically large-scale and dynamic • Large number of participants • Highly dynamic • Unreliable
Existing solutions: Vector timestamp • Traditional vector logical clock timestamp • Each item corresponds to one collaboration participant and records the number of operations generated by that participant that causally precede O • The causal relationship between any two operations can be easily determined • The size of the vector logical clock grows linearly with the number of participants
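As a minimal sketch of how a traditional vector timestamp answers causality questions, the following Python assumes the convention stated on this slide: each entry counts the operations from one site that causally precede the stamped operation. The `StampedOp` class, the 0-based site index and the concrete clock values are illustrative assumptions; the operation names are borrowed from the deck's three-site example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StampedOp:
    site: int               # 0-based index of the generating site (assumed layout)
    clock: Tuple[int, ...]  # clock[i] = number of operations from site i that causally precede this one


def precedes(a: StampedOp, b: StampedOp) -> bool:
    """a -> b iff b has already counted a among the operations from a's site."""
    return b.clock[a.site] > a.clock[a.site]


def concurrent(a: StampedOp, b: StampedOp) -> bool:
    """Two distinct operations are concurrent when neither precedes the other."""
    return a != b and not precedes(a, b) and not precedes(b, a)


# Hypothetical timestamps for the three-site example used later in the deck.
o11 = StampedOp(0, (0, 0, 0))
o21 = StampedOp(1, (0, 0, 0))
o31 = StampedOp(2, (0, 0, 0))
o12 = StampedOp(0, (1, 1, 1))
o22 = StampedOp(1, (2, 1, 1))
o32 = StampedOp(2, (0, 0, 1))
```

Every timestamp carries one entry per participant, whether or not that participant ever wrote, which is the linear growth the last bullet refers to.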
Existing solutions: Dynamic timestamp • Use associative vectors indexed by participant identifier, allowing the system to add new timestamp items or discard old ones dynamically during the collaboration session • Create vector items only for participants who have written • If participants leave the collaboration session midway, watch their timestamp items and remove them once they have become insignificant
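A rough sketch of the associative-vector idea, assuming a plain Python dictionary keyed by participant identifier in which a missing key means zero. The function names `tick` and `merge` and the participant names are hypothetical, and the rule for discarding stale items is left out because the slide does not specify when an item becomes insignificant.

```python
from typing import Dict

Clock = Dict[str, int]  # participant identifier -> number of operations seen from them


def tick(clock: Clock, me: str) -> Clock:
    """Return a copy of the clock after `me` generates one more operation."""
    updated = dict(clock)
    updated[me] = updated.get(me, 0) + 1
    return updated


def merge(a: Clock, b: Clock) -> Clock:
    """Entry-wise maximum; entries appear only for participants who have written."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


alice_clock = tick({}, "alice")                 # only "alice" gets an entry
merged = merge(alice_clock, {"bob": 2})         # "bob" is added on first contact
```

A reader who never writes never receives an entry, so the timestamp size follows the number of writers rather than the number of visitors.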
Existing solutions: Vector compression • Related work • Sun-Cai approach, NICE approach • Single point of failure; adverse impact on conflict handling • SOCT4, TIBOT • Cooperating sites must be well connected, and the communication channels among them must be stable and reliable • Also have an adverse impact on conflict handling
Causal Vector • Causal Relation (->) • Given two operations Oa and Ob, Oa->Ob iff one of the following holds: • Oa and Ob are generated at the same site, and Oa was generated before Ob • Oa and Ob are generated at different sites i and j, and Oa was executed at site j before Ob was generated • There exists an operation Ox such that Oa->Ox and Ox->Ob
Causality Preservation • o11, o21 and o31 are concurrent with each other. • o12 is causally dependent on three operations o11, o21 and o31. • o22 is causally dependent on four operations o11, o21, o31 and o12. • o32 is only causally dependent on o31
Direct Causal Relation • Given two operations Oa and Ob, Oa directly causally precedes Ob iff: • Oa->Ob • and there exists no operation Ox satisfying Oa->Ox and Ox->Ob
Causality Preservation • o12 is directly causally dependent on three operations: o11, o21 and o31 • o22 is directly causally dependent on o12 • o32 is directly causally dependent on o31
Direct Causal Vector (DCV) • Given an operation O, let {O1, O2, …, Ok} be the set of operations that directly causally precede O, where each Oi is identified by the pair (si, ni) of its generating site and its sequence number at that site; then DCV(O) = [(s1, n1), (s2, n2), …, (sk, nk)] • The direct causal vectors of o11, o21, o31, o12, o22 and o32 are [ ], [ ], [ ], [(1, 1), (2, 1), (3, 1)], [(1, 2)] and [(3, 1)] respectively • Given two operations, their direct causal relationship can be determined directly from their direct causal vectors • Given two operations Oa and Ob, Oa directly causally precedes Ob iff there exists an item for Oa in DCV(Ob)
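The definition can be sketched in Python as follows. This is an illustration under assumptions, not the authors' implementation: operations are identified by (site, sequence number) pairs, the table `DCV` and the helper names are hypothetical, and o32's vector is taken to be [(3, 1)], which is what the statement that o32 depends directly on o31 implies. The transitive check shows that the full causal relation is still recoverable by following direct entries backwards.

```python
from typing import Dict, FrozenSet, Tuple

OpId = Tuple[int, int]  # (generating site, sequence number at that site)

# Direct causal vectors of the six operations in the running example.
DCV: Dict[OpId, FrozenSet[OpId]] = {
    (1, 1): frozenset(),
    (2, 1): frozenset(),
    (3, 1): frozenset(),
    (1, 2): frozenset({(1, 1), (2, 1), (3, 1)}),
    (2, 2): frozenset({(1, 2)}),
    (3, 2): frozenset({(3, 1)}),
}


def directly_precedes(a: OpId, b: OpId, dcv: Dict[OpId, FrozenSet[OpId]] = DCV) -> bool:
    """a directly causally precedes b iff a has an item in DCV(b)."""
    return a in dcv[b]


def precedes(a: OpId, b: OpId, dcv: Dict[OpId, FrozenSet[OpId]] = DCV) -> bool:
    """a -> b iff a is reachable from b by following DCV items backwards."""
    stack, seen = list(dcv[b]), set()
    while stack:
        x = stack.pop()
        if x == a:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(dcv[x])
    return False
```

Note that o22 carries a single item even though four operations causally precede it; the other three are implied through o12.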
Agenda • Introduction • Direct Causal Vector Timestamp (DCV) • Causality Detection Algorithms • Causality Preservation • Concurrent Separation • Causality Cache • Discussion • Future Work
Causality Preservation • Problem description: • Given an unexecuted remote operation, how can we determine whether all the operations causally preceding it have already been executed? • Solution: • Given a remote operation O, suppose all previously executed operations were executed in causal order. Then O is causally ready iff all the operations that directly causally precede O have already been executed.
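One way to read this rule is as a buffering loop: an arriving operation waits until every item in its direct causal vector has been executed. The sketch below is an assumption-laden illustration (the names `causally_ready` and `deliver`, the (site, sequence number) identifiers and the arrival list are all hypothetical), using the operations from the running example.

```python
from typing import List, Set, Tuple

OpId = Tuple[int, int]  # (generating site, sequence number at that site)


def causally_ready(dcv: List[OpId], executed: Set[OpId]) -> bool:
    """Ready iff every directly preceding operation has already been executed."""
    return all(dep in executed for dep in dcv)


def deliver(arrivals: List[Tuple[OpId, List[OpId]]]) -> List[OpId]:
    """Execute operations in arrival order, buffering any that are not yet ready."""
    executed: Set[OpId] = set()
    order: List[OpId] = []
    pending: List[Tuple[OpId, List[OpId]]] = []
    for arrival in arrivals:
        pending.append(arrival)
        progress = True
        while progress:  # one execution may unblock other buffered operations
            progress = False
            for item in list(pending):
                op, dcv = item
                if causally_ready(dcv, executed):
                    pending.remove(item)
                    executed.add(op)
                    order.append(op)
                    progress = True
    return order


# o22 and o12 arrive before the operations they depend on.
arrivals = [
    ((2, 2), [(1, 2)]),
    ((1, 2), [(1, 1), (2, 1), (3, 1)]),
    ((1, 1), []),
    ((2, 1), []),
    ((3, 1), []),
]
execution_order = deliver(arrivals)
```

Checking only the direct items is enough because each of them was itself executed only after its own direct predecessors, so the indirect ones are covered by induction.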
Concurrent Separation • Problem description: • Given an unexecuted remote operation Or that is causally ready, how can we separate from the operation history all the operations that are concurrent with it? • Solution: • For two history operations Oa and Ob with Oa->Ob, do not check Oa until Ob has been proved concurrent with Or • In general, do not check a history operation until all the history operations that causally depend on it have been proved concurrent with Or • When a history operation Oh is checked, Oh is concurrent with Or iff Oh does not directly causally precede Or • This suffices because, at that point, there is no operation Ox satisfying Oh->Ox->Or
Concurrent Separation • Suppose a remote operation O41 is received at site 3 shortly after the execution of O22 on that site. DCV(O41) = [(1, 1), (3, 2)] • O32 and O22 are checked first, because no other history operation causally depends on them. O22 is proved to be concurrent with O41 • O12 is checked in the second round and is proved to be concurrent with O41 • Now O11 and O21 are ready to be checked, and O21 is determined to be concurrent with O41
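The walk-through above can be reproduced with a short sketch. It is one possible reading of the procedure, not the authors' code: it assumes the history keeps the direct causal vector of every executed operation, identifies operations by (site, sequence number), takes DCV(O32) to be [(3, 1)], and uses a hypothetical function name `concurrent_with`.

```python
from typing import Dict, List, Set, Tuple

OpId = Tuple[int, int]  # (generating site, sequence number at that site)


def concurrent_with(history: Dict[OpId, List[OpId]], remote_dcv: List[OpId]) -> Set[OpId]:
    """Return the history operations concurrent with a causally ready remote operation."""
    # For each history operation, count the history operations that depend on it directly.
    waiting = {op: 0 for op in history}
    for deps in history.values():
        for dep in deps:
            waiting[dep] += 1
    ready = [op for op, count in waiting.items() if count == 0]
    direct = set(remote_dcv)
    result: Set[OpId] = set()
    while ready:
        op = ready.pop()
        if op in direct:
            continue  # op precedes the remote operation, so its predecessors are never checked
        result.add(op)  # all of op's dependents were concurrent and op is not a direct item
        for dep in history[op]:
            waiting[dep] -= 1
            if waiting[dep] == 0:
                ready.append(dep)
    return result


# History at site 3 when O41 arrives, with each operation's direct causal vector.
history = {
    (1, 1): [], (2, 1): [], (3, 1): [],
    (1, 2): [(1, 1), (2, 1), (3, 1)],
    (2, 2): [(1, 2)],
    (3, 2): [(3, 1)],
}
concurrent_ops = concurrent_with(history, [(1, 1), (3, 2)])
```

O31 is never examined: one of its dependents, O32, precedes O41, so O31 must precede O41 as well and the search stops there.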
Causality Cache • Problem description: • How can we determine the causal relationship between two arbitrary operations in the operation history? • Solution: • Each time before a remote operation Or is executed, all earlier executed operations that are concurrent with it are separated out into a set SCr. Cache this result for future use • If more than one operation in SCr was generated at the same cooperating site, keep only the earliest generated one
Causality Cache • Suppose a remote operation O41 is received at site 3 shortly after the execution of O22 on that site. DCV(O41) = [(1, 1), (3, 2)] • Three operations are concurrent with O41: O12, O21 and O22 • Grouped by generating site: Site 1: O12; Site 2: O21, O22 • Keeping the earliest operation per site gives Cache = {O12, O21} • O21 ∈ Cache => O21 || O41 => O22 || O41, because O22 was generated after O21 at the same site
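A minimal sketch of the cache, assuming it is stored per remote operation as a map from site to the sequence number of the earliest concurrent operation from that site. The names `build_cache` and `is_concurrent` are hypothetical, and the lookup only covers history operations executed before the remote operation in question.

```python
from typing import Dict, Iterable, Tuple

OpId = Tuple[int, int]  # (generating site, sequence number at that site)


def build_cache(concurrent_ops: Iterable[OpId]) -> Dict[int, int]:
    """Keep, for each site, only the earliest operation concurrent with the remote one."""
    cache: Dict[int, int] = {}
    for site, seq in concurrent_ops:
        if site not in cache or seq < cache[site]:
            cache[site] = seq
    return cache


def is_concurrent(earlier: OpId, cache: Dict[int, int]) -> bool:
    """An earlier-executed operation is concurrent iff it is at or after its site's cached entry."""
    site, seq = earlier
    return site in cache and seq >= cache[site]


# Cache built for O41 from its concurrent set {O12, O21, O22}.
cache_o41 = build_cache([(1, 2), (2, 1), (2, 2)])
```

Operations from a site that are older than the cached entry, or from a site with no entry, causally precede the remote operation, so one integer per site is enough to classify the whole earlier history.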
Discussion • Features of the DCV approach • The size of the direct causal vector timestamp of an operation O approximates the number of participants who were actively editing the shared object shortly before O was generated • No item is pre-allocated in the direct causal vector timestamp for each potential participant • After a participant leaves the collaboration session, the direct causal vector timestamps of later operations shrink automatically • The time and storage complexity of our algorithms depend only linearly on the number of participants currently active in editing • The DCV approach is much more scalable: it does not rely on a stable network and places no constraint on users' collaboration mode
Future Work • Compression of the direct causal vector in highly active large-scale collaboration environments • Experimental evaluation of its effects • ……