Explore provenance maintenance, querying, and anomalies in distributed systems for better understanding and debugging. Project includes a data-centric perspective and examples of network routing anomalies.
Provenance Maintenance and Querying on Log-structured Databases
A data-centric platform for analyzing distributed systems
An Example Scenario
[Figure: Alice, nodes A, B, C, D, E, and foo.com, with routes r1 and r2. Alice asks: "Why did my route to foo.com change?!" Possible causes: Innocent reason? Software bugs? Malicious attack?]
• An example scenario: network routing
• The route to foo.com has suddenly changed
• Alice wants to understand the exact cause
Anomalies in Distributed Systems
• For network routing …
  • “YouTube blames Pakistan ISP for global outage” (Feb 2008)
  • “A Chinese ISP momentarily hijacks the Internet” (March 2010)
  • “Unknown fault darkens Australia’s Internet” (Feb 2012)
• … but also for other application scenarios
  • Distributed hash table: Eclipse attack
  • Cloud computing: misbehaving machines
  • Online multi-player gaming: cheating
• Goal: To understand and debug the behavior of distributed systems
A Data-centric Perspective
[Figure: the example network (Alice, nodes A, B, C, D, E, foo.com) annotated with state tuples: route(A, foo.com), route(A, B), route(A, D), link(A, B), link(A, D), route(B, foo.com), link(B, C), route(C, foo.com), link(C, foo.com), ……]
• We assume a general distributed system
  • The network consists of nodes (routers, middleboxes, ...)
  • The state of a node is a set of tuples (routes, config, ...)
• Idea: explanation as reasoning about state dependencies
Provenance as Explanations
[Figure: provenance graph over the example network, with tuples route(A, foo.com), link(A, B), route(B, foo.com), link(B, C), route(C, foo.com), link(C, foo.com), route(D, foo.com), link(D, E), route(E, foo.com), link(E, B)]
• Provenance for encoding state dependencies
  • Explains the derivation of tuples
  • Captures the dependencies between tuples as a graph
  • Explanation of a tuple is a tree rooted at the tuple
• Route r1 disappeared due to a link failure between B and C
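The idea of an explanation as a tree rooted at a tuple can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the class `ProvenanceGraph`, its methods, and the derivation rules used below (a route to foo.com derived from a link plus the next hop's route) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceGraph:
    """Hypothetical dependency graph: derived tuple -> tuples it depends on."""
    deps: dict = field(default_factory=dict)

    def derive(self, head, body):
        # Record that `head` was derived from the tuples in `body`.
        self.deps[head] = list(body)

    def explain(self, tup, depth=0):
        # Return the explanation tree rooted at `tup` as indented lines.
        lines = ["  " * depth + tup]
        for child in self.deps.get(tup, []):
            lines.extend(self.explain(child, depth + 1))
        return lines

    def base_tuples(self, tup):
        # Collect the leaves of the tree: tuples with no recorded derivation.
        children = self.deps.get(tup, [])
        if not children:
            return [tup]
        leaves = []
        for child in children:
            leaves.extend(self.base_tuples(child))
        return leaves


# Assumed derivations for the route through B and C (illustrative only).
graph = ProvenanceGraph()
graph.derive("route(C, foo.com)", ["link(C, foo.com)"])
graph.derive("route(B, foo.com)", ["link(B, C)", "route(C, foo.com)"])
graph.derive("route(A, foo.com)", ["link(A, B)", "route(B, foo.com)"])

tree = graph.explain("route(A, foo.com)")
leaves = graph.base_tuples("route(A, foo.com)")
print("\n".join(tree))
```

Under these assumed rules, `link(B, C)` appears among the leaves of the tree for `route(A, foo.com)`, which is how a failure of that link would show up as the cause of the route disappearing.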
Proposal: Provenance Maintenance
• In traditional database systems
  • Provenance deltas between adjacent system states
  • Logs of all non-deterministic events
  • Replay the events to reconstruct provenance
  • Problem: storage overhead
• In log-structured database systems
  • Only maintain logs of events, but not the latest system state
  • Natural for provenance support (with no additional cost)
  • Example: Hyder [CIDR 2011], built upon SSDs for web services
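The log-only approach can be illustrated with a minimal sketch, assuming events are simple insertions and deletions of tuples. The names `Event`, `EventLog`, and `state_at` are hypothetical and do not come from Hyder or any other real system; the point is only that any past state can be rebuilt by replaying a prefix of the log.

```python
from typing import NamedTuple


class Event(NamedTuple):
    op: str   # "insert" or "delete"
    tup: str  # the tuple affected, e.g. "link(B, C)"


class EventLog:
    """Hypothetical append-only log; no materialized latest state is kept."""

    def __init__(self):
        self._events = []

    def append(self, op, tup):
        if op not in ("insert", "delete"):
            raise ValueError(f"unknown operation: {op}")
        self._events.append(Event(op, tup))
        return len(self._events)  # number of events logged so far

    def state_at(self, n):
        # Replay the first n events to reconstruct the state at that point.
        state = set()
        for ev in self._events[:n]:
            if ev.op == "insert":
                state.add(ev.tup)
            else:
                state.discard(ev.tup)
        return state


log = EventLog()
log.append("insert", "link(A, B)")
log.append("insert", "link(B, C)")
log.append("insert", "link(C, foo.com)")
log.append("delete", "link(B, C)")  # the link failure

before = log.state_at(3)
after = log.state_at(4)
print(sorted(before - after))
```

Comparing the states before and after the fourth event isolates the tuple that changed, without any storage beyond the log itself.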
Proposal: Provenance Querying
[Figure: log entries route(A,B,3), route(A,B,7), route(A,B,5)]
• An efficient data structure for provenance querying
  • Backward pointers to the most recent update to the same state
  • Chained pointers for reconstructing a specific state
• Optimization of provenance querying
  • Naïve approach: reconstruct the complete provenance graph
  • Optimization: only reconstruct the provenance as necessary
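A minimal sketch of the backward-pointer idea follows. It assumes that the third field of route(A,B,·) is a value that is updated over time and that the updates arrive in the order 3, 7, 5; both are guesses about the figure. `Entry`, `PointerLog`, `value_at`, and `history` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    key: tuple
    value: int
    prev: Optional[int]  # log position of the previous update to the same key


class PointerLog:
    """Hypothetical log in which each entry points back to its predecessor."""

    def __init__(self):
        self.entries = []
        self.latest = {}  # key -> position of its most recent update

    def append(self, key, value):
        pos = len(self.entries)
        self.entries.append(Entry(key, value, self.latest.get(key)))
        self.latest[key] = pos
        return pos

    def value_at(self, key, pos):
        # Follow the chain backward to the newest update at or before `pos`.
        cur = self.latest.get(key)
        while cur is not None and cur > pos:
            cur = self.entries[cur].prev
        return None if cur is None else self.entries[cur].value

    def history(self, key):
        # All values of `key`, newest first, touching only that key's entries.
        values = []
        cur = self.latest.get(key)
        while cur is not None:
            values.append(self.entries[cur].value)
            cur = self.entries[cur].prev
        return values


plog = PointerLog()
route_ab = ("route", "A", "B")
plog.append(route_ab, 3)            # position 0
plog.append(("link", "A", "D"), 1)  # position 1, unrelated state
plog.append(route_ab, 7)            # position 2
plog.append(route_ab, 5)            # position 3

print(plog.value_at(route_ab, 1), plog.history(route_ab))
```

Because a query walks only the chain for the requested key, unrelated log entries are never visited, which is in the spirit of reconstructing provenance only as necessary rather than rebuilding the complete graph.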
Project Arrangement
• Project plan
  • Develop the provenance system on log-structured databases
  • Evaluate the provenance system against several applications
    • Performance impact on the primary system
    • Performance (latency) of provenance queries
• Budget
  • Time frame
    • System development: 6 months
    • Performance evaluation: 3 months
  • Cost: $75,000