
Trustworthy Distributed Search and Retrieval over the Internet

Trustworthy Distributed Search and Retrieval over the Internet. Yung-Ting Chuang Electrical and Computer Engineering University of California, Santa Barbara May 3, 2013 Committee Members: Professor P. Michael Melliar-Smith, Chair Professor Louise E. Moser Professor Timothy P. Sherwood


Presentation Transcript


  1. Trustworthy Distributed Search and Retrieval over the Internet Yung-Ting Chuang Electrical and Computer Engineering University of California, Santa Barbara May 3, 2013 Committee Members: Professor P. Michael Melliar-Smith, Chair Professor Louise E. Moser Professor Timothy P. Sherwood Professor Volkan Rodoplu Yung-Ting Chuang's Ph.D. Defense

  2. Outline • Motivation • Trustworthy Distributed Search and Retrieval • Protecting against Malicious Attacks in iTrust • Membership Management for iTrust • Statistical Inference and Dynamic Adaptation for iTrust • Conclusions and Future Work

  3. Motivation • Information is accessed over the Internet using centralized search engines • Benefits: efficient, robust, and scalable • Drawbacks: depends on administrators remaining benign • Thus, we present a decentralized and distributed search and retrieval system • Benefits: prevents censorship and filtering of information • Drawbacks: • Needs more network bandwidth • Difficult to infer membership size and malicious nodes

  4. Trustworthy Distributed Search and Retrieval • Related Work • Design of iTrust • Implementation of iTrust • User Interface of iTrust • Performance Evaluation of iTrust • Summary

  5. 1. Related Work • Survey by Mischke and Risson on distributed search: • Structured – Requires nodes to be organized in an overlay network • Distributed Hash Table (DHT), Ring, Tree, Skip Lists • Unstructured – Typically gossip-based, and uses randomization • Flooding / Broadcast => Gnutella • Random walk and data replication => Sarshar, GIA, Lv • Key-based routing => Freenet • Direct routing => Pub-2-Sub • Square root function => Cohen, Zhong, Ferreira • P2P systems concerned with security, privacy, and trust • Quasar – Uses a structured overlay and protects the user’s sensitive information • OneSwarm – Uses a combination of trusted and untrusted peers and protects the privacy of the users • GOSSPLE – Fully decentralized system for social acquaintances using a gossip protocol

  6. 2. Design of iTrust a) Distribution of Metadata (figure; label: Source of Information)

  7. 2. Design of iTrust b) Distribution of a Request (figure; labels: Source of Information, Requester of Information, Request Encounters Metadata)

  8. 2. Design of iTrust c) Retrieval of Information (figure; labels: Source of Information, Requester of Information, Request Matched)

  9. 3. Implementation of the iTrust System

  10. 4. User Interface of iTrust

  11. 4. User Interface of iTrust

  12. 5. Performance Evaluation of iTrust a) Analytical Model • Notation • Membership contains n participating nodes • x is the proportion of participating nodes that are operational • Metadata are distributed to m nodes • Requests are distributed to r nodes • k nodes report matches to a requesting node (for the same metadata and the same request)

  13. 5. Performance Evaluation of iTrust a) Analytical Model • Probability of k matches is: [equation not preserved in transcript] • Probability of one or more matches is: [equation not preserved in transcript]
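
The equations on slide 13 did not survive in this transcript. A minimal sketch, assuming the notation on slide 12 implies a hypergeometric model (the r request nodes are drawn without replacement from n nodes, of which m·x hold the metadata and are operational). The function names and the rounding of m·x to an integer are assumptions for illustration, not taken from the slides.

```python
from math import comb


def match_probability_k(n, m, r, x, k):
    """Probability that exactly k of the r request nodes can report a match.

    ASSUMPTION: hypergeometric model; round(m * x) nodes hold the metadata
    and are operational.
    """
    holders = round(m * x)
    if k < 0 or k > min(holders, r):
        return 0.0
    # comb() returns 0 when r - k exceeds n - holders, so no extra guard is needed.
    return comb(holders, k) * comb(n - holders, r - k) / comb(n, r)


def match_probability(n, m, r, x):
    """Probability of one or more matches: 1 - P(k = 0)."""
    return 1.0 - match_probability_k(n, m, r, x, 0)


if __name__ == "__main__":
    for x in (1.0, 0.7, 0.4, 0.2):
        print(x, round(match_probability(1000, 60, 60, x), 4))
```

Under this model the match probability falls as x falls, which is the effect the later chapters try to detect and compensate for.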

  14. 5. Performance Evaluation of iTrust a) Analytical Model

  15. 5. Performance Evaluation of iTrust b) Analysis vs. Emulation

  16. 6. Summary • Problem we are trying to solve: • Centralized search engines can be tampered with to bias the results, or to conceal or censor information • Our solutions and contributions: • We have implemented iTrust, which is a decentralized distributed search and retrieval system with no centralized mechanisms and no centralized control • We have demonstrated that the match probability is high, even if some participating nodes are subverted or non-operational

  17. Protecting against Malicious Attacks in iTrust • Background • Related Work • Foundations • Detecting Malicious Attacks • Defending against Malicious Attacks • Performance Evaluation • Summary

  18. 1. Background • Potential attacks: • Nodes do not match requests • Nodes do not return responses to the requester • Effect of such attacks: • Probability of a match is decreased • Existing work that addresses attacks: • Place nodes on a blacklist (Jesi) • Maintain a reputation or trust score (Condie) • Our solution to such attacks is: • Estimate the proportion of malicious nodes • Increase the number of nodes to which requests are distributed in order to restore the match probability

  19. 2. Related Work • Work related to our detection algorithm: • Exponential Weighted Moving Average (EWMA) • Roberts et al. – For discovering anomalies and issuing alerts • Chi-squared test • Goonatilake – For detecting intrusions • Press et al. – For balancing weights of buckets • Belen and Heckert – For determining similarity between two models • EWMA and Chi-squared test • Ye and Chen – For anomaly detection and intrusion detection • Work related to our defensive adaptation algorithm: • Morselli – Uses a feedback mechanism to adjust the replicas to improve search results • Leng – Uses a maintainer to determine, update, and eliminate the data replicas

  20. 3. Foundations a) Normalization • We cannot use requests that return k=0 responses • Because there might be no metadata to match • Probability of k matches is negligibly small when k is large • Thus, we exclude requests for k=0 and for k > K • Our normalization equation is: [equation not preserved in transcript]
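
The normalization equation on slide 20 is missing from this transcript. One natural reading, shown here as an assumption rather than the author's formula, is to drop the k=0 and k>K entries and rescale the remaining probabilities so they sum to 1. The function name `normalize` is hypothetical.

```python
def normalize(probabilities, K):
    """Renormalize probabilities over k = 1..K.

    probabilities[k] is the probability of exactly k matches, for k = 0, 1, 2, ...
    ASSUMPTION: normalization means dividing by the sum over k = 1..K.
    """
    kept = probabilities[1:K + 1]  # exclude k = 0 and k > K
    total = sum(kept)
    if total <= 0:
        raise ValueError("no probability mass in k = 1..K")
    return [p / total for p in kept]


if __name__ == "__main__":
    print(normalize([0.5, 0.25, 0.125, 0.125], K=2))
```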

  21. 3. Foundations b) Exponential Weighted Moving Average • The EWMA method is computed as follows: [equation not preserved in transcript] where c is the weighting factor for the EWMA method
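
The EWMA formula on slide 21 is missing from this transcript. A minimal sketch of the standard EWMA recurrence, assuming c weights the newest observation and (1 - c) weights the running average; whether the slide places c on the new or the old term is not recoverable here. The function names are hypothetical.

```python
def ewma_update(previous, observation, c):
    """One EWMA step: new = c * observation + (1 - c) * previous.

    ASSUMPTION: c (0 < c <= 1) is the weight on the newest observation.
    """
    return c * observation + (1.0 - c) * previous


def ewma_series(observations, c, initial=0.0):
    """Return the running EWMA after each observation."""
    current = initial
    history = []
    for value in observations:
        current = ewma_update(current, value, c)
        history.append(current)
    return history


if __name__ == "__main__":
    print(ewma_series([1, 1, 1], c=0.5))
```

A larger c makes the estimate react faster to recent responses; a smaller c smooths out noise at the cost of slower reaction.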

  22. 3. Foundations c) Chi-Squared vs. Modified Chi-Squared • Pearson’s chi-squared statistic: [equation not preserved in transcript] • Pearson’s modified chi-squared statistic: [equation not preserved in transcript] where: • o_k: the actual number of observations that fall into the kth bucket • e_k: the expected number of observations for the kth bucket • K: the number of buckets into which the observations fall
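
Both statistics on slide 22 are missing from this transcript. The sketch below gives the standard Pearson statistic, the sum over buckets of (o_k - e_k)^2 / e_k. For the modified statistic it uses a symmetric denominator (o_k + e_k), the form used for comparing two binned data sets; that choice is an assumption, since the slide's own definition is not recoverable.

```python
def pearson_chi_squared(observed, expected):
    """Standard Pearson statistic: sum of (o_k - e_k)^2 / e_k over the K buckets."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def modified_chi_squared(observed, expected):
    """ASSUMPTION: symmetric variant, sum of (o_k - e_k)^2 / (o_k + e_k).

    Buckets where both values are zero contribute nothing and are skipped.
    """
    return sum(
        (o - e) ** 2 / (o + e)
        for o, e in zip(observed, expected)
        if o + e > 0
    )


if __name__ == "__main__":
    obs, exp = [10, 20], [20, 10]
    print(pearson_chi_squared(obs, exp), modified_chi_squared(obs, exp))
```

The symmetric form stays finite when an expected bucket value is very small, which matters for the tail buckets near k = K.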

  23. 3. Foundations d) Chi-Squared vs. Modified Chi-Squared

  24. 4. Detecting Malicious Attacks a) Detection Algorithm • Collects responses to its requests using the EWMA method • Normalizes the empirical probabilities • Uses the modified chi-squared test to compare the empirical probabilities against the analytical probabilities for x=1.0, 0.7, 0.4, and 0.2 • Chooses the x with the smallest chi-squared value as the estimate x’
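
The detection steps above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the analytical distribution is taken to be hypergeometric, the modified chi-squared statistic is taken to use the symmetric denominator (o_k + e_k), and the input is assumed to be the EWMA-smoothed frequency of k = 1..K responses. All function names are hypothetical.

```python
from math import comb

CANDIDATE_X = (1.0, 0.7, 0.4, 0.2)  # candidate values listed on the slide


def analytic_distribution(n, m, r, x, K):
    """Normalized probability of k = 1..K matches (ASSUMPTION: hypergeometric model)."""
    holders = round(m * x)
    probs = [
        comb(holders, k) * comb(n - holders, r - k) / comb(n, r)
        for k in range(1, K + 1)
    ]
    total = sum(probs)
    return [p / total for p in probs]


def modified_chi_squared(observed, expected):
    """ASSUMPTION: symmetric variant, sum of (o - e)^2 / (o + e)."""
    return sum(
        (o - e) ** 2 / (o + e)
        for o, e in zip(observed, expected)
        if o + e > 0
    )


def estimate_x(empirical, n, m, r, candidates=CANDIDATE_X):
    """Return the candidate x whose analytical distribution best fits the data.

    empirical[i] is the smoothed frequency of requests with i + 1 responses.
    """
    K = len(empirical)
    total = sum(empirical)
    normalized = [p / total for p in empirical]
    scores = {
        x: modified_chi_squared(normalized, analytic_distribution(n, m, r, x, K))
        for x in candidates
    }
    return min(scores, key=scores.get)


if __name__ == "__main__":
    sample = analytic_distribution(1000, 60, 60, 0.4, 6)
    print(estimate_x(sample, 1000, 60, 60))
```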

  25. 4. Detecting Malicious Attacks b) Example

  26. 5. Defending against Malicious Attacks a) Defensive Adaptation Algorithm • Step 1: Initialize r ← 0 • Step 2: Calculate yo based on current r with given n, m, and x • Step 3: Determine whether yo is greater than the expected match probability • If not, increase r by 1 and go back to step 2 • If so, return r
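
The adaptation loop above can be sketched as follows, assuming the match probability (yo on the slide) follows the hypergeometric model implied by the notation on slide 12. The function names, the rounding of m·x, and the error raised when the target cannot be reached are assumptions for illustration.

```python
from math import comb


def match_probability(n, m, r, x):
    """Probability of one or more matches (ASSUMPTION: hypergeometric model)."""
    holders = round(m * x)
    return 1.0 - comb(n - holders, r) / comb(n, r)


def required_requests(n, m, x, target):
    """Smallest r whose match probability is greater than the target."""
    r = 0  # step 1: initialize r
    while r <= n:
        y = match_probability(n, m, r, x)  # step 2: probability for current r
        if y > target:  # step 3: compare with the expected match probability
            return r
        r += 1  # otherwise distribute the request to one more node
    raise ValueError("target match probability cannot be reached")


if __name__ == "__main__":
    for x in (1.0, 0.7, 0.4, 0.2):
        print(x, required_requests(1000, 60, x, 0.9))
```

A linear scan mirrors the slide; because the match probability is non-decreasing in r, a binary search would also work for large n.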

  27. 5. Defending against Malicious Attacks b) Example

  28. 6. Performance Evaluation a) Varying the number of nodes

  29. 6. Performance Evaluation

  30. 7. Summary • Problem we are trying to solve in this chapter: • Absence of centralized control makes it difficult to determine the proportion of non-operational nodes in the network • Our solution and contributions: • A node can estimate the proportion of non-operational nodes in the network based on the responses to its requests • A node calculates the number of nodes to which the requests are distributed to maintain a high match probability • A node infers useful but unobservable information about the network as a whole by observing aspects of the behaviors of individual nodes that are visible to it

  31. Membership Management for iTrust • Background • Related Work • iTrust Membership Protocols • Foundations • Performance Evaluation • Extended Scenario • Summary

  32. 1. Background • Churn – Nodes joining and leaving the membership • Challenging tasks: • Estimating membership and membership size • Estimating churn • Existing work that addresses churn: • Passive Monitoring (Sen et al., Gummadi et al.) • Active Probing (Chu et al., Liang, Bhagwan et al.) • Gossiping (Binzenhöfer, Pruteanu et al.) • Our approach to address churn: • Nodes don’t predict churn characteristics in advance • Each node maintains its local view of the membership and uses statistical inference to update its view

  33. 2. Related Work • Work related to membership management: • Zage – Biases neighbor selections toward beneficial nodes • SCAMP – Nodes discover joining and leaving nodes through gossiping • CYCLON – Nodes maintain a small and fixed-size neighbor list, with a shuffling protocol for large networks • Newscast – Each node periodically selects a peer to exchange and update its membership list • Work related to churn: • Binzenhöfer and Pruteanu et al. – Estimate the churn rate through gossiping • Stutzbach & Rejaie – Study churn characteristics and highlight problems that cause biased peer selections • Paulo et al. – Maintains a dynamic mapping of flows according to the current set of neighbors • Liu – Presents an age-based membership protocol with a conservative neighbor maintenance scheme under churn • Horowitz et al. – Relies on the departure and arrival of nodes to estimate the current network size, without requiring any additional communication

  34. 3. iTrust Membership Protocols a) Joining the Membership (figure; labels: Joining Node, Bootstrapping Node)

  35. 3. iTrust Membership Protocols b) Leaving the Membership (figure; label: Leaving Node)

  36. 3. iTrust Membership Protocols c) Distributing Metadata (figure; labels: Source Node, Discover New Node, Discover Leaving Node)

  37. 3. iTrust Membership Protocols d) Distributing Requests (figure; labels: Requesting Node, Redistribute Metadata, Discover Leaving Node, Discover New Node)

  38. 4. Foundations a) Metrics • LND: Leaves Not Detected • JND: Joins Not Detected • MA: Membership Accuracy • MP: Match Probability for a request • RT: Response Time required for a request • MC: Message Cost per time unit

  39. 5. Performance Evaluation a) Retry R Membership Protocol • Motivation: • When a node distributes a request message to R nodes, some of those nodes might have left the membership, so it might receive fewer than R responses • Solution: • We allow a node to keep sending its message to additional nodes until it receives exactly R responses • Our input variables for the Retry R Membership Protocol: • Try: The number of times that a requesting node sends its request message in an attempt to receive R responses • TryMax: The maximum Try value
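
A minimal sketch of the Retry R idea described above. The `is_alive` callback stands in for sending a request and waiting for a response; it, the function name `retry_r`, and the choice to contact only as many new nodes as responses are still missing are assumptions for illustration.

```python
import random


def retry_r(view, R, try_max, is_alive, rng=random):
    """Contact nodes from the local view until R responses arrive or TryMax is hit.

    Returns (responders, departed, tries). Nodes that do not respond are
    treated as having left the membership.
    """
    candidates = list(view)
    rng.shuffle(candidates)  # requests go to randomly chosen members
    responders, departed = [], []
    tries = 0
    while len(responders) < R and tries < try_max and candidates:
        tries += 1
        needed = R - len(responders)
        batch, candidates = candidates[:needed], candidates[needed:]
        for node in batch:
            (responders if is_alive(node) else departed).append(node)
    return responders, departed, tries


if __name__ == "__main__":
    alive = lambda node: node % 3 != 0  # hypothetical: every third node has left
    got, gone, tries = retry_r(range(100), R=10, try_max=3, is_alive=alive)
    print(len(got), len(gone), tries)
```

Each retry also reveals departed nodes, so the requester can prune its local view as a side effect of ordinary requests.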

  40. 5. Performance Evaluation b) Adaptive RR Membership Protocol • Our Churn Estimator (CE) is: [equation not preserved in transcript] where • Left: Number of nodes that were detected as non-operational • Joined: Number of nodes that were discovered to have joined • NumNodes: Number of requests that a requesting node sent • The Requesting Rate (RR) is: if CE > RRMin / RRMax then RR ← RRMax × CE else RR ← RRMin
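
The churn estimator formula is missing from this transcript. The sketch below assumes CE = (Left + Joined) / NumNodes, which is consistent with the pseudocode on slide 42, where CE is (left + joined) divided by the number of requests sent. The requesting-rate rule follows the slide directly. Function names are hypothetical.

```python
def churn_estimator(left, joined, num_nodes):
    """ASSUMPTION: CE = (Left + Joined) / NumNodes."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return (left + joined) / num_nodes


def requesting_rate(ce, rr_min, rr_max):
    """RR = RRMax * CE when CE > RRMin / RRMax, otherwise RRMin."""
    if ce > rr_min / rr_max:
        return rr_max * ce
    return rr_min


if __name__ == "__main__":
    ce = churn_estimator(left=3, joined=2, num_nodes=50)
    print(ce, requesting_rate(ce, rr_min=1, rr_max=100))
```

The threshold RRMin / RRMax is the point where RRMax × CE equals RRMin, so the rate never drops below RRMin and grows in proportion to the observed churn.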

  41. 5. Performance Evaluation c) Message Cost vs. Membership Accuracy

  42. 5. Performance Evaluation d) Combined Adaptive Membership
      • Start infinite loop
        • if current time reaches nextTime
          • while Try <= 2 and resRec < R
            • make request to (R - resRec) nodes and get responses array
            • determine left, joined, N, responded from responses array
            • resRec = resRec + responded
            • Try = Try + 1
          • CE = (left + joined) / (R + R - resRec)
          • if CE > 1 / RRMax
            • RR = RRMax × CE
          • else
            • RR = 1
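
One pass of the combined loop above can be sketched as follows. The `make_request` callback is hypothetical: it stands in for sending the request to the given number of nodes and returns (responded, left, joined). The sketch divides by the number of requests actually sent, which is an assumption about what the slide's denominator R + R - resRec is meant to count; scheduling of nextTime is omitted.

```python
def combined_adaptive_round(R, rr_max, make_request, max_tries=2):
    """Run one request round and return (CE, RR).

    make_request(count) -> (responded, left, joined) is a hypothetical callback.
    """
    res_rec = left = joined = sent = 0
    attempt = 1
    while attempt <= max_tries and res_rec < R:
        count = R - res_rec  # only ask for the responses still missing
        responded, newly_left, newly_joined = make_request(count)
        sent += count
        res_rec += responded
        left += newly_left
        joined += newly_joined
        attempt += 1
    ce = (left + joined) / sent  # ASSUMPTION: denominator is requests sent
    rr = rr_max * ce if ce > 1 / rr_max else 1
    return ce, rr


if __name__ == "__main__":
    replies = iter([(8, 2, 1), (2, 0, 0)])
    print(combined_adaptive_round(10, 100, lambda count: next(replies)))
```

This combines the Retry R step (with TryMax = 2) and the adaptive requesting rate (with RRMin = 1), matching the two constants that appear in the pseudocode.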

  43. 5. Performance Evaluation e) Performance Tuning • Combined Adaptive with Try=2, RRMax = 100, 50, 30

  44. 5. Performance Evaluation e) Message Cost vs. Membership Accuracy

  45. 6. Extended Scenario a) Combined Adaptive Membership Protocol

  46. 7. Summary • Problem we are trying to solve in this chapter: • We cannot accurately estimate the joining or leaving rates, or maintain an accurate view of the membership when the system has high membership churn • Our solution and contributions: • We presented an adaptive membership management protocol, which uses random sampling to discover newly joining and leaving nodes • Based on the responses it received to its request, a node calculates the churn estimator and dynamically adjusts its requesting rate to update its local view of the membership • Our membership protocol exploits the messages already required by the messaging protocol

  47. Statistical Inference and Dynamic Adaptation for iTrust • Background • Model for iTrust • Dynamic Adaptation Algorithm • Performance Evaluation • Summary

  48. 1. Background • Problems that co-exist in a fully distributed system • High membership churn • Large proportion of malicious nodes • Our approach to address both problems: • Use random sampling • Apply statistical inference techniques to estimate: • Membership churn with a large proportion of malicious nodes • Proportion of malicious nodes in the presence of high membership churn

  49. 2. Model for iTrust a) System and Fault Model • We consider the following scenarios: • A node leaves the membership voluntarily • A node leaves the membership involuntarily • A malicious node responds to a request but does not report a match • Parameters for membership churn: • JR: Joining Rate • LR: Leaving Rate • Parameters for detecting malicious nodes: • X: Proportion of non-malicious nodes

  50. 3. Dynamic Adaptation Algorithm a) Parameters and Variables • n: Size of the node’s current view of the membership • m: Number of nodes to which the metadata are distributed • r: Number of nodes to which the requests are distributed • IE: Intersection estimator obtained by random sampling: • nIE: Estimate of n in I • mIE: Estimate of m in I • rIE: Estimate of r in I • left: Number of nodes that were detected as non-operational • numNodes: Number of nodes to which a requesting node sent its request
