Distributed Quota Enforcement for Spam Control Jee Whan Choi Chaoting Xuan
Contents • Introduction • Distributed Quota Enforcement (DQE) • DQE Architecture • Enforcer Design • Evaluation • Conclusions
Introduction • Spam • Unsolicited bulk email • 50-70% of email today is spam • Spam filters • Scan the text of each email • False-positive rate of approximately 1% • Economic damage estimated at hundreds of millions of dollars • Distributed Quota Enforcement (DQE) • Quotas on the number of emails a sender can send
Distributed Quota Enforcement • Design Objectives • Protocol • No False Positives • Untrusted Enforcer • Privacy • Enforcer • Scalability • Fault Tolerance • High Throughput • Attack-Resiliency • Mutually Untrusting Nodes
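Quota Allocation and Creation • Quota allocation • Quotas are allocated by a select few globally trusted quota allocators (QAs) • $C_S = \{S_{pub}, \text{expiration time}, \text{quota}\}_{QA_{priv}}$: the sender's public key, expiration time, and quota, signed with the QA's private key • Stamp • Created by the sender • $\text{Stamp} = \{C_S, \{i, t\}_{S_{priv}}\}$, where $\{i, t\}$ (stamp index $1 \le i \le \text{quota}$, current epoch $t$) is signed with the sender's private key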
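A minimal Python sketch of quota allocation and stamp creation, to make the nesting of $C_S$ and the stamp concrete. Assumptions not on the slide: HMAC-SHA256 stands in for the public-key signatures (only so the sketch runs with the standard library; real DQE uses public-key signatures), `quota` is treated as stamps per epoch, `expiration` as the last valid epoch, and all function names are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


def sign(secret: bytes, fields: dict) -> str:
    """HMAC-SHA256 STAND-IN for a public-key signature (assumption)."""
    msg = json.dumps(fields, sort_keys=True).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class Certificate:
    """C_S = {S_pub, expiration time, quota} signed with QA_priv."""
    sender_pub: str
    expiration: int   # assumed: last epoch in which the certificate is valid
    quota: int        # assumed: stamps the sender may mint per epoch
    qa_sig: str


@dataclass(frozen=True)
class Stamp:
    """Stamp = {C_S, {i, t} signed with S_priv}."""
    cert: Certificate
    i: int            # stamp index, 1 <= i <= quota
    t: int            # epoch in which the stamp was created
    sender_sig: str


def allocate_quota(qa_secret: bytes, sender_pub: str,
                   expiration: int, quota: int) -> Certificate:
    # The QA signs the sender's public key, expiration time and quota.
    fields = {"sender_pub": sender_pub, "expiration": expiration, "quota": quota}
    return Certificate(sender_pub, expiration, quota, sign(qa_secret, fields))


def create_stamp(sender_secret: bytes, cert: Certificate, i: int, t: int) -> Stamp:
    # The sender can only mint indices 1..quota, which is what bounds its volume.
    if not 1 <= i <= cert.quota:
        raise ValueError(f"index {i} outside quota 1..{cert.quota}")
    if t > cert.expiration:
        raise ValueError("certificate has expired")
    return Stamp(cert, i, t, sign(sender_secret, {"i": i, "t": t}))
```

Usage: `cert = allocate_quota(b"qa-secret", "alice-pub", expiration=30, quota=100)` followed by `create_stamp(b"alice-secret", cert, i=1, t=7)`; asking for `i=101` raises `ValueError`.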
Protocol Objectives • No false positives • The hash is unique (collision-resistant) and one-way • Untrusted enforcer • Returns a proof of reuse (the fingerprint) • Privacy • The hash of the stamp is used instead of the stamp itself • An adversary cannot cancel a victim's stamp before it is created • The stamp is signed with the sender's private key
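The proof-of-reuse idea can be sketched in a few lines of Python. The key/value layout is an assumption (our reading of the design, not stated on the slide): the enforcer stores `k = H(H(stamp))` mapped to `v = H(stamp)`, so it only ever sees hashes (privacy), and a "found" answer is checkable because an enforcer that never saw `v` cannot produce a preimage of `k`. `ToyEnforcer` and `check_and_cancel` are hypothetical names; a single dict stands in for the whole enforcer.

```python
import hashlib
from typing import Optional


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


class ToyEnforcer:
    """One dict standing in for the whole (untrusted) enforcer."""

    def __init__(self) -> None:
        self.store: dict[bytes, bytes] = {}

    def test(self, k: bytes) -> Optional[bytes]:
        return self.store.get(k)

    def set(self, k: bytes, v: bytes) -> None:
        self.store[k] = v


def check_and_cancel(enforcer: ToyEnforcer, stamp: bytes) -> str:
    """Receiver-side logic: returns 'fresh' or 'reused'."""
    v = h(stamp)   # fingerprint: reveals nothing about the stamp itself
    k = h(v)       # lookup key
    answer = enforcer.test(k)
    if answer is not None and h(answer) == k:
        # Verifiable proof of reuse: the answer hashes to the key we asked for.
        return "reused"
    # "Not found", or a forged answer that fails verification: accept and cancel.
    enforcer.set(k, v)
    return "fresh"
```

The first `check_and_cancel(e, stamp)` returns `"fresh"` and a second call with the same stamp returns `"reused"`; a planted bogus value for a key fails the `h(answer) == k` check, so the enforcer cannot make a fresh stamp look reused.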
Enforcer • Comprises thousands of untrusted storage nodes • Stores the fingerprints of stamps cancelled in the current and previous epochs • The list of approved nodes is published by a trusted authority (the bunker) • The node that receives a client's request is called the portal for that request • A client can discover a portal via a hard-coded list or DNS
TEST • Check the local store • If not found, send GET requests sequentially to the other nodes responsible for k (the assigned nodes) • The assigned nodes are determined by k and r independent hash functions, similar to Chord • r is a configurable system parameter • If any node holds a value for k, return it; otherwise return "not found"
SET • Store (k, v) locally • Also PUT (k, v) on one node chosen at random from the assigned nodes
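The TEST and SET procedures above can be sketched together in Python, since SET only helps if TEST later probes the same assigned nodes. Assumptions: each of the r "independent hash functions" is modeled as SHA-256 over a per-function prefix plus k, reduced modulo the node count; in-process `Node` objects stand in for network GET/PUT; `handle_test`, `handle_set`, and `assigned_nodes` are hypothetical names.

```python
import hashlib
import random
from typing import Optional


class Node:
    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self.kv: dict[bytes, bytes] = {}

    def get(self, k: bytes) -> Optional[bytes]:   # inter-node GET
        return self.kv.get(k)

    def put(self, k: bytes, v: bytes) -> None:    # inter-node PUT
        self.kv[k] = v


def assigned_nodes(k: bytes, r: int, n: int) -> list[int]:
    """Assumed scheme: hash function j (j = 0..r-1) maps k to one node index."""
    return [int.from_bytes(hashlib.sha256(j.to_bytes(4, "big") + k).digest()[:8],
                           "big") % n
            for j in range(r)]


def handle_test(portal: Node, nodes: list[Node], k: bytes, r: int) -> Optional[bytes]:
    v = portal.get(k)                                # local check first
    if v is not None:
        return v
    for idx in assigned_nodes(k, r, len(nodes)):     # then each assigned node in turn
        v = nodes[idx].get(k)
        if v is not None:
            return v                                 # stop at the first hit
    return None                                      # "not found"


def handle_set(portal: Node, nodes: list[Node], k: bytes, v: bytes,
               r: int, rng: random.Random) -> None:
    portal.put(k, v)                                      # local store
    idx = rng.choice(assigned_nodes(k, r, len(nodes)))    # one random assigned node
    nodes[idx].put(k, v)
```

Because the random replica always lands on one of k's assigned nodes, a later TEST at any other portal still finds it (in the absence of faults) by probing at most r nodes.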
Stamp Reuse and Fault Tolerance • False negatives are possible • Byzantine faults and crash faults are equivalent • An adversarial node that falsely answers "not found" has the same effect as a node that does not respond (crash fault) • Reuse depends on the parameters r and p • p: fraction of the n total machines that fail during a 2-day cycle • Expected number of times a stamp is used before its fingerprint is placed on a good node: $\frac{1}{1-2p} + p^{r} n$ • Choosing $r = 1 + \log_{1/p} n$ makes $p^{r} n = p$, so the expected number of uses is $\frac{1}{1-2p} + p \approx 1 + 3p$, i.e. about 1.3 for $p = 0.1$ (exactly $1.25 + 0.1 = 1.35$)
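The reuse formula is easy to evaluate directly. This sketch simply codes the slide's expression and its suggested choice of r (function names are hypothetical); it shows that with $r = 1 + \log_{1/p} n$ the result no longer depends on n.

```python
import math


def expected_uses(p: float, n: int, r: float) -> float:
    """Slide's formula: 1/(1 - 2p) + p^r * n (only meaningful for p < 1/2)."""
    if not 0 <= p < 0.5:
        raise ValueError("formula requires 0 <= p < 1/2")
    return 1 / (1 - 2 * p) + p ** r * n


def suggested_r(p: float, n: int) -> float:
    """r = 1 + log_{1/p} n, which makes the p^r * n term equal to p (needs 0 < p < 1)."""
    return 1 + math.log(n) / math.log(1 / p)


for n in (1_000, 1_881, 100_000):
    r = suggested_r(0.1, n)
    print(f"n={n}: r={r:.2f}, expected uses={expected_uses(0.1, n, r):.3f}")
```

In practice r must be an integer; rounding it up only shrinks the $p^{r} n$ term (since $p < 1$), so the bound still holds. The slide's $1 + 3p$ is the first-order approximation of $\frac{1}{1-2p} + p$.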
Improvement of Fault Tolerance (our speculation) • In the PUT step, store the (key, value) pair on two or more randomly chosen assigned nodes instead of one • This increases overall storage usage but could significantly improve the stamp-reuse detection rate
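A toy model of the storage/detection trade-off in this speculation. Assumption (ours, not from the slides): every node is faulty independently with probability p, so a fingerprint replicated on m assigned nodes is lost only if all m are faulty. The function name is hypothetical, and the model ignores the portal's local copy when computing survival.

```python
def replication_tradeoff(p: float, m: int) -> tuple[int, float]:
    """Return (copies stored per stamp, probability that at least one of the
    m assigned-node replicas is on a good node), assuming independent faults."""
    if m < 1:
        raise ValueError("need at least one replica")
    copies = 1 + m            # portal's local copy plus m assigned-node replicas
    survive = 1 - p ** m      # lost only if every replica is on a faulty node
    return copies, survive


for m in (1, 2, 3):
    copies, survive = replication_tradeoff(0.1, m)
    print(f"m={m}: {copies} copies per stamp, replica survives with prob {survive:.3f}")
```

Under this model each extra replica adds one copy of storage while shrinking the miss probability by another factor of p.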
GET and PUT (Continued) • PUTs are fast • Crash recovery of previously cancelled keys • Key-value pairs are small • "Not found" answers are almost always fast • "Found" answers are slow
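One way to get this fast-PUT, fast-"not found", slow-"found" behavior is sketched below. Assumptions (not from the slides): values live in an append-only log standing in for disk, and RAM holds only a short prefix of each key mapped to log offsets, which is cheap because key-value pairs are small. `NodeStore` and `PREFIX` are hypothetical; `disk_reads` counts simulated seeks.

```python
import hashlib
from typing import Optional


class NodeStore:
    """Toy node storage: append-only 'disk' log plus an in-RAM prefix index."""
    PREFIX = 4  # bytes of each key kept in RAM (assumed)

    def __init__(self) -> None:
        self.disk: list[tuple[bytes, bytes]] = []    # append-only log of (k, v)
        self.index: dict[bytes, list[int]] = {}      # key prefix -> log offsets
        self.disk_reads = 0

    def put(self, k: bytes, v: bytes) -> None:
        # Fast: an append plus an in-memory index update, no random read.
        self.index.setdefault(k[:self.PREFIX], []).append(len(self.disk))
        self.disk.append((k, v))

    def get(self, k: bytes) -> Optional[bytes]:
        offsets = self.index.get(k[:self.PREFIX])
        if not offsets:
            return None               # "not found" answered from RAM alone
        for off in offsets:           # "found" (or a prefix collision) costs a read
            self.disk_reads += 1
            stored_k, v = self.disk[off]
            if stored_k == k:
                return v
        return None


store = NodeStore()
key_a = hashlib.sha256(b"a").digest()
key_b = hashlib.sha256(b"b").digest()
store.put(key_a, b"value-a")
print(store.get(key_b), store.disk_reads)   # not found, no disk read
print(store.get(key_a), store.disk_reads)   # found, one disk read
```

"Not found" is only *almost* always fast because a rare prefix collision forces a disk read that then fails the full-key comparison.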
Avoiding Distributed Livelock • Distributed pipeline: 1. TEST/SET requests from clients 2. GET/PUT requests from other enforcer nodes 3. GET/PUT responses • Under overload, drop work at the beginning of the pipeline (new client requests) to maximize throughput, so that work already invested in later stages is not wasted
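The drop policy can be sketched as a per-round scheduler that serves the end of the pipeline first. Assumptions: a fixed per-round `capacity`, one queue per stage, and leftover stage-2/3 work simply staying queued; `schedule` is a hypothetical name.

```python
from collections import deque


def schedule(client_reqs: deque, peer_reqs: deque, peer_resps: deque,
             capacity: int) -> tuple[list, list]:
    """Serve up to `capacity` items, latest pipeline stage first.
    Client requests that do not fit are dropped; peer work that does not fit
    stays queued for the next round."""
    served = []
    for queue in (peer_resps, peer_reqs, client_reqs):   # stage 3, then 2, then 1
        while queue and len(served) < capacity:
            served.append(queue.popleft())
    dropped = list(client_reqs)      # shed load at the start of the pipeline
    client_reqs.clear()
    return served, dropped


served, dropped = schedule(deque(["c1", "c2", "c3"]), deque(["g1"]),
                           deque(["r1"]), capacity=3)
print(served, dropped)
```

Dropping a new client request wastes no completed work, whereas dropping a GET response would throw away the effort of the node that produced it.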
Resource Exhaustion Attacks • Attack: a flood of spurious TEST/SET requests • Assumption: attackers (and the zombies they control) have limited bandwidth • Solution: exhaust the attackers' bandwidth by requiring large TEST/SET packets or multiple copies of each request
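The defense rests on simple arithmetic: a bandwidth-limited attacker's request rate falls in proportion to the bytes each request must carry. In this sketch, the 3 Mbit/s uplink and 1000-byte request size are illustrative numbers, and the function name is hypothetical.

```python
def max_attack_rate(bandwidth_bps: float, request_bytes: int, copies: int = 1) -> float:
    """Upper bound on TEST/SET requests per second an attacker can send when
    each request must be `request_bytes` long and sent `copies` times."""
    return bandwidth_bps / (8 * request_bytes * copies)


for copies in (1, 5):
    rate = max_attack_rate(3e6, 1000, copies)
    print(f"{copies} copies: at most {rate:.0f} requests/s")
```

Honest clients presumably pay the same per-request cost, which is affordable at normal email volumes but caps how fast any single attacker can flood the enforcer.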
Performance Evaluation (Continued) • Enforcer size 1. 100 billion emails daily 2. 65% spam 3. 65 billion disk seeks/day (pessimistic) 4. 400 disk seeks/second/node 5. 86,400 seconds/day • Result: $65 \times 10^{9} / (400 \times 86{,}400) \approx 1881$ nodes (3 GHz CPU, 1 GB RAM, 3 Mbit/s bandwidth)
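The sizing arithmetic reproduced in Python, using only the slide's numbers, which equate the pessimistic daily seek count with the daily spam volume. The function name is hypothetical.

```python
def enforcer_nodes(seeks_per_day: int, seeks_per_sec_per_node: int,
                   secs_per_day: int = 86_400) -> int:
    """Nodes needed so the enforcer can absorb `seeks_per_day` disk seeks."""
    seeks_per_node_per_day = seeks_per_sec_per_node * secs_per_day   # 34,560,000
    return -(-seeks_per_day // seeks_per_node_per_day)              # ceiling division


emails_per_day = 100_000_000_000
spam_percent = 65
spam_seeks = emails_per_day * spam_percent // 100   # 65 billion seeks/day
print(enforcer_nodes(spam_seeks, 400))              # 1881
```

Integer ceiling division avoids floating-point rounding: 65 billion / 34.56 million is about 1880.8, which rounds up to 1881 nodes.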