The Complexity of Pebbling Graphs and Spam Fighting Moni Naor WEIZMANN INSTITUTE OF SCIENCE
Based on: • Cynthia Dwork, Andrew Goldberg, N: On Memory-Bound Functions for Fighting Spam. • Cynthia Dwork, N, Hoeteck Wee: Pebbling and Proofs of Work
Principal techniques for spam-fighting
• FILTERING
  • text-based, trainable filters …
• MAKING SENDER PAY
  • computation [Dwork Naor 92, Back 97, Abadi Burrows Manasse Wobber 03, DGN 03, DNW 05]
  • human attention [Naor 96, Captcha]
  • micropayments
NOTE: the techniques are complementary: they reinforce each other!
Talk Plan • The proofs of work approach • DGN’s Memory bound functions • Generating a large random looking table [DNW] • Open problems: moderately hard functions
Pricing via processing [Dwork-Naor Crypto 92]
[Diagram: sender S sends recipient R the message m and time d together with a proof = f(m,S,R,d); the proof is moderately hard to compute but easy to verify]
• automated for the user
• non-interactive, single-pass
• no need for third party or payment infrastructure
IDEA If I don't know you: prove you spent significant computational resources (say 10 secs of CPU time), just for me, and just for this message
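To make the idea concrete, here is a minimal sender/verifier sketch in Python. It uses SHA-256 with a leading-zero-bits success condition purely for illustration (in the spirit of hashcash [Back 97]); the encoding, the parameter name, and the choice of hash are assumptions, not the function f actually proposed in [Dwork Naor 92].

```python
# A minimal CPU-bound sketch of "pricing via processing"; illustrative only.
import hashlib

EFFORT_BITS = 20  # e: the sender expects about 2^e hash evaluations

def prove(m: str, sender: str, recipient: str, date: str) -> int:
    """Sender: search for a witness k making the hash start with e zero bits."""
    k = 0
    while True:
        digest = hashlib.sha256(f"{m}|{sender}|{recipient}|{date}|{k}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - EFFORT_BITS) == 0:
            return k  # moderately hard: ~2^e attempts in expectation
        k += 1

def verify(m: str, sender: str, recipient: str, date: str, k: int) -> bool:
    """Receiver: a single hash evaluation suffices."""
    digest = hashlib.sha256(f"{m}|{sender}|{recipient}|{date}|{k}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - EFFORT_BITS) == 0
```

The asymmetry is the point: the sender performs about 2^e hash evaluations in expectation, while the receiver performs exactly one.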
Choosing the function f
Message m, sender S, receiver R, and date and time d
• Hard to compute: f(m,S,R,d) cannot be amortized
  • lots of work for the sender
  • we should have a good understanding of the best methods for computing f
• Easy to check "z = f(m,S,R,d)": little work for the receiver
• Parameterized to scale with Moore's Law
  • easy to exponentially increase the computational cost, while barely increasing the checking cost
Example: computing a square root mod a prime vs. verifying it: x^2 = y mod P
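A short worked instance of the square-root example, assuming a prime p ≡ 3 (mod 4) so the root has the closed form y^((p+1)/4); the specific numbers are illustrative.

```python
# Square roots mod a prime p with p % 4 == 3: computing takes O(log p)
# modular multiplications, verifying takes a single modular squaring.
p = 1000003                  # smallest prime above 10^6; p % 4 == 3
y = pow(123456, 2, p)        # pick y as a square so a root is guaranteed

x = pow(y, (p + 1) // 4, p)  # compute: one modular exponentiation
assert pow(x, 2, p) == y     # verify: one squaring
```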
Which computational resource(s)?
WANT a resource whose cost corresponds to roughly the same computation time across machines
• computing cycles
  • high variance of CPU speeds within desktops: factors of 10-30
• memory-bound approach [Abadi Burrows Manasse Wobber 03]
  • low variance in memory latencies: factors of 1-4
GOAL design a memory-bound proof-of-effort function which requires a large number of cache misses
Memory-bound model
USER: CACHE small but fast; MAIN MEMORY large but slow
SPAMMER: CACHE of size at most ½ the user's main memory; MAIN MEMORY may be very, very large; may exploit locality
• charge accesses to main memory
  • must avoid exploitation of locality
• computation is free, except for hash function calls
  • watch out for low-space crypto attacks
Talk Plan • The proofs of work approach • DGN’s Memory bound functions • Generating a large random looking table [DNW] • Open problems: moderately hard functions
Path-following approach [DGN Crypto 03]
PUBLIC large random table T (2 x the spammer's cache size)
PARAMETERS integer L, effort parameter e
IDEA a path is a sequence of L sequential accesses to T
• sender searches a collection of paths to find a good path
  • the collection depends on (m, S, R, d)
  • density of good paths = 1/2^e
  • locations in T depend on hash functions H0,…,H3
OUTPUT (m, S, R, d) + description of a good path
COMPLEXITY sending: O(2^e L) memory accesses; verifying: O(L) accesses
[Figure: a successful path of length L, drawn from the collection P of paths, which depends on (m,S,R,d)]
Abstracted Algorithm (transcribed into code below)
Sender and receiver share a large random table T.
To send message m from sender S to receiver R with date/time d, repeat the following trial for k = 1, 2, … until success. The current state is specified by an auxiliary value A; the thread is defined by (m,S,R,d,k).
• Initialization: A = H0(m,S,R,d,k)
• Main Loop: walk for L steps (L = path length):
  c = H1(A)
  A = H2(A, T[c])
• Success: if the last e bits of H3(A) = 00…0
Attach to (m,S,R,d) the successful trial number k and H3(A).
Verification: straightforward given (m, S, R, d, k, H3(A))
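A direct transcription of the abstracted algorithm into Python. H0–H3 are stand-ins built from SHA-256 with domain-separation tags; the real scheme uses much lighter functions (see "Choosing the H's" below), so this sketch only illustrates the control flow.

```python
import hashlib

L = 2048   # walk length
e = 15     # effort parameter: success probability 2^-e per trial

def H(tag: bytes, *parts: bytes) -> bytes:
    """Stand-in for the idealized hash functions H0..H3."""
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(part)
    return h.digest()

def send(m: bytes, S: bytes, R: bytes, d: bytes, T: list) -> tuple:
    k = 0
    while True:                                    # trials k = 1, 2, ...
        k += 1
        A = H(b"H0", m, S, R, d, k.to_bytes(8, "big"))
        for _ in range(L):                         # main loop: walk L steps
            c = int.from_bytes(H(b"H1", A), "big") % len(T)
            A = H(b"H2", A, T[c])                  # one main-memory access
        tail = H(b"H3", A)
        if int.from_bytes(tail, "big") % (1 << e) == 0:  # last e bits zero
            return k, tail                         # proof: (m,S,R,d,k,H3(A))
```

Verification simply replays the single walk for the claimed k, costing O(L) accesses rather than O(2^e L).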
[Animation: a single step in the loop. C = H1(A) selects a location in T; A = H2(A, T[C]) absorbs the fetched value T[C] into the state]
Full Specification
E = (expected) factor by which the computation cost exceeds verification = expected number of trials = 2^e, if H3 behaves as a random function
L = length of walk
Want, say, ELt = 10 seconds, where t = memory latency ≈ 0.2 μs
Reasonable choices: E = 24,000, L = 2048
Also need: how large is A? A should not be very small…
abstract algorithm
• Initialize: A = H0(m,S,R,d,k)
• Main Loop: walk for L steps:
  • c ← H1(A)
  • A ← H2(A, T[c])
• Success if H3(A) = 0^{log E}
• Trial repeated for k = 1, 2, …
• Proof = (m,S,R,d,k,H3(A))
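A quick arithmetic check that the suggested parameters actually meet the 10-second target, assuming the ≈0.2 μs main-memory latency quoted above:

```python
E, L, t = 24_000, 2_048, 0.2e-6   # trials, walk length, latency in seconds
print(E * L * t)                  # ≈ 9.8 seconds of memory stalls per message
```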
Choosing the H's
A "theoretical" approach: idealized random functions
• provides a formal analysis showing that the amortized number of memory accesses is high
A concrete approach, inspired by the RC4 stream cipher (flavor sketched below)
• very efficient: a few cycles per step
  • there is no time inside the inner loop to compute a complex function
• A is not small and changes gradually
• experimental results across different machines
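To convey the flavor only, here is a hypothetical RC4-style step: a couple of word operations per iteration, with A mutated incrementally rather than recomputed. Every constant and operation below is invented for illustration; this is not the concrete H1/H2 of [DGN 03].

```python
MASK = (1 << 32) - 1   # work with 32-bit words

def step(A: list, i: int, T: list) -> None:
    """One illustrative walk step: index into T, fold T[c] back into A."""
    c = (A[i % len(A)] ^ A[(i + 1) % len(A)]) % len(T)          # plays the role of H1
    A[i % len(A)] = (A[i % len(A)] + T[c] * 0x9E3779B1) & MASK  # plays the role of H2
```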
Path-following approach [Dwork-Goldberg-Naor Crypto 03]
[Theorem] Fix any spammer
• whose cache size is smaller than |T|/2,
• assuming T is truly random,
• assuming H0,…,H3 are idealized hash functions.
Then the amortized number of memory accesses per successful message is Ω(2^e L).
[Remarks]
• the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
• idealized hash functions are modeled using random oracles
• the proof relies on the information-theoretic unpredictability of T
Why Random Oracles? Random Oracles 101
Can measure progress:
• we know which oracle calls must be made
• we can see when they occur
The first occurrence of each such call is a progress call: 1 2 3 1 3 2 3 4 …
Proof highlights
The use of an idealized hash function implies:
• at any point in time, A is incompressible
• the average number of oracle calls per success is Ω(EL)
• we can follow the progress of the algorithm
Cast the problem as asymmetric communication complexity between the memory and the cache
• only the cache has access to the functions H1 and H2
Talk Plan • The proofs of work approach • DGN’s Memory bound functions • Generating a large random looking table [DNW] • Open problems
Using a succinct table [DNW 05]
GOAL use a table T with a succinct description
• easy distribution of software (new users)
• fast updates (over slow connections)
PROBLEM we lose information-theoretic unpredictability
• the spammer can exploit the succinct description to avoid memory accesses
IDEA generate T using a memory-bound process
• use time-space trade-offs for pebbling, studied extensively in the 1970s
• the user builds the table T once and for all
Pebbling a graph
GIVEN a directed acyclic graph with designated inputs and outputs
RULES:
• inputs: a pebble can be placed on an input node at any time
• a pebble can be placed on any non-input vertex if all immediate parent nodes have pebbles
• pebbles may be removed at any time
GOAL find a strategy to pebble all the outputs while using few pebbles and few moves (a checker for these rules is sketched below)
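A small checker for the pebbling rules just stated, assuming the DAG is given as a dict from node to its list of parents (an empty list marks an input node); the representation is an assumption made for the sketch.

```python
def check_pebbling(parents: dict, moves: list) -> tuple:
    """moves: ("place", v) or ("remove", v). Returns (max_pebbles, n_moves)."""
    pebbled, peak = set(), 0
    for op, v in moves:
        if op == "place":
            # legal iff v is an input, or all immediate parents carry pebbles
            assert all(p in pebbled for p in parents[v]), f"illegal placement on {v}"
            pebbled.add(v)
            peak = max(peak, len(pebbled))
        else:
            pebbled.discard(v)   # pebbles may be removed at any time
    return peak, len(moves)

# Example: pebble the output of a 3-node DAG with edges a -> c and b -> c
parents = {"a": [], "b": [], "c": ["a", "b"]}
moves = [("place", "a"), ("place", "b"), ("place", "c")]
print(check_pebbling(parents, moves))   # (3, 3)
```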
What do we know about pebbling?
• Any graph can be pebbled using O(N/log N) pebbles [Valiant]; there are graphs requiring Ω(N/log N) pebbles [PTC]
• Any constant-degree graph of depth d can be pebbled using O(d) pebbles
• Tight trade-offs: some shallow graphs require many (super-polynomially many) steps to pebble with few pebbles [LT]
• Some results about pebbling the outputs hold even when the available pebbles may be placed in any initial configuration
Succinctly generating T
GIVEN a directed acyclic graph with
• constant in-degree
• input node i labeled H4(i)
• non-input node i labeled H4(i, labels of parent nodes), e.g. Li = H4(i, Lj, Lk) for parents j, k
• entries of T = labels of the output nodes
OBSERVATION good pebbling strategy ⇒ good spammer strategy
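A sketch of generating the labels, assuming the nodes are numbered in topological order and modeling H4 by SHA-256; both assumptions are mine, made so the sketch is self-contained.

```python
import hashlib

def labels(parents: list) -> list:
    """parents[i] = list of i's parents (empty for inputs); returns all labels."""
    lab = []
    for i, ps in enumerate(parents):
        h = hashlib.sha256(i.to_bytes(8, "big"))
        for p in ps:                 # non-input: Li = H4(i, labels of parents)
            h.update(lab[p])
        lab.append(h.digest())
    return lab                       # T = the labels of the output nodes
```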
Converting a spammer strategy to a pebbling
EX POST FACTO PEBBLING computed by offline inspection of the spammer's strategy (a schematic version appears below)
• PLACING A PEBBLE place a pebble on node i if
  • H4 is used to compute Li = H4(i, Lj, Lk), and
  • Lj, Lk are the correct labels
• INITIAL PEBBLES place an initial pebble on node j if
  • H4 is applied with Lj as an argument, and
  • Lj was not computed via H4
• REMOVING A PEBBLE remove a pebble as soon as it is no longer needed
Computing a label uses a hash function call, so a lower bound on # moves ⇒ a lower bound on # hash function calls; labels come from cache + memory fetches, so a lower bound on # pebbles ⇒ a lower bound on # memory accesses.
IDEA limit the # of pebbles used by the spammer as a function of its cache size and the # of bits it brings from memory
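A schematic rendering of the ex post facto pebbling, read off a recorded trace of the spammer's H4 calls. The trace format, and the simplification that "not yet placed" approximates "never computed via H4", are assumptions of this sketch; removal of pebbles when no longer needed is not modeled.

```python
def ex_post_facto(trace: list, correct: dict) -> tuple:
    """trace: entries (i, args, result), args = [(j, Lj), ...] labels supplied.
    correct[i] = the true label of node i. Returns (placed, initial) pebbles."""
    placed, initial = set(), set()
    for (i, args, result) in trace:               # one H4 call per trace entry
        for (j, Lj) in args:
            if Lj == correct[j] and j not in placed:
                initial.add(j)                    # correct label used but never computed
        if result == correct[i] and all(Lj == correct[j] for j, Lj in args):
            placed.add(i)                         # place a pebble on node i
    return placed, initial
```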
Constructing the dag
CONSTRUCTION dag D composed of D1 & D2
• D1 has the property that pebbling many outputs requires many pebbles, more than the cache and the pages brought from memory can supply: a stack of superconcentrators [Lengauer Tarjan 82]
• D2 is a fault-tolerant layered graph: even if a constant fraction of each layer is deleted, one can still embed a superconcentrator: a stack of expanders [Alon Chung 88, Upfal 92]
A SUPERCONCENTRATOR is a dag with N inputs and N outputs such that any k inputs and k outputs are connected by vertex-disjoint paths.
[Figure: inputs of D → D1 → D2 → outputs of D]
Using the dag
[idea] fix any execution:
• let S = the set of mid-level nodes pebbled
• if S is large, use the time-space trade-offs for D1
• if S is small, use the fault-tolerant property of D2: delete the nodes whose labels are largely determined by S
The lower bound result
[Theorem] For the dag D, fix any spammer
• whose cache size is smaller than |T|/2,
• assuming H0,…,H4 are idealized hash functions,
• making a polynomial # of hash function calls.
Then the amortized number of memory accesses per successful message is Ω(2^e L).
[Remarks]
• the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients
• idealized hash functions are modeled using random oracles
What can we conclude from the lower bound?
• Shows that the design principles are sound
• Gives us a plausibility argument
• Tells us that if something goes wrong, we will know where to look
But
• Based on idealized random functions
  • How do we implement them? They might be computationally expensive
• They are applied to all of A
  • It might be computationally expensive simply to "touch" all of A
Talk Plan • The proofs of work approach • DGN’s Memory bound functions • Generating a large random looking table [DNW] • Open problems: moderately hard functions
Alternative construction based on sorting
• motivated by time-space trade-offs for sorting [Borodin Cook 82]
• easier to implement
• input node i labeled T[i] = H4(i, 1)
• at each round: SORT the array, then apply H4 to the current values, e.g. T[i] = H4(i, T[i], 2) after the first sort; then SORT again, and so on (sketched below)
OPEN PROBLEM prove a lower bound …
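A sketch of the sorting-based construction, again modeling H4 by SHA-256; the byte encodings and the number of rounds are assumptions made to keep the sketch runnable.

```python
import hashlib

def H4(*parts: bytes) -> bytes:
    h = hashlib.sha256()
    for p in parts:
        h.update(p)
    return h.digest()

def build_table(n: int, rounds: int) -> list:
    T = [H4(i.to_bytes(8, "big"), b"\x01") for i in range(n)]  # T[i] = H4(i, 1)
    for r in range(2, rounds + 2):
        T.sort()                                               # SORT the array
        T = [H4(i.to_bytes(8, "big"), T[i], bytes([r]))        # T[i] = H4(i, T[i], r)
             for i in range(n)]
    return T
```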
More open problems
WEAKER ASSUMPTIONS? no recourse to random oracles
• use lower bounds for the cell-probe model and branching programs?
• unlike most of cryptography, here there is a chance of coming up with an unconditional result
• use the physical limitations of computation to derive a reasonable lower bound on the spammer's effort
A theory of moderately hard functions?
• Key idea in cryptography: use the computational infeasibility of problems in order to obtain security.
• For many applications, moderate hardness is needed
• current applications: abuse prevention, fairness, few-round zero-knowledge
FURTHER WORK develop a theory of moderately hard functions
Open problems: moderately hard functions
• Unifying assumption
  • In the intractable world, one-way functions are necessary and sufficient for many tasks.
  • Is there a similar primitive when moderate hardness is needed?
• Precise model
  • The details of the computational model may matter; can we unify them?
• Hardness amplification
  • Start with a somewhat hard problem and turn it into one that is harder.
• Hardness vs. randomness
  • Can we turn moderate hardness into moderate pseudorandomness?
  • The standard transformations are not necessarily applicable here.
• Evidence for non-amortization
  • Is it possible to demonstrate that if a certain problem is not resilient to amortization, then a single instance can be solved much more quickly?
Open problems: moderately hard functions
• Immunity to parallel attacks
  • Important for timed commitments
  • The (repeated-squaring) power function was used there; is there a good argument showing immunity against parallel attacks?
• Is it possible to reduce worst case to average case?
  • Find a random self-reduction.
  • In the intractable world, it is known that there are limitations on random self-reductions from NP-complete problems.
  • Is it possible to randomly reduce a P-complete problem to itself?
  • Is it possible to use linear programming or lattice basis reduction for such purposes?
• New candidates for moderately hard functions
Thank you Merci beaucoup תודה רבה