Problem Warping and Computational Dynamics in the Solution of NP-hard Problems. John A Clark, Dept. of Computer Science, University of York, UK. jac@cs.york.ac.uk. 26.07.2001.
Overview • Overview of Hill-Climbing and Simulated Annealing • Breaking Permuted Perceptron Problem • previous work • problem warping • timing analysis • solution family based attacks • quantum computing • Speculation
Local Optimisation - Hill Climbing • We really want to obtain xopt, the global optimum of z(x). • The neighbourhood of a point x might be N(x)={x+1,x-1}. • The hill-climb goes x0, x1, x2 since z(x0) < z(x1) < z(x2) > z(x3), and gets stuck at x2 (a local optimum). [Figure: plot of z(x) against x, with x0, x1, x2, x3 and xopt marked]
Simulated Annealing • Allows non-improving moves, so that it is possible to go down z(x) in order to rise again and reach the global optimum. • In practice the neighbourhood may be very large and the trial neighbour is chosen randomly. • It is possible to accept a worsening move even when improving ones exist. [Figure: plot of z(x) against x, with the search points x0 to x13 marked]
Simulated Annealing • Improving moves are always accepted. • Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely: • the worse the move, the less likely it is to be accepted • a worsening move is less likely to be accepted the cooler the temperature • The temperature T starts high and is gradually cooled as the search progresses. • Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Simulated Annealing • Current candidate x. Minimisation formulation. • At each temperature consider 400 moves. • Always accept improving moves. • Accept worsening moves probabilistically. This gets harder the worse the move, and harder as the temperature decreases.
Simulated Annealing • Temperature cycle: do 400 trial moves at each temperature, then cool and repeat. [Figure: cooling schedule with a block of 400 trial moves at each temperature step]
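The annealing loop described on these slides can be sketched roughly as follows. This is a minimal illustration rather than the author's implementation: the geometric cooling factor `alpha`, the starting temperature, the number of temperature steps and the exponential acceptance rule exp(-delta/T) are all assumptions; only the 400 trial moves per temperature and the "always accept improving moves" rule come from the slides.

```python
import math
import random


def anneal(cost, neighbour, x0, t_start=10.0, alpha=0.95,
           n_temps=100, moves_per_temp=400, rng=None):
    """Minimise cost(x) by simulated annealing (illustrative sketch).

    t_start, alpha and n_temps are assumed values, not taken from the slides.
    """
    rng = rng or random.Random(0)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t_start
    for _ in range(n_temps):
        for _ in range(moves_per_temp):
            y = neighbour(x, rng)          # random trial neighbour
            fy = cost(y)
            delta = fy - fx
            # Improving (or equal) moves are always accepted; worsening moves
            # are accepted with a probability that shrinks as delta grows
            # and as the temperature t falls.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
        t *= alpha                          # geometric cooling (assumption)
    return best, fbest


def step(x, rng):
    """Neighbourhood N(x) = {x+1, x-1}, as on the hill-climbing slide."""
    return x + rng.choice((-1, 1))


best, fbest = anneal(lambda x: (x - 7) ** 2, step, x0=-20)
```

The toy cost function here has a single minimum, so it only exercises the acceptance rule; the benefit over hill-climbing appears on landscapes with local optima.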
Identification Problems • Notion of zero-knowledge introduced by Goldwasser, Micali and Rackoff (1985) • Indicate that you have a secret without revealing it • Early scheme by Shamir • Several schemes of late based on NP-complete problems • Permuted Kernel Problem (Shamir) • Syndrome Decoding (Stern) • Constrained Linear Equations (Stern) • Permuted Perceptron Problem (Pointcheval)
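Pointcheval’s Perceptron Schemes • Interactive identification protocols based on an NP-complete problem. • Perceptron Problem (PP). Given: an m×n matrix A with entries +1/-1. Find: an n-vector S with entries +1/-1. So that: (AS)i ≥ 0 for every i = 1..m.
Pointcheval’s Perceptron Schemes • Permuted Perceptron Problem (PPP). Make the problem harder by imposing an extra constraint. Given: an m×n matrix A with entries +1/-1 and a histogram H. Find: an n-vector S with entries +1/-1. So that: (AS)i ≥ 0 for every i = 1..m, and AS has the particular histogram H of positive values (1, 3, 5, ...).
Example: Pointcheval’s Scheme • PP and PPP example. • Every PPP solution is a PP solution. • A PPP solution additionally has the particular histogram H of positive values (1, 3, 5, ...). [Figure: worked PP and PPP example]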
Generating Instances • Suggested method of generation: • Generate a random matrix A • Generate a random secret S • Calculate AS • If any (AS)i < 0 then negate the ith row of A • Significant structure in this problem: high correlation between the majority values of the matrix columns and the corresponding secret bits.
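The generation method above can be sketched as below. This is an illustrative sketch using only the standard library; the function names and the choice to return the histogram as a `Counter` are assumptions made for the example.

```python
import random
from collections import Counter


def dot(row, s):
    """Dot product of one +1/-1 matrix row with a +1/-1 vector."""
    return sum(a * b for a, b in zip(row, s))


def generate_instance(m, n, rng=None):
    """Return (A, S, H): matrix, secret and image histogram of a PPP instance."""
    rng = rng or random.Random()
    a = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    for row in a:
        if dot(row, s) < 0:
            # Negate the whole row so that (AS)_i becomes positive.
            for j in range(n):
                row[j] = -row[j]
    histogram = Counter(dot(row, s) for row in a)
    return a, s, histogram


a, s, hist = generate_instance(11, 27, random.Random(1))
```

With n odd, every image element is odd, so after the negation step every (AS)i is at least 1.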
Instance Properties • Each matrix row/secret dot product is the sum of n Bernoulli (+1/-1) variables. • The initial image histogram has a Binomial shape and is symmetric about 0 (values ..., -7, -5, -3, -1, 1, 3, 5, 7, ...). • After negation it simply folds over to be positive (values 1, 3, 5, 7, ...). • Image elements tend to be small.
PP Using Search: Pointcheval • Pointcheval couched the Perceptron Problem as a search problem. • The neighbourhood is defined by single bit flips on the current solution Y. • The cost function punishes any negative image components. For example, with negative image components -1 and -3, costNeg(y)=|-1|+|-3|=4.
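A minimal sketch of this search formulation, assuming the image is computed as w = AY and that a move flips one randomly chosen bit; the helper names are hypothetical.

```python
import random


def image(a, y):
    """Image vector w = A y for a +1/-1 matrix a and +1/-1 vector y."""
    return [sum(aij * yj for aij, yj in zip(row, y)) for row in a]


def cost_neg(w):
    """Sum of the magnitudes of the negative image components."""
    return sum(-wi for wi in w if wi < 0)


def flip_one_bit(y, rng):
    """Return a neighbour of y that differs in exactly one position."""
    j = rng.randrange(len(y))
    return y[:j] + [-y[j]] + y[j + 1:]


example_cost = cost_neg([3, -1, 5, -3, 1])   # |-1| + |-3|
neighbour = flip_one_bit([1, 1, -1, 1], random.Random(0))
```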
Using Annealing: Pointcheval • A PPP solution is also a PP solution. • Based estimates of the cost of cracking PPP on the ratio of PP solutions to PPP solutions. • Calculated the matrix sizes for which this should be most difficult. • Gave rise to (m,n)=(m,m+16). • Recommended (m,n)=(101,117), (131,147), (151,167). • Gave estimates for the number of years needed to solve PPP using annealing as the means of finding PP solutions. • PP instances with matrices of size 200 ‘could usually be solved within a day’. • But no PPP problem instance of size greater than 71 was ever solved this way ‘despite months of computation’.
Perceptron Problem (PP) • Knudsen and Meier approach (loosely): • Carry out sets of runs. • Note where the results obtained all agree. • Fix those elements where there is complete agreement, carry out a new set of runs, and so on. • If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly. • Used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but no upper bound was given on the sizes achievable.
Profiling Annealing • The approach is not without its problems. • Not all bits that have complete agreement are correct. [Figure: +1/-1 bit values of the actual secret compared with Runs 1 to 6, marking positions where all runs agree and positions where all agree (wrongly)]
Knudsen and Meier • Have used this method to attack PPP problem size (101,117). • Needs a hefty enumeration stage (to search for wrong bits); allowed up to 2^64 search complexity. • Used a new cost function with histogram punishment, with weights w1=30, w2=1: cost(y) = w1*costNeg(y) + w2*costHist(y)
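A sketch of the weighted cost function follows. The slide does not define costHist, so the definition used here (the sum, over all image values, of the absolute difference between the current count and the target count) is an assumption made purely for illustration; only the weights w1=30, w2=1 and the overall form come from the slide.

```python
from collections import Counter


def cost_neg(w):
    """Sum of the magnitudes of the negative image components."""
    return sum(-wi for wi in w if wi < 0)


def cost_hist(w, target_hist):
    """Assumed histogram punishment: total absolute count mismatch."""
    current = Counter(w)
    target = Counter(target_hist)
    values = set(current) | set(target)
    return sum(abs(current[v] - target[v]) for v in values)


def cost(w, target_hist, w1=30, w2=1):
    """cost(y) = w1*costNeg(y) + w2*costHist(y), evaluated on the image w of y."""
    return w1 * cost_neg(w) + w2 * cost_hist(w, target_hist)


# Image [1, -1, 3] against a target histogram with one 1 and two 3s.
example = cost([1, -1, 3], {1: 1, 3: 2})
```

The heavy weight on costNeg makes any negative component dominate, so the histogram term mainly steers the search among near-PP solutions.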
Why Don’t They Work Better? • What limits the ability of annealing to find a PP solution?
PP Move Effects • A move changes a single element of the current solution. • We want the current negative image values to go positive. • But changing a bit to cause negative values to go positive will often cause small positive values to go negative. [Figure: histograms of image values 0 to 7 before and after a move]
Problem Warping • Can significantly improve results by punishing at a positive value K. • For example, punish any value less than K=4 during the search. • This drags the elements away from the boundary during the search. • Also use the square of the differences |wi-K|^2 rather than the simple deviation. For example, with K=4 an image value of -1 costs |4-(-1)|^2=25. [Figure: histogram of image values 0 to 7 with the punishment threshold at K=4]
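The warped cost function can be sketched in a few lines. The function name is hypothetical; the rule itself (punish every image value below K by the squared distance to K) is as described on the slide.

```python
def cost_warped(w, k):
    """Punish every image value below k by the squared distance to k."""
    return sum((k - wi) ** 2 for wi in w if wi < k)


single = cost_warped([-1], 4)         # |4 - (-1)|^2
mixed = cost_warped([5, 3, -1, 7], 4)  # (4-3)^2 + (4-(-1))^2
```

Note that small positive values such as 3 are punished too, which is what pulls image elements away from the zero boundary.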
Problem Warping • PP results for sizes (201,217), (401,417), (501,517) and (601,617). • The table gives the number of successes in 30 runs of annealing followed by a 0-, 1-, 2- or 3-bit hill-climb, for each of 10 problems. [Table: results by problem size and hill-climb depth]
Problem Warping • Comparative results: • Generally allows solution within a few runs of annealing for sizes (201,217). • The number of bits correct is generally worst when K=0. • The best value for K varies between sizes (but profiling can be used to find it). • It has proved possible to solve for size (601,617) and higher. • Enormous increase in power for what is essentially a change to one line of the program: • using squares rather than just the modulus • use of the K factor • Morals… • Small changes may make a big difference. • The real issue is how the cost function and the search technique interact. • The cost function need not be the most ‘natural’ direct expression of the problem to be solved. • Cost functions are a means to an end. • This is a form of fault injection, or problem warping, applied to the problem.
Some Tricks • Won’t go into detail, but there are some further problem-specific tricks that can be used to reduce the remaining search. • For example, you can generally tell easily whether you have an odd or even number of bits wrong. • Count (sum up) the image elements taking values in … -7,-3,1,5,9,13.. (S1) • Count (sum up) the image elements taking values in … -5,-1,3,7,11.. (S2) • Find the corresponding sums T1, T2 in the provided histogram. • If T1=S1 and T2=S2 then there is an even number of bits wrong. • If T1=S2 and T2=S1 then there is an odd number wrong.
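The parity check can be sketched as follows, reading S1, S2, T1 and T2 as counts of image elements in each class (an interpretation, since the slide says "sum"). The two value lists are the residue classes 1 and 3 modulo 4. Flipping one bit changes every image element by exactly ±2 and so swaps the two classes, which is why the comparison reveals the parity. The sketch assumes n is odd, so that every image element is odd; the function names are hypothetical.

```python
def image(a, y):
    """Image vector w = A y for a +1/-1 matrix a and +1/-1 vector y."""
    return [sum(aij * yj for aij, yj in zip(row, y)) for row in a]


def wrong_bit_parity(w, target_hist):
    """Return 'even', 'odd', 'unknown' or 'inconsistent'."""
    # Python's % always returns a non-negative result, so -7 % 4 == 1
    # and -5 % 4 == 3, matching the two lists on the slide.
    s1 = sum(1 for wi in w if wi % 4 == 1)
    s2 = sum(1 for wi in w if wi % 4 == 3)
    t1 = sum(c for v, c in target_hist.items() if v % 4 == 1)
    t2 = sum(c for v, c in target_hist.items() if v % 4 == 3)
    if t1 == t2:
        return "unknown"        # the test cannot distinguish the two cases
    if (s1, s2) == (t1, t2):
        return "even"
    if (s1, s2) == (t2, t1):
        return "odd"
    return "inconsistent"


a = [[1, 1, 1], [1, -1, 1], [-1, 1, 1]]
secret = [1, 1, 1]                      # image is [3, 1, 1]
hist = {1: 2, 3: 1}
one_wrong = wrong_bit_parity(image(a, [-1, 1, 1]), hist)
two_wrong = wrong_bit_parity(image(a, [-1, -1, 1]), hist)
```

When m is odd, as in the recommended sizes, T1 and T2 can never be equal, so the "unknown" case does not arise.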
A Few Tricks More • Look at the image elements wi produced. • If I knew what they should be, I could use linear algebra to solve the system. • I do not know whether they are right or not, but often they are, or nearly so. • Suppose wi=1 is obtained by some run. It is very likely that the actual value should be 1, 5 or 9 (assuming an even number of bits wrong). • Assume it is correct. Then changing any bits of the current solution to obtain the original solution must not change the value of wi. • This means half the bits xj I change in the solution x must agree in sign with the corresponding bit aij in the ith row (and half must disagree). This reduces the complexity of the remaining search.
Overall • Have missed out the details, but basically this scheme is broken. • There is just too much structure… and there is more.
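The "half agree, half disagree" condition follows because flipping bit j changes wi by -2*aij*xj, so wi is unchanged exactly when the products aij*xj over the flipped positions sum to zero. A small sketch of that check, with hypothetical helper names and toy data:

```python
def row_value(a_row, x):
    """Image element w_i for one matrix row."""
    return sum(a * b for a, b in zip(a_row, x))


def apply_flips(x, flips):
    """Return a copy of x with the bits at the given positions negated."""
    return [-v if j in flips else v for j, v in enumerate(x)]


def flips_preserve_row(a_row, x, flips):
    """True when flipping these bits leaves w_i unchanged.

    Equivalent to: half of the flipped x_j agree in sign with a_ij
    and half disagree.
    """
    return sum(a_row[j] * x[j] for j in flips) == 0


a_row = [1, 1, -1, 1, 1]
x = [1, 1, 1, 1, 1]                       # w_i = 3
balanced = flips_preserve_row(a_row, x, (0, 2))
unbalanced = flips_preserve_row(a_row, x, (0, 1))
```

Candidate sets of wrong bits that fail this test for a row assumed correct can be discarded without evaluating the full image.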
Radical Viewpoint Analysis • Essentially create mutant problems P1, P2, ..., Pn-1, Pn from the original problem P and attempt to solve them. • If the solutions agree on particular elements then they generally do so for a reason, generally because they are correct. • Can think of mutation as an attempt to blow the search away from the actual original solution. • Look for agreement between solutions. • Often nearly half the key can be obtained without any wrong bits.
Radical Viewpoint Analysis • Bits where three runs agree. • Go for unanimity. • A more stressful variation of Knudsen and Meier’s idea.
Democratic Viewpoint Analysis • Essentially the same as before (mutant problems P1, P2, ..., Pn-1, Pn of problem P), but this time go for substantial rather than unanimous agreement. • By choosing the amount of disagreement tolerated carefully you can sometimes get over half the key this way. • On occasion only 1 bit of the 115 most agreed bits has been incorrect (out of 167). [Figure: the mutant problems vote on a bit: ‘It’s a 1’, ‘It’s a 1’, ‘It’s a 1’, ‘No. It’s a -1’]
Multiple Clock Watchers Analysis • Essentially the same as for timing analysis, but this time run over the mutant problems P1, P2, ..., Pn-1, Pn and add up, over all runs, the times at which each bit got stuck. • As you might expect, those bits that often get stuck early (i.e. have low aggregate times to getting stuck) generally do so at their correct values (take the majority value). • Also seems to have significant potential but needs more work.
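Both viewpoint analyses amount to a vote over the solutions returned for the mutant problems. The sketch below is illustrative only: the function name and the `min_agreement` parameter are assumptions, with `min_agreement=1.0` giving the unanimous (radical) rule and lower values the democratic one.

```python
def agreed_bits(solutions, min_agreement=1.0):
    """Map bit index -> majority value for bits with enough agreement.

    solutions is a list of +1/-1 vectors, one per run or mutant problem.
    """
    n_runs = len(solutions)
    agreed = {}
    for j in range(len(solutions[0])):
        plus = sum(1 for sol in solutions if sol[j] == 1)
        if plus * 2 >= n_runs:
            majority, votes = 1, plus
        else:
            majority, votes = -1, n_runs - plus
        if votes >= min_agreement * n_runs:
            agreed[j] = majority
    return agreed


runs = [
    [1, 1, -1, 1],
    [1, -1, -1, 1],
    [1, 1, -1, -1],
    [1, 1, 1, 1],
]
unanimous = agreed_bits(runs, 1.0)
democratic = agreed_bits(runs, 0.75)
```

Lowering the threshold yields more candidate bits at the price of more wrong ones, which is the trade-off the slide describes.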
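A rough sketch of the aggregation step, assuming each run records, for every bit, the time (for example the move count) at which that bit last changed, together with the bit's final value. This data layout and the function name are assumptions; the slides do not say how the timings were recorded.

```python
def rank_by_stuck_time(stuck_times, final_values):
    """Rank bits by aggregate time-to-stuck, earliest first.

    stuck_times[r][j]  : time at which bit j got stuck in run r
    final_values[r][j] : value (+1/-1) of bit j at the end of run r
    Returns a list of (bit index, majority value, aggregate time).
    """
    n_bits = len(stuck_times[0])
    ranking = []
    for j in range(n_bits):
        total = sum(run[j] for run in stuck_times)
        votes = sum(run[j] for run in final_values)
        majority = 1 if votes >= 0 else -1
        ranking.append((total, j, majority))
    ranking.sort()
    return [(j, majority, total) for total, j, majority in ranking]


times = [[10, 300, 50], [20, 250, 40], [15, 280, 60]]
values = [[1, -1, -1], [1, 1, -1], [1, -1, -1]]
ranked = rank_by_stuck_time(times, values)
```

Bits at the head of the ranking are the ones the slide suggests trusting first.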
Quantum Computation • Everything I have reported so far has assumed the classical computational paradigm. • But this is the very assumption that gave rise to the biggest shock in cryptography. • Let’s not fall into the same trap. • Can heuristic search and quantum computing work together?
Grover’s Algorithm • Consider a function f(x): • x is in 0..(2^N-1) • there is a single value v such that some predicate P(v) holds. • Then Grover’s algorithm can find v in approximately O(2^(N/2)) steps. • Thus if we have a state space of size 2^100, it will require O(2^50) steps. • Now let us return to the (101,117) PPP case. • Finding a solution to this by quantum search would require O(2^59) steps. • But if we can obtain a solution with 108 bits correct, we could ask a different question: • What are the indices of the 9 wrong bits? • Assuming each index can be couched in 7 bits, we have 7*9=63 bits. • This means that Grover’s algorithm can find the answer in O(2^32) steps.
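The arithmetic behind these figures can be checked with a short calculation. It works with base-2 exponents only and ignores constant factors; the helper name is hypothetical.

```python
import math


def grover_log2_steps(n_bits):
    """log2 of the approximate Grover step count for a 2^n_bits search space."""
    return n_bits / 2


full_search = grover_log2_steps(117)        # search over all 117 secret bits

index_bits = math.ceil(math.log2(117))      # bits needed to name one position
reduced_bits = 9 * index_bits               # indices of the 9 wrong bits
reduced_search = grover_log2_steps(reduced_bits)
```

The exponents come out at 58.5 and 31.5, which the slide rounds up to 2^59 and 2^32.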
More Short Term • Can we view metaheuristic search as a means of problem reduction rather than problem solving? • The AI community has developed methods that work very well with very highly constrained problems. • Am currently experimenting with profiling and using properties of how near search gets to the goal to place bounds on the remaining problem and solve using linear programming.
Grover’s Algorithm 2 • And it’s not all one way. If there are more states satisfying a predicate, one might expect the task of finding one of them to be easier than previously. • Indeed, if there are M states v satisfying the predicate P(v) then the search becomes of order O(sqrt(2^N/M)). • So characterise positions from which you can use heuristic search (HS) effectively and use QC to find them. Then use HS to reach the optimum. [Figure: cost landscape annotated ‘Use QC to get in this range’ and ‘Now hill-climb to get here’]
Speculation and Further work • Can we try failing millions of times and then start doing cryptanalysis on the results? • Will the techniques work more widely? • Why can I not break, say, DES or RSA using a technique like this? • Is there a theorem to suggest not? No. • Cryptanalysis of block ciphers largely works by approximations, e.g. functions of the form • P[3].xor.P[35].xor.K[1].xor.K[22].xor.C[15].xor.C[52] are true with some bias (e.g. 50.00001% of the time) • P[j] = bit j of a plaintext block; similarly C is ciphertext and K is key. • Can we derive these from sample data using annealing? • How can we exploit the notion of a shifting computational paradigm? • How well can we profile the distribution of results in order to isolate those at the extremes of correctness?
Speculation and Further work • Very few applications of these techniques to modern-day cryptography and its applications. • Have successfully created Boolean functions with desirable cryptographic properties. • Have also evolved protocols in belief logics whose abstract execution is a proof of their own correctness. • Much more to come.