Exploiting the Search Process. John A Clark Dept. of Computer Science University of York, UK jac@cs.york.ac.uk. A Talk with a Title I Cannot Remember but with Content Much as Advertised So Don’t Worry Too Much. John A Clark Dept. of Computer Science University of York, UK
Exploiting the Search Process John A Clark Dept. of Computer Science University of York, UK jac@cs.york.ac.uk
A Talk with a Title I Cannot Remember but with Content Much as Advertised So Don’t Worry Too Much John A Clark Dept. of Computer Science University of York, UK jac@cs.york.ac.uk
Overview • Some initial motivation. • The cost function matters. • The cost function doesn’t matter. • Profiling optimisation - why it pays to watch paint dry. • Further ideas for heuristic optimisation.
Early Cryptanalysis • Optimisation-based cryptanalysis typically uses a cost function that seeks a frequency profile close to that expected from average text… • In every paper I have seen the value of R is 1 – but why? • A hang-over from pencil and paper days? • Does it matter?
The Cost Function Matters • Different cost functions give different ‘results’. • Not very surprising in itself. • Most optimisation work ‘fiddles around’ with various cost functions until one is obtained that works well. • Effectiveness of cost function may depend on search technique used (e.g. genetic algorithms or simulated annealing)
Different Cost Functions Give Different Results I How to Plant a Trapdoor and Why You Might Not Get Away With It or… Secret Agents Leave Big Footprints
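Boolean Function Design • A Boolean function f, shown as a truth table with its polar form $\hat{f}(x) = (-1)^{f(x)}$:

  x        f(x)   polar f(x)   index
  0 0 0     1        -1          0
  0 0 1     0        +1          1
  0 1 0     0        +1          2
  0 1 1     0        +1          3
  1 0 0     1        -1          4
  1 0 1     0        +1          5
  1 1 0     1        -1          6
  1 1 1     1        -1          7

• Can use non-linearity as the cost function. • Or minimise a new cost function, where $\hat{F}(\omega)$ is the Walsh-Hadamard transform of the polar form: $\mathrm{Cost}(f) = \sum_{\omega} \bigl|\, |\hat{F}(\omega)| - (2^{n/2} + K) \,\bigr|^{R}$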
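The spectrum-based cost on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: it assumes F(w) denotes the Walsh-Hadamard transform of the polar (+1/-1) form of f, and the function names and the default values of K and R are hypothetical choices made for the example.

```python
def polar(truth_table):
    """Map 0/1 outputs to polar form: 0 -> +1, 1 -> -1."""
    return [(-1) ** bit for bit in truth_table]


def walsh_hadamard(truth_table):
    """Return F(w) = sum_x polar_f(x) * (-1)^(w.x) for every w (naive O(N^2))."""
    f_hat = polar(truth_table)
    size = len(f_hat)
    spectrum = []
    for w in range(size):
        total = 0
        for x in range(size):
            parity = bin(w & x).count("1") % 2  # dot product w.x over GF(2)
            total += f_hat[x] * (-1) ** parity
        spectrum.append(total)
    return spectrum


def nonlinearity(truth_table):
    """Hamming distance to the nearest affine function."""
    spectrum = walsh_hadamard(truth_table)
    return (len(truth_table) - max(abs(v) for v in spectrum)) // 2


def spectrum_cost(truth_table, n, K=0.0, R=3.0):
    """Cost(f) = sum_w | |F(w)| - (2^(n/2) + K) |^R; K and R are tunable."""
    target = 2 ** (n / 2) + K
    return sum(abs(abs(v) - target) ** R for v in walsh_hadamard(truth_table))


# The 3-input function from the slide, indexed 0..7.
F_EXAMPLE = [1, 0, 0, 0, 1, 0, 1, 1]
print(walsh_hadamard(F_EXAMPLE), nonlinearity(F_EXAMPLE))
```

Changing R or K reshapes the cost surface without changing which functions are ideal, which is why the choice of exponent is worth questioning.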
Uses and Abuses • Can use optimisation to maximise the non-linearity, minimise autocorrelation elements etc. • These are publicly recognised good properties that we might wish to demonstrate to others. • From an optimisation point of view one way of satisfying these requirements is as good as another. • But for a malicious designer this may not be the case. Who says that optimisation has to be used honestly???? • What’s to stop me creating Boolean functions or S-boxes with good public properties but with hidden (unknown) properties?
Planting Trapdoors • Can use these techniques to generate cryptographic elements with good public properties using an honest cost function • honestCost(x) • But can also try to hide useful (but privately known) properties using a malicious cost function • trapCost(x) • Now take a combination and do both at the same time • Cost(x) = (1-λ) honestCost(x) + λ trapCost(x) • Want λ as high as you can get away with for the next N years! The result must still possess the required good properties. • λ is the ‘malice factor’
Planting Trapdoors • Carried out some experiments to generate highly non-linear Boolean functions. • Wanted a technique that allowed new ‘trapdoor functionality’ to be inserted. • Didn’t know what new trapdoor functionality would look like (it would be specific to the rest of the cipher in which the function was used). • Clearly needs to be something other than closeness to a linear function (since this is diametrically opposite to the public property). • A good way to test the technique was to generate a random Boolean function g and require the eventual solution to be ‘close’ to it. • Combined cost, where g is the trapdoor function and λ is the malice factor: $\mathrm{Cost}(f) = (1-\lambda) \sum_{\omega} \bigl|\, |\hat{F}(\omega)| - (2^{8/2} - 4) \,\bigr|^{3} + \lambda \bigl( 256 - \sum_{x} f(x) g(x) \bigr)^{3}$
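A hedged sketch of how the combined honest-plus-trapdoor cost could be evaluated for 8-input functions. It assumes f and g are held as +1/-1 vectors of length 256, that F(w) is the Walsh-Hadamard transform, and that the trapdoor term is the cube of 256 minus the dot product of f and g; the function names are hypothetical.

```python
def walsh_spectrum(f_hat):
    """Naive Walsh-Hadamard transform of a +1/-1 vector."""
    size = len(f_hat)
    return [
        sum(f_hat[x] * (-1) ** (bin(w & x).count("1") % 2) for x in range(size))
        for w in range(size)
    ]


def honest_cost(f_hat, target=12.0, R=3.0):
    """Spectrum flatness cost; target 12 = 2^(8/2) - 4 as on the slide."""
    return sum(abs(abs(v) - target) ** R for v in walsh_spectrum(f_hat))


def trap_cost(f_hat, g_hat, R=3.0):
    """Zero when f equals the trapdoor function g, growing as they diverge."""
    agreement = sum(a * b for a, b in zip(f_hat, g_hat))
    return (len(f_hat) - agreement) ** R


def combined_cost(f_hat, g_hat, malice):
    """(1 - malice) * honest + malice * trapdoor, with malice in [0, 1]."""
    return (1.0 - malice) * honest_cost(f_hat) + malice * trap_cost(f_hat, g_hat)
```

With malice set to 0 the search is entirely honest; raising it pulls solutions towards g at the expense of the public criterion.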
Planting Trapdoors • Measured for each result: non-linearity, autocorrelation and MeanTrap. • MeanTrap = average dot product of the derived function with the particular trapdoor function. • 30 runs at each malice factor level λ.
Planting Trapdoors • Publicly good solutions, e.g. Boolean functions with the same very high non-linearity. • Publicly good solutions found by annealing with the honest cost function. • Publicly good solutions with high trapdoor bias, found by annealing with the combined honest and trapdoor cost functions. • There appears to be nothing to distinguish the sets of solutions obtained, unless you know what form the trapdoor takes! Or is there…
Vector Representations • Example design vector: (+1, -1, +1, +1, -1, +1, -1, -1). • Different cost functions may give similar goodness results but may do so in radically different ways. • Results using honest and dishonest cost functions cluster in different parts of the design space. • Basically distinguish using discriminant analysis. • If you don’t have an alternative hypothesis then you can generate a family of honest results and ask how probable the offered one is.
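As a rough illustration of asking how probable an offered design is, the sketch below projects +1/-1 design vectors onto a chosen direction and reports how many standard deviations the offered design lies from a family of honestly generated ones. This is a deliberately simplified stand-in for full discriminant analysis; the function names and the use of a z-score are assumptions made for the example.

```python
import statistics


def projection(design, direction):
    """Dot product of a design vector with a discriminant direction."""
    return sum(d * v for d, v in zip(design, direction))


def projection_z_score(offered, honest_family, direction):
    """Standardised distance of the offered design from the honest family."""
    honest = [projection(h, direction) for h in honest_family]
    mean = statistics.mean(honest)
    spread = statistics.stdev(honest)
    if spread == 0:
        raise ValueError("honest family does not vary along this direction")
    return (projection(offered, direction) - mean) / spread
```

A verifier could repeat this over many directions, including randomly chosen ones, and flag designs whose scores are consistently extreme.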
Games People Play • It seems possible to tell that something has been going on. • And we don’t need to know precisely what has been going on. • Since any design has a binary vector representation the technique is general. • Currently have only looked at simple properties of vector projections. More complex tests easily possible. • Myriad of further games you can play… • if you know the form of discriminant tests used you can build that knowledge into your dishonest cost function • develop an artefact with some dishonest bias but which passes the envisaged tests
More Games People Play • Some honest cost function families may give different characteristics for malicious as well as normal use • QUT use of non-linearity and my recent cost function • Both plausibly (obviously) honest • Could people with large amount of computing power use it to get an honest cost function that facilitates a particular malicious cost function? • I used power factor R=3.0? Why not 2.95????
More Games People Play • Said you can try to build non-detection into your cost functions. • This assumes that you know the discriminant tests used. • But the verifier has arbitrary choice, e.g. projection onto random discriminant vectors. • Passing a discriminant test is a random variable. • A malicious designer cannot protect against arbitrary choices. • Note we are looking to detect malicious insertion. Cannot protect against accidental possession of a malicious property.
More Games People Play • If you have a better optimisation technique than anyone else… • keep quiet about it • can trade off additional capability for increased malice factor λ.
Last Slide on This (Honest) • An optimisation based design process may be open and reproducible. • Optimisation can be used and abused. • Optimisation produces results with some regularity of structure. • Designs developed against different criteria just look different. • The games do not stop.
(Same and) Different Cost Functions Give Different Results II Serious Cryptanalysis with (Poor) Cost Functions
Cryptanalysis: Pointcheval’s Scheme • Zero knowledge protocol based on an NP-hard problem. • A and the histogram are public. • If you can recover the secret s then the system is broken. • Some suggested values for (m,n) are (101,117) and (131,147).
Pointcheval’s Scheme • Need a cost function to indicate how good an x-candidate vector y is. Examples of factors we might like to consider: • Non-negativity of Ay elements and histogram agreement • Could give negativity punishment of costNeg(y) = |-3| + |-1| = 4 • Could give histogram punishment of costHist(y) = |3-2| + |1-0| = 2 • Now take a weighted sum of these costs: cost(y) = w1 costNeg(y) + w2 costHist(y)
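The two punishments can be sketched as follows. The slide's underlying example vector is not preserved, so the vector used in the usage lines is hypothetical, picked only to reproduce the figures 4 and 2. The sketch also assumes the histogram comparison runs over the values that appear in the public target histogram; the function names are illustrative.

```python
from collections import Counter


def matvec(A, y):
    """Compute the image vector Ay for a list-of-rows matrix A."""
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def cost_neg(image):
    """Sum of magnitudes of the negative entries of Ay."""
    return sum(-v for v in image if v < 0)


def cost_hist(image, target_hist):
    """Sum over target values of |target count - observed count|."""
    observed = Counter(image)
    return sum(abs(count - observed[value]) for value, count in target_hist.items())


def cost(A, y, target_hist, w1=1.0, w2=1.0):
    """Weighted sum w1 * costNeg + w2 * costHist for candidate y."""
    image = matvec(A, y)
    return w1 * cost_neg(image) + w2 * cost_hist(image, target_hist)


# Hypothetical image vector and target histogram (value -> count).
example_image = [-3, -1, 1, 1, 5]
example_target = {1: 3, 3: 1, 5: 1}
print(cost_neg(example_image), cost_hist(example_image, example_target))
```

Both components are zero exactly when Ay is non-negative and matches the public histogram, so any positive weights leave the solved state at cost 0.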
Outline of Annealing 1 • Improving moves are always accepted. • Non-improving moves may be accepted probabilistically and in a manner depending on the temperature parameter Temp. Loosely: • the worse the move, the less likely it is to be accepted • a worsening move is less likely to be accepted the cooler the temperature • The temperature T starts high and is gradually cooled as the search progresses. • Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).
Outline of Annealing 2 • Current solution is x. • Generate neighbouring solution y. • Cost difference Δ = f(y) - f(x). • If Δ < 0 then accept the move (current = y); else accept if exp(-Δ/T) > U(0,1); else reject. • U(0,1) is a uniform (0,1) random variable.
Outline of Annealing 3 • Temperature schedule: T=100, T=80, T=64, T=..., T0. • Try 10000 moves at each temperature. • Search finishes when no progress has been made for some number QT of temperature cycles, or when some maximum number of cycles has been executed.
Simulated Annealing • A local search technique. • Current candidate x. • Temperature cycle: at each temperature consider 1000 moves. • Always accept improving moves. • Accept worsening moves probabilistically. • Gets harder to do this the worse the move. • Gets harder as Temp decreases.
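The annealing outline above can be sketched as a small generic routine. This is a minimal illustration under stated assumptions: geometric cooling by a factor of 0.8 (matching the 100, 80, 64 sequence), a caller-supplied neighbour function, and hypothetical parameter names such as patience for the number of cycles without progress. It also returns the best solution seen, which the slides do not specify.

```python
import math
import random


def anneal(initial, cost, neighbour, rng, t_start=100.0, alpha=0.8,
           moves_per_cycle=10000, max_cycles=50, patience=5):
    """Simulated annealing; returns (best solution, best cost)."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    temperature = t_start
    stale_cycles = 0
    for _ in range(max_cycles):
        improved = False
        for _ in range(moves_per_cycle):
            candidate = neighbour(current, rng)
            candidate_cost = cost(candidate)
            delta = candidate_cost - current_cost
            # Improving moves always pass; worsening ones pass with
            # probability exp(-delta / T).
            if delta < 0 or math.exp(-delta / temperature) > rng.random():
                current, current_cost = candidate, candidate_cost
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
                    improved = True
        stale_cycles = 0 if improved else stale_cycles + 1
        if stale_cycles >= patience:
            break
        temperature *= alpha  # geometric cooling
    return best, best_cost


def flip_one_bit(bits, rng):
    """Neighbour move for bit vectors: flip a single random position."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


# Toy usage: minimise the number of 1 bits in a 20-bit vector.
best, best_cost = anneal([1] * 20, sum, flip_one_bit, random.Random(1),
                         moves_per_cycle=200, max_cycles=40)
```

Passing the random generator in explicitly keeps runs reproducible, which matters when repeated runs are later compared with one another.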
Profiling Annealing (Analysis of Repeated Runs) • Simulated annealing can make progress with this scheme, typically getting solutions with around 80% of the vector entries correct (but we don’t know which 80%!). • Some efforts have been made to look at repeated runs of the annealing process, looking for commonality of elements in the results. Hopefully where the solutions agree they are correct (Knudsen and Meier). [Figure: Runs 1 to 6 compared with the actual secret, marking positions where all runs agree.]
Profiling Annealing (Analysis of Repeated Runs) • The runs may agree correctly. • The runs may agree incorrectly. [Figure: runs compared with the actual secret, marking positions where all runs agree rightly and where all agree wrongly.]
Profiling Annealing (Analysis of Repeated Runs) • Knudsen and Meier use repeated runs, fix the commonly agreed elements and get a new series of runs to obtain a new set of commonly agreed bits etc. • At the end some bits will be fixed wrongly but the problem of finding them is now within computational range. • For the smallest (101,117) problem.
Viewpoint Analysis • But look again at the cost function templates. • It’s as before but with two honest components. • Different weights w1 and w2 will give different results, yet the resulting cost functions seem plausibly well-motivated. • We can view different choices of weights as different viewpoints on the problem.
Radical Viewpoint Analysis • Take different viewpoints on the same problem, i.e. different cost functions • cost1(y) = 5 costNeg(y) + 1 costHist(y) • cost2(y) = 3 costNeg(y) + 3 costHist(y) • cost3(y) = 1 costNeg(y) + 5 costHist(y) • The cost surface is now different in each case but we still have • cost=0 => problem solved. • Now use these to converge on candidate solutions. • For suitably chosen functions, results typically have between 75% and 92% correct values. • Now consider those values on which they agree. By taking a large number of different cost functions you can reduce the number of values on which they agree wrongly almost to 0 (e.g. 30 cost functions get about 25% of the key right with almost no bits wrong). • Additional cost functions remove incorrect agreement (but may also reduce correct agreement).
Random Viewpoint Analysis • But what’s the rationale behind the choices for the weights? • cost1(y)=5 costNeg(y)+1 costHist(y) • cost2(y)=3 costNeg(y)+3 costHist(y) • cost3(y)=1 costNeg(y)+5 costHist(y) • They were chosen by me because the various sets looked different. • Actually since the actual cost functions don’t need to be good in themselves then they may as well be random… • cost1(y)=w11 costNeg(y)+ w12 costHist(y) • cost2(y)= w21 costNeg(y)+ w22 costHist(y) • cost3(y)= w31 costNeg(y)+ w32 costHist(y) • ……
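The viewpoint idea can be sketched in two small pieces: building randomly weighted cost functions, and keeping only those positions on which every resulting solution agrees. The weight range and the function names are assumptions made for the illustration; the slides do not say how the random weights are drawn.

```python
import random


def random_viewpoints(cost_neg, cost_hist, count, rng, low=0.5, high=5.0):
    """Return `count` cost functions w1*costNeg(y) + w2*costHist(y)."""
    viewpoints = []
    for _ in range(count):
        w1, w2 = rng.uniform(low, high), rng.uniform(low, high)
        # Bind the weights as defaults so each closure keeps its own pair.
        viewpoints.append(
            lambda y, w1=w1, w2=w2: w1 * cost_neg(y) + w2 * cost_hist(y)
        )
    return viewpoints


def agreed_positions(solutions):
    """Map index -> value for every position on which all solutions agree."""
    agreed = {}
    for index, values in enumerate(zip(*solutions)):
        if all(v == values[0] for v in values):
            agreed[index] = values[0]
    return agreed
```

Each viewpoint would be handed to the annealer in turn, and `agreed_positions` applied to the resulting candidate vectors. Because the weights are strictly positive, every viewpoint still has cost 0 exactly at a solution.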
Thermo-statistical Annealing The Power of Watching Paint Dry
Thermo-statistical Annealing • Suppose now you have a binary 0-1 (+1,-1) problem of say 100 bits. • Assume the move strategy is simply a bit flip. • In a temperature cycle with 10000 moves each bit will be given on average 100 opportunities to change value. • Some strange things happen if you watch the values taken by the variables within a temperature cycle. • As the process cools some variables seem increasingly keen to take on particular values (either 0 or 1). • E.g. the first bit variable may spend 95% of the cycle taking the value 1. Thus, it seems reluctant to take the value 0 and when it does so seems very ready to swap back to 1. • For various binary problems it is found that if a variable exhibits this behaviour it will generally take the preferred value at the end of the search. • Accept the inevitable and fix the variable at the preferred value when the 95% threshold is reached. Now spend the rest of the time changing the other non-fixed variables.
Thermo-statistical Annealing • Intended primarily as a way of achieving annealing more efficiently. • can reduce the number of moves within a temperature cycle as variables are fixed. • I find it better to simply use the extra time on the remaining variables (i.e. get closer to thermal equilibrium)
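The fixing rule can be sketched as a post-processing step on one temperature cycle. The sketch assumes the annealer records the current 0/1 vector after every move; the function name and the way samples are stored are hypothetical.

```python
def bits_to_fix(samples, threshold=0.95, already_fixed=()):
    """Return index -> preferred value for bits that held one value for at
    least `threshold` of the recorded states in a temperature cycle."""
    total = len(samples)
    fixed = {}
    for index in range(len(samples[0])):
        if index in already_fixed:
            continue
        ones = sum(state[index] for state in samples)
        zeros = total - ones
        if ones / total >= threshold:
            fixed[index] = 1
        elif zeros / total >= threshold:
            fixed[index] = 0
    return fixed
```

Once a bit is returned here, the move generator would simply stop proposing flips for it, so the remaining moves in each cycle are spent on the variables that are still undecided.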
But Why? • Why does this work? • Why should a variable exhibit clear tendency to a preferred value? • Obvious answer is because it is very difficult for it not to do so. There is something about the problem instance that drives it in this direction. • Could it be that this is because it is the correct value?
Thermo-statistical Trajectories • Yes. • The search process wants to take those values because THEY ARE THE CORRECT ONES. • With certain cost functions and problems the FIRST 50% OF VARIABLE VALUES FIXED IN THIS WAY ARE CORRECT. • Thus, within a few minutes you have half the key. Not always this successful but most cost functions and problems I have used give 25%+ initial correctness. • Can use about 8 different cost functions and typically one of those 8 will have 40%+ initially fixed bits correct (but you do not know which one).
Evolving Protocols • Recent IEEE S&P Oakland paper using genetic algorithms to evolve abstract protocols (with proofs!). • Fitness function is based on the number of stated goals met at each message. • Random bit strings can be decoded as protocols expressed in the BAN-logic formalism and executed. • When a receiver gets a message he uses BAN inference rules to update his belief state according to what he knows already and what is in the message. • This is a form of abstract execution.
Quantum Search • Problem: You are asked to maximise f(x)=x over 0..1000000. • Do you • use hill climbing • use quantum search • say, “you are obviously an academic, the answer is obvious” • Quantum search is an awfully inefficient way to get to the answer because it does not exploit structure. It essentially is a form of spruced up brute force search. • Find a solution x such that predicate(x) holds • Calculates in parallel predicate(x) for all x.
Quantum Search • If there are many solutions that satisfy the predicate then quantum search will find one much more quickly and will select randomly between them. • Virtually all QS uses are blunt brute force. • But why not use structure in a problem? Get QS to produce solutions efficiently that are good in some way and then hill-climb.
More Optimisation Work • Do Genetic Cryptanalysis Programming • evolve programs to leak information (not just static approximations) • Do Genetic Quantum Programming • Unitary transformations are programming language statements for quantum computing. • Represented by unitary matrices. Can evolve strings of matrices to represent a computation and simulate for small machines. Use this as a means of learning new quantum algorithms. • Possibilities to evolve new quantum cryptanalytic approaches • Try to plant keyed trapdoors in more complex artefacts. • Statistical profiling of traditional optimisation techniques – potentially a very rich seam to mine (both in analysis and design).
A Tease • I do not think it will be long before optimisation techniques start making inroads, in some surprising ways, into major cryptanalysis.