Malevolution: The Evolution of Evasive Malware
Giovanni Vigna
Department of Computer Science, University of California, Santa Barbara
http://www.cs.ucsb.edu/~vigna
Lastline, Inc.
http://www.lastline.com
Well, I had it all planned out… until this guy came out with his story!
Malware can take many forms…
Who Is He?
• One of the top security researchers in Europe
  • Hire him!
• Came to Berlin’s airport
• A guy told him he was in the right taxi line
• “Hey, you don’t have a display with the money”
• “Do not worry: the German government is creating a taxi-tracking program based on GPS so that no taxi driver needs a billing device: awesome!!!”
• Nick: “GPS?!? Tracking!?! No money!?! Awesome!!!!”
• The scam cost Nick 200 EUR (the normal charge would be 30)
Cyberattack (R)Evolution
[Figure: damage in dollars (hundreds, thousands, hundreds of thousands, millions, billions) plotted against time, with three successive stages: cybervandalism, cybercrime, and targeted attacks and cyberwarfare]
Nobody Is Safe… Targeted attacks are mainstream news. Every week, new breaches are reported. In the last few months alone …
Drive-by-download Attack
[Diagram labels:
• Sites: www.bank.com, www.semilegit.com, www.badware.com, www.grayhat.com, www.evilbastard.com
• Injection request: POST /update?id=5','<iframe>..')--
• Injected markup: <iframe src="http://semilegit.com" height="0" width="0"></iframe>
• Stolen data: ID/password, personal data, docs]
Arms Race(s)
[Diagram labels, two parallel tracks:
• Binaries: malicious binary; signature-based anti-virus; obfuscated polymorphic malicious binary; behavior-based anti-malware (sandbox); evasive malicious binary
• Web: malicious JavaScript; signature-based web gateways; obfuscated polymorphic malicious JavaScript; behavior-based anti-malware (honeyclient); evasive malicious JavaScript]
An Evasion Framework
[Diagram labels:
• Producer: creates the artifact, with its provenance
• Analysis system: labels/blocks, using known malicious artifacts and provenance and known benign artifacts and provenance
• Consumer: activates
• Target system: executes/displays]
An Evasion Framework
[Table not recovered; surviving footnote: (*) First downloader]
PBKAC: Make the user smarter
• Evasion of the user’s good judgment
  • (SPAM: please don’t go!)
  • PHISHING: educate about provenance
  • MALWARE INSTALLS: educate about fake AV, codecs
    • The “Can I haz kittens?” problem
  • MALICIOUS DOC: don’t open (good luck with that)
    • Anything with “budget”, “salary”, etc. WILL BE OPENED
Harden The Target
• Evasion of the mechanisms to limit/control execution
  • Windows 2023 Ultimate Edition will be able to identify things that just should not be executed
  • MS Office Professional 56.2 will actually prevent documents from executing arbitrary code
  • Internet Explorer 23 will detect memory corruption attacks
Analysis Systems
• Evasion of detection/labeling
  • Determine if an artifact is malicious based on previous history
  • Leverage both static and dynamic analysis
  • Additional information can be leveraged if other components need to be evaded as well
Evading Static Analysis
• Static analysis techniques can be evaded by making the (relevant) code unavailable
  • Packing
  • Delayed inclusion of code
• Static analysis techniques can be evaded by exploiting differences in the parsing capabilities of the target system vs. the analysis system
  • Parsing the executable (target is the OS)
  • Parsing the document (target is the office application)
Evading Static Analysis
Source: Binary-Code Obfuscations in Prevalent Packer Tools, Tech Report, University of Wisconsin, 2012
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by fingerprinting the environment (and not executing)
  • Detection of a modified environment (instrumented libs)
  • Detection of specific HW/SW configurations
    • Devices
    • Users
    • File names
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by exploiting differences in the execution capabilities of the target system vs. the analysis system
  • Semantics (virtualization/emulation introduces differences)
  • Speed (dynamic systems are usually slower)
  • Available resources (analysis has a finite, limited time)
    • Sleeping
    • Stalling loops
    • User activity monitoring
Evading Dynamic Analysis • Dynamic evasion – stalling loops
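The time-budget asymmetry behind sleeping and stalling loops can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the step list, the `analyze` function, the action names, and the two-minute budget are all hypothetical, and time is simulated rather than measured.

```javascript
// Toy model (hypothetical): a sample is a list of steps, each with a simulated
// cost in milliseconds and an optional observable action.
function analyze(steps, budgetMs) {
  let clock = 0;
  const observed = [];
  for (const step of steps) {
    // The observer stops watching once its time budget is exhausted.
    if (clock + step.costMs > budgetMs) break;
    clock += step.costMs;
    if (step.action) observed.push(step.action);
  }
  return observed;
}

const evasiveSample = [
  { costMs: 10, action: "read_config" },
  { costMs: 600000, action: null }, // sleep or stalling loop: ten simulated minutes
  { costMs: 5, action: "download_payload" },
];

// A sandbox with a finite analysis window versus a victim host with no deadline.
const sandboxView = analyze(evasiveSample, 120000);
const victimView = analyze(evasiveSample, Infinity);

console.log(sandboxView); // the stall outlasts the window, so the payload step is never seen
console.log(victimView);
```

In this model the sandbox never observes the step that follows the stall, which is the effect the slide attributes to limited analysis time.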
Combating Evasion
• Static analysis
  • Use availability and parsing failures as a signal for detection
    • Benign software is packed
    • Benign software is obfuscated
    • Artifacts are often generated incorrectly for benign reasons
  • Modify the sample to make it harmless
    • Normalize
    • Remove functionality that cannot be analyzed
    • Might break functionality
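Treating parsing failures and unavailable code as signals, rather than verdicts, might look like the sketch below. The function name `staticSignals`, the signal labels, and the `eval` regular expression are assumptions made for illustration. Because benign scripts can also be malformed or obfuscated, the result is a list of hints, not a classification.

```javascript
// Hypothetical helper: collects weak static signals for a JavaScript source string.
function staticSignals(source) {
  const signals = [];
  try {
    // The Function constructor compiles the text as a function body
    // but does not call it, so the script is parsed and never run.
    new Function(source);
  } catch (e) {
    if (e instanceof SyntaxError) {
      signals.push("parse_failure");
    } else {
      throw e;
    }
  }
  // Crude, easily evaded hint that some code only exists at run time.
  if (/\beval\s*\(/.test(source)) {
    signals.push("code_generated_at_runtime");
  }
  return signals;
}

console.log(staticSignals("var a = 1;"));
console.log(staticSignals("var a = ;"));
console.log(staticSignals("eval('x')"));
```

An aliased call such as `F = eval; F(...)` defeats the regular expression, which is one reason such signals are combined with dynamic analysis.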
Combating Evasion
• Dynamic analysis
  • Reduce differences between analysis and target environment
    • Run on bare metal
    • Exploit hardware-supported virtualization
    • Use out-of-the-VM instrumentation
  • Detect environment checks
    • Identify conditional execution based on triggers
    • Return non-static information about the environment
  • Modify the sample to make it run
    • Multipath execution
Combating Evasion
• Exploit the characteristics of multiple evasions
• Phishing pages need to evade detection by the analysis system AND by the user
  • If the page does not look like the impersonated organization, the attack will fail
• Malicious documents need to evade detection by the analysis system, the target platform, AND the user
  • If the attachment does not look interesting, it will not be activated
[Diagram labels:
• The Internet: http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it, an exploit site, a C&C site
• Components: Crawler, EvilSeed (Feature Extractor, Terms Extractor), Prophiler, Honeyclient cloud, Anubis, Wepawet, Public Portal, Threat Intel, Block
• Page categories: benign pages, possibly malicious pages, malicious pages]
A Few Stats
• ANUBIS
  • Number of unique IPs that submitted to Anubis: 433,290
  • Number of files analyzed by Anubis: 59,199,463 (unique files: 45,730,419)
  • Registered users: 25,404
• WEPAWET
  • Number of unique IPs that submitted to Wepawet: 141,463
  • Number of pages visited and analyzed by Wepawet: 67,424,459
  • Number of pages identified as malicious: 2,239,335
An Example: Detecting Split Personalities
• Detect when a malware sample exhibits multiple personalities
• Signature-based techniques are impractical
• Behavior-based techniques seem more promising...
• Different behaviors are reliable indicators for split personalities
The Idea
• Definition: Two systems are execution-equivalent if all programs start with the same initial state and receive exactly the same inputs
  • “Initial state” means the same OS components, and memory and registers initialized with the same values
  • “Same inputs” means that access to disk, network, registry, time, and IPC returns the same values
• Hypothesis: When a program is executed in two execution-equivalent systems, it should exhibit the same behavior
  • “Same behavior” means the same output and the same sequence of system calls
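The strict notion of “same behavior” defined above (identical output and identical system call sequence) can be written down directly. A minimal sketch, assuming a hypothetical trace shape of `{ output, syscalls }` where `syscalls` is an array of call names; the names used are illustrative strings only.

```javascript
// Returns true when two recorded behaviors match exactly:
// same output and same system calls in the same order.
function sameBehavior(a, b) {
  if (a.output !== b.output) return false;
  if (a.syscalls.length !== b.syscalls.length) return false;
  return a.syscalls.every((name, i) => name === b.syscalls[i]);
}

// Hypothetical traces of one sample on two execution-equivalent systems.
const referenceRun = { output: "", syscalls: ["NtOpenFile", "NtWriteFile", "NtClose"] };
const analysisRun = { output: "", syscalls: ["NtOpenFile", "NtClose"] };

// A mismatch is the indicator the hypothesis calls a split personality.
console.log(sameBehavior(referenceRun, referenceRun));
console.log(sameBehavior(referenceRun, analysisRun));
```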
Split Personalities
• A program that has different behavior on two execution-equivalent systems implies that:
  • Some instruction yielded some observable effects
  • The program used (intentionally or not) these effects to follow a different execution path
  • This is likely the consequence of an attack based on CPU semantics or timing
• The hard part is providing exactly the same inputs…
• Efficient Detection of Split Personalities in Malware. Davide Balzarotti, Marco Cova, Christoph Karlberger, Christopher Kruegel, Engin Kirda, Giovanni Vigna, in Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2010.
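The Approach: Log and Replay
[Diagram labels:
• Reference system: Windows, log driver, (malware) sample
• Syscall log, passed from the reference system to the analysis system
• Analysis system: Windows, replay driver, (malware) sample
• Outcome: split personality]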
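The log-and-replay idea can be sketched in a few lines. This is a toy model, not the kernel drivers from the talk: `makeLogDriver`, `makeReplayDriver`, the call names `getTime` and `writeFile`, and the `cpu.quirkDetected` flag are all hypothetical. The flag stands in for an effect of CPU semantics or timing that is not a system call input.

```javascript
// Reference system: perform the real calls and record name, arguments, result.
function makeLogDriver(realCalls) {
  const log = [];
  const invoke = (name, ...args) => {
    const result = realCalls[name](...args);
    log.push({ name, args, result });
    return result;
  };
  return { invoke, log };
}

// Analysis system: answer each call from the log and notice any deviation.
function makeReplayDriver(log) {
  let position = 0;
  let mismatch = false;
  const invoke = (name, ...args) => {
    const entry = log[position];
    const matches = entry && entry.name === name &&
      JSON.stringify(entry.args) === JSON.stringify(args);
    if (!matches) {
      mismatch = true;
      return undefined;
    }
    position += 1;
    return entry.result;
  };
  // Diverged if a call did not match, or if logged calls were never made.
  return { invoke, hasDiverged: () => mismatch || position !== log.length };
}

// Hypothetical sample: stays quiet when it notices an emulation artifact.
function sample(sys, cpu) {
  const now = sys("getTime");
  if (cpu.quirkDetected) return;
  sys("writeFile", "payload.bin", now);
}

const logger = makeLogDriver({ getTime: () => 1000, writeFile: () => true });
sample(logger.invoke, { quirkDetected: false }); // run on the reference system

const replayer = makeReplayDriver(logger.log);
sample(replayer.invoke, { quirkDetected: true }); // run on the analysis system
console.log(replayer.hasDiverged());
```

Because every system call input is replayed from the log, a divergence here can only come from something outside those inputs, which is what the approach uses as its indicator.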
Some Caveats
• Not everything can be replayed
  • Some operations have results that must be consistent with the internal state of the operating system
    • Memory allocation
  • Some operations use handles that were created by pass-through system calls
• The definition of “same behavior” needs to be relaxed to tolerate small, temporary deviations
An Example: Wepawet and Revolver
• State of the art in honeyclients
  • High-interaction honeyclients visit web pages and record modifications to the underlying system (file system, registry, processes)
  • Unexpected changes are attributed to attacks
• Limitations
  • Defenders need to know in advance the components that will be targeted by attacks
  • Configuration can be complex and incomplete
  • Some of the vulnerable components are incompatible with each other
  • Limited explanatory power
Wepawet
• Characterizes the behavior of the browser as it visits web pages
  • Monitors events that occur during the visit
  • Characterizes properties of these events with features
  • Uses statistical models to determine if feature values are normal or anomalous
• In the training phase, learns the characteristics of benign pages
• In the detection phase, flags as suspicious pages that result in anomalous behavior
• Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code. Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
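The train-on-benign, flag-the-anomalous workflow can be illustrated with a deliberately simple statistical model. This is a sketch, not Wepawet's actual models, which the slides do not specify: the per-feature mean and standard deviation, the threshold of 3, and the feature names `bytesAllocated` and `dynamicExecs` are all assumptions.

```javascript
// Training phase: learn mean and variance of each feature from benign pages.
function train(benignSamples) {
  const model = {};
  for (const feature of Object.keys(benignSamples[0])) {
    const values = benignSamples.map((s) => s[feature]);
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, v) => a + (v - mean) ** 2, 0) / values.length;
    model[feature] = { mean, variance };
  }
  return model;
}

// Detection phase: report features whose value lies more than `threshold`
// standard deviations from the benign mean.
function anomalousFeatures(model, sample, threshold = 3) {
  return Object.keys(model).filter((feature) => {
    const { mean, variance } = model[feature];
    const std = Math.sqrt(variance) || 1; // fall back to 1 for constant features
    return Math.abs(sample[feature] - mean) / std > threshold;
  });
}

// Hypothetical feature values for four benign pages and one suspicious page.
const benignPages = [
  { bytesAllocated: 1000, dynamicExecs: 0 },
  { bytesAllocated: 1200, dynamicExecs: 1 },
  { bytesAllocated: 900, dynamicExecs: 0 },
  { bytesAllocated: 1100, dynamicExecs: 1 },
];
const pageModel = train(benignPages);
const flagged = anomalousFeatures(pageModel, { bytesAllocated: 50000000, dynamicExecs: 1 });
console.log(flagged); // the allocation volume stands out; the eval count does not
```

Reporting which features are anomalous, rather than a bare verdict, is what gives this style of detection more explanatory power than watching for system modifications.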
Wepawet Features
• Exploit preparation
  • Number of bytes allocated (heap spraying)
  • Number of likely shellcode strings
• Exploit attempt
  • Number of instantiated plugins and ActiveX controls
  • Values of attributes and parameters in method calls
  • Sequences of method calls
• Redirections and cloaking
  • Number and target of redirections
  • Browser personality- and history-based differences
• Obfuscation
  • String definitions/uses
  • Number of dynamic code executions
  • Length of dynamically-executed code
Wepawet Extensions
• PDF analyzer
  • Analyzes the JavaScript within PDF documents
• Flash component analyzer
  • Uses execution tracing to identify both malicious behavior and other network endpoints
• Java Applet analyzer
  • Uses execution tracing to identify known exploits
• Shellcode analyzer
  • Uses emulation to extract URLs pointing to additional malware
0-day Detection
• “Aurora” attack
  • 0-day exploit against IE6
  • Use-after-free vulnerability
  • Successfully compromised Google and other companies
  • Posted to Wepawet before having been made public
  • Soon after incorporated into Metasploit
Practical Impact
• Routinely used for take-down requests and further analysis
• Used to generate a blacklist of malicious sites
Revolver: Detecting Evasions in Web-based Malware
• Providing an oracle available to the public has drawbacks
  • Malware can be tested before deployment
  • Exploitation of discrepancies leads to failed detection
• Revolver: An Automated Approach to the Detection of Evasive Web-based Malware. A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
Evasion: Scope Handling

    function foo() {
      ...
      // W6Kh6V5E4 is filled with non-alphanumeric data
      Bm2v5BSJE = "";
      W6Kh6V5E4 = W6Kh6V5E4.replace(/\W/g, Bm2v5BSJE);
      ...
      // W6Kh6V5E4 now contains valid JavaScript
    }

    function foo() {
      ...
      var enryA = mxNEN + F7B07;
      F7B07 = eval;
      {}
      enryA = F7B07('enryA.rep' + 'lace(/\\W/g,CxFHg)');
      ...
    }
Evasion: Interpreter Idioms

    OlhG = 'evil_code'
    wTGB4 = eval
    wTGB4(OlhG)

    OlhG = 'evil_code'
    wTGB4 = "this"["eval"] // Only works in Adobe's JS
    wTGB4(OlhG)
Evasion: Exception Paths

    function deobfuscate() {
      ... // Define variable xorkey
          // and compute its value
      for (...) {
        ... // XOR decryption with xorkey
      }
      eval(deobfuscated_string);
    }
    try { eval('deobfuscate();') }
    catch (e) { alert('err'); }

    function deobfuscate() {
      try {
        ... // is variable xorkey defined?
      }
      catch (e) { xorkey = 0; }
      ... // Compute value of xorkey
      VhplKO8 += 1; // throws exception first time
      for (...) {
        ... // XOR decryption with xorkey
      }
      eval(deobfuscated_string);
    }
    try { eval('deobfuscate();') } // 1st call
    catch (e) {
      // Variable VhplKO8 is not defined
      try {
        VhplKO8 = 0; // define variable
        eval('deobfuscate();'); // 2nd call
      }
      catch (e) { alert('err'); }
    }
Evasion: Liberal Configuration

    var nop = "%uyt9yt2yt9yt2";
    var nop = (nop.replace(/yt/g, ""));
    var sc0 = "%ud5db%uc9c9%u87cd...";
    var sc1 = "%"+"yutianu"+"ByutianD"+ ...;
    var sc1 = (sc1.replace(/yutian/g, ""));
    var sc2 = "%"+"u"+"54"+"FF"+ "%u"+"BE"+...+"A"+"8"+"E"+"E";
    var sc2 = (sc2.replace(/yutian/g, ""));
    var sc = unescape(nop+sc0+sc1+sc2);

    try {
      new ActiveXObject("yutian");
    } catch (e) {
      var nop = "%uyt9yt2yt9yt2";
      var nop = (nop.replace(/yt/g, ""));
      var sc0 = "%ud5db%uc9c9%u87cd...";
      var sc1 = "%"+"yutianu"+"ByutianD"+ ...;
      var sc1 = (sc1.replace(/yutian/g, ""));
      var sc2 = "%"+"u"+"54"+"FF"+ "%u"+"BE"+...+"A"+"8"+"E"+"E";
      var sc2 = (sc2.replace(/yutian/g, ""));
      var sc = unescape(nop+sc0+sc1+sc2);
    }
Detecting Evasion: Challenges
• Code is obfuscated
• Code is generated on-the-fly
• Code might probe for arcane versions of a browser
• Not all code changes are relevant
Revolver
[Diagram labels:
• Inputs: pages from the Web, labeled by an oracle
• ASTs (example nodes: IF, VAR, <=, NUM)
• Similarity computation, producing candidate pairs {bi, mj}
• Outputs: malicious evolution, data-dependency, JavaScript infections, evasions]
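One possible way to score the similarity of two scripts is a normalized edit distance over the sequences of their AST node types. The sketch below is an illustration under assumptions: flattening an AST into an array of labels, the `CALL` label, and the normalization by the longer sequence are choices made here and are not taken from the slides.

```javascript
// Levenshtein distance between two arrays of AST node labels.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical sequences.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Hypothetical pair: the second script wraps the first in an added check.
const maliciousNodes = ["VAR", "NUM", "CALL"];
const benignLookingNodes = ["IF", "<=", "VAR", "NUM", "CALL"];
console.log(editDistance(maliciousNodes, benignLookingNodes));
console.log(similarity(maliciousNodes, benignLookingNodes));
```

A pair that is highly similar yet labeled differently by the oracle is the kind of candidate pair {bi, mj} the diagram refers to.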
Optimizations
• The comparison step requires determining the edit distance between n benign scripts and m malicious scripts, which is usually infeasible
• We eliminate duplicate ASTs
• We compute sequence summaries, which are vectors with the frequencies of the 88 possible operations
• We extract the k nearest neighbors among the sequence summaries and apply the similarity computation to the associated ASTs
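The prefiltering step can be sketched as follows: each script is reduced to a small frequency vector, and only the k closest vectors are kept for the expensive comparison. This is a sketch under assumptions: the five-label vocabulary stands in for the 88 operations mentioned above, and relative frequencies with Euclidean distance are choices made here for illustration.

```javascript
// Hypothetical vocabulary; the slides mention 88 possible operations.
const NODE_TYPES = ["IF", "VAR", "NUM", "<=", "CALL"];

// Sequence summary: relative frequency of each node type in the script.
function summarize(sequence) {
  const counts = new Array(NODE_TYPES.length).fill(0);
  for (const node of sequence) {
    const index = NODE_TYPES.indexOf(node);
    if (index >= 0) counts[index] += 1;
  }
  const total = sequence.length || 1; // avoid dividing by zero for empty scripts
  return counts.map((c) => c / total);
}

function euclidean(u, v) {
  return Math.sqrt(u.reduce((sum, x, i) => sum + (x - v[i]) ** 2, 0));
}

// Indices of the k candidates whose summaries are closest to the query's.
function nearestNeighbors(query, candidates, k) {
  const q = summarize(query);
  return candidates
    .map((sequence, index) => ({ index, distance: euclidean(q, summarize(sequence)) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k)
    .map((entry) => entry.index);
}

const benignScript = ["VAR", "NUM", "CALL"];
const maliciousScripts = [
  ["IF", "IF", "IF", "<="],
  ["VAR", "NUM", "CALL", "IF"],
  ["NUM", "NUM", "NUM"],
];
const neighbors = nearestNeighbors(benignScript, maliciousScripts, 2);
console.log(neighbors); // only these candidates would go on to the full comparison
```

Comparing fixed-length vectors is much cheaper than computing an edit distance for every benign and malicious pair, which is the point of the optimization.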
Classification
• Data-dependency: categorizes script differences that are associated with transforming data into code
  • The same packer usually produces different code: if the generating code is the same and the generated code is very different, do not flag as evasion
• Injection: categorizes script differences that are due to the addition of code to a previously-benign script
  • A site gets compromised and the attacker adds code to well-known JavaScript libraries (e.g., jQuery)
• Evasion: categorizes script differences that are mostly composed of control-flow nodes added to the previously-malicious script
  • Control-flow decisions are made to avoid executing the malicious functionality
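The three categories above can be expressed as a small rule-based classifier. This is a hedged sketch: the shape of the `diff` object, the control-flow label set, and the 0.5 cut-off used for “mostly” are assumptions introduced for illustration, and differences that match no rule are simply returned as "other".

```javascript
// Hypothetical set of AST labels that count as control flow.
const CONTROL_FLOW = new Set(["IF", "TRY", "CATCH", "FOR", "WHILE"]);

// diff.addedTo says which script of the pair gained nodes:
// "benign" (previously-benign script) or "malicious" (previously-malicious script).
function classifyPair(diff) {
  // Same unpacking code, different unpacked code: a packer at work, not an evasion.
  if (diff.sameGeneratingCode && diff.generatedCodeDiffers) return "data-dependency";

  const added = diff.addedNodes;
  if (added.length === 0) return "other";

  if (diff.addedTo === "benign") return "injection";

  if (diff.addedTo === "malicious") {
    const controlFlow = added.filter((node) => CONTROL_FLOW.has(node)).length;
    // "Mostly control flow" is approximated by a simple majority.
    return controlFlow / added.length > 0.5 ? "evasion" : "other";
  }
  return "other";
}

console.log(classifyPair({ sameGeneratingCode: true, generatedCodeDiffers: true, addedNodes: [], addedTo: null }));
console.log(classifyPair({ addedTo: "benign", addedNodes: ["VAR", "CALL"] }));
console.log(classifyPair({ addedTo: "malicious", addedNodes: ["TRY", "CATCH", "VAR"] }));
```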
Evaluation: Evasion
• Collected 6,468,623 pages, of which 265,692 were malicious
• Extracted 20,732,766 benign scripts and 186,032 malicious scripts
• Derived 705,472 unique ASTs and 55,701 malicious ASTs
• For each benign AST, found ~70 malicious neighbors
• Computed 208K candidate pairs
  • 6,996 Injections (701 classes)
  • 101,039 Data dependencies (475 classes)
  • 4,147 Evasions (155 classes)
  • 2,490 Evolutions (273 classes)