Improving Security Through Software De bloating

Improving Security Through Software De bloating 1 March 2019 Michael D. Brown Research Scientist, GTRI

Agenda • The Software Bloat Problem • Software Debloating as a Solution • Effects of Software Debloating on Security • Code Reuse Attacks • Security Improvement Measures • How are these effects measured? • Are these the right measures?

Introduction Software security is a complex arms race: • New attack techniques spur research into new defense mechanisms • New defenses require attackers to construct more complex methods to evade them This isn’t going anywhere anytime soon; Software security is an undecidable problem. In this talk, we will explore a proactive method for improving security.

What is Software Bloat? Modern software engineering practices favor software and systems that are: • Modular • Reusable • Feature Rich This helps engineers rapidly develop complex and widely deployable software. However, it comes at a cost – when software is deployed and executed it contains large portions of code that will never be used. This is software bloat.

Sources of Software Bloat - Vertical Software bloat occurs vertically throughout the software stack, primarily at layers of abstraction. Example: glibc modularizes a large number of common C functions via an API. It is very unlikely that a program linking this library uses all of its functions, but the entire library is loaded at execution time.

Sources of Software Bloat - Lateral Software bloat occurs laterally within a software package due to feature creep. Example: cURL supports data transfer via 23 different protocols: DICT, FILE, FTP, FTPS, Gopher, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, Telnet and TFTP Common users of cURL or libcurl are unlikely to full utilize this level of feature variety.

Prevalence of Software Bloat • A recent study of vertical software bloat by Quach et al [1] found: • Across 12 commonly used programs such as Firefox, Chrome, VLC, and Sublime, only 65% of instructions and 32% of functions found in dependent libraries are actually required. • Commonly used features in these programs require a relatively small portion of the total instructions present in the programs and libraries. For example: • Playing an audio file in VLC requires only 12% of the overall instructions. • Creating, composing, and saving a file in Sublime requires only 27% of the overall instructions. • Fetching and displaying 10 popular websites in Firefox requires only 29% of the overall instructions.

Negative Effects of Software Bloat Performance: Increases code size / complexity Increases memory footprint Increases execution time (not always significant) Increases power consumption Security: Harder to analyze with bug finding tools Bloat code may contain reachable attack vectors / vulnerabilities Bloat code may increase the overhead of security defenses Bloat code can potentially be used in a code reuse attack

Background: Code Re-use Attacks • Code Reuse attacks are a class of exploits that circumvent Write XOR Execute defenses by reusing existing executable code in the target program. • Early attacks exploited memory corruption bugs (such as buffer overflow) to redirect the execution of the program to useful functions in libc. (return-into-libc) • These attacks were limited in expressivity – the attacker could not inject arbitrary code.

Gadget Based Code Reuse: Return Oriented Programming (ROP) • In 2007, Shacham [7] proposed the first gadget-based code reuse attack method, called Return-Oriented Programming (ROP). • ROP overcomes the expressivity limits of return-into-libc by chaining multiple snippets of code called “gadgets” together to construct a malicious payload. • Gadgets are used like complex instructions, and the the total set of gadgets available to the attacker is their ISA. • Gadgets end in a return instruction, which the attacker can exploit to maintain control flow.

Gadget Based Code Reuse: Jump Oriented Programming (JOP) • In response, researchers developed a large number of defenses against ROP attacks, aimed primarily at defending the stack. • In response, JOP [8] was introduced. JOP uses gadgets ending in indirect jump and call instructions to maintain control flow instead of relying on return instructions and the stack. • JOP requires the use of a special purpose gadget, called a dispatcher that handles control flow. • COP – call only derivative of JOP.

Software Bloat and Gadgets • Bloat code, whether it comes from unnecessary library functions or extraneous program features, is loaded into program memory at run time. • This bloat code is useless to the user executing the program, but contributes to the total set of gadgets available to the attacker. • The attacker can use these gadgets in the construction of their exploits.

Software Debloating In response to this problem, researchers are developing ways to debloat software. Software debloating is a type of software transformation, and can be generalized as follows: P’ = Debloat(P, context) Stated plainly, debloating takes as input a package P and the context in which P is intended to be deployed, and produces a variant P’ that contains just enough functionality to satisfy the specified context.

Software Debloating Software debloating can be performed at many different points in the software lifecycle: Original Source Code Debloat Code in Memory Debloat Binary via Rewriting Debloat Intermediate Representation (IR) Debloat Source Code Preprocessor Middle End Loader Compiler Front End Compiler Back End Linker Package Binary

Overview of Software Debloating Techniques - CHISEL CHISEL [2] is an automated debloating tool that targets unnecessary feature code. It takes as input a test script written by the user that expresses the desired functionality of a debloated variant. This test is used as part of a feedback directed program reduction method based on delta debugging principles.

Overview of Software Debloating Techniques - TRIMMER TRIMMER [3] is an automated debloating tool that targets unnecessary feature code. It takes as input a user defined static package configuration that expresses the compile time configuration of the package. Configuration data is treated as a compile time constant, and is propagated throughout the program. This is followed by custom, aggressive compiler optimizations to prune functionality from the program.

Overview of Software Debloating Techniques - PCL Piece-wise Compilation and Loading (PCL) [5] is an automated debloating technique that targets software bloat originating from shared libraries (described as load time dead code elimination). During compilation, the piecewise compiler generates an external dependency graph which is used at run time to eliminate code bloat in memory. Memory locations that contain bloat are marked non-executable, preventing them from being used in a code re-use attack.

Claims of Positive Effects of Software Debloating on Security • Reduces the number of code reuse gadgets • Less code means fewer pieces of code the attacker can stitch together in gadget based code reuse attacks • Moving target defense / software diversity • Debloating creates new variants of the original package • May render information about the original package useless • Vulnerability Elimination • If code is removed that contains a reachable vulnerability, the attacker cannot exploit it • May potentially remove zero day threats

Difficulties in Measuring Claims of Improved Security • Moving target defense / software diversity • Debloating intentionally leaves desired functionality untouched – so information attacker has about the original program can still be used unless it pertains to a debloated segment • Doesn’t stop the attacker from reversing the variant as well. • Difficult to articulate how much this would inhibit an attacker in a practical scenario. • Vulnerability Elimination • Who cares if debloating removes known vulnerabilities? • You can’t always debloat vulnerable pieces of a package, but you can always patch them. • How do we know if we actually removed a zero day or not?

Measuring the Impact of Debloating on Gadgets Measuring the impact of debloating on gadget reduction is simple. Static Analysis tools like ROPgadget [6] will scan a binary and count gadgets by type. We can subtract the number of gadgets found in the debloated package from the number of gadgets found in the original and calculate the rate of gadget count reduction. On average: CHISEL reduced total gadgets by 54.1% TRIMMER reduced total gadgets by 20% PCL reduced total gadgets by 71%

So, reducing gadget counts in programs is good and improves security, right? Well…. It’s complicated.

Shortcomings of Gadget Count Reduction In our research, we discovered that gadget count reduction is too superficial to be an accurate metric for measuring security improvement. We explored the effect debloating has on security using measures from the attacker’s perspective by asking the question: How does debloating make constructing a code reuse attack more challenging or difficult? In our experiments, it was fairly common for software debloating to achieve good gadget count reduction, yet fail to improve security. Worse yet, we observed instances where security was made worse through software debloating despite achieving positive gadget reduction.

Analysis Methodology • We debloated three software packages that varied in size, structure and operational complexity: • libmodbus v3.1.4 [27], a software library implementing a common industrial network protocol. • Bftpd v4.9 [26], an FTP server utility program. • libcurl v7.61.0 [9], a data transfer utility library. • We debloated each package at three different intensity levels: • Conservative: Some peripheral features in the package are targeted for debloating. • Moderate: Some peripheral features and some core features are targeted for debloating. • Aggressive: All debloatable features except for a small set of core features are targeted for debloating. • We used our own debloater that operates in manner comparable to CHISEL and TRIMMER (as these debloaters were not available) and achieves gadget count reduction on par with them. Our debloater reduced the count of gadgets on average by 30% for aggressive scenarios, 15% for moderate scenarios, and 8% for conservative scenarios.

Analysis Methodology We focused our analysis on three areas: • Overall composition of the gadgets in the original and variant package • Which gadgets actually were removed? • Are there side effects of removing gadgets? • Population of special purpose gadgets • Does debloating remove ”dealbreaker” gadgets necessary to construct exploits? • Expressivity of the gadgets in the package • How computationally powerful is the ”Instruction Set” in the original and variant package?

Results – Gadget Composition • Analyzing the composition of the gadgets within a debloated variant led to an interesting discovery: Debloating techniques that remove code from a program representation introduce new gadgets into the variant. • In some cases, debloating actually caused the total gadget count to increase for some types due to introduction!

Gadget Introduction Mechanisms – Compiler Optimization • Removing code from a software package via debloating can have unpredictable effects on optimization and code generation choices made by the compiler. • Some optimizations suppressed and/or triggered by debloating: • Loop Unrolling • Function Inlining • Dead Code Elimination

Gadget Introduction Mechanisms – Unintended Gadgets • x86 and x86-64 have variable length instructions, so it is possible to decode instructions from an offset other than the original instruction boundary. Gadgets found using this method are called unintended gadgets. • Since debloating causes significant changes to the package’s final representation, the unintended gadgets in a package can vary greatly in a debloated variant.

Results – Gadget Composition • We compared the sets of unique gadgets in our original package to each of its variants. • Interestingly – gadget introduction in our scenarios is not a rare event, and it happens at a rather high rate. • The prevalence of gadget introduction is not apparent using gadget count reduction. Gadget introduction can occur at high rates as a result of debloating, but be hidden under a reduction in total size. • It is impossible to know the attacker’s concrete aims in advance, which makes it difficult to know if the gadgets removed by debloating are more or less useful than those introduced.

Prevalence of Gadget Introduction Introduction Rate: Introduced Gadgets / Total Gadgets in Debloated Variant * 100

Gadget Introduction and Gadget Quality • Like gadget count reduction, the introduction rate of gadgets via debloating doesn’t give us a complete picture. However, it does open up that possibility that debloating can actually make security worse. • We need a qualitative analysis the gadget sets to determine if debloating was successful or not. • This is where things get tricky – like software diversity and vulnerability elimination, this is difficult to measure. • From an attacker’s perspective, the gadgets available must allow them to maintain exploit control flow (using special purpose gadgets) and provide sufficient computational power to express their desired exploit (using functional gadgets). • We explored the effects of debloating on these two aspects.

Individual Gadget Quality – Special Purpose Gadgets • We used a combination of automated static analysis and manual inspection to identify special purpose gadgets in the original package and their debloated variants. (NOTE: not an exhaustive list of all special purpose gadgets)

Individual Gadget Quality – Special Purpose Gadgets Key Observations: • Debloating generally reduced the number of special purpose gadgets available. • However, we observed no cases in which it eliminated all of the gadgets of a particular type. • In three cases, the gadget introduction increased the availability of gadgets pf a particular type. (It should be noted that only one gadget was introduced in each instance) • In one case, a special purpose gadget type that was not available was introduced.

Gadget Set Quality – Expressivity of Set • Gadget Expressivity is a measure of the computational power of a set of gadgets. • Recall that gadgets are used as complex instructions to ”write” a malicious payload using only snippets of the compromised software package. • These gadgets can be used to accomplish typical computational tasks such as arithmetic operations, memory load / store, conditional branching, etc. • The high bar for expressivity is Turing-completeness which means a set of gadgets is computationally universal. Stated plainly, it means that the gadgets can be used to construct any program. • However, practical ROP exploits have been demonstrated that do not require a Turing-Complete level of expressivity.

Gadget Set Quality – Expressivity of Set • Gadget based code reuse techniques such as ROP and JOP have been shown to be capable of achieving Turing-Completeness. • This is usually shown academically using handpicked gadgets from a common system library such as libc, however Homescu et al [9] proposed a method for classifying short gadgets into computational classes to determine if a given set of gadgets achieves an expressiveness level. • They show that if at least one gadget exists per class, it can be used for all computational needs relative to that class. If all classes are satisfied relative to a level of expressivity, then the set of gadgets has achieved that level of expressivity. • We used this gadget scanner to determine the expressivity of the gadgets in the original packages and their variants, which allows us to determine how debloating has affected expressivity.

Results • We measured expressivity across three levels – the minimum expressivity for a practical ROP exploit, the expressivity required for the same exploit in an ASLR environment, and Turing Completeness.

Results Key Observations: • In three of nine scenarios, debloating reduced the expressivity of the set of gadgets. • In three of nine scenarios, debloating resulted in some increases, and some decreases. • In two scenarios, debloating had no effect on expressivity. • In one scenario, debloating increased the expressivity.

High Level Summary Of the nine debloating scenarios we explored, only one scenario resulted in a clear improvement to security across all our analysis methods. One scenario resulted in a clear negative impact on security. In the other seven scenarios, the results were mixed or neutral.

Alternative to Gadget Count Reduction • In all of our scenarios, debloating resulted in a positive gadget count reduction. However, our deeper analysis showed a much grayer picture. • To fully understand how debloating a package has affected security, a deep and multi-faceted analysis is required. We propose the following measures be considered: • Gadget Count Reduction by Gadget Type • Gadget Introduction Rate by Gadget Type • Partial Expressivity of Functional Gadgets by Type • Contributions by External Dependencies • We do not claim this is an exhaustive list – future work may identify other measures that are useful for this purpose.

How Can We Best Incorporate These Measures • As a consequence of the use of gadget reduction as a metric, recently proposed debloating methods treat debloating for security the same way they treat debloating for performance. • More is better, so debloat as much as possible! • The effects of debloating on security are difficult to predict, and our research shows they are not monotonic. More isn’t necessarily better. • We propose an iterative, human-in-the-loop process for debloating that incorporates our proposed measures. • This model allows for corrections to the debloating specification based on the previous iteration’s results.

How Can We Best Incorporate These Measures

A case study – Max debloating does not mean max security improvement • We took our scenario which resulted in negative results and attempted a second iteration. • In our second iteration, we chose to alter to the specification to debloat three fewer features that were highly correlated with other retained features. • We repeated our analysis on the new variant, and the end result was the elimination of the negative impacts, as well achieving a marginal security improvement with respect to expressivity.

Summary of Key Takeaways • The relationship between software debloating and software security is complicated. • Debloating can fail to improve security, or even make it worse. • Debloating for security is not like debloating for performance – debloating more does not necessarily produce better results. • Measuring the reduction in gadget count is insufficient to make claims of improved security. • It can hide negative effects of debloating such as gadget introduction. • It is not directly related to more important measures such as availability of special purpose gadgets and gadget set expressivity. • Debloating to improve security is possible, but not nearly as easy as suggested by recent work. • It requires deep and multi-faceted analysis to determine the effect debloating had on security. • It may require multiple iterations to get it right.

Questions?

References • Quach, A., Erinfolami, R., Demicco, D., and Prakash, A. A multi-OS cross-layer study of bloating in user programs, kernel, and managed execution environments. In The 2017 Workshop on Forming an Ecosystem Around Software Transformation (FEAST) (2017). • Lee, W., Heo, K., Pashakhanloo, P., and Naik, M. Effective Program Debloating via Reinforcement Learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS) (2018). • Sharif, H., Abubakar, M., Gehani, A., and Zaffar, F. TRIMMER: Application specialization for code debloating. In Proceedings of the 2018 33rd ACM/IEEE International Conference on Automated Software Engineering (ASE) (2018). • Chen, Y., Sun, S., Lan, T., and Venkataramani, G. TOSS: Tailoring online server systems through binary feature customization. In The 2018 Workshop on Forming an Ecosystem Around Software Transformation (FEAST) (2018). • Quach, A., Prakash, A., and Yan, L. Debloating software through piece-wise compilation and loading. In Proceedings of the 27th USENIX Security Symposium (2018). • Salwan, J. ROPgadget: Gadgets finder and auto-roper, 2011. http://shell-storm.org/project /ROPgadget/ • Shacham, H. The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86). In Proceedings of 14th ACM conference on Computer and Communications Security (CCS) (2007). • Bletsch, T., Jiang, X., Freeh, V.W., and Liang, Z. Jump-oriented programming: a new class of code-reuse attack. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security (ASIACCS) (2011) • Homescu, A., Stewart, M., Larsen, P., Brunthaler, S., and Franz, M.Microgadgets: size does matter in turing-complete return-oriented programming. In Proceedings of the 6th USENIX conference on offensive technologies (WOOT) (2012).

Improving Security Through Software De bloating

Improving Security Through Software De bloating

Presentation Transcript

improving performance through strength

Improving Software Economics

Improving Automation Through Software Environments (3.4 Royce)

Improving Software Security with Dynamic Binary Instrumentation

Network Service Security through software defined networking

Reducing Software Security Risk Through an Integrated Approach

- Improving Lives Through Research -

Improving Comprehension Through Visualization

Improving Software Quality Through Communication

Improving Xen Security through Disaggregation

“Improving Results Through Collaboration”

Improving Software Security with Precise Static and Runtime Analysis

“Improving Nutrition and Food Security through Smallholder Agriculture”

Improving Spelling through

Improving Through Engineered Thermoplastics

A Quick Walk Through BSIMM7 Software Security Framework

Improving Security Guard Services Through Feedback Management

Improving Health Through Policy

(Software) Security

Software Security

Software Security Through Code Obfuscation