Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs
Marjan Aslani, Nga Chung, Jason Doherty, Nichole Stockman, and William Quach
Summer Undergraduate Program in Engineering Research at Berkeley (SUPERB) 2008
Team for Research in Ubiquitous Secure Technology

Overview
• Introduction to fuzz testing
• Our research
• Results
"Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs", Marjan Aslani

What Is Fuzzing?
• A method of finding software holes by feeding purposely invalid data as input to a program (B. Miller et al.; inspired by line noise).
• Typical targets: image processors, media players, operating systems.
• Fuzz testing is generally automated.
• Finds many reliability problems, many of which are potential security holes.

Types of Fuzz Testing
• Blackbox: randomly generated data is fed to a program as input to see whether it crashes.
  • Does not require knowledge of the program's source code or deep code inspection.
  • A quick way of finding defects without knowing the details of the application.
• Whitebox: creates test cases that take into account the target program's logical constraints and data structures.
  • Requires knowledge of the system and how it uses the data.
  • Penetrates deeper into the program.

Zzuf - Blackbox Fuzzer
• Finds bugs in applications by corrupting random bits in user-contributed data.
• To make new test cases, Zzuf uses a range of seeds and fuzzing ratios (corruption ratios).

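The seed-and-ratio idea can be sketched in a few lines of Python. This is a minimal illustration of random bit corruption, not Zzuf's actual algorithm; the function name `fuzz_bytes` and its parameters are assumptions made for this example.

```python
import random


def fuzz_bytes(data: bytes, seed: int, ratio: float) -> bytes:
    """Flip each bit of `data` independently with probability `ratio`.

    Hypothetical helper: the same (seed, ratio) pair always yields the
    same corrupted output, so a crashing test case can be regenerated.
    """
    rng = random.Random(seed)          # private generator, reproducible per seed
    out = bytearray(data)
    for i in range(len(out)):
        for bit in range(8):
            if rng.random() < ratio:   # corrupt this bit
                out[i] ^= 1 << bit
    return bytes(out)


if __name__ == "__main__":
    sample = b"valid input file contents"
    # Sweep a small range of seeds at one ratio to get several test cases.
    for seed in range(3):
        print(seed, fuzz_bytes(sample, seed, 0.02))
```

Sweeping the seed while holding the ratio fixed gives many distinct test cases from one valid file; raising the ratio corrupts more of each file.
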
Catchconv - Whitebox Fuzzer
• To create test cases, Catchconv (CC) starts with a valid input, observes the program's execution on this input, collects the path condition followed by the program on that sample, and attempts to infer related path conditions that lead to an error. It then uses these as the starting point for bug finding.
• CC has some downtime when it is only tracing a program and is not generating new fuzzed files.

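The path-condition idea can be illustrated with a toy example. The sketch below is not Catchconv's implementation: the target `run_traced` is a hypothetical one-integer "program", and `solve` is a brute-force stand-in for the constraint solver a real whitebox fuzzer would use. It records the branches taken on a seed input, flips one branch at a time, and searches for an input that follows the flipped path.

```python
from typing import Callable, List, Optional, Tuple

Predicate = Callable[[int], bool]
Trace = List[Tuple[Predicate, bool]]  # (branch condition, outcome on this run)


def run_traced(x: int) -> Tuple[Trace, bool]:
    """Hypothetical target: records each branch it evaluates; returns (trace, crashed)."""
    trace: Trace = []

    def branch(pred: Predicate) -> bool:
        taken = pred(x)
        trace.append((pred, taken))
        return taken

    if branch(lambda v: v > 100):
        if branch(lambda v: v % 7 == 5):
            return trace, True  # simulated bug on this path
    return trace, False


def solve(constraints: Trace, domain=range(256)) -> Optional[int]:
    """Brute-force stand-in for a constraint solver over a one-byte input."""
    for candidate in domain:
        if all(pred(candidate) == wanted for pred, wanted in constraints):
            return candidate
    return None


def explore(seed: int, max_runs: int = 20) -> List[int]:
    """Return the crashing inputs found by flipping branches, starting from `seed`."""
    worklist, seen, crashes = [seed], {seed}, []
    runs = 0
    while worklist and runs < max_runs:
        x = worklist.pop(0)
        runs += 1
        trace, crashed = run_traced(x)
        if crashed:
            crashes.append(x)
            continue
        for i, (pred, taken) in enumerate(trace):
            # Keep the branches before i as observed and flip branch i.
            flipped = trace[:i] + [(pred, not taken)]
            candidate = solve(flipped)
            if candidate is not None and candidate not in seen:
                seen.add(candidate)
                worklist.append(candidate)
    return crashes
```

Starting from a seed that takes neither branch, the search reaches the buggy path in a handful of runs, whereas random mutation would have to hit both conditions by chance.
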
Valgrind
• A tool for detecting memory management errors.
• Reports the line number in the code where the program error occurred.
• Helped us find and report more errors than we would have found had we focused solely on segmentation faults.

Types of errors reported by Valgrind
By tracking a program's execution on a file, Valgrind determines the types of errors that occur, which may include:
• Invalid writes
• Double free - Result 256
• Invalid reads
• Uninitialized values
• Syscall param
• Memory leak

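Tallying these categories from a saved Valgrind log can be sketched as follows. This is an illustrative parser, not the script used in the project. The message fragments are ones Memcheck prints (it reports a double free as "Invalid free()"), while the label names and the function `classify_valgrind_log` are assumptions made for this example.

```python
import re
from collections import Counter

# (label, pattern) pairs; the first matching pattern wins for each log line.
ERROR_PATTERNS = [
    ("invalid write", re.compile(r"Invalid write of size \d+")),
    ("invalid read", re.compile(r"Invalid read of size \d+")),
    ("invalid/double free", re.compile(r"Invalid free\(\)")),
    ("syscall param", re.compile(r"Syscall param \S+")),
    ("uninitialized value", re.compile(r"uninitialised value")),
    ("memory leak", re.compile(r"bytes in [\d,]+ blocks are (?:definitely|possibly) lost")),
]


def classify_valgrind_log(log_text: str) -> Counter:
    """Count how many log lines fall into each error category."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        for label, pattern in ERROR_PATTERNS:
            if pattern.search(line):
                counts[label] += 1
                break
    return counts
```

Counting every category, rather than only runs that end in a segmentation fault, is what lets a fuzzing campaign report the additional errors the Valgrind slide mentions.
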
Program run under Valgrind

Methodology

Metafuzz
All of the test files that triggered bugs were uploaded to Metafuzz.com. For each one, the webpage listed:
• Link to the test file
• Bug type
• Program in which the bug was found
• Stack hash number identifying where the bug was located

Metafuzz webpage

Target applications
• MPlayer, Antiword, ImageMagick Convert, and Adobe Flash Player
• MPlayer was the primary target:
  • Open-source software
  • Preinstalled on many Linux distributions
  • Updates available via Subversion
  • Convenient to file a bug report
  • Developers would get back to us!
• Adobe's bug-reporting protocol requires a bug to receive a certain number of votes from users before Flash developers will look at it.
• VLC requires building from nightly snapshots.

Research Highlights
• In 6 weeks, we generated more than 1.2 million test cases.
• We used the UC Berkeley PSI cluster, which consists of 81 machines (270 processors); Zzuf, MPlayer, and CC were installed on them.
• We created a de-duplication script to find the unique bugs.
• We reported 89 unique bugs; developers have already fixed 15 of them.

Results
To assess the two fuzzers, we gathered several metrics:
• Number of test cases generated
• Number of unique test cases generated
• Total bugs and total unique bugs found by each fuzzer

Results (cont.)
• Generated 1.2 million test cases:
  • 962,402 by Zzuf
  • 279,953 by Catchconv
• From the test cases:
  • Zzuf found 1,066,000 errors.
  • Catchconv reported 304,936.
• Unique (non-duplicate) errors found:
  • 456 by Zzuf
  • 157 by Catchconv

Results (cont.)
• Zzuf reports a far larger number of errors than CC. Is Zzuf better than CC?
  • No! The two fuzzers generated different numbers of test cases.
• How could we compare the fuzzers' efficiency fairly?
  • We need to gauge the amount of duplicate work performed by each fuzzer.
  • We need to find how many of these test cases were unique.

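One simple way to count unique test cases is to hash each file's contents and count the distinct digests. The slides do not say how uniqueness was determined in this study, so the sketch below is only an assumption about one workable approach; the function name `count_unique_test_cases` is hypothetical.

```python
import hashlib
from typing import Iterable


def count_unique_test_cases(test_cases: Iterable[bytes]) -> int:
    """Count distinct test cases by the SHA-256 digest of their contents."""
    digests = {hashlib.sha256(data).hexdigest() for data in test_cases}
    return len(digests)
```

Two byte-identical files produced from different seeds count once, which is the duplicate work the comparison needs to discount.
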
Average Unique Errors per 100 Unique Test Cases
• First, we compared the fuzzers by the average number of unique bugs found per 100 unique test cases.
  • Zzuf: 2.69
  • CC: 2.63
• Zzuf's apparent superiority diminishes.

Unique Errors as % of Total Errors
• Next, we analyzed the fuzzers' performance based on the percentage of unique errors out of the total errors found.
  • Zzuf: 0.05%
  • CC: 0.22%
• Less than a quarter of a percentage point separates the fuzzers.

Types of Errors (as % of Total Errors)
• We also considered comparing the fuzzers by the types of bugs they found.
• Zzuf performed better at finding "invalid write" errors, a bug type that matters more for security.
• This is not an accurate comparison, since we couldn't tell which bug specifically caused a crash.

Conclusion
• We were not able to draw a solid conclusion about the superiority of either fuzzer from the metrics we gathered.
• Knowing which fuzzer finds serious errors more quickly would allow us to make a more informed conclusion about their comparative efficiency.

Conclusion (cont.)
• We need to record the number of CPU clock cycles required to execute test cases and find errors.
• Unfortunately, because we did not record this data during our research, we are unable to make such a comparison between the fuzzers.

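Recording the processor cost of each test case could look like the sketch below. It measures CPU seconds (user plus system time of the child process) rather than raw clock cycles, which is a deliberate simplification. The function name `cpu_seconds` is an assumption, and the child-time fields of `os.times()` are only meaningful on Unix-like systems.

```python
import os
import subprocess
from typing import List, Tuple


def cpu_seconds(command: List[str]) -> Tuple[int, float]:
    """Run `command` and return (exit code, CPU seconds used by the child).

    Hypothetical helper: assumes no other child processes finish while
    this one runs, since os.times() reports totals for all waited children.
    """
    before = os.times()
    completed = subprocess.run(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    after = os.times()
    used = (after.children_user - before.children_user) + (
        after.children_system - before.children_system
    )
    return completed.returncode, used
```

Logging this value next to each test case would make it possible to compare the fuzzers by errors found per unit of CPU time instead of per test case.
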
Guides for Future Research
To perform a precise comparison of Zzuf and CC:
1. Record the difference between the number of test cases generated by Zzuf and CC for a given seed file and a specific time frame.
2. Measure CPU time, to compare the number of unique test cases generated by each fuzzer in a given time.
3. Use a new method to identify unique errors and avoid reporting duplicate bugs:
  • Automatically generate a unique hash for each reported error, which can then be used to identify duplicate errors.

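Hash-based de-duplication can be sketched as follows. The slides do not describe how Metafuzz computed its stack hash, so this is an assumed scheme: hash the top few stack frames of each error report and keep one test file per distinct hash. The names `stack_hash` and `deduplicate`, and the default depth of three frames, are illustrative choices.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def stack_hash(frames: List[str], depth: int = 3) -> str:
    """Hash the top `depth` frames of a stack trace (innermost frame first)."""
    top = "\n".join(frames[:depth])
    return hashlib.sha1(top.encode("utf-8")).hexdigest()[:12]


def deduplicate(reports: Iterable[Tuple[str, List[str]]]) -> Dict[str, str]:
    """Map each distinct stack hash to the first test file that triggered it."""
    unique: Dict[str, str] = {}
    for test_file, frames in reports:
        unique.setdefault(stack_hash(frames), test_file)
    return unique
```

The depth is a trade-off: too few frames merge distinct bugs that crash in a shared helper, while too many split one bug into several reports when only the outer call path differs.
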
Guides for Future Research (cont.)
4. Use a more robust data collection infrastructure that can accommodate the massive amount of data collected.
  • Our ISP shut Metafuzz down due to excess server load.
  • The Berkeley storage filled up.
5. Include an internal issue tracker that records whether a bug has already been reported, to avoid reporting duplicate bugs.

Whitebox or Blackbox?
• With a lower budget or less time: use blackbox.
• Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox.
• In practice, use both.

Acknowledgment
• National Science Foundation (NSF), for funding this project through the SUPERB-TRUST (Summer Undergraduate Program in Engineering Research at Berkeley - Team for Research in Ubiquitous Secure Technology) program
• Kristen Gates (Executive Director for Education for the TRUST Program)
• Faculty advisor David Wagner
• Graduate mentors Li-Wen Hsu, David Molner, Edwardo Segura, Alex Fabrikant, and Alvaro Cardenas

Questions?
Thank you!