
SATE 2010 Background

SATE 2010 Background. Vadim Okun, NIST vadim.okun@nist.gov October 1, 2010 The SAMATE Project http://samate.nist.gov/. Cautions on Using SATE Data. Our analysis procedure has limitations


Presentation Transcript


  1. SATE 2010 Background Vadim Okun, NIST vadim.okun@nist.gov October 1, 2010 The SAMATE Project http://samate.nist.gov/

  2. Cautions on Using SATE Data • Our analysis procedure has limitations • In practice, users write special rules, suppress false positives, and write code in certain ways to minimize tool warnings. • There are many other factors that we did not consider: user interface, integration, etc. • So do NOT use our analysis to rate/choose tools

  3. Overview • Tools that work on source code • [Diagram: Tool A, Tool B, and Tool C each analyze a program; their warnings (Buf, Leak, Race, …) are compared with human findings and CVE entries and labeled Security? Quality? Insignificant? False? or ?]

  4. SATE 2010 timeline • Choose test programs (2 C, 1 C++, 2 Java). Provide them to tool makers (28 June) • Teams run their tools, return reports (30 July) • Analyze the tool reports (22 Sept) • Report at the workshop (1 Oct) • Teams submit a research paper (Dec) • Publish data (between Feb and May 2011)

  5. Participating teams • Armorize CodeSecure • Concordia University Marfcat • Coverity Static Analysis for C/C++ • Cppcheck • Grammatech CodeSonar • LDRA Testbed • Red Lizard Software Goanna • Seoul National University Sparrow • SofCheck Inspector • Veracode • a service company

  6. Test cases • Dovecot: secure IMAP and POP3 server – C • Pebble: weblog server – Java • Wireshark – C • Google Chrome – C++ • Apache Tomcat – Java • Wireshark, Chrome, and Tomcat: CVE-based pairs (vulnerable and fixed) • All are open source programs • All have aspects relevant to security • From 30k LoC (Pebble) to 4.7M LoC (Chrome)

  7. Tool reports • Teams converted reports to SATE format • Several teams also provided original reports • Described environment in which they ran tool • Some teams tuned their tools • Several teams provided analysis of their tool warnings • [Diagram: original tool formats (XML, HTML, DB, …) converted to SATE format]

  8. Analysis procedure • [Diagram: tool warnings (~60K warnings) pass through 3 selection methods (select randomly, related to human findings, related to CVEs) to give the selected warnings; analyze these for correctness and associate them; then analyze the data]

  9. Method 1 – Warning Selection (for Dovecot and Pebble only) • We assigned severity if a tool did not • Mostly avoid warnings with severity 5 (lowest) • Statistically select from each warning class • Select more warnings from higher severities • Select 30 warnings from each of 10 tool reports • 1 report had only 6 warnings • Did not analyze Marfcat warnings • Total is 276
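The slide gives the selection rules but not the exact sampling scheme. As a minimal sketch, assuming weighted random sampling without replacement, the per-report selection could look like the following. The function name `select_warnings` and the severity weights are hypothetical and are not taken from the SATE procedure.

```python
import random


def select_warnings(warnings, quota=30, weights=None, seed=0):
    """Pick up to `quota` warnings from one tool report, favoring high severity.

    `warnings` is a list of (warning_id, severity) pairs, with severity 1
    (highest) to 5 (lowest). `weights` maps severity to a relative sampling
    weight; the defaults are illustrative assumptions, not SATE values.
    """
    if weights is None:
        # Severity 5 gets a small weight so it is mostly, not always, avoided.
        weights = {1: 5.0, 2: 4.0, 3: 3.0, 4: 2.0, 5: 0.2}
    if len(warnings) <= quota:
        # A report with fewer warnings than the quota is taken whole.
        return list(warnings)
    rng = random.Random(seed)
    # Weighted sampling without replacement: give each warning the key
    # u ** (1 / weight) with u uniform in [0, 1), then keep the largest keys.
    keyed = [(rng.random() ** (1.0 / weights[sev]), (wid, sev))
             for wid, sev in warnings]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [warning for _, warning in keyed[:quota]]


# Nine reports with plenty of warnings and one report with only 6 warnings.
reports = [[(i, 1 + i % 5) for i in range(100)] for _ in range(9)]
reports.append([(i, 3) for i in range(6)])
total = sum(len(select_warnings(r, seed=n)) for n, r in enumerate(reports))
```

With nine reports contributing 30 warnings each and one contributing 6, `total` is 9 × 30 + 6 = 276, which matches the total on the slide.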

  10. Method 2 – Human findings (for Dovecot and Pebble only) • Security experts analyze test cases • A small number of findings • Root cause, with an example trace • Find related warnings from tools • Goal: focus our analysis on weaknesses found most important by security experts

  11. Method 3 – CVEs (for Wireshark, Chrome, and Tomcat) • Identify the CVEs • Locations in code • Find related warnings from tools • Goal: focus our analysis on real-life exploitable vulnerabilities
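The slides do not say how related warnings were found for each CVE. One possible first-pass filter, sketched here purely as an assumption, keeps any warning whose trace touches the same file within a few lines of a known CVE location. A human analyst would still need to review the candidates. The names `Location` and `related_warnings`, the `window` parameter, and the sample paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    line: int


def related_warnings(cve_locations, warnings, window=5):
    """Return ids of warnings with a trace location near any CVE location.

    `warnings` maps a warning id to its list of trace Locations.
    `window` is a hypothetical tolerance in lines, not a SATE parameter.
    """
    related = []
    for warning_id, trace in warnings.items():
        near = any(
            loc.path == cve.path and abs(loc.line - cve.line) <= window
            for loc in trace
            for cve in cve_locations
        )
        if near:
            related.append(warning_id)
    return sorted(related)


# Made-up data: one CVE location and three tool warnings.
cve = [Location("src/parse.c", 120)]
tool_warnings = {
    1: [Location("src/parse.c", 118)],  # same file, 2 lines away
    2: [Location("src/parse.c", 300)],  # same file, too far away
    3: [Location("src/other.c", 120)],  # different file
}
candidates = related_warnings(cve, tool_warnings)
```

Here only warning 1 is kept as a candidate. A pure line-distance test will miss warnings reported elsewhere on the same data-flow path, which is one reason manual review remains necessary.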

  12. Correctness categories • True security weakness • True quality weakness • True but insignificant weakness • Weakness status unknown • Not a weakness

  13. Differences from SATE 2009 • Add CVE-selected test cases • Include a C++ test case • Larger test cases: Chrome - 4.7 MLOC • More correctness categories (true quality) • More detailed guidelines for analysis • Still, much can be improved…

  14. Thanks • Romain Gaucher, Ram Sugasi • Aurelien Delaitre, Sue Wang, Paul Black, Charline Cleraux, and other SAMATE team members • Most of all, the participating teams!

  15. Questions • What weaknesses exist in real programs? • What do tools report for real programs? • Do tools find important weaknesses? • Focus on tools that work on source code • Defects that may affect security • Goal is NOT to choose the “best” tools • This is the 3rd SATE (1st in 2008)

  16. SATE goals • To enable empirical research based on large test sets • To encourage improvement and adoption of tools • NOT to choose the “best” tools

  17. SATE common tool output format
      <weakness id="23">
        <name cweid="79">SQL Injection</name>
        <location id="1" path="dir/f.c" line="71"/> …
        <grade severity="2" probability="0.5"/>
        <output> Query is constructed with user supplied input … </output> …
      </weakness>
      • cweid: optional • location: one or more traces • severity: 1 to 5, with 1 the highest • probability: that it is true • … and other annotation
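As a minimal sketch of how a report in this format might be read, the following parses only the fragment shown on the slide with Python's standard `xml.etree.ElementTree`. The parts elided with "…" on the slide are simply left out, the full SATE schema may contain elements not handled here, and the function name `parse_weakness` and the dictionary layout are illustrative choices.

```python
import xml.etree.ElementTree as ET

# The fragment from the slide, with the elided ("…") parts omitted.
SAMPLE = """
<weakness id="23">
  <name cweid="79">SQL Injection</name>
  <location id="1" path="dir/f.c" line="71"/>
  <grade severity="2" probability="0.5"/>
  <output>Query is constructed with user supplied input</output>
</weakness>
"""


def parse_weakness(xml_text):
    """Parse one <weakness> element into a plain dictionary."""
    elem = ET.fromstring(xml_text.strip())
    name = elem.find("name")
    grade = elem.find("grade")
    return {
        "id": int(elem.get("id")),
        "name": name.text,
        "cweid": name.get("cweid"),  # optional, so this may be None
        "locations": [(loc.get("path"), int(loc.get("line")))
                      for loc in elem.findall("location")],
        "severity": int(grade.get("severity")),  # 1 (highest) to 5
        "probability": float(grade.get("probability")),
        "output": (elem.findtext("output") or "").strip(),
    }


weakness = parse_weakness(SAMPLE)
```

Keeping `cweid` as an optional string rather than a required integer mirrors the slide's note that the CWE id is optional.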

  18. Lessons learned • Guidelines for analysis often ambiguous – need to be refined even more • Our analysis has inconsistencies and lapses • Careful analysis takes longer than expected • We do not know the code well • Tool interface is important to understand a weakness

  19. Analysis procedure • We cannot know all weaknesses in the test cases • Impractical to analyze all tool warnings • So analyze the following: • Method 1. A subset of warnings from each tool report • Method 2. Tool warnings related to manually identified weaknesses

  20. SATE tool output format • Common format in XML • For each weakness: • One or more traces (locations, each with line number and pathname) • Name of weakness and (optional) CWE id • Severity: 1 to 5 (ordinal scale), with 1 the highest • Probability that the problem is a true positive • Original message from the tool • And other annotation

  21. Our analysis • Correctness of warning • Associate warnings that refer to the same (or similar) weakness
