260 likes | 352 Views
Memories of Bug Fixes. Sunghun Kim, Kai Pan, and E. James Whitehead Jr., University of California, Santa Cruz Presented By Gleneesha Johnson CMSC 838P, Fall 2006. Source Code Repositories. Collection of code changes Most long developed projects have one
E N D
Memories of Bug Fixes Sunghun Kim, Kai Pan, and E. James Whitehead Jr., University of California, Santa Cruz Presented By Gleneesha Johnson CMSC 838P, Fall 2006
Source Code Repositories • Collection of code changes • Most long developed projects have one • Typically used to store histories and make backups • Knowledge contained within hasn’t been fully leveraged • Previous development experience • Changes that fix bugs are particularly interesting
Horizontal Bug Finding Techniques • Applicable across all projects • Based on pre-defined bug patterns, theorem proving, and model checking • ESC/Java, FindBugs, JLint, PMD
Vertical Bug Finding Techniques • Aim for project-specific bugs • Different project requirements, business logic, and semantics • BugMem • Learn bug patterns over time
Bug Fix Memories • Project-specific bug and fix knowledge base • Developed by analyzing history of bug fixes • Can be used to detect bugs and suggests fixes
Building Bug Fix Memories • Extract source code, change logs, and deltas • Kenyon • File change • Contains list of region pairs that show differences between two file versions. • Regions called hunks • Consist of source code lines • Identify changes where a bug was fixed • Keyword search • Search for references to bug reports
Hunks • Hunk pair (HP) → deleted hunk (DH) (bug hunk (BH)) & added hunk (AH) (fix hunk (FH))
Extracting Components From Hunks • Extracts syntax patterns, components, from hunks • Parsing – extracts syntax components • Normalizing – generalizes syntax to match similar code • Filtering – eliminates noise • Information filtering • Diff filtering
Raw Component Extraction (Parsing) • Preprocessing • Basic syntax lines • Differentiate between composite statements (if, for, while) and simple statements (method call, assignment, etc.) • all multi-line simple statements are on a single line. • conditional predicates of if, for, while, etc. all lie on a single line. • Raw components • Based on abstract syntax tree of basic syntax line • 4 kinds: static Java call, Java call, user-defined call, and non-call
Normalization • If the variable type of a raw component is known, normalize variable • i.e., foo.flag → Foo.flag • Generate two components for raw components that contain numeric, boolean, char or string literals • One with and one without normalization for the literal • i.e., i = 1 → int = 1 & int = int
Normalization Continued • For a method call, actual parameters are normalized to the type of the parameter • Normalization level • Indicates a component’s degree of normalization amongst others extracted from same raw component • Level increases with degree of normalization
Information Filtering • Information value – indicates how much unique information a component carries • Determined by summing information value of its elements • Information value threshold – used to filter components with little unique information • Two is used as a threshold in the paper
Diff Filtering • Removes code unchanged between bug and fix hunks
Searching Memories • Bugs found by searching for matching patterns in bug hunks • Suggestions made by returning code in corresponding fix hunks • Several options for component searching • Adjust degree of matching and omission of very common components
Evaluation Continued • Determine half and full hit rates • Half hit – bug previously seen • Full hit – bug and fix previously seen • Five open source projects
Limitations • Missing Memories • Not applicable in initial stage of a project • Doesn’t catch cross-file relationships
Uses • Bug Finding Tool • BugMem • Can be integrated into Eclipse • Code example repository • Useful to developers new to a project