MSc Software Maintenance. Lectures 15 & 16: Programmers Use Slices When Debugging. Dr Andy Brooks
Case Study • Reference: Mark Weiser, “Programmers Use Slices When Debugging”, Communications of the ACM, Volume 25, Number 7, pp. 446-452, 1982.
The basic debugging method • Reading 1 million lines of code from beginning to end to locate and remove a bug is not efficient. • At 100 LOC/day, that takes 10,000 days. • At 1,000 LOC/day, it still takes 1,000 days. • The basic debugging method is to begin at the statement where the error appears and then reason backwards through the preceding sequence of statements.
Reasoning backwards • Reasoning backwards to determine all the influences on a variable usually reveals that many statements in the program have no influence. • Sometimes you reason backwards all the way to the hardware or the translation software...
Program Slicing • “The process of stripping a program of statements without influence on a given variable at a given statement is called program slicing.” • “An elementary slicing criterion of a program P is a tuple <i,V> where i denotes a specific statement in P and V is a subset of variables in P.”
A program and a program slice
 1 BEGIN
 2 READ(X,Y)
 3 TOTAL:=0.0
 4 SUM:=0.0
 5 IF X<=1
 6 THEN SUM:=Y
 7 ELSE BEGIN
 8 READ(Z)
 9 TOTAL:=X*Y
10 END
11 WRITE(TOTAL,SUM)
12 END.
Slice on Z at statement 12:
BEGIN
READ(X,Y)
IF X<=1
THEN
ELSE READ(Z)
END.
TOTAL, SUM and Y have no influence on Z.
A program and a program slice
 1 BEGIN
 2 READ(X,Y)
 3 TOTAL:=0.0
 4 SUM:=0.0
 5 IF X<=1
 6 THEN SUM:=Y
 7 ELSE BEGIN
 8 READ(Z)
 9 TOTAL:=X*Y
10 END
11 WRITE(TOTAL,SUM)
12 END.
Slice on X at statement 9:
BEGIN
READ(X,Y)
END.
A program and a program slice
 1 BEGIN
 2 READ(X,Y)
 3 TOTAL:=0.0
 4 SUM:=0.0
 5 IF X<=1
 6 THEN SUM:=Y
 7 ELSE BEGIN
 8 READ(Z)
 9 TOTAL:=X*Y
10 END
11 WRITE(TOTAL,SUM)
12 END.
Slice on TOTAL at statement 12:
BEGIN
READ(X,Y)
TOTAL:=0.0
IF X<=1
THEN
ELSE TOTAL:=X*Y
END.
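The three slices above can be reproduced mechanically. What follows is a minimal Python sketch, not Weiser's published algorithm: it walks the statements backwards while tracking the set of variables that still matter, and it only handles loop-free code like this example. The names `Stmt`, `If`, `before`, `slice_block` and `program_slice` are invented for illustration. Statements are numbered 1 to 12 counting BEGIN as statement 1, and the BEGIN/END brackets are left out of the result. One simplifying assumption: for a criterion inside a branch (such as statement 9), the enclosing predicate is not added to the slice, which matches the slice on X shown on the slide.

```python
from dataclasses import dataclass


@dataclass
class Stmt:
    num: int                        # statement number, counting BEGIN as 1
    text: str
    defs: frozenset = frozenset()   # variables assigned or read in
    uses: frozenset = frozenset()   # variables referenced


@dataclass
class If:
    cond: Stmt     # the predicate (statement 5)
    then: list     # statements of the THEN branch
    orelse: list   # statements of the ELSE branch


PROGRAM = [
    Stmt(2, "READ(X,Y)", defs=frozenset({"X", "Y"})),
    Stmt(3, "TOTAL:=0.0", defs=frozenset({"TOTAL"})),
    Stmt(4, "SUM:=0.0", defs=frozenset({"SUM"})),
    If(
        cond=Stmt(5, "IF X<=1", uses=frozenset({"X"})),
        then=[Stmt(6, "SUM:=Y", defs=frozenset({"SUM"}), uses=frozenset({"Y"}))],
        orelse=[
            Stmt(8, "READ(Z)", defs=frozenset({"Z"})),
            Stmt(9, "TOTAL:=X*Y", defs=frozenset({"TOTAL"}),
                 uses=frozenset({"X", "Y"})),
        ],
    ),
    Stmt(11, "WRITE(TOTAL,SUM)", uses=frozenset({"TOTAL", "SUM"})),
    Stmt(12, "END."),
]


def before(block, i):
    """Return the statements that can run before statement i (None if i is absent)."""
    out = []
    for node in block:
        if isinstance(node, If):
            if node.cond.num == i:
                return out
            for branch in (node.then, node.orelse):
                inner = before(branch, i)
                if inner is not None:
                    return out + inner
        elif node.num == i:
            return out
        out.append(node)
    return None


def slice_block(block, relevant):
    """Walk a block backwards; return (kept statement numbers, variables relevant on entry)."""
    kept, relevant = set(), set(relevant)
    for node in reversed(block):
        if isinstance(node, If):
            kept_then, rel_then = slice_block(node.then, relevant)
            kept_else, rel_else = slice_block(node.orelse, relevant)
            kept |= kept_then | kept_else
            relevant = rel_then | rel_else
            if kept_then or kept_else:
                # A kept statement depends on the predicate, so keep it too.
                kept.add(node.cond.num)
                relevant |= node.cond.uses
        elif node.defs & relevant:
            kept.add(node.num)
            relevant = (relevant - node.defs) | node.uses
    return kept, relevant


def program_slice(program, i, variables):
    """Slice for the criterion <i, variables>, as a sorted list of statement numbers."""
    block = before(program, i)
    if block is None:
        raise ValueError(f"no statement numbered {i}")
    kept, _ = slice_block(block, variables)
    return sorted(kept)


print(program_slice(PROGRAM, 12, {"Z"}))      # [2, 5, 8]
print(program_slice(PROGRAM, 9, {"X"}))       # [2]
print(program_slice(PROGRAM, 12, {"TOTAL"}))  # [2, 3, 5, 9]
```

Statement numbers 2, 5 and 8 correspond to READ(X,Y), IF X<=1 and READ(Z), which is the slice on Z shown on the slide.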
Hypotheses • Experimental Hypothesis H1: “... debugging programmers, working backwards from the variable and statement of a bug's appearance, use that variable and statement as a slicing criterion to construct mentally the corresponding program slice.” • Experimental Hypothesis H2: “... programmers look at code only in contiguous pieces.”
“Slices are generally not contiguous pieces, but contain statements scattered throughout the code.” [Figure: two schematic code listings. In the one labelled “contiguous”, the marked statements form a single block; in the one labelled “slice”, the marked statements are scattered throughout the listing.]
Method • Programmers debug three programs. • Programmers' memory of various code fragments is then tested, particularly their memory of the program slice relevant to the bug. • “If the programmers did slice, then their memories for the relevant slices should be at least as good as their memories of contiguous code, and somewhat better than their memories of other non-contiguous code.” • Andy says: this is more like a properly stated hypothesis.
Andy says: no protocol analysis • It is important to recognise that programmers were not observed working with the programs. • Their actions and the program statements they considered were not recorded. • Testing programmers' memory is an indirect measurement. • And you may not be measuring what you think you are measuring...
Materials • Three programs written in Algol-W, ranging in size from 75 to 150 lines of code. • Program TALLY: an IBM scientific subroutine; poorly structured, with non-mnemonic variable names. • Program PAYROLL: written for the experiment; computes salaries and deductions; well structured, with mnemonic variable names. • Program EVADE: written for the experiment; a simulation of random aircraft turns; well structured, with mnemonic variable names.
Program bugs The bugs were chosen so that the entire experiment could be completed in less than an hour.
5 types of program fragments shown to programmers: • Relevant slice • Relevant contiguous (overlapped the relevant slice) • Irrelevant contiguous (did not overlap the relevant contiguous fragment or the relevant slice; program TALLY had no irrelevant contiguous fragment) • Irrelevant slice • Jumble (every 3rd or 4th statement)
Fragment overlap: relevant slice & relevant contiguous • Overlap is the fraction of statements shared by two fragments. • Andy asks: what was the number of statements in each of the relevant slices?
Syntactic changes • Syntactic changes were made to the code fragments to prevent recognition by a particular detail: • Variables and constants in the fragments were renamed as single letters followed by a unique number. • Indenting was adjusted from the original program to a form internally consistent with each fragment.
Participants • Experienced Algol-W programmers • Graduate student teaching assistants, all from the University of Michigan in Ann Arbor • 26 volunteers • 4 participated in pilot studies • 1 did not follow instructions in the experiment • 21 final participants
Andy's view • Pilot studies are conducted to check that: • experimental materials are in order; • instructions are clear; • experimental processes are sound; • there is sufficient time to complete tasks; • participants behave in the way expected. • Weiser reports that pilot studies were conducted but fails to report on actions taken as a result of the pilot studies. Any actions taken should be briefly reported.
Procedure • Participants were given all three programs to debug in random order. • Participants were then asked to rate 14 program fragments for how sure they were that the fragment had been used in one of the three programs. • Remember, program TALLY had no irrelevant contiguous fragment, so 3*5-1 = 14. • Code fragments were given in random order, each on a separate page with its rating scale. • Participants were told not to look back either at the programs or at previously rated code fragments.
[Figure: part of the relevant slice for PAYROLL]
[Figure: the fragment as shown to participants, with its recognition rating scale]
Results • All 21 participants found the bugs in TALLY and EVADE, but only 17 found the bug in PAYROLL. • Table IV: Debugging times (minutes). • Andy asks: what were the minimum and maximum times?
Results • A two-way analysis of variance (factors: fragment type and program type) using Friedman's test indicated an overall difference in the ratings of the different fragments. • Andy says: it is important for an overall test to be significant before looking at individual differences. The test is named, but the alpha level is not reported here (0.05? 0.01?).
Results: Figure 3, recognition by fragment type. [Figure omitted; the values 54%, 28% and 24% appear on the chart.] Why is recognition so high?
Significant differences (Wilcoxon matched-pairs signed-ranks test) • The difference between relevant slices and irrelevant slices is significant at the 0.03 level. • The difference between relevant slices and jumbles is very significant at the 0.005 level.
Results • Irrelevant contiguous was recognised because the programs were small and the irrelevant contiguous fragments were close to the output statements which wrote the incorrect variable values. • Participants would likely have examined code around these output statements.
Results: Figure 4, recognition by fragment type and program type. [Figure omitted]
Results: Figure 4 • TALLY shows the greatest recognition of the relevant slice fragment. • Because TALLY was poorly structured (many GOTOs), perhaps more programmers adopted a slicing strategy to debug it.
Results Table V • To conclude the experiment, participants were asked about the typicalness of the programs and the bugs. • Table V shows that the mean ratings were at least 2.4 on a 1 to 4 scale. • 4 meant “very typical” • 1 meant “not at all typical”. • Weiser reasonably concluded that no program was especially atypical.
Examples of slices (Figure 6) • Slices that are large in relation to the program (e.g. 563/662 statements) are less useful to the program maintainer.
Implications • Tools that automatically generate program slices can help maintainers debug faulty code. • Novice programmers should be taught the concept of slicing. • Today, researchers study many different kinds of slicing techniques. • Dynamic slicing makes use of knowledge about the input, and this can greatly reduce the size of slices.
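To make the point about dynamic slicing concrete, here is a small Python sketch built on the 12-statement example program from the earlier slides (statements numbered with BEGIN as 1). It is an illustration under stated assumptions, not any particular published algorithm: `run` and `dynamic_slice` are invented names, the program is hand-translated into Python, and the bookkeeping relies on the program having no loops, so each statement runs at most once. The program's static slice on TOTAL at statement 12 has four statements (2, 3, 5, 9); for one concrete input, fewer statements actually influence TOTAL.

```python
def run(inputs):
    """Execute the example program on `inputs` and return its trace.

    Each trace entry is (statement number, variables defined,
    variables used, controlling predicate or None).
    """
    values = iter(inputs)
    trace = []
    x, y = next(values), next(values)
    trace.append((2, {"X", "Y"}, set(), None))          # READ(X,Y)
    total = 0.0
    trace.append((3, {"TOTAL"}, set(), None))           # TOTAL:=0.0
    total_sum = 0.0
    trace.append((4, {"SUM"}, set(), None))             # SUM:=0.0
    trace.append((5, set(), {"X"}, None))               # IF X<=1
    if x <= 1:
        total_sum = y
        trace.append((6, {"SUM"}, {"Y"}, 5))            # THEN SUM:=Y
    else:
        z = next(values)
        trace.append((8, {"Z"}, set(), 5))              # READ(Z)
        total = x * y
        trace.append((9, {"TOTAL"}, {"X", "Y"}, 5))     # TOTAL:=X*Y
    trace.append((11, set(), {"TOTAL", "SUM"}, None))   # WRITE(TOTAL,SUM)
    return trace


def dynamic_slice(trace, variable):
    """Statements that influenced `variable` at the end of this particular run."""
    last_def = {}   # variable -> statement that most recently defined it
    deps = {}       # statement -> statements it depends on, itself included
    for num, defs, uses, control in trace:
        influence = {num}
        for var in uses:
            if var in last_def:
                influence |= deps[last_def[var]]   # data dependence
        if control is not None:
            influence |= deps[control]             # control dependence
        deps[num] = influence
        for var in defs:
            last_def[var] = num
    if variable not in last_def:
        return []
    return sorted(deps[last_def[variable]])


# X = 0 takes the THEN branch, so only TOTAL:=0.0 determines TOTAL.
print(dynamic_slice(run([0, 5]), "TOTAL"))     # [3]
# X = 2 takes the ELSE branch; TOTAL:=0.0 is overwritten and drops out.
print(dynamic_slice(run([2, 5, 7]), "TOTAL"))  # [2, 5, 9]
```

In both runs the dynamic slice is smaller than the four-statement static slice, because statements on the branch not taken, and assignments that are later overwritten, cannot have influenced the final value.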
Slicing or not? • “Because the relevant slice fragment overlapped the relevant sequential fragment in each program, this experiment gives no absolute assurance that relevant slices were not recognised only because of that overlap.” • Table II indicates that recognition ratings for relevant slice fragments and for relevant sequential fragments are poorly correlated. This suggests that participants could have been recognising relevant slice fragments because they had indeed been slicing, but...
Andy's view • In experimental work it is better to measure directly than indirectly. • Nowadays, it is possible to build and use tools that record all user actions and so help establish whether program slicing occurred. • Even in Weiser's day, he could have recorded participants speaking their thoughts and actions aloud and then analysed the recordings to help establish whether program slicing had occurred.
Andy's view • At the very least, Weiser should have asked his participants at the end of the experiment what actions they performed to debug the programs. • Because the programs were so small, it is quite possible that relevant slice recognition occurred because some or all participants had simply read all the code involved. • It would be interesting to know what the recognition rates would have been if the fragments shown to participants had not been syntactically altered.
You never really know what is going on inside someone's head.