reading & understanding code • experts are better at code comprehension because they focus on higher level patterns • patterns can be considered “discourse rules” • naming conventions, design patterns, schemas • experts work significantly better when reading & writing code according to these patterns
reading & understanding code: program comprehension, expertise effects, mental models, tools
outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools
mental model • an explanation of someone’s thought process when carrying out a task • our someone: programmers • our task: program comprehension • several models exist
mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation
bottom-up mental models • 1st: read code statements • 2nd: chunking: group statements as abstractions • 3rd: repeat
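A minimal Python sketch of bottom-up chunking, using a hypothetical palindrome check (our own example, not from the source). The first function is what a reader meets statement by statement; the second version names the two groups of statements (`normalize`, `reads_same_both_ways`) as a reader might after chunking them into abstractions.

```python
def is_palindrome_statements(text):
    # Bottom-up reading starts here: one low-level statement at a time.
    cleaned = ""
    for ch in text:
        if ch.isalnum():
            cleaned += ch.lower()
    i, j = 0, len(cleaned) - 1
    while i < j:
        if cleaned[i] != cleaned[j]:
            return False
        i += 1
        j -= 1
    return True


def normalize(text):
    """Chunk 1: the filter-and-lowercase loop, now a named abstraction."""
    cleaned = ""
    for ch in text:
        if ch.isalnum():
            cleaned += ch.lower()
    return cleaned


def reads_same_both_ways(s):
    """Chunk 2: the two-index comparison loop, now a named abstraction."""
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True


def is_palindrome_chunked(text):
    # Higher-level picture, built from the two chunks.
    return reads_same_both_ways(normalize(text))
```

Both versions execute the same statements; the chunked one simply records the higher-level picture in its names.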
chunking [figure: a sequence grouped into chunks, chunk 1 through chunk n, each made up of elements, element 1 through element k] modified from Wikipedia
chunking • program model • reasoning about the order of computation, how control moves throughout a program • “control flow” • situation model • reasoning about how data moves through atomic models • “data flow” N. Pennington Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs Cognitive Psychology, 1987
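To make the two views concrete, here is a small hypothetical Python function annotated from both perspectives: comments marked `control flow` describe the order of execution (what a program model captures), and comments marked `data flow` trace how values move (what a situation model captures). The function and annotations are illustrative only.

```python
def sum_of_positive(values):
    # data flow: `values` flows in; `total` will carry the result out.
    total = 0
    # control flow: the loop body runs once per element, in order.
    for v in values:
        # control flow: the branch decides whether the update executes.
        if v > 0:
            # data flow: each positive `v` is added into `total`.
            total += v
    # control flow: the function exits after the loop finishes.
    # data flow: `total` now holds the sum of all positive inputs.
    return total
```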
program & situation model studies • participants first primed for either control flow or data flow • shown a piece of code, asked to recall another piece of code which is related through either control flow or data flow • participants then asked a question that relates to either control or data flow • participants primed to think about control flow answered other control-flow questions faster, same with data flow N. Pennington Stimulus Structures and Mental Representations in Expert Comprehension of Computer Programs Cognitive Psychology, 1987
types of programmer knowledge • semantic: general programming concepts • low-level knowledge, e.g. what a=1 means • high-level knowledge, e.g. sorting algorithms • syntactic: language detail • overlaps between languages • stylistic: programming conventions • “discourse rules” B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979 E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
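One way to see the semantic/syntactic split: the semantic knowledge "exchange two values" stays fixed while its syntax varies. The two Python variants below are an illustrative sketch of our own, not material from the cited papers. The temporary-variable form overlaps with many languages; tuple unpacking is a Python-specific detail.

```python
def swap_with_temp(a, b):
    # Form shared by many languages: a temporary holds one value.
    tmp = a
    a = b
    b = tmp
    return a, b


def swap_with_tuple(a, b):
    # Python-specific syntax for the same semantic plan: tuple packing/unpacking.
    a, b = b, a
    return a, b
```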
[figure: Shneiderman & Mayer’s syntactic/semantic model. A problem statement enters short-term memory; internal semantics in working memory, ranging from high-level to low-level concepts, produce the program; knowledge in long-term memory comprises semantic knowledge (high-level to low-level concepts) and syntactic knowledge (COBOL, FORTRAN, PL/I, LISP)] B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979
evidence for semantic & syntactic knowledge • lab studies using FORTRAN • participants: programmers and non-programmers • asked to perform tasks that used one type of knowledge • six studies (will describe two) B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979
program memorization • study • two subject types: non-programmers & programmers • two program versions: normal & shuffled • participants asked to memorize a program • results • non-programmers performed equally poorly with normal & shuffled programs • programmers performed poorly with the shuffled program, well with the normal one • recalled the program’s semantic details, often with syntactic variations • conclusion • programmers were not memorizing the program text, but building internal semantics that represent its function B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979
commenting • study • two program versions • a 5-line high-level block comment at the top • numerous interspersed low-level comments • participants asked to modify & memorize the program • result • participants with the high-level comment performed better • strong correlation between ability to make modifications and ability to memorize • conclusion • memorization is a strong correlate of comprehension • hierarchical chunking, which organizes statements into units, facilitates the comprehension process B. Shneiderman and R. Mayer Syntactic/Semantic Interactions in Programmer Behavior: A Model and Experimental Results Journal of Computer & Information Sciences, 1979
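The two commenting styles can be imitated with a hypothetical Python function (not the study's FORTRAN material). Version A carries one short high-level block comment; version B has the same statements with line-by-line low-level comments. The code is identical; only the commenting differs.

```python
# Version A: one high-level block comment summarizing the purpose.
def largest_with_index_a(values):
    """Return (largest value, its index) for a non-empty list.

    Scans once, remembering the best value seen so far and where it was.
    Ties keep the earliest position.
    """
    best, best_i = values[0], 0
    for i in range(1, len(values)):
        if values[i] > best:
            best, best_i = values[i], i
    return best, best_i


# Version B: the same statements with interspersed low-level comments.
def largest_with_index_b(values):
    best, best_i = values[0], 0          # start with the first element
    for i in range(1, len(values)):      # visit every remaining index
        if values[i] > best:             # is this element bigger?
            best, best_i = values[i], i  # yes: remember value and index
    return best, best_i                  # hand back both results
```

Version A tells the reader which chunk to form before the statements are read; version B explains each statement but leaves the grouping to the reader.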
mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation
top-down models • 1st: develop hypotheses about the program • 2nd: evaluate and refine hypotheses • with the help of beacons • 3rd: repeat • a process of “reconstructing knowledge”
beacons • “indexes into existing knowledge” • recognizable features in code that are cues to the presence of certain structures • e.g., looking for a listener pattern M. Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future IEEE Workshop on Program Comprehension, 2005 R. Brooks Towards a theory of the comprehension of computer programs International J. on Man-Machine Studies, 1981
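As an illustration of beacons, the names in this hypothetical Python class (`add_listener`, `notify`, a private list of callbacks) let an experienced reader recognize the listener pattern before reading any method body. The class and its names are assumptions chosen for illustration.

```python
from typing import Callable, List


class TemperatureSensor:
    def __init__(self) -> None:
        # Beacon: a list of callbacks hints at the listener pattern.
        self._listeners: List[Callable[[float], None]] = []

    def add_listener(self, callback: Callable[[float], None]) -> None:
        # Beacon: `add_listener` is a conventional registration name.
        self._listeners.append(callback)

    def notify(self, reading: float) -> None:
        # Beacon: loop over the registered listeners and call each one.
        for callback in self._listeners:
            callback(reading)
```

Usage: register `seen.append` as a listener, call `notify(21.5)`, and `seen` becomes `[21.5]`.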
beacon types • semantic knowledge “plans” • reusable generic program fragments • high-level or low-level • programming discourse conventions • “rules” that make program comprehension easier • found across programmers E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
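Plans are generic fragments such as a counter loop or a search loop. The Python sketch below shows two such fragments in reusable form; the function names are our own, and the source's examples (such as the Magenta program in this deck) are Pascal-like rather than Python.

```python
def count_matching(values, predicate):
    # Counter plan: initialize a counter, increment when the condition holds.
    count = 0
    for v in values:
        if predicate(v):
            count += 1
    return count


def contains_negative(values):
    # Search plan: assume "not found", scan, stop as soon as a match appears.
    found = False
    for v in values:
        if v < 0:
            found = True
            break
    return found
```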
brooks’ model [figure: the problem is captured in an external representation (requirement documentation, design document, program code); beacons found in each are matched, using syntactic and semantic knowledge, against an internal representation of hypotheses and subgoals; the internal schema is verified against the external representation] R. Brooks Towards a theory of the comprehension of computer programs International J. on Man-Machine Studies, 1981 modified from Jonathan I. Maletic’s slides: An Overview of Mental Models for Program Understanding
mental model classes • bottom-up • read code statement by statement then ascend for a higher-level picture • top-down • start with a high-level picture of what the code is doing then descend into code • mixed • incorporate elements from both, based on the situation
opportunistic & systematic strategies • programmers enhancing an existing program • two strategies: • systematic: read code in detail, tracing through control and data flow manually • developed both control-flow and data-flow knowledge • opportunistic: focus only on code relevant to the task • developed only control-flow knowledge, resulting in a weaker understanding Margaret-Anne Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future Int. Workshop on Program Comprehension, 2005
integrated model • maintainers switch between top-down and bottom-up comprehension • top-down if code or code type is familiar • program model (control-flow) when code is completely unfamiliar • situation model (data-flow) after a partial data-flow understanding is developed through top-down or program model methods • knowledge base: information from previous three models Margaret-Anne Storey Theories, Methods, and Tools in Program Comprehension: Past, Present, and Future Int. Workshop on Program Comprehension, 2005 A. von Mayrhauser and A.M. Vans From Program Comprehension to Tool Requirements for an Industrial Environment IEEE Workshop on Program Comprehension, 1993
validating the integrated model • taped professional maintenance programmers • worked with a large code base • classified as domain and language experts • tape transcriptions classified into model types • one of the few studies with real-world tasks
outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools
programming discourse rules • specify the conventions of programming • e.g., a variable’s name should reflect its function • e.g., don’t include code that won’t be used • similar to writing discourse rules, as outlined in books like The Elements of Style • e.g., you expect to find the description for fig. 7 between those for fig. 6 and fig. 8 E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
rules of programming discourse • variable names should reflect function • don’t include code that won’t be used • if there is a test for a condition, then the condition must have the potential of being true • a variable that is initialized via an assignment statement should be updated via an assignment statement • don’t do double duty with code in a non-obvious way • an if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may need to be repeatedly executed E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
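A hypothetical pair of Python functions (our own, not from Soloway and Ehrlich) illustrates a few of these rules. Both return the same value; the difference is only in how easily an expert can read them. The first obeys the rules; the second has opaque names, unused code, and a loop whose body always runs exactly once.

```python
def average_of(scores):
    # Plan-like: names reflect function; the body repeats, so a loop is used.
    total = 0
    for score in scores:
        total += score
    return total / len(scores)


def proc(x):
    # Un-plan-like: `proc`, `x`, `a`, `r` do not reflect their function.
    unused = [0] * 100  # never read: violates "don't include code that won't be used"
    a = 0
    for s in x:
        a += s
    while True:  # body always runs exactly once, so a loop misleads the reader
        r = a / len(x)
        break
    return r
```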
testing discourse rules • lab study with expert & novice programmers • two program types • α (plan-like): obeyed discourse rules • β (un-plan-like): disobeyed discourse rules • participants given either α or β code, with one blank • task: fill the blank with what seems “natural” • participants were not told about α or β code • conclusion: experts fared best with α code
why have un-plan-like (β) code? • machine limitations • limited memory, processing, bandwidth, etc. • language limitations • less common: bugs, efficiency issues, etc. • programmer limitations • lacks full mastery of discourse • historical traces • resistance to changing legacy code, permanent “temporary” code source: The Psychology of Computer Programming
XXX: PROCEDURE OPTIONS(MAIN);
   DECLARE B(1000) FIXED(7,2),
           C FIXED(11,2),
           (I, J) FIXED BINARY;
   C = 0;
   DO I = 1 TO 10;
      GET LIST((B(J) DO J = 1 TO 1000));
      DO J = 1 TO 1000;
         C = C + B(J);
      END;
   END;
   PUT LIST('RESULT IS ', C);
END XXX;
modified from The Psychology of Computer Programming
XXX: PROCEDURE OPTIONS(MAIN);
   DECLARE A(10000) FIXED(7,2),
           C FIXED(11,2),
           I FIXED BINARY;
   C = 0;
   GET LIST((A(I) DO I = 1 TO 10000));
   DO I = 1 TO 10000;
      C = C + A(I);
   END;
   PUT LIST('RESULT IS ', C);
END XXX;
modified from The Psychology of Computer Programming
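The two PL/I programs can be sketched in Python (our own translation, assuming values arrive from an iterator in place of GET LIST, and using `Decimal` to stand in for the FIXED decimal types). The batched version refills a 1000-slot buffer ten times, the shape a memory limit would force; the single-pass version reads all 10000 values at once. Both compute the same sum.

```python
from decimal import Decimal
from typing import Iterator, List

BATCH = 1000   # buffer size in the batched version, like B(1000)
BATCHES = 10   # number of refills, like DO I = 1 TO 10


def batched_total(stream: Iterator[Decimal]) -> Decimal:
    # Mirrors the first program: refill a 1000-slot buffer ten times.
    total = Decimal(0)
    for _ in range(BATCHES):
        buffer: List[Decimal] = [next(stream) for _ in range(BATCH)]
        for value in buffer:
            total += value
    return total


def single_pass_total(stream: Iterator[Decimal]) -> Decimal:
    # Mirrors the second program: read all 10000 values, then sum them.
    values = [next(stream) for _ in range(BATCH * BATCHES)]
    total = Decimal(0)
    for value in values:
        total += value
    return total
```

A reader who does not know about the old memory limit sees the outer batching loop as structure without a purpose, which is exactly the kind of historical trace that makes code un-plan-like.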
rules of programming discourse • variable names should reflect function • don’t include code that won’t be used • if there is a test for a condition, then the condition must have the potential of being true • a variable that is initialized via an assignment statement should be updated via an assignment statement • don’t do double duty with code in a non-obvious way • an if should be used when a statement body is guaranteed to be executed only once, and a while used when a statement body may need to be repeatedly executed E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
naming conventions • meaningful names • variable naming reflects cognitive structure • grammatical sensibility • interact with language spec. to form expressions • containers & paths • objects & pointers • polysemy, homonymy, & overloading • operators, name sharing B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006
meaningful names • metaphors for domain tasks • e.g. pushing objects onto a stack • keywords for grouping • e.g. common prefixes & suffixes • informative names • balanced with name length A. Blackwell Metaphor or analogy: how should we see programming abstractions? Psychology of Programming Interest Group, 1996 B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006
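A small hypothetical Python sketch of these naming ideas: the class borrows the stack metaphor (`push`, `pop`, `peek`), and the `is_` prefix marks a yes/no query. The class is illustrative; Python's built-in `list` already provides `append` and `pop`.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """Named after the physical metaphor: items pile up and come off the top."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        # Metaphor: place an item on top of the pile.
        self._items.append(item)

    def pop(self) -> T:
        # Metaphor: take the top item off the pile.
        return self._items.pop()

    def peek(self) -> T:
        # Look at the top item without removing it.
        return self._items[-1]

    def is_empty(self) -> bool:
        # Prefix keyword: `is_` marks a yes/no query.
        return not self._items
```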
name length • excessive length harms readability and recall • idioms and memory ties improve readability and recall • takeaway: variable names with a consistent, abbreviated vocabulary are optimal • (variable names that concisely express a metaphor) D. Binkley, D. Lawrie, S. Maex, and C. Morrell Identifier length and limited programmer memory Science of Computer Programming, 2009
grammatical sensibility • names as phrase fragments • methods as actions (change state of program) • e.g. addElement, setSize, removeAll • methods as mathematical functions (compute result, don’t alter state) • e.g. true/false: contains, equals, isEmpty • e.g. data: capacity, indexOf, size • valence cues (phrase fragments with an open slot) • e.g. roster.contains(player) • Smalltalk makes use of this extensively: • roster insert: player at: position B. Liblit, A. Begel, and E. Sweetser Cognitive Perspectives on the Role of Naming in Computer Programs Psychology of Programming Interest Group, 2006
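In Python terms, a hypothetical `Roster` class (not from the cited paper) can follow the same grammar: action methods are verb phrases that change state, while predicate and data methods compute a result without altering it. A keyword-only `at` argument loosely echoes the Smalltalk `insert: player at: position` slot.

```python
from typing import List


class Roster:
    def __init__(self) -> None:
        self._players: List[str] = []

    # Methods as actions: verb phrases that change state.
    def add_player(self, player: str) -> None:
        self._players.append(player)

    def insert(self, player: str, *, at: int) -> None:
        # Keyword-only `at` names the open slot, loosely like Smalltalk's keywords.
        self._players.insert(at, player)

    def remove_all(self) -> None:
        self._players.clear()

    # Methods as functions: compute a result without altering state.
    def contains(self, player: str) -> bool:
        # Reads as a phrase with an open slot: roster.contains(player)
        return player in self._players

    def size(self) -> int:
        return len(self._players)

    def index_of(self, player: str) -> int:
        return self._players.index(player)
```

Usage: `roster.insert("bo", at=0)` reads close to the Smalltalk phrase.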
outline • mental models • types • models • conventions & “discourse rules” • expertise effects • tool implications • interesting tools
20:1 programmer performance • Sackman et al.: the best programmers are 20x better than the worst programmers at bug fixing • study originally meant to evaluate the effectiveness of time-shared systems H. Sackman, W. J. Erikson, and E. E. Grant Exploratory experimental studies comparing online and offline programming performance Communications of the ACM, 1968
10:1 programmer performance • there are substantial programmer efficiency differences, but not as dramatic as initially reported • what makes experts so much better at understanding code?
testing discourse rules • lab study with expert & novice programmers • two program types • α (plan-like): obeyed discourse rules • β (un-plan-like): disobeyed discourse rules • participants given either α or β code, with one blank • task: fill the blank with what seems “natural” • participants were not told about α or β code
α problem
PROGRAM Magenta(input, output);
VAR Max, I, Num: INTEGER;
BEGIN
  Max := 0;
  FOR I := 1 TO 10 DO
  BEGIN
    READLN(Num);
    IF Num ? Max THEN Max := Num
  END;
  WRITELN(Max)
END.
E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
α solution
PROGRAM Magenta(input, output);
VAR Max, I, Num: INTEGER;
BEGIN
  Max := 0;
  FOR I := 1 TO 10 DO
  BEGIN
    READLN(Num);
    IF Num > Max THEN Max := Num
  END;
  WRITELN(Max)
END.
E. Soloway, K. Ehrlich Empirical Studies of Programming Knowledge IEEE Transactions on Software Engineering, 1984
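A Python rendering of the α Magenta program, assuming input arrives as an iterable of integers instead of through READLN. Starting `largest` at 0 and replacing it whenever a bigger value appears is the max-search plan an expert expects, which is why `>` is the natural fill. Like the original, it assumes the readings are non-negative; if all ten were negative, it would return 0.

```python
from typing import Iterable


def magenta(numbers: Iterable[int], count: int = 10) -> int:
    # Max plan: initialize, then keep whichever value is larger.
    largest = 0  # as in the original; assumes non-negative readings
    # zip with range(count) reads at most `count` values, like FOR I := 1 TO 10.
    for _, num in zip(range(count), numbers):
        if num > largest:  # the blank in the α problem is `>`
            largest = num
    return largest
```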