1. Course Notes for CS1621 Structure of Programming Languages, Part B, by John C. Ramirez, Department of Computer Science, University of Pittsburgh
2. 2 These notes are intended for use by students in CS1621 at the University of Pittsburgh and no one else
These notes are provided free of charge and may not be sold in any shape or form
Material from these notes is obtained from various sources, including, but not limited to, the textbooks:
Concepts of Programming Languages, Seventh Edition, by Robert W. Sebesta (Addison Wesley)
Programming Languages, Design and Implementation, Fourth Edition, by Terrence W. Pratt and Marvin V. Zelkowitz (Prentice Hall)
Compilers Principles, Techniques, and Tools, by Aho, Sethi and Ullman (Addison Wesley)
3. 3 Expressions Expressions are vital to programs
Allow programmer to specify the calculations that computer is to perform
It is important that programmer understand how a language evaluates expressions
Things to consider:
Precedence and associativity
Order of operand evaluation
Side-effects of evaluation
Overloadings and coercions
4. 4 Expressions Precedence and Associativity
We always learn these rules for any new language
Vital to using expressions correctly
Most languages have similar precedence for the standard operators: * / then + –
But programmer needs to understand precedence and associativity for all operators, especially those that may be unusual
5. 5 Expressions Ex: boolean and relational operators
and or not < > <= >= != ==
In Pascal, the boolean operators have higher precedence than the relational operators (opposite of C++)
if x < y then writeln('Less');
if x < y and y < z then writeln('Middle');
Above is an error in Pascal, since the first sub-expression evaluated would be y and y
if (x < y) and (y < z) then writeln('Middle');
Now it is ok
In C++
if (x < y && y < z) cout << "Middle" << endl;
This is fine in C++
6. 6 Expressions Ex: unary ++ and -- in C++
Precedence and associativity are wacky!
#include <iostream>
using namespace std;
int main()
{
    unsigned int i1 = 0, i2, i3, i4, i5, j, k, m1, m2, m3, m4, m5;
    j = i1++; k = ++i1;
    cout << j << " " << k << endl;
    i5 = i4 = i3 = i2 = i1;
    m1 = i1++ + i1++ + i1++;
    m2 = i2++ + ++i2 + i2++;
    m3 = i3++ + ++i3 + ++i3;
    m4 = ++i4 + i4++ + ++i4;
    m5 = ++i5 + ++i5 + ++i5;
    cout << i1 << " " << m1 << endl;
    cout << i2 << " " << m2 << endl;
    cout << i3 << " " << m3 << endl;
    cout << i4 << " " << m4 << endl;
    cout << i5 << " " << m5 << endl;
}
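7. 7 Expressions Output? See plusplus.cpp – try it on different platforms
The expressions for m1 through m5 modify the same variable more than once with no sequencing between the modifications, which is undefined behavior in C++, so different compilers can legitimately give different results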
http://www.cppreference.com/operator_precedence.html
See problem in Assignment 3
Compare to plusplus.java and plusplus.pl
8. 8 Expressions In some cases, expression is ambiguous and compiler will not let you do it, or warn you about it
Ex: A ** B ** C in Ada
Must have parentheses
Ex: Mixing bitwise operators in C++
Warning to use parentheses
Sometimes you could probably figure it out, but you’re better off not trying
Ex: If more than one coercion can occur in C++
May have defined both a constructor and a conversion function
9. 9 Expressions Sometimes you don’t think you should care about precedence and associativity, but you should
In math, addition and multiplication are associative and commutative
On a computer, overflow can cause this to not always be the case:
floats A = 1e+30, B = 1.0/1e+30, C = 1e+30
A * B * C ~= 1e+30, but A * C * B = infinity (A * C overflows first)
see Overflow.cpp
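The overflow example above can be sketched as follows. This is not the Overflow.cpp from the notes; the function names product and overflows are made up for illustration, and the sketch assumes 32-bit IEEE floats.

```cpp
#include <cmath>

// Multiply three floats left to right. The intermediate results are
// stored in volatile float variables so each step is rounded (and can
// overflow) at float precision.
float product(float a, float b, float c) {
    volatile float first = a * b;
    volatile float result = first * c;
    return result;
}

// True when the left-to-right product overflows to infinity.
bool overflows(float a, float b, float c) {
    return std::isinf(product(a, b, c));
}
```

With A = 1e+30f, B = 1.0f/1e+30f and C = 1e+30f, overflows(A, B, C) is false while overflows(A, C, B) is true, even though the two products are mathematically equal.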
F1.add(F2); F2.add(F1)
-- If F1 and F2 are from different classes, the operations may be different or perhaps not even legal
10. 10 Expressions Side-effects can also cause evaluation order problems
Expressions can involve function calls, which can change variable values
Y = f(X) + X;
Y = X + f(X);
Without side-effects, the results are the same, but if f(X) changes the value of X, the results could be different
Most languages allow reference parameters with functions
These can cause logic errors if used improperly
See side.cpp
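A minimal sketch of the problem (not the side.cpp from the notes). The function f and the helpers callFirst and readFirst are hypothetical; each helper forces one of the two possible operand orders with separate statements, so the difference is visible no matter which order a compiler would pick for the one-line expression.

```cpp
// Hypothetical function with a side effect: it doubles its
// reference parameter and returns the new value.
int f(int& x) {
    x *= 2;
    return x;
}

// Evaluates f(X) + X with f(X) run before X is read.
int callFirst(int x) {
    int left = f(x);   // x is doubled here
    return left + x;   // uses the doubled x
}

// Evaluates X + f(X) with X read before f(X) runs.
int readFirst(int x) {
    int left = x;      // old value of x
    return left + f(x);
}
```

Starting from X = 5, callFirst gives 20 and readFirst gives 15.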
11. 11 Expressions How to handle this?
Leave it up to the programmer, as in Pascal and C++
Limits compiler optimizations, some of which may include reordering of operations
Compiler cannot reorder if it could possibly change result
Do not allow (most) side-effects to occur, as in Ada
Ada functions cannot change parameters
Now optimizations can reorder expressions without changing result (at least due to this)
Best advice is to program in such a way as to either avoid all side-effects, or to only allow them in cases where they will not affect expression evaluation
12. 12 Expressions Operator Overloading
Used in many newer high-level languages
Can be good and bad
Good:
Aids in readability and simplifies code if used correctly
Ex: New class Complex variables A, B and C
A + B + C is more clear than (A.add(B)).add(C)
Ex: String variables can be compared
if (A < B) …
is clearer than
if (A.compareTo(B) < 0) …
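The Complex example can be sketched in C++ as below. The struct and its add method are illustrative only; the standard library already provides std::complex.

```cpp
// Hypothetical Complex type with both a named method and an
// overloaded + operator that do the same thing.
struct Complex {
    double re;
    double im;

    // Method form: (A.add(B)).add(C)
    Complex add(const Complex& other) const {
        return Complex{re + other.re, im + other.im};
    }
};

// Operator form: A + B + C
Complex operator+(const Complex& a, const Complex& b) {
    return a.add(b);
}
```

Both forms compute the same sum; the operator form simply reads like the math it represents.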
13. 13 Expressions Bad:
Can harm readability if used incorrectly
Ex: + defined to do multiplication
But methods could be improperly named as well
Function calls are not obvious, especially if other versions of the function exist
In C++ we could have a member function + and also a friend function +. Which is used?
Can allow some logic errors to go undetected
Ex: C++ uses / for float and integer division
If user expects a value between 0 and 1, it’s not going to happen if integer division is used
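A small sketch of the division pitfall; the function names are made up for illustration.

```cpp
// Both operands are int, so / performs integer division and the
// fractional part is discarded before the result becomes a double.
double fractionWithIntDivision(int part, int whole) {
    return part / whole;
}

// Converting one operand first selects floating-point division.
double fractionWithRealDivision(int part, int whole) {
    return static_cast<double>(part) / whole;
}
```

For part = 1 and whole = 4 the first function returns 0 and the second returns 0.25, with no warning that the same symbol did two different things.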
14. 14 Expressions Some languages like C++ and Ada allow programmer-defined operator overloading
Others like Java do not
Both positions have support
15. 15 Expressions Coercion and conversion
In many expressions we use more than one datatype
Mixed expressions
This seems a reasonable thing to allow
However, often the operators and functions used are defined for only a single type
In this case, to allow mixed expressions to be used, some types must be converted to other types
The differences in languages are whether these conversions should be IMPLICIT or EXPLICIT
16. 16 Expressions Explicit conversion
In this case the language allows little or no mixed expressions in the code
To allow mixing of data types, the programmer must convert through an operation or function call
Ex: Ada does not even allow mixing of floats and integers
Good:
Everything is clear – no uncertainty or ambiguity
Programmer can more easily verify correctness of programs
Easier to avoid logic errors
17. 17 Expressions Bad:
Makes language very wordy
Can be annoying, especially when the types are similar (ex. addition of integers and floats)
Implicit conversion – coercion
In this case mixed expressions are allowed, and the language coerces types where needed to allow types to match
Usually a language has some rules by which the coercions are performed
Good:
Less wordy – makes programs shorter and sometimes easier to write
18. 18 Expressions Bad:
Programs are harder to verify for correctness
It is not always clear which coercion is being done, especially when programmer-defined coercions are allowed
Can lead to logic errors in programs
Ex: In C++ expressions are always coerced if they can be
Standard rules of “promotion” for predefined types can be easily remembered
However, programmer can also define functions that will be used for coercion
Constructors for classes and conversion functions are both implicitly called if necessary
Now the rules are less clear and can lead to ambiguity and logic errors
19. 19 Expressions Consider A = B + C where A, B and C are all of different types
Any/all of the following could exist:
+ operator with two type B arguments
+ operator with two type C arguments
Constructor for type B with argument type C
Constructor for type C with argument type B
Coercion function from C to B
Coercion function from B to C
Constructor for type A with argument type B
Constructor for type A with argument type C
How does programmer know which will be used?
Should NOT assume any particular coercion will occur in this case
Here explicit coercion should be used to remove ambiguity
See coercion.cpp and rational.h
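One way this ambiguity shows up, sketched with a hypothetical Rational class (this is not the rational.h from the notes). The class has both an implicit constructor from int and an implicit conversion to double, so a mixed expression has two equally good interpretations.

```cpp
// Hypothetical class with two programmer-defined coercions.
struct Rational {
    int num;
    int den;

    // Implicit coercion int -> Rational
    Rational(int n, int d = 1) : num(n), den(d) {}

    // Implicit coercion Rational -> double
    operator double() const { return static_cast<double>(num) / den; }
};

Rational operator+(const Rational& a, const Rational& b) {
    return Rational(a.num * b.den + b.num * a.den, a.den * b.den);
}

// Rational r(1, 2);
// r + 1;   // compilers typically reject this as ambiguous:
//          // Rational + Rational(1), or double(r) + 1 ?

// Explicit conversions state which meaning is intended.
Rational addAsRational(const Rational& r, int n) {
    return r + Rational(n);
}

double addAsDouble(const Rational& r, int n) {
    return static_cast<double>(r) + n;
}
```

Writing the conversion explicitly costs a few characters and removes any need to remember the overload resolution rules.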
20. 20 Expressions Boolean expressions
Expressions that evaluate to TRUE or FALSE
Formed using relational operators and boolean operators
Relational operators – operators which compare values
Operands can be most primitive types and complex types as well in some cases
Boolean operators – operators used to combine boolean results
Operands must be boolean values
Exception is C/C++
21. 21 Expressions Same guidelines for precedence and associativity hold here
Know the rules for current language
Ex: Ada boolean operators and, or have the same precedence but are NON-associative when mixed with each other
if A and B or C then …
illegal in Ada – must parenthesize
Ex: C++ boolean operator && has higher precedence than ||
22. 22 Expressions Short-Circuit Evaluation
Important note (that we may not have emphasized earlier):
Operator precedence and associativity are for OPERATORS, not OPERANDS
The operators simply indicate how the operands are combined/utilized, NOT the order in which they are accessed/determined
For example: A + B + C + D
We know we first add A and B, then add C, then add D
But the VALUES for A, B, C and D could be obtained in ANY ORDER
Done to optimize execution (ex. in parallel)
23. 23 Expressions This is significant in (at least) 2 situations:
Operand evaluation produces a side-effect that changes result of subsequent operand evaluation
As we discussed previously, operand could be a function call with a reference parameter
Operand could be used/modified more than once, as with ++ example
An operand may not even be valid if a previous operand evaluates in a certain way
Ex: if ((X != 0) && (Y/X < 1)) cout << "rational";
Considering the && operator, if the first operand evaluates to FALSE (X is 0), evaluating the second operand causes a run-time error (division by zero)
Now if the compiler would try to do these in parallel it could cause problems
Solution is SHORT-CIRCUIT EVALUATION (SSE)
24. 24 Expressions Idea of SSE is simple:
Evaluate boolean expressions only until a final answer can be determined
For example with &&, we know that
FALSE && ANYTHING == FALSE
so we would not get the division by zero error
SSE is nice because it makes our code simpler
If we know compiler uses SSE, we can put into a single expression what otherwise would require two
25. 25 Expressions Ex: if ((X != 0) && (Y/X < 1)) cout << "rational";
Without SSE, how would we have to write this to prevent possible run-time error?
Do on board
Drawbacks of SSE?
Now computer must evaluate operands sequentially
Slows down program execution, especially in environments with multiple CPUs
So we have safety/ease of programming vs. execution efficiency
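One possible answer to the "without SSE" question above is to nest the tests, so the division is only reached when X is known to be non-zero. The function names here are made up for illustration.

```cpp
// Relies on short-circuit evaluation: y / x is evaluated only
// when x != 0 is true.
bool ratioBelowOneShortCircuit(int x, int y) {
    return (x != 0) && (y / x < 1);
}

// Does not rely on short-circuit evaluation: the nesting itself
// guarantees the division is guarded.
bool ratioBelowOneNested(int x, int y) {
    if (x != 0) {
        if (y / x < 1) {
            return true;
        }
    }
    return false;
}
```

The nested form needs two statements where the short-circuit form needs one expression, which is the convenience the slides describe.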
26. 26 Expressions Solution is to offer programmer the choice
Ada uses arbitrary evaluation of operands normally
But special operators and then and or else provide short-circuit evaluation if desired
C++ and Java use SSE for && and || but always evaluate both operands for bitwise & and |
27. 27 Expressions Assignment
Central to Imperative Languages
Gives a value to a variable
Typical syntax:
<variable> <assig. operator> <expression>
Semantics:
Compute lvalue of variable
Compute rvalue of expression
Store computed rvalue in lvalue location
28. 28 Expressions Variations
Some languages allow multiple targets
C++ allows conditional targets (in C and Java the ?: operator yields a value, not a variable, so it cannot be assigned to)
Wacky ?: operator
C, C++ and Java have many assignment variations for convenience
Ex: ++, +=, *=
C, C++ and Java return the rvalue as operation result
Allows assignment to be mixed within other expressions
As with many features from C, C++, this is both good and bad
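A short C++ sketch of these variations; the function names are hypothetical.

```cpp
// Conditional target: the ?: operator picks which variable
// receives the assignment.
void zeroOne(bool pickFirst, int& a, int& b) {
    (pickFirst ? a : b) = 0;
}

// Assignment returning its value, plus compound assignments.
int chain() {
    int x, y;
    x = y = 3;   // y = 3 yields the assigned value, which goes to x
    x += 2;      // same as x = x + 2
    x *= y;      // same as x = x * y
    return x;    // (3 + 2) * 3
}
```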
29. 29 Expressions Allows shorter code in cases such as:
A = B = C
while ((ch = getchar()) != EOF)
Since it is changing the value of a variable, order of evaluation is critical
Typically associates right to left, and it is a good idea to parenthesize (as above)
Famous C/C++ bug that we mentioned before: if (x = y) is wacky!
Will ALWAYS be true if y is non-zero
Will ALWAYS be false if y is zero
Newer compilers warn you about it
Not possible in Java since if requires a boolean
Concern also must be given for overloading the assignment operator (legal in C++ and Ada)
It is possible to cause it to behave differently from what is normally expected
Care has to be taken so that it works in all cases
30. 30 Expressions Ex: Overloading = for a linked list variable
LList<myData> A, B;
// Fill B with various nodes
A = B;
If we want to use this assignment as with other assignments, we need to return the assigned result as the result of the assignment
In C++ this is typically a reference return value, so that we can cascade the operator effectively
A = (B = C); (A = B) = C;
On the left, when the assignment B = C is finished, we need the rvalue of the result
On the right, when the assignment A = B is finished, we need the lvalue of the result
Reference allows both (even though right seems silly to do)
Also, how about A = A;
If we destroy old LL before assigning new one, this could destroy the value
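A minimal sketch of an overloaded assignment that handles these cases. IntList is a hypothetical stand-in for the LList class in the notes: it stores ints only and has just enough operations to demonstrate the operator.

```cpp
#include <cstddef>

// Hypothetical singly linked list of ints.
class IntList {
    struct Node { int value; Node* next; };
    Node* head = nullptr;

    // Delete every node, leaving an empty list.
    void clear() {
        while (head != nullptr) {
            Node* old = head;
            head = head->next;
            delete old;
        }
    }

    // Append copies of other's nodes; assumes this list is empty.
    void copyFrom(const IntList& other) {
        Node** tail = &head;
        for (const Node* n = other.head; n != nullptr; n = n->next) {
            *tail = new Node{n->value, nullptr};
            tail = &(*tail)->next;
        }
    }

public:
    IntList() = default;
    IntList(const IntList& other) { copyFrom(other); }
    ~IntList() { clear(); }

    IntList& operator=(const IntList& other) {
        if (this != &other) {  // A = A: do not destroy what we must copy
            clear();
            copyFrom(other);
        }
        return *this;          // reference return allows cascading
    }

    void pushFront(int v) { head = new Node{v, head}; }
    int front() const { return head->value; }  // assumes non-empty list
    std::size_t size() const {
        std::size_t count = 0;
        for (const Node* n = head; n != nullptr; n = n->next) ++count;
        return count;
    }
};
```

The self-assignment test is the simplest fix for the A = A case; other approaches (such as copying first and then swapping) also work.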
31. 31 Expressions One issue that you may not normally consider: How is the rvalue evaluated?
For statically typed languages, there is usually no ambiguity – expression result type must match the type of the variable
But for dynamically typed languages, it is no longer clear
Ex: in Prolog
A = 5 + 3
Since A is not necessarily an integer, 5 + 3 could be taken as a string just as reasonably as it could be taken as an arithmetic expression
See assig.pl
32. 32 Control Statements Primary types of control in imperative languages
Selection
Choose between 1 or more different actions
Iteration
Repeat an action 0 or more times
33. 33 Control Statements Selection
One-way selection
if statement exists in virtually every imperative language
Idea here is that we either execute a statement or do not
In modern languages this is achieved using an if without the optional else
Two-way selection
Now we incorporate the else with the if
34. 34 Control Statements Typical syntax:
if <condition>
<statement>
else
<statement>
Interesting issues:
Form of condition?
What kinds of statements are allowed?
Is nesting allowed and how is it interpreted?
35. 35 Control Statements Form of condition
Most languages require a boolean expression (true or false only)
C/C++ are exceptions – int values are allowed
Kinds of statements
Original FORTRAN and BASIC allowed only a single statement
This is not conducive to good programming techniques
Only way to have multiple statements is by using an unconditional branch, i.e. GO TO
36. 36 Control Statements ALGOL 60 introduced the compound statement
Now an arbitrary number of statements can be used
All newer imperative languages (and updates of older languages) either use compound statements or allow multiple statements within the if
Nesting
It logically follows that a statement within an if clause or else clause could be another if statement
Remember orthogonality
What issues occur in this case?
37. 37 Control Statements Only problem of interest is one we have already discussed
If the number of if clauses and the number of else clauses are not equal, how are they associated?
There are two main approaches to handling this:
Use a rule (static semantics) to determine how this is handled
This is the approach taken in Pascal, C, C++ and Java
System handles the rule consistently, so there is no ambiguity, but, like rules of precedence and associativity, the programmer could forget it or make a mistake that is not caught
Can lead to logic errors
We have already seen this example
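A sketch of the rule in C++, where an else is paired with the nearest unmatched if. The function names are made up, and the indentation in the first function is deliberately misleading (some compilers warn about it).

```cpp
// The else is indented as if it belonged to the outer if, but the
// language rule pairs it with the inner if.
int misleadingIndentation(bool outer, bool inner) {
    int result = 0;
    if (outer)
        if (inner)
            result = 1;
    else
        result = 2;
    return result;
}

// Braces make the intended pairing explicit: the else now belongs
// to the outer if.
int explicitBraces(bool outer, bool inner) {
    int result = 0;
    if (outer) {
        if (inner)
            result = 1;
    } else {
        result = 2;
    }
    return result;
}
```

With outer false, the first function returns 0 and the second returns 2, which is exactly the kind of uncaught logic error described above.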
38. 38 Control Statements Use syntax to determine how it is handled
This is the approach taken in Ada, BASIC, Modula-2, ALGOL 68
Every if statement must be syntactically terminated (ex: end if)
Now an inner if clause without an else clause must still have an end if, and syntactically the outer else can only be associated with the outer if
Perl has a slightly different approach: the statement for an if MUST be a compound statement. Result is the same, since the inner if will now be within a compound statement
39. 39 Control Statements Multiple Selection
Idea is to choose from many possible options
Clearly one way of doing this is through nested if statements
Often preferable, especially if the means of selection is a series of separate boolean expressions
// Break tie for A and B in some sport
if (A beat B twice) then
A wins tie
else if (B beat A twice) then
B wins tie
else if (A scored more points than B) then
A wins tie
else if (B scored more points than A) then
B wins tie …
40. 40 Control Statements However, in some situations, the options are based on different result values of a single expression:
Ex: Menu in which user chooses an option from 1 to 5; each option causes a different action
In these instances, nested ifs could be used
In fact these are all we really need
But the nesting gets complicated, often making the statements harder to follow and making them more prone to logic errors
So many languages supply a case statement
Specifically designed for multiple alternative selection based on different results of a single expression
41. 41 Control Statements There are some interesting issues to consider here
Many are the same as for two-way selection
Text discusses them at length
A few that we will look at
What happens after the code for the matched selection is executed?
One option is to break out of the structure, continuing with the next statement after it
This makes each option mutually exclusive
This approach is taken by Algol W, Pascal, Ada
Probably the most intuitive idea – the choices are mutually exclusive by default
42. 42 Control Statements C, C++ and Java do not automatically break out after the selection has been executed
This is good and bad (as usual)
Adds flexibility
If the execution for one selection is a “superset” of another, it makes sense to allow the flow to continue within the selection statement
Causes potential logic problems
Programmer must manually add breaks
If one is missed no syntax error occurs
What happens if no match is found?
Two logical alternatives:
1. Do nothing
2. Error
43. 43 Control Statements C, C++, Java adopt the “do-nothing approach”
Seems logical that if nothing matches nothing should be done
ANSI Standard Pascal and Ada adopt the “error approach”
More reliable, since now an accidental out of range value will be detected as an error rather than just a “do nothing”
C, C++, Java, Ada, Turbo Pascal, BASIC also provide a “default” choice
Good idea to always use so you can detect an out of range value without causing a runtime or logic error
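A small C++ sketch of fall-through and the default choice; the function describe and its option numbers are hypothetical.

```cpp
#include <string>

// Builds a string recording which cases were executed.
std::string describe(int option) {
    std::string log;
    switch (option) {
        case 1:
            log += "one ";
            // falls through
        case 2:
            log += "two ";
            break;
        case 3:
            log += "three ";
            break;
        default:
            // Catches any out-of-range option.
            log += "invalid ";
            break;
    }
    return log;
}
```

Option 1 runs the code for both case 1 and case 2 because no break separates them; removing any other break by accident would compile just as quietly.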
44. 44 Control Statements Iteration
Three primary types of iterative loops: conditional loops, counting loops and arbitrary loops
Conditional (logically controlled) loops
Number of iterations is determined by a boolean condition, and usually cannot be precalculated
ex: while (infile && valid == 1)
Note that we cannot predict when this condition will become false
45. 45 Control Statements Many languages have two versions of the conditional loop
Pretest – condition is tested prior to entering the loop body
May execute loop body 0 times
Posttest – condition is tested immediately after executing loop body
Will always execute loop body at least 1 time
Ada does not have this version
Two versions are provided for convenience – we can always simulate one loop with the other (plus some conditionals)
See loops.cpp
Clearly the difference is where each is more appropriate
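A sketch of simulating a posttest loop with a pretest loop (not the loops.cpp from the notes; the function names are made up). Both functions count how many times n must be halved to reach zero, always halving at least once.

```cpp
// Posttest loop: the body runs before the condition is checked.
int stepsPosttest(int n) {
    int steps = 0;
    do {
        n /= 2;
        ++steps;
    } while (n > 0);
    return steps;
}

// Same behavior with a pretest loop: the body is written once
// before the loop to force the first execution.
int stepsSimulated(int n) {
    int steps = 0;
    n /= 2;
    ++steps;
    while (n > 0) {
        n /= 2;
        ++steps;
    }
    return steps;
}
```

The simulated version works but duplicates the body, which is why the posttest form is the more appropriate choice here.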
46. 46 Control Statements Conditional loops are the most general kind of loops, and are really all that is needed in an imperative programming language
However, many looping applications deal with arrays and sequences of values
For convenience and efficiency it is prudent to provide a looping structure geared toward these applications
Counting Loops (counter-controlled loops)
Number of iterations determined by a control variable, an initial value, a terminal value, and an increment
47. 47 Control Statements We can (usually) precalculate the number of iterations based on the initial value, terminal value and increment
Ex: for (int i = 3; i <= N; i+=2) { …
i obtains values 3, 5, 7, …, N (or N – 1 if N is even)
For N = 31, the number of iterations equals
CEILING((TERM – INIT+1)/INCR) or
CEILING((N – 3 + 1)/2) =
CEILING((31 – 3 + 1)/2) = 15
Precalculation is nice because it allows the computer to base the loop on an iteration count (if it chooses to do so) which can be executed more quickly than conditional testing each time
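The formula can be checked against an actual loop with a sketch like the following. The function names are hypothetical, and the formula version assumes a positive increment.

```cpp
// Count iterations by actually running the counting loop.
int countByLooping(int init, int term, int incr) {
    int count = 0;
    for (int i = init; i <= term; i += incr) {
        ++count;
    }
    return count;
}

// CEILING((TERM - INIT + 1) / INCR) in integer arithmetic.
// Assumes incr > 0; an empty range gives 0 iterations.
int countByFormula(int init, int term, int incr) {
    int span = term - init + 1;
    if (span <= 0) {
        return 0;
    }
    return (span + incr - 1) / incr;
}
```

For the slide's values (INIT = 3, TERM = 31, INCR = 2) both functions return 15.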
48. 48 Control Statements Machine can use a register for the iteration count and not have to worry about obtaining operands for the comparisons at each iteration of the loop, something that must be done with a conditional loop
To allow precalculation and iteration counts to work, some restrictions must be made on the loop
Loop control variable cannot be altered by the programmer within the loop body
Terminal value must be calculated only one time, when loop is first entered
It will also speed things up if the loop control variable is an integer (or integral type) so no float operations are necessary
This is the approach taken in Pascal and Ada
See for.p
49. 49 Control Structures Pascal and Ada also do not allow an increment other than 1 or –1, and do not carry the value of the control variable past the end of the loop
In Pascal, the value is “officially” undefined, but in any Pascal implementation it will typically be one of two things: 1) The terminal value of the loop or 2) The terminal value + 1 or – 1. 1) typically indicates that iteration counts are being used
In Ada, the loop control variable is implicitly declared in the loop header, and becomes really undefined at the end of the loop – accessing it afterward will cause an “undeclared variable” error
This is now generally accepted as a good idea, since it reduces side-effect problems of using loop control variables that were declared and assigned elsewhere. C++ and Java both allow (but do not require) this as well
50. 50 Control Structures Attitude in Pascal and Ada is that if you want more complex iteration (ex. increment other than 1 or –1, option of changing number of iterations during the loop’s execution) you should use a while loop
C, C++ and Java have a different approach
For loop is not really a for loop in the traditional sense
It is a very general loop that can be used for any looping application
It more appropriately is a while loop with the addition of an initialization-statement and a post-body statement
51. 51 Control Statements for (init-expr; pretest-expr; post-body-expr)
Now really anything goes and the pre-test-expr and post-body-expr are evaluated for each iteration of the loop
Can certainly be used for a counting loop, as most of you have used it
Can also be used as an arbitrary loop to do more or less whatever programmer wants it to do
Added flexibility, with added danger
The usual for C, C++
see for.cpp
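A sketch of the for loop used as a general loop rather than a counting loop (not the for.cpp from the notes; function names are made up). The while version shows the equivalent structure.

```cpp
// for used as a general loop: the "increment" is a division and
// the number of iterations depends on the data.
int digitsWithFor(int n) {
    int digits = 0;
    for (int m = n; m > 0; m /= 10) {
        ++digits;
    }
    return digits;
}

// The same loop as a while: initialization before, pretest in the
// header, post-body statement at the end of the body.
int digitsWithWhile(int n) {
    int digits = 0;
    int m = n;
    while (m > 0) {
        ++digits;
        m /= 10;
    }
    return digits;
}
```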
52. 52 "foreach" loop Newer languages also have included a "foreach" loop to iterate through data
Key difference between "for" and "foreach"
"for" iterates through indexes (typically), which can be used to access an array / collection if desired
Loop control variable is typically an integer
"foreach" iterates through the values in the collection directly
No indexing is used, at least not directly
Loop control variable is the data type we are accessing in the collection
53. 53 "foreach" loop foreach loop has its advantages and disadvantages
Advantages:
Since no counter is used, we eliminate the possibility of index out of bounds problems
We can iterate over a collection without having to know the implementation details of the collection
Allows for data hiding and improves error prevention
We will likely discuss this more when we discuss object-oriented programming
54. 54 "foreach" loop Disadvantage
When accessing an array, we may want or need the index value
Ex: What if we want to change the data in the array or reorganize it
Ex: Sorting would be difficult using "foreach"
See forEach.java and foreach.pl
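The notes point to Java and Perl examples; the same contrast can be sketched in C++, whose range-based for plays the role of "foreach". The function names are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// "for": iterate over indexes and use them to reach the data.
int sumByIndex(const std::vector<int>& data) {
    int total = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i];
    }
    return total;
}

// "foreach": iterate over the values directly, no index involved.
int sumByValue(const std::vector<int>& data) {
    int total = 0;
    for (int value : data) {
        total += value;
    }
    return total;
}

// Reorganizing the data needs positions, so an index loop fits better.
void reverseInPlace(std::vector<int>& data) {
    for (std::size_t i = 0; i < data.size() / 2; ++i) {
        std::swap(data[i], data[data.size() - 1 - i]);
    }
}
```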
55. 55 Control Statements Arbitrary Loops
Now the loop is basically an infinite loop, with the programmer expected to break out of it explicitly at some point
Ada allows this with the
loop
end loop;
exit statement will break out of the loop, and can be put into an if statement
Thus we can break out of the loop from more than one place
56. 56 Control Statements Although C, C++ and Java do not explicitly have this construct, you can certainly build it by making a while or for loop an infinite loop and using the break statement to break out
while (1)      // C
{
}
while (true)   // Java
{
}
Again this feature adds flexibility, but makes code less readable and harder to debug
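A sketch of an arbitrary loop in C++ with two exit points; the function scan is made up for illustration.

```cpp
// Returns the index of the first occurrence of target, or n if
// target is not present in data[0..n-1].
int scan(const int* data, int n, int target) {
    int i = 0;
    while (true) {
        if (i == n) {
            break;   // exit 1: ran out of data
        }
        if (data[i] == target) {
            break;   // exit 2: found the target
        }
        ++i;
    }
    return i;
}
```

A reader has to find every break to know when the loop ends, which is the readability cost mentioned above.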
57. 57 Control Statements Unconditional Branching
Transfer execution from one section of code to another section of code
Commonly known as the goto
Used extensively in early languages which lacked block control structures
Ex. early FORTRAN and BASIC programs relied heavily on the goto
It was necessary then, but most modern languages contain block control structures
58. 58 Control Statements Even then computer scientists were aware of how problematic they could be
“Spaghetti code” that results is very difficult to read
Modification of one code segment can significantly impact many parts of the program – programmer must be aware of all places that can “go to” that code segment
Debugging is very difficult – it is hard to find and fix logic errors since all possible execution paths are difficult to trace
Now languages have blocks and extensive control structures
It has been shown that goto adds no functionality (i.e. nothing can be done with it that cannot be done without it)
However, many languages still have goto
59. 59 Control Statements Unrestricted goto allows code segments that normally have only one entry and exit point to have many
Ex: What happens if you jump into the middle of a procedure (what about parameters?) or a while loop (condition is skipped)
Most newer languages that have the goto have restrictions on it
Ex: Cannot jump into an inactive statement or block in Pascal
If restricted and used infrequently, can actually be useful in some languages
Ex: Pascal does not have a break statement. If an exceptional situation would cause an exit from a loop, using a goto may be more readable than adding extra convoluted logic
60. 60 Control Statements Some (newer) languages do not have goto at all
Ex: Java
Allows breaks from loops
Has exception handlers
61. 61 Subprograms Subprograms
Semi-independent blocks of code with the following basic characteristics:
Only one “entry point” (the beginning of the subprogram), and it executes when called:
Parameter information is passed to subprogram
Caller execution is temporarily suspended, and subprogram executes
When subprogram terminates, caller execution resumes at point directly following the subprogram call
62. 62 Subprograms What types of subprograms can we have?
Most languages have two different types, procedures and functions
Procedures can be thought of as new named statements that can supplement the predefined statements in the language
Ex: Statements to search or sort an array
Once defined, these can be used anywhere they are needed in a program
63. 63 Subprograms In order to have an effect on the overall program, a procedure needs to act on something other than just the variables local to the procedure. This can be done through:
Outputting data to the display or to a file
Altering a (relatively) global variable that will be accessed/used later by a different part of the program
Altering formal parameters such that the actual parameters in the caller are modified
This will be discussed in more detail soon
64. 64 Subprograms Functions can be thought of as code segments that calculate and return a single result
Modeled after math functions
Used within expressions, where result value is substituted for the call
The effect of functions on the overall program is the value returned by them. Thus, from an ideal (and mathematical) point of view, functions should have NO OTHER effect on the overall program
65. 65 Subprograms Should NOT modify global variables
Should NOT alter actual parameters
Naturally, both of the above are allowed in many languages
In these cases it is up to the programmer to decide how he/she wants to use functions
Again the tradeoff for the increased flexibility is more potential for logic errors and more difficulty in debugging
C/C++/Java
Only have functions, no procedures
void functions can mimic the behavior of procedures
66. 66 Subprograms Local variables
How/when are they allocated?
Stack-dynamic:
Default in most modern imperative languages
Required for recursive calls, since memory must be associated with each call, not each subprogram
Ex: Binary Search
mid = (left + right)/2;
Many different values for mid must be able to coexist, one for each call on the run-time stack
Could not do it if memory were statically allocated
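A sketch of the recursive binary search the slide refers to; the signature is an assumption. Each active call has its own left, right and mid on the run-time stack.

```cpp
// Returns the index of key in the sorted range data[left..right],
// or -1 if key is not present.
int binarySearch(const int* data, int left, int right, int key) {
    if (left > right) {
        return -1;
    }
    int mid = (left + right) / 2;  // one mid per active call
    if (data[mid] == key) {
        return mid;
    }
    if (data[mid] < key) {
        return binarySearch(data, mid + 1, right, key);
    }
    return binarySearch(data, left, mid - 1, key);
}
```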
67. 67 Subprograms Overhead is time for allocation and deallocation each time a subprogram is called
May not seem like a lot of time is needed, but it can add up if many calls are made in a program
Access must be indirect since actual memory location of variable will not be known until a subprogram call is made
Location in run-time stack depends upon calls made prior to current one, which can differ from run to run
Also adds some time overhead
Static:
Used in languages that do not support recursion (ex. older FORTRAN)
68. 68 Subprograms Also optional in other languages, such as C and C++
Allow variables to retain values from call to call
Remember the lifetime is the duration of the program
Ex: In CS1501 LZW algorithm writing codewords to a file, the bit buffer is static
The leftover bits are kept in the buffer for the next call
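A much smaller illustration than the LZW bit buffer: a hypothetical counter that keeps its value between calls because it is declared static, next to a version that does not.

```cpp
// The static local is initialized once and lives for the whole
// program, so its value carries over from call to call.
int nextId() {
    static int counter = 0;
    return ++counter;
}

// The ordinary (stack-dynamic) local is created anew on every
// call, so the function always returns 1.
int nextIdWithoutStatic() {
    int counter = 0;
    return ++counter;
}
```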
69. 69 Subprograms Parameters
Parameters are vital to subprograms
Allow information to be:
Passed IN to the subprogram
Passed OUT from the subprogram
Passed IN and OUT to and from the subprogram
When writing subprograms, programmer decides which is required for a given subprogram
70. 70 Subprograms Then programmer utilizes syntax/rules in language being used to achieve the desired option
Sometimes the syntax/rules of the language do not fit exactly with the 3 use options given
In these cases programmer must be careful to use the parameters as he/she intends
Some definitions:
Formal Parameter:
Parameter specified in the subprogram header
Only exists for the duration of the subprogram's execution
Sometimes called "parameter"
71. 71 Subprograms Actual Parameter:
Parameter specified in call of the subprogram
May exist outside of the scope of the procedure
Sometimes called just "argument"
Rules for Formal and Actual parameters differ, as we will discuss
72. 72 Subprograms Parameter Passing Options
Pass-by-Value
Pass-by-Reference
Pass-by-Result
Pass-by-Value-Result
Pass-by-Name
You should be familiar with Pass-by-Value and Pass-by-Reference
Others may be new to you
We’ll discuss each
73. 73 Subprograms Pass-by-Value
Formal parameter is a copy of the actual parameter
i.e. get r-value of actual parameter and copy it into the formal parameter
Default in many imperative languages
Only kind used in C and Java
Used for IN parameter passing
Actual can typically be a variable, constant or expression
74. 74 Subprograms Benefit is that actual parameters cannot be altered through manipulation of the formals
Also useful in some recursive calls, since a new copy is made with each call
Problem is that copying a parameter can be quite expensive, both in terms of time and memory
Ex: Consider an object with an array of 1000 floats
Object is copied with each call to the function
If, for example, recursive calls are made, a lot of memory can be consumed very quickly
75. 75 Subprograms Implementation:
Using a run-time stack, this is straightforward
When subprogram is called, copy of actual parameter is placed into a local variable, which is stored on the run-time stack (in the activation record for the subprogram)
During subprogram execution, formal parameter is used like any other local variable for the subprogram
Only difference is that it is initialized via the actual parameter
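A minimal C++ sketch of pass-by-value follows. The function names are made up for the illustration; the point is only that the formal parameter behaves like an initialized local variable.

```cpp
// The formal parameter n is a copy of the actual parameter.
int addOneByValue(int n)
{
    n = n + 1;   // changes only the local copy in this activation record
    return n;
}

// Returns true if the caller's variable is unchanged after the call.
bool actualUnchanged()
{
    int x = 5;
    int r = addOneByValue(x);   // the r-value of x (5) is copied into n
    return x == 5 && r == 6;
}
```

Because only an r-value is needed, a call such as addOneByValue(2 * 3) with an expression as the actual parameter is also legal.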
76. 76 Subprograms Pass-by-Reference
Formal parameter is a reference to (or address of) the actual parameter variable
get l-value of actual param and copy it into the formal param, then access the actual param indirectly through the formal param
Used in Pascal (var parameters), C++ and PHP (&); simulated in C by passing explicit pointers (the pointers themselves are passed by value)
Most appropriate for IN and OUT parameter passing, but can be used for all
Actual param usually restricted to a variable
77. 77 Subprograms Benefit is that we can change or not change the actual parameter using the formal – it is up to the programmer
Also good that memory is saved – only an address is copied
Problem is that we can miss logic errors if we accidentally alter an actual parameter through the formal parameter
Also some applications (ex: some recursion) don’t work as well
We may not want change at one call to affect another call
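The two C-family ways of getting reference behavior can be sketched as follows. The function names are assumptions of this example.

```cpp
// C++ reference parameters: a and b are aliases for the actual parameters.
void swapByReference(int& a, int& b)
{
    int temp = a;
    a = b;
    b = temp;
}

// C style: the pointers themselves are passed by value, and the
// subprogram dereferences them explicitly to reach the actuals.
void swapByPointer(int* a, int* b)
{
    int temp = *a;
    *a = *b;
    *b = temp;
}
```

With int x = 1, y = 2; the call swapByReference(x, y) changes the caller's variables, as does swapByPointer(&x, &y). Note that the actual parameters must be variables here, not constants or expressions.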
78. 78 Subprograms Constant Reference Parameters
Developers of C++ realized that value parameters are not practical for large data objects (too much time and memory, esp. for recursive algorithms)
Reference parameters have danger of accidental side effects (when used for IN parameters)
Solution is to pass parameters by reference, but not allow them to be altered – constant reference
Now compiler gives error if parameter is changed within subprogram
Copy made if passed by reference to another sub
79. 79 Subprograms Good concept, but not perfect
Programmer can get around it by casting to a pointer and altering indirectly
See params.cpp
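The following is a small sketch of the same idea (it is not the contents of params.cpp, and the function names are invented). It shows a constant reference parameter and one way a programmer can defeat the protection with a cast.

```cpp
#include <vector>

// IN parameter passed by constant reference: no copy of the vector is made,
// and the compiler rejects any assignment through data.
double sum(const std::vector<double>& data)
{
    double total = 0.0;
    for (double d : data) total += d;
    // data[0] = 0.0;   // would not compile: data is const
    return total;
}

// The protection is not absolute: a cast removes the const qualifier.
// This is well defined only because the caller's vector is not itself const.
void sneakyZeroFirst(const std::vector<double>& data)
{
    std::vector<double>* p = const_cast<std::vector<double>*>(&data);
    if (!p->empty()) (*p)[0] = 0.0;
}
```

After sneakyZeroFirst(v) the caller's vector has changed, even though the header promised an IN parameter.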
Ada IN parameters have a similar idea
Cannot be assigned/altered within the function
Cannot be passed by out or in out to another sub
More on Ada params shortly
Implementation:
Using run-time stack, address of actual is stored in activation record
Actual is accessed indirectly in sub through its address
80. 80 Subprograms Pass-by-Result
Reference parameters are not an exact fit for out parameters
Ex: A procedure designed to read data from a file into an object
Here we don’t care about what used to be in the object – we just want to be sure that at the end the appropriate value is assigned
With reference parameters we COULD access the old value and use it if we “wanted” to (or by mistake)
Pass-by-Result prevents this
81. 81 Subprograms In Pass-by-Result, actual parameter is not actually passed to the subprogram – it only waits to have a value passed back to it
Formal parameter is a local variable
During life of subprogram its value does not affect actual parameter at all
At end of subprogram its value is passed back to the actual parameter
So what is actually needed of actual parameter is its address (lvalue)
The point at which the address is obtained (at the call or at the return) can affect the result in some contrived examples
82. 82 Subprograms // Note: This is NOT real code
int A[8];
for (int i = 0; i < 8; i++) A[i] = i;
global int j = 2;
foo(A[j]);
output(A[]);
sub foo(int param)
{
int temp = 25;
j = 5;
param = temp;
}
------------------------------------------------
Output: 0 1 25 3 4 5 6 7 // if address obtained at call
Output: 0 1 2 3 4 25 6 7 // if address obtained at return
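C++ has no pass-by-result, but the example can be simulated. In the sketch below the flag addressAtCall is an assumption of the simulation: it selects whether the address of A[j] is computed before or after the body of foo runs.

```cpp
#include <array>

std::array<int, 8> A;
int j;

// Simulates foo(A[j]) under pass-by-result.
void callFoo(bool addressAtCall)
{
    int* target = addressAtCall ? &A[j] : nullptr;  // address obtained at call

    int param;            // formal parameter: a local with no initial value
    int temp = 25;
    j = 5;
    param = temp;

    if (!addressAtCall) target = &A[j];             // address obtained at return
    *target = param;      // result copied back to the actual parameter
}

// Restores the starting state used in the example.
void reset()
{
    for (int i = 0; i < 8; i++) A[i] = i;
    j = 2;
}
```

After reset() and callFoo(true), element A[2] holds 25; after reset() and callFoo(false), element A[5] holds 25 instead.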
83. 83 Subprograms If used, address is typically obtained at call
Ada ’83 out parameters for simple types are ALMOST this, but the formal parameter value cannot be accessed within the sub (so it is not really a local variable)
Ada ’95 changed out parameters to allow them to be accessed, fitting the Pass-By-Result model more closely
Implementation:
At sub call, actual param address is calculated and stored in run-time stack, as is the formal param (as a local)
Final result of formal is copied back to actual address at end of sub
84. 84 Subprograms Pass-by-Value-Result
Now actual parameter’s value is passed to the formal parameter when subprogram is called, being stored and used as a local variable
At the end of the subprogram the value is passed back to the actual parameter
As the name indicates, this is a combination of Pass-by-Value and Pass-by-Result
Used for IN and OUT parameters
85. 85 Subprograms If aliasing is NOT allowed/used, and if no exceptions occur in the subprogram the effect of value-result and reference is the same
Precondition: Actual parameter has value obtained previous to call
During subprogram: Only formal parameter is accessed, updated as desired
Postcondition: Actual parameter has last value assigned within subprogram
86. 86 Subprograms However if aliasing is allowed/used, there can be differences
Ex: Actual parameter is accessed directly as a global variable and is also passed to the sub as a parameter
With reference params, changes to the formal immediately change the global actual param
With value-result params, changes to the formal do not affect the global actual param (until the sub terminates)
Ada uses value-result for simple IN OUT parameters
But in Ada ’83 it is not specified how structured in out params are passed
87. 87 Subprograms Idea is that language creators did not want to require the params to be passed in any specific way
They just wanted to require the in-out effect
If the result could differ based on whether params are value-result or reference, then the program is erroneous
Up to programmer to NOT use aliases
Ada ’95 clarified, requiring all structured in-out parameters to be reference
See params.adb
Implementation:
Value + Result
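The aliasing difference described above can be sketched in C++. Value-result is simulated here with an explicit copy in and copy back, since C++ does not provide it directly; the names g, viaReference and viaValueResult are assumptions of the example.

```cpp
int g = 0;   // global that is also passed as the actual parameter

// Pass-by-reference: the formal is an alias for g.
int viaReference(int& formal)
{
    formal = formal + 1;
    return g;              // the global already shows the change
}

// Pass-by-value-result, simulated with a local copy and a copy-back.
int viaValueResult(int& actual)
{
    int formal = actual;   // value copied in at the call
    formal = formal + 1;
    int seen = g;          // the global is still unchanged here
    actual = formal;       // result copied back at the end
    return seen;
}
```

Starting from g = 10, viaReference(g) observes 11 inside the subprogram while viaValueResult(g) observes 10; in both cases g is 11 after the call.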
88. 88 Subprograms Pass-by-Name
Definitely wackiest way of param passing
Used for IN and OUT parameters, and only in Algol
Idea is that actual parameter is textually substituted for the formal in all places that it is accessed in the subprogram
Kind of like a macro substitution
It is only evaluated at the point of use in the subprogram
Evaluated EACH TIME it is used in subprogram
89. 89 Subprograms Thus the parameter value or address could change based on where/when in the subprogram it is evaluated
However, the referencing environment used is that of the CALLER, not of the subprogram
So only changes within the subprogram that have a global effect will change its evaluation
This also makes implementation more difficult
For simple variables this is equivalent to pass-by-reference
Variable address evaluates the same way regardless of where in the subprogram it is located
90. 90 Subprograms For constant expressions, this is (almost) equivalent to pass-by-value
Evaluation of constant expr. will not change from one part of the subprogram to another
But cannot assign a new value to the formal param unless a copy is made
But it gets wacky when array elements or variable expressions are passed
Now changes within the subprogram can affect the index of the array or a variable within the expression
Can cause evaluation to differ in different parts of the subprogram
91. 91 Subprograms global int i = 0, var = 11, n = 5;
global int A[2] = {4, 8};
foo(var, 2*n, A[i]); // all pass by name
void foo(int x, int y, int z)
{
x = x + 1; output(var);
output(y); n = n + 1; output(y);
output(z); z = z + 1; output(z);
i = i + 1; z = z + 1; output(z);
}
1st: var = var + 1 -> var is 12
2nd: y is 10 -> n = n + 1 -> y is 12
3rd: z (or A[0]) is 4 -> z = z + 1 -> z is 5
4th: i is 1 -> A[1] = A[1] + 1 -> z is 9
92. 92 Subprograms Implementation:
It is not trivial to allow macro to be evaluated and reevaluated in environment of the caller
Parameterless subprograms called thunks are used
Thunk evaluates parameter in current state of caller’s referencing environment
Returns the resulting address or value
Clearly this is a lot of overhead
Overhead and confusing results are why this is not used in newer languages
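The thunk idea can be approximated in C++ with lambdas. The sketch below re-creates the pass-by-name example, with each actual parameter wrapped in a parameterless function that is re-evaluated at every use. The type aliases, the vector out (standing in for output) and the function run are assumptions of this simulation, not how an Algol compiler is actually written.

```cpp
#include <functional>
#include <vector>

int i = 0, var = 11, n = 5;
int A[2] = {4, 8};
std::vector<int> out;                      // collects what output() would print

using ValueThunk = std::function<int()>;   // re-evaluates an expression
using AddrThunk  = std::function<int&()>;  // re-evaluates an address

void foo(AddrThunk x, ValueThunk y, AddrThunk z)
{
    x() = x() + 1;       out.push_back(var);
    out.push_back(y());  n = n + 1;      out.push_back(y());
    out.push_back(z());  z() = z() + 1;  out.push_back(z());
    i = i + 1;           z() = z() + 1;  out.push_back(z());
}

void run()
{
    // Each lambda is a thunk evaluated in the caller's environment.
    foo([]() -> int& { return var; },
        []() { return 2 * n; },
        []() -> int& { return A[i]; });
}
```

After run(), out holds 12, 10, 12, 4, 5, 9, matching the trace in the example. Every use of a formal costs a function call, which illustrates the overhead mentioned above.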
93. 93 Subprograms Subprograms as Parameters
We allow variables as parameters so that we can access their values (or addresses) from within a subprogram
Why not allow subprograms so that we can execute them from within a subprogram?
Some languages do allow this (ex. Pascal, C++, PHP)
However, there are some issues to consider
94. 94 Subprograms Can the parameter subprogram arguments differ in form from each other?
If so, how to type check and even check the number of arguments when the subprogram is actually called?
Easiest solution is to require the arguments to all have the same form
Header of parameter subprogram must be given within the header of the subprogram it is being passed to
Scope is also an issue – what is the referencing environment of the subprogram that is being passed as a parameter? Three reasonable possibilities exist:
95. 95 Subprograms The referencing environment in which the parameter subprogram is CALLED: shallow binding
The referencing environment in which the parameter subprogram is DEFINED: deep binding
The referencing environment in which the parameter subprogram is PASSED as an argument: ad hoc binding
Note that shallow binding fits well with dynamic scoping and deep binding fits well with static scoping
96. 96 Subprograms Pascal and C++ both use deep binding
Shallow binding is used by SNOBOL, which also uses dynamic scoping
Ad hoc binding has never been used
See fnparams.cpp
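A short sketch of a subprogram parameter in C++ follows (it is not the contents of fnparams.cpp; the function names are invented). Note how the header of the parameter subprogram appears inside the header of applyTwice, which lets the compiler type check every call.

```cpp
// The header of the parameter subprogram (one double in, one double out)
// is part of the header of the subprogram it is passed to.
double applyTwice(double (*f)(double), double x)
{
    return f(f(x));
}

double half(double x)   { return x / 2.0; }
double square(double x) { return x * x; }
```

For example, applyTwice(half, 8.0) yields 2.0 and applyTwice(square, 3.0) yields 81.0. Passing a function with a different header is rejected at compile time.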
97. 97 Subprograms Overloading (ad hoc polymorphism)
Using the same subprogram name with different parameter lists
When a subprogram is called, the compiler selects the correct version based on the parameter lists
In Ada, return type for a function is also used, since coercion is not done in Ada and function return values cannot be ignored
Enables programmer to use the same name for similar functions that take different argument types
98. 98 Subprograms Use: Make it easier for the programmer to use consistent names for subprograms
Without overloading: Programmer must make up different but similar names for subprograms that do similar things but for different types
Ex (C library): abs(int) fabs(double) labs(long)
Ex: ISort(int * A) FSort(float * A)
With overloading: Programmer uses the same name and the compiler decides which to use
Ex: abs(int) abs(float) abs(long)
Ex: Sort(int * A) Sort(float * A)
99. 99 Subprograms But programmer must be careful:
Ada and C++ both allow overloading and default parameters
Leaving out some parameters in the call could make a call ambiguous
i.e. it matches more than one function header
Call can also be ambiguous if implicit casting of arguments is done
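The points above can be sketched in C++. The names describe and scale are made up for the illustration, and the ambiguous calls are left as comments because they do not compile.

```cpp
#include <string>

// Three subprograms share one name; the compiler picks by parameter list.
std::string describe(int)    { return "int"; }
std::string describe(double) { return "double"; }
std::string describe(long)   { return "long"; }
// describe(1u);  // would not compile: unsigned converts equally well
//                // to int, long and double, so the call is ambiguous

// Overloading combined with a default parameter.
int scale(int x, int factor = 2) { return x * factor; }
int scale(int x)                 { return x; }
// scale(5);      // would not compile: matches both headers
```

A call with both arguments, such as scale(5, 3), is still fine because only one header matches it.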
Operator Overloading is the same idea, but with symbols rather than identifiers
We discussed these issues previously
See Slide 12 of cs1621b.ppt
100. 100 Generics Generics
Parametric polymorphism
One or more parameters are passed to a subprogram when it is instantiated (i.e. when the code is generated) indicating the types that will be used for the parameters in the subprogram call
Can also be used in conjunction with packages (Ada) and classes (C++)
Thus a single subprogram declaration can be used to generate many different callable subprograms, all with the same functionality
101. 101 Generics Motivation:
Programmers often apply data structures and algorithms to more than one data type
Ex. Sorting, Searching algos
Ex. BST, PQ, Stack, Queue data structures
Even with overloading, the programmer must still write different (identical except for type) versions of the code
Generics simply transfer the job of making the different versions from the programmer to the compiler – automates the overloading process
Note that DIFFERENT VERSIONS of the code MUST STILL BE generated
102. 102 Generics So the reason we have generics is to save the programmer some time (and perhaps some confusion)
Ada vs. C++:
In Ada, generic instantiations must be explicit
Programmer specifies the generic arguments using the new keyword
Ex: package int_io is new integer_io(integer);
The generic package is integer_io
The instantiated package is int_io
The type argument is integer
As is usual in Ada, if declaration is explicit, there will be no surprises
103. 103 Generics In C++, template instantiations can be explicit or implicit
Implicit: generated automatically by the compiler when a call is seen with the appropriate arguments
“Duplicate” instantiations are merged into a single code segment
Coercion cannot be done, since the types won’t match the template correctly
Saves programmer some typing
Explicit: programmer declares each version
Coercion can be done using regular C++ promotion and conversion rules
Programmer is aware of each version
See template.cpp and tordlist.h
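A small function template illustrates the implicit case and the coercion issue (this is a sketch, not the contents of template.cpp or tordlist.h). The last function names the type argument explicitly at the call, which is one way, assumed here for brevity, of telling the compiler which version is wanted so that the usual conversions can apply.

```cpp
// One generic declaration; the compiler generates a version per type used.
template <typename T>
T larger(T a, T b)
{
    return (a < b) ? b : a;
}

int    useInt()      { return larger(3, 7); }            // implicit: T = int
double useDouble()   { return larger(2.5, 1.5); }        // implicit: T = double
// larger(3, 2.5);   // would not compile: T cannot be both int and double
double useExplicit() { return larger<double>(3, 2.5); }  // 3 is converted to 3.0
```

Two separate versions of larger are generated here, one for int and one for double.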
104. 104 Generics Java Generics
In Java 1.5 "generics" were added to the language
It is somewhat misleading, since generic abilities were always built into the Java language
Collections were defined in terms of class Object, which is the superclass of all other Java classes
They could be used to store any Java class
105. 105 Generics However, retrieving objects back from the collection required explicit casting to the actual type if we wanted full access to them
ArrayList A = new ArrayList();
A.add(new String("Wacky"));
String S = (String) A.remove(0);
Also any typing mistakes (mixing types in the collection unintentionally) could only be caught at run-time (via casting exceptions)
Overall not bad, but some people thought type parameters should be allowed
106. 106 Generics JDK 1.5 added syntax very similar to that for C++ templates
However, it is very different from C++ templates (and Ada generics as well)
It is not really adding any new generic abilities to the language
It is not creating new code for each version of the class or method
It is designed to make collections of objects more type-safe
See more details in the handout
107. 107 Implementing Subprograms What is involved when a subprogram is called, during its execution, and when it terminates?
This will differ depending on if recursion is allowed in a language or not
Most modern languages allow recursion, but original FORTRAN (up to FORTRAN 77) did not allow it
108. 108 Implementing Subprograms FORTRAN 77 (and before)
All variables within a subprogram were static, and recursive calls were not allowed
Activation records were still used, but they also could be static
Since all data was static, the size was known at compile time
Run-time stack not needed, since at most one call per sub could be performed at a time
What do we need to know when a subprogram is called?
109. 109 Implementing Subprograms
Return Value: if sub is a function
Local Variables: static
Parameters: like local variables that are initialized
Return Address: where to go back to when subprogram ends
110. 110 Implementing Subprograms C, C++ and Java
To allow for recursive calls, a run-time stack is used
Multiple activations of the same subprogram can co-exist
Each needs its own copy of parameters and local variables
But subs are not allowed to be directly nested
The only non-local variables that need to be accessed are global variables
However, inner classes allow a nesting of sorts
111. 111 Implementing Subprograms So the activation record looks similar to that used in FORTRAN
With additional link location to access global variables
Now multiple instances of an activation record can occur at the same time, so they must be created dynamically (at run-time), unlike in FORTRAN
Let’s look at some of the contents of an activation record
112. 112 Implementing Subprograms
Temporaries
Local Variables
Parameters
Dynamic Link to previous call
Static Link to Non-Locals
Return Address
Temps and local variables are allocated within the subprogram call. In Pascal, C and C++, the local variables must be of fixed size. In Ada, they can be variable size (ex. arrays)
Parameters, links to non-locals and the return address are placed into the AR by the caller of the subprogram, so they are lower in the record
113. 113 Implementing Subprograms See rtstack.cpp
Accessing non-local variables within a subprogram
Local variables are located within the activation record (AR)
Can be accessed by knowing the base address of the AR plus a local_offset for each variable
Ex: Base address of AR = 162
int x, y[5]; // address of x is 162 + (other AR stuff)
float z; // address of z is 162 + (other AR stuff)
// + 4 + 20
114. 114 Implementing Subprograms Non-locals are located elsewhere
For languages like C and C++:
Subprograms cannot be nested
Besides locals there are global variables
For languages like Ada and Pascal:
Subprograms can be nested to arbitrary depth
A sub can be declared within a sub, which is within a sub, which is within a sub …
Using static scope, variables declared in a textual parent sub are accessible from an inner sub
Relative global variables
But the variable locations could be in different places on the run-time stack
How to find them?
115. 115 Implementing Subprograms What do we need to do?
Locate the AR that contains the nonlocal
Find where in the AR the variable is located
Finding where in the AR to look is the same as for local variables
Keep track of a local_offset value for the variable
Locating the AR is a different story
May not be directly prior to current AR
116. 116 Implementing Subprograms Two techniques used to locate AR
Static links
A link is kept in an AR to that AR’s textual parent (from the declaration)
To access a single nonlocal many links may be crossed
Display
A single array is kept to indicate all of the currently accessible nested subs
Any nonlocal can be accessed with two indirect accesses
117. 117 Implementing Subprograms Static Links
Due to rules of static scope, if a subprogram is called, its textual parent subprogram MUST be active
118. 118 Implementing Subprograms However, textual parent does NOT have to be previous call on run-time stack
So dynamic link in AR is not enough (but would work for dynamic scoping)
119. 119 Implementing Subprograms Static links connect an AR to the AR of the sub's textual parent, no matter where previously on the RT stack it is
How is this used to access nonlocal variables?
Can be determined and maintained based on the nesting depths of the subprograms that are called
The difference in the nesting depths between the sub using a nonlocal variable and the sub in which the nonlocal is declared is equal to the number of static links that must be crossed to find the correct AR for the variable
120. 120 Implementing Subprograms This difference can be stored for each variable when the program is compiled, so that at run-time finding the variable is simple
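Nonlocal access through static links can be sketched as a toy model. The struct AR and the function access are assumptions of this example; a real activation record holds much more than a link and a list of integers.

```cpp
#include <vector>

// A much simplified activation record.
struct AR {
    AR* staticLink;            // AR of the textual parent
    std::vector<int> locals;   // local variables, indexed by local_offset
};

// A nonlocal is named at compile time by a pair: the number of static
// links to cross (difference in nesting depths) and its local_offset.
int& access(AR* current, int chainOffset, int localOffset)
{
    AR* ar = current;
    for (int k = 0; k < chainOffset; k++) ar = ar->staticLink;
    return ar->locals[localOffset];
}
```

A chain offset of 0 reaches a local variable directly, while a variable declared two nesting levels out costs two link traversals before the offset is applied.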
121. 121 Implementing Subprograms What actually happens when a sub is called?
AR for textual parent of sub must be located on the run-time stack, so that the static link can be linked to it
A clear (but inefficient) way to do this is to follow dynamic links down the RTS until the AR for the parent sub is found
A better way can take advantage of the fact that the calling sub and the called sub must be “relatives” in the declaration tree
Calling sub could be parent of called sub (but not grandparent)
Calling sub could be called sub (direct recursion)
Calling sub could be a sibling of called sub
Calling sub could be a descendent of called sub (indirect recursion)
Calling sub could be a “niece” of called sub
122. 122 Implementing Subprograms So instead of following dynamic links, at compile-time we can pre-calculate the number of static links (from caller) to follow to find the appropriate textual parent AR
Always equal to: nesting_depth (calling sub) – nesting_depth(called sub) + 1
Calling sub could be parent of called sub
X – (X+1) + 1 = 0 static links (use caller's AR)
Calling sub could be called sub (direct recursion)
X – X + 1 = 1 static link – same textual parent
Calling sub could be a sibling of called sub
X – X + 1 = 1 static link – same textual parent
Calling sub could be a descendent of called sub (indirect recursion)
Calling sub could be a “niece” of called sub
Follow diff. in nesting depth + 1 static links
123. 123 Implementing Subprograms procedure Bigsub is
procedure C; -- forward declaration so that A can call C
procedure A(Flag: Boolean) is
procedure B is
...
A(false);
end; -- B
begin -- A
if Flag
then B;
else C;
end if;
end; -- A
procedure C is
procedure D is
... -- <-- here
end; -- D
...
D;
end; -- C
begin -- Bigsub
A(true);
end; -- Bigsub
Problem 3 in Chapter 10
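The compile-time formula nesting_depth(calling sub) – nesting_depth(called sub) + 1 can be applied to the calls in Bigsub. The sketch below assumes Bigsub is at depth 0; the table and the function name linksToParent are inventions of this example.

```cpp
#include <map>
#include <string>

// Nesting depths read off the declarations (Bigsub is taken as depth 0).
const std::map<std::string, int> depth = {
    {"Bigsub", 0}, {"A", 1}, {"C", 1}, {"B", 2}, {"D", 2}
};

// Static links to follow from the caller's AR to reach the AR of the
// called sub's textual parent.
int linksToParent(const std::string& caller, const std::string& called)
{
    return depth.at(caller) - depth.at(called) + 1;
}
```

For the call sequence Bigsub, A, B, A, C, D this gives 0 links for Bigsub calling A, 0 for A calling B, 2 for B calling A, 1 for A calling C, and 0 for C calling D.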
124. 124 Implementing Subprograms Evaluation of static links
Maintaining is not too time-consuming
Chain offsets can be calculated at compile time
Local variables can be accessed directly
Non-locals must follow 1 or more static links
Works well if nesting depths do not get too deep
For deep sub nesting, cost of non-local access can be high
But usually 2 or 3 levels is max used
125. 125 Implementing Subprograms Display
Uses a single array to store links to ARs at all relevant nesting depths
To access a nonlocal at a given nesting depth, we just follow the display entry for that depth, then the local_offset
Never more than one link to follow
Array is updated as subs are called and as they terminate
Generally faster than static links if many nesting levels are used
We will skip the details here – read the text
126. 126 Implementing Subprograms Nested declaration blocks
Idea could be similar to nested subs
Blocks could be treated as parameterless subs
Static links could be used to determine textual parent
But it is actually much easier to handle, since block entry and exit is always the same
Parent block goes to child block
When child block terminates, we revert to parent block
127. 127 Implementing Subprograms Simply push new block declarations onto run-time stack, and pop them when block terminates
But we only have one activation record, so no links are required
"Non-locals" can be accessed just like locals
128. 128 Implementing Subprograms Dynamic Scoping
When a non-local variable is accessed, we always follow the dynamic links until the correct declaration is found
Clearly could differ depending upon call sequence
But the mechanics are actually simple
ARs must store names of local variables so we know where to stop the search
In static scoping the names are not needed – just the offsets
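The search under dynamic scoping can be sketched as follows. Here the run-time stack is modeled as a vector of name-to-value maps, which is an assumption of this example; walking the vector from the back plays the role of following the dynamic links.

```cpp
#include <map>
#include <string>
#include <vector>

// Under dynamic scoping each AR keeps the names of its locals.
using AR = std::map<std::string, int>;

// Search from the most recent AR back toward the oldest.
// Returns nullptr if no active subprogram declares the name.
const int* lookup(const std::vector<AR>& stack, const std::string& name)
{
    for (auto it = stack.rbegin(); it != stack.rend(); ++it) {
        auto found = it->find(name);
        if (found != it->end()) return &found->second;
    }
    return nullptr;
}
```

The same name can resolve to different variables depending on which calls are currently on the stack, which is the call-sequence dependence noted above.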
129. 129 Data Abstraction Procedural (process) abstraction:
Action can be performed without requiring detailed knowledge of how it is performed
Data abstraction:
New type can be used without requiring detailed knowledge of how it is implemented
We don't need to know the details of how it is stored in memory
We don't need to know the details of how it is manipulated via operations
130. 130 Data Abstraction More formally, an ADT must satisfy two conditions:
The declarations of the type and operations (interface) are contained in a single syntactic unit -> ENCAPSULATION
The interface does not depend on how the objects are represented or how the operations are implemented
The representation of the objects is hidden from users of the ADT -> DATA HIDING
Objects can only be manipulated via the provided interface
131. 131 Data Abstraction Ex: Stack
Data: something that can store and access multiple data values in the manner dictated by the operations
Operations:
Push – add new value to top of stack
Pop – remove top value from stack
Top – view top value (or a copy) without removing
Empty – is stack empty
User of stack only needs to know the parameters and effect of each operation to use a stack correctly
Implementation could be an array, a linked-list, or maybe something different
Does not affect use
Implementer can “hide” these details from the user through private declarations
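A minimal C++ sketch of the stack ADT follows. It assumes a stack of int backed by std::vector; both choices are for illustration only, and the private section could be swapped for a linked list without changing any code that uses the class.

```cpp
#include <vector>

class Stack {
public:                                  // the interface: all a user needs
    void push(int v)   { items.push_back(v); }
    void pop()         { items.pop_back(); }      // precondition: not empty
    int  top() const   { return items.back(); }   // precondition: not empty
    bool empty() const { return items.empty(); }
private:                                 // the representation: hidden
    std::vector<int> items;
};
```

A user writes s.push(1); s.push(2); and then sees 2 from s.top(), without ever touching items.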
132. 132 Data Abstraction The idea of data abstraction was not always supported by programming languages
Ex: FORTRAN, Pascal, C did not fully support either encapsulation or data hiding
When learning good programming style, users tried to "simulate" data abstraction
Logically group type definitions, procedures and functions together as a unit
Only access the data type via the procedures and functions
Naturally, this was at the programmer's discretion
See ADT.p
133. 133 Data Abstraction Newer languages added true data abstraction
Ada via packages
C++, Java, C#, Ada95 via classes / objects
Encapsulation units that contain all details of the new type
Access modifiers that prevent access to internal details of the ADT from outside the encapsulation unit
See text for more details
134. 134 Object-Oriented Programming (OOP) Characteristics of OOP
Data abstraction: encapsulation + information-hiding
The operations for manipulating data are considered to be part of the data type (encapsulated)
The implementation details of the data type (both the structure of the data and the implementation of the operations) are separate from their specifications and (possibly) hidden from the user
As we discussed with ADTs
135. 135 OOP Inheritance
The characteristics of an ADT (data + operations) can be passed on to a subtype
Subtype can also add new data and operations
Allows programmer to build new (derived) types from old (parent) ones
Common data/operations do not have to be rewritten (or copied)
Operations that are slightly different in derived type can be rewritten (overridden) for that type
New data/operations tailor the derived type to the problem at hand
Parent type is unchanged and may (sometimes) be used together with derived type
136. 136 OOP Ex: Shape class
Has data: CenterX, CenterY
Has operations AREA, DRAW
Subclasses: Rectangle, Circle, Triangle
Each subclass inherits the data and operations from the Shape class
Rectangle adds data: length, width
Rectangle overrides AREA = length * width, and DRAW in appropriate way for a rectangle
Subclass of Rectangle: Square
Guarantees that length == width
Similar ideas for Circle and Triangle
137. 137 OOP Polymorphism
Variables of a parent class can also be assigned objects of a subclass (or subclass of a subclass)
Operations used with a variable are based upon the class of the object currently stored (could be a parent type object or a derived type object)
Operations may have been overridden in the derived class
Dynamic binding allows parent and derived objects to be used together in a logical way
138. 138 OOP Ex: Shape class
We could declare:
Shape * shapelist[100];
…
shapelist[0] = new Rectangle(0, 0, 10, 20);
shapelist[1] = new Square(50, 100, 30, 30);
shapelist[2] = new Circle(100, 50, 25);
for (int i = 0; i < 3; i++)
shapelist[i]->Draw();
Polymorphism allows these different objects to be accessed consistently within the same array
Think about how you could do the code above in C or Pascal
It would not be easy!
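For comparison, here is a compact C++ sketch of the Shape idea. It is simplified from the slide: the center coordinates, Circle and DRAW are omitted, Square takes a single side, and smart pointers are used instead of a raw array. All of these are choices made for this example.

```cpp
#include <memory>
#include <vector>

class Shape {
public:
    virtual ~Shape() = default;
    virtual double area() const = 0;     // overridden by each subclass
};

class Rectangle : public Shape {
public:
    Rectangle(double l, double w) : length(l), width(w) {}
    double area() const override { return length * width; }
private:
    double length, width;
};

class Square : public Rectangle {
public:
    explicit Square(double side) : Rectangle(side, side) {}  // length == width
};

double totalArea(const std::vector<std::unique_ptr<Shape>>& shapes)
{
    double total = 0.0;
    for (const auto& s : shapes) total += s->area();   // dynamically bound
    return total;
}
```

The loop in totalArea never tests which kind of shape it holds, so adding a new subclass requires no change to it.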
139. 139 OOP One option: Make one giant struct or record to contain all of the data, including a union or variant
“Base” class would use only the core data items
“Derived” classes would use additional data items as provided in the union or variant
To do the operations, we would need a switch or case to test which type the variable is, so that it can be written out appropriately
Now what if we want to add another new derived class, Pentagon?
With OOP, it is simple to add any new data and override the necessary operations
Without OOP we would have to change the overall structure of the data and operations – old types would change, possibly causing problems
140. 140 OOP OO Languages
Smalltalk was the first and “purest” OOL
All data (even numeric literals) are objects, and are all descendents of class Object
Objects are all allocated from the heap, and implicitly deallocated (garbage collection)
Variables are references, with implicit dereferencing
Execution of a program (logically) involves objects sending messages to each other, executing methods, and responding back
So the data is driving the execution, not the control statements
141. 141 OOP Smalltalk example to count letters in an input string
| data ctr letters |
data := Prompter prompt: 'Enter your name' default:''.
ctr := 1. letters := 0.
[ctr <= data size]
whileTrue: [
(data at: ctr) isLetter
ifTrue: [ letters := letters + 1 ].
ctr := ctr + 1. ].
letters printNl.
Note variables are not typed
Only type checking is that message sent to the object is recognized
Even blocks [] are objects
Evaluated when appropriate methods are called
142. 142 OOP Consider the “while loop” below
[ctr <= data size]
whileTrue: [
(data at: ctr) isLetter
ifTrue: [ letters := letters + 1 ].
ctr := ctr + 1. ].
Semantics of this loop are as follows:
whileTrue: is a message sent to the top block, with the second block as a parameter
The top block executes a method corresponding to the whileTrue: message that does the following:
Evaluates the top block
If true, evaluates the parameter block
If false, exits the method
This propagation of messages can sometimes lead to very short code, if variables are eliminated
143. 143 OOP Equivalent to previous code:
| letters |
letters := 0.
(Prompter prompt: 'Enter your name' default:'')
do: [ :c | c isLetter
ifTrue: [ letters := letters + 1 ].
].
letters printNl.
Now we cascade the messages to allow fewer statements (also, the do: loop iterates through the characters in a string, so we don’t need the loop counter)
((Prompter prompt: 'Enter your name' default:'')
select: [ :c | c isLetter ]) size printNl.
Now the select: loop generates a string based on the condition in the block
144. 144 OOP More on Smalltalk (classes and objects)
Data in an object can be an instance variable or a class variable
Instance variables are associated with objects
Separate data for each object
Accessible only through the methods defined for that object – always private to the class
Class variables are associated with classes
Shared data for all objects of the same class
Accessible from all objects, but still private to the class
Methods have a similar grouping, but are public
Instance methods associated with objects
Class methods associated with entire class
145. 145 OOP More on Smalltalk (inheritance)
Object base class of all others
Only single inheritance allowed
All inheritance is implementation inheritance
Data and methods of parent class are always accessible to the derived class
i.e. Cannot hide implementation details from derived class
Advantage: Derived class can likely implement its methods more efficiently with access to parent data
Disadvantage: Change in parent class implementation will likely require change in derived class implementation
Ex. Traversable stack
146. 146 OOP More on Smalltalk (polymorphism)
All messages are dynamically bound to methods
At run-time, when a message is received, the object’s class is searched for a method, then, if necessary its superclass, its super-superclass and so on up to Object
Variables have no types since they are only used to refer to objects, not to determine the messages an object can receive
Clearly some liabilities with this approach
Slows language down due to run-time overhead
Programmer type errors cannot be caught until execution time
147. 147 OOP Let's look at some examples:
person.cls as an example of a new class
See personTest.st
student.cls as an example of a subclass
studentTest.st as an example showing polymorphic access
twodarry.cls as another subclass example
See twodTest.st
For more information, see the GNU Smalltalk User's Guide:
http://www.gnu.org/software/smalltalk/gst-manual/gst.html
148. 148 OOP C++ is an imperative/OO mix
Had to be backward compatible with C
Wanted to add object-oriented features
Result is that programmer can use as few or as many OO features as he/she wants to
C++ Classes and Objects
Can be static, stack-dynamic or heap-dynamic
Member data and member functions can be private, protected or public
Allows programmer to decide
Like Smalltalk, has notion of class variables
Declared as static in C++
Destructor needed if object uses dynamic memory
149. 149 OOP C++ Inheritance
Do not need a superclass (no Object base class for all other classes)
Multiple inheritance is allowed
Complex and difficult to use
Both implementation inheritance and interface inheritance are allowed
With interface inheritance, all data and functions are still inherited, but only public ones are directly accessible to the derived class
Advantage: Modifications to parent class do not affect derived class, as long as they do not change the interface
Disadvantage: Operations may be slower, since they cannot access the data directly
150. 150 OOP C++ polymorphism
By default all functions are statically bound
Recall that this allows faster execution, a goal of the C++ language
However, true polymorphism cannot be utilized with statically bound functions
Dynamic binding is enabled by using virtual functions and pointers (or references)
This tells the compiler not to bind the function name to the code until run-time
Abstract base classes can be created with pure virtual functions
Not implemented in the base class
See poly.cpp
151. 151 OOP Java falls in between Smalltalk and C++
Like Smalltalk:
Object is base class to other classes
Single inheritance only
Objects are (almost) all dynamic, with garbage collection
References used to access
Method names are (by default) dynamically bound
Like C++:
Access can be private, public or protected
Static binding can optionally be used to improve run-time speed
Overall syntax for member data and function access
Variables are typed
152. 152 OOP Other Java OOP features:
Interfaces allow for a simplified form of multiple inheritance
An interface is in a sense a base class with no data and only abstract (pure virtual) methods
A class that implements an interface simply implements the methods specified therein
Advantages: Objects that implement an interface can be used wherever the interface is specified. This allows for a type of generic behavior
Ex: Comparable interface, Runnable interface
Disadvantage: Can become complicated when interfaces and inheritance are both used
Reflection allows us to manipulate the classes themselves
See poly.java
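As a small illustration of the two Java slides above (this is not the course's poly.java), the sketch below combines an interface with dynamic binding. Comparable is the real library interface; NamedPerson, NamedStudent and InterfaceDemo are hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class that implements the library interface Comparable.
class NamedPerson implements Comparable<NamedPerson> {
    protected final String name;

    NamedPerson(String name) { this.name = name; }

    String describe() { return "Person " + name; }

    @Override
    public int compareTo(NamedPerson other) { return name.compareTo(other.name); }
}

// Hypothetical subclass: single inheritance, overrides describe().
class NamedStudent extends NamedPerson {
    private final double gpa;

    NamedStudent(String name, double gpa) { super(name); this.gpa = gpa; }

    @Override
    String describe() { return "Student " + name + " (GPA " + gpa + ")"; }
}

class InterfaceDemo {
    static List<String> sortedDescriptions() {
        List<NamedPerson> people = new ArrayList<>();
        people.add(new NamedStudent("Zoe", 3.5));
        people.add(new NamedPerson("Al"));

        // Generic behavior: sort works on any list of Comparable objects.
        Collections.sort(people);

        List<String> result = new ArrayList<>();
        for (NamedPerson p : people) {
            // Dynamically bound: the object's class picks the method,
            // not the declared type of the variable p.
            result.add(p.describe());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedDescriptions());
    }
}
```

The variable p is typed as NamedPerson, as in C++, but the call to describe() is bound at run time, as in Smalltalk.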
153. 153 OOP OOL Implementation
Data:
Typically a record/struct type of storage is used – Class Instance Record (CIR)
Data members are accessed by name, in the same way as records
Subclass adds extra data to CIR of parent class
Private access enforced by limiting visibility of the data
154. 154 OOP Subprograms:
Static binding
Subprograms that will be called are determined by the variable type
Variable types are known at compile time and code can be determined then
Dynamic Binding:
Subprograms that will be called are determined by the object’s type, not the variable’s type
Objects stored in a variable are determined at run time
Appropriate links must be stored with the object
But they are the same for all objects of that class
Virtual Method Table (VMT) used to store links to all pertinent subprograms
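The CIR and VMT ideas can be modeled by hand in Java. This is a simplified sketch for intuition only; real compilers and the JVM lay these structures out differently, and ObjRecord, VmtDemo and the slot names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical class instance record (CIR): a link to the class's shared
// method table plus the object's own data.
class ObjRecord {
    final List<Function<ObjRecord, String>> vmt;   // same table for every object of a class
    final int[] data;                              // per-object data members

    ObjRecord(List<Function<ObjRecord, String>> vmt, int... data) {
        this.vmt = vmt;
        this.data = data;
    }

    // A dynamically bound call: follow the link stored with the object.
    String call(int slot) {
        return vmt.get(slot).apply(this);
    }
}

class VmtDemo {
    // Slot numbers are fixed at "compile time" and shared by parent and subclass.
    static final int DESCRIBE = 0;
    static final int AREA = 1;

    static final List<Function<ObjRecord, String>> SHAPE_VMT = shapeVmt();
    static final List<Function<ObjRecord, String>> SQUARE_VMT = squareVmt();

    private static List<Function<ObjRecord, String>> shapeVmt() {
        List<Function<ObjRecord, String>> table = new ArrayList<>();
        table.add(o -> "a shape");   // slot DESCRIBE
        table.add(o -> "area 0");    // slot AREA
        return table;
    }

    private static List<Function<ObjRecord, String>> squareVmt() {
        // Subclass table starts as a copy of the parent's (inherited methods)...
        List<Function<ObjRecord, String>> table = new ArrayList<>(SHAPE_VMT);
        // ...and overriding replaces one entry.
        table.set(AREA, o -> "area " + (o.data[0] * o.data[0]));
        return table;
    }

    static ObjRecord newShape() { return new ObjRecord(SHAPE_VMT); }

    // The subclass record carries extra data (the side length).
    static ObjRecord newSquare(int side) { return new ObjRecord(SQUARE_VMT, side); }

    public static void main(String[] args) {
        ObjRecord[] shapes = { newShape(), newSquare(3) };
        for (ObjRecord s : shapes) {
            System.out.println(s.call(DESCRIBE) + ", " + s.call(AREA));
        }
    }
}
```

Each call costs one table lookup, and every object of a class shares the same table, which is why only a single link needs to be stored with the object.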
155. 155 Parallelism Parallelism is incorporated into programs for 2 primary reasons:
Program is running in a multiprocessing or distributed environment
Many computers now have multiple CPUs
Many jobs are distributed over multiple computers in a network
A programming language should be able to take advantage of this parallelism
Many algorithms can be improved if designed for parallel execution
This is PHYSICAL PARALLELISM
156. 156 Parallelism Program is running in a “simulated” parallel environment, allowing for asynchronous activity
Ex: Two windows are displayed to the user. One shows the current time (incremented by seconds) and one allows the user to draw images on the screen
We don’t want the act of the user drawing to “stop” the clock
We don’t want the clock running to prevent the user from drawing
Even with a single processor, we want both of these activities to execute “in parallel”
This is LOGICAL PARALLELISM
157. 157 Parallelism What issues must we be concerned with?
Synchronization
Execution of tasks in parallel causes them to be asynchronous
Cannot predict at what point in time one task will execute an instruction relative to another task
If the tasks are independent, this is not a problem
No resources are shared, so it doesn’t matter where in the execution each task is
Ex: One task to count ballots from Florida, one task to count ballots from New Mexico
158. 158 Parallelism If the tasks have some dependencies, there can be a problem
Most common dependency is shared data
To handle this we must synchronize the tasks
Cooperation Synchronization
One task is dependent upon an output/outcome of another
Ex: Task B must process data produced by Task A
Contractor B cannot put up drywall until contractor A has finished the wiring
Task to count ballots cannot proceed until task that collects ballots provides it with some
We must have a mechanism that allows Task B to pause until the data is available
B could loop and keep checking for data
B could wait for some signal from A
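The second option (B waits for a signal from A) might look like the following Java sketch. CountDownLatch is a real class in java.util.concurrent; CooperationDemo, the variable names and the value 42 are assumptions made for the illustration.

```java
import java.util.concurrent.CountDownLatch;

class CooperationDemo {
    static int run() {
        final int[] collected = new int[1];   // data produced by Task A
        final int[] counted = new int[1];     // result computed by Task B
        CountDownLatch dataReady = new CountDownLatch(1);

        Thread taskA = new Thread(() -> {
            collected[0] = 42;                // A produces the data...
            dataReady.countDown();            // ...then signals B
        });

        Thread taskB = new Thread(() -> {
            try {
                dataReady.await();            // B pauses here instead of looping
                counted[0] = collected[0];    // safe: the data is available
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        taskB.start();                        // start B first on purpose
        taskA.start();
        try {
            taskA.join();
            taskB.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counted[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Blocking on a signal avoids burning processor time in a polling loop, which is the main drawback of the first option.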
159. 159 Parallelism Competition Synchronization
Both tasks are competing for the same shared resource
If one or both tasks modify the data, it could cause data inconsistencies
Ex: Task A and Task B are MAC machine (ATM) accesses of the same bank account
Task A checks the balance: $200
Task B checks the balance: $200
Task A withdraws $200
Task A updates balance to $0
Task B withdraws $200
Task B updates balance to -$200
We must have some mechanism that ensures MUTUAL EXCLUSION for CRITICAL DATA
We could have a LOCK on the data, or a similar mechanism allowing only one task to access it at a time
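One way to get this mutual exclusion in Java is the built-in lock that every object carries, used through synchronized methods. The sketch below is not the course's corrupt.java; SafeAccount and CompetitionDemo are hypothetical names.

```java
// Hypothetical account whose balance is the critical data.
class SafeAccount {
    private int balance;

    SafeAccount(int initial) { balance = initial; }

    // The check and the update happen as one step while the lock is held,
    // so two tasks cannot both see $200 and both withdraw it.
    synchronized boolean withdraw(int amount) {
        if (balance >= amount) {
            balance -= amount;
            return true;
        }
        return false;
    }

    synchronized int getBalance() { return balance; }
}

class CompetitionDemo {
    // Returns {number of successful withdrawals, final balance}.
    static int[] run() {
        SafeAccount account = new SafeAccount(200);
        boolean[] ok = new boolean[2];

        Thread taskA = new Thread(() -> ok[0] = account.withdraw(200));
        Thread taskB = new Thread(() -> ok[1] = account.withdraw(200));
        taskA.start();
        taskB.start();
        try {
            taskA.join();
            taskB.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }

        int successes = (ok[0] ? 1 : 0) + (ok[1] ? 1 : 0);
        return new int[] { successes, account.getBalance() };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " withdrawal(s), balance $" + result[1]);
    }
}
```

Whichever task gets the lock first succeeds; the other sees a balance of $0 and is refused, so the balance never goes negative.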
160. 160 Parallelism Synchronization Mechanisms
Semaphores
Devised by Dijkstra
Basically guards that are placed around code
P must succeed to gain access to code
Decrements a counter when it succeeds
V executes when critical section ends
Based on initial value of counter, we can control how many tasks are allowed to access the critical section at once
If used properly, can guarantee either cooperation or competition synchronization
However, it is easy to NOT use them properly
Can cause problems
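Java provides a counting semaphore in java.util.concurrent.Semaphore, where acquire plays the role of P and release the role of V. The following is a minimal sketch under that assumption; SemaphoreDemo and maxInside are hypothetical names, and the sleep merely stands in for work done in the critical section.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreDemo {
    // Runs `tasks` threads through a section guarded by a semaphore whose
    // counter starts at `permits`; returns the largest number of threads
    // observed inside the section at once.
    static int maxInside(int permits, int tasks) {
        Semaphore guard = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger max = new AtomicInteger();
        Thread[] threads = new Thread[tasks];

        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    guard.acquire();                      // P: decrements the counter or waits
                    try {
                        int now = inside.incrementAndGet();
                        max.accumulateAndGet(now, Math::max);
                        Thread.sleep(10);                 // stand-in for critical-section work
                        inside.decrementAndGet();
                    } finally {
                        guard.release();                  // V: always executed, even on error
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println("max inside with 2 permits: " + maxInside(2, 6));
        System.out.println("max inside with 1 permit:  " + maxInside(1, 4));
    }
}
```

Forgetting the release, or placing it where an exception can skip it, is exactly the kind of misuse the slide warns about; the finally block is one common defense.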
161. 161 Parallelism Monitors
Devised by Brinch Hansen and Hoare
Critical data section is part of a data object that allows only one task entry at a time
Better than semaphores for competition synchronization, because mechanism is built into the monitor
Harder for programmer to mess up
No better for cooperation synchronization
Still must be done manually
Used in Concurrent Pascal, Modula-2 and (somewhat) in Java
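Java's partial support for monitors can be sketched with synchronized methods plus wait and notifyAll. In this hypothetical one-slot buffer (OneSlotMonitor and MonitorDemo are invented names), mutual exclusion comes from synchronized, while the cooperation between producer and consumer still has to be written by hand.

```java
// Hypothetical monitor: a buffer that holds at most one value.
class OneSlotMonitor {
    private Integer slot = null;   // null means the slot is empty

    // Competition synchronization: `synchronized` admits one task at a time.
    synchronized void put(int value) throws InterruptedException {
        while (slot != null) {
            wait();                // cooperation, done manually: wait until empty
        }
        slot = value;
        notifyAll();               // wake any task waiting in take()
    }

    synchronized int take() throws InterruptedException {
        while (slot == null) {
            wait();                // cooperation, done manually: wait until full
        }
        int value = slot;
        slot = null;
        notifyAll();               // wake any task waiting in put()
        return value;
    }
}

class MonitorDemo {
    // Passes 1..n through the monitor and returns the sum the consumer saw.
    static int sumThroughMonitor(int n) {
        OneSlotMonitor monitor = new OneSlotMonitor();
        int[] sum = new int[1];

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) monitor.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) sum[0] += monitor.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumThroughMonitor(10));
    }
}
```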
162. 162 Parallelism Message Passing
Proposed by Brinch Hansen and Hoare
More general than either of the two previous techniques
Tasks are synchronized via messages sent to each other
Message is similar in look/execution to a subprogram call, but with restrictions:
Caller (or passer) of the message is blocked at the call until the receiver is ready to receive it
Receiver (or executer) of the message is blocked at the message code until the message is called
Caller and Receiver meet at a rendezvous
163. 163 Parallelism Idea is that we know exactly where in the code both tasks will be when a rendezvous occurs
So even though tasks execute asynchronously, we synchronize them with respect to each other at a rendezvous
Ex: Ada
Still much of the work is up to the programmer
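Java has no Ada-style rendezvous, but java.util.concurrent.SynchronousQueue gives a rough approximation: put blocks until another thread takes, and take blocks until another thread puts. The sketch below rests on that approximation and is not a model of Ada's full semantics (in Ada the caller also stays blocked while the accept body runs). RendezvousDemo and the message text are hypothetical.

```java
import java.util.concurrent.SynchronousQueue;

class RendezvousDemo {
    static String run() {
        // A queue with no capacity: a hand-off succeeds only when
        // sender and receiver are both at the meeting point.
        SynchronousQueue<String> channel = new SynchronousQueue<>();
        String[] received = new String[1];

        Thread receiver = new Thread(() -> {
            try {
                received[0] = channel.take();   // blocked until a message is sent
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        try {
            channel.put("ballots ready");       // blocked until the receiver takes it
            receiver.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

When put returns, the sender knows the receiver has reached its take, which is the "we know exactly where both tasks are" property described above.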
164. 164 Parallelism Parallel processing concerns
Data consistency
We have already discussed this
Mutual exclusion is needed to prevent multiple tasks from accessing critical data at the same time
However, efforts to ensure data consistency can cause other problems, such as DEADLOCK and STARVATION
165. 165 Parallelism Deadlock
When a (shared) resource has restricted access, it can cause a task to stop execution
Wait in a semaphore queue
Wait in a monitor queue
Wait in an accept queue
If a circular resource dependency exists, we can get deadlock
Ex:
Task A has acquired binary semaphore S1
Task B has acquired binary semaphore S2
Task A is waiting for binary semaphore S2
Task B is waiting for binary semaphore S1
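The circular wait in this example can be reproduced in Java without hanging the program by giving each task a timeout on its second request. This is a sketch for illustration, not the course's deadlock.java; CircularWaitDemo is a hypothetical name, and the CyclicBarrier exists only to force the unlucky interleaving every time.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class CircularWaitDemo {
    // Returns whether Task A and Task B each obtained their second semaphore.
    static boolean[] run() {
        Semaphore s1 = new Semaphore(1);          // binary semaphores
        Semaphore s2 = new Semaphore(1);
        CyclicBarrier step = new CyclicBarrier(2);
        boolean[] gotSecond = new boolean[2];

        Thread taskA = new Thread(() -> attempt(s1, s2, step, gotSecond, 0));
        Thread taskB = new Thread(() -> attempt(s2, s1, step, gotSecond, 1));
        taskA.start();
        taskB.start();
        try {
            taskA.join();
            taskB.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return gotSecond;
    }

    private static void attempt(Semaphore first, Semaphore second,
                                CyclicBarrier step, boolean[] gotSecond, int id) {
        try {
            first.acquire();                      // each task holds one semaphore
            step.await();                         // both now hold their first one
            // With a plain acquire() here, both tasks would wait forever.
            gotSecond[id] = second.tryAcquire(100, TimeUnit.MILLISECONDS);
            step.await();                         // neither releases until both have tried
            if (gotSecond[id]) {
                second.release();
            }
            first.release();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("A got S2: " + result[0] + ", B got S1: " + result[1]);
    }
}
```

Both requests time out because each semaphore is held by the task that is waiting for the other one. Acquiring the semaphores in the same order in both tasks would remove the cycle.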
166. 166 Parallelism Starvation
To combat deadlock, most languages allow a task to release a resource prematurely in some circumstances
Ex: If one of the tasks in the previous example releases the semaphore, the other can proceed
Under these circumstances there is the possibility that a task may never acquire all of the resources that it needs at the time it needs them – starvation
We must be careful to avoid all of these problems when programming in parallel
167. 167 Parallelism Let’s look at Java as an example:
Deadlock: see deadlock.java
Corrupt data: see corrupt.java
Some features of older Java implementations are now deprecated because they are too prone to deadlock and starvation problems
Suspend / Resume
Does not free locked objects
Can easily lead to deadlock if not resumed
Stop
Immediately frees locked objects
Can lead to data inconsistency
168. 168 Prolog As we discussed previously, Prolog is a language used for logic programming
"Programs" in Prolog consist of facts and rules in a database
Facts consist of an identifier followed by a parenthesized, comma-separated list of objects (atoms), followed by a period
The identifier represents some relationship amongst the objects, and is called a predicate
The objects are the arguments
Ex. from ex1.pl:
father(herb, irving).
169. 169 Prolog Rules are predicates that consist of a head and a body
In order for the head to "succeed" in its evaluation, all of the goals in the body must be satisfied
These goals could be facts, or could be other rules
Ex from ex1.pl:
sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y).
The :- can be thought of as "if"
Execution of a program is in fact a sequence of questions, or assertions
Database is searched in an effort to satisfy all of the assertions
170. 170 Prolog If assertions can be satisfied, answer is yes
Otherwise, answer is no
If a given assertion succeeds, execution proceeds to the next one
If a given assertion fails, execution backtracks and attempts to re-satisfy the previous assertion
So what about variable assignments?
These are in fact just side effects that occur in an effort to satisfy the query
In fact variables are not assigned in the traditional (imperative language) sense
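To make the search-and-backtrack behavior concrete, here is a rough Java imitation of how a query such as sibling(bob, dee) might be satisfied against the sibling rule shown earlier. It is only an analogy: Prolog does this through unification, not hand-written loops. The parent facts and the class SiblingSearch are hypothetical, and both query arguments are assumed to be already instantiated.

```java
import java.util.Arrays;
import java.util.List;

class SiblingSearch {
    // Hypothetical facts parent(P, C), listed in database order.
    static final List<String[]> PARENT = Arrays.asList(
            new String[] { "ann", "bob" },
            new String[] { "carl", "bob" },
            new String[] { "carl", "dee" });

    // Goal parent(p, c) with both arguments instantiated: search top to bottom.
    static boolean parent(String p, String c) {
        for (String[] fact : PARENT) {
            if (fact[0].equals(p) && fact[1].equals(c)) {
                return true;
            }
        }
        return false;
    }

    // Imitates: sibling(X,Y) :- X \== Y, parent(P,X), parent(P,Y).
    static boolean sibling(String x, String y) {
        if (x.equals(y)) {
            return false;                 // first goal X \== Y fails: answer is no
        }
        for (String[] fact : PARENT) {    // second goal parent(P, X)
            if (!fact[1].equals(x)) {
                continue;
            }
            String p = fact[0];           // P becomes instantiated as a side effect
            if (parent(p, y)) {           // third goal parent(P, Y)
                return true;              // all goals satisfied: answer is yes
            }
            // Third goal failed: backtrack. P loses its value and the search
            // for parent(P, X) resumes just after the fact that last matched.
        }
        return false;                     // no more ways to satisfy the goals
    }

    public static void main(String[] args) {
        System.out.println(sibling("bob", "dee"));   // P = ann fails, P = carl succeeds
        System.out.println(sibling("bob", "bob"));
        System.out.println(sibling("ann", "dee"));
    }
}
```

For sibling(bob, dee), P is first bound to ann, the last goal fails, and only after backtracking is P bound to carl. P is never assigned by the programmer; its value is a by-product of trying to satisfy the goals.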
171. 171 Prolog Variables in Prolog are dynamically typed and have two states:
Uninstantiated:
Variable is not associated with a value
Instantiated
Variable is associated with a value
Once a variable is instantiated, it keeps that value, and all occurrences of that variable within the same scope have that value
Cannot be re-assigned in sense of imperative languages
However, if execution backtracks past the point at which it was instantiated, it can again become uninstantiated
Let's look again at ex1.pl
172. 172 Prolog Recursion and database search
Recursion is a fundamental part of programming in Prolog
Execution is simply satisfaction of goals, and there are no loops as in imperative languages
Thus, to build complex "programs" we must utilize recursive programming
Each attempt to satisfy a goal initiates a search of the database
173. 173 Prolog By default the DB is searched from top to bottom
We can take advantage of this in our programs
Ex: put the base case before the recursive case, so we don't have to explicitly test for it
Although, as the text points out, this could be considered to be a flaw in the language, since the order that the rules are considered should not matter to the "truth" of the logic
174. 174 Prolog If a subgoal in a rule fails at any point, we backtrack and attempt to resatisfy a previously satisfied subgoal
When resatisfying a subgoal, the DB search resumes from the point at which it succeeded the first time
See recurse.pl
175. 175 Prolog Lists As in Lisp, the list is an important data structure in Prolog
A list consists of a head and a tail
Tail could be the empty list