CPS 506 Comparative Programming Languages

Type Systems, Semantics and Data Types

Type Systems
• A completely defined language has a defined syntax, semantics, and type system
• Type: a set of values and the operations on them
  • int
    • Values = Z (the integers)
    • Operations = {+, -, *, /, mod}
  • Boolean
    • Values = {true, false}
    • Operations = {AND, OR, NOT, XOR}

Type Systems
• Type system
  • The types of a language together with the rules that associate them with the variables and objects in a program
  • Formalizes the definition of data types and their usage in a programming language
  • A bridge between syntax and semantics
    • Type checking at compile time is part of syntax analysis (static semantics)
    • Type checking at run time is part of the semantics

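Type Systems (con’t)
• Statically typed: each variable is bound to a single type that is fixed before run time and does not change during the variable's lifetime
  • The binding may come from an explicit or an implicit declaration
  • Examples: C and Java (explicit declarations); Perl, where the $, @ and % name prefixes implicitly mark a variable as a scalar, array, or hash
  • Type rules are defined on the abstract syntax (static semantics)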

Type Systems (con’t)
• Dynamically typed: a variable's type is bound at run time and can change during execution
  • Examples: LISP, JavaScript, PHP
  • JavaScript example:
      List = [10.2, 3.5]
      …
      List = 47
  • Less reliable, harder to debug
  • More flexible
  • Fast compilation
  • Slower execution (types are checked at run time)
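The JavaScript example can be mirrored in Python, another dynamically typed language. This is a minimal sketch; the name `lst` is only illustrative.

```python
# In a dynamically typed language the type belongs to the value, not the variable,
# so the same name can be rebound to a value of a different type at run time.
lst = [10.2, 3.5]
print(type(lst).__name__)   # list
lst = 47
print(type(lst).__name__)   # int
```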

Type Systems (con’t)
• Type error: an operation that is not well defined for the type of the value a variable holds at run time
  • Example: a union in C
      union flexType {
          int i;
          float f;
      };
      union flexType u;
      float x;
      …
      u.i = 10;
      x = u.f;   /* reads the bits of an int as if they were a float */
      …
  • Another example in C?
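Python has no unions, but its standard `struct` module can reproduce what the C example does: write 10 as a 32-bit integer and read the same four bytes back as a 32-bit float. This sketch assumes little-endian layout (forced with `<`) and IEEE 754 single precision; `int_bits_as_float` is a hypothetical helper name.

```python
import struct

def int_bits_as_float(i: int) -> float:
    """Store i as a 32-bit int, then reread the same 4 bytes as a 32-bit float,
    mimicking a write to u.i followed by a read of u.f."""
    raw = struct.pack("<i", i)        # 4 bytes, little-endian two's complement
    (f,) = struct.unpack("<f", raw)   # the same bytes reinterpreted as IEEE 754 single
    return f

x = int_bits_as_float(10)
print(x)   # a tiny denormal number, not 10.0
```

No error is reported: the bit pattern of the integer 10 simply denotes 10 × 2⁻¹⁴⁹ when read as a float.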

Type Systems (con’t)
• Strongly typed: all type errors are detected, either at compile time or at run time before the offending operation executes
  • More reliable
  • Example: Java is nearly strongly typed, but C is not; C accepts x + 1 regardless of the type of x
  • Coercion (implicit type conversion) rules weaken strong typing
• Weak typing example
    x = 2;
    y = “5”;
    print x + y
  • Visual Basic: 7
  • JavaScript: “25”
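For contrast, here is a Python sketch of the same weak-typing example. Python is dynamically typed yet strongly typed in this case: instead of silently coercing one operand as Visual Basic or JavaScript does, it detects the mismatch at run time and requires an explicit conversion.

```python
x = 2
y = "5"
try:
    print(x + y)          # no implicit conversion between int and str
except TypeError as err:
    print("TypeError:", err)
print(x + int(y))         # 7: explicit conversion to int, like the Visual Basic result
print(str(x) + y)         # 25: explicit conversion to str, like the JavaScript result
```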

Type Systems (con’t)
• Type safe: a language in which programs are free of type errors
  • Strongly typed -> type safe
  • Examples: Java, Haskell, and ML

Type Binding
• Binding: the process of associating an attribute (name, location, value, or type) with an object
• Example
    int i;
  Identifier i is bound to the integer type and to a location chosen by the underlying compiler
    i = 10;
  Identifier i is bound to the value 10, or equivalently, the value 10 is bound to the location of i

Type Binding (con’t)
• Binding time
  • Language definition time
    • Java: integer literals are bound to type int, and real (floating-point) literals to type double
  • Language implementation time
    • Binding real values to the IEEE 754 standard
  • Program writing time
    • Declaration of variables
  • Compile/load time
    • Binding static objects to fixed memory locations or to stack locations
    • Assigning executable code to a memory block
  • Run time
    • Values are bound to variables

Type Binding (con’t)
• Early binding
  • An element is bound to a property as early as possible
  • The earlier the binding, the more efficient the language
• Late binding
  • Delay binding until the last possible time
  • The later the binding, the more flexible the language
  • Supports method overriding (dynamic dispatch) in object-oriented languages; overloading, by contrast, is usually resolved at compile time
  • C++ example?
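A Python sketch of late binding through overriding. In C++ only `virtual` member functions are bound this late; in Python every method call is resolved at run time from the object's actual class. The class names `Shape` and `Square` are hypothetical.

```python
class Shape:
    def area(self) -> float:
        return 0.0

    def describe(self) -> str:
        # self.area is looked up when describe() runs, so an override
        # in a subclass is picked up even though Shape knows nothing about it
        return f"area={self.area()}"


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:          # overrides Shape.area
        return self.side * self.side


shapes = [Shape(), Square(3.0)]
print([s.describe() for s in shapes])  # ['area=0.0', 'area=9.0']
```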

Type Checking
• Type checking is the activity of ensuring that the operands of an operator are of compatible types
• A compatible type is one that is either legal for the operator, or is allowed under language rules to be implicitly converted, by compiler-generated code, to a legal type
• If all type bindings are static, nearly all type checking can be static
• If type bindings are dynamic, type checking must be dynamic

Type Conversion
• A narrowing conversion is one that converts an object to a type that cannot include all of the values of the original type, e.g. float to int
• A widening conversion is one in which an object is converted to a type that can include at least approximations to all of the values of the original type, e.g. int to float
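A short Python sketch of both kinds of conversion. It assumes Python's float is an IEEE 754 double (true on essentially all platforms), whose 53-bit significand is why a widened integer can come back changed.

```python
# Narrowing: float -> int drops the fractional part (int() truncates toward zero)
print(int(3.99), int(-3.99))   # 3 -3

# Widening: int -> float keeps the magnitude, but only approximately;
# 2**53 + 1 needs 54 significant bits, one more than a double provides
big = 2**53 + 1
print(float(big) == big)       # False
print(int(float(big)))         # 9007199254740992
```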

Type Conversion (con’t)
• Implicit type conversion (coercion)
  • Decreases the ability to detect type errors
  • In most languages, all numeric types are coerced in expressions, using widening conversions
  • Ada performs virtually no coercions in expressions
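Python follows the common pattern of widening numeric operands in mixed expressions. The second line of this sketch shows how coercion hides mistakes: because `bool` is a subtype of `int`, adding a Boolean to a number raises no type error.

```python
# Mixed int/float arithmetic: the int operand is widened to float
r = 3 + 2.5
print(r, type(r).__name__)   # 5.5 float

# True is silently coerced to 1, so a likely mistake goes unreported
print(True + 1)              # 2
```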

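Type Conversion (con’t)
• C
    double d;
    long l;
    int i;
    …
    d = i;          /* int widened to double */
    l = i;          /* int widened to long */
    if (d == l)     /* l converted to double for the comparison */
        d = 2 * l;  /* 2 * l computed as long, then converted to double */
• Java
    int x;
    double d;
    x = 5;
    d = x + 2;      // int result 7 widened to double 7.0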

Type Conversion (con’t)
• Explicit type conversion (casting)
  • C syntax: ( type-name ) cast-expression
  • C
      double d = 3.14;
      int i = (int) d;               /* i == 3: the fraction is truncated */
  • Java
      boolean t = true;
      byte b = (byte) (t ? 1 : 0);   // Java cannot cast a boolean to byte directly
  • Ada (similar to a function call)
      3 * Integer(2.0)
      2.0 + Float(2)

Semantic Domains
• Semantic domain
  • A set with well-defined properties and operations
• Environment
  • A set of pairs <variable, location>
• Memory
  • A set of pairs <location, value>
• State
  • Product of the environment and its memory
      σ = { <Var1, Val1>, <Var2, Val2>, …, <Varn, Valn> }

Semantic Domains (con’t)
• Three ways to define the meaning of a program
• Operational semantics
  • A program is interpreted as a set of sequences of computational steps
  • A set of execution rules of the form Premise -> Conclusion, e.g.
      σ(x) => 4 and σ(y) => 2  ->  σ(x+y) => 6
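The execution rule above can be read directly as an interpreter. This Python sketch is a big-step (natural) evaluator in which each branch corresponds to one rule; the tuple-based abstract syntax and the name `evaluate` are assumptions made for illustration.

```python
from typing import Union

# Hypothetical abstract syntax: an int literal, a variable name,
# or a tuple (op, left, right)
Expr = Union[int, str, tuple]

def evaluate(e: Expr, state: dict) -> int:
    """Evaluate e in state sigma; each branch mirrors one execution rule."""
    if isinstance(e, int):        # a literal evaluates to itself
        return e
    if isinstance(e, str):        # sigma(x) => v: look the variable up in the state
        return state[e]
    op, left, right = e
    v1 = evaluate(left, state)    # premise: sigma(left) => v1
    v2 = evaluate(right, state)   # premise: sigma(right) => v2
    if op == "+":                 # conclusion: sigma(left + right) => v1 + v2
        return v1 + v2
    if op == "*":
        return v1 * v2
    raise ValueError(f"unknown operator {op!r}")

sigma = {"x": 4, "y": 2}
print(evaluate(("+", "x", "y"), sigma))   # 6
```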

Semantic Domains (con’t)
• Three ways to define the meaning of a program
• Operational semantics (con’t)
  • Usage
    • Language manuals and textbooks
    • Teaching programming languages
  • Structural (small-step): defines program behavior in terms of the behavior of its parts, one computational step at a time
  • Natural (big-step): defines program behavior in terms of its overall effect, not its individual steps

Semantic Domains (con’t)
• Axiomatic semantics
  • Aims to show that the program does what it is supposed to do, i.e. that its result agrees with its specification
  • Formal verification of a program using logical expressions (assertions)
  • Hoare triple: {Pre-condition} s {Post-condition}
    • Example: {a = 2} b = a; {b = 2}
    • Weakest pre-condition: {?} a = b+1; {a > 1}
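For an assignment x := E, the weakest pre-condition is the post-condition Q with E substituted for x, written Q[E/x] below (a notation assumed here, since the slide gives none). A worked derivation for both examples, assuming integer arithmetic:

```latex
\begin{aligned}
wp(b := a,\; b = 2)     &= (b = 2)[a/b] = (a = 2) \\
wp(a := b + 1,\; a > 1) &= (a > 1)[(b + 1)/a] = (b + 1 > 1) \equiv (b > 0)
\end{aligned}
```

The first line confirms the triple {a = 2} b = a; {b = 2}; the second answers the question mark: {b > 0} a = b+1; {a > 1}.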

Semantic Domains (con’t)
• Axiomatic semantics (con’t)
  • Axioms
    • Rule of consequence
    • Rule of conjunction
    • Rule of assignment (s : b = a)
    • Rule of sequence
    • Rule of condition (s : if c then a else b)
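The rules named above are usually written as inference rules, premises above the line and conclusion below. The following is the standard textbook form, using := for assignment to avoid confusion with equality and Q[a/b] for "Q with a substituted for b":

```latex
\begin{aligned}
&\text{Assignment:}  && \{Q[a/b]\}\ b := a\ \{Q\} \\[6pt]
&\text{Sequence:}    && \frac{\{P\}\ s_1\ \{R\} \qquad \{R\}\ s_2\ \{Q\}}{\{P\}\ s_1;\ s_2\ \{Q\}} \\[6pt]
&\text{Condition:}   && \frac{\{P \land c\}\ a\ \{Q\} \qquad \{P \land \lnot c\}\ b\ \{Q\}}{\{P\}\ \mathbf{if}\ c\ \mathbf{then}\ a\ \mathbf{else}\ b\ \{Q\}} \\[6pt]
&\text{Consequence:} && \frac{P \Rightarrow P' \qquad \{P'\}\ s\ \{Q'\} \qquad Q' \Rightarrow Q}{\{P\}\ s\ \{Q\}} \\[6pt]
&\text{Conjunction:} && \frac{\{P_1\}\ s\ \{Q_1\} \qquad \{P_2\}\ s\ \{Q_2\}}{\{P_1 \land P_2\}\ s\ \{Q_1 \land Q_2\}}
\end{aligned}
```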

Semantic Domains (con’t)
• Axiomatic semantics (con’t)
  • Axioms
    • Rule of loop (s : while c do b end)
              {I ∧ c} b {I}
        -----------------------------
        {I} while c do b end {I ∧ ¬c}
      • I is the loop invariant
  • A loop invariant is true before the loop, at the bottom of the loop body in each iteration, and when the loop terminates
  • Finding the loop invariant is the key to proving the loop correct
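A Python sketch that checks a loop invariant at run time with assertions at exactly the three points listed above. The function `sum_to` is hypothetical and assumes n ≥ 0; the invariant is total = 1 + 2 + … + i.

```python
def sum_to(n: int) -> int:
    """Compute 1 + 2 + ... + n (n >= 0), asserting the loop invariant I."""
    i, total = 0, 0
    # I: total == i * (i + 1) // 2 and 0 <= i <= n
    assert total == i * (i + 1) // 2 and 0 <= i <= n       # I holds before the loop
    while i < n:                                            # loop condition c
        i += 1
        total += i
        assert total == i * (i + 1) // 2 and 0 <= i <= n   # I holds after each iteration
    # On exit I and not c hold, so i == n and total == n * (n + 1) // 2
    assert i == n
    return total

print(sum_to(10))   # 55
```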

Semantic Domains (con’t)
• Denotational semantics
  • Defines the meaning of a statement as a state-transforming mathematical function
  • The state of a program gives the current values of its active objects
  • Example: denotational semantics of integer arithmetic expressions
    • Production rules (N: Number, D: Digit, E: Expression):
        Number     ::= N D | D
        Digit      ::= 0 | 1 | … | 9
        Expression ::= E1 + E2 | E1 - E2 | E1 * E2 | E1 / E2 | (E) | N

Semantic Domains (con’t)
• Denotational semantics (con’t)
  • Semantic domain: Integer = {…, -1, 0, 1, …}
  • Semantic functions:
      Value: Number => Integer
      Digit: Digit => Integer
      Expr:  Expression => Integer
  • Auxiliary functions:
      plus: Integer × Integer => Integer
      …
  • Semantic equations:
      Expr[[E1 + E2]] = plus(Expr[[E1]], Expr[[E2]])
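The semantic functions can be transcribed almost one-for-one into Python, one function per semantic function and one branch per equation. The tuple AST is hypothetical, and the meaning of / is an assumption (truncation toward zero, as in C and Java) because the slides do not define it.

```python
# Hypothetical abstract syntax for the grammar above:
#   ("num", "123")   a Number, kept as its string of digit symbols
#   (op, e1, e2)     E1 op E2 with op in "+", "-", "*", "/"
# (E) needs no node of its own: parentheses only guide parsing.

def digit(d: str) -> int:
    """Digit: Digit => Integer, one equation per digit symbol."""
    return "0123456789".index(d)

def value(n: str) -> int:
    """Value: Number => Integer, following Number ::= N D | D."""
    if len(n) == 1:
        return digit(n)                       # Value[[D]]   = Digit[[D]]
    return 10 * value(n[:-1]) + digit(n[-1])  # Value[[N D]] = 10 * Value[[N]] + Digit[[D]]

# Auxiliary functions on the semantic domain Integer
def plus(a: int, b: int) -> int:
    return a + b

def minus(a: int, b: int) -> int:
    return a - b

def times(a: int, b: int) -> int:
    return a * b

def divide(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed semantics for /)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q

AUX = {"+": plus, "-": minus, "*": times, "/": divide}

def expr(e: tuple) -> int:
    """Expr: Expression => Integer, defined compositionally on the syntax tree."""
    if e[0] == "num":
        return value(e[1])                    # Expr[[N]] = Value[[N]]
    op, e1, e2 = e
    return AUX[op](expr(e1), expr(e2))        # e.g. Expr[[E1 + E2]] = plus(Expr[[E1]], Expr[[E2]])

# (12 + 3) * 2
print(expr(("*", ("+", ("num", "12"), ("num", "3")), ("num", "2"))))   # 30
```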

Data Types
• Elements of a data type
  • Set of possible values
  • Set of operations
  • Internal representation
  • External representation
• Type information
  • Implicit
    • 5 is implicitly an integer
    • In Fortran, a variable named I is implicitly an integer (names beginning with I through N default to integer)
  • Explicit
    • Given by a variable or function declaration

Data Types (con’t)
• Data type classifications
  • Built-in
    • Included in the language definition
    • Primitive
    • Composite
    • Recursive
  • User-defined
    • Data types defined by users
    • Declared and defined before usage

Primitive Data Types
• Unstructured and indivisible entities
  • Integer, Real, Boolean, Char
• Depend on the language's application domain
  • COBOL: fixed-length strings and fixed-point numbers
  • SNOBOL: variable-length strings
  • Scheme: integer, rational, real, complex

Primitive Data Types (con’t)
• Examples
  • C
    • int, float, char
  • Java
    • int, float, char, boolean
  • Pascal
    • Integer, Char, Real, Longint
  • ML
    • bool, real, int, word, char
  • Scheme
    • integer?, real?, boolean?, char?

Primitive Data Types (con’t)
• Integer
  • Almost always an exact reflection of the hardware, so the mapping is trivial
  • There may be as many as eight different integer types in a language
  • Java's signed integer types: byte (8 bits), short (16), int (32), long (64)
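Since Java's integer sizes are fixed by the language definition, their ranges follow directly from two's-complement arithmetic. A small Python sketch computing them; `twos_complement_range` is a hypothetical helper.

```python
# Java's four signed integer types and their sizes in bits
JAVA_INT_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}

def twos_complement_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a signed two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name, bits in JAVA_INT_BITS.items():
    lo, hi = twos_complement_range(bits)
    print(f"{name:<5} {bits:>2} bits: {lo} .. {hi}")
```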

Primitive Data Types (con’t)
• Float
  • Models real numbers, but only as approximations
  • Languages for scientific use support at least two floating-point types (e.g., float and double; sometimes more)
  • Usually exactly like the hardware, but not always
  • IEEE Floating-Point Standard 754
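A Python sketch of what "only as approximations" means in practice. It assumes Python's float is an IEEE 754 double, which holds on essentially all platforms.

```python
import math
import sys

print(sys.float_info.mant_dig)        # 53 significand bits in a double
# 0.1, 0.2 and 0.3 are each stored as the nearest binary fraction,
# so the sum of the first two misses the third
print(0.1 + 0.2 == 0.3)               # False
print(0.1 + 0.2)                      # 0.30000000000000004
# Floating-point results are therefore compared with a tolerance
print(math.isclose(0.1 + 0.2, 0.3))   # True
```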

Primitive Data Types (con’t)
• Complex
  • Some languages support a complex type, e.g., C99, Fortran, and Python
  • Each value consists of two floats, the real part and the imaginary part
  • Literal form (in Python): (7 + 3j), where 7 is the real part and 3 is the imaginary part
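A Python sketch of the complex type described above, using the same literal.

```python
z = 7 + 3j                  # real part 7, imaginary part 3
print(z.real, z.imag)       # 7.0 3.0 (both parts are stored as floats)
print(z * z)                # (40+42j), since 49 + 42j + 9j**2 = 40 + 42j
print(abs(3 + 4j))          # 5.0, the magnitude
print(complex(7, 3) == z)   # True: the constructor builds the same value
```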

Primitive Data Types (con’t)
• Decimal
  • For business applications (money)
    • Essential to COBOL
    • C# offers a decimal data type
  • Stores a fixed number of decimal digits in coded form (BCD, Binary-Coded Decimal)
  • Advantage: accuracy
  • Disadvantages: limited range, wastes memory
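The accuracy advantage can be seen with Python's standard `decimal` module. This sketch only illustrates decimal arithmetic; the module does not promise a BCD memory layout, so it says nothing about the storage cost mentioned above.

```python
from decimal import Decimal

# Ten payments of 0.10 added in binary floating point
total_float = 0.0
for _ in range(10):
    total_float += 0.10
print(total_float)            # 0.9999999999999999

# The same payments added as decimal digits
total_dec = Decimal("0")
for _ in range(10):
    total_dec += Decimal("0.10")
print(total_dec)              # 1.00
```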

Primitive Data Types (con’t)
• Boolean
  • Simplest of all
  • Range of values: two elements, one for “true” and one for “false”
  • Could be implemented as bits, but often as bytes

Primitive Data Types (con’t)
• Character
  • Stored as numeric codings
  • Most commonly used coding: ASCII
  • An alternative 16-bit coding: Unicode (UCS-2, Universal Character Set)
    • Includes characters from most natural languages
    • Originally used in Java
    • C# and JavaScript also support Unicode
  • 32-bit Unicode (UCS-4)
    • Supported by Fortran, starting with Fortran 2003
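A Python sketch of characters as numeric codes. Python strings hold Unicode code points; the encodings used below (`utf-16-be`, `utf-32-be`) stand in for the 16-bit and 32-bit codings on the slide, and the last character is chosen because it does not fit in 16 bits.

```python
print(ord("A"), chr(65))                # 65 A: the same code in ASCII and Unicode
print(ord("\u00e9"))                    # 233: e-acute, outside 7-bit ASCII but within 16 bits
smile = "\U0001F600"                    # code point U+1F600 needs more than 16 bits
print(hex(ord(smile)))                  # 0x1f600
print(len(smile.encode("utf-16-be")))   # 4 bytes: a surrogate pair, since UCS-2 cannot hold it
print(len(smile.encode("utf-32-be")))   # 4 bytes: a single 32-bit unit, as in UCS-4
```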

Composite Data Types
• Structured or compound types
  • Array, String, Enumeration, Pointer, Record, List, Function
• Homogeneous, like Array, or heterogeneous, like Record
• Fixed size, like Array, or dynamic size, like Linked List
• Provided in the language core or as a separate library

Composite Data Types (con’t)
• Examples
  • C
    • Array ([]), Pointer (*), Struct, enum
  • Java
    • String, Array
  • Pascal
    • Record, Array, Pointer (^)

Composite Data Types (con’t)
• String
  • C and C++
    • Not primitive
    • Use char arrays and a library of functions that provide operations
  • SNOBOL4 (a string manipulation language)
    • Primitive
    • Many operations, including elaborate pattern matching
  • Fortran and Python
    • Primitive type with assignment and several operations
  • Java
    • Supported through the String class rather than as a primitive type, with language support for string literals and + concatenation
  • Perl, JavaScript, Ruby, and PHP
    • Provide built-in pattern matching, using regular expressions

Composite Data Types (con’t)
• String length options
  • Static: COBOL, Java's String class
  • Limited dynamic length: C and C++
    • In these languages, a special character (the null character '\0') marks the end of a string's characters, rather than the length being stored
  • Dynamic (no maximum): SNOBOL4, Perl, JavaScript
  • Ada supports all three string length options
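A Python sketch of the C convention for limited dynamic length: a fixed-size buffer whose logical length is found by scanning for the terminating null byte, rather than read from a stored length. `c_strlen` is a hypothetical helper modelled on C's `strlen`, and it assumes the buffer actually contains a null byte.

```python
def c_strlen(buf: bytes) -> int:
    """Length of a C-style string: count bytes up to the first NUL (O(n) scan)."""
    n = 0
    while buf[n] != 0:   # indexing bytes yields ints; 0 is the '\0' terminator
        n += 1
    return n

# A 10-byte buffer holding "hello": the buffer size is the maximum,
# the NUL byte marks where the current string ends
buffer = b"hello\x00\x00\x00\x00\x00"
print(c_strlen(buffer), len(buffer))   # 5 10
```

A Python `str` takes the other approach: it records its length, so `len()` needs no scan.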

Composite Data Types (con’t)
• String implementation
  • Static length: compile-time descriptor
  • Limited dynamic length: may need a run-time descriptor for length (but not in C and C++)
  • Dynamic length: needs a run-time descriptor; allocation/de-allocation is the biggest implementation problem

Composite Data Types (con’t)
• Enumeration
  • All possible values, which are named constants, are provided in the definition
  • C# example
      enum days {mon, tue, wed, thu, fri, sat, sun};
  • Design issues
    • Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant checked?
    • Are enumeration values coerced to integer?
    • Is any other type coerced to an enumeration type?

Composite Data Types (con’t)
• Enumeration (con’t)
  • Aid to readability, e.g. no need to code a color as a number
      enum Colors {Red, Blue, Green, Yellow};
  • Aid to reliability, e.g. the compiler can check:
    • Operations (e.g. don't allow colors to be added)
    • That no enumeration variable is assigned a value outside its defined range
  • Ada, C#, and Java 5.0 provide better support for enumeration than C++ because enumeration type variables in these languages are not coerced into integer types
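Python's standard `enum` module offers both design choices, which makes the difference easy to see. `Enum` members are not coerced to integers (closer to Ada, C#, and Java), while `IntEnum` members behave as integers (closer to C and C++). Note that Python reports the misuse at run time, not at compile time. The classes `Color` and `Level` are hypothetical.

```python
from enum import Enum, IntEnum

class Color(Enum):        # not coerced to int
    RED = 1
    BLUE = 2

class Level(IntEnum):     # members are ints
    LOW = 1
    HIGH = 2

print(Level.LOW + 1)      # 2: the member is treated as the integer 1
print(Color.RED == 1)     # False: a Color never equals an int
try:
    Color.RED + 1         # adding colors is meaningless, so it is rejected
except TypeError as err:
    print("TypeError:", err)
```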

Composite Data Types (con’t)
• Sub-range types
  • An ordered, contiguous subsequence of an ordinal type
  • Example: 12..18 is a sub-range of the integer type
  • Ada's design
      type Days is (mon, tue, wed, thu, fri, sat, sun);
      subtype Weekdays is Days range mon..fri;
      subtype Index is Integer range 1..100;
      Day1 : Days;
      Day2 : Weekdays;
      Day2 := Day1;   -- legal; raises Constraint_Error at run time if Day1 is sat or sun

Composite Data Types (con’t)
• Enumeration and sub-range implementation
  • Enumeration types are implemented as integers
  • Sub-range types are implemented like their parent types, with code inserted (by the compiler) to restrict assignments to sub-range variables
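Python has no sub-range types, so this sketch emulates the implementation strategy above: values stay plain integers (the parent type), and an explicit `check` call plays the role of the code a compiler inserts on each assignment. The class `Subrange` is hypothetical.

```python
class Subrange:
    """Hypothetical sub-range of int: values stay plain ints (the parent type),
    and check() stands in for the compiler-inserted assignment check."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi

    def check(self, value: int) -> int:
        if not (self.lo <= value <= self.hi):
            raise ValueError(f"{value} outside range {self.lo}..{self.hi}")
        return value


Index = Subrange(1, 100)    # like Ada: subtype Index is Integer range 1..100
i = Index.check(42)         # accepted and stored as an ordinary int
try:
    i = Index.check(101)    # rejected, where Ada would raise Constraint_Error
except ValueError as err:
    print(err)              # 101 outside range 1..100
```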

Composite Data Types (con’t)
• Array
  • An array is an aggregate of homogeneous data elements in which an individual element is identified by its position in the aggregate, relative to the first element
  • A heterogeneous array is one in which the elements need not be of the same type
    • Supported by Perl, Python, JavaScript, and Ruby

Composite Data Types (con’t)
• Array index type
  • FORTRAN, C: integer only
  • Ada: integer or enumeration (includes Boolean and char)
  • Java: integer types only
• Index range checking
  • C, C++, Perl, and Fortran do not specify range checking
  • Java, ML, and C# specify range checking
  • In Ada, the default is to require range checking, but it can be turned off
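Python is another language that checks index ranges at run time. This sketch also shows one Python-specific rule: negative indices are legal and count from the end, so only indices outside that extended range are rejected.

```python
data = [10, 20, 30]
try:
    data[3]                   # one past the end: caught by the run-time check
except IndexError as err:
    print("IndexError:", err)
print(data[-1])               # 30: negative indices count from the end
```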

Composite Data Types (con’t)
• Array initialization
  • C-based languages
      int list [] = {1, 3, 5, 7};
      char *names [] = {"Mike", "Fred", "Mary Lou"};
  • Ada
      List : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);
  • Python list comprehensions
      list = [x ** 2 for x in range(12) if x % 3 == 0]
    puts [0, 9, 36, 81] in list

Composite Data Types (con’t)
• Array operations
  • APL provides the most powerful array processing operations for vectors and matrices, as well as unary operators (for example, to reverse column elements)
  • Ada allows array assignment and also concatenation
  • Python supports array assignment, but it only changes references; Python also supports array concatenation and element membership operations

Composite Data Types (con’t)
• Array operations (con’t)
  • Ruby also provides array concatenation
  • Fortran provides elemental operations, so called because they operate on pairs of corresponding array elements
    • For example, the + operator between two arrays results in an array of the sums of the element pairs of the two arrays
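A Python sketch of the operations listed for Python on these two slides: assignment that only copies a reference, concatenation, and membership. Python lists have no elemental +, so the Fortran-style element-pair sum is written explicitly.

```python
a = [1, 2, 3]
b = a                    # assignment copies the reference, not the elements
b.append(4)
print(a)                 # [1, 2, 3, 4]: a and b name the same list

c = a + [5, 6]           # concatenation builds a new list
print(3 in c)            # True: element membership

# Fortran-style elemental sum over pairs of corresponding elements
x, y = [1, 2, 3], [10, 20, 30]
print([p + q for p, q in zip(x, y)])   # [11, 22, 33]
```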

Composite Data Types (con’t)
• Rectangular and jagged arrays
  • A rectangular array is a multi-dimensional array in which all of the rows have the same number of elements and all of the columns have the same number of elements
  • A jagged array has rows with varying numbers of elements
    • Possible when multi-dimensional arrays actually appear as arrays of arrays
  • C, C++, and Java support jagged arrays
  • Fortran, Ada, and C# support rectangular arrays (C# also supports jagged arrays)
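Python's nested lists are arrays of arrays, so they can be jagged. This sketch builds one of each shape and tests for rectangularity; `is_rectangular` is a hypothetical helper.

```python
def is_rectangular(rows: list[list[int]]) -> bool:
    """True if every row has the same number of elements (an empty array counts as rectangular)."""
    return all(len(row) == len(rows[0]) for row in rows)

rect = [[1, 2, 3],
        [4, 5, 6]]
jagged = [[1],
          [2, 3],
          [4, 5, 6]]     # an array of arrays whose rows differ in length

print(is_rectangular(rect), is_rectangular(jagged))   # True False
print([len(row) for row in jagged])                   # [1, 2, 3]
```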

Composite Data Types (con’t)
• Slices
  • A slice is some substructure of an array; nothing more than a referencing mechanism
  • Slices are only useful in languages that have array operations
  • Fortran 95
      Integer, Dimension (10) :: Vector
      Integer, Dimension (3, 3) :: Mat
      Integer, Dimension (3, 3, 4) :: Cube
    Vector (3:6) is a four-element array
  • Ruby supports slices with the slice method
    list.slice(2, 2) returns the third and fourth elements of list
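The Fortran and Ruby slices above have direct Python equivalents once the index conventions are translated: Python slices are 0-based and exclude the end index. One caveat: slicing a Python list produces a copy, not the pure reference described on the slide. Column slices of a nested list have to be built by hand.

```python
vector = list(range(1, 11))      # [1, 2, ..., 10], standing in for Fortran's Vector
# Fortran Vector(3:6) is 1-based and inclusive, which becomes vector[2:6] in Python
print(vector[2:6])               # [3, 4, 5, 6]: four elements
# Ruby list.slice(2, 2) means (start 2, length 2), which becomes list[2:2 + 2]
print(vector[2:4])               # [3, 4]: the third and fourth elements

mat = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
column = [row[1] for row in mat]
print(column)                    # [2, 5, 8]: the second column
```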
