This lecture explores data concepts, including data versus information, data typing, symbols and referencing, and values. It emphasizes the importance of understanding data in programming and problem solving.
Structured Program Development & Program Control 60-140 Lecture 2a Dr. Robert D. Kent
Lecture 2: Outline • Data concepts • Operator basics
2A : Data Concepts Types, symbols and values
Lecture 2A: Outline • Data concepts • Data versus Information • Data Typing • Symbols and Referencing • Values
Data Concepts • Computers are specialized tools (hardware) built to process data using components (instruction logic) designed to perform specific (well-defined) transformations • Instructions are simply bit-strings (0’s and 1’s) that encode the • Type of operation (eg. +, -, *, =) • Location(s) of values to be operated on (or values embedded within, or implied by, the instruction itself) • Operand data are bit-strings that encode values according to specified representations that computer hardware (ALU) can operate on “meaningfully”
Data Concepts • In order to really understand programming it is necessary to appreciate both data and logic • The same is true of problem solving in general, but we often take an intuitive view of Data and focus on Process • Data may present limitations or obstacles to problem solving • Data representation is problem dependent and therefore requires special consideration • With computer hardware, there may be significant performance differences between similar operations on different data types (eg. Integer versus Real)
Data versus Information • Information is a human conceptualization that is much broader than Data. • Data (singular: datum) refers to values in a measurement system • Ex. Three meters • Data: Three System: Metric Length • Ex. 100 stone • Data: 100 System: (Brit.) Weight • Is it meaningful to ask: what is the total of Three meters and 100 stone?
Data versus Information • Is it meaningful to ask – what is the total of Three meters and 100 stone? • NO! • Clearly, if we ignore the context of the values Three and 100, we can just add numbers • But, the result is meaningless because it lacks a cogent informational content • Data alone, without information (context) is typically meaningless • Operations on data must always be designed carefully to account for context (ie. Information)
Data versus Information • Another example: • Imagine a time (~0 BCE/AD) in Italy when two owners of goats decide to combine their herds into one for a common business • At the time of merger, each must count their own goats (a labour intensive task, using fingers, sticks and the Roman numbering system) • One has DXXVII goats, the other DCCCXLIII goats • What is the total number of goats? • The notion (concept) of TOTAL (or sum) is not at issue, since both goat herders understand this concept • What is difficult is how to calculate the value of the Total without having to merge the herds into a single pen and then count them all again, starting at one (I).
Data versus Information • DXXVII plus DCCCXLIII • Five Hundred Twenty Seven Plus Eight Hundred Forty Three • Five Two Seven Plus Eight Four Three (courtesy of Arabic insights in mathematics) • 527 + 843 = 1370 • Five Two Seven Plus Eight Four Three Equals One Three Seven Zero • Five Hundred Twenty Seven Plus Eight Hundred Forty Three Equals One Thousand Three Hundred Seventy • DXXVII plus DCCCXLIII equals MCCCLXX • Now, think about how many different kinds of mental operations you have performed: translation, organization, representational formatting, addition ! This is more about handling information than simply data alone.
Data versus Information • Now we know how to tell the goats from the sheep • Lessons Learned? • Computers, through logic, do exactly what programmers tell them to do • Most errors are due to mistaking information for data and leaving out essential aspects of logic
Data Typing • Data can be grouped into types according to the context of the values used • Integers are used to count whole (ie. complete) things • 1 person, 4 balls, 12 moons • Real numbers are used to describe both integer and fractional portions of wholes • Pi = 3.14159 (approx) is the ratio of the circle circumference and its diameter • The average number of children per Canadian family is 1.4 • The set of integers forms a proper subset of the set of real numbers.
Data Typing • Other types of data can be constructed using the mathematical concept of mapping (a type of transformational logic) • Ordinal sequencing is the simplest form of usage • Characters can be organized into sequences • Lower Case Alphabetic: a, b, c, ..., z • Upper Case Alphabetic: A, B, C, ... Z • Digits : 0, 1, 2, ... 9 • Punctuation : { , . / ! ? ; : ‘ “ [ ] ( ) } $ @ _ • Operators : < > = * & % ^ - + • And other special symbols
Data Typing • The organization of character sequences has several forms • First developed by Hollerith (still used in Fortran) • BCD and EBCDIC • ASCII (7 bit and 8 bit) • Unicode • Although we will not require knowledge (ie. memorization) of the ASCII code, students should familiarize themselves with it and note • how code subgroups are sequenced • the interpretive meanings of the various codes • the breadth of the code's applicability to both printing of characters and communications • See Appendix C of the textbook.
Data Typing • In the C language several data types are built into compilers and designed to take account of modern computer instruction logic (hardware) • Integer : int • Real : float, double • Character : char • These are called the primitive data types. • They are supported in hardware by most computers
Data Typing : Integer • Integer variables are defined in declaration statements, as follows: int SymbolName ; /* one variable */ int VarName1, VarName2 ; /* two variable list */ • When the compiler interprets the first statement it • reserves enough room for data to be stored, • translates the user-defined SymbolName into a set of numerical address references that CPU hardware can operate on, and • utilizes the data type assigned (int) to perform semantic consistency checking (and code generation) throughout the program
Data Typing : Integer int SymbolName ; • When the program is eventually compiled and then executed (a.out), a suitable amount of space (L bits, or L/8 bytes) in RAM is allocated to SymbolName • Most computers will allocate 4 bytes (32 bits) • An integer representation is applied (eg. 2's complement) • Values may be in the range from -2^(L-1) (minimum, negative) up to 2^(L-1) - 1 (maximum, positive) • For a 32-bit integer: 2^31 is about 2.1 billion
Data Typing : Integer • Integers can come in flavours, or sub-types. short int ShortIntVar ; /* 16b, max 32767 */ long int LongIntVar ; /* 64b, 2^63 ~ 10^19 */ unsigned int PosIntVar ; /* ONLY >= 0; max 65K if 16b, about 4.3 billion if 32b */ • Each of these subtypes is useful for solving problems when the range of values is restricted (ie. small, or positive) or when a larger range is needed • Often, specific computers will show differences in performance when operating on integer subtypes
Data Typing : Real • Consider the real number (conventional form): 1234.56789 • Restate in scientific notation: + 0.123456789 x 10^4 • Real valued variables are declared as follows float FloatVar ; double DoubleVar ; • Values that are stored in float- and double-sized memory allocations are specified by standards organizations (eg. IEEE, ANSI) • Size • Representation: Sign, Exponent, Mantissa (fraction)
Data Typing : Real • It is obvious that the amount of space that can be allocated to store real values is finite. • For real data, this means that there is a limit to how many significant digits can be stored • Thus, when operating on real data, answers will be adjusted to the available precision offered by each machine • This leads to a potential loss of accuracy in calculations • With potentially devastating effects ! • This subject is typically dealt with in courses (and books) on Numerical Analysis and Applied Mathematics
Data Typing : Real versus Integer • From Mathematics we know that the Set of Integer Numbers is a subset of the Set of Real Numbers • This view is carried out in most programming languages, but with an important caveat: • Semantics (Compilers) • integer valued expressions are subsets of real valued expressions (compatibility) • The converse is not true (incompatibility) • Hardware • Integer and Floating Point calculations are performed by different hardware components which are sensitive to the representational formats of each data type
Data Typing : Character • Character valued variables are declared as follows char CharVar ; • Characters represented using the ASCII encodings are allocated one (1) byte of storage • Exactly and only 1 character per variable • Technically speaking, char is a subtype of int
Data Typing : Character • Later in your study of C, you will encounter the concept of a collection of characters, or strings. • This will involve array and logical delimiter concepts and techniques • An important category of algorithms is that of string processing • Word processing • Language translation, compilers • Natural language processing (NLP) and artificial intelligence (AI)
Data Typing : More ? • As you continue learning the C language you will • Develop an understanding of functions and how they are given a data type attribute • Understand the notion and practice of abstract data types • Understand how to work with arbitrary collections of bits • What the bits represent is only restricted by the limits of your imagination (and some meaningful logic) • You will also need to understand the fundamental logic operations of Boolean Set Theory • and, or, complement, nand, nor, exclusive or, exclusive nor
Data Typing : More ? • A quick note on Input/Output. • Assume the declaration: int N = 5 ; • Consider: printf ( "Total = %d\n", N ) ; • The %d is used to indicate that an integer (decimal) value is to be outputted. • The value at location N is assumed to be an int data type; if it is not, then a logical error will occur. • The value outputted (5) will be formatted (by default) to start at the position of the %, with a minus sign (-) if N is negative, followed by as many digits as are required.
Data Typing : More ? • A quick note on Input/Output. • Assume the declaration: int N ; • scanf ( "%d", &N ) ; • The %d is used to indicate that an integer (decimal) value is to be inputted. • The variable N is assumed to be an int data type; if it is not, then a logical error will likely occur somewhere in the program. • The variable N is preceded by the ampersand operator (&) which signifies "address of". • In other words, we scan the input for a valid integer and store that "at the address of location N"
Data Typing : More ? • A quick note on Input/Output. • In both printf() and scanf() library functions we note that the first operand within parentheses is a string of characters (enclosed within quotation marks " ") • Within this string are included data specifier codes, each preceded by a % • Integer (int) : %d • Real (float) : %f • Character (char) : %c
Symbols and Referencing • User defined variable names (and later functions and data structures) are used to benefit algorithm designers (ie. programmers) • Variables are abstractions of the data values used in actual calculations • We find it easier to refer to X in a formula than to think separately about each specific value that X might represent
Symbols and Referencing • Compilers are programs that follow rigorous rules of logic • Programmers must follow these rules through the formal definitions and requirements of each programming language • In C • All symbols (names) must be declared before they may be referenced • All symbol declarations must follow the C rules of grammar and syntax • Any undeclared symbol references will be reported as compiler errors • Mis-spellings account for most such errors • C language declared symbol names are CaSe sensitive
Values • Data values (called literal values) are stated using conventional formats • Integers: • 0 -1 4789 (no commas) • Reals: • 0.0 -1.0 3.14159 12345.0 (no commas; in C the decimal point marks a literal as real) • Characters: (sandwiched between two apostrophes) • 'a' 'b' ',' 'A' 'Y' '$' '\n'
Values • Accuracy is an important consideration when planning solutions • Do not over-specify real values when the machine precision will not allow this (eg. stating Pi with too many digits) • Integers have an upper-limit value (about 2.1 billion for 32 bits) that may be exceeded • Ex. Factorial of 12, 13, 14 ? • Reals may suffer from both an overflow and an underflow that can lead to erroneous calculations
2B : Operator Basics Assignment, Arithmetic, Relations, Expressions, Data types
Lecture 2B: Outline • Operator basics • Assignment • Arithmetic • Relational • Logical • Expressions • Data types
Operator Basics • An operator is a symbol that denotes a specific action. • Operator symbols may be single characters, or they may be terms • Each action must be well-defined (unambiguous) in a mathematical (logical) sense • Actions have both Semantic and Logical aspects • The meaning of the operation (human) • How the operation is performed (computer) • Actions may be understood as sometimes failing • These are noted as exceptions and are usually reportable, or remedial (healing) actions may be prescribed and carried out by computers and O/S's.
Assignment Operator • The set-equal-to symbol (=) is used to denote the concept of assignment of a value to a variable • This also means that data is being stored in RAM (usually, rarely in the CPU) • Examples: • int N = 0 ; /* declare N and store 0 */ • N = 5 ; /* Store 5 at location N, replace 0 */ • The way we humans often say this, in English, is: Set N equal to the value 5. • In the programming sense, one must be more careful to keep in mind that a value is being stored at a memory location. Before the value 5 is actually stored, it is not known whether N already contains this value. Once the value has been stored, it is clear that the value stored at location N is equal to the value 5.
Assignment Operator • The assignment operator must be used with care and attention to detail • Avoid using = where you intend to perform a comparison for equivalence (equality) using == • You may use = more than once in a statement • This may be confusing and should be avoided when clarity of code matters. • Examples: • N = M = 5 ; /* Store 5 at both locations M and N */ • N = ( M == 3 ) ; /* Evaluate if M is equal to 3 - store result at location N */
Assignment Operator • A final point to emphasize • Assignment requires Right-to-Left type compatibility • This means that for every expression: A = B • If the type of A and the type of B are identical then the assignment does not require conversion and is directly implementable • If A and B have different types, the type of B should be a proper subset (sub-type) of the type of A, and the data representation must be converted (which may take several primitive operations and be time consuming)
Arithmetic Operators • Arithmetic operators are used to express the logic of numerical operations • This logic may depend on data type • The operators may be grouped as follows: • Addition and Subtraction : + - • Multiplication : * • Integer Division : / % • Floating point Division : / • Auto-Increment and Auto-Decrement • ++ and -- • Pre- versus Post-
Arithmetic Operators : + - * • Addition, subtraction and multiplication of numbers are all meaningful operations • Learned by small children all over the world ! • From a mechanical viewpoint, we all learn to perform these operations in the same way (same algorithms) for both integers and real numbers. • There are some differences to be careful of (more later). • We denote the operator symbols • Addition and Subtraction : + (plus) - (hyphen) • Multiplication : * (asterisk)
Arithmetic Operators : + - * • Unary versus Binary • It is meaningful to say -X (negative X) so C permits use of the minus symbol (hyphen) as a unary operator. It also permits use of + as unary. • Ex. A = -3 ; • Clearly, multiplication (*) of numbers does not make sense as a unary operator, but we will see later that * does indeed act unarily on a specific data type • All operators have typical use as binary operators in arithmetic expression units of the general form • Operand1 arith_op Operand2
Arithmetic Operators : + - * • There are considerable differences between how different computers may handle the int and float (or double) data types • As a general rule, floating point hardware is slower than integer hardware for the same arithmetic operation. • Programmers should work with ints unless it is quite clear that floats should be used • NOTE: For programs involving financial calculations it is advised to store currency values as integers (low order 2 digits are the cents) and perform integer based computations • Ex. $1,256.73 becomes 125673
Arithmetic Operators : Division • There are two division operators in C • / (quotient) and % (modulus) • Both are binary operators • Modulus division applies only to integers, since it evaluates to the remainder • If X = Q * Y + R, then X / Y evaluates to Q and X % Y evaluates to R • Integer Division : / % • int X=5, Y=3, N, M ; • N = X / Y ; /* evaluates to 1 */ • M = X % Y ; /* evaluates to 2 */ • Floating point Division : / • An expensive operation, so use sparingly ! • A simple illustration of Modulus: Consider the problem of a 12 hour digital clock. The clock starts at time 0, then counts up in 1 hour increments: 1, 2, 3, .... , 10, 11, and then resets to 0 on the twelfth hour. A statement that updates the Hour (assumed of int data type) is : Hour = ( Hour + 1 ) % 12 ; Note how this behaves. When Hour is any value from 0 to 10 inclusive, the right side expression (Hour + 1) evaluates from 1 to 11 and the modulus division does not change this result. However, when Hour is 11, (Hour + 1) is 12 and the rhs evaluates to 0. If this statement is in a loop structure, the clock repeatedly counts through the 12 hour cycle.
Arithmetic Operators : ++ -- • A common programming statement involves adding (or subtracting) 1 to (from) a variable used for counting • N = N + 1 ; N = N - 1 ; • The addition of 1 to an integer variable is called incrementation • Similarly, subtracting 1 from an integer variable is called decrementation
Arithmetic Operators : ++ -- • The C language supports two operators that automatically generate increment or decrement statements on integer variables • Auto-Increment ++ • Auto-Decrement -- • Examples: (Equivalent statements) • Explicit: N = N + 1 ; Post-auto: N++ ; Pre-auto: ++N ; • Explicit: N = N - 1 ; Post-auto: N-- ; Pre-auto: --N ;
Arithmetic Operators : ++ -- • There is a very important difference between using these operators before versus after a variable symbol • AFTER (POST) : • If an expression contains N++, the expression is evaluated using the value stored at the location N. After the expression is evaluated, the value at N is incremented by 1. • BEFORE (PRE) : • If an expression contains ++N, the value at N is incremented by 1 and stored at N, before any other parts of the expression are evaluated. The expression is then evaluated using the new value at N.
Arithmetic Operators : ++ -- • Assume the declarations with initial values specified • int A, B, N = 4, M = 3 ; • What are the final values of A, B, N and M ? • A = N++ ; • B = ++M + N-- ; /* watch out ! */ • A = --A ; /* modifies A twice in one statement, which standard C leaves undefined; --A ; alone is the safe form */ • ANSWER: A = 3 B = 9 N = 4 M = 4
Augmented Assignment Operators • Operator augmentation involves combining two operator symbols to form a new symbol with extended meaning • Arithmetic Assignment operators combine the expressiveness of arithmetic and assignment and permit abbreviation of coding • += and -= • *= • /= and %= • In some cases they may lead to hardware optimization of executable code.
Augmented Assignment Operators • Although these operations have a certain kind of elegance, they may create ambiguity, so programmers should ensure that programs remain clear. • Examples: • Longhand: X = X + Y ; Shorthand: X += Y ; • Longhand: X = X * Y ; Shorthand: X *= Y ; • Longhand: X = X % Y ; Shorthand: X %= Y ;
Relational Operators • Relational operators are used to express the concept of comparison of two values • Based on the Boolean notions of True and False • This is vital to decision making logic where we do something – or not – based on evaluating an expression • while ( Age > 0 ) ..... • if ( Num <= 0 ) .....