Basic data types and their representation CS101 2012.1
Announcements • If biometric ID does not work, write your roll number and sign your name on a piece of paper • All lab batches should be stable now • Lab this week will continue with familiarization and small programs • No tutorials this week either • But we will begin posting homework on Moodle • Will give you an impression of exam questions • Will not be graded • Will be discussed in tutorials
Layers of abstraction • Three layers • Implement fixed size primitive types by mapping possible/supported values to bit patterns • Add collection types on top of primitive types to assist writing complicated programs • Collections usually change sizes and memory layout during program execution • Layers, top to bottom: collection types (arrays, matrices, lists, maps, strings); primitive data types (character, integer, float, double); memory as an array of bytes
Memory, values, variables • Unit of storage: bit (0/1) • Because such computers are easier to implement by switching transistors off and on • A byte is 8 bits wide • Values range from 00000000 to 11111111 • 2^8 = 256 possible bit configurations • Can be interpreted as integers from 0 to 255 (“unsigned char”) • Electronic and magnetic memory is allocated in units of bytes
Binary arithmetic • Byte value in binary: 00000000 (8 bits) • Corresponding decimal value = 0 • Written as 0dec to avoid confusion • In decimal, to increment a number, increment the unit position, unless there is overflow, in which case carry over… etc. • Same in binary • Next few values are 00000001 (=1dec), 00000010 (=2dec), 00000011 (=3dec), 00000100 (=4dec), 00000101 (=5dec) etc.
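The counting sequence above can be reproduced with a small helper. This is a minimal sketch; the function name `toBinary` is made up for illustration and relies on the standard `std::bitset` class.

```cpp
#include <bitset>
#include <string>

// Return the 8-bit binary representation of a byte value (0 to 255),
// most significant bit first.
std::string toBinary(unsigned char value) {
    return std::bitset<8>(value).to_string();
}
```

Printing `toBinary(n)` for n = 0, 1, 2, ... should give the same patterns as listed on the slide.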
Character (char) • To a first approximation, a character is the same as an 8-bit byte • (More recently, multi-byte characters have been designed to support all the world’s languages) • The key difference is in how the byte is interpreted and processed (e.g., printed) • E.g., 97 means ‘a’, 98=‘b’, 65=‘A’, 66=‘B’ etc. • C++ lets you compare characters using the corresponding integer • Useful for sorting strings in dictionary order
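A short sketch of the character/integer correspondence, assuming an ASCII-compatible character set (true on virtually all current systems, but not guaranteed by the language). The helper names `codeOf` and `comesBefore` are invented for this example.

```cpp
// Integer code of a character, e.g. 97 for 'a' in ASCII.
int codeOf(char c) {
    return static_cast<int>(c);
}

// True if x sorts before y when characters are compared by integer code.
bool comesBefore(char x, char y) {
    return x < y;
}
```

Note that under ASCII every uppercase letter compares less than every lowercase letter, so `comesBefore('B', 'a')` is true.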
Hexadecimal notation (hex) • Byte (8 bits) consists of two “nibbles” (4 bits) • Nibble ranges between 0 and 15 • Expressed in hexadecimal, 0 to 9, a to f • a=10, b=11, c=12, d=13, e=14, f=15 • So a byte is written as two hexadecimal digits, e.g. 0a or c5 • Note that 23 hex is not 23 decimal! • To make clear, written as 0x23 • printf demo
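The original printf demo is not part of this transcript. The following is a stand-in sketch of what such a demo might look like; the helper `toHex` is a hypothetical name, and it uses the standard `%02x` format specifier.

```cpp
#include <cstdio>
#include <string>

// Format a byte as two hexadecimal digits with a 0x prefix, e.g. 35 -> "0x23".
std::string toHex(unsigned char value) {
    char buffer[8];
    std::snprintf(buffer, sizeof buffer, "0x%02x", static_cast<unsigned>(value));
    return std::string(buffer);
}
```

In the other direction, `std::printf("%d\n", 0x23);` prints 35, which shows that 23 hex is not 23 decimal.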
Fixed size integer types • “Short integers” (short) are 16 bits wide • 65536 possible values • Standard integers (int) are 32 bits wide • 4,294,967,296 possible values • Adequate for most purposes except governments bailing out banks and airlines • A long long int is 64 bits wide • Will sometimes call it long for brevity (as in Java) • Real numbers are represented using float and double (“double precision”) … later
Two’s complement representation • Want to represent both positive and negative integers with a bit sequence (say 4 bits) • Trivial: use one bit for sign • Waste one configuration (plus and minus zero) • 0000 (0) through 0111 (7) are non-negative • 8 more values, so assign to -8 through -1
The wrap-around • [Figure: the bit patterns arranged in order] Min = 10…0, -1 = 1…1, 0 = 0…0, Max = 01…1 • Zero is one position to the right of center
Two’s complement, continued • One sudden “wrap-around” from 7 to -8 • Works exactly the same for short, int and long int, with corresponding wrap, max, min values • Most programming systems will not detect if the wrap happens • If your program uses values near the edges, be careful in doing arithmetic and check the result! • Library packages exist to support arbitrarily large integers, not as efficient as fixed length
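The 4-bit scheme and its wrap-around can be simulated with ordinary ints. This is an illustrative sketch; `decode4` and `increment4` are made-up names. It deliberately avoids overflowing a real `int`, because signed overflow is undefined behaviour in C++.

```cpp
// Interpret the low 4 bits of `pattern` as a 4-bit two's complement number.
// Patterns 0000..0111 map to 0..7; patterns 1000..1111 map to -8..-1.
int decode4(unsigned pattern) {
    int value = static_cast<int>(pattern & 0xFu);  // keep only 4 bits
    return (value >= 8) ? value - 16 : value;
}

// Add one within 4 bits, showing the wrap from 7 to -8.
int increment4(int value) {
    return decode4(static_cast<unsigned>(value + 1));
}
```

With these definitions, `increment4(7)` yields -8, the single sudden wrap described above, while `increment4(-1)` yields 0 as expected.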
Real number representations • “Floating (decimal) point” • In decimal we write 0.314 × 10^11 • 0.314 is the mantissa, 11 is the exponent • Mantissa has decimal point at beginning • Same approach in computers, with radix 2 instead of 10 • In a float • 1 sign, 8 exponent, 23 mantissa bits • In a double • 1 sign, 11 exponent, 52 mantissa bits
Floating point numbers • Finite bits cannot represent all real values • Gaps between numbers that can be represented • Need care in writing expressions that combine values to avoid errors, minimize loss of precision
Some finite precision pitfalls • Some 32- and 64-bit patterns have been set aside to represent • Positive and negative infinity • Not a number or NaN (e.g. result of 0/0) • Most systems will detect overflow but not underflow • float a = 3.3e38 / 0.01; correctly results in a being “inf” • But 3.3e38 + 5 silently equals 3.3e38 (not enough bits in mantissa)
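These pitfalls can be checked directly. The sketch below assumes IEEE 754 floats and a compiler run without aggressive floating-point flags such as `-ffast-math`. The function names are invented for illustration.

```cpp
#include <cmath>

// Dividing a float near the top of its range by 0.01 overflows to infinity.
bool overflowGivesInf() {
    float big = 3.3e38f;
    float a = big / 0.01f;
    return std::isinf(a);
}

// Adding 5 to 3.3e38 changes nothing: the mantissa has too few bits.
bool smallAddendIsAbsorbed() {
    float big = 3.3e38f;
    float sum = big + 5.0f;
    return sum == big;
}

// 0/0 yields NaN, which compares unequal even to itself.
bool zeroOverZeroIsNaN() {
    float zero = 0.0f;
    float q = zero / zero;
    return std::isnan(q) && (q != q);
}
```

On a typical IEEE 754 platform all three functions should return true.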
Operations on numeric types • All integers support +, -, *, /, % (remainder) • Even characters support + and - • E.g., ‘a’ + 1 = ‘b’; what is ‘Z’+1? (Try it) • Float and double support +, -, *, / • More complicated operations like log, exp, sine, etc. are implemented as functions • You can compare numbers using comparison operators <, <=, ==, >=, >, != • The result is a Boolean (0/1) value (next) • cout << (5 > 7); • cout << (4 != 3);
Boolean values and operations • In C++, int can be reused as Boolean (0 = false, anything else is true) • Binary operator && (and) • Binary operator || (or) • Short-circuit evaluation
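Short-circuit evaluation means the right operand of `&&` or `||` is skipped when the left operand already decides the result. The sketch below makes this visible with a hypothetical `traced` helper that counts its calls. `ratioExceedsOne` shows a common practical use, guarding a division.

```cpp
// Number of times traced() has been called since the last reset.
int callCount = 0;

// Hypothetical helper: records that it was evaluated, then returns its argument.
bool traced(bool value) {
    ++callCount;
    return value;
}

// How many operands are evaluated for `false && true`?
int evaluationsForAnd() {
    callCount = 0;
    bool result = traced(false) && traced(true);  // right side is skipped
    (void)result;
    return callCount;
}

// How many operands are evaluated for `true || false`?
int evaluationsForOr() {
    callCount = 0;
    bool result = traced(true) || traced(false);  // right side is skipped
    (void)result;
    return callCount;
}

// The division runs only when d is non-zero, so d == 0 is safe here.
bool ratioExceedsOne(int n, int d) {
    return d != 0 && n / d > 1;
}
```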
Not and ex(clusive) or • Unary operator ! (not) • Binary operator exor is not available on single Booleans but instead on bit vectors (next)
The bool type • Old C++ used int to store Boolean values • But ANSI standard C++ does offer a type called bool • bool tval = true, fval = false; • int ival = int(tval); • However, old bad habits still allowed • if (37) { … } • bool bval = 37; • Overall value unclear
Bit array manipulation • Fixed size integers are arrays of bits • C++ lets you do bitwise Boolean algebra • a & b (and), a | b (or), a^b (exor), ~b (not) • 10110110 & 10010101 = 10010100 • 10110110 ^ 10010101 = 00100011 • 10110110 | 10010101 = 10110111 • ~00100011 = 11011100
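The four worked examples can be verified in code. This is a small sketch with invented helper names. It uses `std::bitset<8>` only to display the low 8 bits of each result.

```cpp
#include <bitset>
#include <string>

// Show the low 8 bits of a value as a binary string.
std::string bits8(unsigned value) {
    return std::bitset<8>(value).to_string();
}

std::string andBits(unsigned a, unsigned b) { return bits8(a & b); }
std::string orBits(unsigned a, unsigned b)  { return bits8(a | b); }
std::string xorBits(unsigned a, unsigned b) { return bits8(a ^ b); }
std::string notBits(unsigned b)             { return bits8(~b); }
```

Here 10110110 is 0xB6, 10010101 is 0x95 and 00100011 is 0x23, so for instance `andBits(0xB6, 0x95)` returns "10010100".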
Bit shift operations • int c = 5; cout << (c << 2); • Bits lost from the left (msb) • Zero bits inserted from the right (lsb) • Result is 20 (= 5 × 2^2) • Cheap way to multiply by powers of two • Before: 00000000,00000000,00000000,00000101 • After: 00000000,00000000,00000000,00010100
Right shift • c >> 2 • Bits discarded to the right (lsb) • If msb of c was 0, then 0 bits injected from left (msb) • 5 >> 2 gives 1 • If msb of c was 1 (c was negative) then 1 bits injected from left • -5 >> 2 gives -2 (work it out) • 0xfffffffb >> 2 gives 0xfffffffe • Preserves sign of number
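Both shift directions can be tried out with the values from the slides. One caveat for this sketch: before C++20, right-shifting a negative int was implementation-defined, although mainstream compilers inject 1 bits as described. The code also assumes a 32-bit int, and the function names are illustrative only.

```cpp
// Left shift by k multiplies by 2^k, as long as no one bits fall off the left.
int shiftLeft(int c, int k) {
    return c << k;
}

// Right shift by k discards the k lowest bits; for negative values,
// typical compilers copy the sign bit in from the left.
int shiftRight(int c, int k) {
    return c >> k;
}
```

`shiftLeft(5, 2)` gives 20 and `shiftRight(5, 2)` gives 1. On the usual platforms `shiftRight(-5, 2)` gives -2, whose bit pattern is 0xfffffffe.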
Some applications of bit operations • Is an int x odd or even? • int isOdd = (x & 1); • Remainder when divided by 8 • int remain = (x & 7); • Faster than x % 8 (and equal to it for non-negative x) • How many one bits in a 32-bit int? • Repeat 32 times: • numOnes = numOnes + ((x & 0x80000000) != 0); • x = x << 1; • In binary, 0x80000000 looks like a one followed by 31 zeros
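The bit-counting loop can be written out as a complete function. This sketch uses `std::uint32_t` rather than `int` (an assumption made here so that the left shifts are well defined), and the function names are invented for the example.

```cpp
#include <cstdint>

// Count the one bits in a 32-bit value by testing the most significant bit
// and shifting left, 32 times.
int countOnes(std::uint32_t x) {
    int numOnes = 0;
    for (int i = 0; i < 32; ++i) {
        if (x & 0x80000000u) {  // mask: a one followed by 31 zeros
            ++numOnes;
        }
        x <<= 1;
    }
    return numOnes;
}

// The lowest bit is 1 exactly when the number is odd.
bool isOdd(int x) {
    return (x & 1) != 0;
}

// The lowest three bits are the remainder modulo 8 for unsigned values.
std::uint32_t remainderMod8(std::uint32_t x) {
    return x & 7u;
}
```

Note that the mask result must be turned into 0 or 1 before it is counted. Adding `x & 0x80000000` directly would add 2^31 for each set bit rather than 1.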
Primitive variable declaration and literals • float fahrenheit; • Uninitialized, may get garbage on read • float fahrenheit = 95; • const float fahrenheit = 9.52e14; • Value will never change • Scientific notation saves typing lots of zeros • int x = 3, y = x/2; • Can initialize variables based on others already initialized
Why bother to declare • Variable names • What if you type it incorrectly later? • To initialize before any use • Types • To check all assignments to the variable • To interpret a bit sequence as intended in your program (e.g. float and int are both 32 bits) • There are languages that do not enforce variable name and type declarations • Can be lazy, but generally a Bad Idea
Type conversions • Some conversions are implicit • short x = 20000; int y = x; • int x = 40000; short y = x; • Others may result in overflow • double x = 5e40; float y = 2*x; • Some are errors • float x = (float) “hello world”; • Implicit typing • float x = 7/3; • float x = 7/3.;
Polymorphic operators and literals • 7/3 vs 7/3. • / represents division for int, float, double • Which one is invoked depends on the (inferred) type of arguments • [Figure: expression trees for 7/3 (toInt on ‘7’ and ‘3’, intDiv, then intToFloat) and for 7/3. (conversions toInt and toFloat on the operands, then floatDiv)]
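The difference between `7/3` and `7/3.` shows up as soon as the results are stored. A minimal sketch, with function names chosen only for this illustration:

```cpp
// Both operands are int, so integer division runs first (giving 2);
// only the result is converted to float.
float intDivThenConvert() {
    float x = 7 / 3;
    return x;
}

// 3. is a floating-point literal, so 7 is converted and
// floating-point division runs (giving about 2.3333).
float floatDivision() {
    float x = 7 / 3.;
    return x;
}
```

So the declared type of `x` does not influence which division is chosen. Only the operand types do.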
The string data type • When we said cout << “Hello world\n”, the text “Hello world\n” was stored as an array of characters • Bytes corresponding to H, e, …, \n, and finally a “null byte” or 00000000 (in binary) to mark the end of the string • A more modern and better way is to use the string data type • string message(“Hello world”);
Common string operations • Get the number of characters in the string • message.size() (calling a method on a string object) • Get the character at a specific position • message.at(5) or message[5] • Get a substring of the given string • message.substr(1, 3) • Index out of bound? • Some operations throw exceptions • Some silently truncate • Some may return garbage
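The operations above, including the three out-of-bound behaviours, can be sketched with `std::string`. The wrapper function names are invented for this example, while `size`, `at` and `substr` are the standard member functions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

std::size_t lengthOf(const std::string& message) {
    return message.size();
}

char charAt(const std::string& message, std::size_t pos) {
    return message.at(pos);  // bounds-checked access
}

// Three characters starting at index 1.
std::string middle(const std::string& message) {
    return message.substr(1, 3);
}

// at() throws std::out_of_range for a bad index; operator[] does no checking.
bool atThrowsWhenOutOfRange(const std::string& message) {
    try {
        (void)message.at(message.size() + 10);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

For `"Hello world"`, the length is 11, the character at index 5 is the space and `substr(1, 3)` is "ell". A substring count that runs past the end is silently truncated, so `std::string("Hello").substr(3, 100)` is "lo".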
More string operations • Find the first (leftmost) or last (rightmost) occurrence of a character • message.find_first_of(‘o’) • message.find_last_of(‘e’) • Compare two strings (dictionary or lexicographic order) • msg1.compare(msg2) • Returns an integer • Negative if msg1 should appear before msg2 • Zero if msg1 and msg2 are equal • Positive if msg1 should appear after msg2
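A brief sketch of searching and comparing, again with illustrative wrapper names around the standard `find_first_of`, `find_last_of` and `compare` member functions. When the character is absent, the search functions return the special value `std::string::npos`.

```cpp
#include <cstddef>
#include <string>

// Index of the leftmost 'o', or std::string::npos if there is none.
std::size_t firstO(const std::string& message) {
    return message.find_first_of('o');
}

// Index of the rightmost 'e', or std::string::npos if there is none.
std::size_t lastE(const std::string& message) {
    return message.find_last_of('e');
}

// Negative, zero or positive, following dictionary order.
int order(const std::string& msg1, const std::string& msg2) {
    return msg1.compare(msg2);
}
```

For `"Hello world"`, `firstO` returns 4 and `lastE` returns 1. Only the sign of the `compare` result is meaningful, so it should be tested with `< 0`, `== 0` or `> 0` rather than against specific values.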