Gain deeper insight into C++ nuances, language standards, undefined behavior, compile-time errors, preprocessor complexities, and implementation strategies. Understand why C++ is designed the way it is and how to manage potential pitfalls effectively. Explore the compilation process and runtime intricacies in this advanced course.
Accelerated C/C++ Advanced OOP • CS 440/540 • Fall 2019 • Kenneth Chiu
Purpose • I’m assuming all of you have had a first pass at C++. • Basic style • Basic programming approach • Learn iteratively. • Content is based on what has tripped me up in the past. • Why is C++ the way it is? • Sometimes a language may have a mistake. • What is a mistake in a language?
Compilation Process • What happens when we compile? • What is object code? • Invoked via a front-end driver: • Run the preprocessor. • Run the compiler. • Run the assembler. • Run the linker. • [Show] • What happens when you run a compiled program? • OS loads the file into memory. • Sets up various memory regions. • May do some runtime linking/relocation. • Jump to main. • [Show]
Language Runtime • Close correspondence between machine and source. • i = 2 + j; • Not a close correspondence. • if (typeid(A) == typeid(B)) { … } • Such machine code is said to be part of the language runtime. • Source code that causes no machine code to be generated?
Types of Violations of the Standard • What is a language standard, what does it specify? • Does it tell you what you can or cannot do? • Not really. • It specifies constraints on the language. It says, if you do A, then B happens. • It’s a contract, essentially. • What is a “violation” of the standard? • It’s an imprecise word, but commonly used to mean something not in accordance with some aspect of the standard. • What happens when your program violates the standard? • Does it mean the program won’t compile? Will it crash? • Can a compiler also violate the standard? • Are there different ways that a program can “violate” the standard?
What will this print out?
    #include <iostream>
    int main() {
        char i = 0;
        i--;
        std::cout << (int) i << std::endl;
    }
• Prints out -1 when compiled on remote.cs.binghamton.edu.
• Is it guaranteed to print this out? In every occurrence?
• How can we find out what it will print out?
• The compiler vendor needs to tell you whether char is signed or unsigned.
• Known as implementation-defined behavior.
What will this print out?
    void foo(int i1, int i2) { }
    foo(printf("First.\n"), printf("Second.\n"));
• Is it guaranteed to print this out? In every occurrence?
• How can we find out what it will print out?
• You can’t. The compiler vendor can change it at will.
• Known as unspecified behavior.
What will this do?
    int *a = 0;
    *a = 1;
• Anything could happen. Anything.
• Known as undefined behavior.
Implications of “undefined behavior” for the Implementer • As the implementer, what do you do if the specification says that the result is undefined behavior? • For example, let’s say the function below is specified as: • void foo(int *); • If the pointer is null, the behavior is undefined. • If the pointer does not point to the first element of an array of two integers, then the behavior is undefined. • Implementer is allowed to pretend it never happens. • So do all languages have UB? Why allow it? • [Show undefined_behavior.]
Compile-Time vs. Run-Time • What kind of error is better? • All else being equal, there’s no advantage to making it run-time. • [Show run_time_vs_compile_time.] • However, to turn that run-time error into a compile-time error usually means using a strict, statically-typed language. • Those languages are usually thought of as being not as flexible.
Preprocessor • #include "A.hpp" • #include <A.hpp> • #define macro(a, b) • #if • #if FOO == 3 • #if defined(a) • #endif • #else, #elif • #ifdef, #ifndef • #error, #pragma • __LINE__ (integer literal), __FILE__ (string literal), __DATE__ (string literal), __TIME__ (string literal) • __cplusplus • # (stringification) and ## (token gluing) in expansions • [Show]
-E (option to show results after preprocessing)
• [Show preprocessor_output_option.]
• Line continuation is a backslash at the end of the line. (Make sure you don’t have a space after the backslash.)
• What’s wrong with this?
    #define MY_MACRO(t) \
        void foo_##t() { \
            // This is a comment. \
            call_something(); \
        }
Multi-line macros:
    #define macro(x) \
        cout << x << endl; \
        cout << "DEBUGGING" << endl;
• Good?
    if (condition) macro(my_var);
    #define macro(x) \
        do { \
            cout << x << endl; \
            /* Etc. */ \
        } while (0)
• Why did we leave off the semicolon at the end?
Macros can call other macros, but cannot be used recursively.
    #define apply(m, x) m(x)
    #define m1(x) x + x
    #define m2(x) x*x
• apply(m1, i) expands to i + i
• apply(m2, i) expands to i*i
Suppose you have code that should compile on both Linux and Win32, but you need to call a different function in each. What do you do?
• Why not just have two versions of your source code?
• Conditional compilation:
    #ifdef __linux
        some_linux_func(…);
    #else
        some_win32_func(…);
    #endif
• Is this good?
    #ifdef __linux
        some_linux_func(…);
    #elif defined (__win32_preprocessor_symbol)
        some_win32_func(…);
    #else
        #error Unknown platform
    #endif
• To show all predefined macro symbols, use -E -dM.
Predefined identifiers: These are not macros
• __func__: A static const character array initialized with the current function name.
    void some_func() {
        cout << __func__ << endl;
    }
    void some_other_func() {
        cout << __func__ << endl;
    }
• Why aren’t these macros?
Variadic macros
    #define foo(...) func(__VA_ARGS__)
• Let’s say we wanted to print the file name and line number with all of our debug statements.
• [Show debug_print.]
    #include <stdio.h>
    #define dbg(fmt, ...) \
        fprintf(stderr, "%s[%d]: " fmt "\n", __FILE__, __LINE__, ## __VA_ARGS__)
    int
    main() {
        dbg("%d, %d", 1, 2);
        dbg("%d, %d, %f", 1, 2, 3.14);
        dbg("Uh-oh!");
    }
• The ## before __VA_ARGS__ is a GNU extension: it drops the preceding comma when no variadic arguments are passed, as in dbg("Uh-oh!").
Assertions • Assertions are ways of “asserting” that certain things are true. • By putting in an assertion, you are saying that if the expression being asserted is not true, then something is very seriously wrong. • What kind of assertions might you use here? • delet(Animal *a) { …} • Animal *pop_front(List *l) { …} • To make assertions work, include this header file: • #include <assert.h>
Assert liberally.
• Preconditions
• Any condition that holds before a section of code and that the code relies on for correctness.
• Postconditions
• Any condition that holds after a section of code and that the code guarantees, if it is correct.
• For example, in a BST, you know that the left child must be less than or equal to the right child.
• Loop invariants
• Any condition within a loop that your code relies on for correctness.
• Consider code for binary search. Let’s say that it maintains an index to the beginning of the current search region (begin_index), and an index to the end of the current search region (end_index). What assertion can you put in at the end of the loop?
    while (...) {
        ...
        assert(???);
    }
Use to check return codes.
• A quick and dirty way of checking error codes. Gives you 80% of the benefit, with 5% of the effort.
    if ((rv = some_lib_or_sys_call(…)) < 0) {
        fprintf(stderr, "…");
        abort();
    }
    rv = some_lib_or_sys_call(…);
    assert(rv == 0);
• To compile out the assertions, define NDEBUG.
    g++ -DNDEBUG foo.cpp
• I strongly urge that you check the return value from every single library or system call. (Except printf().)
How is the assert() macro defined?
• How do we find it?
    #ifndef NDEBUG
        #define assert(e) \
            ((e) ? static_cast<void>(0) \
                 : fail(#e, __FILE__, __LINE__, __func__))
    #else
        #define assert(e)
    #endif
• What if we have an assertion that is low-cost, so we always want it to be included, even in production code?
    #define check(e) \
        ((e) ? static_cast<void>(0) \
             : fail(#e, __FILE__, __LINE__, __func__))
Assertions vs. Exceptions vs. Special Return Values • What are the possible behaviors you could implement for these conditions? • Animal *find(const char *type_name); • Normally returns the found Animal. • What should you do if the animal is not found? • int read(int fd, char *buf, int count); • Normally returns count of bytes read. • What should you do if it is the end of file? • double sqrt(double x); • Normally returns the square root of the number. • What should you do if the number is negative? • int send_socket(int sock_fd, const char *buf, int count); • Normally returns the count of the number of bytes sent. • What if the network is down? • void *lookup_in_hash_table(const char *key); • Normally returns the value that is found (which is a void * for this particular hash table). • What if the key is not found? • void *malloc(size_t sz); • What if there is no more memory?
Compile-Time Assertions
• assert() is strictly run-time. You won’t know till you run the program.
• How can you assert things at compile-time?
• A limited number of things can be checked in the preprocessor:
    #define FOO 2
    #if FOO < 4
    …
    #endif
    #if sizeof(long) < 8
    #error Type long is too small.
    #endif
• C++11 supports static_assert, which can check basically anything that is a compile-time constant (constexpr).
    static_assert(sizeof(long) >= 8, "Type long is too small.");
• (Message is optional in C++17.)
• C++98 can use Boost static asserts.
Comments
• How are these comments?
    // Define an int variable i in the for-loop, and
    // initialize it to 0. Execute the for-loop as long
    // as i is less than the value of a.size(). At the
    // end of each iteration, increment i by 1. In the
    // body of the for-loop, multiply a[i] by 2, and
    // assign it to a[i].
    for (int i = 0; i < a.size(); i++) {
        a[i] *= 2;
    }
    // Add 7 to x, then bitwise AND it with the bitwise
    // complement of 0x7.
    x = (x + 7) & ~0x7;
    // Call compute_y() with i and x as parameters.
    compute_y(i, x);
    // Initialize health to 1.
    double health = 1.0;
Better?
    // Double each element in array a.
    for (int i = 0; i < a.size(); i++) {
        a[i] *= 2;
    }
    // Round up x to next multiple of 8.
    x = (x + 7) & ~0x7;
    // Compute y-coordinate of ith
    // player given a fixed
    // x-coordinate.
    compute_y(i, x);
    // Create monsters starting at
    // health 1.0, since health 0 means
    // dead.
    double health = 1.0;
Don’t say the obvious.
    // Initialize x.
    x = 1;
• Do comment the non-obvious.
    // Round up x to next multiple of 8
    x = (x + 7) & ~0x7;
• If working in a team, consider leaving your initials/name in comments that might need explanation.
    // x cannot be defined as a long due to a bug
    // in the Solaris compiler. -ken
Let’s say you have 1000 lines of code that you want to comment out for some reason:
    for (int i = 0; i < j; i++) {
    // 1000 more lines.
    // ...
• How do you do it?
    /*
    /* This is a comment. */
    */
• Use #if 0 to comment out large sections. It will nest.
    // Works.
    #if 0
    #if 0
    // ...
    #endif
    #endif
C++/C Source Code Organization • Why break code up into multiple files? • Ease of finding things? • Compilation speed. • Only need to recompile part of the app. • Known as separate compilation • Libraries • Reuse? • If I put a class separately into A.cpp, it is easier to move to another application.
Okay, why split into header file and implementation file? What (bad) things would happen if we did not? • For libraries, the case is clear. • Need declarations to tell the compiler how to call code. • What about your application? Why not put everything into A.cpp, like in Java? • Suppose B.cpp needs to use class A. • The compiler needs the declarations. • Why not just include the whole source file? • Why need header file if already linking the library? • Because the link happens after the assembly code for a call is generated. • Even if the compiler knew the libraries early, it can’t find the calling information. • Why need library if already have the header file? • The header file tells the compiler how to call something, but there still has to be some code there to call. • Are you satisfied with these answers? What’s the meta-question here? (Why isn’t this an issue in Java?)
The header-file/implementation-file split is a convention. • The standard does not dictate what goes in a header file. • However, the design of C++ does strongly influence best practices. • What goes in a header file? • The minimum amount necessary for the implementation (classes and/or declarations of standalone functions) to be used by other files. • In other words, you divide the code into two chunks. • In the first chunk, you put everything that is needed to use your code. • Such as call it, or if a class, define an instance of the object, etc. • This is the header file (.hpp). • Everything else goes in the second chunk. • This is the implementation file (.cpp).
[Diagram: A.hpp, B.hpp, and C.hpp are included by A.cpp, B.cpp, C.cpp, and main.cpp; these are compiled to A.o, B.o, C.o, and main.o, which are linked with libraries into a.out (the executable).]
How many times per executable is a header file compiled? What about implementation file? • If something can go in either, should we put it in the header file or implementation file? • What do these code snippets need? • obj->a_member • void foo(MyClass *) {} • obj->foo(); • Where do these go? • Class definitions • Function definitions • Function declarations • Global variable declarations • Global variable definitions
Translation Unit
• Consider this code fragment from a file named foo.cpp:
    …
    void foo() {
        goo();
    }
    …
• Is either of these statements clear and unambiguous?
• “The call to goo() will be a syntax error if there is no declaration of it in this file.”
• “The call to goo() will be a syntax error if this file doesn’t declare goo(), and this file doesn’t (recursively) include any header files that declare goo().”
• This suggests that we should have a new term: A translation unit is the result of reading in a file, after all processing of included files and conditional compilation.
• “This will be a syntax error if there is no declaration of goo() in the translation unit.”
Handling Global Variables
• How do you use global variables when you split things into files? Does this work?
File1.cpp
    int a;
    void f() {
        // Access a.
        // …
    }
File2.cpp
    int a;
    void g() {
        // Access a.
        // …
    }
$ g++ File1.cpp File2.cpp ...
One Definition Rule (ODR) • In C and C++, each variable can be defined only once. You can declare a global variable by using extern. • Defining (no extern) actually creates a variable. • Declaring (by using extern) states the existence of a variable, and indicates that it was defined somewhere else, so tells the linker to go look for it. • So, a global variable should be defined in one translation unit and declared in all others that use it. • How to fix previous?
You need to have:
File1.cpp
    int a;
    void f() {
        // Access a.
        // …
    }
File2.cpp
    extern int a;
    void g() {
        // Access a.
        // …
    }
• Of course, you should probably be more systematic about it:
globals.hpp
    extern int a;
    extern double x;
globals.cpp
    int a;
    double x;
File1.cpp
    #include "globals.hpp"
    void f() {
        // Access a.
        // …
    }
File2.cpp
    #include "globals.hpp"
    void g() {
        // Access a.
        // …
    }
$ g++ globals.cpp File1.cpp File2.cpp ...
There is a lot of redundancy between globals.hpp and globals.cpp. Imagine if it were a very large file. Any way to avoid it?
globals.cpp
    int a;
    double x;
    // ...
    // Zillions of them
globals.hpp
    extern int a;
    extern double x;
    // ...
    // Zillions of them
Leverage the preprocessor:
globals.hpp
    #ifndef XYZZY_GLOBALS_HPP
    #define XYZZY_GLOBALS_HPP
    #include <A.hpp>
    #ifndef XYZZY_GLOBALS_EXTERN
    #define XYZZY_GLOBALS_EXTERN extern
    #endif
    XYZZY_GLOBALS_EXTERN A a;
    XYZZY_GLOBALS_EXTERN double x;
    #endif
globals.cpp
    #define XYZZY_GLOBALS_EXTERN
    #include "globals.hpp"
File2.cpp
    #include "globals.hpp"
    void g() {
        // Access a.
        // …
    }
$ g++ File1.cpp File2.cpp globals.cpp
• Isn’t this actually more complicated?
ODR, Revisited
• Let’s say you are the linker implementer. Could you make this work if you wanted to?
• [Show multiple_definitions_2]
File1.cpp
    int a;
    void f() {
        // Access a.
        // …
    }
File2.cpp
    int a;
    void g() {
        // Access a.
        // …
    }
$ g++ File1.cpp File2.cpp ...
What about this?
File1.cpp
    int a;
    void f() {
        // Access a.
        // …
    }
File2.cpp
    int a = 1;
    void g() {
        // Access a.
        // …
    }
$ g++ File1.cpp File2.cpp ...
• We could make this work, but which one?
• At some point, rules become too complicated. Sometimes simple rules are better, even if they sometimes seem to make things inconvenient.
Include Guards
• The (loose) convention in C++ is to put each class in a separate header file.
• Is this correct?
B.hpp
    class B { … };
D1.hpp
    #include "B.hpp"
    class D1 : public B { … };
D2.hpp
    #include "B.hpp"
    class D2 : public B { … };
main.cpp
    #include "D1.hpp"
    #include "D2.hpp"
    int main() {
        D1 d1;
        D2 d2;
        // …
    }
Include guards make includes “idempotent”. (This means it’s okay if a file gets included twice.)
• Maintains simple rule: If you use a class, include its header file.
B.hpp
    #ifndef XYZZY_B_HPP
    #define XYZZY_B_HPP
    class B { … };
    #endif
D1.hpp
    #ifndef XYZZY_D1_HPP
    #define XYZZY_D1_HPP
    #include "B.hpp"
    class D1 : public B { … };
    #endif
D2.hpp
    #ifndef XYZZY_D2_HPP
    #define XYZZY_D2_HPP
    #include "B.hpp"
    class D2 : public B { … };
    #endif
main.cpp
    #include "D1.hpp"
    #include "D2.hpp"
    int main() {
        D1 d1;
        D2 d2;
        // …
    }
• Why the funny prefix?
Does this work?
    // A.hpp
    #ifndef ACME_A_HPP
    #define ACME_A_HPP
    #include "B.hpp"
    struct A {
        B *b_field;
    };
    #endif
    // B.hpp
    #ifndef ACME_B_HPP
    #define ACME_B_HPP
    #include "A.hpp"
    struct B {
        A a_field;
    };
    #endif
First-level of include:
    // A.hpp
    #ifndef ACME_A_HPP
    #define ACME_A_HPP
        // B.hpp
        #ifndef ACME_B_HPP
        #define ACME_B_HPP
        #include "A.hpp"
        struct B {
            A a_field;
        };
        #endif
    struct A {
        B *b_field;
    };
    #endif
• The indented block is the include of B.hpp from top-level A.hpp.
Second-level of include
    // A.hpp
    #ifndef ACME_A_HPP
    #define ACME_A_HPP
        // B.hpp
        #ifndef ACME_B_HPP
        #define ACME_B_HPP
            // A.hpp
            #ifndef ACME_A_HPP
            #define ACME_A_HPP
            #include "B.hpp"
            struct A {
                B *b_field;
            };
            #endif
        struct B {
            A a_field;
        };
        #endif
    struct A {
        B *b_field;
    };
    #endif
• The innermost block is the include of A.hpp in the include of B.hpp in top-level A.hpp.
• The middle block is the include of B.hpp in top-level A.hpp.
Second-level of include
    // A.hpp
    #ifndef ACME_A_HPP
    #define ACME_A_HPP
        // B.hpp
        #ifndef ACME_B_HPP
        #define ACME_B_HPP
            // A.hpp
            #ifndef ACME_A_HPP
            #define ACME_A_HPP
                #ifndef ACME_B_HPP
                // Nothing here, all skipped
                #endif
            struct A {
                B *b_field;
            };
            #endif
        struct B {
            A a_field;
        };
        #endif
    struct A {
        B *b_field;
    };
    #endif
• The innermost A.hpp block is the include of A.hpp in the include of B.hpp in top-level A.hpp.
• The B.hpp block is the include of B.hpp in top-level A.hpp.
Solution is a forward declaration to break the cycle.
    // A.hpp
    #ifndef ACME_A_HPP
    #define ACME_A_HPP
    struct B;
    struct A {
        B *b_field;
    };
    #endif
    // B.hpp
    #ifndef ACME_B_HPP
    #define ACME_B_HPP
    #include "A.hpp"
    struct B {
        A a_field;
    };
    #endif
How about this?
    // A.hpp
    #ifndef A_HPP
    #define A_HPP
    struct B;
    struct A {
        B foo() { return B(); }
    };
    #endif
    // B.hpp
    #ifndef B_HPP
    #define B_HPP
    struct A;
    struct B {
        A foo() { return A(); }
    };
    #endif
• [Show recursive inline.]
Need to split apart class definition from function definition:
    // A.hpp
    #ifndef MY_COMPONENT_HPP
    #define MY_COMPONENT_HPP
    struct B;
    struct A {
        inline B foo();
    };
    struct B {
        inline A foo();
    };
    inline B A::foo() { return B(); }
    inline A B::foo() { return A(); }
    #endif
• [Show recursive inline.]
Can accomplish the same effect by careful positioning.
    // A.hpp
    #ifndef A_HPP
    #define A_HPP
    struct B;
    struct A {
        inline B foo();
    };
    #include "B.hpp"
    inline B A::foo() { return B(); }
    #endif
    // B.hpp
    #ifndef B_HPP
    #define B_HPP
    struct A;
    struct B {
        inline A foo();
    };
    #include "A.hpp"
    inline A B::foo() { return A(); }
    #endif
• [Show recursive inline.]
Header files should be independent.
    // Should not need anything here.
    #include <A.hpp>
• A header file should always be included in the implementation file. Which is better?
    // File A.cpp
    #include <iostream>
    #include <A.hpp>
    // Code is here…
    // File A.cpp
    #include <A.hpp>
    #include <iostream>
    // Code is here…