1. 1 A Quick Introduction to C Programming Lewis Girod
CENS Systems Lab
July 5, 2005
http://lecs.cs.ucla.edu/~girod/talks/c-tutorial.ppt
2. 2 or, What I wish I had known about C during my first summer internship In my first summer internship my task was to convert a numeric program from FORTRAN to C, knowing nothing about either language; I only knew Pascal and BASIC. The result was that I sort of learned C, but in a voodoo way: never really sure what was going on, but getting some things to work, at least some of the time. Since then I’ve gradually worked out how things really work and become proficient, but this was a slow process. Today I hope to help shortcut some of that process.
How this is going to proceed. I have limited time here so I will go kind of fast. I may mention some things in passing that I won’t fully explain. However, afterwards you will have the .ppt file and the notes. All the stuff I mention in passing is in the slides and the notes. This way, even if you don’t “get” all the details the first time, you can go back and refer to the slides and notes.
3. 3 High Level Question: Why is Software Hard? Answer(s):
Complexity: Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
Solution: reuse code with functions, avoid duplicate state variables
Mutability: Software is easy to change. Great for rapid fixes... and rapid breakage... you are always one character away from a bug
Solution: tidy, readable code, easy to understand by inspection.
Avoid code duplication; physically the same ⇒ logically the same
Flexibility: Programming problems can be solved in many different ways. Few hard constraints ⇒ plenty of “rope”.
Solution: discipline and idioms; don’t use all the rope
4. 4 Writing and Running Programs Gcc compiler options:
-Wall tells the compiler to enable a large set of commonly useful “warnings” (despite the name, not every warning gcc has). These warnings will often identify stupid mistakes.
-g tells the compiler to generate debugging information
If you don’t supply a -o option to set an output filename, it will create an executable called “a.out”
./?
To run a program in the current directory use ./program. ‘.’ is the current directory. Otherwise the shell will only run programs found in the directories listed in your “PATH” variable, which usually holds “standard” paths like /bin and /usr/bin
What if it doesn’t work?
If your program compiles but doesn’t work, there are many debugging tools at your disposal.
“strace” traces all system calls made by your program. This will tell you when you call into the OS, e.g. open(), read(), write(), etc.
“ltrace” traces all standard library calls made by your program.
“gdb” is a source-level debugger. With this tool you can analyse core dumps, and step through your program line by line as it runs.
“valgrind” runs your program on a synthetic CPU (x86 at the time of writing) and is useful for catching memory errors such as use of freed memory, bad pointers, etc.
Another common way to trace a program is “printf debugging”, where you put printf statements throughout your code to see what’s going on.
5. 5 C Syntax and Hello World What do the <> mean?
Include directives use <x.h> (brackets) to indicate that the compiler should look in a “standard” place, such as /usr/include/…
They use “x.h” (double quotes) to indicate that it should look first in the current directory.
Can your program have more than one .c file?
A ‘.c’ file is called a “module”. Many programs are composed of several ‘.c’ files and libraries that are ‘linked’ together during the compile process.
6. 6 A Quick Digression About the Compiler Why preprocess? Having two phases to a process with different properties often helps to make a very flexible yet robust system. The underlying compiler is very simple and its behavior is easily understood. But because of that simplicity it can be hard to use. The preprocessor lets you “customize” the way your code “looks and feels” and avoid redundancy using a few simple facilities, without increasing the complexity of the underlying compiler. Macros can be used to make your code look clean and easy to understand. They can also be used for “evil”.
What are continued lines? Continued lines let you split one long logical line across several physical lines. When a ‘\’ is immediately followed by a newline, both characters are removed and the two lines are joined into one. This is often used when defining macros with #define, because a macro definition must fit on one logical line. Be careful! The ‘\’ must be the very last character on the line. If whitespace follows it, the standard no longer treats it as a continuation (gcc still joins the lines but warns about it), so don’t rely on it.
7. 7 OK, We’re Back.. What is a Function?
8. 8 What is “Memory”? What’s 72? ASCII is the coding scheme that maps bytes to characters; 72 is the code for ‘H’. Type ‘man ascii’ at the shell to get a listing of the mapping.
Not always… Types such as int vary in size depending on the architecture and compiler. On most 32 bit and 64 bit platforms, ints are 4 bytes (32 bits). On many 8 or 16 bit platforms, an int is 2 bytes (16 bits). To be safe, it’s best to specify sizes exactly, e.g. int32_t, int16_t, int8_t. These are defined in <stdint.h> (which <inttypes.h> also includes).
Signed? Signedness is a common source of problems in C. It’s always clearest to specify signedness (and size) explicitly using the [u]int[8,16,32]_t types. Signedness can introduce surprises when casting types because of “sign extension”. For example, plain char is signed on many platforms, x86 among them (although it’s normally used as unsigned 0-255). If you cast such a char to int, the result is a signed integer and the sign is extended. If you then cast that to unsigned, a small negative number turns into an enormous unsigned value (high bits set). That is, where char is signed, (uint32_t)(int)(char)(128) == 4294967168 (not 128!)
One easy solution to this is to always use uint8_t rather than char for byte streams.
9. 9 What is a Variable? Symbol table: The “Symbol Table” is the mapping from symbol to address. You can extract this from a compiled program using the nm utility, as long as the program has not been stripped; compiling with -g adds further debugging detail.
Declaration and definition mean slightly different things in C. Declaration is mapping a name to a type, but not to a memory location. Definition maps a name and type to a memory location (which could be the body of a function). Definitions can assign an initial value. Declarations can point to something that never actually gets defined, or is defined in a library external to your program.
What names are legal? Symbols in C (function and variable names) must start with an alphabetic character or ‘_’. They may contain numeric digits as well but cannot start with a digit. Common naming strategies are ‘CamelCase’, ‘lowerUpper’, and ‘under_bar’. Some people use complicated schemes but I prefer under_bar.
extern, static, and const are modifiers of variable declarations. ‘extern’ means the variable being declared is defined elsewhere, typically in some other program module (.c file). ‘static’ at file level means that the variable cannot be accessed from outside the current .c file (inside a function it means something different: the variable keeps its storage between calls). ‘const’ means the variable’s value cannot be changed (directly).
10. 10 Multi-byte Variables
11. 11 Lexical Scoping Returns nothing.. “void” is used to denote that nothing is returned, or to indicate untyped data.
What if you define a new “char b”? What happens to the old one? If you define a variable in your local scope that has the same name as a variable in an enclosing scope, the local version takes precedence; the outer variable still exists but is hidden until the inner scope ends.
Are definitions allowed in the middle of a block? Older versions of C and some derivatives like NesC only allow definitions at the beginning of a scope; d would be illegal. Modern C (C99, which gcc 3.0+ supports) allows this, which can make the code easier to read as the definitions are closer to use. However, note that definitions are not allowed in certain places, such as directly after a jump target (in C99 a label must be followed by a statement, and a definition is not one), e.g.
{
goto target;
/* … */
target:
int z; /* error */
/* … */
}
12. 12 Expressions and Evaluation
13. 13 Comparison and Mathematical Operators
14. 14 Assignment Operators Despite being syntactically correct, if (x=y) is generally to be avoided because it is confusing.
Doing complicated things with ++ and -- is also not generally recommended. It is almost never important from an efficiency perspective and it just makes the code harder to understand.
15. 15 A More Complex Program: pow Need braces? Braces are only needed if your block contains more than one statement. However, braces, even when not necessary, can increase the clarity of your code, especially when nesting ‘if’ statements.
X ? Y : Z.. There is a shortcut to if-then-else that can be used inside an expression: the ‘?:’ syntax. (cond ? alt1 : alt2) is an expression that evaluates to alt1 if cond is true, and to alt2 otherwise. There is no truly equivalent form to this construct.
Short circuit evaluation. Sometimes when evaluating an if conditional, the result is known before the evaluation completes. For example, consider if (X || Y || Z). If X is true, then Y and Z don’t matter. C will not evaluate Y and Z if X is true. This only matters in cases where there are side effects of evaluating the rest of the expression, for example if it contains function calls, or invalid operations. The converse evaluation rules apply to &&; if (X && Y && Z) will only evaluate Y and Z if X is true. Note, this is why it’s OK to write:
char *x;
if ((x != NULL) && (*x == 'Z')) … because if x is NULL (and therefore an invalid pointer) the second clause dereferencing x will not be run.
Note that this means that it’s not always possible to replace the condition of an if() with a function call, because all args to a function are evaluated before calling the function.
Detecting brace errors. Many text editors such as emacs include features that help you detect brace errors. Emacs will remind you of the opening brace whenever you close a brace, and will auto-indent your code according to braces; wrong indenting gives you a hint that something is wrong. In emacs, <tab> will autoindent the line you are on. Keeping your code properly indented is a good way to detect a lot of different kinds of error.
16. 16 The “Stack” Variables declared ‘static’ inside of a function are allocated only once, rather than being allocated in the stack frame when the function is called. Static variables therefore retain their value from one call to the next, but may cause problems for recursive calls, or if two threads call the function concurrently.
Java users:
Just to be aware: recursion can behave differently in Java because object arguments refer to the caller’s object (i.e. they are not copied). In Java, this recursive function would only work as written if the value being passed was a primitive type like int.
17. 17 Iterative pow(): the “while” loop What about other languages? Some languages such as scheme implement optimized “tail recursion”, which reuses the current stack frame when the recursive call constitutes the last use of the current stack frame. In these cases, recursion costs no more than iteration.
18. 18 The “for” loop
19. 19 Referencing Data from Other Scopes
20. 20 Can a function modify its arguments?
21. 21 NO! Java
In Java, primitive types like int are passed by value, but objects are effectively passed by reference: what is passed is a reference to the object (basically a pointer, but without any explicit awareness of the address), so the callee can modify the caller’s object.
In C++, things operate like in C except that there is a syntax to pass arguments by reference, declaring the argument to be a “reference” type. For example, int f(int &x); C does not support this; you have to do it “manually” with pointers. But it’s not really fundamentally different anyway.
22. 22 Passing Addresses
23. 23 “Pointers”
24. 24 Pointer Validity How should pointers be initialized? Always initialize pointers to 0 (or NULL). NULL is never a valid pointer value, but it is known to be invalid and means “no pointer set”.
25. 25 Answer: Invalid!
26. 26 More on Types
27. 27 Structures Packing? The layout of a structure in memory is actually architecture-dependent. The order of the fields will always be preserved; however, compilers on many architectures insert padding so that each field is aligned. Alignment is important for some processor architectures because they do not support unaligned memory access, e.g. accessing a 4 byte int that is not aligned to a 4 byte boundary. It is also usually more efficient to process values on word boundaries.
So in general, struct layouts are architecture-specific. For instance, on x86 platforms compilers align each field to its natural boundary, so
struct {
uint16_t x;
uint32_t y;
};
will have a hidden two bytes of padding between x and y. However, on an Atmel ATmega128 this padding will not be present.
One way to address this is to predict it, and explicitly add the padding fields. Another solution is to add __attribute__((packed)) after the structure declaration (a gcc extension) to remove the padding and force the compiler to generate code to handle unaligned accesses. This code will run slower but this is often the easiest solution to dealing with cross-platform compatibility.
The other common problem with cross-platform compatibility is endianness. x86 machines are little-endian; many modern processors such as the PXA (xscale) can select their endianness. The functions ntohl, htonl, etc. can be used to convert to a standard network byte order (which is big endian).
===========
Why a zero length array?
A zero length array (a gcc extension; C99 spells the same idea data[], a “flexible array member”) is a typed “pointer target” that begins at the byte after the end of the struct. The zero length array takes up no space and does not change the result of sizeof(). This is very useful for referring to data following the struct in memory, for example if you have a packet header and a packet payload following the header:
struct hdr {
int src;
int dst;
uint8_t data[0];
};
If you have a buffer of data containing the header and the payload:
uint8_t buf[200];
size_t buf_len; /* number of valid bytes in buf */
/* cast the buffer to the header type (if it’s long enough!) */
if (buf_len >= sizeof(struct hdr)) {
    struct hdr *h = (struct hdr *)buf;
    /* now, h->data[0] is the first byte of the payload */
}
28. 28 Arrays char x[] and char *x are subtly different. First, what’s the same: in most expressions both evaluate to an address of type char *. In both cases, x[n] == *(x+n) and x == &(x[0]).
The difference lies in the fact that char x[10] allocates memory for the data itself, and both &x and x always evaluate to the base address of the array, whereas char *x allocates only space for a pointer, and x simply evaluates to the value of that pointer.
Thus, in the case of char x[10], &x and x are the same address by definition (although their types differ: &x is a pointer to the whole array). However this is almost never true for char *x because x contains the address of the memory where the characters are stored, &(x[0]), and &x is the address of the pointer itself!
29. 29 How to Parse and Define C Types
30. 30 Function Types
31. 31 Dynamic Memory Allocation %m: When %m is included in a printf format string (a glibc extension), it prints a string naming the most recent error. A global variable ‘errno’ contains the most recent error value to have occurred, and the function strerror() converts that to a string. %m is shorthand for printf(“%s”, strerror(errno))
Emstar tips: Emstar includes glib which includes various useful functions and macros. One of these is g_new0().
g_new0(typename, count) is roughly (typename *)calloc(count, sizeof(typename));
Emstar also provides elog() and elog_raw(), logging functions that integrate with emstar’s logging facilities. elog() works like printf, but takes one additional loglevel argument. You do not need to include a ‘\n’ in elog() messages as it is implied. elog_raw() takes a buffer and a length and dumps the bytes out in a prettyprinted fashion.
Emstar includes the useful “buf_t” library that manages automatically growing buffers. This is often the easiest way to allocate memory.
32. 32 Caveats with Dynamic Memory Reference counting is one approach to tracking allocated memory. However, there is no perfect scheme. The main problem with reference counting is that correctly handling cyclically linked structures is very difficult, and likely to result in mis-counting in one direction or the other. Garbage collection is another solution; this is used by Java and other languages. However, this often causes performance problems and requires a very strict control over typing in the language, which C can’t easily provide by its nature of having direct access to the low levels.
Emstar uses a static reference model, where there is exactly one canonical reference to each object, and that reference will be automatically cleared when the object is destroyed. The only requirements are: the reference must be in stable allocation (static or dynamic, but you can’t move the memory) and you must always access the object via that single reference. We have found this scheme to be quite workable, as long as you are cognizant of these requirements.
33. 33 Some Common Errors and Hints What to do when malloc fails?
Except in certain cases, it’s going to be very difficult to recover from this state, because lots of functions will start failing and the correct way to handle these errors is not obvious. If you continue without the memory you asked for, your program is likely to end up in a seriously broken state.
The main case where recovery IS possible is when the malloc() that failed is part of a single too-large transaction that can fail independently of the rest of the system. For example, if the user asks your program to allocate a zillion bytes and you can’t, just abort that request and move on to the next one. But if your system is just generally out of memory, there’s not much you can do.
In Emstar, processes automatically respawn and the system is generally designed to handle module failures, so a slow memory leak that causes you to exit and restart may allow a fairly graceful recovery.
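A minimal sketch of the recoverable case, assuming a hypothetical request handler named alloc_records(): the one oversized request fails and the caller carries on with the next one.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical request handler: returns a zeroed buffer for `count` records,
 * or NULL if this one request cannot be satisfied. */
void *alloc_records(size_t count, size_t record_size)
{
    /* Reject requests whose byte count would overflow size_t. */
    if (record_size != 0 && count > SIZE_MAX / record_size)
        return NULL;

    void *buf = malloc(count * record_size);
    if (buf == NULL)
        return NULL;    /* abort only this request; the caller moves on */

    memset(buf, 0, count * record_size);
    return buf;
}
```

The caller checks for NULL, reports the failed request, and continues; nothing else in the program depends on that buffer.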
Memmove
memmove() is written so that it copies in the correct order when the source and destination buffers overlap, which is what you need to shift data within a buffer. memcpy() makes no such guarantee: its behavior is undefined if the buffers overlap.
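A small sketch of the buffer-shift case (shift_right is a made-up helper name; the caller is assumed to provide a buffer of at least len + shift bytes):

```c
#include <string.h>

/* Shift the first `len` bytes of buf right by `shift` positions.
 * Source and destination overlap, so memmove() is required here;
 * memcpy() has undefined behavior for overlapping regions. */
void shift_right(char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}
```

The bytes in the first `shift` positions are left as they were, so a caller would normally overwrite them with the new data afterwards.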
Use pointers as implied in-use flags!
One clever way to reduce the number of state variables is to avoid “in use” flags when there is also a pointer that can be NULL. That is, if the item is not allocated (and the pointer is therefore NULL), it’s not in use. If the pointer is non-NULL, you assume that the item is valid, initialized, etc. By avoiding this redundancy of state variables, you avoid the cases where the two variables are out of sync: a pointer is there but the item is not marked in use, or the item is marked in use but there is no pointer.
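A sketch of the idea, using a hypothetical table of connections (conn_t, conns, conn_open and conn_close are illustrative names only). There is no in_use field anywhere; a NULL slot is the flag.

```c
#include <stdlib.h>

#define MAX_CONNS 4   /* hypothetical table size */

typedef struct conn {
    int fd;
} conn_t;

/* A NULL slot means "not in use"; there is no separate flag to keep in sync. */
static conn_t *conns[MAX_CONNS];

/* Allocate a connection in the first free slot; return its index or -1. */
int conn_open(int fd)
{
    for (int i = 0; i < MAX_CONNS; i++) {
        if (conns[i] == NULL) {
            conns[i] = malloc(sizeof(*conns[i]));
            if (conns[i] == NULL)
                return -1;
            conns[i]->fd = fd;
            return i;
        }
    }
    return -1;  /* table full */
}

/* Free a slot; setting it back to NULL is what marks it unused. */
void conn_close(int i)
{
    if (i < 0 || i >= MAX_CONNS)
        return;
    free(conns[i]);
    conns[i] = NULL;
}
```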
34. 34 Macros The difference between a macro and a static inline. These two constructs have some properties in common: they must be included in any module that uses them, they can’t be linked to, and they are generally inlined into the program. The key difference between them is that the macro is processed by the preprocessor, whereas the static inline conforms to all of the semantics of any other function. A static inline must be included in each file because it’s declared static, and it is more efficient than an ordinary function call because the compiler will usually inline it (inline is a hint, not a guarantee).
Static inlines cause their arguments to be evaluated before being applied, whereas macros are a pure text substitution with no evaluation. Arguments to static inlines are type-checked, whereas arguments to macros can be of any type and are substituted as raw text. A static inline can execute statements AND return a value, whereas a macro can EITHER return a value (if it’s an expression) OR execute statements.
A good example of the differences is the implementation of sqr():
#define SQR(a) ((a)*(a))
static inline float sqr(float a) { return a*a; }
SQR will evaluate its argument twice, i.e. SQR(x++) expands to ((x++)*(x++)), which increments x twice. Strictly speaking this is undefined behavior in C; in practice it might compute x*(x+1).
sqr will evaluate its argument once, i.e. sqr(x++) will compute x*x
SQR can compute the square of any numeric quantity, whereas sqr can only handle floats. However, sqr will produce a more helpful error message if you attempt to square a struct.
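To see the double evaluation without relying on x++, here is a sketch that passes a function call with a visible side effect instead (next_value, calls and the two count_ functions are made up for this illustration):

```c
#define SQR(a) ((a)*(a))
static inline float sqr(float a) { return a * a; }

static int calls;   /* counts how often next_value() runs */

/* Hypothetical helper with a visible side effect. */
static float next_value(void)
{
    calls++;
    return 3.0f;
}

/* How many times does the macro evaluate its argument? */
int count_macro_evals(void)
{
    calls = 0;
    (void)SQR(next_value());   /* expands to ((next_value())*(next_value())) */
    return calls;
}

/* How many times does the static inline evaluate its argument? */
int count_inline_evals(void)
{
    calls = 0;
    (void)sqr(next_value());
    return calls;
}
```

count_macro_evals() returns 2 and count_inline_evals() returns 1.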
Since macros can do text manipulation, one nice way to use them is to use macros to generate static inlines.
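One way this might look, as a sketch (DEFINE_SQR and the generated sqr_int, sqr_float, sqr_double names are hypothetical): the macro pastes a type suffix onto the function name, so each type gets its own type-checked static inline.

```c
/* Generate a type-specific square function named sqr_<suffix>. */
#define DEFINE_SQR(type, suffix) \
    static inline type sqr_##suffix(type a) { return a * a; }

DEFINE_SQR(int, int)        /* defines sqr_int()    */
DEFINE_SQR(float, float)    /* defines sqr_float()  */
DEFINE_SQR(double, double)  /* defines sqr_double() */
```

This keeps the single-evaluation and type-checking properties of the static inline while avoiding three hand-written copies.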
More on C constants. Check K&R for the details on the suffixes that allow you to correctly type constants. For example, 45L is a long int. Watch out for leading 0’s: a constant that starts with 0 is interpreted as octal, so 010 is eight. These issues most often crop up when you are trying to specify a constant that is larger than the default int type.
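A short sketch of both pitfalls (the constant names are invented for illustration; the sizes of int and long depend on the platform):

```c
/* A leading 0 makes the constant octal: 010 is eight, not ten. */
enum { OCTAL_TEN = 010, DECIMAL_TEN = 10 };

/* Microseconds in a day: 86,400,000,000 does not fit in a 32-bit int.
 * The LL suffix on the first operand makes the whole product long long. */
static const long long USEC_PER_DAY = 24LL * 60 * 60 * 1000000;

/* Without a suffix the constant has type int; with L it has type long. */
static const unsigned long SIZE_OF_45  = sizeof(45);
static const unsigned long SIZE_OF_45L = sizeof(45L);
```

Without the LL suffix, 24 * 60 * 60 * 1000000 would be computed as int and overflow on platforms where int is 32 bits.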
Enums:
For defining sets of integer values, enums are preferable. Enums enforce unique names and can automatically number your values sequentially, which keeps the values distinct as long as you don’t assign them by hand. But macros are preferred for constants that define tuning parameters (which don’t need uniqueness or sequential values) and for other types like floats and strings.
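A sketch of the split, with invented names throughout: the set of states is an enum, while the tuning parameters and the string constant stay as macros.

```c
/* Values are numbered sequentially from 0 unless specified. */
typedef enum {
    STATE_IDLE,        /* 0 */
    STATE_CONNECTING,  /* 1 */
    STATE_CONNECTED,   /* 2 */
    STATE_COUNT        /* 3: number of states, handy for array sizes */
} conn_state_t;

/* Tuning parameters and non-integer constants stay as macros. */
#define RETRY_INTERVAL_SEC  2.5
#define MAX_RETRIES         5
#define LOG_PREFIX          "conn: "

static const char *const state_names[STATE_COUNT] = {
    [STATE_IDLE]       = "idle",
    [STATE_CONNECTING] = "connecting",
    [STATE_CONNECTED]  = "connected",
};

/* Map a state to its name; out-of-range values map to "unknown". */
const char *state_name(conn_state_t s)
{
    return ((unsigned)s < STATE_COUNT) ? state_names[s] : "unknown";
}
```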
Why expressions in parens:
If you leave off the parens then your expression may introduce precedence surprises when combined in other expressions. For example:
#define VALUE 2+45
int c = VALUE*3;
What was meant was (2+45)*3 = 141, but what we got was 2+(45*3) = 137.
It’s also a good idea to put parens around args of a macro, to avoid similar problems if one of the args is an expression:
#define SQR(x) ((x)*(x))
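Putting the two cases side by side, as a sketch (the _BAD and _GOOD names are only for this comparison):

```c
#define VALUE_BAD   2+45        /* unparenthesized body */
#define VALUE_GOOD  (2+45)

#define SQR_BAD(x)  (x*x)       /* unparenthesized argument */
#define SQR_GOOD(x) ((x)*(x))

int bad_value(void)  { return VALUE_BAD * 3; }     /* 2+45*3      = 137 */
int good_value(void) { return VALUE_GOOD * 3; }    /* (2+45)*3    = 141 */
int bad_sqr(void)    { return SQR_BAD(1 + 2); }    /* 1+2*1+2     = 5   */
int good_sqr(void)   { return SQR_GOOD(1 + 2); }   /* (1+2)*(1+2) = 9   */
```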
Why use do{}while(0)?
Multi-statement macros should be enclosed in do{}while(0) to avoid surprises when the macro is called as one statement in an if (). Without the wrapper, a call like
if (fail) DBG("help\n");
would cause only the first statement to be conditional.
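A sketch of the difference, assuming DBG is a two-statement macro that prints a message and bumps a counter (the definition of DBG used in the slides is not shown in these notes, so DBG_UNSAFE, dbg_count and the bodies below are illustrative):

```c
#include <stdio.h>

static int dbg_count;   /* hypothetical counter, so the effect is observable */

/* Unsafe: expands to two statements; an `if` governs only the first. */
#define DBG_UNSAFE(msg)  fputs(msg, stderr); dbg_count++

/* Safe: the do { } while (0) wrapper makes the body a single statement. */
#define DBG(msg)  do { fputs(msg, stderr); dbg_count++; } while (0)

int count_after_unsafe(int fail)
{
    dbg_count = 0;
    if (fail) DBG_UNSAFE("help\n");   /* dbg_count++ runs even when fail == 0 */
    return dbg_count;
}

int count_after_safe(int fail)
{
    dbg_count = 0;
    if (fail)
        DBG("help\n");
    else
        dbg_count = -1;               /* if/else still parses with the wrapper */
    return dbg_count;
}
```

With fail == 0, count_after_unsafe() still returns 1 because the increment escaped the if, while count_after_safe() takes the else branch. The wrapper also lets the call end in a semicolon like any other statement.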
35. 35 Macros and Readability
36. 36 Using “goto”
37. 37 Unrolling a Failed Initialization using goto
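The slide’s own code is not part of these notes. As a sketch of the idiom the title names (ctx_t, ctx_create and ctx_destroy are hypothetical), each failure jumps to a label that releases only what was already acquired, in reverse order:

```c
#include <stdlib.h>

typedef struct ctx {
    char *rx_buf;
    char *tx_buf;
} ctx_t;

/* Allocate a context and its two buffers. On any failure, release what
 * was already acquired, in reverse order, and return NULL. */
ctx_t *ctx_create(size_t rx_size, size_t tx_size)
{
    ctx_t *c = malloc(sizeof(*c));
    if (c == NULL)
        goto fail;

    c->rx_buf = malloc(rx_size);
    if (c->rx_buf == NULL)
        goto fail_ctx;

    c->tx_buf = malloc(tx_size);
    if (c->tx_buf == NULL)
        goto fail_rx;

    return c;

fail_rx:
    free(c->rx_buf);
fail_ctx:
    free(c);
fail:
    return NULL;
}

/* Normal teardown mirrors the unwind path. */
void ctx_destroy(ctx_t *c)
{
    if (c == NULL)
        return;
    free(c->tx_buf);
    free(c->rx_buf);
    free(c);
}
```

There is a single cleanup path, so adding a third resource means adding one allocation and one label rather than editing every error branch.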
38. 38 High Level Question: Why is Software Hard? Answer(s):
Complexity: Every conditional (“if”) doubles number of paths through your code, every bit of state doubles possible states
Solution: reuse code paths, avoid duplicate state variables
Mutability: Software is easy to change. Great for rapid fixes, and for rapid breakage: always one character away from a bug
Solution: tidy, readable code, easy to understand by inspection.
Avoid code duplication; physically the same → logically the same
Flexibility: Programming problems can be solved in many different ways. Few hard constraints → plenty of “rope”.
Solution: discipline and idioms; don’t use all the rope
39. 39 Addressing Complexity
40. 40 Addressing Complexity Why return -1?
Return values in C are usually 0 or positive for success and negative to signal failure. The errno codes (man errno) list some meanings that can be used for error return codes. When a pointer is returned, NULL signifies a failure. Errno can be set to indicate a reason for the failure (by just assigning to errno).Why return -1?
Return values in C are usually 0 or positive for success and negative to signal failure. The errno codes (man errno) list some meanings that can be used for error return codes. When a pointer is returned, NULL signifies a failure. Errno can be set to indicate a reason for the failure (by just assigning to errno).
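A sketch of both conventions (find_index and find_entry are made-up functions; EINVAL and ENOENT are POSIX errno codes, chosen here only as plausible meanings):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lookup: returns the index of `key` in `table`, or -1 with
 * errno set to explain the failure. */
int find_index(const int *table, size_t len, int key)
{
    if (table == NULL) {
        errno = EINVAL;        /* bad argument */
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        if (table[i] == key)
            return (int)i;     /* 0 or positive means success */
    }
    errno = ENOENT;            /* no such entry */
    return -1;
}

/* Pointer-returning variant: NULL signals failure. */
const int *find_entry(const int *table, size_t len, int key)
{
    int i = find_index(table, len, key);
    return (i < 0) ? NULL : &table[i];
}
```

A caller tests the return value first and looks at errno only after a failure, since successful calls do not reset it.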
41. 41 Addressing Mutability
42. 42 Solutions to the pow() challenge question The recursive solution uses more stack space, but only a limited amount (max. 32 recursions), and it is easier to read and understand. Readability is valuable and not worth sacrificing for optimality unless there is a good reason to optimize. So I vote for recursion in this case.
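The slide’s solutions are not reproduced in these notes. One plausible pair that matches the “max. 32 recursions” remark is exponentiation by squaring with a 32-bit unsigned exponent; the sketch below (pow_rec and pow_iter are assumed names) shows a recursive and an iterative version for comparison:

```c
/* Recursive exponentiation by squaring: computes base^exp.
 * The exponent is halved on each call, so a 32-bit exponent needs only
 * about 32 nested calls. */
double pow_rec(double base, unsigned int exp)
{
    if (exp == 0)
        return 1.0;
    double half = pow_rec(base, exp / 2);
    return (exp % 2) ? half * half * base : half * half;
}

/* Iterative equivalent: no extra stack, but harder to read. */
double pow_iter(double base, unsigned int exp)
{
    double result = 1.0;
    while (exp > 0) {
        if (exp & 1)
            result *= base;   /* this bit of the exponent is set */
        base *= base;         /* square for the next bit */
        exp >>= 1;
    }
    return result;
}
```

The recursive version reads almost like the mathematical definition, which is the readability argument made above.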