SECURE PROGRAMMING Chapter 2 Strings
Overview • Arrays and their Problems • Character Strings • Common String Manipulation Errors • String Vulnerabilities and Exploits • Mitigation Strategies • String Handling Functions, the bad and the good • Runtime Protection Strategies • Some Notable Vulnerabilities • Summary
Arrays and their Problems 1) Hard to determine size. 2) Size defaults may not work. 3) Easy to index an array out of bounds. 4) Easy to write non-portable code (inconsistent handling, for example). 5) Size parameters may be wrong, which leads to problem 3. 6) Array copying may overflow the array. 7) Pointer arithmetic may be incorrect.
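Problem 1 is easy to see in code. The sketch below (the macro and function names are hypothetical) shows that an array parameter is really a pointer, so `sizeof` inside the callee measures the pointer, not the array; many compilers warn about exactly this. Passing the element count explicitly is the usual remedy.

```c
#include <stddef.h>

/* Only valid where the array itself (not a pointer to it) is in scope. */
#define ELEMENT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* The parameter looks like an array, but it is adjusted to int *,
   so sizeof(arr) is sizeof(int *) no matter what the caller passed. */
size_t param_size(int arr[10]) {
    return sizeof(arr);
}

/* Reliable pattern: the caller supplies the element count. */
void zero_fill(int *arr, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        arr[i] = 0;
    }
}
```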
Character Strings The problem: many strings come from outside the program: • Command-line arguments • Environment variables • Console or other input • Text files • Network connections. Strings are not a built-in type in C or C++, though there is (some) library support.
Character Strings: String Data Type Most people implement a string as a null-terminated array of characters, addressed by a pointer. Strings have all the problems of arrays, magnified because most string manipulation is done through library functions. Five important terms for arrays: • Bound = number of elements in the array • Lo = address of the first element of the array • Hi = address of the last element of the array • TooFar = address of the one-too-far element of the array = Hi + 1 = Lo + Bound • Target size (Tsize) = sizeof(array); for a char array this equals Bound
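A minimal sketch relating the five terms through pointer arithmetic, assuming a `char` array so that Tsize equals Bound. The `array_terms` struct and `describe()` function are hypothetical, for illustration only.

```c
#include <stddef.h>

struct array_terms {
    size_t bound;   /* number of elements */
    size_t tsize;   /* sizeof(array) in bytes; equals bound for char */
    char *lo;       /* address of the first element */
    char *hi;       /* address of the last element */
    char *too_far;  /* one past the last element: may be computed and
                       compared, but never dereferenced */
};

/* Requires bound >= 1 so that Hi exists. */
struct array_terms describe(char *array, size_t bound) {
    struct array_terms t;
    t.bound = bound;
    t.tsize = bound * sizeof array[0];
    t.lo = array;
    t.hi = array + bound - 1;
    t.too_far = array + bound;   /* == hi + 1 == lo + bound */
    return t;
}
```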
Character Strings: String Data Type Two more terms for strings: • Null-terminated: there is a null character within the array (within Bound). • Length: for a null-terminated string, the number of characters before the (first) null terminator. Problem with determining array size (the clear() procedure): inside a function, sizeof applied to an array parameter yields the size of a pointer, not of the array.
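Because strlen() keeps reading until it finds a null, it cannot tell whether an array is null-terminated. A sketch, assuming the caller knows the array's Bound; `is_null_terminated()` and `safe_length()` are hypothetical helpers built on the standard memchr(), which never looks past the given count.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Null-terminated means a '\0' occurs somewhere within the bound. */
bool is_null_terminated(const char *array, size_t bound) {
    return memchr(array, '\0', bound) != NULL;
}

/* Length is only defined for a null-terminated array; this sketch
   returns (size_t)-1 as a hypothetical "no terminator" sentinel. */
size_t safe_length(const char *array, size_t bound) {
    const char *nul = memchr(array, '\0', bound);
    return nul != NULL ? (size_t)(nul - array) : (size_t)-1;
}
```

Note the difference between size and length: for `char s[10] = "abc";`, sizeof s is 10 while the length is 3.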
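Character Strings: String Data Type More problems: which characters? The “execution character set” depends on the locale, selected with the setlocale() function. The basic execution character set contains the 26 uppercase and 26 lowercase Latin letters, the 10 decimal digits, 29 graphic characters, the space character, control characters for horizontal tab, vertical tab, form feed, alert (bell), backspace, carriage return, and newline, and the null character. The execution character set may contain many more characters, which can require multiple bytes to represent a character (a multibyte character set); the basic character set is still present. Multibyte encodings may also have locale-specific shift states.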
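The locale determines how many bytes one character may need. A small sketch (the helper name is hypothetical) that selects a locale's LC_CTYPE category with setlocale() and reports MB_CUR_MAX, the maximum number of bytes in a multibyte character for that locale.

```c
#include <locale.h>
#include <stdlib.h>

/* Switches LC_CTYPE (the category governing multibyte characters) to
   the locale `name` and returns MB_CUR_MAX, or 0 if the locale is not
   available. Side effect: the program's LC_CTYPE stays changed. */
size_t max_bytes_per_char(const char *name) {
    if (setlocale(LC_CTYPE, name) == NULL) {
        return 0;
    }
    return MB_CUR_MAX;
}
```

On common implementations the "C" locale gives 1; passing "" selects the user's environment locale, where a UTF-8 setting typically yields a value greater than 1.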
Character Strings: UTF-8 UTF-8 can represent every character in the Unicode character set, using 1 to 4 bytes. Code points 0-127 take 1 byte. Otherwise, the first byte starts with as many 1 bits as there are bytes in the sequence, followed by a 0 bit, and every following byte starts with the bits 10. Thus: a leading 0 bit means a 1-byte character; leading bits 11 mark the start of a multibyte sequence; leading bits 10 mark a continuation byte. (Watch out for vulnerabilities!)
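The lead-byte rules translate directly into bit masks. A sketch of a hypothetical `utf8_sequence_length()` that classifies a single byte; it is not a full validator. Overlong encodings (for example 0xC0 0xAF, which decodes to '/') are a classic source of the vulnerabilities mentioned above, so the sketch rejects 0xC0 and 0xC1, which can only start overlong forms, and 0xF5-0xFF, which would encode values beyond U+10FFFF.

```c
/* Returns the total sequence length (1-4) for a valid lead byte, or 0 for
   a continuation byte (10xxxxxx) or a byte that can never start a valid
   sequence. A real decoder must also validate the continuation bytes and
   reject overlong 3- and 4-byte forms and surrogates. */
int utf8_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;                             /* 0xxxxxxx */
    if ((b & 0xC0) == 0x80) return 0;                   /* 10xxxxxx */
    if ((b & 0xE0) == 0xC0) return b >= 0xC2 ? 2 : 0;   /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;                   /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return b <= 0xF4 ? 4 : 0;   /* 11110xxx */
    return 0;                                           /* 11111xxx */
}
```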
Wide Strings Wide characters are 16 or 32 bits wide. A wide string is terminated by a null wide character. As is the case with regular strings (with caveats!): • A pointer to a wide string points to its left-most (initial) wide character. • The length is the number of wide characters preceding the null wide character. • The value is the sequence of code values of the contained wide characters, in order.
String Literals Enclosed in double quotes ("). Wide string literals are prefixed by L. Adjacent string literal tokens are concatenated; if any of them is prefixed by L, the result is a wide string literal. Example in text, page 34. A null character is appended, and the literal is used to initialize an array with static storage duration. In C, a string literal has type array of char (not const char), but attempting to modify it is undefined behavior. Watch for declarations of the form: const char s[3] = "abc"; // Not a null-terminated string. Use: const char s[] = "abc";
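A sketch of the declarations discussed above (the variable names are hypothetical), showing that adjacent literals merge into one array with a single null, that an L-prefixed token makes the whole result wide, and that an explicit bound of 3 leaves no room for the terminator. The last form is valid C but ill-formed C++.

```c
#include <stddef.h>

/* Concatenated at compile time: 12 characters + one '\0' = 13 bytes. */
static const char greeting[] = "Hello, " "world";

/* One L-prefixed token makes the result wide: 4 characters + L'\0'. */
static const wchar_t wide[] = L"ab" "cd";

/* Size taken from the initializer: 3 characters + '\0' = 4 bytes. */
static const char terminated[] = "abc";

/* Exactly 3 bytes, no terminator: must not be passed to strlen,
   strcpy, printf("%s"), or anything else expecting a C string. */
static const char unterminated[3] = "abc";
```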
Strings in C++ • Proliferation of string classes. • Standardized in the standard library (STL) as the class template basic_string: • string = typedef for basic_string<char> • wstring = typedef for basic_string<wchar_t> • C-style strings are still used as well: • null-terminated byte string (NTBS) • null-terminated multibyte string (NTMBS): an NTBS that contains a sequence of valid multibyte characters, beginning and ending in the initial shift state.
Strings in C++ (2) basic_string class template specializations are safer than NTBS, but NTBS are required all over the place: • Literals are NTBS • Existing libraries need NTBS or NTMBS. string objects are passed by value or by reference, while C strings are passed by pointer. Thank goodness for the member functions c_str() and data(), which return a pointer to the contents as a C string (data() is guaranteed to be null-terminated since C++11).
Character Types Three distinct types: • char (plain) • signed char • unsigned char. Whether plain char is signed or unsigned is implementation-defined. Using the wrong type may cause compiler warnings.
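One consequence of plain char being signed on many platforms: passing a char with the high bit set directly to a <ctype.h> function is undefined behavior, because those functions accept only EOF or values representable as unsigned char. A sketch with a hypothetical `count_alpha()` that converts through unsigned char first:

```c
#include <ctype.h>
#include <stddef.h>

/* Counts alphabetic characters in s. The cast matters: if char is
   signed, a byte such as 0xE9 is negative and isalpha(*s) would be
   undefined behavior. */
size_t count_alpha(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (isalpha((unsigned char)*s)) {
            ++n;
        }
    }
    return n;
}
```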
int Some gotchas: • getc() and friends return an int so that EOF (a negative value, typically -1) is distinct from every character value. • Functions in <ctype.h> (<cctype>) such as isalpha() accept an int because they might be passed the result of getc() or a similar function. • In C, a character constant has type int, so sizeof('a') is sizeof(int) (typically 4), not 1. In C++, a character constant has type char and its size is 1. Wide character literals have type wchar_t, and multicharacter literals have type int.
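A sketch of the getc() gotcha (the function name is hypothetical): the loop variable must be an int. getc() returns each byte as an unsigned char value converted to int, and EOF as a distinct negative int; storing the result in a char would make the byte 0xFF indistinguishable from EOF where char is signed, or make EOF unreachable where char is unsigned.

```c
#include <stdio.h>

/* Counts the bytes remaining in a stream. */
size_t count_bytes(FILE *in) {
    size_t n = 0;
    int c;                      /* int, not char */
    while ((c = getc(in)) != EOF) {
        ++n;
    }
    return n;
}
```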
Unsigned char and wchar_t unsigned char: all bits contribute to the value; pure binary representation with no padding bits, no trap representations, no sign extension, etc. wchar_t: can be used for natural-language character data. For characters in the basic character set, it does not matter which narrow character type is used, except for type-compatibility issues.
Sizing Strings: Headaches Three important numbers: • Size = number of bytes allocated to the array (sizeof(a)) • Count = number of elements in the array (may differ from size!) • Length = number of characters before the null terminator. Notes: if the characters are wide, size may be 2*count or 4*count (depends on the platform's wchar_t). Length MUST be smaller than count. See program fragments in the book, pages 40-41.
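A sketch of the three numbers for a wide-character array; the macro and variable names are hypothetical. Size and count come from sizeof and only work where the array itself, not a pointer to it, is in scope; length comes from wcslen().

```c
#include <stddef.h>
#include <wchar.h>

#define SIZE_OF(a)  (sizeof(a))                   /* bytes allocated */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))  /* number of elements */

/* Count is 10 everywhere; size is 10 * sizeof(wchar_t), e.g. 20 bytes
   where wchar_t is 16 bits (Windows) and 40 where it is 32 bits
   (typical Linux). Length, from wcslen(), is 5. */
static wchar_t wide_buf[10] = L"hello";
```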
Common String Manipulation Errors • Use of gets: NONONONONONONONO!!!!!!!!!! (it was removed from the language in C11) • Improperly bounded string copies. Do not use: • strcpy() • strcat() • sprintf() • Watch out for: • Input strings • Environment strings • Parameter strings.... (see programs, pp 42-47)
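For sprintf(), the bounded standard replacement is snprintf(), which never writes more than the given size (including the terminator) and returns the length the full output would have had. A sketch with a hypothetical `format_user()` that treats truncation as an error:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "name#id" into buf. Returns false on an encoding error or if
   the output was truncated; buf is null-terminated either way when
   size > 0. */
bool format_user(char *buf, size_t size, const char *name, int id) {
    int n = snprintf(buf, size, "%s#%d", name, id);
    return n >= 0 && (size_t)n < size;
}
```

Whether truncation is an error or acceptable is the caller's decision; returning a flag keeps that decision visible.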
Common String Manipulation Errors • Sizing strings: • Do not use strlen() for wide strings; use wcslen() • Multiply the result (plus one for the terminator) by sizeof(wchar_t) to get the size in bytes. Programs, pages 41-42 • Improperly bounded string input: • Do not use: • gets() • cin extraction into a character array without a width limit • scanf() with an unbounded %s. See programs pp 42-43 (the program on page 43 is a typical implementation of gets).
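A sketch of sizing a wide string for allocation (the name `wide_dup()` is hypothetical): wcslen() counts wide characters, not bytes, so the byte count is (length + 1) * sizeof(wchar_t), with a check that the multiplication cannot overflow. strlen() would be wrong here because it stops at the first zero byte, which for ASCII text stored in wide characters typically lies inside the first character.

```c
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns a malloc'd copy of src that the caller must free(), or NULL. */
wchar_t *wide_dup(const wchar_t *src) {
    size_t count = wcslen(src) + 1;          /* characters + null wide char */
    if (count > SIZE_MAX / sizeof(wchar_t)) {
        return NULL;                         /* byte count would overflow */
    }
    wchar_t *copy = malloc(count * sizeof(wchar_t));  /* bytes */
    if (copy != NULL) {
        wmemcpy(copy, src, count);           /* counts wchar_t units */
    }
    return copy;
}
```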
Common String Manipulation Errors • Careless copying and concatenation of strings. Program, page 44 • Watch for strcpy, strcat, memcpy, sprintf, etc. • Off-by-one errors (see program, page 47) • Null-termination errors (p 49) • String truncation • If you implement these functions yourself, you may still be in trouble! (page 50)
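The null-termination error is easy to reproduce with strncpy(), which adds no terminator when the source is at least as long as the count. A sketch with a hypothetical helper `copy_truncating()` that accepts truncation but always terminates the result:

```c
#include <string.h>

/* Copies at most dstsize - 1 characters and always null-terminates
   (when dstsize > 0). Silently truncates long input. */
void copy_truncating(char *dst, size_t dstsize, const char *src) {
    if (dstsize == 0) {
        return;
    }
    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';   /* strncpy may not have written one */
}
```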
String Vulnerabilities and Exploits • Where does your data come from? Are you sure? The program on page 51 is bad: • It uses gets • It doesn't even check the return value of gets
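A sketch of how the page 51 pattern could be repaired, assuming the password is read from a caller-supplied stream; `is_password_ok()` and its parameters are hypothetical and not the book's IsPasswordOK. It uses fgets() instead of gets(), checks the return value, and rejects input that does not fit rather than comparing a truncated prefix.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

bool is_password_ok(FILE *in, const char *expected) {
    char password[12];
    if (fgets(password, sizeof password, in) == NULL) {
        return false;                    /* EOF or read error: reject */
    }
    char *nl = strchr(password, '\n');
    if (nl != NULL) {
        *nl = '\0';                      /* fgets keeps the newline */
    } else if (!feof(in)) {
        /* The line did not fit (or ended exactly at the limit): discard
           the rest of it and reject conservatively. */
        int c;
        while ((c = getc(in)) != EOF && c != '\n') {
        }
        return false;
    }
    return strcmp(password, expected) == 0;
}
```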
String Vulnerabilities and Exploits (see assembly code, pp 56-58) Overflowing a stack buffer to overwrite the return address and other data on the stack is called “stack smashing”. Example follows (remember the code from IsPasswordOK?)
String Vulnerabilities and Exploits This exploit, which transfers control to code that already exists in the program, is called “arc injection”.
String Vulnerabilities and Exploits • Code injection: • Inject malicious code together with an address that transfers control to it • The payload must be acceptable as legitimate input • It must not cause abnormal termination before the code runs • It must result in execution of the malicious code • IsPasswordOK is vulnerable (page 65) • Exploit with fgets and strcpy on page 66 (unclear; obviously not tested).
String Vulnerabilities and Exploits Arc injection, aka return-into-libc: • Transfers control to an existing function instead of injected code • system(), the exec family, and setuid() are favorites • Example of vulnerable code, page 70 • Because no injected code executes, it defeats memory-based protection schemes such as non-executable stacks.
String Vulnerabilities and Exploits Return-Oriented Programming: a “gadget” is a sequence of instructions followed by a return. Turing-complete gadget sets exist for several architectures, including x86 and SPARC (Solaris libc), and there is a compiler that targets them. The “program” lives on the stack: each return pops the address of the next gadget, operand values are pushed and popped, and return addresses can be skipped to implement branching. Actually similar to FORTH programming.
Mitigation Strategies Two kinds: Prevent buffer overflows Detect buffer overflows and recover securely Best to do defense in depth and apply both.
Mitigation Strategies Preventing Buffer Overflows: CERT recommends using a consistent plan for managing strings. Three models: • Caller allocates and frees: most likely to prevent memory leaks • Callee allocates, caller frees: ensures sufficient memory is available • Callee allocates and frees (only available in C++): most secure of the three solutions
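The first two models can be sketched in C for a simple string join (the function names are hypothetical). The third model, callee allocates and frees, needs a class with a destructor, such as C++'s std::string, and has no direct C equivalent.

```c
#include <stdlib.h>
#include <string.h>

/* Model 1, caller allocates and frees: the caller passes the buffer and
   its size; if a + b does not fit, nothing is written and -1 is returned. */
int join_into(char *buf, size_t size, const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    if (size == 0 || la >= size || lb >= size - la) {
        return -1;                     /* need la + lb + 1 <= size */
    }
    memcpy(buf, a, la);
    memcpy(buf + la, b, lb + 1);       /* includes b's terminator */
    return 0;
}

/* Model 2, callee allocates, caller frees: the function sizes the result
   itself; the caller must free() the returned pointer. */
char *join_alloc(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *buf = malloc(la + lb + 1);
    if (buf != NULL) {
        memcpy(buf, a, la);
        memcpy(buf + la, b, lb + 1);
    }
    return buf;
}
```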
Mitigation Strategies Caller allocates and frees: the C <string.h> family is expanded with the C11 Annex K (optional) bounds-checking functions strcpy_s, strcat_s, strncpy_s, strncat_s. See examples 2.5 and 2.6, pages 74 and 75.
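A sketch of a caller-allocated copy that uses strcpy_s() only when the implementation advertises Annex K by defining __STDC_LIB_EXT1__, and otherwise falls back to an explicit length check with the same outcome. The name `copy_checked()` is hypothetical. Many C libraries, including glibc, do not provide Annex K, so the fallback is what usually compiles.

```c
#define __STDC_WANT_LIB_EXT1__ 1   /* request Annex K declarations, if any */
#include <stdlib.h>
#include <string.h>

/* Returns 0 on success, -1 if src does not fit; on failure dst (if
   dstsize > 0) is left as an empty string. */
int copy_checked(char *dst, size_t dstsize, const char *src) {
#ifdef __STDC_LIB_EXT1__
    /* Annex K reports violations through a runtime-constraint handler
       whose default may abort; installing the "ignore" handler (a
       process-wide setting) makes strcpy_s return an error instead. */
    set_constraint_handler_s(ignore_handler_s);
    return strcpy_s(dst, dstsize, src) == 0 ? 0 : -1;
#else
    size_t len = strlen(src);
    if (dstsize == 0 || len >= dstsize) {
        if (dstsize > 0) {
            dst[0] = '\0';             /* mirror strcpy_s on failure */
        }
        return -1;
    }
    memcpy(dst, src, len + 1);         /* includes the terminator */
    return 0;
#endif
}
```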
Mitigation Strategies Callee allocates, caller frees. Biggest problems: • DoS attacks by exhausting memory • Dynamic memory management errors. Example 2.7, p 77. fmemopen() and open_memstream() (signatures, p 78) perform “I/O” to memory. Example code, page 79. Dynamic allocation is disallowed in safety-critical systems.
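A sketch of the open_memstream() approach, assuming a POSIX.1-2008 system; `build_message()` is a hypothetical name. The stream allocates and grows the buffer and the caller frees it, so this approach is exposed to the memory-exhaustion concern listed above if its input is unbounded.

```c
#define _POSIX_C_SOURCE 200809L   /* open_memstream is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>

/* After fclose, buf points to a null-terminated string of len bytes
   (terminator not counted). The caller must free() the result. Returns
   NULL on failure. */
char *build_message(const char *user, int attempts, size_t *out_len) {
    char *buf = NULL;
    size_t len = 0;
    FILE *stream = open_memstream(&buf, &len);
    if (stream == NULL) {
        return NULL;
    }
    int ok = fprintf(stream, "user=%s attempts=%d", user, attempts) >= 0;
    if (fclose(stream) != 0 || !ok) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```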
Mitigation Strategies C++ string class pp 80-83
String Handling Functions, the bad and the good gets(): replace with fgets() or getchar() (examples 2.9, 2.10, pp 84-86) … or gets_s() (example 2.11, page 87) … or getline(), a special case of getdelim() (example 2.12, p 88)
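A sketch of the getline() option, assuming POSIX.1-2008; `read_line()` is a hypothetical wrapper. Unlike gets(), getline() grows its buffer to fit the whole line, so the remaining risk is memory consumption rather than overflow.

```c
#define _POSIX_C_SOURCE 200809L   /* getline is POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>

/* Reads one line of any length and strips the trailing newline.
   Returns a malloc'd string the caller must free(), or NULL at EOF
   or on error. */
char *read_line(FILE *in) {
    char *line = NULL;
    size_t cap = 0;
    ssize_t n = getline(&line, &cap, in);
    if (n < 0) {
        free(line);               /* getline may have allocated anyway */
        return NULL;
    }
    if (n > 0 && line[n - 1] == '\n') {
        line[n - 1] = '\0';
    }
    return line;
}
```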
String Handling Functions, the bad and the good strcpy() and strcat(). Fixes: • Allocate the required space dynamically • strncpy() and strncat() are not recommended • strlcpy() and strlcat() (always null-terminate the result) • strcpy_s() and strcat_s() (implementation, page 91) • strdup() (dynamically allocated; requires free()) Summary, pp 92-93
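strlcpy() is not part of ISO C and is missing from some C libraries, so here is a sketch of its documented semantics under a hypothetical name: copy at most size - 1 bytes, always null-terminate when size > 0, and return strlen(src) so the caller can detect truncation with `result >= size`.

```c
#include <string.h>

size_t copy_bounded(char *dst, const char *src, size_t size) {
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = (len < size) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;   /* >= size means the result was truncated */
}
```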
String Handling Functions, the bad and the good strncpy() and strncat() (p 93) See strncpy_s (p 95) and strncat_s (pp 97-98) strndup() (uses dynamic memory allocation) Summary on p 99
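The most common strncat() mistake is passing the destination size as the count; the count is the maximum number of characters to append, and strncat() always writes a terminator after them. A sketch with a hypothetical `append_bounded()` that computes the remaining room instead, assuming dst already holds a null-terminated string within its buffer:

```c
#include <string.h>

void append_bounded(char *dst, size_t size, const char *src) {
    size_t used = strlen(dst);
    if (used + 1 >= size) {
        return;                              /* no room left */
    }
    strncat(dst, src, size - used - 1);      /* reserve 1 byte for '\0' */
}
```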
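String Handling Functions, the bad and the good memcpy() and memmove(): replace with memcpy_s() and memmove_s(), respectively. Watch out for strlen() on arrays that may not be null-terminated: strnlen() (POSIX) and strnlen_s() (C11 Annex K) stop after a maximum number of characters; strnlen_s() also returns 0 for a null pointer.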
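A sketch of why memmove() exists: removing a prefix in place makes source and destination overlap, which is undefined behavior for memcpy(). The helper name is hypothetical.

```c
#include <string.h>

/* Removes the first n characters of s in place (all of them if n is
   larger than the length). */
void remove_prefix(char *s, size_t n) {
    size_t len = strlen(s);
    if (n > len) {
        n = len;
    }
    memmove(s, s + n, len - n + 1);   /* +1 moves the terminator too */
}
```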
Runtime Protection Strategies Detection and recovery, provided via: • input validation • the compiler and its runtime system (e.g., array bounds checking) • the operating system
Runtime Protection Strategies: Input Validation • Input data size checking • Object size checking (with __builtin_object_size()); use it by compiling with -D_FORTIFY_SOURCE=n for n ≥ 1, with optimization enabled (pp 104-105)
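A GCC/Clang-specific sketch of the idea behind object size checking; `CHECKED_STRCPY` and `checked_strcpy_impl()` are hypothetical, and this is a simplified imitation of what _FORTIFY_SOURCE does, not glibc's implementation. __builtin_object_size(p, 0) evaluates to the number of bytes remaining in the object p points into when the compiler can determine it, and (size_t)-1 otherwise, in which case the sketch cannot check anything.

```c
#include <stddef.h>
#include <string.h>

/* Refuses the copy (returns NULL) when the destination size is known
   and src, plus its terminator, would not fit. */
char *checked_strcpy_impl(char *dst, const char *src, size_t dst_size) {
    if (dst_size != (size_t)-1 && strlen(src) >= dst_size) {
        return NULL;
    }
    return strcpy(dst, src);
}

/* The size must be computed at the call site, where the compiler can
   still see the destination array. */
#define CHECKED_STRCPY(dst, src) \
    checked_strcpy_impl((dst), (src), __builtin_object_size((dst), 0))
```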
Runtime Protection Strategies: The Compiler and Runtime System Visual Studio compiler-generated runtime checks, turned on with flags. /RTCs turns on checks for: • local variable overflows (including arrays) • use of uninitialized variables • stack pointer corruption. Can be tweaked with #pragma runtime_checks("s", off) and #pragma runtime_checks("s", restore). Runtime bounds checkers: • Libsafe • Libverify • CRED
Runtime Protection Strategies: The Compiler and Runtime System Stack canaries: • StackGuard • GCC's Stack-Smashing Protector, aka ProPolice: -fstack-protector[-all], -Wstack-protector • Visual C++ .NET stack overrun detection capability (/GS); recommend adding #pragma strict_gs_check(on). Recommend compiling with the /GS flag and linking with libraries that were also compiled with /GS.
Runtime Protection Strategies: The Operating System • Address space layout randomization: Linux (PaX project, 2000); Windows since Vista; Mac OS X since 2007/2011; iOS since 4.3 • Nonexecutable stacks: W^X; Data Execution Prevention (Microsoft Windows); PaX marks the stack as non-executable • StackGap
Some Notable Vulnerabilities rlogin – strcpy Kerberos
Summary • Arrays and their Problems • Character Strings • Common String Manipulation Errors • String Vulnerabilities and Exploits • Mitigation Strategies • String Handling Functions, the bad and the good • Runtime Protection Strategies • Some Notable Vulnerabilities