Win32 Programming Lesson 6: Everything you never wanted to know about strings
Before We Begin • Several of you probably had problems with character types in the last assignment, especially when reading the command line • Why? Because in Windows, strings aren’t always strings (if that makes sense)
Why? • Traditionally, a C string is a sequence of bytes terminated by a NUL ('\0') byte • Unfortunately, one byte per character accommodates only 256 different characters, and that’s too few for some languages (Japanese, with its kanji, being the classic example)
DBCS • To fix this problem, Double-Byte Character Sets (DBCS) were created • In a DBCS each character consists of 1 or 2 bytes • This means functions like strlen return a byte count, not a character count • Helper functions exist, but the solution is ugly • Enter Unicode
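To see why strlen misleads on DBCS text, here is a minimal sketch that counts characters in a Shift-JIS string. The helper names is_sjis_lead_byte and dbcs_char_count are hypothetical, and the lead-byte ranges are those of Shift-JIS only; on Windows you would normally rely on helpers such as IsDBCSLeadByte or CharNext instead of hard-coding ranges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: nonzero if b starts a two-byte Shift-JIS character
   (lead-byte ranges 0x81-0x9F and 0xE0-0xFC). */
static int is_sjis_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical helper: counts characters, not bytes, in a Shift-JIS string. */
static size_t dbcs_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    while (*s != '\0') {
        /* A lead byte followed by another byte forms one character. */
        if (is_sjis_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;
        else
            s += 1;
        count++;
    }
    return count;
}
```

For the three-byte string "\x82\xA0" "A" (one two-byte hiragana followed by the letter A), strlen reports the byte count while dbcs_char_count reports two characters.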
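WBCS • Wide Byte Character Set == Unicode • Standard started by Apple and Xerox in 1988(!); the Unicode Consortium followed in 1991 • See http://www.unicode.org for more information than you could ever want • In Windows, each character is a 16-bit value (UTF-16); characters outside the Basic Multilingual Plane take two of them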
Why bother? • Enables easy data exchange between languages • Create a single binary that supports all languages • Improves execution efficiency: Windows 2000 uses Unicode internally, so calling the ANSI version of an API costs a conversion
History • Unicode really is much more of a Windows 2000 thing… • Support in 98 was lacking • However, looking to the future, we’ll ignore the old 16-bit application space • Windows CE is Unicode only
Writing Unicode Code… • It’s possible to write pure Unicode applications using the wide-character functions in the C run-time library (RTL) • However, you can just as easily write code that compiles as either ANSI or Unicode by using macros
Unicode types • typedef unsigned short wchar_t; • Declared in string.h, along with the wide-string functions (standard C puts them in wchar.h) • wchar_t szBuffer[100]; allocates 100 characters, not 100 bytes • strcat, strcpy etc. do not work on wide strings • Equivalent functions exist with wcs replacing str • e.g. wcscat
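A short sketch of the points above, using only standard C so it also builds outside Visual C++. The names BUFFER_CHARS, build_greeting and wide_buffer_bytes are made up for illustration. Note that wchar_t is 16 bits on Windows but commonly 32 bits on other platforms, so the byte count differs while the character count does not.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

#define BUFFER_CHARS 100  /* hypothetical size, matching the slide */

/* Builds L"Hello, world" with the wcs counterparts of strcpy and strcat.
   Returns the length in characters. */
static size_t build_greeting(wchar_t *buf, size_t cch)
{
    assert(buf != NULL && cch >= 13);  /* 12 characters plus terminator */
    wcscpy(buf, L"Hello, ");
    wcscat(buf, L"world");
    return wcslen(buf);
}

/* Size in bytes of a 100-character wide buffer: more than 100. */
static size_t wide_buffer_bytes(void)
{
    wchar_t szBuffer[BUFFER_CHARS];
    return sizeof(szBuffer);
}
```

Passing a wchar_t buffer to strcat or strcpy will not even compile cleanly, because those functions expect char pointers.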
A Better Way • tchar.h • Introduces a series of macros that let the program use Unicode or not, depending on compilation options • Creates a new type TCHAR, which is equivalent to char if _UNICODE is not defined, and to wchar_t if it is
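Problems • Imagine this: • TCHAR *szError = "Error"; (breaks when _UNICODE is defined) • wchar_t *szError = "Error"; (always wrong: a narrow literal assigned to a wide pointer) • TCHAR *szError = L"Error"; (breaks when _UNICODE is not defined) • TCHAR *szError = _TEXT("Error"); (correct either way)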
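The mechanism behind TCHAR and _TEXT can be sketched in a few lines. MY_TCHAR, MY_TEXT and my_tcslen below are hypothetical stand-ins written for illustration; the real tchar.h definitions are more elaborate, but they are keyed on _UNICODE in the same spirit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _TEXT and _tcslen. */
#ifdef _UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT(s)  L##s      /* pastes an L prefix onto the literal */
#define my_tcslen   wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT(s)  s         /* leaves the literal untouched */
#define my_tcslen   strlen
#endif

/* The same source line yields a narrow or a wide literal. */
static const MY_TCHAR *error_text(void)
{
    return MY_TEXT("Error");
}

static size_t error_text_length(void)
{
    const MY_TCHAR *s = error_text();
    assert(s != NULL);
    return my_tcslen(s);
}
```

Compiling once with -D_UNICODE (or /D_UNICODE) and once without gives two builds from one source file, and the literal always matches the pointer type.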
Windows Unicode data • WCHAR: Unicode character • PWSTR: Pointer to a Unicode string • PCWSTR: Pointer to a constant Unicode string
Windows API Revisited • CreateWindowEx doesn’t exist as a function… • Really, there are two: CreateWindowExA and CreateWindowExW • One is ANSI, the other is Unicode • CreateWindowEx is a macro, switched in WinUser.h depending on the definition of UNICODE
Unicode Gotchas • Use type BYTE and PBYTE to define bytes • Use generic type TCHAR etc. • Use the TEXT macro • Beware string arithmetic… don’t think about sizeof(szBuffer) as the number of characters you can hold! Similarly, malloc takes a size in bytes, not in characters
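The sizeof and malloc gotchas can be illustrated as follows. This is only a sketch in standard C; COUNT_OF, alloc_wide_string and duplicate_wide are hypothetical names, not part of any Windows header.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical macro: number of elements in a true array (not a pointer). */
#define COUNT_OF(a) (sizeof(a) / sizeof((a)[0]))

/* Allocates room for cch characters plus a terminator; the caller frees.
   The character count is multiplied by the character size to get bytes. */
static wchar_t *alloc_wide_string(size_t cch)
{
    return malloc((cch + 1) * sizeof(wchar_t));
}

/* Copies src into a freshly allocated buffer; returns NULL on failure. */
static wchar_t *duplicate_wide(const wchar_t *src)
{
    size_t cch;
    wchar_t *copy;

    assert(src != NULL);
    cch = wcslen(src);
    copy = alloc_wide_string(cch);
    if (copy != NULL)
        wmemcpy(copy, src, cch + 1);  /* count in characters, incl. L'\0' */
    return copy;
}
```

For wchar_t szBuffer[100], COUNT_OF(szBuffer) is 100 while sizeof(szBuffer) is 100 * sizeof(wchar_t), which is the distinction the slide warns about.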
Windows functions • Use lstrcat, lstrcmp, lstrcmpi, lstrcpy and lstrlen instead of wcs/str counterparts • lstrcmp and lstrcmpi are built on the Windows function CompareString • Useful for fancy language comparisons • There are a whole host of these functions (like CharLower and CharLowerBuff…)
Type Conversion • Of course, sometimes you have to convert between ANSI (multibyte) and Unicode in a program • Use MultiByteToWideChar to make wide characters • Use WideCharToMultiByte to make multibyte (ANSI) characters
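One common way to use MultiByteToWideChar is to call it twice: once to ask for the required size, then again to convert. The sketch below assumes the system ANSI code page (CP_ACP); the helper name to_wide is hypothetical. So that it can also be tried off Windows, the non-Windows branch substitutes the standard mbstowcs, which is a stand-in and not what a Win32 program would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: converts a multibyte string to a newly allocated
   wide string. Returns NULL on failure; the caller frees the result. */
static wchar_t *to_wide(const char *src)
{
    wchar_t *dst;

    assert(src != NULL);
#ifdef _WIN32
    /* First call: required size in wide characters. Passing -1 as the
       source length makes the count include the terminator. */
    int cch = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (cch <= 0)
        return NULL;
    dst = malloc((size_t)cch * sizeof(wchar_t));
    /* Second call: the actual conversion; 0 signals failure. */
    if (dst != NULL && MultiByteToWideChar(CP_ACP, 0, src, -1, dst, cch) == 0) {
        free(dst);
        dst = NULL;
    }
#else
    /* Stand-in: mbstowcs converts according to the current C locale. */
    size_t cch = mbstowcs(NULL, src, 0);  /* length without terminator */
    if (cch == (size_t)-1)
        return NULL;
    dst = malloc((cch + 1) * sizeof(wchar_t));
    if (dst != NULL)
        mbstowcs(dst, src, cch + 1);
#endif
    return dst;
}
```

The reverse direction with WideCharToMultiByte follows the same two-call shape, with the sizes counted in bytes rather than wide characters.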
Your own DLLs • You can write your DLLs to provide both ANSI and Unicode support • For example, imagine a routine which reverses a string… BOOL StringReverseW(PWSTR pWideCharStr) • Instead of being a completely separate implementation, StringReverseA should convert its argument to wide characters, call StringReverseW, and then convert the result back
Prototype
    BOOL StringReverseW(PWSTR pWideCharStr);
    BOOL StringReverseA(PSTR pMultibyteStr);
    #ifdef UNICODE
    #define StringReverse StringReverseW
    #else
    #define StringReverse StringReverseA
    #endif
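One possible implementation of the pair is sketched below; it is not taken from the course material. The conversions use the standard mbstowcs and wcstombs as stand-ins so the sketch builds on any platform, whereas a real DLL would more likely call MultiByteToWideChar and WideCharToMultiByte. The typedefs in the non-Windows branch are assumptions made only for that purpose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch also compiles off Windows (assumption). */
typedef int      BOOL;
typedef wchar_t *PWSTR;
typedef char    *PSTR;
#define TRUE  1
#define FALSE 0
#endif

/* Reverses a wide string in place; this holds the only real logic. */
BOOL StringReverseW(PWSTR pWideCharStr)
{
    size_t len;
    wchar_t *lo, *hi;

    if (pWideCharStr == NULL)
        return FALSE;
    len = wcslen(pWideCharStr);
    if (len < 2)
        return TRUE;
    lo = pWideCharStr;
    hi = pWideCharStr + len - 1;
    while (lo < hi) {                 /* swap from both ends inward */
        wchar_t t = *lo;
        *lo++ = *hi;
        *hi-- = t;
    }
    return TRUE;
}

/* Converts to wide, delegates to StringReverseW, converts back in place. */
BOOL StringReverseA(PSTR pMultibyteStr)
{
    size_t cch, cb;
    wchar_t *wide;
    BOOL ok;

    if (pMultibyteStr == NULL)
        return FALSE;
    cch = mbstowcs(NULL, pMultibyteStr, 0);   /* length without terminator */
    if (cch == (size_t)-1)
        return FALSE;
    wide = malloc((cch + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return FALSE;
    mbstowcs(wide, pMultibyteStr, cch + 1);
    assert(wcslen(wide) == cch);
    ok = StringReverseW(wide);
    if (ok) {
        /* For stateless encodings the reversed text needs the same number
           of bytes, so the caller's buffer is large enough. */
        cb = strlen(pMultibyteStr);
        ok = (wcstombs(pMultibyteStr, wide, cb + 1) != (size_t)-1);
    }
    free(wide);
    return ok;
}
```

Keeping the logic in the W function and making the A function a thin wrapper means there is only one algorithm to maintain, which is the design the prototype slide points toward.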
Not-too-difficult Assignment • Sort n words from the command line in ascending alphabetical order (or descending, if the -d flag is set), and have your program compile and run with either MBCS or UNICODE set
Next Class • Simple Kernel Objects…