Unicode Character Types and Literals (C++0x)
BCC32 implements new character types and character literals for Unicode. These types are among the C++0x features added to BCC32.
Two new types represent Unicode characters:
- char16_t is a 16-bit character type. char16_t is a C++ keyword. This type can be used for UTF-16 characters.
- char32_t is a 32-bit character type. char32_t is a C++ keyword. This type can be used for UTF-32 characters.
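As a rough illustration of the two types above, the following sketch checks that they are distinct built-in types with at least the stated widths. It assumes a compiler with C++11-level support for static_assert and <type_traits>; the names nominal_bits and check_character_types are hypothetical helpers introduced only for this example.

```cpp
#include <cassert>
#include <climits>
#include <type_traits>

// char16_t and char32_t are keywords naming distinct types,
// not typedefs of each other or of wchar_t.
static_assert(!std::is_same<char16_t, char32_t>::value, "distinct types");
static_assert(!std::is_same<char16_t, wchar_t>::value, "char16_t is not wchar_t");
static_assert(!std::is_same<char32_t, wchar_t>::value, "char32_t is not wchar_t");

// Each type is wide enough for one UTF-16 or UTF-32 code unit.
static_assert(sizeof(char16_t) * CHAR_BIT >= 16, "at least 16 bits");
static_assert(sizeof(char32_t) * CHAR_BIT >= 32, "at least 32 bits");

// Hypothetical helpers: because the types are distinct, overload
// resolution can tell them apart.
inline int nominal_bits(char16_t) { return 16; }
inline int nominal_bits(char32_t) { return 32; }

// Hypothetical self-check; call it from main() to run the assertions.
inline void check_character_types() {
    char16_t c16 = 0x0067;  // code point of 'g'
    char32_t c32 = 0x0074;  // code point of 't'
    assert(nominal_bits(c16) == 16);
    assert(nominal_bits(c32) == 32);
}
```

Because the overloads resolve separately, code that must treat UTF-16 and UTF-32 data differently can rely on the type alone.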
The existing wchar_t type is a type for a wide character in the execution wide-character set. A wchar_t wide-character literal begins with an uppercase L (such as L'a').
There are two new forms to create character literals of the new types:
- u'character' is a literal for a single char16_t character, such as u'g'. A multicharacter literal such as u'kh' is ill-formed. The value of a char16_t literal is equal to its ISO 10646 code point value, provided that the code point is representable as a 16-bit value. Only characters in the Basic Multilingual Plane (BMP) can be represented.
- U'character' is a literal for a single char32_t character, such as U't'. A multicharacter literal such as U'de' is ill-formed. The value of a char32_t literal is equal to its ISO 10646 code point value.
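The rule that a literal's value equals its ISO 10646 code point can be checked at compile time. The sketch below assumes a C++11-level compiler; fits_in_char16 and check_character_literals are hypothetical helpers added for illustration, and the surrogate-range exclusion reflects the fact that U+D800 through U+DFFF are not characters.

```cpp
#include <cassert>

// The value of each literal is the code point of the character.
static_assert(u'g' == 0x0067, "LATIN SMALL LETTER G is U+0067");
static_assert(U't' == 0x0074, "LATIN SMALL LETTER T is U+0074");
static_assert(u'\u00E9' == 0x00E9, "U+00E9 is in the BMP, so it fits in char16_t");
static_assert(U'\U0001F600' == 0x1F600, "U+1F600 is outside the BMP; char32_t holds it");

// Hypothetical helper: true when a code point can be the value of a
// single char16_t literal, i.e. it lies in the BMP and is not a surrogate.
inline bool fits_in_char16(char32_t cp) {
    return cp <= 0xFFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

// Hypothetical self-check; call it from main() to run the assertions.
inline void check_character_literals() {
    assert(fits_in_char16(U'g'));
    assert(!fits_in_char16(U'\U0001F600'));
}
```

A character for which fits_in_char16 returns false has to be written as a U'...' literal rather than a u'...' literal.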
Previously, the only prefixed character literals were wide-character literals of the form L'characters', representing one or more characters of the type wchar_t. The value of a single-character wide-character literal is that character's encoding in the execution wide-character set.
There are two new forms to create string literals of the new types:
- u"UTF-16_string" is a string literal containing characters of the char16_t type, for example u"hello".
- U"UTF-32_string" is a string literal containing characters of the char32_t type, for example U"hello".
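A minimal sketch of the two string-literal forms follows, assuming a C++11-level compiler and standard library (std::char_traits specializations for char16_t and char32_t in <string>). The names greeting16, greeting32, length16, length32, and check_string_literals are hypothetical. The example also shows that a character outside the BMP takes two char16_t code units (a surrogate pair) in a u"..." string but a single char32_t in a U"..." string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sample strings: "caf" followed by U+00E9.
const char16_t* const greeting16 = u"caf\u00E9";
const char32_t* const greeting32 = U"caf\u00E9";

// Hypothetical helpers: count code units up to the terminating null.
inline std::size_t length16(const char16_t* s) {
    return std::char_traits<char16_t>::length(s);
}
inline std::size_t length32(const char32_t* s) {
    return std::char_traits<char32_t>::length(s);
}

// Hypothetical self-check; call it from main() to run the assertions.
inline void check_string_literals() {
    // BMP-only text: one code unit per character in both encodings.
    assert(length16(greeting16) == 4);
    assert(length32(greeting32) == 4);
    // U+1F600 lies outside the BMP: two UTF-16 code units, one UTF-32 unit.
    assert(length16(u"\U0001F600") == 2);
    assert(length32(U"\U0001F600") == 1);
}
```

The same literals can initialize std::u16string and std::u32string objects where the standard library provides them.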