String Constants

String Literals

String constants, also known as string literals, form a special category of constants used to handle fixed sequences of characters. A string literal has data type array of const char and storage class static, and is written as a sequence of any number of characters surrounded by double quotes:

"This is literally a string!"

The null (empty) string is written "".

The characters inside the double quotes can include escape sequences. This code, for example:

"\t\t\"Name\"\\\tAddress\n\n"

prints like this:

               "Name"\        Address


"Name" is preceded by two tabs; Address is preceded by one tab. The line is followed by two new lines. The \" provides interior double quotes.

If you compile with the -A option for ANSI compatibility, the escape character sequence "\\" is translated to "\" by the compiler.

A literal string is stored internally as the given sequence of characters plus a final null character ('\0'). A null string is stored as a single '\0' character.

Four Types of String Literals in C++0x

By default, string literals are ANSI strings containing char characters. You can place the L, u, or U prefix before a string literal to specify that it contains wide characters or Unicode characters (see Unicode Character Types and Literals (C++0x)):

  • A string literal preceded immediately by an L is a wide-character string containing characters of the wchar_t data type. When wchar_t is used in a C program, it is a type defined in the stddef.h header file. In C++ programs, wchar_t is a keyword. The memory allocation for wchar_t strings is two bytes per character on Windows; the size of wchar_t is implementation-defined and can differ on other platforms. The value of a single wide character is that character's encoding in the execution wide-character set.
  • In C++0x programs, a string literal preceded immediately by a u character is a Unicode-character string containing characters of the char16_t data type. In C++0x programs, char16_t is a keyword declaring a 16-bit character type that holds UTF-16 code units. Each code unit occupies two bytes, so a Unicode character takes two or four bytes (one code unit, or two code units for a surrogate pair).
  • In C++0x programs, a string literal preceded immediately by a U character is a Unicode-character string containing characters of the char32_t data type. In C++0x programs, char32_t is a keyword declaring a 32-bit character type that holds UTF-32 code units. The memory allocation for char32_t characters is four bytes per character.


That is, in C++0x programs, we can use the following four types of string literals:

  • "ANSI string" - this is an ANSI string literal containing char characters;
  • L"Wide-character string" - this string literal contains wchar_t characters;
  • u"UTF-16 string" - this string literal contains char16_t Unicode characters in UTF-16 encoding;
  • U"UTF-32 string" - this string literal contains char32_t Unicode characters in UTF-32 encoding.

Concatenating String Literals

You can use the backslash (\) as a continuation character to extend a string constant across line boundaries:

puts("This is really \
a one-line string");

Adjacent string literals separated only by whitespace are concatenated during the parsing phase. For example:

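#include <stdio.h>

int main(void) {
   const char *p;   /* string literals are arrays of const char */

   /* Adjacent literals are joined into a single string at compile time. */
   p = "This is an example of how the compiler"
     " will\nconcatenate very long strings for you"
     " automatically,\nresulting in nicer" " looking programs.";
   printf("%s", p);   /* print through "%s" so p is never used as a format string */
   return 0;
}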

The output of the program is:

This is an example of how the compiler will
concatenate very long strings for you automatically,
resulting in nicer looking programs.


See Also