String Constants

String Literals

String constants, also known as string literals, form a special category of constants used to handle fixed sequences of characters. A string literal has the data type array of const char and the storage class static. It is written as a sequence of any number of characters surrounded by double quotes:

"This is literally a string!"

The null (empty) string is written "".

The characters inside the double quotes can include escape sequences. This code, for example:

"\t\t\"Name\"\\\tAddress\n\n"

prints like this:

               "Name"\        Address


"Name" is preceded by two tabs; Address is preceded by one tab. The line is followed by two new lines. The \" provides interior double quotes.

If you compile with the -A option for ANSI compatibility, the escape character sequence "\\" is translated to "\" by the compiler.
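
As a minimal sketch of the escape sequences discussed above, the following C++ fragment stores the same literal and checks that each escape sequence became exactly one character. The names kHeader, count_char, and check_header are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// The literal from the text: \t is a tab, \" a double quote,
// \\ a single backslash, and \n a new line.
const char* const kHeader = "\t\t\"Name\"\\\tAddress\n\n";

// Hypothetical helper: counts how often character c occurs in s.
int count_char(const char* s, char c) {
    int n = 0;
    for (; *s != '\0'; ++s) {
        if (*s == c) ++n;
    }
    return n;
}

// Each two-character escape sequence is stored as one character.
void check_header() {
    assert(std::strlen(kHeader) == 19);      // 8 escapes + 11 letters
    assert(count_char(kHeader, '\t') == 3);  // two leading tabs, one before Address
    assert(count_char(kHeader, '"') == 2);   // the interior double quotes
    assert(count_char(kHeader, '\\') == 1);  // "\\" yields one backslash
    assert(count_char(kHeader, '\n') == 2);  // two trailing new lines
}
```

Calling check_header() from main completes silently when all the counts hold.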

A literal string is stored internally as the given sequence of characters plus a final null character ('\0'). A null string is stored as a single '\0' character.
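
The effect of the final null character can be observed with sizeof, which measures the whole array, and strlen, which stops before the null character. The sketch below assumes a C++11 compiler (for static_assert); storage_size and check_terminator are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// sizeof applied to a literal measures the whole array, null character included.
static_assert(sizeof("") == 1, "the null string holds one null character");
static_assert(sizeof("abc") == 4, "three characters plus the final null character");

// Hypothetical helper: bytes of storage occupied by a narrow string,
// that is, its visible length plus one for the terminating null character.
std::size_t storage_size(const char* s) {
    return std::strlen(s) + 1;
}

// Runtime check that the element after the last visible character is '\0'.
void check_terminator() {
    const char* s = "abc";
    assert(s[std::strlen(s)] == '\0');
    assert(storage_size(s) == sizeof("abc"));
}
```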

Four Types of String Literals in C++0x

By default, string literals are ANSI strings containing char characters. You can place the L, u, or U prefix immediately before a string literal to specify that it contains wide characters or Unicode characters (see Unicode Character Types and Literals (C++0x)):

  • A string literal preceded immediately by an L is a wide-character string containing characters of the wchar_t data type. When wchar_t is used in a C program, it is a type defined in the stddef.h header file. In C++ programs, wchar_t is a keyword. wchar_t strings use two bytes per character on Windows; on other target platforms wchar_t is typically four bytes. The value of a single wide character is that character's encoding in the execution wide-character set.
  • In C++0x programs, a string literal preceded immediately by a u character is a Unicode string containing characters of the char16_t data type. char16_t is a keyword declaring a 16-bit character type that holds UTF-16 code units. Each char16_t code unit occupies two bytes, and a Unicode character takes one or two code units, that is, two or four bytes.
  • In C++0x programs, a string literal preceded immediately by a U character is a Unicode string containing characters of the char32_t data type. char32_t is a keyword declaring a 32-bit character type that holds UTF-32 code units. Each char32_t character occupies four bytes.


That is, in C++0x programs, we can use the following four types of string literals:

  • "ANSI string" - this is an ANSI string literal containing char characters;
  • L"Wide-character string" - this string literal contains wchar_t characters;
  • u"UTF-16 string" - this string literal contains char16_t Unicode characters in UTF-16 encoding;
  • U"UTF-32 string" - this string literal contains char32_t Unicode characters in UTF-32 encoding.
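
The four forms can be compared at compile time. The following is a sketch that assumes a C++11 compiler; code_units is a hypothetical helper, not part of any library, that returns the number of array elements in a literal without counting the terminating null. The size of wchar_t is deliberately not checked because it differs between platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating null element.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// Each prefix selects a different element type.
static_assert(std::is_same<decltype("a"),  const char (&)[2]>::value,     "char");
static_assert(std::is_same<decltype(L"a"), const wchar_t (&)[2]>::value,  "wchar_t");
static_assert(std::is_same<decltype(u"a"), const char16_t (&)[2]>::value, "char16_t");
static_assert(std::is_same<decltype(U"a"), const char32_t (&)[2]>::value, "char32_t");

// U+1F600 lies outside the Basic Multilingual Plane: UTF-16 needs a
// surrogate pair (two code units), UTF-32 needs a single code unit.
static_assert(code_units(u"\U0001F600") == 2, "surrogate pair in UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "one code unit in UTF-32");
```

Because every check is a static_assert, a mismatch shows up as a compile error rather than at run time.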

Concatenating String Literals

You can use the backslash (\) as a continuation character to extend a string constant across line boundaries:

puts("This is really \
a one-line string");
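
One consequence of backslash continuation is worth illustrating: the compiler joins the two source lines exactly as written, so any indentation on the continuation line becomes part of the string. The sketch below uses the hypothetical names kJoined, kIndented, and check_continuation, and assumes the source file keeps the whitespace shown.

```cpp
#include <cassert>
#include <cstring>

// The continuation line starts in the first column, so nothing extra is stored.
const char* const kJoined = "This is really \
a one-line string";

// Here the continuation line is indented by four spaces, and those
// spaces end up inside the literal.
const char* const kIndented = "This is really \
    a one-line string";

void check_continuation() {
    assert(std::strcmp(kJoined, "This is really a one-line string") == 0);
    assert(std::strlen(kIndented) == std::strlen(kJoined) + 4);
}
```

Splitting the text into adjacent string literals avoids this dependence on indentation.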

Adjacent string literals separated only by whitespace are concatenated during the parsing phase, as the following example shows:

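#include <stdio.h>

int main() {
   /* Adjacent literals are merged into one string by the compiler. */
   const char *p;
   p = "This is an example of how the compiler "
       "will\nconcatenate very long strings for you"
       " automatically,\nresulting in nicer" " looking programs.";
   printf("%s", p);   /* pass p as data, not as the format string */
   return 0;
}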

The output of the program is:

This is an example of how the compiler will
concatenate very long strings for you automatically,
resulting in nicer looking programs.

