Unicode character escape sequences 


Мы поможем в написании ваших работ!



ЗНАЕТЕ ЛИ ВЫ?

Unicode character escape sequences



A Unicode character escape sequence represents a Unicode character. Unicode character escape sequences are processed in identifiers (§2.4.2), character literals (§2.4.4.4), and regular string literals (§2.4.4.5). A Unicode character escape is not processed in any other location (for example, to form an operator, punctuator, or keyword).

unicode-escape-sequence:
\u hex-digit hex-digit hex-digit hex-digit
\U hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit hex-digit

A Unicode escape sequence represents the single Unicode character formed by the hexadecimal number following the “\u” or “\U” characters. Since C# uses a 16-bit encoding of Unicode code points in characters and string values, a Unicode character in the range U+10000 to U+10FFFF is not permitted in a character literal and is represented using a Unicode surrogate pair in a string literal. Unicode characters with code points above 0x10FFFF are not supported.

Multiple translations are not performed. For instance, the string literal “\u005Cu005C” is equivalent to “\u005C” rather than “\”. The Unicode value \u005C is the character “\”.

The example

class Class1
{
static void Test(bool \u0066) {
char c = '\u0066';
if (\u0066)
System.Console.WriteLine(c.ToString());
}
}

shows several uses of \u0066, which is the escape sequence for the letter “f”. The program is equivalent to

class Class1
{
static void Test(bool f) {
char c = 'f';
if (f)
System.Console.WriteLine(c.ToString());
}
}

Identifiers

The rules for identifiers given in this section correspond exactly to those recommended by the Unicode Standard Annex 31, except that underscore is allowed as an initial character (as is traditional in the C programming language), Unicode escape sequences are permitted in identifiers, and the “@” character is allowed as a prefix to enable keywords to be used as identifiers.

identifier:
available-identifier
@ identifier-or-keyword

available-identifier:
An identifier-or-keyword that is not a keyword

identifier-or-keyword:
identifier-start-character identifier-part-charactersopt

identifier-start-character:
letter-character
_ (the underscore character U+005F)

identifier-part-characters:
identifier-part-character
identifier-part-characters identifier-part-character

identifier-part-character:
letter-character
decimal-digit-character
connecting-character
combining-character
formatting-character

letter-character:
A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl
A unicode-escape-sequence representing a character of classes Lu, Ll, Lt, Lm, Lo, or Nl

combining-character:
A Unicode character of classes Mn or Mc
A unicode-escape-sequence representing a character of classes Mn or Mc

decimal-digit-character:
A Unicode character of the class Nd
A unicode-escape-sequence representing a character of the class Nd

connecting-character:
A Unicode character of the class Pc
A unicode-escape-sequence representing a character of the class Pc

formatting-character:
A Unicode character of the class Cf
A unicode-escape-sequence representing a character of the class Cf

For information on the Unicode character classes mentioned above, see The Unicode Standard, Version 3.0, section 4.5.

Examples of valid identifiers include “identifier1”, “_identifier2”, and “@if”.

An identifier in a conforming program must be in the canonical format defined by Unicode Normalization Form C, as defined by Unicode Standard Annex 15. The behavior when encountering an identifier not in Normalization Form C is implementation-defined; however, a diagnostic is not required.

The prefix “@” enables the use of keywords as identifiers, which is useful when interfacing with other programming languages. The character @ is not actually part of the identifier, so the identifier might be seen in other languages as a normal identifier, without the prefix. An identifier with an @ prefix is called a verbatim identifier. Use of the @ prefix for identifiers that are not keywords is permitted, but strongly discouraged as a matter of style.

The example:

class @class
{
public static void @static(bool @bool) {
if (@bool)
System.Console.WriteLine("true");
else
System.Console.WriteLine("false");
}
}

class Class1
{
static void M() {
cl\u0061ss.st\u0061tic(true);
}
}

defines a class named “class” with a static method named “static” that takes a parameter named “bool”. Note that since Unicode escapes are not permitted in keywords, the token “cl\u0061ss” is an identifier, and is the same identifier as “@class”.

Two identifiers are considered the same if they are identical after the following transformations are applied, in order:

· The prefix “@”, if used, is removed.

· Each unicode-escape-sequence is transformed into its corresponding Unicode character.

· Any formatting-characters are removed.

Identifiers containing two consecutive underscore characters (U+005F) are reserved for use by the implementation. For example, an implementation might provide extended keywords that begin with two underscores.

Keywords

A keyword is an identifier-like sequence of characters that is reserved, and cannot be used as an identifier except when prefaced by the @ character.

keyword: one of
abstract as base bool break
byte case catch char checked
class const continue decimal default
delegate do double else enum
event explicit extern false finally
fixed float for foreach goto
if implicit in int interface
internal is lock long namespace
new null object operator out
override params private protected public
readonly ref return sbyte sealed
short sizeof stackalloc static string
struct switch this throw true
try typeof uint ulong unchecked
unsafe ushort using virtual void
volatile while

In some places in the grammar, specific identifiers have special meaning, but are not keywords. Such identifiers are sometimes referred to as “contextual keywords”. For example, within a property declaration, the “get” and “set” identifiers have special meaning (§10.7.2). An identifier other than get or set is never permitted in these locations, so this use does not conflict with a use of these words as identifiers. In other cases, such as with the identifier “var” in implicitly typed local variable declarations (§8.5.1), a contectual keyword can conflict with declared names. In such cases, the declared name takes precedence over the use of the identifier as a contextual keyword.

Literals

A literal is a source code representation of a value.

literal:
boolean-literal
integer-literal
real-literal
character-literal
string-literal
null-literal

Boolean literals

There are two boolean literal values: true and false.

boolean-literal:
true
false

The type of a boolean-literal is bool.

Integer literals

Integer literals are used to write values of types int, uint, long, and ulong. Integer literals have two possible forms: decimal and hexadecimal.

integer-literal:
decimal-integer-literal
hexadecimal-integer-literal

decimal-integer-literal:
decimal-digits integer-type-suffixopt

decimal-digits:
decimal-digit
decimal-digits decimal-digit

decimal-digit: one of
0 1 2 3 4 5 6 7 8 9

integer-type-suffix: one of
U u L l UL Ul uL ul LU Lu lU lu

hexadecimal-integer-literal:
0x hex-digits integer-type-suffixopt
0X hex-digits integer-type-suffixopt

hex-digits:
hex-digit
hex-digits hex-digit

hex-digit: one of
0 1 2 3 4 5 6 7 8 9 A B C D E F a b c d e f

The type of an integer literal is determined as follows:

· If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.

· If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.

· If the literal is suffixed by L or l, it has the first of these types in which its value can be represented: long, ulong.

· If the literal is suffixed by UL, Ul, uL, ul, LU, Lu, lU, or lu, it is of type ulong.

If the value represented by an integer literal is outside the range of the ulong type, a compile-time error occurs.

As a matter of style, it is suggested that “L” be used instead of “l” when writing literals of type long, since it is easy to confuse the letter “l” with the digit “1”.

To permit the smallest possible int and long values to be written as decimal integer literals, the following two rules exist:

· When a decimal-integer-literal with the value 2147483648 (231) and no integer-type-suffix appears as the token immediately following a unary minus operator token (§7.7.2), the result is a constant of type int with the value −2147483648 (−231). In all other situations, such a decimal-integer-literal is of type uint.

· When a decimal-integer-literal with the value 9223372036854775808 (263) and no integer-type-suffix or the integer-type-suffix L or l appears as the token immediately following a unary minus operator token (§7.7.2), the result is a constant of type long with the value −9223372036854775808 (−263). In all other situations, such a decimal-integer-literal is of type ulong.

Real literals

Real literals are used to write values of types float, double, and decimal.

real-literal:
decimal-digits. decimal-digits exponent-partopt real-type-suffixopt
. decimal-digits exponent-partopt real-type-suffixopt
decimal-digits exponent-part real-type-suffixopt
decimal-digits real-type-suffix

exponent-part:
e signopt decimal-digits
E signopt decimal-digits

sign: one of
+ -

real-type-suffix: one of
F f D d M m

If no real-type-suffix is specified, the type of the real literal is double. Otherwise, the real type suffix determines the type of the real literal, as follows:

· A real literal suffixed by F or f is of type float. For example, the literals 1f, 1.5f, 1e10f, and 123.456F are all of type float.

· A real literal suffixed by D or d is of type double. For example, the literals 1d, 1.5d, 1e10d, and 123.456D are all of type double.

· A real literal suffixed by M or m is of type decimal. For example, the literals 1m, 1.5m, 1e10m, and 123.456M are all of type decimal. This literal is converted to a decimal value by taking the exact value, and, if necessary, rounding to the nearest representable value using banker's rounding (§4.1.7). Any scale apparent in the literal is preserved unless the value is rounded or the value is zero (in which latter case the sign and scale will be 0). Hence, the literal 2.900m will be parsed to form the decimal with sign 0, coefficient 2900, and scale 3.

If the specified literal cannot be represented in the indicated type, a compile-time error occurs.

The value of a real literal of type float or double is determined by using the IEEE “round to nearest” mode.

Note that in a real literal, decimal digits are always required after the decimal point. For example, 1.3F is a real literal but 1.F is not.

Character literals

A character literal represents a single character, and usually consists of a character in quotes, as in 'a'.

character-literal:
' character '

character:
single-character
simple-escape-sequence
hexadecimal-escape-sequence
unicode-escape-sequence

single-character:
Any character except ' (U+0027), \ (U+005C), and new-line-character

simple-escape-sequence: one of
\' \" \\ \0 \a \b \f \n \r \t \v

hexadecimal-escape-sequence:
\x hex-digit hex-digitopt hex-digitopt hex-digitopt

A character that follows a backslash character (\) in a character must be one of the following characters: ', ", \, 0, a, b, f, n, r, t, u, U, x, v. Otherwise, a compile-time error occurs.

A hexadecimal escape sequence represents a single Unicode character, with the value formed by the hexadecimal number following “\x”.

If the value represented by a character literal is greater than U+FFFF, a compile-time error occurs.

A Unicode character escape sequence (§2.4.1) in a character literal must be in the range U+0000 to U+FFFF.

A simple escape sequence represents a Unicode character encoding, as described in the table below.

 

Escape sequence Character name Unicode encoding
\' Single quote 0x0027
\" Double quote 0x0022
\\ Backslash 0x005C
\0 Null 0x0000
\a Alert 0x0007
\b Backspace 0x0008
\f Form feed 0x000C
\n New line 0x000A
\r Carriage return 0x000D
\t Horizontal tab 0x0009
\v Vertical tab 0x000B

 

The type of a character-literal is char.

String literals

C# supports two forms of string literals: regular string literals and verbatim string literals.

A regular string literal consists of zero or more characters enclosed in double quotes, as in "hello", and may include both simple escape sequences (such as \t for the tab character), and hexadecimal and Unicode escape sequences.

A verbatim string literal consists of an @ character followed by a double-quote character, zero or more characters, and a closing double-quote character. A simple example is @"hello". In a verbatim string literal, the characters between the delimiters are interpreted verbatim, the only exception being a quote-escape-sequence. In particular, simple escape sequences, and hexadecimal and Unicode escape sequences are not processed in verbatim string literals. A verbatim string literal may span multiple lines.

string-literal:
regular-string-literal
verbatim-string-literal

regular-string-literal:
" regular-string-literal-charactersopt "

regular-string-literal-characters:
regular-string-literal-character
regular-string-literal-characters regular-string-literal-character

regular-string-literal-character:
single-regular-string-literal-character
simple-escape-sequence
hexadecimal-escape-sequence
unicode-escape-sequence

single-regular-string-literal-character:
Any character except " (U+0022), \ (U+005C), and new-line-character

verbatim-string-literal:
@" verbatim-string-literal-charactersopt "

verbatim-string-literal-characters:
verbatim-string-literal-character
verbatim-string-literal-characters verbatim-string-literal-character

verbatim-string-literal-character:
single-verbatim-string-literal-character
quote-escape-sequence

single-verbatim-string-literal-character:
Any character except "

quote-escape-sequence:
""

A character that follows a backslash character (\) in a regular-string-literal-character must be one of the following characters: ', ", \, 0, a, b, f, n, r, t, u, U, x, v. Otherwise, a compile-time error occurs.

The example

string a = "hello, world"; // hello, world
string b = @"hello, world"; // hello, world

string c = "hello \t world"; // hello world
string d = @"hello \t world"; // hello \t world

string e = "Joe said \"Hello\" to me"; // Joe said "Hello" to me
string f = @"Joe said ""Hello"" to me"; // Joe said "Hello" to me

string g = "\\\\server\\share\\file.txt"; // \\server\share\file.txt
string h = @"\\server\share\file.txt"; // \\server\share\file.txt

string i = "one\r\ntwo\r\nthree";
string j = @"one
two
three";

shows a variety of string literals. The last string literal, j, is a verbatim string literal that spans multiple lines. The characters between the quotation marks, including white space such as new line characters, are preserved verbatim.

Since a hexadecimal escape sequence can have a variable number of hex digits, the string literal "\x123" contains a single character with hex value 123. To create a string containing the character with hex value 12 followed by the character 3, one could write "\x00123" or "\x12" + "3" instead.

The type of a string-literal is string.

Each string literal does not necessarily result in a new string instance. When two or more string literals that are equivalent according to the string equality operator (§7.10.7) appear in the same program, these string literals refer to the same string instance. For instance, the output produced by

class Test
{
static void Main() {
object a = "hello";
object b = "hello";
System.Console.WriteLine(a == b);
}
}

is True because the two literals refer to the same string instance.

The null literal

null-literal:
null

The null-literal can be implicitly converted to a reference type or nullable type.

Operators and punctuators

There are several kinds of operators and punctuators. Operators are used in expressions to describe operations involving one or more operands. For example, the expression a + b uses the + operator to add the two operands a and b. Punctuators are for grouping and separating.

operator-or-punctuator: one of
{ } [ ] ().,:;
+ - * / % & | ^! ~
= < >???:: ++ -- && ||
-> ==!= <= >= += -= *= /= %=
&= |= ^= << <<= =>

right-shift:
>|>

right-shift-assignment:
>|>=

The vertical bar in the right-shift and right-shift-assignment productions are used to indicate that, unlike other productions in the syntactic grammar, no characters of any kind (not even whitespace) are allowed between the tokens. These productions are treated specially in order to enable the correct handling of type-parameter-lists (§10.1.3).

Pre-processing directives

The pre-processing directives provide the ability to conditionally skip sections of source files, to report error and warning conditions, and to delineate distinct regions of source code. The term “pre-processing directives” is used only for consistency with the C and C++ programming languages. In C#, there is no separate pre-processing step; pre-processing directives are processed as part of the lexical analysis phase.

pp-directive:
pp-declaration
pp-conditional
pp-line
pp-diagnostic
pp-region
pp-pragma

The following pre-processing directives are available:

· #define and #undef, which are used to define and undefine, respectively, conditional compilation symbols (§2.5.3).

· #if, #elif, #else, and #endif, which are used to conditionally skip sections of source code (§2.5.4).

· #line, which is used to control line numbers emitted for errors and warnings (§2.5.7).

· #error and #warning, which are used to issue errors and warnings, respectively (§2.5.5).

· #region and #endregion, which are used to explicitly mark sections of source code (§2.5.6).

· #pragma, which is used to specify optional contextual information to the compiler (§2.5.8).

A pre-processing directive always occupies a separate line of source code and always begins with a # character and a pre-processing directive name. White space may occur before the # character and between the # character and the directive name.

A source line containing a #define, #undef, #if, #elif, #else, #endif, #line, or #endregion directive may end with a single-line comment. Delimited comments (the /* */ style of comments) are not permitted on source lines containing pre-processing directives.

Pre-processing directives are not tokens and are not part of the syntactic grammar of C#. However, pre-processing directives can be used to include or exclude sequences of tokens and can in that way affect the meaning of a C# program. For example, when compiled, the program:

#define A
#undef B

class C
{
#if A
void F() {}
#else
void G() {}
#endif

#if B
void H() {}
#else
void I() {}
#endif
}

results in the exact same sequence of tokens as the program:

class C
{
void F() {}
void I() {}
}

Thus, whereas lexically, the two programs are quite different, syntactically, they are identical.



Поделиться:


Последнее изменение этой страницы: 2016-12-14; просмотров: 633; Нарушение авторского права страницы; Мы поможем в написании вашей работы!

infopedia.su Все материалы представленные на сайте исключительно с целью ознакомления читателями и не преследуют коммерческих целей или нарушение авторских прав. Обратная связь - 18.118.120.109 (0.048 с.)