Universal character name from token concatenation

You create a universal character name by joining tokens with the ## operator.

Description

This defect occurs when two preprocessing tokens joined with a ## operator create a universal character name. A universal character name begins with \u followed by four hexadecimal digits or with \U followed by eight hexadecimal digits. It represents a character not found in the basic character set.

For instance, you form the character \u0401 by joining two tokens:

#define assign(uc1, uc2, val) uc1##uc2 = val
...
assign(\u04, 01, 4);
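
For reference, the two spellings of a universal character name can be sketched as follows. This is an illustrative sketch only: the function names short_form and long_form are hypothetical and not part of the checker documentation, and the comparison assumes an implementation where wide characters hold Unicode code points, which is common but not guaranteed by the C standard.

```c
#include <stddef.h>

/* \u is followed by exactly four hexadecimal digits. */
wchar_t short_form(void) {
    return L'\u0401';
}

/* \U is followed by exactly eight hexadecimal digits;
   here it names the same character as \u0401. */
wchar_t long_form(void) {
    return L'\U00000401';
}
```

In both functions the universal character name is written out in full as part of a single token, so no token concatenation is involved.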

Risk

The C11 Standard (Sec. 5.1.1.2) states that if a universal character name is formed by token concatenation, the behavior is undefined.

Fix

Use the universal character name directly instead of producing it through token concatenation.

Examples

#define assign(uc1, uc2, val) uc1##uc2 = val

int func(void) {
    int \u0401 = 0;
    assign(\u04, 01, 4); 
    return \u0401;
}

In this example, the assign macro, when expanded, joins the two tokens \u04 and 01 to form the universal character name \u0401.

Correction — Use Universal Character Name Directly

One possible correction is to use the universal character name \u0401 directly. The correction redefines the assign macro so that it does not join tokens.

#define assign(ucn, val) ucn = val

int func(void) {
    int \u0401 = 0;
    assign(\u0401, 4); 
    return \u0401;
}
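
By way of contrast, the ## operator itself remains well defined when the joined tokens form an ordinary identifier from the basic character set. The following is a minimal sketch under that assumption; the names make_name, total_count, and func_basic are hypothetical and do not come from the original example.

```c
/* Hypothetical macro: joins two ordinary identifier fragments.
   The result contains no universal character name. */
#define make_name(prefix, suffix) prefix##suffix

int func_basic(void) {
    int total_count = 0;
    make_name(total_, count) = 4;  /* expands to: total_count = 4; */
    return total_count;
}
```

If a name must contain a character outside the basic character set, keeping its universal character name as one complete, unsplit token (as in the correction above) avoids forming it through concatenation.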

Result Information

Group: Programming
Language: C | C++
Default: On for handwritten code, off for generated code
Command-Line Syntax: PRE_UCNAME_JOIN_TOKENS
Impact: Low

Version History

Introduced in R2018a