Misuse of sign-extended character value
Data type conversion with sign extension causes unexpected behavior
Description
This defect occurs when you convert a signed or plain char variable containing possible negative values to a wider integer data type (or perform an arithmetic operation that does the conversion) and then use the resulting value in one of these ways:
- For comparison with EOF (using == or !=)
- As an array index
- As an argument to a character-handling function in ctype.h, for instance, isalpha() or isdigit()
If you convert a signed char variable with a negative value to a wider type such as int, the sign bit is preserved (sign extension). This can lead to specific problems even in situations where you think you have accounted for the sign bit.
For instance, the signed char value -1 can represent the character EOF (end-of-file), which is an invalid character. Suppose a char variable var acquires this value. If you treat var as a char variable, you might write special code to account for this invalid character value. However, if you perform an operation such as var++ (which involves integer promotion), the result is 0, which accidentally represents the valid character '\0'. The arithmetic operation has turned an invalid value into a valid one.
Even for negative values other than -1, a conversion from signed char to signed int can lead to other issues. For instance, the signed char value -126 has the same 8-bit representation as the unsigned char value 130 (corresponding to the extended character '\202'). If you convert the value from char to int, the sign bit is preserved. If you then cast the resulting value to unsigned int, you get an unexpectedly large value, 4294967170 (assuming 32-bit int). If your code expects the unsigned char value 130 in the final unsigned int variable, you can see unexpected results.
The underlying cause of this issue is sign extension during conversion to a wider type. Most architectures use two's complement representation for storing values. In this representation, the most significant bit indicates the sign of the value. When a value is converted to a wider type, this sign bit is copied to all the leading bits of the wider type, so that the sign is preserved. For instance, the char value -3 is represented as 11111101 (assuming 8-bit char). When converted to int, the representation is:
11111111 11111111 11111111 11111101
This bit pattern still represents -3 as an int. However, when converted to unsigned int, the value (4294967293, assuming 32-bit int) is no longer the same as the unsigned char equivalent (253) of the original char value. If you are not aware of this issue, you can see unexpected results in your code.
Risk
In the following cases, Bug Finder flags the use of variables after a conversion from char to a wider data type, or after an arithmetic operation that implicitly converts the variable to a wider data type:
If you compare the variable value with EOF:
A char value of -1 can represent the invalid character EOF or the valid extended character '\377' (corresponding to the unsigned char equivalent, 255). After a char variable is cast to a wider type such as int, sign extension turns the char value -1, which represents either EOF or '\377', into the int value -1, which represents only EOF. The unsigned char value 255 can no longer be recovered from the int variable. Bug Finder flags this situation so that you can cast the variable to unsigned char first (or avoid the char-to-int conversion or converting operation before the comparison with EOF). Only then is a comparison with EOF meaningful. See Sign-Extended Character Value Compared with EOF.
If you use the variable value as an array index:
After a char variable is cast to a wider type such as int, all negative values retain their sign because of sign extension. If you use the negative values directly to access an array, you cause a buffer overflow or underflow. Even when you account for the negative values, the way you account for them might result in incorrect elements being read from the array. See Sign-Extended Character Value Used as Array Index.
If you pass the variable value as an argument to a character-handling function:
According to the C11 standard (Section 7.4), if you supply an integer argument that cannot be represented as unsigned char or EOF, the resulting behavior is undefined. Bug Finder flags this situation because negative char values, after conversion, can no longer be represented as unsigned char or EOF. For instance, the signed char value -126 is equivalent to the unsigned char value 130, but the signed int value -126 cannot be represented as unsigned char or EOF.
Fix
Before conversion to a wider integer data type, cast the signed or plain char value explicitly to unsigned char.
If you use the char data type not to represent characters but simply as a smaller data type to save memory, your use of sign-extended char values might not be subject to the risks mentioned earlier. If so, add comments to your result or code to avoid another review. See:
- Address Results in Polyspace User Interface Through Bug Fixes or Justifications if you review results in the Polyspace user interface.
- Address Results in Polyspace Access Through Bug Fixes or Justifications (Polyspace Access) if you review results in a web browser.
- Annotate Code and Hide Known or Acceptable Results if you review results in an IDE.
Examples
Result Information
Group: Programming
Language: C | C++
Default: On for handwritten code, off for generated code
Command-Line Syntax: CHARACTER_MISUSE
Impact: Medium
Version History
Introduced in R2017a
See Also
- Find defects (-checkers)
- Invalid use of standard library integer routine
- Returned value of a sensitive function not checked
- Errno not checked
- Character value absorbed into EOF
Topics
- Interpret Bug Finder Results in Polyspace Desktop User Interface
- Interpret Bug Finder Results in Polyspace Access Web Interface (Polyspace Access)
- Address Results in Polyspace User Interface Through Bug Fixes or Justifications
- Address Results in Polyspace Access Through Bug Fixes or Justifications (Polyspace Access)