CERT C++: FLP36-C

Preserve precision when converting integral values to floating-point type

Description

Rule Definition

Preserve precision when converting integral values to floating-point type.[1]

Polyspace Implementation

The rule checker checks for Precision loss in integer to float conversion.

Examples

Issue

Precision loss from integer to float conversion occurs when you cast an integer value to a floating-point type that cannot represent the original integer value.

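For instance, the long int value 1234567890L lies within the range of float but needs 30 significant bits, more than the 24-bit significand that float provides on implementations that use IEEE 754 single precision.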
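To see why this value does not fit, it can help to count how many significant bits an integer needs, that is, the span from its highest set bit down to its lowest set bit. The following is a minimal sketch and not part of the checker: the helper names significant_bits and exactly_representable_as_float are hypothetical, and the code assumes that FLT_RADIX is 2 and that the magnitude lies within the range of float (true for 64-bit integers with IEEE 754 single precision).

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: binary floating point, so FLT_MANT_DIG counts bits. */
#if FLT_RADIX != 2
#error "This sketch assumes a binary floating-point format"
#endif

/* Hypothetical helper: number of bits from the highest set bit down to the
   lowest set bit of magnitude, inclusive. Returns 0 for 0. */
unsigned significant_bits(uintmax_t magnitude) {
    unsigned bits = 0;
    if (magnitude == 0) {
        return 0;
    }
    /* Trailing zero bits are absorbed by the exponent, not the significand. */
    while ((magnitude & 1u) == 0) {
        magnitude >>= 1;
    }
    /* Count the remaining bits up to and including the highest set bit. */
    while (magnitude != 0) {
        bits++;
        magnitude >>= 1;
    }
    return bits;
}

/* Hypothetical helper: true if magnitude needs no more bits than the float
   significand provides. The exponent range is not checked (see assumptions). */
bool exactly_representable_as_float(uintmax_t magnitude) {
    return significant_bits(magnitude) <= (unsigned)FLT_MANT_DIG;
}
```

With these definitions, significant_bits(1234567890) is 30, which exceeds the 24 bits of an IEEE 754 single-precision significand, whereas 16777216 (2 to the power 24) needs only 1 significant bit and converts exactly.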

Risk

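If the integer value is within the range of the floating-point type but cannot be represented exactly, the result is the nearest higher or nearest lower representable value, chosen in an implementation-defined manner. For instance, least significant bits of the value can be dropped, leading to unexpected results. If the integer value is outside the range of values that the floating-point type can represent, the behavior is undefined (see C11 standard, 6.3.1.4, paragraph 2).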

Fix

Convert to a floating-point type that can represent the integer value.

For instance, if the float data type cannot represent the integer value, use the double data type instead.

When writing a function that converts an integer to a floating-point type, check before the conversion whether the integer value can be represented in the floating-point type. For instance, DBL_MANT_DIG * log2(FLT_RADIX) represents the number of base-2 digits in the type double. Before converting to the type double, check that this number is greater than or equal to the precision of the integer type that you are converting.

The precision of an integer type is the number of bits that are set in its maximum value (for instance, UINT_MAX or LONG_MAX). To determine the precision from an unsigned variable num that holds this maximum value, use this code:

 /* num is unsigned and holds the maximum value of the integer type */
 size_t precision = 0;
 while (num != 0) {
   if (num % 2 == 1) {
     precision++;
   }
   num >>= 1;
 }

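Some implementations provide a built-in function that counts the set bits of an integer, which you can use to determine the precision. For instance, GCC provides the function __builtin_popcount for unsigned int arguments, along with __builtin_popcountl and __builtin_popcountll for wider types.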
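Putting these pieces together, a check of an integer type against a floating-point type might look like the following sketch. The names count_set_bits, type_fits_in_float, and type_fits_in_double are hypothetical. The sketch assumes that FLT_RADIX is 2, so that FLT_MANT_DIG and DBL_MANT_DIG already give the number of base-2 digits and the log2 factor equals 1.

```c
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: binary floating point, so *_MANT_DIG is a count of bits. */
#if FLT_RADIX != 2
#error "This sketch assumes a binary floating-point format"
#endif

/* Counts the bits that are set in num. When num is the maximum value of an
   integer type, the result is the precision of that type in bits. */
size_t count_set_bits(uintmax_t num) {
    size_t precision = 0;
    while (num != 0) {
        if (num % 2 == 1) {
            precision++;
        }
        num >>= 1;
    }
    return precision;
}

/* True if every value of an integer type whose maximum is type_max can be
   converted to float without losing precision. */
bool type_fits_in_float(uintmax_t type_max) {
    return count_set_bits(type_max) <= (size_t)FLT_MANT_DIG;
}

/* Same check against the significand of double. */
bool type_fits_in_double(uintmax_t type_max) {
    return count_set_bits(type_max) <= (size_t)DBL_MANT_DIG;
}
```

For example, a call such as type_fits_in_double(LONG_MAX) can guard a conversion from long int to double. On platforms where long int is 64 bits wide and double is IEEE 754 double precision, the check fails because 63 value bits exceed the 53-bit significand.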

Example - Conversion of Large Integer to Floating-Point Type

#include <stdio.h>

int main(void) {
  long int big = 1234567890L;
  float approx = big; //Noncompliant
  printf("%ld\n", (big - (long int)approx));
  return 0;
}

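In this example, the long int variable big is converted to float. Because float typically cannot represent 1234567890 exactly, approx holds a rounded value and the difference that the program prints is nonzero.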

Correction - Use a Wider Floating-Point Type

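One possible correction is to convert to the double data type instead of float. On implementations that use IEEE 754 double precision, double has a 53-bit significand, so the conversion of big is exact.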

#include <stdio.h>

int main(void) {
  long int big = 1234567890L;
  double approx = big;
  printf("%ld\n", (big - (long int)approx));
  return 0;
}
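
The difference between the noncompliant example and the correction can be made explicit by computing the conversion error for each type. This is an illustrative sketch with hypothetical helper names. It takes an int32_t so that converting the floating-point result back to long long stays in range, and it assumes that float and double are IEEE 754 single and double precision with the default round-to-nearest mode.

```c
#include <stdint.h>

/* Hypothetical helper: value minus the result of converting value to float.
   The magnitude of (float)value is at most 2^31, so the conversion back to
   long long cannot overflow. */
long long float_conversion_error(int32_t value) {
    /* volatile discards any excess precision kept in registers. */
    volatile float approx = (float)value;
    return (long long)value - (long long)approx;
}

/* Hypothetical helper: the same computation with double as the target. */
long long double_conversion_error(int32_t value) {
    volatile double approx = (double)value;
    return (long long)value - (long long)approx;
}
```

Under these assumptions, float_conversion_error(1234567890) is -46, because the nearest float is 1234567936, and double_conversion_error(1234567890) is 0.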

Check Information

Group: 49. Miscellaneous (MSC)

Version History

Introduced in R2019a


1 This software has been created by MathWorks incorporating portions of: the “SEI CERT-C Website,” © 2017 Carnegie Mellon University, the SEI CERT-C++ Web site © 2017 Carnegie Mellon University, ”SEI CERT C Coding Standard – Rules for Developing safe, Reliable and Secure systems – 2016 Edition,” © 2016 Carnegie Mellon University, and “SEI CERT C++ Coding Standard – Rules for Developing safe, Reliable and Secure systems in C++ – 2016 Edition” © 2016 Carnegie Mellon University, with special permission from its Software Engineering Institute.

ANY MATERIAL OF CARNEGIE MELLON UNIVERSITY AND/OR ITS SOFTWARE ENGINEERING INSTITUTE CONTAINED HEREIN IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.

This software and associated documentation has not been reviewed nor is it endorsed by Carnegie Mellon University or its Software Engineering Institute.