Main Content

Modify Bug Finder Checkers Through Code Behavior Specifications

Polyspace® Bug Finder™ checks C/C++ code for defects or coding rule violations. To find these issues, some Bug Finder checkers require additional information outside your code.

You can specify most external information using the option -code-behavior-specifications. The option allows you to associate specific behaviors with elements in your code. For instance, you can map a custom library function to a standard function or specify that a function acts as a memory management function.

The option takes a file in one of these formats as argument:

  • XML

  • Datalog

The XML and Datalog format support different code behaviors. This topic describes the checker modifications supported with each file format. For the full list of options that can modify checkers, see Modify Default Behavior of Bug Finder Checkers.

XML Format

The code behavior specifications XML file associates specific behaviors with functions (or defines behaviors at global scope). The following sections list behaviors that are relevant to a Bug Finder analysis. For behaviors relevant to Code Prover, see Modify Code Prover Run-Time Checks Through Code Behavior Specifications (Polyspace Code Prover).

You can also see the supported behaviors in the sample template file code-behavior-specifications-template.xml provided with a Polyspace installation. The file is in polyspaceroot\polyspace\verifier\cxx\ where polyspaceroot is the Polyspace installation folder.

Mapping to Standard Function for Precision Improvement

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="custom_function" std="std_function"> 
        </function>
    </functions>
</specifications>

Use this entry in the XML file to reduce the number of false negatives in a Bug Finder analysis. Sometimes, the analysis does not model certain kinds of functions precisely because of inherent limitations in static verification. In those cases, if you find a standard function that is a close analog of your function, use this mapping. Though your function itself is not analyzed, the analysis is more precise at the locations where you call the function. For instance, if the analysis cannot model your function cos32 precisely and considers full range for its return value, map it to the cos function for a return value in [-1,1].

The analysis ignores the body of your function but emulates your function behavior in the following ways:

  • The analysis assumes the same return values for your function as the standard function.

    For instance, if you map your function cos32 to the standard function cos, the verification assumes that cos32 returns values in [-1,1].

  • The analysis checks for the same issues as it checks with the standard function.

    For instance, if you map your function acos32 to the standard function acos, the Invalid use of standard library routine (Polyspace Code Prover) check determines if the argument of acos32 is in [-1,1].

The functions that you can map to include:

  • Standard library functions from math.h.

  • Memory management functions from string.h.

If you run Code Prover, you can map to other functions to improve the verification precision. See Modify Code Prover Run-Time Checks Through Code Behavior Specifications (Polyspace Code Prover).

See also Extend Bug Finder Checkers for Standard Library Functions to Custom Libraries.

Mapping to Standard Function for Concurrency Detection

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="custom_function" std="std_function"> 
        </function>
    </functions>
</specifications>

Use this entry in the XML file for automatic detection of thread-creation functions and functions that begin and end critical sections. Polyspace supports automatic detection for certain families of multitasking primitives only. Extend the support using this XML entry.

If your thread-creation function, for instance, does not belong to one of the supported families, map your function to a supported concurrency primitive.

See Extend Concurrency Defect Checkers to Unsupported Multithreading Environments.

Blocking Functions, Keywords or Macros

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="my_func">
            <behavior name="FORBIDDEN_FUNC"/>
        </function>
    </functions>
</specifications>

Use this entry in the XML file to specify that a function my_func must not be used in your source code.

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <tokens>
        <token name="my_keyword" kind="keyword">
            <behavior name="FORBIDDEN_KEYWORD"/>
        </token>
    </tokens>
</specifications>

Use this entry in the XML file to specify that a keyword my_keyword must not be used in your source code.

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <token name="my_macro" kind="macro">
          <behavior name="FORBIDDEN_MACRO"/>
    </token>
</specifications>

Use this entry in the XML file to specify that a macro my_macro must not be used in your source code.

Note that you can use the * wildcard for functions and keywords (but not for macros). For instance, to forbid all functions whose names contain DEBUG, you can enter:

<function name="*DEBUG*">
      <behavior name="FORBIDDEN_FUNC"/>
</function>
You can specify multiple similarly-named macros in the list of forbidden macros using regular expressions. To use regular expressions, in addition to the attributes name and kind, you have to set an attribute regex to true. For instance, to forbid all macros starting with DEFINED_, enter:
<token name="DEFINED_.*" kind="macro" regex="true">
    <behavior name="FORBIDDEN_MACRO"/>
</token>

See Flag Deprecated or Unsafe Functions, Keywords, or Macros Using Bug Finder Checkers.

Extending Initialization Checks

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="my_func">
            <check name=name="ARGUMENT_POINTS_TO_INITIALIZED_VALUE" arg="n/>
        </function>
    </functions>
</specifications>

Use this entry in the XML file to specify if the pointer argument to a function my_func must point to an initialized buffer. The number n specifies which argument must be checked for buffer initialization.

See Extend Checkers for Initialization to Check Function Arguments Passed by Pointers.

Modifying Global Behavior

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <global_scope>
        <parameter name="MAX_NUMBER_NESTED_LEVEL_CONTROL_FLOW" value="n1"/>
        <parameter name="MAX_NUMBER_NESTED_LEVEL_INCLUDES" value="n2"/>
        <parameter name="MAX_NUMBER_CONSTANT_IN_ENUMERATION" value="n3"/>
        <parameter name="MAX_NUMBER_MACROS_TRANSLATION_UNIT" value="n4"/>
        <parameter name="MAX_NUMBER_MEMBERS_IN_STRUCT" value="n5"/>
        <parameter name="MAX_NUMBER_NESTED_MEMBERS_IN_STRUCT" value="n6"/>
        <parameter name="NUMBER_SIGNIFICANT_CHARACTER_EXTERNAL_IDENTIFIER" value="n7"/>
        <parameter name="NUMBER_SIGNIFICANT_CHARACTER_INTERNAL_IDENTIFIER" value="n8"/>
    </global_scope>
</specifications>
Here, n1,..,n8 specifies numerical values.

Use the entries n1,..,n6 for the following parameters to specify limits on certain aspects of your program. The modifications affect the checking of MISRA C:2012 Rule 1.1.

  • MAX_NUMBER_NESTED_LEVEL_CONTROL_FLOW: Maximum depth of nesting allowed in control flow statements.

  • MAX_NUMBER_NESTED_LEVEL_INCLUDES: Maximum levels of inclusion allowed using include files.

  • MAX_NUMBER_CONSTANT_IN_ENUMERATION: Maximum number of constants allowed in an enumeration.

  • MAX_NUMBER_MACROS_TRANSLATION_UNIT: Maximum number of macros allowed in a translation unit.

  • MAX_NUMBER_MEMBERS_IN_STRUCT: Maximum number of members allowed in a structure.

  • MAX_NUMBER_NESTED_MEMBERS_IN_STRUCT: Maximum levels of nesting allowed in a structure.

Use the entries n7 and n8 to specify how many characters must be compared to determine if two identifiers as identical. The modifications affect the checking of Rules 5.x.

  • NUMBER_SIGNIFICANT_CHARACTER_EXTERNAL_IDENTIFIER: Number of characters to compare for external identifiers. External identifiers are ones declared with global scope or storage class extern.

  • NUMBER_SIGNIFICANT_CHARACTER_INTERNAL_IDENTIFIER: Number of characters to compare for internal identifiers.

Specifying Real-Time Functions

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="my_func">
            <behavior name="REAL_TIME_FUNC"/>
        </function>
    </functions>
</specifications>

Use this entry in the XML file to specify a function my_func as a real-time function See AUTOSAR C++14 Rule A18-5-7.

Specifying Functions that Manage Memory

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="my_func">
            <behavior name="MANAGES_MEMORY"/>
        </function>
    </functions>
</specifications>

Use this entry in the XML file to specify a function my_func as a function that allocates or deallocates dynamic memory. See AUTOSAR C++14 Rule A18-5-7.

Specifying Functions that Handle Exceptions

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <functions>
        <function name="my_func">
            <behavior name="EXCEPTION_HANDLING"/>
        </function>
    </functions>
</specifications>

Use this entry in the XML file to specify a function my_func as a function that handles one or more exceptions. See AUTOSAR C++14 Rule A15-0-7.

Specifying Critical Data Members

XML Syntax:

<?xml version="1.0" encoding="UTF-8"?>
<specifications xmlns="http://www.mathworks.com/PolyspaceCodeBehaviorSpecifications">
    <members>
        <member name="my_critical_object" kind="variable">
            <behavior name="CRITICAL_DATA"/>
        </member>
    </members>
</specifications>

Use this entry in the XML file to specify the data member my_critical_object as critical. See Critical data member is not private.

You can specify multiple similarly-named data members in the list of critical data members using regular expressions. To use regular expressions, in addition to the attributes name and kind, you have to set an attribute regex to true.

For instance, to consider all member names starting with user as critical data members, enter:

<member name="user.*" kind="variable" regex="true">
    <behavior name="CRITICAL_DATA"/>
</member>

Datalog Format

Datalog is a declarative logic programming language that Polyspace Bug Finder uses to create abstractions of function behavior. You can enter code behavior specifications in Datalog and add to the set of functions already abstracted for the Polyspace analysis.

The Datalog language, as used in Polyspace, is based on declaring relations between elements in the code. However, you do not need to know the full Datalog language to hook into the Polyspace analysis. You can simply invoke already declared relations from the Datalog-based API, using a function-like syntax.

Datalog Example

Consider the following example. The function insertIntoDatabase() reads a string from standard input and uses the string in the construction of an SQL query.

#include <stdio.h>
#include <string.h>
#include <sqlite3.h>

void addToName(char *, char *);

// Function to insert names into database
void insertIntoDatabase(void)
{
    sqlite3* db;
    char query[256] = "";
    char partialName[256];
    char fullName[256];
    
    char* error_message;
    int nameLength;

    sqlite3_open("userCredentials.db", &db);

    fread(partialName, 1, 128, stdin);
    
    addToName(partialName, fullName); //Function with unknown semantics

    strcat(query, "INSERT INTO privatenames (name) VALUES ('");
    strcat(query, fullName);
    strcat(query, "');");

    // Possible SQL injection here
    if (sqlite3_exec(db, query, NULL, NULL, &error_message))
        {
            sqlite3_free(error_message);
        }
}
This code has the possibility of an SQL injection if the function addToName() copies the content of its first argument to its second argument entirely or partly. The function is effectively passing on contents entered through standard input into the SQL query.

However, the defect checker SQL injection cannot find this defect because the function addToName() is not defined in this code and its semantics are not known. (In some cases, if the function is sufficiently complex, the analysis might lose track of a data flow from one function argument to another despite the function being defined.)

To help with the analysis, you can externally specify the semantics of the addToName() function using Datalog syntax. In a file with extension .dl, for instance bufferFlows.dl, enter the following:

.include "models/interfaces/taint.dl"

Taint.Basic.hasFlow("addToName", $InParameterDeref(0), $OutParameterDeref(1), "Source buffer copied to destination").
This file:

  • Includes the file polyspaceroot\polyspace\pql\operating_system\models\interfaces\taint.dl. Here, polyspaceroot is the Polyspace installation folder, for instance, C:\Program Files\R2024b and operating_system is one of win64, glnxa64, and maci64.

    This file contains declarations of the relations that you invoke in your file.

  • Invokes the relation Taint.Basic.hasFlow to specify that there is a data flow from the first parameter of the function addToName() to the second parameter:

    • $InParameterDeref(0) indicates the first parameter and designates the parameter as an input parameter. If a parameter is designated as an input parameter, its value prior to the function call is relevant for the relation.

    • $OutParameterDeref(1) indicates the second parameter and designates the parameter as output parameter. If a parameter is designated as an output parameter, its value after the function call is relevant for the relation.

    • The last argument is a message that can be associated with the function addToName. If Bug Finder finds a defect that uses the Datalog specification, the message appears in the event list below the defect.

Specify the Datalog file as argument for the analysis option -code-behavior-specifications, for instance:

-code-behavior-specifications bufferFlows.dl
The SQL injection checker is now able to detect the data flow through the function addToName() and report a possible SQL injection when the function sqlite3_exec() is executed.

The two options that support Datalog file specification are:

Behaviors Supported by Datalog Format

The following checkers can be modified using code behavior specifications in Datalog format.

The following code behaviors can be specified to modify the checkers:

  • Taint.Basic.hasFlow: Specifies a function that propagates data from one function argument to another.

    For instance, suppose that the following function addToName() propagates data from the dereference of the first parameter to the dereference of the second parameter:

    void addToName (char *, char *);
    You can specify this behavior using the following Datalog relation:
    Taint.Basic.hasFlow("addToName", $InParameterDeref(0), $OutParameterDeref(1), "Source buffer copied to destination").

    All supported checkers can be modified using this code behavior.

  • Untrusted.Basic.isSource: Specifies a function parameter or return value that represents potentially tainted data.

    For instance, the following specifications:

    Untrusted.Basic.isSource("fileOpen", $OutParameterDeref(1), "Opens an untrusted stream").
    Untrusted.Basic.isSource("fileOpen_v2", $OutReturnDeref(), "Returns an untrusted stream").
    Imply that these data are potentially tainted:

    • Dereference of the second parameter of a fileOpen() function, for instance with signature FILE * fileOpen (const char *, FILE * , const char*).

    • Dereference of the return value of a fileOpen_v2() function, for instance with signature FILE * fileOpen_v2 ( int, const char*).

    All supported checkers can be modified using this code behavior.

  • Sql.Basic.execution: Specifies a function that executes an SQL query.

    For instance,suppose that the following function sql_exec() executes an SQL query and the first parameter is the query string:

    int sql_exec (char*, sqlite3* , int (*callback)(void*,int,char**,char**) , void*, char**);
    You can specify this behavior using the following Datalog relation:
    Sql.Basic.execution("sql_exec", $InParameterDeref(0)).

    The checker that can be modified using this behavior is SQL injection.

  • Sql.Basic.sanitizing: Specifies a function that sanitizes data to be used in an SQL query.

    For instance, suppose that the following function sql_sanitize() sanitizes data to be used in an SQL query and the first parameter is the sanitized string:

    int sql_sanitize (char *, int);
    You can specify this behavior using the following Datalog relation:
    Sql.Basic.sanitizing("sql_sanitize", $OutParameterDeref(0)).

    The checker that can be modified using this behavior is SQL injection.

  • Ldap.Basic.sensitive: Specifies a function that executes an LDAP query.

    For instance, suppose that the following function search_ldap_db() executes an LDAP search and the fourth parameter is the search filter string:

    int search_ldap_db (LDAP *, const char *, int, const char *, char **, int);
    You can specify this behavior using the following Datalog relation:
    Ldap.Basic.sensitive("search_ldap_db", $InParameterDeref(3)).

    The checker that can be modified using this behavior is LDAP injection.

  • Ldap.Basic.sanitizing:Specifies a function that sanitizes data to be used in an LDAP query.

    For instance, suppose that the following function ldap_validate_inputs() sanitizes data to be used in an LDAP search and the first parameter is the sanitized string:

    int ldap_validate_inputs (char *, int);
    You can specify this behavior using the following Datalog relation:
    Ldap.Basic.sanitizing("ldap_validate_inputs", $OutParameterDeref(0)).
    

    The checker that can be modified using this behavior is LDAP injection.

  • CustomTainted.Basic.taintSource: Specifies a function that must be considered as an tainted or untrusted source for the checker Tainted source used with sensitive function. This related must be specified as:

    .comp checkerConfigName : CustomTainted {
          Basic.taintSource("getResponse",$OutReturnValue(),"Data from getResponse() is tainted.").
    }
    .init checkerName = tainted_source_use_custom<checkerConfigName>

  • CustomTainted.Basic.sensitive: Specifies a function that must be considered as a sensitive function for the checker Tainted source used with sensitive function. This relation must be specified as:

    .comp checkerConfigName : CustomTainted {
          Basic.sensitive("changeSystemState",$InParameterValue(0),"changeSystemState() must not use tainted data.").
    }
    .init checkerName = tainted_source_use_custom<checkerConfigName>

  • CustomTainted.Basic.sanitizing: Specifies a function that must be considered as a sanitizing function for the checker Tainted source used with sensitive function. This related must be specified as:

    .comp checkerConfigName : CustomTainted {
          Basic.sanitizing("ensureSafeResponse",$OutParameterDeref(0)).
    }
    .init checkerName = tainted_source_use_custom<checkerConfigName>

See Also

Related Topics