
Overview of Syntactic Classes in Polyspace Query Language

Syntactic information in code refers to the specific arrangement and structure of syntax tokens, such as keywords, braces, and punctuation. The syntactic information captures the exact way your code appears, including formatting and style choices, and is directly affected by changes in the code’s syntax. For example, consider these two functions:

int square(int x) {
    return x * x;
}

int square(int x)
{
    return x * x;
}

Even though the two functions are semantically the same, their syntax is not identical, so the syntactic information in the two functions is different. For example, the first function has the opening curly brace on the same line as the prototype, while the second function has the opening curly brace on the next line.
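The difference between the two versions can be made concrete with a small sketch. The Python code below is not part of PQL or Polyspace. The tokenizer and the helper name tokens_with_rows are assumptions made for illustration only. The sketch records each token together with its zero-based row, then shows that the token sequences match while the row of the opening brace differs.

```python
import re

# Deliberately simple tokenizer: identifiers, numbers, or single punctuation marks.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

def tokens_with_rows(source):
    """Return a list of (token, row) pairs; rows are zero-based."""
    result = []
    for row, line in enumerate(source.splitlines()):
        for match in TOKEN.finditer(line):
            result.append((match.group(), row))
    return result

same_line = "int square(int x) {\n    return x * x;\n}\n"
next_line = "int square(int x)\n{\n    return x * x;\n}\n"

a = tokens_with_rows(same_line)
b = tokens_with_rows(next_line)

# Both versions contain the same tokens in the same order.
same_tokens = [tok for tok, _ in a] == [tok for tok, _ in b]

# The opening brace sits on a different row in each version.
brace_row_a = next(row for tok, row in a if tok == "{")
brace_row_b = next(row for tok, row in b if tok == "{")

print(same_tokens, brace_row_a, brace_row_b)
```

A check that only compares token sequences would treat the two functions as equal. A syntactic check that also looks at token locations can tell them apart.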

Polyspace® Query Language (PQL) represents the syntax information of your code as a tree. The nodes of this tree represent various pieces of syntax information. Each kind of syntax tree node is modelled in PQL by a class. By using the predicates of the classes, you can check for specific relationships between nodes and detect an issue. This topic summarizes the syntactic classes in PQL. Use these classes to create user-defined syntactic defects. For more information about how these classes are used, see Detect Syntactic Issues Using Polyspace Query Language Syntactic Classes.

To review the PQL syntactic classes, see Create Your Own Coding Rules and Coding Standard. Once you initialize a coding standard using polyspace-query-language init, these classes are located in the .polyspace\api\cpp\syntax folder.

Syntax Tree

When you run a Polyspace analysis, Polyspace determines a syntax tree of your code. Each of the nodes in the tree has a corresponding PQL class. For example, consider this code:

void foo() {
    int x;
    int y;
    return x+y;
}

The syntax tree for this code can be summarized like this:

[Figure: The syntax tree of the code represented as a node diagram.]

This diagram shows the syntax nodes and how they are interrelated. For example, the node translation_unit represents the entire translation unit. For this code, the translation unit starts at the (row, column) location (0,0) and ends at (5,0). Rows and columns are zero-based. This node has a child node function_definition, which in turn has the child nodes function_declarator and compound_statement. A textual representation of this tree is:

translation_unit [0, 0] - [5, 0]
  function_definition [0, 0] - [4, 1]
    type: primitive_type [0, 0] - [0, 4]
    declarator: function_declarator [0, 5] - [0, 10]
      declarator: identifier [0, 5] - [0, 8]
      parameters: parameter_list [0, 8] - [0, 10]
    body: compound_statement [0, 11] - [4, 1]
      declaration [1, 4] - [1, 10]
        type: primitive_type [1, 4] - [1, 7]
        declarator: identifier [1, 8] - [1, 9]
      declaration [2, 4] - [2, 10]
        type: primitive_type [2, 4] - [2, 7]
        declarator: identifier [2, 8] - [2, 9]
      return_statement [3, 4] - [3, 15]
        binary_expression [3, 11] - [3, 14]
          left: identifier [3, 11] - [3, 12]
          right: identifier [3, 13] - [3, 14]
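
The location ranges in this listing can be checked against the source text. The Python sketch below is illustrative only and is not a Polyspace or PQL API. The helper span_of is a hypothetical name. The sketch assumes that rows and columns are zero-based and that the end column is one past the last character, which is consistent with the ranges listed above.

```python
source = "void foo() {\n    int x;\n    int y;\n    return x+y;\n}\n"

def span_of(source, text):
    """Return ((row, col), (row, col)) for the first occurrence of text.

    Rows and columns are zero-based, and the end column is one past the
    last character. Assumes text does not span multiple lines.
    """
    for row, line in enumerate(source.splitlines()):
        col = line.find(text)
        if col != -1:
            return (row, col), (row, col + len(text))
    raise ValueError(f"{text!r} not found")

print(span_of(source, "foo"))          # identifier in the function declarator
print(span_of(source, "int x;"))       # first declaration
print(span_of(source, "return x+y;"))  # return statement
print(span_of(source, "x+y"))          # binary expression
```

Each printed pair corresponds to the [row, column] - [row, column] range of the matching node in the listing.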

Each node of the syntax tree is represented by a specific PQL class. You can use the PQL classes to query various properties of a syntax node and define your own syntactic defect.

All syntax PQL classes are derived from the base class Cpp.Node.Node. Use this class to traverse the syntax tree of your code. For more information, see Traverse Syntax Tree Using Polyspace Query Language. In addition, all PQL syntax classes have access to a set of common predicates implemented in the class AstNodeProperties. This class contains predicates that query basic properties of a node, such as its start and end locations and basic information about its children.
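
To illustrate what traversing a tree and querying node properties involves, here is a minimal Python model of part of the tree for foo(). This is a sketch under stated assumptions, not PQL. The Node dataclass and the walk function are hypothetical and do not reflect the actual interface of Cpp.Node.Node or AstNodeProperties.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str      # node name, for example "declaration"
    start: tuple   # (row, column) where the node begins
    end: tuple     # (row, column) just past where the node ends
    children: list = field(default_factory=list)

def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)

# The compound_statement subtree from the listing for foo().
body = Node("compound_statement", (0, 11), (4, 1), [
    Node("declaration", (1, 4), (1, 10), [
        Node("primitive_type", (1, 4), (1, 7)),
        Node("identifier", (1, 8), (1, 9)),
    ]),
    Node("declaration", (2, 4), (2, 10), [
        Node("primitive_type", (2, 4), (2, 7)),
        Node("identifier", (2, 8), (2, 9)),
    ]),
    Node("return_statement", (3, 4), (3, 15), [
        Node("binary_expression", (3, 11), (3, 14), [
            Node("identifier", (3, 11), (3, 12)),
            Node("identifier", (3, 13), (3, 14)),
        ]),
    ]),
])

# Rows on which declarations start, and the number of identifier nodes.
declaration_rows = [n.start[0] for n in walk(body) if n.kind == "declaration"]
identifier_count = sum(1 for n in walk(body) if n.kind == "identifier")

print(declaration_rows, identifier_count, len(body.children))
```

A user-defined syntactic check follows the same pattern: select nodes of a given kind, then inspect their locations or children.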

Abstract Syntax Tree vs Concrete Syntax Tree

Syntax trees of C/C++ code are of two types:

  • Abstract Syntax Tree (AST) — A simplified, high-level syntax tree that drops some purely syntactic nodes.

  • Concrete Syntax Tree (CST) — A faithful reproduction of the source code text, including all syntactic punctuation.

Consider this code:

int add(int a, int b) { return a + b; }

The concrete and abstract syntax trees for this code are compared in this table:

CST

function_definition
  type: primitive_type "int"
  declarator: function_declarator
    declarator: identifier "add"
    parameters: parameter_list
      "("
      parameter_declaration
        type: primitive_type "int"
        declarator: identifier "a"
      ","
      parameter_declaration
        type: primitive_type "int"
        declarator: identifier "b"
      ")"
  body: compound_statement
    "{"
    return_statement
      "return"
      binary_expression
        left: identifier "a"
        operator: "+"
        right: identifier "b"
      ";"
    "}"

AST

function_definition
  type: primitive_type "int"
  declarator: function_declarator
    declarator: identifier "add"
    parameters: parameter_list
      parameter_declaration
        type: primitive_type "int"
        declarator: identifier "a"
      parameter_declaration
        type: primitive_type "int"
        declarator: identifier "b"
  body: compound_statement
    return_statement
      binary_expression
        left: identifier "a"
        operator: "+"
        right: identifier "b"

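Observe that the CST includes punctuation such as (, ), {, }, the comma, and ;, as well as the return keyword, as leaf tokens. The AST does not contain these tokens, but it keeps the + operator, which is labeled as the operator of the binary expression.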
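
The relationship between the two trees can be sketched as a filtering step. The Python code below is an illustrative model, not the way Polyspace builds its trees. The Node dataclass, the is_token flag, and the rule "drop leaf tokens that have no field label" are assumptions chosen to reproduce the two listings above for the statement return a + b;.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    kind: str                          # node name, or the token text for a leaf token
    field_name: Optional[str] = None   # label such as "left" or "operator", if any
    is_token: bool = False             # True for punctuation and keyword leaves
    children: list = field(default_factory=list)

def to_ast(node):
    """Copy the tree, dropping leaf tokens that carry no field label."""
    kept = [to_ast(child) for child in node.children
            if not (child.is_token and child.field_name is None)]
    return Node(node.kind, node.field_name, node.is_token, kept)

def count(node):
    """Count the nodes in a tree, including the root."""
    return 1 + sum(count(child) for child in node.children)

# CST of the statement `return a + b;`.
cst_root = Node("return_statement", children=[
    Node("return", is_token=True),
    Node("binary_expression", children=[
        Node("identifier", field_name="left"),
        Node("+", field_name="operator", is_token=True),
        Node("identifier", field_name="right"),
    ]),
    Node(";", is_token=True),
])

ast_root = to_ast(cst_root)
binary = ast_root.children[0]

print(count(cst_root), count(ast_root))
print([child.kind for child in ast_root.children])
print([child.kind for child in binary.children])
```

In this model, the return keyword and the semicolon disappear from the AST, while the + token survives because it is labeled as the operator.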

PQL allows you to access both the AST and the CST of your code. For example, the predicates of AstNodeProperties allow you to analyze both trees.

Syntax Classes

PQL has a syntax class corresponding to each type of node in the syntax tree. For example, the classes corresponding to some of the syntax tree nodes in the preceding code are:

Syntax tree node       PQL class
translation_unit       TranslationUnit
function_definition    FunctionDefinition
primitive_type         PrimitiveType
function_declarator    FunctionDeclarator
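
The pairs in this table follow a regular naming pattern: each underscore-separated word in the node name is capitalized and the underscores are removed. The Python helper below is a sketch of that pattern. The function name to_class_name is hypothetical. The helper assumes lowercase node names, and some PQL classes might not follow the pattern exactly.

```python
def to_class_name(node_name):
    """Convert a lowercase, underscore-separated node name to MixedCase."""
    return "".join(part.capitalize() for part in node_name.split("_"))

for name in ["translation_unit", "function_definition",
             "primitive_type", "function_declarator"]:
    print(name, "->", to_class_name(name))
```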

Generally, you can find the PQL class for a syntax node by converting the node name to MixedCase format. For example, the node function_definition corresponds to the class FunctionDefinition. Kinds of syntax classes include:

  • Basic syntax classes — Binary expressions, characters, files, scopes and other basic syntax nodes.

  • Abstract declarator classes — Abstract declaration of variables and functions. An abstract declaration declares an object without specifying its name.

  • Declarator classes — Declarator that declares various objects in C++ code.

  • Expression classes — Valid C/C++ expression.

  • Field declarator classes — Declarator for fields within a class, struct, or union.

  • Statement classes — Instruction that the program executes.

  • Type declarator classes — Declarator that declares the type of a variable or function.

  • Type specifier classes — Token that specifies the type of an object.

  • Other syntax classes — Other less commonly used syntax nodes.

To see a list of all syntactic classes, see Create Your Own Coding Rules and Coding Standard. Once you initialize a coding standard using polyspace-query-language init, these classes are located in the .polyspace\api\cpp\syntax folder.
