Verification of Neural Networks
This topic shows how to use Deep Learning Toolbox™ Verification Library to verify deep neural networks for safety-critical applications.
Engineers increasingly incorporate neural networks into safety-critical applications, including self-driving cars, diagnostic tools evaluating medical scans, and therapeutic AI language models. As a result, regulators and industries are working together to develop new industry-specific safety standards including standards for AI models.
Neural networks must reliably give correct answers as well as recognize their own uncertainty. Engineers must also be able to measure and confirm the reliability of their networks. In MATLAB®, you can verify properties of your neural network using Deep Learning Toolbox™ Verification Library. The library includes functions to test and improve the robustness of your neural network and to perform out-of-distribution detection. Use the drise function to generate saliency maps that explain the predictions of an object detector.
The full verification workflow can include a variety of tools. The following two links provide examples of the end-to-end workflow.
For a video showing how to verify a medical imaging classification model, see Understanding and Verifying Your AI Models.
The Runway Sign Classifier: Certify an Airborne Deep Learning System (DO Qualification Kit) example shows how to approach the certification of an airborne deep learning system that must comply with aviation industry standards such as DO-178C and ARP-4754.
Neural Network Robustness
Robustness measures how much the predictions of a neural network change with perturbations to the input data.
Neural networks can be susceptible to a phenomenon called adversarial examples [1]. You can generate adversarial examples by applying small perturbations to examples from the training data such that the model output is incorrect. These perturbations can be small enough to be imperceptible to a human. The Generate Untargeted and Targeted Adversarial Examples for Image Classification example demonstrates how to calculate perturbations that result either in a random wrong answer, or a particular wrong answer for an image classification algorithm.
This behavior can have obvious safety ramifications, both intentionally (through a targeted attack) and unintentionally (through random chance). The Generate Adversarial Examples for Semantic Segmentation example demonstrates how to generate adversarial examples for a network that uses semantic segmentation to identify different elements in images of road traffic. In the example, you trick a neural network into not recognizing a cyclist on an image of a road by adding an imperceptible perturbation.
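As a rough sketch of how an untargeted perturbation of this kind can be computed, the following FGSM-style helper nudges each image in the direction that increases the classification loss. The names here (fgsmPerturbation, inputGradient, net, X, T, epsilon) are placeholders, not library functions. The sketch assumes a dlnetwork object net whose output is class probabilities, a formatted dlarray X, one-hot encoded targets T, and a scalar perturbation size epsilon.

% Untargeted FGSM sketch: step each input in the direction that increases the loss.
function XAdv = fgsmPerturbation(net,X,T,epsilon)
[~,gradX] = dlfeval(@inputGradient,net,X,T);
XAdv = X + epsilon.*sign(gradX);
end

function [loss,gradX] = inputGradient(net,X,T)
Y = forward(net,X);            % assumes the network output is class probabilities
loss = crossentropy(Y,T);
gradX = dlgradient(loss,X);    % gradient of the loss with respect to the input
end

Even a small value of epsilon can be enough to change the predicted class while leaving the image visually unchanged.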
Measure Robustness
The Verify Robustness of Deep Learning Neural Network example shows how to verify the adversarial robustness of a deep learning neural network.
Deep Learning Toolbox Verification Library includes two functions to measure the robustness of a neural network.
For classification models, use verifyNetworkRobustness. For regression models, use estimateNetworkOutputBounds.
Both functions calculate bounds on the outputs of a neural network given a range of inputs. They are both based on the DeepPoly algorithm [2]. DeepPoly uses a mix of interval arithmetic and propagation of constraint polyhedra to rigorously compute the output bounds of a network. Statistical techniques, which sample a random subset of inputs within a given perturbation size, can strictly disprove robustness by finding a counterexample, but they cannot prove it. DeepPoly, by contrast, can also strictly prove robustness. The computations DeepPoly performs differ between layer types. For some layers, the algorithm finds exact bounds. For other layers, the algorithm finds only upper limits to the bounds. If those upper limits are within the desired maximum output, then the network is robust. However, if those upper limits are larger than the desired maximum output, then robustness is unproven.
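The following toy snippet illustrates the interval-arithmetic part of this idea on a single fully connected layer followed by a ReLU. It is a simplification (DeepPoly additionally propagates linear constraints between layers), and the weights, biases, and input box are made up for illustration.

% Toy interval bound propagation through one fully connected + ReLU layer.
W = [1 -2; 0.5 1];
b = [0; 1];
xLower = [0; 0];                          % lower corner of the input box
xUpper = [0.1; 0.1];                      % upper corner of the input box

Wpos = max(W,0);                          % split the weights by sign
Wneg = min(W,0);
zLower = Wpos*xLower + Wneg*xUpper + b;   % bounds on W*x + b over the box
zUpper = Wpos*xUpper + Wneg*xLower + b;

yLower = max(zLower,0);                   % ReLU is monotonic, so the bounds pass through
yUpper = max(zUpper,0);

Repeating this layer by layer yields outer bounds on the network outputs over the whole input box.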
The verifyNetworkRobustness function checks whether the network classifies all inputs in a given range as the same class as the specified label. The function has three possible results:
"verified"
— The network is robust to adversarial inputs between the specified bounds."violated"
— The network is not robust to adversarial inputs between the specified bounds."unproven"
— The function cannot prove whether the network is robust to adversarial inputs between the specified bounds.
The function verifies the network using the final layer. For most applications, use the final fully connected layer for verification. If your network has a different layer as its final layer (for example, softmax), remove the layer before calling the function.
The estimateNetworkOutputBounds
function estimates the range of output
values that a network returns when the input is between the specified lower and
upper bounds.
Measure Robustness of Pretrained Image Classification Network
Use the verifyNetworkRobustness
function to test the robustness of a pretrained image classification network.
load("digitsRobustClassificationMLPNetwork.mat"); netRobust = removeLayers(netRobust,"softmax"); netRobust = initialize(netRobust); [XTest,TTest] = digitTest4DArrayData; X = XTest(:,:,:,1:10); label = TTest(1:10); X = dlarray(X,"SSCB"); perturbation = 0.01; XLower = X - perturbation; XUpper = X + perturbation; result = verifyNetworkRobustness(netRobust,XLower,XUpper,label); summary(result)
result: 10x1 categorical

     verified        10 
     violated         0 
     unproven         0 
     <undefined>      0 
Use the estimateNetworkOutputBounds
function to estimate the output bounds of a pretrained regression network.
load("digitsRegressionMLPNetwork.mat"); [XTest,~,TTest] = digitTest4DArrayData; X = dlarray(XTest(:,:,:,1:10),"SSCB"); T = TTest(1:10); perturbation = 0.01; XLower = X - perturbation; XUpper = X + perturbation; [YLower,YUpper] = estimateNetworkOutputBounds(net,XLower,XUpper);
Plot the resulting estimates.
YLower = extractdata(YLower);
YUpper = extractdata(YUpper);

figure
errorbar(1:10,T,T-YLower',YUpper'-T,"k*")
axis padded
xlabel("Observation")
ylabel("Angle of Rotation")
Improve Robustness
You can use several methods to improve the robustness of your neural network.
The Train Image Classification Network Robust to Adversarial Examples example shows how to train a neural network that is robust to adversarial examples using fast gradient sign method (FGSM) adversarial training. In adversarial training, you apply adversarial perturbations to the training data during the training process. The network learns how to classify the perturbed images correctly and therefore is more robust to adversarial examples.
The Train Robust Deep Learning Network with Jacobian Regularization example shows how to train a neural network that is robust to adversarial examples using a Jacobian regularization scheme in a custom training loop. In this example, you augment the training data by adding random noise. Then you add a term to the loss function that penalizes the network for large changes in the prediction with respect to small changes in the input.
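The core of such a scheme is the modified loss function. The following sketch shows one possible model loss for a custom training loop, using a single random projection to estimate the squared Frobenius norm of the input-output Jacobian. The function and variable names (modelLossJacobianReg, net, X, T, lambda) are placeholders, and the scaling follows the linked example only approximately; it assumes a dlnetwork net that outputs softmax preactivations, a formatted "SSCB" dlarray X, and one-hot targets T.

% Call this function with dlfeval inside a custom training loop.
function [loss,gradients] = modelLossJacobianReg(net,X,T,lambda)
% Classification loss.
Z = forward(net,X);
lossCE = crossentropy(softmax(Z),T);

% One-projection estimate of the squared Frobenius norm of the Jacobian dZ/dX.
v = randn(size(Z),"like",extractdata(Z));
v = v./vecnorm(v,2,1);                                        % random unit vector per observation
JvT = dlgradient(sum(v.*Z,"all"),X,EnableHigherDerivatives=true);
miniBatchSize = size(X,4);
lossJacobian = size(Z,1)*sum(JvT.^2,"all")/miniBatchSize;

% Penalize large changes of the prediction with respect to small input changes.
loss = lossCE + lambda/2*lossJacobian;
gradients = dlgradient(loss,net.Learnables);
end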
Out-of-Distribution Detection
A trained neural network is a function that takes inputs of a given format and produces an output of a given format. Input data that is qualitatively different from training data still results in an output, as long as the format is correct. For example, consider a classification network that classifies images as "cat" or "cucumber." If you ask the network to classify an image of a dog, the network is likely to classify it as "cat," since a dog is in most ways much more similar to a cat than a cucumber.
To determine whether the output of your neural network is meaningful for a given input, you can classify the input data into in-distribution and out-of-distribution data.
In-distribution (ID) data is any data that you use to construct and train your model. Additionally, any data that is sufficiently similar to the training data is also said to be ID.
Out-of-distribution (OOD) data is data that is sufficiently different from the training data, for example, data collected in a different way, at a different time, under different conditions, or for a different task than the data on which the model was originally trained. Models can receive OOD data when you deploy them in an environment other than the one in which you train them. For example, suppose you train a model on clear X-ray images but then deploy the model on images taken with a lower quality camera.
Out-of-distribution detection algorithms work by calculating a confidence score and
comparing it to a threshold. You can manually choose this threshold. You can also use
Deep Learning Toolbox Verification Library to calculate the threshold based on one of several statistical techniques.
To do so, use the networkDistributionDiscriminator
function. The function returns a
discriminator object that contains information about the network, the algorithm used to
calculate the confidence scores, and the threshold used to separate data into ID and
OOD.
The Out-of-Distribution Detection for Deep Neural Networks example uses a pretrained classification network and softmax scores to determine if data is ID or OOD. The example also compares different ways of determining the distribution threshold.
The Out-of-Distribution Data Discriminator for YOLO v4 Object Detector example trains an object detector and creates a distribution discriminator object using the histogram-based outlier scores (HBOS) method for the confidence scores and a true positive goal for the threshold determination.
The Out-of-Distribution Detection for LSTM Document Classifier example trains a document classifier and compares different distribution discrimination algorithms to determine whether text data is ID or OOD.
Calculate Confidence Scores
You can compute distribution confidence scores using several types of methods. Two such types are softmax-based and density-based methods.
Softmax-based methods use the softmax preactivations, that is, the inputs to the softmax layer, to compute the scores. You can use this class of methods only for classification networks. Deep Learning Toolbox Verification Library provides the baseline method, ODIN method, and energy method.
Density-based methods use the probability density functions or layer activations to compute the scores. You can use this class of methods on different types of network architectures. Deep Learning Toolbox Verification Library provides the HBOS method.
For more information on softmax-based methods and density-based methods, see Distribution Confidence Scores.
The following diagrams show examples of softmax-based and density-based discriminators.
Compare Different Distribution Detection Algorithms
Deep Learning Toolbox Verification Library provides four distribution confidence score algorithms: the baseline method, energy method, ODIN method, and HBOS method. Compare the behavior of the four methods by using them on the same data. First, load the ID data, a set of images of handwritten digits. Modify the ID training data to create an OOD set.
XID = digitTrain4DArrayData;
XOOD = XID.*0.3 + 0.1;
Compare the ID and OOD data.
figure
tiledlayout(2,4,Padding="compact")
for i = 1:4
    nexttile(i)
    imshow(XID(:,:,:,i))
    title("ID "+i)
    nexttile(4+i)
    imshow(XOOD(:,:,:,i))
    title("OOD "+i)
end
Convert XID and XOOD into dlarray objects.
XID = dlarray(XID,"SSCB"); XOOD = dlarray(XOOD,"SSCB");
Next, load a network pretrained on the XID
data set and create the discriminator object using the networkDistributionDiscriminator
function. To determine whether your data is ID or OOD, pass the discriminator object to the isInNetworkDistribution
function. To calculate the distribution scores of your data, pass the discriminator object to the distributionScores
function. The function uses the algorithm specified by the "Method"
property of the discriminator.
Baseline Distribution Discriminator
The BaselineDistributionDiscriminator object uses the baseline method to compute distribution confidence scores. The baseline method uses the softmax scores to compute the confidence scores. Predictions with high softmax scores have high baseline confidence scores.
Load a classification network pretrained on the ID data set. Create the discriminator.
load("digitsClassificationMLPNetwork.mat"); discriminatorBaseline = networkDistributionDiscriminator(net,XID,XOOD,"baseline")
discriminatorBaseline = 
  BaselineDistributionDiscriminator with properties:

       Method: "baseline"
      Network: [1×1 dlnetwork]
    Threshold: 0.9743
To assess the performance of the discriminator, calculate the true positive rate (TPR) on the ID data and the false positive rate (FPR) on the OOD data.
tfOODBaseline = isInNetworkDistribution(discriminatorBaseline,XOOD);
tfIDBaseline = isInNetworkDistribution(discriminatorBaseline,XID);
TPRBaseline = nnz(tfIDBaseline)/numel(tfIDBaseline)
TPRBaseline = 0.9856
FPRBaseline = nnz(tfOODBaseline)/numel(tfOODBaseline)
FPRBaseline = 0.0598
To compare the distribution scores of the ID and OOD data, use the plotDistributionScores
function, defined at the end of this example.
scoresIDBaseline = distributionScores(discriminatorBaseline,XID);
scoresOODBaseline = distributionScores(discriminatorBaseline,XOOD);

figure
plotDistributionScores(discriminatorBaseline,scoresIDBaseline,scoresOODBaseline)
Energy Distribution Discriminator
The EnergyDistributionDiscriminator
object is a distribution discriminator that uses the energy method to compute distribution confidence scores. It is a softmax-based method. To tune the algorithm, use the Temperature
name-value argument.
Load a classification network pretrained on the ID data set. Create the discriminator. Set Temperature to 0.1.
load("digitsClassificationMLPNetwork.mat"); discriminatorEnergy = networkDistributionDiscriminator(net,XID,XOOD,"energy", ... Temperature=0.1)
discriminatorEnergy = 
  EnergyDistributionDiscriminator with properties:

         Method: "energy"
        Network: [1×1 dlnetwork]
    Temperature: 0.1000
      Threshold: 8.7246
To assess the performance of the discriminator on the OOD data, calculate the TPR and FPR.
tfOODEnergy = isInNetworkDistribution(discriminatorEnergy,XOOD);
tfIDEnergy = isInNetworkDistribution(discriminatorEnergy,XID);
TPREnergy = nnz(tfIDEnergy)/numel(tfIDEnergy)
TPREnergy = 0.9106
FPREnergy = nnz(tfOODEnergy)/numel(tfOODEnergy)
FPREnergy = 0.0934
To compare the distribution scores of the ID and OOD data, use the plotDistributionScores
function, defined at the end of this example.
scoresIDEnergy = distributionScores(discriminatorEnergy,XID);
scoresOODEnergy = distributionScores(discriminatorEnergy,XOOD);

figure
plotDistributionScores(discriminatorEnergy,scoresIDEnergy,scoresOODEnergy)
ODIN Distribution Discriminator
The ODINDistributionDiscriminator object enables you to compute distribution confidence scores by using the out-of-distribution detector for neural networks (ODIN) method. It is a softmax-based method. Similar to the energy distribution discriminator, the method is based on a rescaling of the softmax scores parameterized by the Temperature name-value argument. When the Temperature parameter is 1, the ODIN distribution discriminator is equal to the baseline distribution discriminator.
Load a classification network pretrained on the ID data set. Create the discriminator. Set Temperature to 2.
load("digitsClassificationMLPNetwork.mat"); discriminatorODIN = networkDistributionDiscriminator(net,XID,XOOD,"odin", ... Temperature=2)
discriminatorODIN = 
  ODINDistributionDiscriminator with properties:

         Method: "odin"
        Network: [1×1 dlnetwork]
    Temperature: 2
      Threshold: 0.7687
To assess the performance of the discriminator on the OOD data, calculate the TPR and FPR.
tfOODODIN = isInNetworkDistribution(discriminatorODIN,XOOD);
tfIDODIN = isInNetworkDistribution(discriminatorODIN,XID);
TPRODIN = nnz(tfIDODIN)/numel(tfIDODIN)
TPRODIN = 0.9766
FPRODIN = nnz(tfOODODIN)/numel(tfOODODIN)
FPRODIN = 0.0558
To compare the distribution scores of the ID and OOD data, use the plotDistributionScores
function, defined at the end of this example.
scoresIDODIN = distributionScores(discriminatorODIN,XID);
scoresOODODIN = distributionScores(discriminatorODIN,XOOD);

figure
plotDistributionScores(discriminatorODIN,scoresIDODIN,scoresOODODIN)
HBOS Distribution Discriminator
The HBOSDistributionDiscriminator object uses the histogram-based outlier scores (HBOS) method to compute distribution confidence scores and is a density-based method. Density-based methods compute distribution scores by describing the underlying features learned by the network as probabilistic models. Observations falling into areas of low density correspond to OOD observations.
The HBOS algorithm assumes that the features are statistically independent. The principal component features are pairwise linearly independent, but they can have nonlinear dependencies. To investigate feature dependencies, you can use functions such as corr (Statistics and Machine Learning Toolbox). For an example showing how to investigate feature dependence, see Out-of-Distribution Data Discriminator for YOLO v4 Object Detector. If the features are not statistically independent, then the algorithm can return poor results. Using multiple layers to compute the distribution scores can increase the number of statistically dependent features.
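For instance, one rough way to inspect such dependencies is to extract the activations of the layer that the discriminator uses and compute their pairwise linear correlations. This sketch assumes a dlnetwork net, the dlarray XID from the examples in this topic, and the layer name "relu_2" used below; it checks only linear correlation, not nonlinear dependence.

% Extract layer activations and check pairwise linear correlations between features.
act = predict(net,XID,Outputs="relu_2");     % features-by-observations dlarray
A = extractdata(act);
featureCorrelation = corr(double(A'));       % requires Statistics and Machine Learning Toolbox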
You can use the HBOS distribution discriminator on different types of network architectures, including regression networks.
Load a regression network pretrained on the ID data set. Create the discriminator. Use the penultimate layer to compute the HBOS distribution scores. Set the VarianceCutoff value to 0.05.
load("digitsRegressionMLPNetwork.mat"); discriminatorHBOS = networkDistributionDiscriminator(net,XID,XOOD,"hbos", ... VarianceCutoff=0.05, ... LayerNames="relu_2")
discriminatorHBOS = 
  HBOSDistributionDiscriminator with properties:

            Method: "hbos"
           Network: [1×1 dlnetwork]
        LayerNames: "relu_2"
    VarianceCutoff: 0.0500
         Threshold: -6.2918
To assess the performance of the discriminator on the OOD data, calculate the TPR and FPR.
tfOODHBOS = isInNetworkDistribution(discriminatorHBOS,XOOD);
tfIDHBOS = isInNetworkDistribution(discriminatorHBOS,XID);
TPRHBOS = nnz(tfIDHBOS)/numel(tfIDHBOS)
TPRHBOS = 0.7334
FPRHBOS = nnz(tfOODHBOS)/numel(tfOODHBOS)
FPRHBOS = 0.0508
To compare the distribution scores of the ID and OOD data, use the plotDistributionScores
function, defined at the end of this example.
scoresIDHBOS = distributionScores(discriminatorHBOS,XID);
scoresOODHBOS = distributionScores(discriminatorHBOS,XOOD);

figure
plotDistributionScores(discriminatorHBOS,scoresIDHBOS,scoresOODHBOS)
Supporting Function
This function plots a histogram of ID distribution scores and OOD distribution scores.
function plotDistributionScores(discriminator,scoresID,scoresOOD)

hID = histogram(scoresID);
hold on
hOOD = histogram(scoresOOD);

xl = xlim;
hID.BinWidth = (xl(2)-xl(1))/25;
hOOD.BinWidth = (xl(2)-xl(1))/25;

xline(discriminator.Threshold)

l = legend(["In-distribution scores","Out-of-distribution scores","Threshold"],Location="best");
title(l,discriminator.Method+" distribution discriminator")
xlabel("Distribution Confidence Scores")
ylabel("Frequency")
hold off

end
Calculate Out-of-Distribution Threshold
After you calculate the distribution scores of your data, compare them to a threshold to decide whether your data is in-distribution or out-of-distribution.
Manual Threshold
One option is to choose a threshold manually. For the baseline scores, you can use
the isInNetworkDistribution
function and pass the network, the data, and
the threshold as input arguments. In this case, the function normalizes the
threshold to lie between 0 and 1.
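For example, a minimal sketch, assuming the net and XID variables from the earlier examples and assuming that you pass the threshold as the Threshold name-value argument:

% Classify data as ID or OOD using a manually chosen baseline threshold (assumed syntax).
tfManual = isInNetworkDistribution(net,XID,Threshold=0.95);
ratioID = nnz(tfManual)/numel(tfManual)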
Optimal Threshold
You can also measure the quality of a threshold by the TPR and FPR. A good discriminator has a TPR close to 1 and an FPR close to 0.
The networkDistributionDiscriminator
function calculates an optimal
threshold. The optimization metric depends on the inputs.
If you provide only ID data to the networkDistributionDiscriminator function, or if you set the TruePositiveGoal name-value argument, then the function returns the threshold that correctly classifies at least this proportion of the ID data as ID, while keeping the false positive rate as low as possible. The default true positive goal is 0.95.
discriminatorOnlyID = networkDistributionDiscriminator(net,XID,[],"baseline");
discriminatorTPG = networkDistributionDiscriminator(net,XID,XOOD,"baseline",TruePositiveGoal=0.95);

discriminatorOnlyID.Threshold
discriminatorTPG.Threshold
If you provide only OOD data to the networkDistributionDiscriminator function, or if you set the FalsePositiveGoal name-value argument, then the function returns the threshold that incorrectly classifies at most this proportion of the OOD data as ID, while keeping the true positive rate as high as possible. The default false positive goal is 0.05.
discriminatorOnlyOOD = networkDistributionDiscriminator(net,[],XOOD,"baseline");
discriminatorFPG = networkDistributionDiscriminator(net,XID,XOOD,"baseline",FalsePositiveGoal=0.05);

discriminatorOnlyOOD.Threshold
discriminatorFPG.Threshold
If you provide both ID and OOD data and do not specify a true or false positive goal, then networkDistributionDiscriminator maximizes the TPR while minimizing the FPR, that is, it chooses the threshold that maximizes the difference TPR - FPR.
discriminatorIDAndOOD = networkDistributionDiscriminator(net,XID,XOOD,"baseline");
discriminatorIDAndOOD.Threshold
This figure illustrates the different thresholds that the discriminator chooses if you optimize over both the true positive rate and false positive rate, just the true positive rate, or just the false positive rate.
Other Techniques
You can use other techniques to verify the behavior of your neural network in MATLAB beyond the robustness and OOD detection methods included in Deep Learning Toolbox Verification Library.
Interpretability and Visualization
One way of verifying the behavior of your network is to understand its decision-making process by using interpretability techniques.
Deep Learning Toolbox Verification Library includes the drise
function. Use this function to calculate the saliency map to explain the predictions
of an object detection network by using the detector randomized input sampling for
explanation (D-RISE) algorithm.
For more information about interpreting machine learning models, see Interpret Machine Learning Models (Statistics and Machine Learning Toolbox).
For an overview of deep learning visualization methods in particular, see Deep Learning Visualization Methods.
Anomaly Detection
OOD detection typically takes advantage of the features of your trained neural network to determine whether input data is ID or OOD. You can also use statistical anomaly detection techniques directly on your input data to determine whether your data is significantly dissimilar to your training data. For more information on anomaly detection in MATLAB, see Anomaly Detection.
You can also use the HBOS distribution discriminator to determine whether your data is ID or OOD based only on the data itself. To do this, set the LayerNames name-value argument in the networkDistributionDiscriminator function to the input layer of your network, as in the sketch below.
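For example, a sketch assuming the net, XID, and XOOD variables from the earlier examples and an input layer named "imageinput" (check net.Layers for the actual layer name in your network):

% Create an HBOS discriminator that uses the raw input data instead of learned features.
% The layer name "imageinput" is a placeholder for the input layer of your network.
discriminatorInput = networkDistributionDiscriminator(net,XID,XOOD,"hbos", ...
    LayerNames="imageinput");
tfInput = isInNetworkDistribution(discriminatorInput,XOOD);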
References
[1] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples.” Preprint, submitted March 20, 2015. https://doi.org/10.48550/arXiv.1412.6572.
[2] Singh, Gagandeep, Timon Gehr, Markus Püschel, and Martin Vechev. “An Abstract Domain for Certifying Neural Networks”. Proceedings of the ACM on Programming Languages 3, no. POPL (January 2, 2019): 1–30. https://doi.org/10.1145/3290354.
See Also
estimateNetworkOutputBounds | verifyNetworkRobustness | networkDistributionDiscriminator | isInNetworkDistribution | distributionScores | drise | BaselineDistributionDiscriminator | ODINDistributionDiscriminator | EnergyDistributionDiscriminator | HBOSDistributionDiscriminator
Related Topics
- Verify an Airborne Deep Learning System
- Verify Robustness of Deep Learning Neural Network
- Out-of-Distribution Detection for Deep Neural Networks
- Out-of-Distribution Data Discriminator for YOLO v4 Object Detector
- Out-of-Distribution Detection for LSTM Document Classifier
- Out-of-Distribution Detection for BERT Document Classifier