
findAdversarialExamples

Find adversarial examples for MATLAB, ONNX, and PyTorch classification networks

Since R2026a

    Description

    Add-On Required: This feature requires the AI Verification Library for Deep Learning Toolbox add-on.

    dlnetwork adversarial examples

    [example,mislabel] = findAdversarialExamples(net,XLower,XUpper,label) creates untargeted adversarial examples, example, for the network net within the bounds XLower and XUpper. Specify the expected correct label using the label argument. The function also returns the actual predicted label, mislabel.


    [example,mislabel,iX,iE] = findAdversarialExamples(net,XLower,XUpper,label) also returns index vectors iX and iE. You can find adversarial examples for several sets of input bounds and labels at once. However, the findAdversarialExamples function does not always find an adversarial example. If the generated example is not misclassified as expected, then the function does not return it. Therefore, the batch dimension of example can be smaller than the batch dimensions of XLower, XUpper, and label. To find out which example corresponds to which set of inputs, use the index vectors iX and iE to index into the input and example batches, respectively.
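
    As a minimal sketch of how the index vectors recover this correspondence (assuming net, XLower, XUpper, and label are defined as in the examples below):

    [example,mislabel,iX,iE] = findAdversarialExamples(net,XLower,XUpper,label);

    % The n-th returned example corresponds to the iX(n)-th input.
    n = 1;
    matchingLowerBound = XLower(:,:,:,iX(n));
    expectedLabel = label(iX(n));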

    ___ = findAdversarialExamples(___,AdversarialLabel=adversarialLabel) creates targeted adversarial examples that the network incorrectly classifies as adversarialLabel instead of label.


    ___ = findAdversarialExamples(___,Name=Value) specifies additional options using one or more name-value arguments.

    ONNX and PyTorch network adversarial examples

    This syntax requires the Deep Learning Toolbox Interface for alpha-beta-CROWN Verifier add-on.

    [example,mislabel] = findAdversarialExamples(modelfile,XLower,XUpper,label,numClasses) creates untargeted adversarial examples, example, between XLower and XUpper from the pretrained ONNX™ or PyTorch® network in modelfile. Specify the expected correct label using the label argument and the number of classes in the network with the numClasses argument. The function also returns the actual predicted label, mislabel.


    [example,mislabel,iX,iE] = findAdversarialExamples(modelfile,XLower,XUpper,label,numClasses) also returns index vectors iX and iE. You can find adversarial examples for several sets of input bounds and labels at once. However, the findAdversarialExamples function does not always find an adversarial example. If the created example is not misclassified as expected, then the function does not return it. Therefore, the batch dimension of example can be smaller than the batch dimensions of XLower, XUpper, and label. To find out which example corresponds to which set of inputs, use the index vectors iX and iE to index into the input and example batches, respectively.

    ___ = findAdversarialExamples(___,Name=Value) specifies additional options using one or more name-value arguments.

    Examples


    Load a pretrained network. This network has been trained to classify images of digits.

    rng(1)
    load("digitsClassificationConvolutionNet.mat","net")
    classNames = categorical(0:9);

    Load the test dataset, then randomly select a subset of samples to use for generating adversarial examples.

    [XTest,TTest] = digitTest4DArrayData; 
    
    numInputs = 10;
    testIdx = randi(numel(TTest),numInputs,1);
    imgs = XTest(:,:,:,testIdx);
    labels = TTest(testIdx,:);

    Prepare the data by converting it to a dlarray object.

    X = dlarray(single(imgs),"SSCB");

    Find the labels predicted by the network.

    scores = predict(net,X);
    YTest = scores2label(scores,classNames);

    In this example, the values of the pixels are between 0 and 1, so specify a maximum perturbation size of 0.1. Clip the lower and upper bounds so that they remain within the range of the input data.

    perturbationSize = 0.1;
    
    XLower = max(X-perturbationSize,0);
    XUpper = min(X+perturbationSize,1);

    Use the findAdversarialExamples function to find adversarial examples.

    [examples,mislabels,iX] = findAdversarialExamples(net,XLower,XUpper,labels);

    For the first adversarial example, view the original image and the adversarial example side-by-side. The adversarial example is misclassified even though the adversarial image appears very similar to the original image.

    adversarialExampleIndex = 1;
    inputIndex = iX(adversarialExampleIndex);
    
    figure
    tiledlayout(1,2); 
    
    nexttile(1);
    imshow(imgs(:,:,:,inputIndex));
    title({"Original Image (Class: " + string(labels(inputIndex)) + ")", ...
        "Predicted Class: " + string(YTest(inputIndex))});
    
    nexttile(2) 
    imshow(extractdata(examples(:,:,:,adversarialExampleIndex))); 
    title({"Adversarial Example (Class: " + string(labels(inputIndex)) + ")", ...
        "Predicted Class: " + string(mislabels(adversarialExampleIndex))});

    Figure: the original image (class 4, predicted class 4) and the adversarial example (class 4, predicted class 8), shown side by side.

    Load a pretrained network. This network has been trained to classify waveforms into one of four classes: sawtooth, sine, square, or triangle.

    rng("default")
    load("trainedWaveformClassificationNetwork.mat","net")

    Load a test input.

    load("WaveformData");
    classNames = unique(labels)
    classNames = 4×1 categorical
         Sawtooth 
         Sine 
         Square 
         Triangle 
    
    
    numChannels = size(data{1},2);
    testIdx = 1;
    input = data{testIdx};
    label = labels(testIdx)
    label = categorical
         Sine 
    
    

    Prepare the input by converting it to a dlarray object.

    X = dlarray(single(input),"TC");

    Find the class predicted by the network.

    score = predict(net,X);
    YTest = scores2label(score,classNames)
    YTest = categorical
         Sine 
    
    

    Find adversarial examples. As this data has values in the range [-1,1], specify a maximum perturbation size of 0.3.

    perturbationSize = 0.3;
    
    XLower = max(X-perturbationSize,-1);
    XUpper = min(X+perturbationSize,1);

    To specify additional options, create an adversarialOptions object. Set the step size to 0.1 and the number of iterations to 50.

    options = adversarialOptions("bim",StepSize=0.1,NumIterations=50)
    options = 
      AdversarialOptionsBIM with properties:
    
                    StepSize: 0.1000
               NumIterations: 50
               MiniBatchSize: 128
        ExecutionEnvironment: 'auto'
                     Verbose: 0
    
    

    Use the findAdversarialExamples function to find an adversarial example for the test input. If no example is found, the function returns []. Find an adversarial example that misclassifies the input as "Sawtooth".

    [example,mislabel] = findAdversarialExamples(net,XLower,XUpper,label, ...
        Algorithm=options,AdversarialLabel=categorical("Sawtooth",string(classNames)));

    View the original input and the adversarial example side-by-side.

    figure
    tiledlayout(1,2);
    
    nexttile(1);
    stackedplot(input,DisplayLabels="Channel "+string(1:numChannels))
    title({"Original Input (Class: " + string(label) + ")", ...
        "Predicted Class: " + string(YTest)});
    
    nexttile(2)
    stackedplot(extractdata(squeeze(example))',DisplayLabels="Channel "+string(1:numChannels));
    title({"Adversarial Example (Class: " + string(label) + ")", ...
        "Predicted Class: " + string(mislabel)});

    Figure: stacked plots of the original waveform and the adversarial example, shown side by side.

    Load a pretrained classification network. This network is a PyTorch® model that has been trained to predict the class label of images of handwritten digits.

    rng(1)
    modelfile = "digitsClassificationConvolutionNet.pt";
    numClasses = 10;

    Load the test dataset, then randomly select a subset of samples to use for generating adversarial examples.

    [XTest,TTest] = digitTest4DArrayData; 
    numInputs = 10;
    testIdx = randi(numel(TTest),numInputs,1);
    X = XTest(:,:,:,testIdx);
    labels = TTest(testIdx,:);

    In this example, the values of the pixels are between 0 and 1, so specify a maximum perturbation size of 0.1. Clip the lower and upper bounds so that they remain within the range of the input data.

    perturbationSize = 0.1;
    XLower = max(X-perturbationSize,0);
    XUpper = min(X+perturbationSize,1);

    Use the findAdversarialExamples function to generate adversarial examples.

    [examples,mislabels,iX] = findAdversarialExamples(modelfile,XLower,XUpper,labels,numClasses, ...
        Algorithm="bim", ...
        InputDataPermutation=[4 3 1 2]);

    For the first adversarial example, view the original image and the adversarial example side-by-side.

    adversarialExampleIndex = 1;
    inputIndex = iX(adversarialExampleIndex);
    
    figure
    tiledlayout(1,2);
    nexttile(1);
    imshow(X(:,:,:,inputIndex));
    title("Original Image");
    
    nexttile(2) 
    imshow(extractdata(examples(:,:,:,adversarialExampleIndex))); 
    title("Adversarial Example");

    Load a pretrained network. This network has been trained to classify natural RGB images.

    rng("default")
    [net,classNames] = imagePretrainedNetwork;
    inputSize = net.Layers(1).InputSize(1:2);

    Load a test image and resize it to the expected network input size. This is an image of a golden retriever.

    img = imread("sherlock.jpg");
    img = imresize(img,inputSize);
    X = dlarray(single(img),"SSCB");
    
    label = categorical("golden retriever",classNames);

    Find the label predicted by the network.

    score = predict(net,X);
    YTest = scores2label(score,classNames)
    YTest = categorical
         golden retriever 
    
    

    This image has values in the range [0, 255]. Generate lower and upper bounds with a maximum perturbation size of ±10. Ensure that the values do not go below 0 or above 255.

    perturbationSize = 10;
    
    XLower = max(X-perturbationSize,0);
    XUpper = min(X+perturbationSize,255);

    The default step size is suitable for inputs with values in the range [0,1]. Because this input has values up to 255, create an adversarial options object with a step size of 1 and the number of iterations set to 2.

    options = adversarialOptions("bim",StepSize=1,NumIterations=2);
    [example,mislabel] = findAdversarialExamples(net,XLower,XUpper,label,Algorithm=options);

    View the original image and the adversarial example side-by-side. The adversarial example is misclassified even though the adversarial image appears very similar to the original image.

    figure
    tiledlayout(1,2); 
    
    nexttile(1);
    imshow(img);
    title({"Original Image (Class: " + string(label) + ")", ...
        "Predicted Class: " + string(YTest)});
    nexttile(2) 
    imshow(uint8(extractdata(example))); 
    title({"Adversarial Example", "Predicted Class: " + string(mislabel)});

    Figure: the original image (class golden retriever, predicted class golden retriever) and the adversarial example (predicted class Italian greyhound), shown side by side.

    Input Arguments


    XLower — Lower bound of search space

    Lower bound of the search space for the adversarial examples, specified as a formatted dlarray object or a numeric array.

    • If you provide a dlnetwork object as input, XLower must be a formatted dlarray object. For more information about dlarray formats, see the fmt input argument of dlarray.

    • If you provide an ONNX or PyTorch modelfile as input, XLower must be a numeric array.

    The lower and upper bounds, XLower and XUpper, must have the same size and format.

    XUpper — Upper bound of search space

    Upper bound of the search space for the adversarial examples, specified as a formatted dlarray object or a numeric array.

    • If you provide a dlnetwork object as input, XUpper must be a formatted dlarray object. For more information about dlarray formats, see the fmt input argument of dlarray.

    • If you provide an ONNX or PyTorch modelfile as input, XUpper must be a numeric array.

    The lower and upper bounds, XLower and XUpper, must have the same size and format.

    label — Expected correct label

    Expected correct label for input data between XLower and XUpper, specified as a numeric vector of class indices or as a categorical array.

    The number of elements of label must be equal to the number of observations in XLower and XUpper.

    Example: categorical("cat",["cat","dog","bird"])

    dlnetwork only

    net — Neural network

    Neural network, specified as an initialized dlnetwork object.

    The findAdversarialExamples function does not support networks that have multiple inputs or multiple outputs.

    ONNX or PyTorch only

    Since R2026a

    modelfile — ONNX or PyTorch model file name

    ONNX or PyTorch model file name, specified as a character vector or a string scalar. The model file must be a full PyTorch model (saved using torch.save()) or an ONNX model with the .onnx extension.

    Note

    The Python® classification network must be saved without a softmax layer at the output.

    numClasses — Number of output classes

    Number of output classes, specified as a positive integer. This value is the number of output classes in the pretrained ONNX or PyTorch network.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: findAdversarialExamples(net,XLower,XUpper,label,Algorithm="bim") finds untargeted adversarial examples using the basic iterative method (BIM).

    All Model Types


    Algorithm — Algorithm to find adversarial examples

    Algorithm to find adversarial examples, specified as a character vector or string scalar of a built-in algorithm name, or as a built-in algorithm object.

    • Built-in algorithm name. Specify the algorithm as a string scalar or character vector.

      • "bim" — Basic iterative method

      • "fgsm" — Fast gradient sign method

    • Built-in algorithm object. If you need more flexibility, you can use the built-in algorithm objects.

      • Create the following algorithm objects using the adversarialOptions function. Applicable only when using a dlnetwork object as input.

        • AdversarialOptionsBIM — Basic iterative method object

        • AdversarialOptionsFGSM — Fast gradient sign method object

      • NetworkVerificationOptions — α-β-CROWN network verification object. Applicable only when using an ONNX or PyTorch modelfile as input.

    dlnetwork only


    AdversarialLabel — Adversarial label

    Adversarial label, specified as a numeric vector of class indices or as a categorical array. Use this name-value argument to find targeted adversarial examples.

    ONNX or PyTorch only


    InputDataPermutation — Input dimension ordering

    Input dimension ordering, specified as a numeric row vector. The ordering is the permutation that maps the data in XLower and XUpper from MATLAB dimension ordering to Python dimension ordering. For more information, see Input Dimension Ordering.

    Example: [4 3 1 2]
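
    For intuition, the permutation [4 3 1 2] describes the reordering from a MATLAB image batch (height-by-width-by-channels-by-batch) to the PyTorch convention (batch-by-channels-by-height-by-width). This sketch shows the equivalent permute call for illustration only, not necessarily what the function does internally:

    % MATLAB image batch: H-by-W-by-C-by-N (dimensions 1 2 3 4)
    XBatch = rand(28,28,1,10,"single");
    % Reorder to the Python convention N-by-C-by-H-by-W
    XBatchPy = permute(XBatch,[4 3 1 2]);
    size(XBatchPy)   % 10 1 28 28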

    Number of dimensions in the input data, specified as a positive integer.

    Example: 4

    Hardware resource, specified as one of these values:

    • "auto" – Use a local GPU if one is available. Otherwise, use the local CPU.

    • "cpu" – Use the local CPU.

    • "gpu" – Use the local GPU.

    The "gpu" option requires Parallel Computing Toolbox™. To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If you specify "gpu" and Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.

    For more information on when to use the different execution environments, see Scale Up Deep Learning in Parallel, on GPUs, and in the Cloud.

    Dependency

    If you specify Algorithm as an AlphaCROWNOptions or a NetworkVerificationOptions object, then the execution environment specified in the options object takes precedence.
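
    Because the options object takes precedence, one way to pin the hardware for a dlnetwork input is through the algorithm options object itself; its ExecutionEnvironment property appears in the options display shown in the examples above. A minimal sketch:

    % Run the basic iterative method on the CPU only
    opts = adversarialOptions("bim",ExecutionEnvironment="cpu");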

    Output Arguments


    example — Adversarial example

    Adversarial example, returned as a dlarray object or a numeric array.

    • If you provide a dlnetwork object as input, the example is a dlarray object.

    • If you provide an ONNX or PyTorch modelfile as input, the example is a numeric array.

    The function finds a candidate adversarial example according to the algorithm described in the Adversarial Examples section. If the generated example is not misclassified as expected, then the function does not return it.

    If the function is unable to find an adversarial example, then this does not mean that the network is robust to adversarial attacks. To prove network robustness, use the verifyNetworkRobustness function.
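
    A minimal sketch of that check, assuming the same net, XLower, XUpper, and label as in the examples above:

    % Attempt to formally verify robustness for each observation;
    % the result for each is "verified", "violated", or "unproven".
    result = verifyNetworkRobustness(net,XLower,XUpper,label);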

    mislabel — Predicted class of adversarial example

    Predicted class of the adversarial example, returned as a numeric vector of class indices or as a categorical array. The data type of mislabel matches the data type of the label input argument.

    iX — Input batch index

    Input batch index, returned as a vector of integers.

    You can generate adversarial examples for several batches of input bounds and labels at once. However, the findAdversarialExamples function does not always find an adversarial example. If the generated example is not misclassified as expected, then the function does not return it. Therefore, the example output batch can be smaller than the input batches.

    Use the input batch index vector iX to index into XLower, XUpper, and label. For example, XLower(:,:,:,iX(n)) generates example(:,:,:,n).

    iE — Example batch index

    Example batch index, returned as a vector of integers.

    You can generate adversarial examples for several batches of input bounds and labels at once. However, the findAdversarialExamples function does not always find an adversarial example. If the generated example is not misclassified as expected, then the function does not return it. Therefore, the example output batch can be smaller than the input batches.

    Use the example index vector iE to index into example. For example, example(:,:,:,iE(m)) is generated by XLower(:,:,:,m).

    More About


    Algorithms


    References

    [1] Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. “Explaining and Harnessing Adversarial Examples.” Preprint, submitted March 20, 2015. https://arxiv.org/abs/1412.6572.

    Version History

    Introduced in R2026a