deep.gpu.fastAttentionAlgorithms

Disable fast attention algorithms used by deep learning operations on the GPU

Since R2026a

    Description

    previousState = deep.gpu.fastAttentionAlgorithms(newState) returns the current state of the GPU fast attention algorithms option as 1 (true) or 0 (false) before changing the state according to the input newState. The default is 1 (true). This function requires Parallel Computing Toolbox™.

    If newState is 1 (true), then subsequent calls to GPU deep learning attention operations use algorithms optimized for performance. These algorithms achieve improved performance by using reduced-precision arithmetic, that is, arithmetic that uses fewer bits than single-precision arithmetic. If newState is 0 (false), then subsequent calls to GPU deep learning attention operations use higher-precision algorithms at the cost of performance.

    state = deep.gpu.fastAttentionAlgorithms returns the current state of the GPU fast attention algorithms option as 1 (true) or 0 (false) without changing it.

    Tip

    Use this function if your training loss is NaN and normalizing your training data does not resolve the issue. For more information about normalizing training data, see Normalize Sequence Data.
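    In that situation, a typical pattern is to disable the fast algorithms for the duration of your training code and restore the previous state afterward. The sketch below uses onCleanup so that the state is restored even if the training code errors; the comment placeholder stands in for your own training loop:

    ```matlab
    % Disable fast attention algorithms and capture the previous state.
    previousState = deep.gpu.fastAttentionAlgorithms(false);

    % Restore the previous state automatically when cleanupObj goes out of
    % scope, even if an error occurs during training.
    cleanupObj = onCleanup(@() deep.gpu.fastAttentionAlgorithms(previousState));

    % ... run your training code here; GPU attention operations now use
    % higher-precision algorithms at the cost of performance ...
    ```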

    Examples

    This example shows how disabling fast attention algorithms can prevent NaN values in the output of the attention operation when the input data has a large dynamic range.

    Specify the sizes of the queries, keys, and values.

    querySize = 120;
    valueSize = 120;
    numQueries = 100;
    numValues = 80;
    numObservations = 64;

    Create random, single-precision gpuArray data containing the queries, keys, and values. For the queries, specify the dlarray format "CBT" (channel, batch, time).

    queries = dlarray(rand(querySize,numObservations,numQueries,"single","gpuArray"),"CBT");
    keys = dlarray(rand(querySize,numObservations,numValues,"single","gpuArray"));
    values = dlarray(rand(valueSize,numObservations,numValues,"single","gpuArray"));

    Specify the number of attention heads.

    numHeads = 5;

    To simulate data with significant outliers, scale up 100 randomly selected elements of the query data so that it has a large dynamic range.

    idx = randperm(numel(queries),100);
    queries(idx) = 1e5*queries(idx);

    Inspect the smallest and largest query values.

    [minQuery,maxQuery] = bounds(queries,"all")
    minQuery = 
      1(C) × 1(B) × 1(T) single gpuArray dlarray
    
      2.2615e-06
    
    
    maxQuery = 
      1(C) × 1(B) × 1(T) single gpuArray dlarray
    
      9.8323e+04
    
    

    Apply the attention operation.

    Y = attention(queries,keys,values,numHeads);

    Count the number of NaN values in the output. Because the input data has a large dynamic range and the fast attention algorithm uses reduced-precision arithmetic, the output contains NaN values.

    sum(isnan(Y),"all")
    ans = 
      1(C) × 1(B) × 1(T) gpuArray dlarray
    
            1008
    
    

    Disable fast attention algorithms, storing the previous state, then apply the attention operation again. Because the attention operation now uses higher-precision arithmetic, it runs more slowly than before.

    previousState = deep.gpu.fastAttentionAlgorithms(false);
    Y = attention(queries,keys,values,numHeads);

    Check whether the output contains any NaN values. Because the attention operation now uses single-precision arithmetic, there are no NaN values in the output.

    any(isnan(Y),"all")
    ans = 
      1(C) × 1(B) × 1(T) logical gpuArray dlarray
    
       0
    
    

    Restore the fast attention algorithms option to its original state.

    deep.gpu.fastAttentionAlgorithms(previousState);

    Input Arguments

    newState — New state of the GPU fast attention algorithms option, specified as one of the following:

    • 1 (true) — Subsequent calls to GPU deep learning attention operations use algorithms optimized for performance. These algorithms achieve improved performance by using reduced-precision arithmetic, that is, arithmetic that uses fewer bits than single-precision arithmetic.

    • 0 (false) — Subsequent calls to GPU deep learning attention operations use higher-precision algorithms at the cost of performance.

    Deep learning layers and functions that perform attention operations, such as the attention function used in the example above, use GPU fast attention algorithms by default.

    Data Types: logical

    Version History

    Introduced in R2026a