compressNetworkUsingProjection

Compress neural network using projection

Since R2022b

    Description

    Add-On Required: This feature requires the Deep Learning Toolbox Model Compression Library add-on.

    The compressNetworkUsingProjection function reduces the number of learnable parameters in network layers by performing principal component analysis (PCA) of the neuron activations, using a data set representative of the training data, and then projecting the learnable parameters into the subspace that maintains the highest variance in neuron activations. In some cases, this operation is equivalent to replacing layers with subnetworks of two or more layers that have fewer learnable parameters.

    Depending on the network, projection configuration, and code generation libraries used (including library-free code generation), forward passes of a projected deep neural network can be faster when you deploy the network to embedded hardware.

    If you also prune or quantize your network, then apply projection-based compression after pruning and before quantization.

    netProjected = compressNetworkUsingProjection(net,mbq) compresses the dlnetwork object net by replacing layers with projected layers. The function compresses layers by performing principal component analysis (PCA) of the neuron activations using the data in the minibatchqueue object mbq and projects learnable parameters into the subspace that maintains the highest variance in neuron activations.

    netProjected = compressNetworkUsingProjection(net,X1,...,XN) compresses the network using the data in X1,...,XN, where N is the number of network inputs.

    netProjected = compressNetworkUsingProjection(net,npca) compresses the network using the neuronPCA object npca. The PCA step can be computationally intensive. If you expect to compress the same network multiple times (for example, when exploring different levels of compression), then you can perform the PCA step up front using a neuronPCA object.
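    For example, to explore several compression levels without repeating the analysis, you can compute the neuronPCA object once and reuse it. This is a sketch, assuming net is an initialized dlnetwork object and mbq is a minibatchqueue object of representative data:

    ```matlab
    % Perform the computationally intensive PCA step once.
    npca = neuronPCA(net,mbq);

    % Reuse the same analysis to explore different compression levels.
    netHigh = compressNetworkUsingProjection(net,npca,ExplainedVarianceGoal=0.99);
    netLow  = compressNetworkUsingProjection(net,npca,ExplainedVarianceGoal=0.90);
    ```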

    [netProjected, info] = compressNetworkUsingProjection(___) also returns the structure info that contains information about the projected layers, the reduction of learnable parameters, and the explained variance achieved during compression.

    [netProjected, info] = compressNetworkUsingProjection(___,Name=Value) specifies additional options using one or more name-value arguments.

    Examples


    Load the pretrained network in dlnetJapaneseVowels and the training data in JapaneseVowelsTrainData.

    load dlnetJapaneseVowels
    load JapaneseVowelsTrainData

    Compress the network. Specify the mini-batch size for the principal component analysis as 16. Specify the input data format as "CTB".

    [netProjected,info] = compressNetworkUsingProjection(net,XTrain,MiniBatchSize=16,InputDataFormats="CTB");
    Compressed network has 85.4% fewer learnable parameters.
    Projection compressed 2 layers: "lstm","fc"
    

    View the network layers.

    netProjected.Layers
    ans = 
      4×1 Layer array with layers:
    
         1   'sequenceinput'   Sequence Input    Sequence input with 12 channels
         2   'lstm'            Projected Layer   Projected LSTM with 100 hidden units
         3   'fc'              Projected Layer   Projected fully connected layer with output size 9
         4   'softmax'         Softmax           Softmax
    

    View the projected LSTM layer.

    netProjected.Layers(2)
    ans = 
      ProjectedLayer with properties:
    
                       Name: 'lstm'
              OriginalClass: 'nnet.cnn.layer.LSTMLayer'
        LearnablesReduction: 0.8610
                  InputSize: 12
                 OutputSize: 100
    
       Hyperparameters
         InputProjectorSize: 7
        OutputProjectorSize: 6
    
       Learnable Parameters
                    Network: [1×1 dlnetwork]
    
       State Parameters
                    Network: [1×1 dlnetwork]
    
       Network Learnable Parameters
         Network/lstm/InputWeights      400×7 dlarray
         Network/lstm/RecurrentWeights  400×6 dlarray
         Network/lstm/Bias              400×1 dlarray
         Network/lstm/InputProjector    12×7  dlarray
         Network/lstm/OutputProjector   100×6 dlarray
    
       Network State Parameters
         Network/lstm/HiddenState  100×1 dlarray
         Network/lstm/CellState    100×1 dlarray
    
      Show all properties
    
    

    Input Arguments


    Neural network, specified as an initialized dlnetwork object.

    Mini-batch queue that outputs data for each input of the network, specified as a minibatchqueue object.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Padding sequences is not recommended as this can negatively impact the analysis. Instead, truncate mini-batches of data to have the same length or use mini-batches of size 1.
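    One way to follow this recommendation is to use a mini-batch size of 1 so that no padding is needed. This sketch assumes XTrain is a cell array of channel-by-time sequence arrays; the datastore setup is illustrative only:

    ```matlab
    % Wrap the representative sequences in a datastore and a mini-batch queue.
    adsX = arrayDatastore(XTrain,OutputType="same");

    % Mini-batch size 1 avoids padding sequences of different lengths.
    mbq = minibatchqueue(adsX,MiniBatchSize=1,MiniBatchFormat="CTB");

    netProjected = compressNetworkUsingProjection(net,mbq);
    ```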

    Input data, specified as a formatted or unformatted dlarray object, numeric array, categorical array, datastore, cell array of numeric arrays, or table.

    Since R2026a, the compressNetworkUsingProjection function supports the same input data types as the trainnet function. You can reuse the training data, or a subset of the training data, for compression.

    For more information about dlarray formats, see the fmt input argument of dlarray.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Padding sequences is not recommended as this can negatively impact the analysis. Instead, truncate mini-batches of data to have the same length or use mini-batches of size 1.

    Neuron principal component analysis, specified as a neuronPCA object.

    Name-Value Arguments


    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: netProjected = compressNetworkUsingProjection(net,mbq,VerbosityLevel="off") compresses the network using projection and disables the command line display.

    Compression


    Names of layers to compress, specified as a string array, cell array of character vectors, or a character vector containing a single layer name.

    The software, by default, compresses all the layers in the network that support projection.

    In rare cases, layers can increase in size due to projection. For example, this can happen when:

    • A layer is small.

    • The ExplainedVarianceGoal argument is too large.

    • The layer does not contain much redundant information.

    If you do not specify the LayerNames name-value argument, then the software does not project layers that would increase in size. To ensure that a supported layer is projected, specify the name of that layer using the LayerNames name-value argument.
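    For example, to force projection of specific supported layers regardless of whether projection shrinks them, pass their names explicitly. The layer names in this sketch are illustrative:

    ```matlab
    % Project only the named layers, even if projection would increase
    % the size of one of them.
    netProjected = compressNetworkUsingProjection(net,mbq, ...
        LayerNames=["lstm" "fc"]);
    ```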

    The compressNetworkUsingProjection and neuronPCA functions support these layers:

    Note

    Layers that share learnable parameters with other layers through weight tying do not support compression using projection.

    Since R2026a, compressNetworkUsingProjection and neuronPCA support compressing layers contained inside a networkLayer object.

    If you specify LayerNames to be the name of a network layer, then the software compresses every supported layer inside the network layer.

    To compress a specific nested layer inside a network layer, specify the name of the network layer and the name of the nested layer separated by a forward slash "/". For example, the path to a layer named "nestedLayerName" in a network layer named "networkLayerName" is "networkLayerName/nestedLayerName". If there are multiple levels of nested layers, then specify the path using the form "networkLayerName1/.../networkLayerNameN/nestedLayerName".

    Data Types: string | cell

    Target proportion of neuron activation variance explained by the remaining principal components of each projected layer, specified as a scalar from 0 through 1, where 0 corresponds to maximum compression and 1 corresponds to projecting layers with minimal compression.

    If you specify the ExplainedVarianceGoal option, then you must not specify the LearnablesReductionGoal option.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Target proportion of total number of network learnables to remove, specified as a nonnegative scalar less than or equal to 1.

    If you specify the LearnablesReductionGoal option, then you must not specify the ExplainedVarianceGoal option. If you do not specify the LearnablesReductionGoal option, then the function compresses the network using the ExplainedVarianceGoal option.

    If LearnablesReductionGoal is greater than the maximum possible reduction in learnables, then the function removes the maximum possible proportion of learnables. Use the neuronPCA function to determine the possible range of reduction in learnables.

    If LearnablesReductionGoal is smaller than the maximum possible reduction in learnables, then the function removes at least the proportion of learnables specified by LearnablesReductionGoal. If removing a greater proportion of learnables does not reduce the explained variance, then the function automatically removes a higher proportion of learnables. For example, if you specify a learnables reduction goal of 0.2, and if the explained variance is the same for learnables reductions between 0.2 and 0.5, then the function removes 50% of learnables.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
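    For example, to target a 50% reduction in learnables (a sketch; net and mbq are assumed to exist):

    ```matlab
    % Remove at least 50% of the network learnables. Do not combine this
    % option with ExplainedVarianceGoal.
    [netProjected,info] = compressNetworkUsingProjection(net,mbq, ...
        LearnablesReductionGoal=0.5);
    ```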

    Verbosity level, specified as one of these values:

    • "summary" — Display a summary of the compression algorithm.

    • "steps" — Display information about the steps of the compression algorithm.

    • "iterations" — Display information about the iterations of the compression algorithm.

    • "off" — Do not display information.

    Since R2023b

    Flag to unpack projected layers, specified as one of these values:

    • 0 (false) — Do not unpack projected layers. The function replaces projectable layers with ProjectedLayer objects.

    • 1 (true) — Unpack projected layers. The function replaces projectable layers with the network that is equivalent to the projection.
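    For example, to compress and unpack in a single call so that the result contains no ProjectedLayer objects (useful before code generation), you might write:

    ```matlab
    % Replace projectable layers directly with the subnetworks that are
    % equivalent to the projection.
    netUnpacked = compressNetworkUsingProjection(net,mbq, ...
        UnpackProjectedLayers=true);
    ```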

    Principal Component Analysis


    Since R2026a

    Size of mini-batches to use for principal component analysis, specified as a positive integer. Larger mini-batch sizes require more memory, but can lead to faster analysis.

    If you specify the input data as a mini-batch queue and MiniBatchSize is set to "auto", then the software uses the MiniBatchSize property of the mini-batch queue.

    If you do not specify the input data as a mini-batch queue and MiniBatchSize is set to "auto", then the software uses mini-batch size 128.

    If you specify MiniBatchSize as an integer, then the software uses the specified value as the mini-batch size, even if the input data is specified as a mini-batch queue with a different mini-batch size.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Data Types: string | char | single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Since R2026a

    Hardware resource to use for principal component analysis, specified as one of these values:

    • "auto" — Use a GPU if one is available. Otherwise, use the CPU. If net is a quantized network with the TargetLibrary property set to "none", then the software uses the CPU even if a GPU is available.

    • "gpu" — Use the GPU. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.

    • "cpu" — Use the CPU.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Since R2026a

    Option to pad or truncate input sequences to use for principal component analysis, specified as one of these values:

    • "longest" — Pad sequences in each mini-batch to have the same length as the longest sequence. This option does not discard any data, though padding can introduce noise to the neural network.

    • "shortest" — Truncate sequences in each mini-batch to have the same length as the shortest sequence. This option ensures that no padding is added, at the cost of discarding data.

    To learn more about the effect of padding and truncating sequences, see Sequence Padding and Truncation.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Since R2026a

    Direction of padding or truncation to use for principal component analysis, specified as one of these options:

    • "right" — Pad or truncate sequences on the right. The sequences start at the same time step and the software truncates or adds padding to the end of each sequence.

    • "left" — Pad or truncate sequences on the left. The software truncates or adds padding to the start of each sequence so that the sequences end at the same time step.

    Recurrent layers process sequence data one time step at a time, so when the recurrent layer OutputMode property is "last", any padding in the final time steps can negatively influence the layer output. To pad or truncate sequence data on the left, set the SequencePaddingDirection name-value argument to "left".

    For sequence-to-sequence neural networks (when the OutputMode property is "sequence" for each recurrent layer), any padding in the first time steps can negatively influence the predictions for the earlier time steps. To pad or truncate sequence data on the right, set the SequencePaddingDirection name-value argument to "right".

    To learn more about the effects of padding and truncating sequences, see Sequence Padding and Truncation.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.
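    For example, for a network whose recurrent layer has the OutputMode property set to "last", you might pad on the left so that padding does not affect the final time step. This is a sketch assuming in-memory sequence data XTrain:

    ```matlab
    % Pad on the left so the last time step of each sequence is real data.
    netProjected = compressNetworkUsingProjection(net,XTrain, ...
        SequencePaddingDirection="left");
    ```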

    Since R2026a

    Value for padding the input sequences to use for principal component analysis, specified as a scalar.

    Do not pad sequences with NaN, because doing so can propagate errors through the neural network.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Since R2026a

    Description of the input data dimensions to use for principal component analysis, specified as a string array, character vector, or cell array of character vectors.

    If InputDataFormats is "auto", then the software uses the formats expected by the network input. Otherwise, the software uses the specified formats for the corresponding network input.

    A data format is a string of characters, where each character describes the type of the corresponding data dimension.

    The characters are:

    • "S" — Spatial

    • "C" — Channel

    • "B" — Batch

    • "T" — Time

    • "U" — Unspecified

    For example, consider an array that represents a batch of sequences where the first, second, and third dimensions correspond to channels, observations, and time steps, respectively. You can describe the data as having the format "CBT" (channel, batch, time).

    You can specify multiple dimensions labeled "S" or "U". You can use the labels "C", "B", and "T" once each, at most. The software ignores singleton trailing "U" dimensions after the second dimension.

    For a neural network with multiple inputs net, specify an array of input data formats, where InputDataFormats(i) corresponds to the input net.InputNames(i).

    For more information, see Deep Learning Data Formats.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Data Types: char | string | cell

    Since R2026a

    Encoding of categorical inputs to use for principal component analysis, specified as one of these values:

    • "integer" — Convert categorical inputs to their integer value. In this case, the network must have one input channel for each of the categorical inputs.

    • "one-hot" — Convert categorical inputs to one-hot encoded vectors. In this case, the network must have numCategories channels for each of the categorical inputs, where numCategories is the number of categories of the corresponding categorical input.

    Note

    If you specify the input data as a neuronPCA object, then this argument has no effect.

    Output Arguments


    Projected network, returned as a dlnetwork object.

    After you compress the network using projection, you can fine-tune the network to help regain predictive accuracy lost by the compression process. For an example, see Compress Neural Network Using Projection.

    Projection information, returned as a structure with these fields:

    • LearnablesReduction — Proportion of total number of network learnables removed

    • ExplainedVariance — Proportion of neuron activation variance explained by principal components

    • LayerNames (since R2023b) — Names of projected layers
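    For example, you might inspect the returned structure to check how much compression was achieved (a sketch; the field values depend on the network and data):

    ```matlab
    [netProjected,info] = compressNetworkUsingProjection(net,mbq);

    info.LearnablesReduction   % proportion of learnables removed
    info.ExplainedVariance     % proportion of activation variance retained
    info.LayerNames            % names of the projected layers (since R2023b)
    ```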

    Tips

    • Code generation does not support ProjectedLayer objects. To replace ProjectedLayer objects in a neural network with the equivalent neural network that represents the projection, use the unpackProjectedLayers function or set the UnpackProjectedLayers option of the compressNetworkUsingProjection function to 1 (true).

    • To determine the maximum possible compression, open your network in Deep Network Designer, then click Analyze for Compression.
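    For example, if you compressed a network without setting UnpackProjectedLayers, you can unpack it afterward (a sketch, assuming netProjected contains ProjectedLayer objects):

    ```matlab
    % Replace each ProjectedLayer object with its equivalent subnetwork
    % so that the network supports code generation.
    netUnpacked = unpackProjectedLayers(netProjected);
    ```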

    Algorithms


    References

    [1] "Compressing Neural Networks Using Network Projection." Accessed July 20, 2023. https://www.mathworks.com/company/technical-articles/compressing-neural-networks-using-network-projection.html.

    Extended Capabilities


    Version History

    Introduced in R2022b
