Changing the number-of-heads argument of a self-attention layer from the MATLAB Deep Learning Toolbox doesn't seem to affect the resulting number of learnable parameters.
The following code results in 1793 total parameters:

network_layers = [
    selfAttentionLayer(num_heads, num_keys)
    fullyConnectedLayer(num_classes)];
net = addLayers(net, network_layers);
When I change the number of heads to e.g. 16, the number of learnable parameters doesn't change.
Why is that?
Shouldn't the number of learnable parameters of the attention layer increase in proportion to the number of heads?
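For context, here is the back-of-the-envelope count I had in mind, written as a small sketch. It assumes the standard multi-head attention parameterization in which the key/query channel count is a *total* that gets split across heads (so each head uses num_keys/num_heads channels); the function and variable names are illustrative, not from the toolbox:

```python
def attention_param_count(num_in_channels, num_keys, num_heads):
    # Assumes num_keys is the TOTAL query/key (and value) channel count,
    # shared across heads; each head then works on num_keys/num_heads channels.
    assert num_keys % num_heads == 0, "heads must evenly divide num_keys"
    # query, key, and value projections: weight matrices plus bias vectors
    qkv = 3 * (num_keys * num_in_channels + num_keys)
    # output projection back to the input width, plus bias
    out = num_in_channels * num_keys + num_in_channels
    return qkv + out  # note: num_heads never enters the count

print(attention_param_count(16, 64, 4))   # 4304 with 4 heads
print(attention_param_count(16, 64, 16))  # 4304 with 16 heads, too
```

Under that assumption the count is indeed independent of the number of heads, which would explain what I'm seeing, but I'd like to confirm that this is how the toolbox defines it.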
Any help is highly appreciated!