Feeds
提问
How do I apply a padding mask(B*T) variable into a training of a self-attention transformer decoder, from a 3D Matrix(B*T*C)
While attempting to train my neural network that uses a self attention layer in its transformer block. I have been struggling to...
7 months 前 | 1 个回答 | 0
