Question
Why does the training loss progressively rise once it is low, while the validation loss remains lower until a certain plateau?
As of now, after training my transformer architecture, whose data uses a hybrid encoding scheme (one-hot/two-hot encoded vecto...
9 months ago | 0 answers | 0