Two pieces of code that I thought differed only slightly give very different results

Question

辽辽程 2024-6-21

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/2130721-

Answered: Aneela 2024-7-24

Code 1:

% Normalize inputs and outputs
[XTrain,input_str] = mapminmax(p);
[YTrain,output_str] = mapminmax(q);

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','sequence')
    % lstmLayer(numHiddenUnits)
    fullyConnectedLayer(300)
    % lstmLayer(numHiddenUnits)
    dropoutLayer(0.5) % makes the training results fluctuate
    fullyConnectedLayer(numResponses)
    ];
net = dlnetwork(layers)
% net = gpuArray(net);
% net = dlupdate(@gpuArray,net);

numEpochs = 150;
miniBatchSize = 100;
ds = arrayDatastore([XTrain;YTrain]);
numObservationsTrain = size(YTrain,2);
numIterationsPerEpoch = floor(numObservationsTrain / miniBatchSize);

% Adam optimizer
averageGrad = [];
averageSqGrad = [];
numIterations = numEpochs * numIterationsPerEpoch;
iteration = 0;
epoch = 0;

mbq = minibatchqueue(ds,...
    MiniBatchSize=miniBatchSize,...
    MiniBatchFormat=["CBT"]);

monitor = trainingProgressMonitor( ...
    Metrics="Loss", ...
    Info=["Epoch" "LearnRate"], ...
    XLabel="Iteration");

gradThreshold = 1.0;

while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    reset(mbq);
    % idx = 1:1:numel(YTrain);
    idx = randperm(size(YTrain,2));
    XTrain = XTrain(:,idx);
    YTrain = YTrain(:,idx);
    k = 0;
    % while k < numIterationsPerEpoch && ~monitor.Stop
    for i = 1:numel(net.Layers)
        if isa(net.Layers(i), 'nnet.internal.cnn.layer.learnable.LearnableParameter')
            grad = net.Layers(i).dLdW; % get the gradient of the parameter
            gradNorm = norm(grad);
            if gradNorm > gradThreshold
                grad = grad * (gradThreshold / gradNorm); % clip the gradient
            end
            net.Layers(i).dLdW = grad; % write back the clipped gradient
        end
    end % for i
    while hasdata(mbq) && ~monitor.Stop
        k = k + 1;
        iteration = iteration + 1;
        XY = next(mbq);
        X = XY(1:13,:);
        Y = XY(14:16,:);
        X = dlarray(X, 'CBT');
        Y = dlarray(Y, 'CBT');
        [loss, gradients] = dlfeval(@modelLoss2, net, X, Y);
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, averageGrad, averageSqGrad, iteration); % adaptive adjustment of the learning rate
        recordMetrics(monitor, iteration, Loss=loss);
        updateInfo(monitor,Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration / numIterations;
    end
end % while epoch

Code 2:

[XTrain,input_str] = mapminmax(p);
[YTrain,output_str] = mapminmax(q);

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(300)
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];

maxEpochs = 150;
miniBatchSize = 200;
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',0.005, ...
    'GradientThreshold',1, ...
    'Shuffle','never', ...
    'Plots','training-progress',...
    'Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);

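XTrain and YTrain are the same in both cases, yet on the test set the regression error is large with code 1 and small with code 2, and I don't know why. XTrain is 13-by-14000 and YTrain is 3-by-14000.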


Answer 1

Aneela 2024-7-24

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/2130721-#answer_1489826

Hello,

The first code provided is a custom training loop while the second code uses MATLAB’s built-in “trainNetwork” function.
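Beyond the choice of training API, the two code samples may also be interpreting the same 13-by-14000 matrix differently. As far as the trainNetwork documentation describes, a numeric matrix passed together with a sequenceInputLayer is read as a single sequence (features by time steps), while labelling a 13-by-100 mini-batch as "CBT" makes it 100 independent observations with one time step each. The sketch below illustrates only the labelling side, using random stand-in data; whether this explains the gap in the asker's results is an assumption that would need to be checked against the real data.

```matlab
% Requires Deep Learning Toolbox. The values are random stand-ins.
numFeatures = 13;
batch = rand(numFeatures, 100);      % 13-by-100, shaped like one mini-batch in code 1

% Labelled as in code 1: rows are channels, columns are batch observations,
% and the time dimension is a trailing singleton.
X = dlarray(batch, "CBT");
numObservations = size(X, finddim(X, "B"));
numTimeSteps    = size(X, finddim(X, "T"));
fprintf("CBT: %d observations, %d time step(s) each\n", numObservations, numTimeSteps);

% The same matrix labelled as one sequence: columns become time steps.
Xseq = dlarray(batch, "CT");
seqLength = size(Xseq, finddim(Xseq, "T"));
fprintf("CT: one observation, %d time steps\n", seqLength);
```

If the 14000 columns are consecutive time steps, a BiLSTM trained on length-1 sequences cannot use any temporal context, which could plausibly account for part of the difference.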

Potential reasons for the difference in performance:

- Shuffling: the custom training loop permutes XTrain and YTrain at the start of each epoch, whereas the trainNetwork call sets 'Shuffle' to 'never'. The permutation is applied after arrayDatastore has already been created from the arrays, so it may not reach the mini-batch queue at all.
- Mini-batch size: 100 in the custom training loop, 200 in the trainNetwork call.
- Learning rate: the trainNetwork call sets 'InitialLearnRate' to 0.005. The custom training loop passes no learning rate to adamupdate, so the function's default of 0.001 applies.
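The three differences listed above can be removed from the custom training loop by hand. The following is a minimal sketch rather than the asker's actual code: the data are random, the network is shortened, numEpochs is reduced, and modelLossSketch is a hypothetical stand-in for modelLoss2, which the post does not show. It assumes a half-mean-squared-error loss, which is what regressionLayer uses in trainNetwork.

```matlab
% Requires Deep Learning Toolbox. Data, sizes, and modelLossSketch are
% hypothetical stand-ins; the asker's modelLoss2 is not shown in the post.
rng(0);
numFeatures = 13; numResponses = 3; numObs = 1000;
XTrain = rand(numFeatures, numObs);
YTrain = rand(numResponses, numObs);

net = dlnetwork([
    sequenceInputLayer(numFeatures)
    bilstmLayer(16, OutputMode="sequence")
    fullyConnectedLayer(numResponses)]);

% Values taken from the trainingOptions call in code 2.
learnRate     = 0.005;   % 'InitialLearnRate'
miniBatchSize = 200;     % 'MiniBatchSize'
numEpochs     = 2;       % shortened from 150 to keep the sketch fast

averageGrad = [];
averageSqGrad = [];
iteration = 0;
for epoch = 1:numEpochs
    % 'Shuffle','never': walk through the columns in their stored order.
    for first = 1:miniBatchSize:(numObs - miniBatchSize + 1)
        idx = first:(first + miniBatchSize - 1);
        X = dlarray(XTrain(:, idx), "CBT");
        Y = dlarray(YTrain(:, idx), "CBT");

        iteration = iteration + 1;
        [loss, gradients] = dlfeval(@modelLossSketch, net, X, Y);

        % Pass the learning rate explicitly instead of relying on the default.
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
            averageGrad, averageSqGrad, iteration, learnRate);
    end
end
fprintf("loss after %d iterations: %.4f\n", iteration, extractdata(loss));

function [loss, gradients] = modelLossSketch(net, X, Y)
    % Half mean squared error and its gradients w.r.t. the learnable parameters.
    YPred = forward(net, X);
    loss = mse(YPred, Y);
    gradients = dlgradient(loss, net.Learnables);
end
```

With these settings matched, any remaining gap between the two approaches has to come from something else, such as the loss function, gradient clipping, or how the data are arranged into sequences.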

To narrow down the difference, consider the following:

- Make sure that normalization and the other preprocessing steps are identical in both approaches.
- Use a validation set to monitor performance during training.
- Track intermediate results such as the loss and the gradients in both approaches to see where the training runs diverge.
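Tracking gradients is also a chance to check gradient clipping. In code 1 the gradients returned by dlfeval go straight into adamupdate, so they are not clipped, while code 2 sets 'GradientThreshold' to 1. The sketch below shows one way to clip each parameter's gradient by its L2 norm, which is assumed here to match the default threshold method of trainingOptions. The helper name clipL2 and the example gradients are hypothetical.

```matlab
gradThreshold = 1.0;   % same value as 'GradientThreshold' in code 2

% Rescale g so that its L2 norm does not exceed t; smaller gradients pass through.
clipL2 = @(g, t) g .* min(1, t ./ max(sqrt(sum(g(:).^2)), eps));

% Hypothetical gradients for two parameters, one large and one small.
gLarge = [3 4; 0 0];   % L2 norm 5
gSmall = [0.3 0.4];    % L2 norm 0.5

cLarge = clipL2(gLarge, gradThreshold);
cSmall = clipL2(gSmall, gradThreshold);

% Logging the norm before and after clipping is a cheap way to track gradients.
fprintf("large: %.2f -> %.2f\n", norm(gLarge(:)), norm(cLarge(:)));
fprintf("small: %.2f -> %.2f\n", norm(gSmall(:)), norm(cSmall(:)));

% In a custom training loop the same function would be applied to every
% learnable parameter between dlfeval and adamupdate, for example:
%   gradients = dlupdate(@(g) clipL2(g, gradThreshold), gradients);
```

Recording these norms alongside the loss in both approaches makes it easier to see whether the runs diverge from the first iterations or only later in training.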
