Two pieces of code that look almost identical to me, but give very different results

辽辽 程 2024-6-21
Answered: Aneela 2024-7-24
Code 1 (custom training loop):

% Normalize inputs and outputs
[XTrain,input_str] = mapminmax(p);
[YTrain,output_str] = mapminmax(q);

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','sequence')
    % lstmLayer(numHiddenUnits)
    fullyConnectedLayer(300)
    % lstmLayer(numHiddenUnits)
    dropoutLayer(0.5) % causes some fluctuation in the training results
    fullyConnectedLayer(numResponses)
    ];
net = dlnetwork(layers)
% net = gpuArray(net);
% net = dlupdate(@gpuArray,net);

numEpochs = 150;
miniBatchSize = 100;
ds = arrayDatastore([XTrain;YTrain]);
numObservationsTrain = size(YTrain,2);
numIterationsPerEpoch = floor(numObservationsTrain / miniBatchSize);

% Adam optimizer state
averageGrad = [];
averageSqGrad = [];
numIterations = numEpochs * numIterationsPerEpoch;
iteration = 0;
epoch = 0;

mbq = minibatchqueue(ds, ...
    MiniBatchSize=miniBatchSize, ...
    MiniBatchFormat=["CBT"]);
monitor = trainingProgressMonitor( ...
    Metrics="Loss", ...
    Info=["Epoch" "LearnRate"], ...
    XLabel="Iteration");
gradThreshold = 1.0;

while epoch < numEpochs && ~monitor.Stop
    epoch = epoch + 1;
    reset(mbq);

    % Shuffle the training data at the start of every epoch
    % idx = 1:1:numel(YTrain);
    idx = randperm(size(YTrain,2));
    XTrain = XTrain(:,idx);
    YTrain = YTrain(:,idx);
    k = 0;
    % while k < numIterationsPerEpoch && ~monitor.Stop

    % Attempted gradient clipping over the network layers
    for i = 1:numel(net.Layers)
        if isa(net.Layers(i), 'nnet.internal.cnn.layer.learnable.LearnableParameter')
            grad = net.Layers(i).dLdW;                    % get the parameter gradient
            gradNorm = norm(grad);
            if gradNorm > gradThreshold
                grad = grad * (gradThreshold / gradNorm); % clip the gradient
            end
            net.Layers(i).dLdW = grad;                    % write back the clipped gradient
        end
    end

    while hasdata(mbq) && ~monitor.Stop
        k = k + 1;
        iteration = iteration + 1;
        XY = next(mbq);
        X = XY(1:13,:);
        Y = XY(14:16,:);
        X = dlarray(X, 'CBT');
        Y = dlarray(Y, 'CBT');
        [loss, gradients] = dlfeval(@modelLoss2, net, X, Y);
        [net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
            averageGrad, averageSqGrad, iteration); % adaptive learning-rate update
        recordMetrics(monitor, iteration, Loss=loss);
        updateInfo(monitor, Epoch=epoch + " of " + numEpochs);
        monitor.Progress = 100 * iteration / numIterations;
    end
end
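
The helper modelLoss2 that dlfeval calls above is not shown in the question. A minimal sketch of the usual pattern, assuming a plain mean-squared-error regression loss (the function body below is an assumption, not the poster's actual code):

function [loss, gradients] = modelLoss2(net, X, T)
    % Forward pass through the dlnetwork
    Y = forward(net, X);
    % Mean-squared-error loss between predictions and targets
    loss = mse(Y, T);
    % Gradients of the loss with respect to all learnable parameters
    gradients = dlgradient(loss, net.Learnables);
end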
Code 2 (trainNetwork):

[XTrain,input_str] = mapminmax(p);
[YTrain,output_str] = mapminmax(q);

layers = [ ...
    sequenceInputLayer(numFeatures)
    bilstmLayer(numHiddenUnits,'OutputMode','sequence')
    fullyConnectedLayer(300)
    dropoutLayer(0.5)
    fullyConnectedLayer(numResponses)
    regressionLayer];

maxEpochs = 150;
miniBatchSize = 200;
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',0.005, ...
    'GradientThreshold',1, ...
    'Shuffle','never', ...
    'Plots','training-progress', ...
    'Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);
My XTrain and YTrain are the same in both cases, so I don't understand why the regression results on the test set have a large error with Code 1 but a small error with Code 2. My XTrain is 13x14000 and my YTrain is 3x14000.

Answers (1)

Aneela 2024-7-24
Hello,
The first code provided is a custom training loop while the second code uses MATLAB’s built-in “trainNetwork” function.
The potential reasons for different performance can be:
  • In the custom training loop, the data is shuffled at the beginning of each epoch, whereas in the “trainNetwork” call 'Shuffle' is set to 'never'.
  • The mini-batch size in the custom training loop is set to 100, while in the “trainNetwork” call it is set to 200.
  • The initial learning rate is set to 0.005 in “trainNetwork”, while the custom training loop does not pass a learning rate to “adamupdate”, so it falls back to the default of 0.001 (see the sketch after this list).
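To line up the custom loop with those training options, one possibility is to pass the same learning rate to “adamupdate” and clip the gradients returned by “dlfeval” (a sketch only; the local helper thresholdL2Norm below is illustrative, not a built-in function). Note that the dLdW loop in Code 1 runs before any gradients have been computed, and dlnetwork layers do not expose a dLdW property, so it most likely performs no clipping at all:

% Inside the mini-batch loop, after dlfeval:
learnRate = 0.005;        % same value as 'InitialLearnRate' in Code 2
gradThreshold = 1;        % same value as 'GradientThreshold' in Code 2
[loss, gradients] = dlfeval(@modelLoss2, net, X, Y);
% Clip the L2 norm of each learnable parameter's gradient
gradients = dlupdate(@(g) thresholdL2Norm(g, gradThreshold), gradients);
% Pass the learning rate explicitly (adamupdate otherwise defaults to 0.001)
[net, averageGrad, averageSqGrad] = adamupdate(net, gradients, ...
    averageGrad, averageSqGrad, iteration, learnRate);

function g = thresholdL2Norm(g, threshold)
    % Rescale the gradient if its L2 norm exceeds the threshold
    gNorm = sqrt(sum(g.^2, 'all'));
    if gNorm > threshold
        g = g * (threshold / gNorm);
    end
end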
You can consider the following to reduce the difference in performance:
  • Ensure that data normalization and preprocessing steps are consistent between the two approaches.
  • Use a validation set to monitor performance during training.
  • Keep track of intermediate results such as the loss and gradients to understand the training process in both approaches (see the sketch below).
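For the last two points, a small sketch of how the existing trainingProgressMonitor could also record a validation loss; XVal and YVal are an assumed held-out split, formatted like the training mini-batches ('CBT' dlarrays), and are not part of the original code:

monitor = trainingProgressMonitor( ...
    Metrics=["TrainingLoss" "ValidationLoss"], ...
    Info=["Epoch" "LearnRate"], ...
    XLabel="Iteration");

% ... inside the mini-batch loop, after computing the training loss:
YValPred = predict(net, XVal);   % inference-mode forward pass (no dropout)
lossVal  = mse(YValPred, YVal);  % same loss as used for training
recordMetrics(monitor, iteration, ...
    TrainingLoss=loss, ValidationLoss=lossVal);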
