Why are the results different when using trainNetwork and a custom training loop?

I have defined a custom layer and constructed a simple network with it. When I train the network with trainNetwork and with a custom training loop, the results are different, even though the parameters and data are the same. The code is as follows:
1. This is the network trained with the trainNetwork function:
clear
clc
rng(0)
%% parameters
nMFs = 32;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
% data = xx;
data = data(all(~isnan(data),2),:);  % remove rows with missing values
data = removeconstantrows(data')';   % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
% X=zscore(X);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M);
%% learnable parameters initial method
switch init_method
    case 'FCM'
        % FCM
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % linespace
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% layers
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1');
    regressionLayer];
options = trainingOptions(...
    'adam',...
    'GradientDecayFactor',0.9,...
    'SquaredGradientDecayFactor',0.999,...
    'Epsilon',1e-8,...
    'MaxEpochs',50,...
    'MiniBatchSize',128,...
    'InitialLearnRate',0.01,...
    'LearnRateSchedule','piecewise',...
    'LearnRateDropPeriod',100,...
    'LearnRateDropFactor',1,...
    'Shuffle','every-epoch',...
    'ValidationData',{XTest,yTest},...
    'ValidationFrequency',10,...
    'ValidationPatience',1000,...
    'OutputNetwork','best-validation-loss',...
    'L2Regularization',0,...
    'ResetInputNormalization',false,...
    'GradientThreshold',inf,...
    'Plots','training-progress');
%% Train the nn
tic
[net,tinfo] = trainNetwork(XTrain,yTrain,layers,options);
toc
The result: the minimum RMSE is about 0.344.
2. This is the same network trained with a custom training loop:
clear
clc
rng(0)
%% parameters
nMFs = 32;
learnRate = 0.01;
decay = 1;
gradientDecayFactor = 0.9;
squaredGradientDecayFactor = 0.999;
epsilon = 1e-8;
numEpochs = 50;
miniBatchSize = 128;
init_method = 'linespace';
%% data
dataname = 'house';
load([dataname,'.mat'])
data = data(all(~isnan(data),2),:);  % remove rows with missing values
data = removeconstantrows(data')'; % remove constant features
%% data process
X=data(:,1:end-1); y=data(:,end); y=y./1e5;%y=y-mean(y);
X=2*(X-min(X))./(max(X)-min(X))-1;
[N0,M]=size(X);
N=round(N0*.7);
idsTrain=datasample(1:N0,N,'replace',false);
XTrain=X(idsTrain,:); yTrain=y(idsTrain);
XTest=X; XTest(idsTrain,:)=[];%XTest={XTest};
yTest=y; yTest(idsTrain)=[];%yTest={yTest};
XTest = dlarray(XTest','CB');
yTest = dlarray(yTest','CB');
%% rule list
nRules = nMFs;
% rule = comb(repmat(1:nMFs,M,1));
% rule = repmat([1:nMFs]',1,M); % not used
%% initial method
switch init_method
    case 'FCM'
        % FCM
        [C0,U] = FuzzyCMeans(XTrain,nRules,[2 100 0.001 0]);
        Sigma0 = C0;
        W0 = randn(nRules,M+1);
        for ir = 1:nRules
            Sigma0(ir,:) = std(XTrain,U(ir,:));
            W0(ir,1) = U(ir,:)*yTrain/sum(U(ir,:));
        end
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'random'
        % random
        C0 = randn(nRules,M);
        Sigma0 = rand(nRules,M);
        W0 = randn(nRules,M+1);
        Sigma0(Sigma0==0) = mean(Sigma0(:));
    case 'linespace'
        % linespace
        C0 = zeros(nMFs,M); Sigma0 = C0; W0 = zeros(nMFs,M+1);
        for m = 1:M % Initialization
            C0(:,m) = linspace(min(XTrain(:,m)),max(XTrain(:,m)),nMFs);
            % Sigma0(:,m) = std(XTrain(:,m));
        end
        Sigma0 = ones(nMFs,M);
end
%% data format
dsXTrain = arrayDatastore(XTrain);
dsyTrain = arrayDatastore(yTrain);
dsTrain = combine(dsXTrain,dsyTrain);
layers = [
    featureInputLayer(M,'Name','Input','Normalization','none');
    TSKlayer1(C0,Sigma0,W0,'TSK1');
    ];
lgraph = layerGraph(layers);
dlnet = dlnetwork(lgraph);
plots = "training-progress";
% plots = "nan";
% Train Model
% Train the model using a custom training loop.
velocity = []; % unused here; only needed for the SGDM solver
% accfun = dlaccelerate(@modelGradients);
% clearCache(accfun)
%% mini batch
mbq = minibatchqueue(dsTrain,...
    'MiniBatchSize',miniBatchSize,...
    'MiniBatchFcn',@preprocessMiniBatch,...
    'MiniBatchFormat',{'CB','CB'});
% Initialize the training progress plot.
if plots == "training-progress"
    figure
    lineLossTrain = animatedline('Color',[0.85 0.325 0.098]);
    lineLossTest = animatedline('Color',[0 0 0]);
    ylim([0 inf])
    xlabel("Iteration")
    ylabel("Loss")
    grid on
end
averageGrad = [];
averageSqGrad = [];
iteration = 0;
start = tic;
% Loop over epochs.
for epoch = 1:numEpochs
    learnRate = learnRate*decay;
    % Shuffle data.
    shuffle(mbq)
    % Loop over mini-batches.
    while hasdata(mbq)
        iteration = iteration + 1;
        % Read mini-batch of data.
        [dlX1,dlY] = next(mbq);
        % Evaluate the model gradients, state, and loss using dlfeval and the
        % modelGradients function, and update the network state.
        [gradients,state,loss] = dlfeval(@modelGradients,dlnet,dlX1,dlY);
        dlnet.State = state;
        % Update the network parameters using the Adam optimizer.
        % [dlnet, velocity] = adamupdate(dlnet, gradients, velocity, learnRate, momentum);
        [dlnet,averageGrad,averageSqGrad] = ...
            adamupdate(dlnet, gradients, ...
            averageGrad, averageSqGrad, iteration, ...
            learnRate, gradientDecayFactor, squaredGradientDecayFactor, epsilon);
        yPreVal = predict(dlnet,XTest);
        % yPreVal(isnan(yPreVal)) = yTest(isnan(yPreVal));
        test_error = sqrt(mse(yPreVal,yTest))
        if plots == "training-progress"
            % Display the training progress.
            D = duration(0,0,toc(start),'Format','hh:mm:ss');
            % completionPercentage = round(iteration/numIterations*100,0);
            title("Epoch: " + epoch + ", Elapsed: " + string(D));
            addpoints(lineLossTrain,iteration,double(gather(extractdata(sqrt(loss)))))
            addpoints(lineLossTest,iteration,double(extractdata(test_error)))
            drawnow limitrate
        end
    end
end
function [gradients,state,loss] = modelGradients(dlnet,dlX1,Y)
    [dlYPred,state] = forward(dlnet,dlX1);
    loss = mse(dlYPred,Y);
    gradients = dlgradient(loss,dlnet.Learnables);
end

function [X,Y] = preprocessMiniBatch(XCell,YCell)
    % Extract feature data from cell and concatenate.
    X = cat(1,XCell{:});
    X = X';
    % Extract label data from cell and concatenate.
    Y = cat(2,YCell{:});
end
The result: the minimum RMSE is about 0.24.
Why are the results so different?
The TSK1 layer is my custom layer with a backward function.
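For reference, one way to rule out a difference in the trained networks themselves is to evaluate both models on the same held-out data with the same, manually computed metric. This is a minimal sketch; it assumes net from the first script and dlnet from the second script are both in the workspace, and that XTest/yTest are the plain numeric arrays (before the dlarray conversion in the second script):
% Sketch: compare the two trained models with an identical RMSE calculation
yPred1 = predict(net,XTest);                   % model from trainNetwork, N-by-1 numeric
yPred2 = predict(dlnet,dlarray(XTest','CB'));  % model from the custom loop, 1-by-N dlarray
yPred2 = extractdata(yPred2)';                 % back to N-by-1 numeric
rmseTrainNetwork = sqrt(mean((yPred1 - yTest).^2))
rmseCustomLoop   = sqrt(mean((yPred2 - yTest).^2))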
4 Comments
Samuel Somuyiwa on 2 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so it should be the right function to use in this case. Did you try sqrt(l2loss(x,y))?


Accepted Answer

Samuel Somuyiwa on 14 Mar 2022
The RMSE in the training plot of trainNetwork does not include the factor of half, whereas in the custom training loop you used sqrt(mse(x,y)), and mse includes the factor of half. l2loss does not include the factor of half, so the right way to compute RMSE in this case is sqrt(l2loss(x,y)).
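A quick numerical check of that factor: 0.344/sqrt(2) ≈ 0.243, which matches the ~0.24 reported by the custom loop, consistent with the difference coming from the metric rather than from the training itself. A minimal sketch of the three quantities (assuming yPreVal and yTest are the formatted 'CB' dlarrays from the custom loop above):
% Sketch: compare the RMSE definitions discussed above
err       = extractdata(yPreVal) - extractdata(yTest);
plainRmse = sqrt(mean(err.^2))           % comparable to the RMSE in the trainNetwork plot
halfRmse  = sqrt(mse(yPreVal,yTest))     % what the custom loop plots: plainRmse/sqrt(2), since mse is the half-MSE
l2Rmse    = sqrt(l2loss(yPreVal,yTest))  % per the answer above, l2loss omits the 1/2 factor, so this should match plainRmse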
