How to use Levenberg-Marquardt backprop with GPU?

12 views (last 30 days)
Levenberg-Marquardt backprop trains my shallow neural net very efficiently and gives very good results. However, it doesn't seem to support GPU training. Is there a way to implement GPU support for Levenberg-Marquardt backprop?
Thanks

Answers (1)

Joss Knight 2019-12-15
This isn't supported out of the box yet. You could convert your network to use dlarray and train it with a custom training loop. Then you could write your own Levenberg-Marquardt solver.
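For what it's worth, once you can compute a Jacobian of the residuals with respect to your parameters, the Levenberg-Marquardt update itself is just a damped Gauss-Newton step. A minimal sketch, assuming a hypothetical helper 'residualsAndJacobian' and a flattened parameter vector w (neither of which is defined here), could look like the code below; if w and your data are gpuArrays, the linear algebra in the update runs on the GPU:
% Minimal Levenberg-Marquardt loop (sketch only). 'residualsAndJacobian' is a
% hypothetical user-supplied function that returns the residual vector r
% (numResiduals-by-1) and its Jacobian J (numResiduals-by-numParams) at the
% current flattened parameter vector w.
lambda = 0.01;          % initial damping factor
maxIterations = 100;    % for illustration
for iter = 1:maxIterations
    [r, J] = residualsAndJacobian(w);
    % Damped Gauss-Newton step: (J'*J + lambda*I) * dw = -J'*r
    dw = (J'*J + lambda*eye(numel(w))) \ (-J'*r);
    rNew = residualsAndJacobian(w + dw);
    if sum(rNew.^2) < sum(r.^2)
        w = w + dw;         % step reduced the error: accept and relax damping
        lambda = lambda/10;
    else
        lambda = lambda*10; % step increased the error: reject and damp harder
    end
end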
  10 comments
Amanjit Dulai 2020-1-17
Here is how you might translate the example above to use some of our newer functionality. In the example below, we train the network you described earlier on a simple regression problem. See the function 'iNetworkForward' for the definition of the network:
% Load the data
[X,T] = simplefit_dataset;
inputSize = size(X,1);
outputSize = size(T,1);
% Split the data into test and training data
rng('default');
testFraction = 0.15;
[XTrain, TTrain, XTest, TTest] = ...
    iSplitIntoTrainAndTestSets(X, T, testFraction);
% Initialize the weights for the network
layerSizes = [10 20 20];
params = iInitializeWeights(inputSize, layerSizes, outputSize);
% Specify the training options
executionEnvironment = "cpu";
velocity = [];
miniBatchSize = 20;
numEpochs = 1000;
numObservations = size(XTrain,2);
numIterationsPerEpoch = floor(numObservations./miniBatchSize);
% Cast the data to dlarray
XTrain = dlarray(XTrain, 'CB');
TTrain = dlarray(TTrain, 'CB');
if executionEnvironment == "gpu"
    XTrain = gpuArray(XTrain);
    TTrain = gpuArray(TTrain);
end
% Train the model
for epoch = 1:numEpochs
    for iteration = 1:numIterationsPerEpoch
        % Get a batch of data.
        indices = (iteration-1)*miniBatchSize+1:iteration*miniBatchSize;
        XBatch = XTrain(:,indices);
        TBatch = TTrain(:,indices);
        % Get the loss and gradients
        [loss, gradients] = dlfeval( ...
            @iNetworkForwardWithLoss, params, XBatch, TBatch );
        % Update the network
        [params, velocity] = sgdmupdate(params, gradients, velocity);
        % Report the loss
        fprintf('Loss: %f\n', extractdata(loss));
    end
end
% Run the network on test data
XTest = dlarray(XTest, 'CB');
YTest = iNetworkForward(params, XTest);
YTest = extractdata(YTest);
% Plot the ground truth and predicted data
plot([TTest' YTest']);
%% Helper functions
function [XTrain,TTrain,XTest,TTest] = iSplitIntoTrainAndTestSets( ...
        X, T, testFraction )
    numObservations = size(X,2);
    idx = randperm(numObservations);
    splitIndex = floor(numObservations*testFraction);
    testIdx = idx(1:splitIndex);
    trainIdx = idx((splitIndex+1):end);
    XTrain = X(:,trainIdx);
    TTrain = T(:,trainIdx);
    XTest = X(:,testIdx);
    TTest = T(:,testIdx);
end
function params = iInitializeWeights(inputSize, layerSizes, outputSize)
    params = struct;
    params.W1 = dlarray( iGlorot(inputSize, layerSizes(1)) );
    params.b1 = dlarray( zeros([layerSizes(1) 1]) );
    params.W2 = dlarray( iGlorot(layerSizes(1), layerSizes(2)) );
    params.b2 = dlarray( zeros([layerSizes(2) 1]) );
    params.W3 = dlarray( iGlorot(layerSizes(2), layerSizes(3)) );
    params.b3 = dlarray( zeros([layerSizes(3) 1]) );
    params.W4 = dlarray( iGlorot(layerSizes(3), outputSize) );
    params.b4 = dlarray( zeros([outputSize 1]) );
end
function weights = iGlorot(fanIn, fanOut)
    weights = (2*rand([fanOut fanIn])-1) * sqrt(6/(fanIn+fanOut));
end
function Y = iNetworkForward(params, X)
    Z1 = fullyconnect(X, params.W1, params.b1);  % 1st fully connected layer
    Z1 = sigmoid(Z1);                            % Logistic sigmoid
    Z2 = fullyconnect(Z1, params.W2, params.b2); % 2nd fully connected layer
    Z2 = exp(-Z2.^2);                            % Radial basis function
    Z3 = fullyconnect(Z2, params.W3, params.b3); % 3rd fully connected layer
    Z3 = sigmoid(Z3);                            % Logistic sigmoid
    Y = fullyconnect(Z3, params.W4, params.b4);  % 4th fully connected layer
end
function [loss, dLossdW] = iNetworkForwardWithLoss(weights, X, T)
    Y = iNetworkForward(weights, X);
    loss = mse(Y, T)/size(T,1);
    dLossdW = dlgradient(loss, weights);
end
There are a few differences between doing things this way and using 'feedforwardnet':
  • As you mentioned, 'feedforwardnet' is trained with Levenberg-Marquardt by default, whereas the example above uses stochastic gradient descent with momentum, which is simpler.
  • The example above uses the 'Glorot' weight initializer, which is a more modern technique associated with Deep Learning. 'feedforwardnet' uses the Nguyen-Widrow method.
  • The example above does not perform any scaling on the data. 'feedforwardnet' will by default rescale the input and target data to the range -1 to 1, which can sometimes help training; a minimal sketch of this kind of rescaling is shown below.
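For example, a rescaling sketch applied to the raw numeric data before it is cast to dlarray (this is only an illustration of the idea, not what 'feedforwardnet' calls internally) might look like:
% Map each input row to the range [-1, 1] using training-set statistics
xMin = min(XTrain, [], 2);
xMax = max(XTrain, [], 2);
XTrain = 2*(XTrain - xMin)./(xMax - xMin) - 1;
% Apply the same transform to the test data (and, if you rescale the
% targets too, remember to map the network outputs back afterwards)
XTest = 2*(XTest - xMin)./(xMax - xMin) - 1;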


Release

R2019a
