Neural Network Training Implementation

Hi, I have been trying to implement my own version of gradient-descent training. The cost function I minimize is the negative log-likelihood. My datasets vary between ~1000 and ~5000 samples (for both the training and unseen test sets), and I have previously trained on them using NNToolbox. My implementation of the neural network does perform well: I have been able to attain accuracy close to 99%. However, when I compare my backpropagated partial derivatives against the numerical gradient-checking method, the difference is too large not to be suspicious of my implementation. I believe the problem lies in how I update the parameters. I have tried updating the weights after scanning each individual sample (on-line), in mini-batches, and over the whole batch. Also, I believe the final parameter values are too large.
Below is a piece of code that performs backprop and the parameter updates for one epoch, updating after scanning each individual example.
if true
    lambda = 0; % Regularisation parameter
    numbatches = 1;
    multiPlier = (1 - (learnRate * lambda / size(ipFeatures, 2)));
    for l = 1 : numbatches
        currentBatch = trainP(:, batchInd((l-1)*batchSize+1 : l*batchSize)); % Would be 1 for this case
        batchTargets = trainT(:, batchInd((l-1)*batchSize+1 : l*batchSize)); % Would be 1 for this case
        activations = forwardPropagation(currentBatch, model);
        deltaErrors = computeDeltaError(activations, batchTargets, model);
        for t = 1 : numHiddenLayers
            % Compute partial derivatives
            dW{t} = deltaErrors{t+1} * activations{t}';
            db{t} = sum(deltaErrors{t+1}, 2);
            % Update parameters
            model.weights{t} = multiPlier * model.weights{t} ...
                - (learnRate / size(currentBatch, 2)) .* dW{t};
            model.bias{t} = model.bias{t} - (learnRate / size(currentBatch, 2)) .* db{t};
        end
    end
end
The value of numbatches selects batch mode, on-line mode, or mini-batch mode for the network. I use dW and db for the comparison with the numerical gradients.
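For reference, this is the kind of central-difference check I mean (a minimal sketch: costFun is a placeholder for a function evaluating the negative log-likelihood with a given weight matrix, and the names are illustrative, not from my code above). The parameters are held fixed while both gradients are computed, and the analytic gradient is scaled by the same 1/m factor as the cost before comparing:

```matlab
% Central-difference numerical gradient for the first weight matrix.
% NOTE: model parameters are NOT updated while this check runs.
epsilon = 1e-4;
W = model.weights{1};
numGrad = zeros(size(W));
for i = 1 : numel(W)
    Wplus  = W;  Wplus(i)  = Wplus(i)  + epsilon;
    Wminus = W;  Wminus(i) = Wminus(i) - epsilon;
    numGrad(i) = (costFun(Wplus) - costFun(Wminus)) / (2 * epsilon);
end
% dW{1} must carry the same scaling as the cost (e.g. divided by the
% number of samples) before it is compared with numGrad:
analyticGrad = dW{1} / size(currentBatch, 2);
```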
Also, I believe the final parameter values are too large. For example, here is part of one of the weight matrices I obtained for a dataset trained with 816 samples and tested on a dataset with 725 samples, with a classification accuracy of 98.79% (which is good, as the test dataset has some noisy labels):
-0.2853 -1.3728 -0.6968 0.4703 -1.2471 2.0104 0.2644 -0.6097 0.7695 0.3747
1.4270 1.2017 0.6934 0.8725 0.4917 -1.0928 -0.3810 0.9145 -1.2533 -0.3824
A few sample values of the backpropagated partial derivatives, the numerically computed gradients, and their differences:
backPropGrads = 0.000863502714559410 0.0112093229963550 9.74490423775809e-05 0.000175776868318497 0.00845120635863130 -0.00189301667233442 -0.00653141680913231 0.00496566802896389 -0.0205541611216203 -0.000101576463654545
numericalGrads = -0.00246672065599973 0.00105451893203656 -0.000341400989006813 -0.000228545330785424 0.000629285591066675 0.00526790995436510 0.00255049267060270 -0.00283454504222680 0.00643259144054997 -2.32609314448906e-05
GradientDifference = 0.00333022337055914 0.0101548040643184 0.000438850031384393 0.000404322199103921 0.00782192076756462 0.00716092662669952 0.00908190947973501 0.00780021307119069 0.0269867525621703 7.83155322096546e-05
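The raw element-wise differences above are hard to judge on their own, so here is the normalised relative-difference summary I could use instead (a sketch using the standard convention; backPropGrads and numericalGrads are the vectors sampled above):

```matlab
% Single-number summary of gradient agreement. Values around 1e-7
% typically indicate a correct implementation; values near 1e-1
% suggest a likely bug.
relDiff = norm(backPropGrads - numericalGrads) ...
        / (norm(backPropGrads) + norm(numericalGrads));
```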
Can anyone suggest what I am doing wrong here? Am I performing the weight updates properly?
- Nilay

Answers (0)
