Neural Network Training Implementation

Hi, I have been trying to implement my own version of gradient-descent training. The cost function I minimize is the negative log-likelihood. My datasets vary between ~1000 and ~5000 samples (for both the training and unseen test sets), and I have previously trained on them using NNToolbox. My implementation of the neural network does perform well: I have been able to attain accuracy close to 99%. However, when I compare my backpropagated partial derivatives against the numerical gradient-checking method, the difference is too large not to be suspicious of my implementation. I believe the problem lies in how I update the parameters. I have tried updating the weights after scanning each individual sample (on-line), in mini-batches, and over the whole batch. Also, I believe the final parameter values are too large.
Below is a piece of code that performs backprop and the parameter updates for one epoch, updating after scanning each individual example.
if true
    lambda = 0; % Regularisation parameter
    numbatches = 1;
    multiPlier = (1 - (learnRate * lambda / size(ipFeatures, 2)));
    for l = 1 : numbatches
        currentBatch = trainP(:, batchInd((l-1)*batchSize+1 : l*batchSize)); % Would be 1 for this case
        batchTargets = trainT(:, batchInd((l-1)*batchSize+1 : l*batchSize)); % Would be 1 for this case
        activations = forwardPropagation(currentBatch, model);
        deltaErrors = computeDeltaError(activations, batchTargets, model);
        for t = 1 : numHiddenLayers
            % Compute partial derivatives
            dW{t} = deltaErrors{t+1} * activations{t}';
            db{t} = sum(deltaErrors{t+1}, 2);
            % Update parameters
            model.weights{t} = multiPlier * model.weights{t} ...
                - (learnRate / size(currentBatch, 2)) .* dW{t};
            model.bias{t} = model.bias{t} - (learnRate / size(currentBatch, 2)) .* db{t};
        end
    end
end
The value of numbatches selects batch mode, on-line mode, or mini-batch mode for the network. I use dW and db for the comparison with the numerical gradients.
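For reference, this is the kind of central-difference check I mean (a minimal sketch: costFun is a placeholder for a function evaluating the negative log-likelihood with a given weight matrix, and the names are illustrative, not from my code above). The parameters are held fixed while both gradients are computed, and the analytic gradient is scaled by the same 1/m factor as the cost before comparing:

```matlab
% Central-difference numerical gradient for the first weight matrix.
% NOTE: model parameters are NOT updated while this check runs.
epsilon = 1e-4;
W = model.weights{1};
numGrad = zeros(size(W));
for i = 1 : numel(W)
    Wplus  = W;  Wplus(i)  = Wplus(i)  + epsilon;
    Wminus = W;  Wminus(i) = Wminus(i) - epsilon;
    numGrad(i) = (costFun(Wplus) - costFun(Wminus)) / (2 * epsilon);
end
% dW{1} must carry the same scaling as the cost (e.g. divided by the
% number of samples) before it is compared with numGrad:
analyticGrad = dW{1} / size(currentBatch, 2);
```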
Also, I believe the final parameter values are too large. For example, here is part of one of the weight matrices I obtained for a dataset trained with 816 samples and tested on a dataset with 725 samples, with a classification accuracy of 98.79% (which is good, as the test dataset has some noisy labels):
-0.2853 -1.3728 -0.6968 0.4703 -1.2471 2.0104 0.2644 -0.6097 0.7695 0.3747
1.4270 1.2017 0.6934 0.8725 0.4917 -1.0928 -0.3810 0.9145 -1.2533 -0.3824
A few sample values of the backpropagated partial derivatives, the numerically computed gradients, and their differences:
backPropGrads = 0.000863502714559410 0.0112093229963550 9.74490423775809e-05 0.000175776868318497 0.00845120635863130 -0.00189301667233442 -0.00653141680913231 0.00496566802896389 -0.0205541611216203 -0.000101576463654545
numericalGrads = -0.00246672065599973 0.00105451893203656 -0.000341400989006813 -0.000228545330785424 0.000629285591066675 0.00526790995436510 0.00255049267060270 -0.00283454504222680 0.00643259144054997 -2.32609314448906e-05
GradientDifference = 0.00333022337055914 0.0101548040643184 0.000438850031384393 0.000404322199103921 0.00782192076756462 0.00716092662669952 0.00908190947973501 0.00780021307119069 0.0269867525621703 7.83155322096546e-05
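The raw element-wise differences above are hard to judge on their own, so here is the normalised relative-difference summary I could use instead (a sketch using the standard convention; backPropGrads and numericalGrads are the vectors sampled above):

```matlab
% Single-number summary of gradient agreement. Values around 1e-7
% typically indicate a correct implementation; values near 1e-1
% suggest a likely bug.
relDiff = norm(backPropGrads - numericalGrads) ...
        / (norm(backPropGrads) + norm(numericalGrads));
```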
Can anyone suggest what I am doing wrong here? Am I performing the weight updates properly?
- Nilay

Answers (0)
