Backpropagation-gradient descent (not using toolbox)
4 views (last 30 days)
My neural network is very unstable and I'm not sure it is working. I need help, but the algorithm looks right to me, so I don't even know what specific question to ask.
The problem I'm having: the error barely changes for a few hundred iterations, then blows up to NaN.
Note: backpropagation with gradient descent, a linear (purelin) output layer, and hyperbolic tangent (tansig) hidden layers.
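A note on the symptom: with plain gradient descent, the error "blowing up" is the classic sign of a step size that is too large for the curvature of the error surface. A toy illustration in plain Python (invented numbers, unrelated to the network below):

```python
# Toy illustration (invented numbers): gradient descent on E(w) = 0.5*a*w^2
# has the update w <- (1 - eta*a)*w, which diverges as soon as eta > 2/a.
a = 100.0        # curvature of the error surface
eta = 0.03       # step size; 0.03 > 2/a = 0.02, so the iterates grow
w = 1.0
for _ in range(200):
    w -= eta * a * w      # w is multiplied by (1 - eta*a) = -2 each step
print(abs(w) > 1e10)  # True: the "error blows up" symptom in miniature
```

In a real network the same runaway growth saturates the tanh units and eventually overflows to Inf/NaN, which matches the behavior described above.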
eta = 0.01; % learning rate
n=size(x1,1);
hid_l = [n 12 1]; % number and sizes of layers, first layer is input x
% initialize weights
for idx = 1:size(hid_l,2)-1 % for each layer
W{idx} = 2*rand(hid_l(idx)+1, hid_l(idx+1)) - 1; % random weights in [-1,1]; the extra row is for the bias
end
for iteration = 1 % NB: as written, this outer training loop runs only once; use e.g. 1:n_epochs to train longer
% train
for i = 1:size(x1,2)
% input layer i
Y{1}(:,1) = x1(:,i); % input
% hidden layer j
for idx = 2:size(hid_l,2) % for each layer after the input
Y{idx-1} = [1;Y{idx-1}]; % bias neuron added
V{idx-1} = [W{idx-1}]'*Y{idx-1};
if idx ~= size(hid_l,2)
Y{idx} = tansig(V{idx-1});
else
Y{idx} = purelin(V{idx-1});
end
end
% reverse mapping from [-1,1] to the original data scale
% NB: fitting new mapminmax settings to the single value Y{idx} is a no-op;
% 'reverse' needs the settings struct saved when the targets were preprocessed
[~,outMap] = mapminmax(Y{idx}, -1, 1);
Y{idx} = mapminmax('reverse', Y{idx}, outMap);
% this is post processing for testing performance
train_o(i) = Y{idx};
if train_o(i)>0
train_o(i) = 1;
elseif train_o(i)<=0
train_o(i) = -1;
end
% error term (energy function E = 0.5*e^2)
e(i) = t1(i)-Y{size(hid_l,2)}; % compare output with target
% back propagation
for idx = size(hid_l,2):-1:2 % walk the layers from output back to input
% output layer
if idx == size(hid_l,2)
for idx2 = 1:size(Y{idx-1},1) % re-adjust weights going into output
g1{1} = e(i); % local gradient; purelin'(v) = 1, so it is just e(i): multiplying by the output itself overscales the updates and can drive them to NaN
dw{idx-1} = eta*g1{1}*Y{idx-1}(idx2,:); % ith weight update
W{idx-1}(idx2,:) = W{idx-1}(idx2,:)+dw{idx-1};
end
else
% hidden layer
for idx2 = 1:size(Y{idx-1},1) % re-adjust hidden layer weights
dY_dV = dtansig(V{idx-1},Y{idx}); % derivative of the activation function for this layer (the bias entry is dropped, since it only feeds the next layer's bias)
for idx3 = 1:size(W{idx-1},2) % for each neuron
g2{idx2} = dY_dV(idx3+1)*g1{1}*W{idx-1}(idx2,idx3); % local gradient of single neuron
dw{idx-1} = eta*g2{idx2}*Y{idx}(idx3+1);
W{idx-1}(idx2,idx3) = W{idx-1}(idx2,idx3)+dw{idx-1};
end
end
end
end
Y{1}(1,:) = []; % remove bias (for next iteration - adds it back)
Y{2}(1,:) = []; % remove bias (for next iteration - adds it back)
end
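A finite-difference check is the standard way to validate hand-written backpropagation. Below is a dependency-free Python sketch (not the MATLAB code above; every size and value is invented) of the same architecture, one tanh hidden layer feeding a linear output, comparing the analytic gradients against numeric ones:

```python
# Sketch only (plain Python, invented sizes/values; not the poster's code):
# one tanh hidden layer, linear output, squared error E = 0.5*e^2.
# The analytic gradients are validated with a finite-difference check.
import math, random

random.seed(0)
n_in, n_hid = 3, 4
# weight matrices include a bias row, mirroring the [1; Y] convention above
W1 = [[random.uniform(-1, 1) for _ in range(n_hid)] for _ in range(n_in + 1)]
W2 = [random.uniform(-1, 1) for _ in range(n_hid + 1)]

def forward(x, W1, W2):
    xb = [1.0] + x                                      # add bias neuron
    v = [sum(xb[i] * W1[i][j] for i in range(len(xb))) for j in range(n_hid)]
    h = [math.tanh(vj) for vj in v]                     # tansig hidden layer
    hb = [1.0] + h
    y = sum(hb[i] * W2[i] for i in range(len(hb)))      # purelin output
    return xb, h, hb, y

def loss(x, t, W1, W2):
    return 0.5 * (t - forward(x, W1, W2)[3]) ** 2

def gradients(x, t, W1, W2):
    xb, h, hb, y = forward(x, W1, W2)
    e = t - y
    d_out = e                                           # purelin'(v) = 1
    gW2 = [-d_out * hb[i] for i in range(len(hb))]      # dE/dW2
    # hidden deltas: tanh'(v) = 1 - h^2; W2[0] is the bias weight, skipped
    d_hid = [(1 - h[j] ** 2) * d_out * W2[j + 1] for j in range(n_hid)]
    gW1 = [[-d_hid[j] * xb[i] for j in range(n_hid)] for i in range(len(xb))]
    return gW1, gW2

x, t = [0.2, -0.5, 0.9], 1.0
gW1, gW2 = gradients(x, t, W1, W2)

# perturb one weight in each layer and compare against the analytic gradient
eps = 1e-6
W1p = [row[:] for row in W1]; W1p[1][0] += eps
num1 = (loss(x, t, W1p, W2) - loss(x, t, W1, W2)) / eps
W2p = W2[:]; W2p[2] += eps
num2 = (loss(x, t, W1, W2p) - loss(x, t, W1, W2)) / eps
print(abs(num1 - gW1[1][0]) < 1e-4, abs(num2 - gW2[2]) < 1e-4)  # True True
```

If the analytic and numeric gradients disagree, the delta computation is wrong. Note in particular that for a purelin output the local gradient is just the error e, with no extra factor of the output value.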
0 Comments
Answers (0)