doubt in the backpropagation algorithm

Hi,
I'm studying neural networks and I'm building an NN with 2 hidden layers and one neuron in the output layer.
While I was studying and coding my NN, I ran into one doubt.
In the backward step, the math behind this is clear:
For the output layer we have
δ_out = e ⊙ f'(z_out),
where ⊙ is the Hadamard product, e is the output error, f' is the derivative of the activation function and z_out is the net input of the output layer.
And for the hidden layers we have
δ_l = (W_{l+1}ᵀ δ_{l+1}) ⊙ f'(z_l).
My problem is that, when I code these formulas, I need to change the second equation when I calculate the gradient of the first hidden layer (the hidden layer next to the input layer), so that the matrix dimensions match, as shown below:
% Backpropagation
delta_saida = erro_estim.*selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h2 = (w_out'*delta_saida).*selecionar_funcao(h2_in_estim,ativ_h2,sig_a,tanh_a,tanh_b,'True');
delta_h1 = (w2*delta_h2')'.*selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'True'); % note the extra transposes here
%update weights and biases
w_out = w_out + learning_rate*delta_saida*h2_out_estim';
b_out = b_out + learning_rate*delta_saida;
w2 = w2 + learning_rate*(delta_h2'*h1_out_estim)';
b2 = b2 + learning_rate*delta_h2;
w1 = w1 + learning_rate*(delta_h1'*enter_estim)';
b1 = b1 + learning_rate*delta_h1;
% I wrote this code partially in Portuguese, so let me explain a little.
% 'delta_saida' is the gradient of the output layer
% delta_h2 is the gradient of the second hidden layer
% delta_h1 is the gradient of the first hidden layer
% w_out, w2 and w1 are the weights of the output, second hidden and first hidden layers, respectively.
% b_out, b2 and b1 are the biases of the output, second hidden and first hidden layers, respectively.
% the function selecionar_funcao() just evaluates the activation function of the layer, or its derivative when the last argument is 'True'
% As you can see, I need to change delta_h1 to match the matrix dimensions
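To make the dimension problem concrete, here is a small shape check. The sizes below are made up just for illustration, and the derivative factors are left out because they don't change the dimensions:
% made-up sizes, just to illustrate the shapes
num_entradas = 4; n_h1 = 5; n_h2 = 3; n_out = 1;
w2    = randn(n_h1,n_h2);
w_out = randn(n_h2,n_out);
delta_saida = randn(1,n_out);        % 1 x 1, one output neuron
delta_h2 = (w_out'*delta_saida);     % 1 x n_h2, same orientation as h2_in_estim
% w_out'*delta_saida already comes out as a row, but w2'*delta_h2 would not even
% multiply (n_h2 x n_h1 times 1 x n_h2), so for the first hidden layer I write:
delta_h1 = (w2*delta_h2')';          % 1 x n_h1, same orientation as h1_in_estim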
Is it right to change the formula like I'm doing in my code? I'm asking because, in my mind, the way we calculate the gradient should be the same for all hidden layers, but in my case it isn't. I'll share part of my code here so anyone can check whether I'm making a mistake.
%weights and biases initialization
w1 = randn(num_entradas,n_h1)*sqrt(2/num_entradas);
w2 = randn(n_h1,n_h2) *sqrt(2/n_h1);
w_out = randn(n_h2,n_out) *sqrt(2/n_h2);
b1 = randn(1, n_h1) * sqrt(2/num_entradas);
b2 = randn(1, n_h2) * sqrt(2/n_h1);
b_out = randn(1,n_out) * sqrt(2/n_h2);
%backpropagation
for epoch =1:max_epocas
soma_valid = 0;
soma_estim = 0;
%shuffle the data
conj_estim = embaralhar(conj_estim);
% conj_valid = embaralhar(conj_valid);
%Validating
for j=1:size(conj_valid,1)
enter_valid = conj_valid(j,2:end);
h1_in_valid = [enter_valid,1]*[w1;b1];
h1_out_valid = selecionar_funcao(h1_in_valid,ativ_h1,sig_a,tanh_a,tanh_b,'False');
h2_in_valid = [h1_out_valid,1]*[w2;b2];
h2_out_valid = selecionar_funcao(h2_in_valid,ativ_h2,sig_a,tanh_a,tanh_b,'False');
saida_in_valid = [h2_out_valid,1]*[w_out;b_out];
saida_out_valid = selecionar_funcao(saida_in_valid,ativ_out,sig_a,tanh_a,tanh_b,'False');
erro_valid = conj_valid(j,1) - saida_out_valid;
soma_valid = soma_valid + (erro_valid^2);
end
erro_atual_valid = (soma_valid/(2*size(conj_valid,1)));
erros_epoca_valid = [erros_epoca_valid;erro_atual_valid];
%training
for i =1:size(conj_estim,1)
enter_estim = conj_estim(i,2:end);
h1_in_estim = [enter_estim,1]*[w1;b1];
h1_out_estim = selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'False');
h2_in_estim = [h1_out_estim,1]*[w2;b2];
h2_out_estim = selecionar_funcao(h2_in_estim,ativ_h2,sig_a,tanh_a,tanh_b,'False');
saida_in_estim = [h2_out_estim,1]*[w_out;b_out];
saida_out_estim = selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'False');
erro_estim = conj_estim(i,1) - saida_out_estim;
soma_estim = soma_estim + (erro_estim^2);
% Backpropagation
delta_saida = erro_estim.*selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');
delta_h2 = (w_out'*delta_saida).*selecionar_funcao(h2_in_estim,ativ_h2,sig_a,tanh_a,tanh_b,'True');
delta_h1 = (w2*delta_h2')'.*selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'True'); % note the extra transposes here
%update weights and biases
w_out = w_out + learning_rate*delta_saida*h2_out_estim';
b_out = b_out + learning_rate*delta_saida;
w2 = w2 + learning_rate*(delta_h2'*h1_out_estim)';
b2 = b2 + learning_rate*delta_h2;
w1 = w1 + learning_rate*(delta_h1'*enter_estim)';
b1 = b1 + learning_rate*delta_h1;
end
erro_atual_estim = (soma_estim/(2*size(conj_estim,1)));
erros_epoca_estim = [erros_epoca_estim;erro_atual_estim];
if erros_epoca_estim(epoch) < limiar
break
end
end

Answers (1)

Umar 2024-8-9
Hi @jvbx,
Based on the code snippet and your explanation, it seems that you are correctly calculating the gradient of the first hidden layer. The dimensions of the matrices involved in the calculation also appear to be consistent.

However, it's important to note that the calculation of the gradient in backpropagation can vary depending on the specific architecture and activation functions used in the neural network. Different architectures may require different formulas for calculating the gradients of the hidden layers. So, as long as the dimensions of the matrices are consistent and the formula aligns with the mathematical principles of backpropagation, you should be on the right track.

To further validate your implementation, you can compare the results of your network with a known benchmark or test it on different datasets to ensure its accuracy. Remember, the backpropagation algorithm is a complex process, and it's common to encounter challenges and uncertainties along the way. It's important to experiment, iterate, and validate your implementation to ensure the best performance of your neural network.

I hope this explanation clarifies your doubts and helps you move forward with your implementation.
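For instance, one way to see that the same hidden-layer formula can be reused at every layer is to keep every delta as a 1-by-n row vector, matching the row-vector activations in your code. This is only a sketch that reuses the variable names from your snippet inside your training loop, not a drop-in replacement:
% With row-vector activations, multiplying by the transposed weight matrix on the
% right keeps every delta in the same 1-by-n orientation, so the hidden-layer rule
% has the same shape at each layer and no extra transposes are needed.
delta_saida = erro_estim.*selecionar_funcao(saida_in_estim,ativ_out,sig_a,tanh_a,tanh_b,'True');    % 1 x n_out
delta_h2 = (delta_saida*w_out').*selecionar_funcao(h2_in_estim,ativ_h2,sig_a,tanh_a,tanh_b,'True'); % 1 x n_h2
delta_h1 = (delta_h2*w2').*selecionar_funcao(h1_in_estim,ativ_h1,sig_a,tanh_a,tanh_b,'True');       % 1 x n_h1
With that convention the weight updates also keep one shape, for example w2 = w2 + learning_rate*h1_out_estim'*delta_h2;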
If you have any further questions, feel free to ask!
