Reducing overfitting in neural networks

11 views (last 30 days)
I am using the MATLAB Neural Network Toolbox to train an ANN. From past experience, implementing cross-validation when working with ML algorithms can help reduce the problem of overfitting, as well as allowing use of the entire available dataset without adding bias.
My question is: is there any advantage to implementing k-fold cross-validation when using the NN toolbox, or are overfitting and bias already mitigated by the implementation (e.g., in the 'trainbr' mode)?

Accepted Answer

Greg Heath 2017-6-9
K-FOLD CROSS-VALIDATION IS NOT A CURE FOR THE ILLS OF AN OVERFIT NET.
BACKGROUND:
1. OVERFITTING:
Nw > Ntrneq
where
Nw = number of unknown weights
Ntrneq = number of training equations
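For a concrete sense of these quantities: with a standard single-hidden-layer I-H-O fitnet and Ntrn training cases, both counts can be computed directly. The sketch below is illustrative only; the numeric values of I, H, O and Ntrn are assumptions, not part of the original post.

I    = 4;                      % number of inputs (example value)
H    = 10;                     % number of hidden nodes (example value)
O    = 1;                      % number of outputs (example value)
Ntrn = 50;                     % number of training cases (example value)

Nw     = (I+1)*H + (H+1)*O;    % unknown weights, including biases (= 61 here)
Ntrneq = Ntrn*O;               % training equations (= 50 here)
overfit = Nw > Ntrneq          % true ==> overfit by the definition above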
2. GENERALIZATION:
If y0 is a solution to the system of equations f(x0,y) = 0, then the system
generalizes well if
0 = f(x0 + dx , y) ==> y = y0 + dy
for NON-INFINITESIMALLY SMALL dx and dy
3. NONUNIQUENESS and INSTABILITY
Solutions to overfit systems are typically not unique. More importantly,
the non-uniqueness can lead to instability and poor generalization of
iterative solutions.
Typically, there are an infinite number of solutions to an overfit system
of equations; however, many of those solutions do not generalize well. In
particular, iterative training of an overfit net can converge to a solution
that generalizes poorly. I call this problem
4. OVERTRAINING AN OVERFIT NET
There are several approaches to avoid overtraining an overfit net:
a. NONOVERFITTING: Do not overfit the net in the first place by using the rule
Ntrneq >= Nw
b. STOPPED TRAINING: Use train/val/test data division and STOP TRAINING when the
validation-subset error increases continually for a prespecified number of
epochs (the MATLAB default is 6). This technique is used in the
LEVENBERG-MARQUARDT and CONJUGATE-GRADIENT training functions TRAINLM
and TRAINSCG (and the other conjugate-gradient variants), respectively.
c. BAYESIAN REGULARIZATION: Constrain the size of the weights by adding to the
minimization function a penalty term proportional to the squared Euclidean
norm of the weights. Although this technique is the default in the training
function TRAINBR, it can be specified with other training functions
(see the configuration sketch below).
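As an illustration of approaches b and c, here is a minimal configuration sketch. It assumes a fitting problem with inputs x, targets t, and a chosen number of hidden nodes H; these names are placeholders, and the ratio and max_fail values shown are the toolbox defaults.

% b. Stopped training: TRAINLM with random train/val/test data division
net1 = fitnet(H, 'trainlm');
net1.divideFcn              = 'dividerand';   % random data division (default)
net1.divideParam.trainRatio = 0.70;
net1.divideParam.valRatio   = 0.15;
net1.divideParam.testRatio  = 0.15;
net1.trainParam.max_fail    = 6;              % stop after 6 consecutive val-error increases
[net1, tr1] = train(net1, x, t);

% c. Bayesian regularization: TRAINBR (no validation stopping needed)
net2 = fitnet(H, 'trainbr');
net2.divideFcn = 'dividetrain';               % assign all data to training
[net2, tr2] = train(net2, x, t);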
5. Perhaps you confused k-fold CROSS-VALIDATION with DATA-DIVISION STOPPED TRAINING as a technique to avoid overtraining an overfit net.
It is not. See below.
6. K-FOLD CROSSVALIDATION
a. This widely known technique is not offered in the MATLAB NN TOOLBOX
b. Nonetheless, my use of the CROSSVAL and CVPARTITION functions from other
toolboxes can be found in both the NEWSGROUP and ANSWERS by including "greg" as a
searchword with cross validation, cross-validation and crossvalidation (a minimal
sketch follows).
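For completeness, here is a minimal k-fold sketch using CVPARTITION from the Statistics and Machine Learning Toolbox. The data x, t, the hidden-layer size H, and k = 10 are illustrative assumptions.

k    = 10;
N    = size(x, 2);                       % toolbox convention: columns are cases
cvp  = cvpartition(N, 'KFold', k);
msek = zeros(k, 1);
for i = 1:k
    trn = training(cvp, i);              % logical index of this fold's training cases
    tst = test(cvp, i);                  % logical index of this fold's held-out cases
    net = fitnet(H, 'trainlm');
    net.divideFcn = 'dividetrain';       % the fold itself defines the split
    net = train(net, x(:, trn), t(:, trn));
    msek(i) = perform(net, t(:, tst), net(x(:, tst)));   % fold MSE
end
msecv = mean(msek)                       % k-fold cross-validation MSE estimate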
7. However, instead of using other toolboxes to implement k-fold crossvalidation, I compensate
by using m multiple designs (typically 10 <= m <= 30) that differ only by the random division of
training, validation and test subsets, in addition to the default random selection of initial weights.
8. My technique is trivial to implement (see the sketch after these steps):
Given an I-H-O net topology
a. Initialize the random number generator so that designs can be duplicated.
b. Store the current state of the RNG at the beginning of the loop so that any
design can be recreated at a later date without regenerating the others.
c. Design a net and store the performance results (e.g., Normalized Mean
Square Error NMSE). Storing the net is not necessary since it is easily redesigned
given the stored state of the RNG.
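A minimal sketch of steps a-c, assuming a fitting problem with inputs x, targets t, a chosen hidden-layer size H, and m = 10 candidate designs (all names and values are illustrative). NMSE is normalized by the average target variance:

m     = 10;                              % number of candidate designs
NMSE  = zeros(m, 1);
state = cell(m, 1);
rng(0)                                   % a. initialize RNG so designs can be duplicated
for i = 1:m
    state{i} = rng;                      % b. store RNG state; design i can be recreated later
    net = fitnet(H, 'trainlm');          %    default 'dividerand' train/val/test division
    [net, tr] = train(net, x, t);        %    new random division and initial weights each trial
    y = net(x);
    NMSE(i) = perform(net, t, y) / mean(var(t', 1));   % c. normalized mean square error
end
[bestNMSE, ibest] = min(NMSE)            % best of the m designs

To recreate design ibest at a later date, restore its stored RNG state with rng(state{ibest}) and repeat the fitnet/train lines; storing the nets themselves is unnecessary.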
Hope this helps.
Thank you for formally accepting my answer
Greg
