R2a vs R2 in neural network MSE

I've read quite a few posts regarding the adjusted coefficient of determination (R2a) and using it to derive an MSE goal for training. I've applied those posts to the training case below, where I'm evaluating different numbers of hidden nodes in my net. My objective is to find the "best" trained net for each of a given set of starting weights (H = ?), and then apply that trained net to some held-back test data to evaluate generalisation.
  • Does the table below reflect the correct application of adjusted R2?
  • Does reverting to (unadjusted) R2 make any sense when faced with a negative Ndof adjustment?
  • How do I determine a training goal when Ndof is negative? Setting MSEgoal = 0 doesn't seem realistic.
  • Assuming I have static/given starting weights for each of my nets (H = ?), is it better to abandon the MSE goal, set the epochs high (5000/10000), use k-fold cross-validation to find an optimal MSE(training) vs MSE(validation) ratio, and then retrain the net on that fold to the appropriate epoch? (Is this even a valid approach?)
  N   I   H  O   Nw  Neq  Ndof  Ndof/Ntrneq
200  16   5  1   91  200   109        0.545
200  16  15  1  271  200   -71       -0.355
200  16  25  1  451  200  -251       -1.255
200  16  35  1  631  200  -431       -2.155
200  16  45  1  811  200  -611       -3.055
200  16  55  1  991  200  -791       -3.955
  1 comment
Greg Heath 2013-4-17
Hub = -1 + ceil( (Ntrn*O - O) / (I + O + 1) )
    = -1 + ceil( 199/18 ) = 11
MSEgoal = max(0, 0.01*Ndof*mean(var(ttrn'))/Ntrneq)
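The two MATLAB expressions above can be sketched in Python/NumPy (a translation for illustration, not Greg's code; `Ntrn`, `ttrn`, etc. follow his naming, and `ttrn` is taken as an O-by-Ntrn target matrix per the MATLAB column convention):

```python
import math
import numpy as np

def hub(Ntrn, I, O):
    # Upper bound on hidden nodes H that keeps Ndof = Ntrneq - Nw > 0
    return -1 + math.ceil((Ntrn * O - O) / (I + O + 1))

def mse_goal(Ndof, Ntrneq, ttrn):
    # mean(var(ttrn')) in MATLAB: per-output target variance (normalized
    # by N-1, hence ddof=1), averaged over the O outputs
    return max(0.0, 0.01 * Ndof * np.mean(np.var(ttrn, axis=1, ddof=1)) / Ntrneq)

print(hub(200, 16, 1))  # -1 + ceil(199/18) = 11
```

Note that the `max(0, ...)` clamp means MSEgoal collapses to 0 as soon as Ndof goes negative, which is exactly the situation the question asks about.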


Accepted Answer

Greg Heath 2013-4-17 (edited 2013-4-17)
If Ndof = Ntrneq - Nw < 0, there are more unknowns (weights) than training equations and the net is OVERFIT. The excess degrees of freedom allow the net to become OVERTRAINED: it goes beyond characterizing the salient features of error-free training data and memorizes contamination caused by noise, interference, and measurement and transcription errors. As a result, the net may not generalize well to non-training data that contain a different combination of random contaminants.
If Ntrneq >> Nw there is usually no problem and training is straightforward with just training data (dividetrain). However, the estimate of generalization error must take into account the loss of degrees of freedom caused by evaluating performance on the same data that created the model. This is mitigated somewhat by adjusting the mean-square error via MSEtrna = SSEtrn/Ndof instead of MSEtrn = SSEtrn/Ntrneq. Non-training data does not need this adjustment.
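The weight count and the adjusted error can be sketched as follows (a Python illustration of the quantities in the table; the guard on Ndof reflects the warning below that the adjustment breaks down as Ndof approaches zero):

```python
def nw(I, H, O):
    # Weights in an I-H-O feedforward net: (I+1)*H input-to-hidden
    # (including hidden biases) plus (H+1)*O hidden-to-output
    # (including output biases)
    return (I + 1) * H + (H + 1) * O

def mse_adjusted(sse_trn, Ntrneq, Nw):
    # MSEtrna = SSEtrn / Ndof; only meaningful while Ndof > 0
    Ndof = Ntrneq - Nw
    if Ndof <= 0:
        raise ValueError("Ndof <= 0: adjustment undefined; net is overfit")
    return sse_trn / Ndof

print(nw(16, 5, 1))   # 91, matching the H = 5 row of the table
```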
WARNING: As Ndof --> 0, the adjustment becomes useless and overtraining mitigation must be implemented, especially if Ndof < 0.
See the comp.ai.neural-nets FAQ Part 3: <ftp://ftp.sas.com/pub/neural/FAQ3.html>.
For example,
1. Reduce number of weights
a. Reduce I
b. Reduce O
c. Reduce H
d. Weight elimination objective function (CANN-FAQ)
2. Reduce size of weights
a. Weight Decay: Minimize MSE + alpha*mse(weights)
b. Minimize MAE + beta *mae(weights)
c. Bayesian Regularization (trainbr)
3. Validation stopping
4. Jittering (Add random noise)
3 and 2c are NNTBX options.
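Item 2a (weight decay) can be sketched as a penalized objective (a generic Python illustration, not NNTBX code; `alpha` is a hypothetical regularization strength chosen for the example):

```python
import numpy as np

def weight_decay_objective(weights, y_true, y_pred, alpha=0.01):
    # Item 2a above: minimize MSE + alpha*mse(weights).
    # alpha must be tuned by hand here; Bayesian regularization
    # (trainbr, item 2c) estimates it automatically.
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + alpha * np.mean(weights ** 2)
```

Shrinking the weights this way reduces the effective number of parameters, which is why it helps precisely when Ndof is small or negative.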
Hope this helps.
Thank you for formally accepting my answer
Greg
