R2a vs R2 in neural network MSE

I've read quite a few posts regarding the adjusted coefficient of determination (R2a) and using it to derive an MSE goal for training. I've applied those posts to the training case below, where I'm evaluating different numbers of hidden nodes in my net. My objective is to find the "best" trained net for each of a given set of starting weights (H = ?), and then apply that trained net to some held-back test data to evaluate generalisation.
  • Does the table below reflect the correct application of adjusted R2?
  • Does reverting to (unadjusted) R2 make any sense when faced with a negative Ndof adjustment?
  • How do I determine a training goal when Ndof is negative? Setting MSEgoal = 0 doesn't seem realistic.
  • Assuming I have static/given starting weights for each of my nets (H = ?), is it better to abandon the MSE goal, set the epochs high (5000/10000), use k-fold cross-validation to find an optimal MSE(training) vs MSE(validation) ratio, and then retrain the net on that fold to the appropriate epoch? (Is this even a valid approach?)
  N   I   H  O   Nw  Neq  Ndof  Ndof/Ntrneq
200  16   5  1   91  200   109        0.545
200  16  15  1  271  200   -71       -0.355
200  16  25  1  451  200  -251       -1.255
200  16  35  1  631  200  -431       -2.155
200  16  45  1  811  200  -611       -3.055
200  16  55  1  991  200  -791       -3.955
  1 comment
Greg Heath 2013-4-17
Hub = -1 + ceil( (Ntrn*O - O) / (I + O + 1) )
    = -1 + ceil( 199/18 ) = 11
MSEgoal = max(0, 0.01*Ndof*mean(var(ttrn'))/Ntrneq)
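The two MATLAB expressions above can be sketched in Python/NumPy (a translation for illustration, not Greg's code; `Ntrn`, `ttrn`, etc. follow his naming, and `ttrn` is taken as an O-by-Ntrn target matrix per the MATLAB column convention):

```python
import math
import numpy as np

def hub(Ntrn, I, O):
    # Upper bound on hidden nodes H that keeps Ndof = Ntrneq - Nw > 0
    return -1 + math.ceil((Ntrn * O - O) / (I + O + 1))

def mse_goal(Ndof, Ntrneq, ttrn):
    # mean(var(ttrn')) in MATLAB: per-output target variance (normalized
    # by N-1, hence ddof=1), averaged over the O outputs
    return max(0.0, 0.01 * Ndof * np.mean(np.var(ttrn, axis=1, ddof=1)) / Ntrneq)

print(hub(200, 16, 1))  # -1 + ceil(199/18) = 11
```

Note that the `max(0, ...)` clamp means MSEgoal collapses to 0 as soon as Ndof goes negative, which is exactly the situation the question asks about.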


Accepted Answer

Greg Heath 2013-4-17 (edited 2013-4-17)
If Ndof = Ntrneq - Nw < 0, there are more unknowns (weights) than training equations and the net is OVERFIT. The excess degrees of freedom allow the net to become OVERTRAINED: it goes beyond characterizing the salient features of error-free training data and memorizes contamination caused by noise, interference, and measurement and transcription errors. As a result, the net may not generalize well to non-training data that contain a different combination of random contaminants.
If Ntrneq >> Nw there is usually no problem and training is straightforward with just training data (dividetrain). However, the estimate of generalization error must take into account the loss of degrees of freedom caused by evaluating performance on the same data that created the model. This is mitigated somewhat by adjusting the mean-square error via MSEtrna = SSEtrn/Ndof instead of MSEtrn = SSEtrn/Ntrneq. Non-training data does not need this adjustment.
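The weight count and the adjusted error can be sketched as follows (a Python illustration of the quantities in the table; the guard on Ndof reflects the warning below that the adjustment breaks down as Ndof approaches zero):

```python
def nw(I, H, O):
    # Weights in an I-H-O feedforward net: (I+1)*H input-to-hidden
    # (including hidden biases) plus (H+1)*O hidden-to-output
    # (including output biases)
    return (I + 1) * H + (H + 1) * O

def mse_adjusted(sse_trn, Ntrneq, Nw):
    # MSEtrna = SSEtrn / Ndof; only meaningful while Ndof > 0
    Ndof = Ntrneq - Nw
    if Ndof <= 0:
        raise ValueError("Ndof <= 0: adjustment undefined; net is overfit")
    return sse_trn / Ndof

print(nw(16, 5, 1))   # 91, matching the H = 5 row of the table
```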
WARNING: As Ndof --> 0, the adjustment becomes useless and overtraining mitigation must be implemented, especially if Ndof < 0.
See the comp.ai.neural-nets FAQ Part 3: <ftp://ftp.sas.com/pub/neural/FAQ3.html>.
For example,
1. Reduce number of weights
a. Reduce I
b. Reduce O
c. Reduce H
d. Weight elimination objective function (CANN-FAQ)
2. Reduce size of weights
a. Weight Decay: Minimize MSE + alpha*mse(weights)
b. Minimize MAE + beta *mae(weights)
c. Bayesian Regularization (trainbr)
3. Validation stopping
4. Jittering (Add random noise)
3 and 2c are NNTBX options.
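Item 2a (weight decay) can be sketched as a penalized objective (a generic Python illustration, not NNTBX code; `alpha` is a hypothetical regularization strength chosen for the example):

```python
import numpy as np

def weight_decay_objective(weights, y_true, y_pred, alpha=0.01):
    # Item 2a above: minimize MSE + alpha*mse(weights).
    # alpha must be tuned by hand here; Bayesian regularization
    # (trainbr, item 2c) estimates it automatically.
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + alpha * np.mean(weights ** 2)
```

Shrinking the weights this way reduces the effective number of parameters, which is why it helps precisely when Ndof is small or negative.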
Hope this helps.
Thank you for formally accepting my answer
Greg
