Does 'dividerand' really destroy time series data autocorrelations?

Question

Calvin 2014-7-6

https://ww2.mathworks.cn/matlabcentral/answers/140637-does-dividerand-really-destroy-time-series-data-autocorrelations

Commented: Greg Heath 2014-11-25

I've seen it stated many times that 'dividerand' destroys the autocorrelations in time series data when training NARX networks, and I need to understand the basis for the claim. At first the argument made sense, so I used 'divideind' for NARX training. But after further thought and some experimentation, I'm not so sure.

When training a narxnet with 'dividerand' data partitioning (net.divideFcn = 'dividerand'), does the MATLAB code actually split the data at random into separate training, validation, and testing datasets and run independent narxnet calculations on each? Or does it preserve the time sequence of all the inputs and targets and simply mask the irrelevant data partitions before computing the performance statistics for each partition?

If the latter, then I don’t see how ‘dividerand’ would destroy the serial correlations.
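One quick way to probe the first half of this question, offered as a minimal sketch rather than a statement about the toolbox internals, is to call dividerand directly and check whether the index vectors it returns are in ascending order and free of overlap. The sample count of 250 is an arbitrary choice for illustration.

```matlab
% Sketch: inspect the index vectors returned by dividerand.
% Q = 250 is an arbitrary illustrative sample count.
rng('default')                      % reproducible random division
Q = 250;
[trnind, valind, tstind] = dividerand(Q, 0.7, 0.15, 0.15);

% Ascending indices mean each subset keeps its points in time order.
trnSorted = issorted(trnind);
valSorted = issorted(valind);
tstSorted = issorted(tstind);

% The subsets should not share any time step.
allind = [trnind(:); valind(:); tstind(:)];
noOverlap = numel(unique(allind)) == numel(allind);

fprintf('sorted: trn=%d val=%d tst=%d, disjoint=%d\n', ...
    trnSorted, valSorted, tstSorted, noOverlap);
```

If every flag prints as 1, the random division only selects which time steps belong to each subset and does not reorder them. On its own this says nothing about how train uses the subsets.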

I’ve not seen anything in the NN Toolbox documentation warning users to avoid using ‘dividerand’ for narxnets. If anyone knows the code well or has done some testing to confirm, please advise! I suspect this topic is of interest to many.

Cal

Answer 1

Greg Heath 2014-10-10

https://ww2.mathworks.cn/matlabcentral/answers/140637-does-dividerand-really-destroy-time-series-data-autocorrelations#answer_154633

In general, random divisions cannot maintain the autocorrelation and cross-correlation relationships.

Just think about it.

Greg

Calvin 2014-10-10

Greg,

I'd appreciate a more in-depth answer. I have thought about this issue, have done some limited testing, and one of our PhDs has investigated the narxnet code more closely. Preliminary evidence points to my assertion above: MATLAB does not perform separate narxnet calculations on each of the data partitions, but rather preserves the time sequence of all the inputs and targets and simply masks the irrelevant data partitions before computing the performance statistics for each partition.
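The masking behaviour described above can be checked with a short experiment. The following is a sketch only, assuming the toolbox's bundled simplenarx_dataset is available; the delays 1:2 and the 10 hidden neurons are arbitrary choices. It follows the pattern used in the scripts generated by the toolbox apps, where per-subset performance is computed by multiplying the full target sequence by tr.trainMask, tr.valMask, and tr.testMask.

```matlab
% Sketch: train a NARX net with dividerand and score each subset by masking.
[X, T] = simplenarx_dataset;            % example series shipped with the toolbox
net = narxnet(1:2, 1:2, 10);            % arbitrary delays and hidden size
net.divideFcn = 'dividerand';
net.divideMode = 'time';                % divide by time step
net.trainParam.showWindow = false;      % suppress the training GUI

[Xs, Xi, Ai, Ts] = preparets(net, X, {}, T);
[net, tr] = train(net, Xs, Ts, Xi, Ai);

% One simulation over the full, time-ordered sequence.
Y = net(Xs, Xi, Ai);

% Apply each subset mask to the full target sequence. Time steps outside
% a subset are expected to become NaN, and perform ignores NaN targets.
trainTargets = gmultiply(Ts, tr.trainMask);
valTargets   = gmultiply(Ts, tr.valMask);
testTargets  = gmultiply(Ts, tr.testMask);

trainPerf = perform(net, trainTargets, Y)
valPerf   = perform(net, valTargets, Y)
testPerf  = perform(net, testTargets, Y)
```

All three performance values come from the same output sequence Y, which is consistent with the masking interpretation: the network is simulated once over the ordered series and only the scoring differs per subset.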

I would like to know if others have done some testing or investigated the narxnet code to confirm or refute this claim.

Cal

Greg Heath 2014-11-25

1. You are correct.

2. My ASSUMPTION that the default random data division in timeseries functions results in a DESTROYED ORDERING of points within each of the trn/val/tst subsets is INCORRECT.

3. The only effect is to RANDOMIZE the INCREASED SPACING between data points. A subset that holds a fraction r of the points has an average spacing of about 1/r, so for the default 0.7/0.15/0.15 division the average spacings are

 meantrnspacing = 1/0.7   =  1.4286
 meanvalspacing = 1/0.15  =  6.6667
 meantstspacing = 1/0.15  =  6.6667

4. Using the dividerand documentation example, estimates of the summary statistics for the spacing are given below

 rng('default')
 [trnind,valind,tstind] = dividerand(250,0.7,0.15,0.15);
 difftrn = diff(trnind);
 diffval = diff(valind);
 difftst = diff(tstind);
 mindifftrn = min(difftrn)       % 1
 mindiffval = min(diffval)       % 1
 mindifftst = min(difftst)       % 1
 meddifftrn = median(difftrn)    % 1
 meddiffval = median(diffval)    % 6
 meddifftst = median(difftst)    % 4
 meandifftrn = mean(difftrn)     % 1.45
 meandiffval = mean(diffval)     % 6.29
 meandifftst = mean(difftst)     % 6.43
 stddifftrn = std(difftrn)       % 0.76
 stddiffval = std(diffval)       % 4.63
 stddifftst = std(difftst)       % 5.68
 maxdifftrn = max(difftrn)       % 4
 maxdiffval = max(diffval)       % 14
 maxdifftst = max(difftst)       % 19

5. Therefore, because the val and tst points are sparse and irregularly spaced, the val and tst performances may not be good predictors of performance on unseen data.

6. If the summary statistics of the time series are stationary, DIVIDEBLOCK should be a much better choice.

7. Therefore, when searching for the cause of poor performance, compare the summary statistics (including auto- and cross-correlations) of the trn, val, and tst subsets.
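
Points 6 and 7 can be sketched as follows. This is a minimal illustration, not Greg's code: the noisy sine series is hypothetical toy data, and the lag-1 autocorrelation is computed with base-MATLAB corrcoef rather than any dedicated toolbox function. divideblock assigns contiguous blocks, so every subset has unit spacing and its autocorrelation can be computed directly.

```matlab
% Sketch: contiguous division with divideblock, then per-subset statistics.
rng('default')
Q = 250;
t = 1:Q;
x = sin(2*pi*t/25) + 0.1*randn(1, Q);   % hypothetical toy series

[trnind, valind, tstind] = divideblock(Q, 0.7, 0.15, 0.15);

% Largest gap between consecutive indices inside any block.
maxGap = max([diff(trnind(:)); diff(valind(:)); diff(tstind(:))]);

subsets = {trnind, valind, tstind};
names   = {'trn', 'val', 'tst'};
stats   = zeros(3, 3);                  % columns: mean, std, lag-1 autocorr
for k = 1:3
    xs = x(subsets{k});
    c  = corrcoef(xs(1:end-1), xs(2:end));
    stats(k, :) = [mean(xs), std(xs), c(1, 2)];
    fprintf('%s: mean=%.3f std=%.3f lag1=%.3f\n', names{k}, stats(k, :));
end
```

Large differences between the rows of stats would suggest that the series is not stationary across the blocks, which is the situation point 7 asks you to look for.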

Hope this helps.

Greg
