Training NARX open loop but testing closed loop
Hello everyone,
I have an electricity load time series containing a trend and two seasonal components. I want to train my network with an open-loop structure but then test it closed loop, because I need to forecast the next 1.5 years after training the network.
I therefore updated the NARX toolbox script, but I cannot be sure it is correct. You will see that I defined a new network called netc and recorded its outputs in outputsc, so the forecast performance can be examined via outputsc.
I would be glad if you could take a look and check whether it is correct.
Thanks in advance.
% Solve an Autoregression Problem with External Input with a NARX Neural Network
% Script generated by NTSTOOL
% Created Sun Feb 03 20:02:41 CET 2013
%
% This script assumes these variables are defined:
%
% data - input time series.
% VALUE - feedback time series.
inputSeries = tonndata(data,false,false);
targetSeries = tonndata(VALUE,false,false);
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 1:1;
feedbackDelays = 1:24;
hiddenLayerSize = [10];
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
% Prepare the Data for Training and Simulation
% The function PREPARETS prepares timeseries data for a particular network,
% shifting time by the minimum amount to fill input states and layer states.
% Using PREPARETS allows you to keep your original time series data unchanged, while
% easily customizing it for networks with differing numbers of delays, with
% open loop or closed loop feedback modes.
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
% Setup Division of Data for Training, Validation, Testing
net.divideFcn = 'divideind';
%net.divideMode = 'sample';
net.divideParam.trainInd = 1:35104;
net.divideParam.valInd = 35105:39104;
%net.divideParam.testInd = 39105:43104;
%[inputs,valInd,testInd] = divideind(2000,1:1000,1001:1500,1501:2000);
% Train the Network
[net,tr] = train(net,inputs,targets,inputStates,layerStates);
% Test the Network
netc = closeloop(net);
y1=targetSeries(39105:43104);
u1=inputSeries(39105:43104);
[p1,Pi1,Ai1,t1] = preparets(netc,u1,{},y1);
outputsc = netc(p1,Pi1,Ai1);
errors = gsubtract(t1,outputsc);
performance = perform(netc,t1,outputsc)
% View the Network
view(netc)
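As a minimal sketch of the open-loop-train / closed-loop-test pattern: rerunning preparets with the closed-loop net on the full series seeds the feedback delay buffer with real measurements before the multi-step simulation (variable names follow the script above; this is an illustration, not the accepted fix).

```matlab
% Sketch: after open-loop training, close the loop and let preparets
% seed the delay buffer from the real series before multi-step simulation.
netc = closeloop(net);
[xc,xic,aic,tc] = preparets(netc,inputSeries,{},targetSeries);
yc = netc(xc,xic,aic);               % multi-step (closed-loop) predictions
closedLoopPerf = perform(netc,tc,yc) % compare against open-loop performance
```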
0 Comments
Accepted Answer
Greg Heath
2013-3-2
% So below you can see what my final code looks like, at least for the open-loop part; this part still has two problems.
% 2.1. It says that valRatio is not a legal parameter; not as an error, but it appears in the command window.
Apparently, it is a bug. To get around this:
a. COULD TRY Nval = 1 and net.trainParam.max_fail = 1000.
OR
b. Treat training with 'dividetrain' and testing as separate problems.
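Option (a) as a concrete sketch, using the division indices from the script above (an illustration of the workaround, not tested against the reported bug):

```matlab
% One-point validation set plus a huge max_fail, so validation stopping
% can never effectively trigger.
net.divideFcn = 'divideind';
net.divideParam.trainInd = 1:35104;
net.divideParam.valInd   = 35105;      % Nval = 1
net.divideParam.testInd  = [];
net.trainParam.max_fail  = 1000;       % effectively disables early stopping
```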
% 2.2. Despite setting net.performFcn = 'mae' it doesn't stick, and when I enter net.performFcn it still says MSE.
Could be another bug. Or maybe NARX only allows MSE. For the time
being, just use the default MSE.
% 2.3. And one additional question: do I need to normalize the data before feeding it? In my previous studies with NNs I always did, but when I searched for NARX I didn't see anything mentioning it, so I thought maybe preparets does it for me. Does it?
No.
TRAIN uses the common default MAPMINMAX. Instead of overriding it
with MAPSTD, I typically recommend just using ZSCORE in the pre-training
preparations (plotting, detection of outliers and
significant lag values; constant and linear models) before considering
a neural model. Target standardization automatically normalizes MSE so
that R2a = 1 - MSEa.
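A sketch of the ZSCORE pre-training standardization described above, applied before tonndata (zscore is in the Statistics Toolbox; the back-transform line is an assumption about how the outputs would be recovered):

```matlab
% Standardize both series before building the model; undo the target
% transform after simulation to get forecasts in original units.
[zdata,  mux, sigx] = zscore(data(:));
[zVALUE, mut, sigt] = zscore(VALUE(:));
inputSeries  = tonndata(zdata,false,false);
targetSeries = tonndata(zVALUE,false,false);
% ... create, prepare and train the net as in the script above ...
% back-transform simulated outputs to original units:
% yhat = cell2mat(outputsc)*sigt + mut;
```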
% Solve an Autoregression Problem with External Input with a NARX Neural Network
% Script generated by NTSTOOL
% Created Sun Feb 03 20:02:41 CET 2013
%
% This script assumes these variables are defined:
%   data  - input time series.
%   VALUE - feedback time series.
inputSeries = tonndata(data,false,false);
targetSeries = tonndata(VALUE,false,false);
whos
VERY IMPORTANT to verify dimensions (I, O, N) and class (cell or double)
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 0;
WHY? IF YOU ARE GOING TO HAVE A BUFFER FOR DELAYS
ALWAYS CHECK TO SEE IF THERE ARE SIGNIFICANT I/O CROSS
CORRELATIONS THAT CAN BE USED.
feedbackDelays = [1:2 23:25 168:169];
WHY HAVE A SIZE 169 BUFFER FOR ONLY 7 DELAYS?
Hub = ?
hiddenLayerSize = [10];
Ndof = Ntrneq - Nw % DOF CHECK:
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize)
net.performFcn = 'mae';
net.trainParam.valRatio = 0;
USE MSE DEFAULT
valRatio = 0 NOT ALLOWED (BUG)
[inputs,inputStates,layerStates,targets] = preparets(net,inputSeries,{},targetSeries);
whos inputSeries targetSeries inputs inputStates layerStates targets
ALWAYS CHECK SIZE AND CLASS AFTER PREPARETS
%Setup Division of Data for Training, Validation, Testing
---SNIP
net.divideParam.valInd = 20950:24950; % Nval = 4001
NOT CONSISTENT WITH VAL RATIO = 0
net.divideParam.testInd = 24951:29927; % Ntst = 5027
ARE YOU SURE ALL OF THIS DATA IS STATIONARY (I.E., CONSTANT
SLIDING WINDOW SUMMARY STATISTICS)?
-----SNIP
testY = cell2mat(targetSeries(testInd));
USE 'Y' FOR OUTPUTS, NOT TARGETS
% Train the Network
-----SNIP
%Collect Statistics
WHY IGNORE THE INFO IN TR??
-----SNIP
errpct_train = abs(err_train)./trainY*100;
% WHAT IF TRAINY = 0 SOMEWHERE?
I IGNORED THE REST
GREG
2 Comments
Greg Heath
2013-3-5
% Commented on by esra on 4 Mar 2013 about 11:15
% 2.3. And one additional question: do I need to normalize the data before feeding it? In my previous studies with NNs I always did, but when I searched for NARX I didn't see anything mentioning it, so I thought maybe preparets does it for me. Does it?
No. TRAIN uses the common default MAPMINMAX. Instead of overriding it with MAPSTD, I typically recommend just using ZSCORE in the pre-training preparations (plotting, detection of outliers and significant lag values; constant and linear models) before considering a neural model. Target standardization automatically normalizes MSE so that R2a = 1 - MSEa.
% ----- I have checked the outliers, plots, and lags. So do you mean I should still use the zscore function and standardize the data before building the model, or not?
What are minmax(input) and minmax(output)?
Your choice.
If it really concerns you, try both on a test example from nndatasets, or on a subset of your own data, and see which one suits you.
% Create a Nonlinear Autoregressive Network with External Input
% inputDelays = 0;
WHY? IF YOU ARE GOING TO HAVE A BUFFER FOR DELAYS, ALWAYS CHECK TO SEE IF THERE ARE SIGNIFICANT I/O CROSS CORRELATIONS THAT CAN BE USED.
% ---- I have mentioned this before; it is due to the model I want to build. I do not want delays on the input variables; I just want the inputs of that time interval.
% feedbackDelays = [1:2 23:25 168:169]
WHY HAVE A SIZE 169 BUFFER FOR ONLY 7 DELAYS?
% --- Because the significant lags are at 168 and 169, and these are very important lags. Is there a way of including these lags without the full buffer? And I have a pretty huge dataset, so a 169-deep buffer should not be a big problem, is that right?
That is not a problem. However, if you don't have a physical reason why that lag is significant, I would question it: e.g., show us your code for determining the autocorrelation function and significant lags.
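Such a check might look like the following sketch: the sample autocorrelation of the (zero-mean) target, with the usual approximate 95% significance bounds of ±1.96/sqrt(N) (a generic illustration; not the poster's actual code):

```matlab
% Sample autocorrelation with approximate 95% significance bounds.
t = VALUE(:) - mean(VALUE);        % zero-mean target series
N = numel(t);
maxlag = 200;                      % covers the daily and weekly lags
acf = zeros(maxlag,1);
for k = 1:maxlag
    acf(k) = sum(t(1+k:N).*t(1:N-k)) / sum(t.^2);
end
sigLags = find(abs(acf) > 1.96/sqrt(N))   % candidate feedback delays
```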
--- Hub = 1047. How can I find out what each of these values means; can you suggest a link? I calculated this value referring to one of the answers you gave before.
Ntrneq = Ntrn*O % No. of training equations
Nw = (I*numel(ID) + O*numel(FD) + 1)*H + (H+1)*O % No. of unknowns (weights and biases) to estimate
Ndof = Ntrneq - Nw % No. of estimation degrees of freedom
Hub = largest integer value of H for which Ntrneq > Nw
Ndof < 0 ==> Nw > Ntrneq ==> OVERFITTING (more unknown weights than equations)
If an iterative solution attempts to achieve a zero-error training solution in the presence of noise, measurement error and/or numerical error, the net may get OVERTRAINED and be unable to generalize to nontraining data. Various remedies exist. Some are:
1. Use a large enough Ntrn and/or small enough H so that Ntrneq >> Nw
(don't overfit)
2. Train to a practical nonzero training error rate (don't overtrain
an overfit net)
a. Validation stopping
b. Regularization (help msereg and help trainbr)
c. MSEtrngoal ~ 0.01*SSE00/Ndof (R2a ~ 0.99)
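The bookkeeping above can be sketched in a few lines, using the sparse delay counts and the training-point count quoted in this thread (the specific numbers are this poster's; the formula counts one weight per delay tap plus biases):

```matlab
% DOF bookkeeping for I inputs, O outputs, H hidden units.
I = 1;  O = 1;  H = 10;
ID = 0;                              % one zero-delay input tap
FD = [1:2 23:25 168:169];            % 7 feedback taps
Ntrn   = 20949;                      % training points from divideind
Ntrneq = Ntrn*O;                     % number of training equations
Nw     = (I*numel(ID) + O*numel(FD) + 1)*H + (H+1)*O;  % unknown weights
Ndof   = Ntrneq - Nw                 % want Ntrneq >> Nw
Hub    = floor((Ntrneq - O)/(I*numel(ID) + O*numel(FD) + O + 1))
```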
% hiddenLayerSize = [10];
% Ndof = Ntrneq - Nw % DOF CHECK:
% --- Ndof = 20678 and Ntrneq = 20949; do you think this is a sign of overfitting? I checked the R statistics and there is no sign of overfitting; all R values are high, with no decrease on validation and test.
Since Ntrneq >> Nw, do not worry.
More Answers (4)
Greg Heath
2013-2-17
1. The choice of delays is strange. Can you explain? At what lags are the significant auto and cross correlations?
2. You have Nw = (1+24+1)*10+(10+1)*1 = 271 weights. Why do you need more than ~ 30*271 ~ 8000 training vectors? Isn't the series stationary?
3. Shouldn't the data division precede preparets? Or doesn't it matter?
4. Delete the following? Also note the erroneous first output
[inputs,valInd,testInd] = divideind(2000,1:1000,1001:1500,1501:2000);
5. You have so much data; do you really need a validation set for early stopping?
6. Use tr to obtain the trn/val/tst MSEs and other stats
7. Evaluate both open loop and closed loop performances on ALL of the data
8. Normalize the 3 MSEs: NMSEi = MSEi/mean(var(ti',1)), i = trn,val,tst
9. Obtain the 3 coefficients of determination (see wikipedia)
R2i = 1- NMSEi
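Points 6 through 9 can be sketched in code: pull the split indices from tr, then compute the normalized MSE and coefficient of determination per split (names follow the script in the question; this is an illustration, not the poster's code):

```matlab
% Open-loop outputs, then per-split NMSE and R^2 from tr's indices.
y    = net(inputs,inputStates,layerStates);
tmat = cell2mat(targets);
ymat = cell2mat(y);
splits = {'trainInd','valInd','testInd'};
for k = 1:numel(splits)
    ind = tr.(splits{k});
    if isempty(ind), continue, end            % e.g. testInd commented out
    nmse = mean((tmat(ind)-ymat(ind)).^2) / var(tmat(ind),1);
    fprintf('%s: NMSE = %.4g, R2 = %.4g\n', splits{k}, nmse, 1-nmse);
end
```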
Hope this helps.
Thank you for formally accepting my answer.
Greg
Greg Heath
2013-2-20
% 1. The data is hourly, so the significant correlation is at 24 hours. For the exogenous variables I just want to feed in the external inputs of that hour, but use the feedback loop for the last 24 hours.
So you can state, unequivocally, that there are no significant cross-correlations except at lag = 1? That seems suspicious.
and
You don't care to know which of the 24 feedback lags are/are-not significant?
and
Are you sure "of that hour" doesn't mean ID=0 instead of ID = 1?
% 2. I chose the network design by trial and error. 10 hidden neurons was the best performing architecture; isn't that what I should do?
Yes. I was wondering if you just blindly accepted the default value.
Over what range of values did you search?
How many trial random initial weight designs for each value?
But my question 2 asked why you need more than ~ 8000 vectors if you only have 271 weights?
% 3. I followed the original script for data division, just modified it not to randomly divide. I had observed the graphs it doesn't look like data division matters.
Graphs of what? Output overlayed on Target?
That may be caused by using ALL of the previous day's measurements. AND you have a humongous amount of training data.
% 4. What does it provide that is different compared to the version I wrote?
Which vector split do you want, the 43104 or the 2000? The 2000 vector split statement does absolutely nothing because it is not recognized by the net. In addition, the first output is mislabeled as "inputs" instead of trainInd.
Efficiency. Why have more weights than you need? Using ineffective lag inputs could really hurt if you didn't have a sufficiently large amount of training data.
% 5. Sorry, I didn't get this one; as I said, I just updated the script given by MATLAB. Where do I set validation for early stopping?
It is set by default to take 15% of the data for validation, to prevent overtraining when the net is overfit, i.e., when Nw = net.numWeightElements is not sufficiently lower than the number of training equations Ntrn*O.
When Ntrn*O is sufficiently large, validation stopping is unnecessary.
If you want to remove it use
net.trainParam.valRatio = 0;
% 6. I collect the stats in another sheet in Excel, so I do not really need tr.
Collect them from where? If you do not get them from tr, you have to calculate them yourself! Why duplicate calculations that have already been done?
Did you look at the results of the command tr, or tr = tr without the ending semicolon?
% 7. I don't want open/closed-loop performances on all data. All I want to do is feed the network with real feedback values during training, but during testing feed it with forecasted values (in the feedback loop). Therefore I want to train it open loop but test it closed loop. Because in real life, for example, the analyst will forecast 1 month ahead and so won't have the output values; but as this is time series data, he will feed back the forecasted values.
I'm puzzled. If you design an open loop net, why wouldn't you want to know how it worked on the trn/val/tst data you just used?
and
If you close the loop, why wouldn't you want to know how that works on the old data before considering the new data?
Or am I missing something?
% 8/9. I don't use MSE values provided.
Well, what is your figure of merit for the open loop and closed loop designs?
.... How do you know you have succeeded?
Greg Heath
2013-2-21
%% Training NARX open loop but testing closed loop
%% Comment by esra on 20 Feb 2013 at 10:47
% 1. I checked the autocorrelation values; significant correlation is at the 1st, 2nd and 24th hours (the highest; the 3rd, 23rd and 25th are also highly correlated). What I don't understand about NARX is that, for example, the 10th hour is not significantly correlated, but what I understand from the definition of NARX is that if I want to include the 24th lag I have to include all lags smaller than or equal to 24. Is that right? I am doing it wrong, I guess. Can you correct me?
Unfortunately, if you use the 24th lag, preparets will put all 24 lags in the delay buffer. If all are not significant, using the non-significant lags is tantamount to putting in noise at those lags. Also, you are adding weights that do not contribute.
Therefore, use FD = [1:3, 23:25], needing 6 weights instead of 24. Obviously, you will lose the first max(FD) vectors up front ... small potatoes for you.
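The sparse-lag idea can be sketched as follows, with FD matching the significant lags reported above (an illustration; only the listed delays get weights, though preparets still consumes the first max(FD) points to fill the buffer):

```matlab
% Sparse feedback lags: weights are connected only at the listed delays.
FD   = [1:3 23:25];
netS = narxnet(0,FD,10);                 % ID = 0, H = 10
[xs,xi,ai,ts] = preparets(netS,inputSeries,{},targetSeries);
% numel(ts) == numel(targetSeries) - max(FD)
```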
% And for ID, I checked; it looks like it should be ID = 0, thanks.
% 2. In fact, the best architecture is with two hidden layers, 10 neurons each. I checked for overfitting as well. I tried up to 2 hidden layers with 2, 5 and 10 neurons and combinations of them. I didn't test for different initial weights; I know normally I would need to, but what I was told by my supervisor is that, as I have so many training data points, the initial weights wouldn't affect the performance that much. Would you agree?
I agree that you have more data than needed for training; Ntrneq ~ 30*Nw should be sufficient. However, I would still take a look at at least Ntrials = 3.
% And about the many vectors despite fewer weights: is there any harm in having that many vectors? I thought more training is better for a neural network. Or do you mean something else by "vectors"?
Again, Ntrneq ~ 30*Nw is usually sufficient. If you wish, you could design several nets on different time windows and either take the one that works best on all of the data or average the outputs.
% 3. You said data division should precede preparets in your first post, right? What I mean is that data division comes after preparets in the original script, and I thought it should be. One would first want to prepare the data before dividing it into sets, isn't that right?
Yes. You are correct.
% About the mislabeling, I don't know how to fix it; would it work if I start trainInd at 2 instead of 1:35104?
I was referring to the error
[ input,valInd,testInd] = ...
instead of
[trainInd,valInd,testInd] = ...
% For efficiency, I know, I agree. But as I said in the first question, I thought I had to include all lags up to the 24th.
In the net creation you can specify the 6 lags above. However, the buffer will load all max(FD) past values but only connect weights to the 6 listed lags. All you lose is the first max(FD) data points.
net.numWeightElements is the total weight count. I always check that, and use the command whos on the inputs and outputs of preparets, to make sure the bean counting is correct.
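That bean-counting check might be sketched as below (configure, which sizes the layers from the data, is assumed here so that numWeightElements reports the true count before training):

```matlab
% Verify the weight count and the shapes produced by preparets.
netS = narxnet(0,[1:3 23:25],10);
[xs,xi,ai,ts] = preparets(netS,inputSeries,{},targetSeries);
netS = configure(netS,xs,ts);        % size the net from the data
Nw   = netS.numWeightElements        % total weights and biases
whos xs xi ai ts                     % verify sizes and cell classes
```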
% 5. Done, thanks.
% 6. I am ashamed of it, but I am not really good at collecting statistics in MATLAB. All I could find is that I can collect MSE and check R values with tr. But I need MAPE (mean absolute percentage error); for me it is meaningful. So, although I don't like it, for each run I put the results on an Excel sheet to collect the statistics. I would be grateful if you could tell me how to do it in MATLAB. I mean, of course I can code it, but is there a function for that?
Dunno. Search the STATS TBX.
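MAPE is simple to compute directly, with no toolbox function needed. A sketch using the closed-loop test variables from the first post (zero targets are skipped, echoing the "WHAT IF TRAINY = 0" caveat above):

```matlab
% Mean absolute percentage error on the closed-loop test outputs.
tmat = cell2mat(t1);
ymat = cell2mat(outputsc);
nz   = tmat ~= 0;                    % guard against division by zero
mape = 100*mean(abs((tmat(nz)-ymat(nz))./tmat(nz)))
```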
% 7. I have trained the dataset with open loop, checked the performances, and decided on an architecture.
Did you separate the trn/val/tst stats?
4 Comments
Greg Heath
2013-3-2
Edited: Greg Heath
2013-3-2
Hard to comment without critical info
1. Any warnings or error messages?
2. testInd typo
3. valRatio = 0 mixup
4. What was tr.stop?
5. What are R2trn, R2trna, R2val and R2tst for open loop?
6. What was R2sim for open loop?
7. What was R2sim for closed loop?
esra
2013-3-1
1 Comment
Greg Heath
2013-3-1
Sorry, for some reason I missed your last 2 comments. Will take a look at them this evening (if I'm still alive after my gym workout)