Help with MATLAB error: "Invalid training data. Predictors must be a cell array of sequences. The data dimension of all sequences must be the same."

Question

Hossam Elshahaby 2018-9-16

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/419268-helping-with-matlab-error-invalid-training-data-predictors-must-be-a-cell-array-of-sequences-the-d

Answered: Markus Hohlagschwandtner 2020-12-11

Accepted answer: Conor Daly

Hello,

I have the following code, which builds a data set from 131 videos. Each video has 2000 frames, and each frame is a 40*200 image.

I want to create an RNN and train it so that each video is assigned a label, e.g. horizontal motion, vertical motion, etc.

I get the error "Invalid training data. Predictors must be a cell array of sequences. The data dimension of all sequences must be the same." However, I believe XTrain is a cell array of sequences and that the data dimension of all sequences is the same.

I have seen several unanswered questions in the community about the same error. Is this a MATLAB issue? If so, is there a workaround?

newsDatasetPath = fullfile(matlabroot,'NewsDataSetCaptionType');
Labels = dir(newsDatasetPath);
Yactual = strings(131);
X = cell([131 1]) ;
VideoNumber = 0;
% Loop over filenames, inserting the image.
for slice = 3 : length(Labels)
      LabelsPath = fullfile(newsDatasetPath, Labels(slice).name);
      Videos = dir(LabelsPath);
      for VideoIndex = 3 : length(Videos)
          VideosPath = fullfile(LabelsPath, Videos(VideoIndex).name);
          Files = dir(VideosPath);
          VideoNumber = VideoNumber + 1;
          array2d = zeros(8000,2000); 
          for FileIndex = 3 : length(Files)
              filename = fullfile(VideosPath, Files(FileIndex).name);
              thisImage = imread(filename);
              thisImage_Reshapped = reshape(thisImage,[8000,1]);
              % Image is okay.  Insert it.
              array2d( 1:8000,FileIndex-2) = thisImage_Reshapped;
          end
          array2d_ts = timeseries(array2d);
          X{VideoNumber} = array2d_ts;
          Yactual( VideoNumber  ) = categorical(string(Labels(slice).name)); 
      end
end
n = 131;
Order = randperm(n) ;
X_shuffled = X(Order,1);
Yactual_shuffled = Yactual(Order,1);
trainNumFiles = floor(0.7 * n );
XTrain = X_shuffled(1:trainNumFiles,:);
YTrain = Yactual_shuffled(1:trainNumFiles,:);
XTest = X_shuffled(trainNumFiles+1:end,:);
YTest = Yactual_shuffled(trainNumFiles+1:end,:);
% Note: For sequence-to-label classification networks, the output mode of the last LSTM layer must be 'last'.
inputSize = 2000;
numHiddenUnits1 = 125;
numHiddenUnits2 = 100;
numClasses = 4;
maxEpochs = 100;
miniBatchSize = 27;
layers = [ ...
      sequenceInputLayer(inputSize , 'Name','InputLayer' )
      lstmLayer(numHiddenUnits1,'OutputMode','sequence' ,'Name','LSTM_Layer1')
      lstmLayer(numHiddenUnits2,'OutputMode','last' ,'Name','LSTM_Layer2')
      fullyConnectedLayer( numClasses ,'Name','Fully_Connected_Layer')
      softmaxLayer('Name','Softmax_Layer')
      classificationLayer('Name','Classification_Layer')
      ];
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','auto', ...
    'InitialLearnRate',0.01, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',20, ...    
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'SequenceLength','longest', ...
    'Shuffle','never', ...
    'Verbose',0, ...
    'Plots','training-progress');
net = trainNetwork( XTrain , categorical(YTrain) , layers , options);
predictedLabels = classify( net , XTest , ...
    'MiniBatchSize',miniBatchSize  , ...
    'SequenceLength','longest' ) ;
% accuracy = (TP + TN)/(TP + FP + FN + TN) ; the average accuracy is returned 
accuracy = sum(predictedLabels == valLabels)/numel(valLabels)
RNN_Net17b = net;
view(net)
save RNN_LSTM_Net17b  % retrieve using load('RNN_LSTM_Net17b')


Answer 1

Conor Daly 2018-9-19

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/419268-helping-with-matlab-error-invalid-training-data-predictors-must-be-a-cell-array-of-sequences-the-d#answer_337480

Thanks for sharing your code. trainNetwork does not accept timeseries objects, so you do not need to convert your numeric sequences to timeseries when preparing your data. Assigning array2d directly into X should create an appropriate cell array for trainNetwork:

X{VideoNumber} = array2d;
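
As a rough sketch of what this change does to the data-preparation loop, the snippet below uses synthetic frames in place of the imread calls from the question. The random pixel data, the reduced frame and video counts, and the variable names frameHeight, frameWidth, numFeatures, numFrames and numVideos are assumptions made only so that it runs without the data set. Each video ends up as a plain numeric matrix with one column per frame:

```matlab
% Sketch only: synthetic frames stand in for the imread calls in the question.
frameHeight = 40;
frameWidth  = 200;
numFeatures = frameHeight * frameWidth;   % 8000 values per frame
numFrames   = 5;                          % the real videos have 2000 frames
numVideos   = 3;                          % the real data set has 131 videos

X = cell(numVideos, 1);
for videoNumber = 1:numVideos
    array2d = zeros(numFeatures, numFrames);
    for frameIndex = 1:numFrames
        % Hypothetical stand-in for: thisImage = imread(filename);
        thisImage = uint8(randi([0 255], frameHeight, frameWidth));
        % One frame becomes one column (one time step) of the sequence.
        array2d(:, frameIndex) = reshape(thisImage, [numFeatures, 1]);
    end
    % Store the numeric matrix directly, with no timeseries wrapper.
    X{videoNumber} = array2d;
end

disp(class(X{1}))   % class of one sequence
disp(size(X{1}))    % numFeatures-by-numFrames
```

With this layout, every cell of X holds a numFeatures-by-numFrames numeric array, which is the shape the answer describes for trainNetwork.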

Looking at your network architecture I noticed another issue. The inputSize argument of sequenceInputLayer should not correspond to the number of time steps in your data. Instead, inputSize is the fixed data dimension of your sequences, so it should be 40*200 = 8000 to fit with your data. Networks with a sequenceInputLayer can accept an arbitrary number of time steps, so if you had a video which had fewer than 2000 frames, the network would still be able to determine a classification for the video.
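
To make the difference between the data dimension and the number of time steps concrete, here is a small check that could be run before calling trainNetwork. It is an illustrative sketch, not part of the original answer: the sequences are random data with deliberately different (and very short) lengths, and the names isNumericMatrix, dataDims, timeSteps and sameDimension are made up for this example.

```matlab
% Sketch only: random sequences with the same data dimension (rows)
% but different numbers of time steps (columns).
inputSize = 40 * 200;                 % 8000 features per time step
XTrain = { rand(inputSize, 20); ...
           rand(inputSize, 15); ...
           rand(inputSize,  8) };

% Every sequence must be a numeric matrix with inputSize rows.
isNumericMatrix = cellfun(@(s) isnumeric(s) && ismatrix(s), XTrain);
dataDims  = cellfun(@(s) size(s, 1), XTrain);
timeSteps = cellfun(@(s) size(s, 2), XTrain);

sameDimension = all(isNumericMatrix) && all(dataDims == inputSize);
fprintf('All sequences numeric with %d rows: %d\n', inputSize, sameDimension);
fprintf('Time steps per sequence: %s\n', mat2str(timeSteps'));
```

The row count checked here is the value that goes into sequenceInputLayer as inputSize. The column counts are free to differ from one sequence to the next.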

1 comment

Hossam Elshahaby 2018-9-20


Thank you for answering my question.

"Helping with Matlab error Invalid training data. Predictors must be a cell array of sequences. The data dimension of all sequences must be the same."

New code:

inputSize = 8000;
numHiddenUnits = 125;
numClasses = 4; 
maxEpochs = 300;
miniBatchSize = 15;
layers = [ ...
    sequenceInputLayer(inputSize , 'Name','InputLayer' )
    lstmLayer(numHiddenUnits,'OutputMode','last' ,'Name','LSTM_Layer1')
    fullyConnectedLayer( numClasses ,'Name','Fully_Connected_Layer')
    softmaxLayer('Name','Softmax_Layer')
    classificationLayer('Name','Classification_Layer')
    ];
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','auto', ...
    'InitialLearnRate',0.000003, ...
    'LearnRateSchedule','piecewise', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'SequenceLength','longest', ...
    'Shuffle','never', ...
    'Verbose',0, ...
    'Plots','training-progress');
net = trainNetwork( XTrain , categorical(YTrain') , layers , options);
predictedLabels = classify( net , XTest , ...
    'MiniBatchSize',miniBatchSize  , ...
    'SequenceLength','longest' ) ;
% accuracy = (TP + TN)/(TP + FP + FN + TN) ; the average accuracy is returned 
accuracy = sum(predictedLabels == categorical(YTest'))/numel(YTest)

I accepted your answer today. I now get an accuracy of 80%. How can I raise it?

Could you give me some ideas, please?


Answer 2

Markus Hohlagschwandtner 2020-12-11

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/419268-helping-with-matlab-error-invalid-training-data-predictors-must-be-a-cell-array-of-sequences-the-d#answer_572125

The first column of the training data, which is the test case number, must be numbered consecutively.
