Machine Learning: Use cross-validation between time series

Hello,
I am working on the following task:
Given: about 30 measurement series. Each one records measurements up to the failure of a system. I have divided the data points of each lifetime into 5 classes (A = data points in the 0-20% section of the lifetime, B = 20-40% of the lifetime, ... E = 80-100% of the lifetime).
Goal: I want to determine which state a given system is in.
Solution: I have used the function "fitcauto" to train many classification algorithms and choose the best one. However, there is a problem: the algorithm uses cross-validation, which divides the input data into training and validation data, and this division cuts across the measurement series. That is, data points from the same measurement series end up in both the training data and the validation data. This makes the training task too easy, because the algorithm only has to interpolate the missing sections, and it will perform very badly on a completely new series after training. The solution I want to try is to do the cross-validation at the level of the measurement series, so that the data points of one series are all in either the training data or the validation data.
Question: Is this type of cross-validation possible in MATLAB, especially with the "fitcauto" function? If yes, how? If no, is there an alternative MATLAB function?
1 Comment
Magsud Hasanov 2022-7-22
Hi Paul,
I am also working on time series forecasting and have been looking for a MATLAB cross-validation implementation as well.
Hope we'll find the answer.
All the best,
Magsud

Answers (1)

Ayush Aniket 2025-6-11
You can do grouped cross-validation with the cvpartition function, which ensures that all data points from a single measurement series remain in either the training set or the validation set. See the following documentation and the code snippet below: https://www.mathworks.com/help/stats/cvpartition.html#mw_9d9b6de7-30dc-4a1c-9349-370602efa9f2
% Assume 'SeriesID' is a column vector giving the measurement series of each row of X and Y
K = 10; % Number of folds
seriesGroups = unique(SeriesID); % Unique measurement series
cvp = cvpartition(numel(seriesGroups), 'KFold', K); % Partition the series, not the data points
accuracy = zeros(K, 1); % Preallocate the per-fold accuracy
% Prepare training and test sets based on the grouped partition
for i_fold = 1:K
    testSeries = seriesGroups(test(cvp, i_fold)); % Test series
    trainSeries = seriesGroups(training(cvp, i_fold)); % Training series
    % Select data points belonging to the respective series
    trainIdx = ismember(SeriesID, trainSeries);
    testIdx = ismember(SeriesID, testSeries);
    trainX = X(trainIdx, :);
    trainY = Y(trainIdx);
    testX = X(testIdx, :);
    testY = Y(testIdx);
    % Train model using fitcauto (note: by default its internal model selection
    % still cross-validates within trainX at the data-point level)
    trainedModel = fitcauto(trainX, trainY);
    % Evaluate model on the held-out series ('==' assumes numeric or categorical labels)
    predictions = predict(trainedModel, testX);
    accuracy(i_fold) = mean(predictions == testY);
end
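Before training anything expensive, it may help to check that the grouped split really keeps each series on one side of every fold. The following is a minimal sketch on made-up data (6 series with 4 points each is an assumption for illustration, not the asker's data); it reuses the same partition-the-series idea as the loop above and only asserts that no series ID shows up in both sets.

```matlab
% Toy stand-in for the real data: 6 series with 4 data points each (assumed layout)
SeriesID = repelem((1:6)', 4);
K = 3;                                   % Number of folds for this toy check
seriesGroups = unique(SeriesID);         % One entry per measurement series
rng(0);                                  % Reproducible fold assignment
cvp = cvpartition(numel(seriesGroups), 'KFold', K);

for i_fold = 1:K
    testSeries  = seriesGroups(test(cvp, i_fold));
    trainSeries = seriesGroups(training(cvp, i_fold));
    trainIdx = ismember(SeriesID, trainSeries);
    testIdx  = ismember(SeriesID, testSeries);

    % No series may contribute rows to both the training and the test set
    assert(isempty(intersect(SeriesID(trainIdx), SeriesID(testIdx))));
    % Every row must land in exactly one of the two sets
    assert(all(xor(trainIdx, testIdx)));
end
```

If either assertion fails on the real SeriesID column, the usual suspects are missing or inconsistent series identifiers rather than cvpartition itself.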
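Additionally, if fitcauto does not support grouped cross-validation directly, you can train models manually with fitcecoc (for multiclass SVM) or fitcensemble (for ensemble learning) inside the same grouped loop. Note that cvp above partitions the series, not the individual observations, so it cannot be passed as the 'CVPartition' argument for trainX. Train on the rows of the training series instead:
trainedModel = fitcecoc(trainX, trainY);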
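As a rough sketch of what such a manual grouped loop with a fixed learner might look like, the example below runs end to end on synthetic data. Everything about the data is an assumption for illustration: 10 series of 20 points, one noisy feature that tracks the lifetime fraction, one pure-noise feature, and labels A to E assigned by lifetime quintile. Only the learner call (fitcecoc here) would change for fitcensemble.

```matlab
rng(1);                                              % Reproducible toy data and folds
nSeries = 10; nPerSeries = 20;                       % Assumed sizes, for illustration only
SeriesID = repelem((1:nSeries)', nPerSeries);        % Series membership of each row
lifeFrac = repmat(((1:nPerSeries)' - 0.5) / nPerSeries, nSeries, 1);  % Fraction of lifetime
Y = discretize(lifeFrac, 0:0.2:1, 'categorical', {'A','B','C','D','E'});
X = [lifeFrac + 0.05*randn(size(lifeFrac)), randn(numel(lifeFrac), 1)];

K = 5;
seriesGroups = unique(SeriesID);
cvp = cvpartition(numel(seriesGroups), 'KFold', K);  % Folds over series, not rows

predY = Y;                        % Same categorical type; every row is overwritten below
covered = false(size(Y));         % Tracks which rows received an out-of-fold prediction
for i_fold = 1:K
    testIdx  = ismember(SeriesID, seriesGroups(test(cvp, i_fold)));
    trainIdx = ~testIdx;          % All rows of the remaining series
    mdl = fitcecoc(X(trainIdx, :), Y(trainIdx));     % Multiclass SVM with default settings
    predY(testIdx) = predict(mdl, X(testIdx, :));
    covered(testIdx) = true;
end
assert(all(covered));             % Each series was held out exactly once

groupedAccuracy = mean(predY == Y);   % Accuracy on series unseen during training
C = confusionmat(Y, predY);           % Rows: true class, columns: predicted class
```

Pooling the out-of-fold predictions into one confusion matrix, rather than averaging per-fold accuracies, shows which neighbouring lifetime classes get confused on unseen series.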

Release

R2020b
