MATLAB does not offer a built-in function exactly like scikit-learn's GroupKFold, but you can achieve similar results by creating the group-based cross-validation partitions yourself, so that all samples from a given group always land in the same fold.
Here is how you can do it:
- Determine the unique groups in your dataset.
- Randomly shuffle these groups and then split them into k folds.
- Assign each data point to a fold based on its group.
% Sample data
data = rand(100, 5); % 100 samples, 5 features
labels = randi([0, 1], 100, 1); % Binary labels
groups = randi([1, 20], 100, 1); % Group IDs drawn from 1..20 (up to 20 unique groups)
% Number of folds
k = 5;
% Get unique groups
uniqueGroups = unique(groups);
% Shuffle groups
shuffledGroups = uniqueGroups(randperm(length(uniqueGroups)));
% Split groups into k folds (round-robin, so fold sizes differ by at most one
% group and no fold is left empty when there are at least k groups)
folds = cell(k, 1);
foldOfGroup = mod((0:numel(shuffledGroups)-1)', k) + 1;
for i = 1:k
    folds{i} = shuffledGroups(foldOfGroup == i);
end
% Create cross-validation partitions
cvIndices = zeros(size(groups));
for i = 1:k
    testGroups = folds{i};
    testIdx = ismember(groups, testGroups);
    cvIndices(testIdx) = i;
end
% Loop over the folds: hold out fold i for testing, train on the rest
for i = 1:k
    testIdx = (cvIndices == i);
    trainIdx = ~testIdx;
    trainData = data(trainIdx, :);
    trainLabels = labels(trainIdx);
    testData = data(testIdx, :);
    testLabels = labels(testIdx);
    % Train and evaluate your model here using the four variables above
    fprintf('Fold %d: Train on %d samples, Test on %d samples\n', i, sum(trainIdx), sum(testIdx));
end
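If you prefer a more compact version, the same fold assignment can be sketched without the cell array by using the third output of unique, and you can then check that no group ends up on both sides of a split. This is only a minimal sketch under the same assumptions as above: the sample data, the fixed seed, and the variable names foldOfGroup and groupIdx are illustrative and not part of any MATLAB API.

```matlab
% Hypothetical example data: 100 samples with up to 20 distinct group IDs
rng(0);                              % fix the seed so the split is reproducible
groups = randi([1, 20], 100, 1);
k = 5;

% Map each sample to the position of its group in the list of unique groups
[uniqueGroups, ~, groupIdx] = unique(groups);

% Give each group a fold number: shuffle the groups, then deal them out round-robin
nGroups = numel(uniqueGroups);
foldOfGroup = zeros(nGroups, 1);
foldOfGroup(randperm(nGroups)) = mod((0:nGroups-1)', k) + 1;

% Each sample inherits the fold of its group
cvIndices = foldOfGroup(groupIdx);

% Sanity check: no group may appear in both the training and the test set
for i = 1:k
    testIdx = (cvIndices == i);
    leaked = intersect(groups(testIdx), groups(~testIdx));
    assert(isempty(leaked), 'Fold %d shares groups between train and test', i);
end
```

The resulting cvIndices can be used in the same train/test loop as before. Note that the folds are balanced by number of groups, not by number of samples, so fold sizes can still differ if the groups themselves are uneven.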
For more information, kindly refer to the following MathWorks documentation.
