How to partition data stored in cell arrays for validation in a machine learning model?
Hello there, I have training data for 4 trials stored in 4x1 cell arrays named "trainingdataX" and "trainingdataY" (one cell per trial), and I am trying to pull out 15 percent of all this data for validation and store it in variables "Xval" and "Yval". How can I do this when the data is stored in cells corresponding to the trials, while making sure the corresponding Y values are partitioned out for validation too? Any help is greatly appreciated!
%Exclude Data for Val
rng('default')
n = % I'm not sure what to put here to have it pull data from each of the 4 trials
partition = cvpartition(n,'Holdout',0.15);
idxTrain = training(partition);
FinalTrainX = trainingdataX(idxTrain,:)
FinalTrainY = trainingdataY(idxTrain,:)
idxNew = test(partition);
Xval = trainingdataX(idxNew,:)
Yval = trainingdataY(idxNew,:)
Answers (2)
YERRAMADAS
2024-8-1
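Use cross-validation (for example, k-fold) instead of a single holdout split, so that every observation is used for both training and validation across the folds and the available data is used as fully as possible.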
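A minimal sketch of k-fold cross-validation with `cvpartition`, assuming the trials have already been concatenated into `allX` (observations in rows) and `allY`. The random placeholder data and the choice of 5 folds are illustrative assumptions, and the model-fitting step is left as a comment because the question does not say which model is used.

```matlab
% Placeholder data (assumption): 4 trials x 541 observations, 63 features,
% already concatenated with observations in rows
rng('default');                      % for reproducibility
allX = rand(4*541, 63);
allY = rand(4*541, 1);

k = 5;                               % number of folds (assumed)
cv = cvpartition(size(allX, 1), 'KFold', k);

valSizes = zeros(cv.NumTestSets, 1);
for fold = 1:cv.NumTestSets
    idxTrain = training(cv, fold);   % logical index of training rows
    idxVal   = test(cv, fold);       % logical index of validation rows

    XtrainFold = allX(idxTrain, :);
    YtrainFold = allY(idxTrain, :);
    XvalFold   = allX(idxVal, :);
    YvalFold   = allY(idxVal, :);

    % Fit your model on XtrainFold/YtrainFold and evaluate it on
    % XvalFold/YvalFold here
    valSizes(fold) = size(XvalFold, 1);
end

% Each observation appears in exactly one validation fold
fprintf('Validation rows per fold: %s\n', mat2str(valSizes'));
fprintf('Total validation rows: %d of %d\n', sum(valSizes), size(allX, 1));
```

Averaging the validation error over the folds gives a less split-dependent estimate than one 15% holdout.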
Aditya
2024-8-1
To partition data stored in cells for validation, first concatenate the data from all trials into single matrices, then partition the rows and split them into training and validation sets.
Before doing this, you need to transpose your X and Y data so that each row is one observation and row i of X corresponds to row i of Y.
Here's a sample code for this:
% sample data
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(541, 63);
    trainingdataY{i} = rand(541, 1);
end
% Concatenate data
allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});
% Partition data (15% holdout for validation)
rng('default'); % For reproducibility
partition = cvpartition(size(allX, 1), 'Holdout', 0.15);
idxTrain = training(partition);
idxVal = test(partition);
% Split into training and validation sets
FinalTrainX = allX(idxTrain, :);
FinalTrainY = allY(idxTrain, :);
Xval = allX(idxVal, :);
Yval = allY(idxVal, :);
% Display results
fprintf('Training data X size: %dx%d\n', size(FinalTrainX, 1), size(FinalTrainX, 2));
fprintf('Training data Y size: %dx%d\n', size(FinalTrainY, 1), size(FinalTrainY, 2));
fprintf('Validation data X size: %dx%d\n', size(Xval, 1), size(Xval, 2));
fprintf('Validation data Y size: %dx%d\n', size(Yval, 1), size(Yval, 2));
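If you want roughly 15% taken from each trial individually rather than 15% of the pooled rows, one option is to label every row with its trial number and pass those labels to `cvpartition` as the grouping variable, which stratifies the holdout by group. This is a sketch on placeholder data; `trialID` and `valTrial` are illustrative names, not part of the question.

```matlab
% Placeholder data (assumption): 4 trials, each 541x63 (X) and 541x1 (Y)
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(541, 63);
    trainingdataY{i} = rand(541, 1);
end

% Label every row with the trial it came from
nPerTrial = cellfun(@(x) size(x, 1), trainingdataX);   % rows per trial
trialID = repelem((1:numel(trainingdataX))', nPerTrial);

allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});

% Using trialID as the grouping variable stratifies the holdout by trial,
% so each trial contributes about 15% of its rows to validation
rng('default');
partition = cvpartition(trialID, 'Holdout', 0.15);
idxTrain = training(partition);
idxVal   = test(partition);

FinalTrainX = allX(idxTrain, :);
FinalTrainY = allY(idxTrain, :);
Xval = allX(idxVal, :);
Yval = allY(idxVal, :);
valTrial = trialID(idxVal);      % trial of origin for each validation row

% Number of validation rows taken from each trial
disp(accumarray(valTrial, 1)');
```

Because X and Y are indexed with the same logical vector, each validation row of `Xval` stays paired with its value in `Yval`.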
I hope this helps!
Aditya
2024-8-1
Edited: Aditya, 2024-8-1
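As mentioned in my answer, the data in each of your cells is shaped 63x541 (X) and 1x541 (Y), i.e. one column per observation. Concatenating these vertically would stack the trials along the feature dimension instead of the observation dimension, so you need to transpose each cell first (to 541x63 and 541x1).
To transpose every cell, you can use the following code: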
% Transpose each cell using cellfun
trainingdataX = cellfun(@transpose, trainingdataX, 'UniformOutput', false);
trainingdataY = cellfun(@transpose, trainingdataY, 'UniformOutput', false);
Alternatively, you can transpose each cell manually in a for loop.
Hope this clarifies your question!
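A minimal sketch of the for-loop version, assuming the original orientation described above (63x541 for X and 1x541 for Y per trial); the random data only stands in for the real trials.

```matlab
% Placeholder data (assumption): original orientation, one column per observation
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(63, 541);
    trainingdataY{i} = rand(1, 541);
end

% Transpose each trial in place so that rows become observations
for i = 1:numel(trainingdataX)
    trainingdataX{i} = trainingdataX{i}.';   % 63x541 -> 541x63
    trainingdataY{i} = trainingdataY{i}.';   % 1x541  -> 541x1
end

fprintf('X per trial: %dx%d\n', size(trainingdataX{1}, 1), size(trainingdataX{1}, 2));
fprintf('Y per trial: %dx%d\n', size(trainingdataY{1}, 1), size(trainingdataY{1}, 2));
```

The `.'` operator is the same non-conjugate transpose that `@transpose` applies in the `cellfun` version, so both approaches give identical cells.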