How to partition data in cells for validation in machine learning model?

Hello there, I have training data for 4 trials stored in 4x1 cell arrays named "trainingdataX" and "trainingdataY". I am trying to pull out 15 percent of all this data for validation and store it in the variables "Xval" and "Yval". How can I do this when the data is stored in cells corresponding to the trials, and how do I ensure that the corresponding Y value is partitioned out for validation too? Any help is greatly appreciated!
%Exclude Data for Val
rng('default')
n = % I'm not sure what to put here to have it pull data from each of the 4 trials
partition = cvpartition(n,'Holdout',0.15);
idxTrain = training(partition);
FinalTrainX = trainingdataX(idxTrain,:)
FinalTrainY = trainingdataY(idxTrain,:)
idxNew = test(partition);
Xval = trainingdataX(idxNew,:)
Yval = trainingdataY(idxNew,:)

Answers (2)

YERRAMADAS 2024-8-1
Use cross-validation to maximize the data available for both the training and validation sets.
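A minimal sketch of what k-fold cross-validation could look like here, assuming the trial data has already been combined into matrices with one observation per row. The stand-in data, the matrix sizes, and the choice of 5 folds are assumptions made for illustration only.

```matlab
% Hypothetical stand-in data: 2164 observations, 63 features (assumed sizes)
rng('default');                          % for reproducibility
allX = rand(2164, 63);
allY = rand(2164, 1);

k = 5;                                   % number of folds (an assumed choice)
cv = cvpartition(size(allX, 1), 'KFold', k);

for fold = 1:cv.NumTestSets
    idxTrain = training(cv, fold);       % logical index of training rows for this fold
    idxVal   = test(cv, fold);           % logical index of validation rows for this fold
    XtrainFold = allX(idxTrain, :);
    YtrainFold = allY(idxTrain, :);
    XvalFold   = allX(idxVal, :);
    YvalFold   = allY(idxVal, :);
    % Train and evaluate a model on this fold here
    fprintf('Fold %d: %d training rows, %d validation rows\n', ...
        fold, size(XtrainFold, 1), size(XvalFold, 1));
end
```

With this approach every observation is used for validation exactly once across the folds, instead of a fixed 15 percent being set aside.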

Aditya 2024-8-1
To partition data stored in cells for validation, first concatenate the data from all trials into single matrices. You can then use the partition indices to split those matrices into training and validation sets.
Before concatenating, you need to transpose your X and Y data so that observations are in rows and each row of X corresponds to a row of Y.
Here is sample code for this:
% sample data
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(541, 63);
    trainingdataY{i} = rand(541, 1);
end
% Concatenate data
allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});
% Partition data (15% holdout for validation)
rng('default'); % For reproducibility
partition = cvpartition(size(allX, 1), 'Holdout', 0.15);
idxTrain = training(partition);
idxVal = test(partition);
% Split into training and validation sets
FinalTrainX = allX(idxTrain, :);
FinalTrainY = allY(idxTrain, :);
Xval = allX(idxVal, :);
Yval = allY(idxVal, :);
% Display results
fprintf('Training data X size: %dx%d\n', size(FinalTrainX, 1), size(FinalTrainX, 2));
fprintf('Training data Y size: %dx%d\n', size(FinalTrainY, 1), size(FinalTrainY, 2));
fprintf('Validation data X size: %dx%d\n', size(Xval, 1), size(Xval, 2));
fprintf('Validation data Y size: %dx%d\n', size(Yval, 1), size(Yval, 2));
I hope this helps!
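If the validation set should draw roughly 15 percent from each of the 4 trials rather than 15 percent of the pooled rows, one possible variation is to pass a grouping variable to cvpartition, which stratifies the holdout by group. This is a sketch under assumptions: the variable names trialID, rowsPerTrial, and valPerTrial are hypothetical, and the sample data sizes match the example above.

```matlab
% Sample data in the same layout as above (sizes are assumed)
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(541, 63);
    trainingdataY{i} = rand(541, 1);
end

allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});

% trialID(r) is the trial that row r of allX came from
rowsPerTrial = cellfun(@(x) size(x, 1), trainingdataX);
trialID = repelem((1:numel(trainingdataX))', rowsPerTrial);

rng('default');                                      % for reproducibility
partition = cvpartition(trialID, 'Holdout', 0.15);   % holdout stratified by trial
idxVal = test(partition);

Xval = allX(idxVal, :);
Yval = allY(idxVal, :);
FinalTrainX = allX(~idxVal, :);
FinalTrainY = allY(~idxVal, :);

% Number of validation rows contributed by each trial
valPerTrial = accumarray(trialID(idxVal), 1)
```

Because the same logical index is applied to allX and allY, each validation row of X keeps its matching Y value.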
2 Comments
Isabelle Museck 2024-8-1
Hello Aditya,
Thank you so much for your help. This makes a lot of sense. However, when I try to integrate the code with my data, it does not vertically concatenate the data in the cells properly. I end up with "allX" being a 252x541 double and "allY" being a 4x541 double.
When I run the code you provided on its own, I get a 2164x63 double for "allX" and a 2164x1 double for "allY", which is what I expect. Do you know why it may not be concatenating correctly for my data?
Aditya 2024-8-1
Edited: Aditya 2024-8-1
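As mentioned in my post, the data in each of your cells is shaped 63x541 (X) and 1x541 (Y), which is the wrong orientation for vertical concatenation. Stacking four 63x541 matrices gives 252x541, and stacking four 1x541 vectors gives 4x541, which is what you are seeing. You need to transpose each cell first so that the shapes become 541x63 and 541x1.
To transpose the cells, you can use the following lines of code: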
% Transpose each cell using cellfun
trainingdataX = cellfun(@transpose, trainingdataX, 'UniformOutput', false);
trainingdataY = cellfun(@transpose, trainingdataY, 'UniformOutput', false);
Or you can do it manually using a for loop.
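The for-loop version could be sketched as follows. The random data is a hypothetical stand-in for the real trials, using the 63x541 and 1x541 shapes described in this comment.

```matlab
% Hypothetical data in the layout described above:
% features in rows (63x541) and one row of responses (1x541) per trial
trainingdataX = cell(4, 1);
trainingdataY = cell(4, 1);
for i = 1:4
    trainingdataX{i} = rand(63, 541);
    trainingdataY{i} = rand(1, 541);
end

% Transpose each cell so that observations are in rows
for i = 1:numel(trainingdataX)
    trainingdataX{i} = trainingdataX{i}.';
    trainingdataY{i} = trainingdataY{i}.';
end

% Vertical concatenation now stacks the observations of all trials
allX = vertcat(trainingdataX{:});
allY = vertcat(trainingdataY{:});
disp(size(allX))   % expected size: 2164x63
disp(size(allY))   % expected size: 2164x1
```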
Hope this clarifies your doubt!
