Data Partition using cvpartition: Warning
Dear all,
I am trying to divide my data using cvpartition with the "KFold" option in order to cross-validate a neural network. I have a function to do that, shown below. It works, but it gives a warning message and I do not know why it appears:
Warning: One or more folds do not contain points from all the groups.
> In internal.stats.cvpartitionImpl>stra_kfoldcv (line 364)
In internal.stats.cvpartitionImpl/rerandom (line 315)
In internal.stats.cvpartitionInMemoryImpl (line 166)
In cvpartition (line 175)
In jFFNN_REG (line 14)
In NN_Kfold_Regression (line 8)
Function:
function [FFNN,Pred,Actual]=jFFNN_REG(input,output,kfold,Hiddens,Maxepochs)
% Build a fitting network with one, two or three hidden layers
if length(Hiddens)==1
    h1=Hiddens(1); net=fitnet(h1);
elseif length(Hiddens)==2
    h1=Hiddens(1); h2=Hiddens(2); net=fitnet([h1 h2]);
elseif length(Hiddens)==3
    h1=Hiddens(1); h2=Hiddens(2);
    h3=Hiddens(3); net=fitnet([h1 h2 h3]);
end
%rng('default');
% Divide data into k folds (output is passed as the grouping variable)
fold=cvpartition(output,'kfold',kfold);
% Preallocate storage for predictions, targets and per-fold performance
pred2=[]; ytest2=[]; Afold=zeros(kfold,1);
% Cross-validation loop
for i=1:kfold
    % Logical indices of the training and testing sets for fold i
    trainIdx=fold.training(i); testIdx=fold.test(i);
    % Training and testing features and targets
    xtrain=input(trainIdx,:); ytrain=output(trainIdx);
    xtest=input(testIdx,:); ytest=output(testIdx);
    % Set maximum number of epochs
    net.trainParam.epochs=Maxepochs;
    % Train the model (the network expects samples as columns)
    net=train(net,xtrain',ytrain');
    % Predict on the test fold
    pred=net(xtest');
    % Test performance (MSE by default for fitnet)
    tstPerform=perform(net,ytest',pred);
    % Store the performance for this fold
    Afold(i)=tstPerform;
    % Accumulate predictions and targets across folds
    pred2=[pred2,pred]; ytest2=[ytest2;ytest];
end
Accepted Answer
Divya Gaddipati
2019-8-5
c = cvpartition(n,'KFold',k)
The above syntax randomly splits the “n” observations into “k” disjoint sets of roughly equal size. It therefore does not ensure that all “k” sets include samples from every class. If your dataset is highly imbalanced, some of the sets might not contain any samples from the minority class.
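As a minimal sketch of what this syntax does (the sizes `n = 100` and `k = 5` are made up for illustration and are not from the question), the following checks that the folds are disjoint and of roughly equal size:

```matlab
rng(0);                          % fix the seed so the split is repeatable
n = 100; k = 5;                  % illustrative sizes (assumptions)
c = cvpartition(n,'KFold',k);    % non-stratified random k-fold partition

% Count how many test folds each observation falls into
counts = zeros(n,1);
for i = 1:k
    counts = counts + test(c,i); % test(c,i) is an n-by-1 logical mask
end

disp(c.TestSize)                 % number of test observations per fold
disp(all(counts == 1))           % true when the test folds are disjoint and cover the data
```

Because only the number of observations is passed, the split cannot take class labels into account.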
c = cvpartition(group,'KFold',k,'Stratify',true)
The above syntax, by contrast, ensures that each of the “k” sets contains approximately the same percentage of samples from each class as the complete set.
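A small sketch of stratification on imbalanced labels may help here. The label vector `group` (90 samples of class 0, 10 of class 1) is hypothetical, and the code assumes a release in which cvpartition accepts the 'Stratify' option:

```matlab
rng(0);                                  % repeatable split
group = [zeros(90,1); ones(10,1)];       % hypothetical imbalanced labels
k = 5;
c = cvpartition(group,'KFold',k,'Stratify',true);

% Count minority-class samples in each test fold
minorityPerFold = zeros(k,1);
for i = 1:k
    minorityPerFold(i) = sum(group(test(c,i)) == 1);
end
disp(minorityPerFold')                   % minority samples per test fold
```

With stratification the minority samples are spread across the folds rather than left to chance.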
When the target classes are highly imbalanced, stratified sampling is recommended so that the relative class frequencies are approximately preserved in each training and validation fold.
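One further point may be relevant to the question. The posted function builds its network with fitnet, so `output` looks like a continuous regression target. The warning's stack trace goes through `stra_kfoldcv`, which suggests a stratified split was already being attempted. When a continuous vector is passed as the grouping variable, each distinct value is treated as its own group, and a group with fewer than “k” members cannot appear in every fold. That is a likely cause of the warning. A minimal sketch, assuming that diagnosis holds and using made-up data, passes the number of observations instead:

```matlab
rng(0);                                  % repeatable split
output = rand(50,1);                     % hypothetical continuous targets
kfold = 5;

% Partition by observation count, so no grouping or stratification is applied
fold = cvpartition(numel(output),'KFold',kfold);

for i = 1:kfold
    trainIdx = training(fold,i);         % logical mask of training rows
    testIdx  = test(fold,i);             % logical mask of test rows
    fprintf('Fold %d: %d train, %d test\n', i, sum(trainIdx), sum(testIdx));
end
```

The masks `trainIdx` and `testIdx` can be used to index the inputs and targets exactly as in the original loop.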