How to specify the input and target data

I have a dataset stored as a 2310x25 table. I don't know how to specify the input and target data. I'm using the code below for k-fold cross-validation.
data = dlmread('data\\inputs1.txt');     % inputs
groups = dlmread('data\\targets1.txt');  % targets
Fold = 10;
indices = crossvalind('Kfold', length(groups), Fold);
for i = 1:Fold
    testy = (indices == i);   % rows held out for testing in this fold
    trainy = (~testy);        % remaining rows used for training
    TestInputData = data(testy,:)';
    TrainInputData = data(trainy,:)';
    TestOutputData = groups(testy,:)';
    TrainOutputData = groups(trainy,:)';
end
8 comments
Walter Roberson 2022-6-20
Are you aware that some of the entries are question marks?
uma 2022-6-21
Yes, I know that. Now can you tell me how this dataset can be used to specify the input and target data?
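To illustrate the question, here is a minimal sketch of separating inputs from targets when some entries are '?'. It assumes the class label sits in the last column; the values in raw are made up for illustration and are not taken from the dataset.

```matlab
% Hypothetical raw text values; '?' marks a missing entry.
raw = {'1.5', '2.0', '0'; ...
       '?',   '3.1', '1'; ...
       '0.7', '?',   '0'; ...
       '2.2', '1.1', '1'};

vals = str2double(raw);       % '?' cannot be parsed, so it becomes NaN
clean = rmmissing(vals);      % drop every row that contains a NaN

inputs = clean(:, 1:end-1);   % all columns except the last are inputs
targets = clean(:, end);      % assumed: last column is the class label
```

Rows 2 and 3 contain a '?' and are dropped, so inputs ends up with two rows and targets with the matching two labels.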


Answers (1)

Walter Roberson 2022-6-21
filename = 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1038775/bankruptcy.csv';
opt = detectImportOptions(filename, 'TrimNonNumeric', true);
data = readmatrix(filename, opt);
data = rmmissing(data);
groups = data(:,end);
data = data(:,1:end-1);
whos groups
  Name           Size            Bytes  Class     Attributes

  groups      3194x1             25552  double
[sum(groups==0), sum(groups==1)]
ans = 1×2
3164 30
cp = classperf(groups);
Fold=10;
indices = crossvalind('Kfold',length(groups),Fold);
failures = 0;
for i = 1:Fold
    test = (indices == i);
    train = ~test;
    try
        % Classify the held-out fold and accumulate the result in cp
        class = classify(data(test,:), data(train,:), groups(train,:));
        classperf(cp, class, test);
    catch ME
        failures = failures + 1;
        if failures <= 5
            fprintf('failed on iteration %d\n', i);
        else
            break
        end
    end
end
failed on iteration 1
failed on iteration 2
failed on iteration 3
failed on iteration 4
failed on iteration 5
cp
                        Label: ''
                  Description: ''
                  ClassLabels: [2×1 double]
                  GroundTruth: [3194×1 double]
         NumberOfObservations: 3194
               ControlClasses: 2
                TargetClasses: 1
            ValidationCounter: 0
           SampleDistribution: [3194×1 double]
            ErrorDistribution: [3194×1 double]
    SampleDistributionByClass: [2×1 double]
     ErrorDistributionByClass: [2×1 double]
               CountingMatrix: [3×2 double]
                  CorrectRate: NaN
                    ErrorRate: NaN
              LastCorrectRate: 0
                LastErrorRate: 0
             InconclusiveRate: NaN
               ClassifiedRate: NaN
                  Sensitivity: NaN
                  Specificity: NaN
      PositivePredictiveValue: NaN
      NegativePredictiveValue: NaN
           PositiveLikelihood: NaN
           NegativeLikelihood: NaN
                   Prevalence: NaN
              DiagnosticTable: [2×2 double]
1 comment
Walter Roberson 2022-6-21
The reason for the failure is that you only have 30 entries with class 1, and when you are doing random selection for K-fold purposes, you are ending up with situations where there are no entries for class 1 in the training data.
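One way to keep every class represented in each fold is to stratify the partition by class label. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is available. It swaps in cvpartition for crossvalind, and the label vector is synthetic, built only to match the 3164/30 class counts reported above.

```matlab
% Synthetic labels with the same class counts as in the thread
% (3164 entries of class 0, 30 entries of class 1).
groups = [zeros(3164, 1); ones(30, 1)];
Fold = 10;

% Given a vector of class labels, cvpartition stratifies the folds,
% so each fold keeps roughly the same class proportions.
c = cvpartition(groups, 'KFold', Fold);

minorityInTrain = zeros(Fold, 1);
minorityInTest = zeros(Fold, 1);
for i = 1:Fold
    trainIdx = training(c, i);   % logical index of training rows, fold i
    testIdx = test(c, i);        % logical index of test rows, fold i
    minorityInTrain(i) = sum(groups(trainIdx) == 1);
    minorityInTest(i) = sum(groups(testIdx) == 1);
end

% Column 1: class-1 count in training; column 2: class-1 count in test.
disp([minorityInTrain, minorityInTest])
```

The logical vectors trainIdx and testIdx could stand in for train and test in the classification loop of the answer.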

