Clustering and feedforwardnet always giving the same result

Question

Jose Marques 2018-2-24


Edited: Jose Marques 2018-2-24

Hello! I am having some problems with clustering and feedforwardnet. The idea is to compare these methods and improve the accuracy of feedforwardnet. The problem is that the clustering accuracy is always equal to the feedforwardnet accuracy on the test set. I expected a feedforwardnet trained with the new training data to give better results.

Algorithm steps:

1 - divide the data into two different sets: a training set and a test set.

2 - create a model for clustering using the training set.

3 - try to classify the test set using the clustering model.

4 - create a secondary training set from the training elements whose cluster-assigned class is identical to the original target class and whose maximum class-membership probability is greater than z (0.7, for example).

5 - use the secondary training set to train a feedforwardnet

6 - classify the initial test set using the feedforwardnet created
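Step 4 can be sketched as a logical mask over the training columns. This is only an illustration on made-up toy data: the `generate_datasets` helper called in the script is not shown in the question, so the filter below is an assumption about what it does. It also assumes samples are stored as columns, `P` is the n-by-2 posterior matrix returned by `cluster`, and `class` is the n-by-1 vector of assigned labels, already aligned with the target labels.

```matlab
% Toy data (hypothetical): 2 features, 5 samples stored as columns
X = [1 2 3 4 5; 10 20 30 40 50];
Y = [1 1 2 2 1];                 % original target classes (1-by-n)
class = [1; 2; 2; 2; 1];         % cluster labels (n-by-1), aligned with Y
P = [0.90 0.10; 0.40 0.60; 0.20 0.80; 0.45 0.55; 0.65 0.35];  % posteriors (n-by-2)
z = 0.70;                        % confidence threshold

% Keep a sample only if the cluster label matches the target AND
% the largest posterior probability exceeds z
keep = (class' == Y) & (max(P, [], 2)' > z);

% Secondary training set: the surviving columns and their labels
X_sec = X(:, keep);
Y_sec = Y(keep);
```

Under these assumptions only samples 1 and 3 survive: sample 2 is misclassified, and samples 4 and 5 are classified correctly but with a posterior below z.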

% Reading the excel file created
german_data =  xlsread('german_data_numeric.xlsx');
% Setting the size of training dataset (k value)
training_size = 0.9;
% Shuffle the rows of the matrix (some learning algorithms need shuffled data)
[r,~] = size(german_data);
randomRowIdxs = randperm(r);
german_data = german_data(randomRowIdxs,:);
% Splitting the dataset in two: the training and the test dataset
% (use the row count r; length() returns the largest dimension, not necessarily the rows)
n_train = round(training_size*r);
X_train = german_data(1:n_train,:);
X_test = german_data(n_train+1:end,:);
% Taking the labels (it is the last column of german_data)
Y_train = X_train(:,end)';
Y_test = X_test(:,end)';
% Taking off the labels from the datasets
X_train = X_train(:,1:end-1)';
X_test = X_test(:,1:end-1)';
% Training model with the initial training lot (clustering)
GMModel = fitgmdist(X_train',2,'SharedCovariance',true,'CovarianceType','diagonal','Replicates',1000,'Start','randSample','RegularizationValue',0.01);
% class contains the predictions for X_train and P contains the
% posterior probabilities for each class
[class, nlogl, P, logpdf] = cluster (GMModel, X_train');
class = verify_class(class,Y_train);
% Creating the secondaries training lots for each z value
% (see the function 'generate_datasets')
z = 0.70;
[generated_Xtrain_70, generated_Ytrain_70] = generate_datasets(X_train,Y_train,z,P,class);
% Using a secondary training lot to train another classifier (using z = 0.7, for example)
generated_GMModel = fitgmdist(generated_Xtrain_70',2,'RegularizationValue',0.01);
[generated_class, gen_nlogl, gen_P, gen_logpdf] = cluster (generated_GMModel,generated_Xtrain_70');
generated_class = verify_class(generated_class,generated_Ytrain_70);
% Testing the created models
[initial_class_test, ~, ~] = cluster (GMModel, X_test');
initial_class_test = verify_class(initial_class_test,Y_test);
[secondary_class_test, ~, ~] = cluster (generated_GMModel, X_test');
secondary_class_test = verify_class(secondary_class_test,Y_test);
% MLP training with regularized training data
X_MLP = horzcat(generated_Xtrain_70,X_test);
Y_MLP = horzcat(generated_Ytrain_70,Y_test);
% Creating a multilayer network with 3 hidden layers of 10 neurons
% each ([10 10 10]), for example
net = feedforwardnet([10 10 10]); 
net = configure(net,X_MLP,Y_MLP);
net.performParam.regularization = 0.19;
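net.divideFcn = 'divideblock';
% The ratios are given as sample counts; the last block is the test set
n_test = size(X_test,2);
net.divideParam.trainRatio = 0.7*(length(Y_MLP)-n_test);
net.divideParam.valRatio   = 0.3*(length(Y_MLP)-n_test);
net.divideParam.testRatio  = n_test;
% The test block is the same test set used for the initial clustering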
[net, training_record] = train(net,X_MLP,Y_MLP);
% Training Confusion Plot Variables
yTrn = net(X_MLP(:,training_record.trainInd));
tTrn = Y_MLP(:,training_record.trainInd);
% Validation Confusion Plot Variables
yVal = net(X_MLP(:,training_record.valInd));
tVal = Y_MLP(:,training_record.valInd); 
% Test Confusion Plot Variables
yTst1 = net(X_MLP(:,training_record.testInd));
tTst1 = Y_MLP(:,training_record.testInd);
% Network predictions on the original test set, and the resulting accuracy
yTst = net(X_test);
tTst = Y_test;
accuracy = length(find(round(yTst) == tTst))/length(tTst)*100
%% Plotting the confusion matrices
% plotconfusion accepts only arrays with 0 or 1 values.
% Thus, we subtract 1 from each array because the classes are 1 and 2.
plotconfusion(Y_train-1, class'-1, 'Initial training lot',Y_test-1, initial_class_test'-1, 'Initial test lot',...
    generated_Ytrain_70-1,generated_class'-1, 'Generated training lot',Y_test-1,secondary_class_test'-1, 'Secondary test lot',...
    tTrn-1, yTrn-1, 'NN Training', tVal-1, yVal-1, 'NN Validation', tTst1-1, yTst1-1, 'NN Test');
% Verify_class function:
function [verified_class] = verify_class(class,target)
% If accuracy < 50; invert all the classes obtained
accuracy = length(find(class == target'))/length(target)*100;
if(accuracy<50)
    class(class==1) = 3;
    class(class==2) = 1;
    class(class==3) = 2;
end
verified_class = class;
end
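
A Gaussian mixture numbers its components arbitrarily, so cluster 1 need not correspond to class 1, which is why the script passes every clustering result through `verify_class`. The check below shows the swap on made-up toy labels; the logic is inlined so the snippet runs on its own, and `3 - class` is used as a shorthand for the three-step swap in the function (an equivalent only when the labels are exactly 1 and 2).

```matlab
% Toy labels (hypothetical), with the cluster numbering flipped
target = [1 1 2 2 2];            % true classes (1-by-n)
class  = [2; 2; 1; 1; 2];        % cluster indices (n-by-1)

% Accuracy before alignment: only the last sample matches
acc_before = 100 * sum(class == target') / numel(target);   % 20

% Same rule as verify_class: below 50 percent, swap labels 1 and 2
if acc_before < 50
    class = 3 - class;
end

% Accuracy after alignment: four of five samples match
acc_after = 100 * sum(class == target') / numel(target);    % 80
```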