Regularization for Naive Bayes

2 views (last 30 days)
Xiwei She on 10 Jan 2017
Edited: Xiwei She on 10 Jan 2017
I have a dataset in which the number of features far exceeds the number of examples: the input X is a 50 × 5000 matrix (50 examples, 5000 features), and Y is a binary label vector (classes 0 and 1). I want to classify this data with a Naive Bayes classifier, but because there are so many more features than examples, it overfits and the results are very poor. I have already tried the lasso algorithm on this data and got pretty good classification results, and now I want to compare it against Naive Bayes as a baseline. However, NB performs too poorly to make a persuasive comparison. Is there a way to add regularization to Naive Bayes, the way the lasso does, to overcome this overfitting problem? My Naive Bayes code is below; can anyone help me revise it to include regularization? Thanks a lot!
X = rand(50, 5000);      % Train/test sample matrix (50 examples, 5000 features)
Y = randi([0 1], 50, 1); % Train/test label vector (classes 0 and 1)
CrossValSet = cvpartition(Y, 'KFold', 3); % 3-fold cross-validation

% Training set (fold 1)
Train_sample = X(training(CrossValSet, 1), :);
Train_label  = Y(training(CrossValSet, 1));

% Test set (fold 1)
Test_sample = X(test(CrossValSet, 1), :);
Test_label  = Y(test(CrossValSet, 1));

Class_num   = length(unique(Train_label)); % Class pool: 0 and 1
Feature_num = size(Train_sample, 2);

Para_mean      = cell(1, Class_num);  % Per-class feature means
Para_dev       = cell(1, Class_num);  % Per-class feature standard deviations
Sample_byclass = cell(1, Class_num);  % Training samples grouped by class
Prior_prob     = zeros(1, Class_num); % Class prior probabilities

%% Algorithm
% Priors and per-class grouping
for i = 1:size(Train_sample, 1)
    c = Train_label(i, 1) + 1; % Map label 0/1 to cell index 1/2
    Sample_byclass{1, c} = [Sample_byclass{1, c}; Train_sample(i, :)];
    Prior_prob(1, c) = Prior_prob(1, c) + 1;
end
Prior_prob = Prior_prob / size(Train_sample, 1); % Prior probabilities

% Gaussian parameters from the training set
for i = 1:Class_num
    Para_mean{1, i} = mean(Sample_byclass{1, i});
    Para_dev{1, i}  = std(Sample_byclass{1, i});
end

% Predict labels for the test set
Test_num = size(Test_sample, 1);
predict  = zeros(Test_num, 1); % Preallocate instead of growing in the loop
for i = 1:Test_num
    prob = log(Prior_prob);
    for j = 1:Class_num
        likelihood = 0; % Reset per class (bug fix: was initialized outside this loop)
        for k = 1:Feature_num
            if Para_dev{1, j}(1, k) == 0 % Guard against zero variance
                Para_dev{1, j}(1, k) = 0.1667;
            end
            % Log of the Gaussian density (constant term dropped)
            likelihood = likelihood ...
                - (Test_sample(i, k) - Para_mean{1, j}(1, k))^2 / (2 * Para_dev{1, j}(1, k)^2) ...
                - log(Para_dev{1, j}(1, k));
        end
        prob(1, j) = prob(1, j) + likelihood;
    end
    [~, index] = max(prob);
    predict(i) = index - 1;
end

accuracy = mean(predict == Test_label);
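One way to get a regularization-like effect in Gaussian Naive Bayes is variance smoothing: inflate every per-class variance by a small constant proportional to the largest feature variance in the training data, which shrinks the influence of features whose variance was estimated from very few samples (this is the same idea as the `var_smoothing` parameter in scikit-learn's GaussianNB). A minimal sketch against the variables in the code above, to run after the training-parameter loop; `epsilon` is a tuning knob of my choosing, not a prescribed value:

```
% Variance smoothing: add a fraction of the largest training variance
% to every per-class variance before computing likelihoods.
epsilon = 1e-2; % Smoothing strength; tune by cross-validation
all_var = cellfun(@(s) s.^2, Para_dev, 'UniformOutput', false);
max_var = max(cellfun(@max, all_var)); % Largest variance over all features/classes
for j = 1:Class_num
    Para_dev{1, j} = sqrt(all_var{1, j} + epsilon * max_var);
end
```

With smoothing in place, every standard deviation is strictly positive, so the ad-hoc zero-variance fallback (the 0.1667 substitution) in the prediction loop can be removed. Note this regularizes the variance estimates rather than selecting features the way the lasso does; if you want sparsity as well, you could rank features (e.g. by a t-test per feature) and keep only the top few hundred before fitting NB.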

Answers (0)
