This is excellent and works perfectly in my case! Sample calculations were taking about 55 seconds on a single worker (100% core usage), almost 130 (!) seconds on a 4-worker parpool with the unmodified sequentialfs (about 45% usage of each core), and with the modified sequentialfs I was able to get below 20 seconds (100% usage on all cores).
In R2018a, the line numbers are 356 and 362-365, respectively.
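For context, this is roughly how the unmodified baseline above was run, using the documented statset route for parallel execution. It's a minimal sketch: `critfun`, `X` and `y` are placeholders, not names from the original answer.

```matlab
% Baseline (unmodified sequentialfs) run on a pool, as compared above.
parpool(4);                                 % start a 4-worker pool
opts = statset('UseParallel', true);        % let sequentialfs use the pool

% critfun is a placeholder criterion function with the signature
% sequentialfs expects: crit = critfun(XTRAIN, ytrain, XTEST, ytest)
[inmodel, history] = sequentialfs(@critfun, X, y, 'options', opts);
```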
EDIT: WARNING! With bigger data, this modification causes a sudden increase in memory consumption at the beginning of the computation, which later settles back to normal levels. The peak value depends on the size of the data to be processed by the workers. My data variable was 1,208,064,000 bytes (over 1 GB) and I had to use a 16 GB swap partition (doubling my 16 GB of RAM) to avoid worker crashes. My peak was at approximately 26 GB of memory usage (all RAM consumed and most of the swap), but after about half a minute RAM usage dropped to 8 GB. It may be caused by the process of distributing the data to the workers, but that's just a guess.
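If the spike really does come from shipping the data out to the workers, one thing that might help (untested, and not part of the modification above) is wrapping the large variable in a parallel.pool.Constant, so it is transferred to each worker once and reused across iterations instead of being re-broadcast. `someCriterion` and `numIterations` below are hypothetical placeholders.

```matlab
% Possible mitigation sketch: send the big array to the pool once.
dataC = parallel.pool.Constant(data);       % one copy per worker, sent once

result = zeros(1, numIterations);           % preallocate sliced output
parfor i = 1:numIterations
    % access the shared copy via .Value instead of broadcasting `data`
    result(i) = someCriterion(dataC.Value, i);   % someCriterion is hypothetical
end
```

Whether this actually lowers the peak depends on how the modified sequentialfs hands the data to the workers, so treat it as an experiment, not a fix.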
