PARFOR loop is too slow

Question

rambiz 2013-6-25

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/80191-parfor-loop-is-too-slow

Hi all, this is the code I wrote to simulate 8PSK in Rayleigh Channel. It is working and gives the correct results but it is too slow, compared to the case when I use an ordinary for-loop and run the code on only one core. Right now I run the "parfor-loop-code" on 4 cores. Is it because, I use too much memory for the all the {k}-indexed variables? How can I avoid this problem? TYVM

% 1 Transmit Antenna
% Nr Receive Antenna
% 3 bits per Transmission
clc;
clear all;
Nt = 1;                         % Tx antennas
Nr = 1;                         % Rx antennas
Ebit_N0_dB =0;
Es_N0_dec =1/Nr*3*10.^(Ebit_N0_dB/10);  %Multiply by 3 because Symbol   Energy   is trice as much as Bit Energy for 3-Bit-Transmission
                                      %divide by 1/Nr to normalize the received signal energy
                                      %in other words every receiver should receive 1/Nr of the total energy received at the receiver side
BER = zeros(1,length(Ebit_N0_dB));
symbol_errors = zeros(1,length(Ebit_N0_dB));
bit_errors = zeros(1,length(Ebit_N0_dB));
% Defining the Gray Coded Constellation
demap  =[0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
%  8PSK Symbols
S000=1; 
S001=exp(1j*1*pi/4);
S010=exp(1j*3*pi/4);
S011=exp(1j*2*pi/4);
S100=exp(1j*7*pi/4);
S101=exp(1j*6*pi/4);
S110=exp(1j*4*pi/4);
S111=exp(1j*5*pi/4);
N=10^6;
symbols=0;
xjq=cell(1,N);                    
argmin=cell(1,N);
tic
parfor K=1:N
               symbols=symbols+1
               bin_ip{K} = round(rand(1,3));      % Generate Binary Input
               %-------------------------------
                    if     bin_ip{K}==[0 0 0]  
                        xjq{K}=S000
                    elseif bin_ip{K}==[0 0 1]
                        xjq{K}=S001
                    elseif bin_ip{K}==[0 1 0]
                        xjq{K}=S010
                    elseif bin_ip{K}==[0 1 1]
                        xjq{K}=S011
                    elseif bin_ip{K}==[1 0 0]  
                        xjq{K}=S100
                    elseif bin_ip{K}==[1 0 1]
                        xjq{K}=S101
                    elseif bin_ip{K}==[1 1 0]
                        xjq{K}=S110
                    elseif bin_ip{K}==[1 1 1]
                        xjq{K}=S111
                    end
                    n{K} = (1/sqrt(2))*(randn(Nr,1) + 1i*randn(Nr,1)); % white Gaussian Noise with variance of 1
                    %==================================================================
                    %Generate Channel Matrix
                    %==================================================================
                   H{K}=(1/sqrt(2)*( randn(Nr,Nt)+1i*randn(Nr,Nt)));  
                   y{K} = sqrt(Es_N0_dec)*H{K}*xjq{K} + n{K};                  %=================================================================
                    %Optimal Detection Scheme
                    =================================================================
                   p = sqrt(Es_N0_dec); %8PSK
                   y000{K}=H{K}*p*S000; argmin{K}(1)=norm(y{K}-y000{K});
                   y001{K}=H{K}*p*S001; argmin{K}(2)=norm(y{K}-y001{K});
                   y010{K}=H{K}*p*S010; argmin{K}(3)=norm(y{K}-y010{K});
                   y011{K}=H{K}*p*S011; argmin{K}(4)=norm(y{K}-y011{K});
                   y100{K}=H{K}*p*S100; argmin{K}(5)=norm(y{K}-y100{K});
                   y101{K}=H{K}*p*S101; argmin{K}(6)=norm(y{K}-y101{K});
                   y110{K}=H{K}*p*S110; argmin{K}(7)=norm(y{K}-y110{K});
                   y111{K}=H{K}*p*S111; argmin{K}(8)=norm(y{K}-y111{K});
                    [~,est(K)] = min(argmin{K});
                    est_bin{K} = demap(est(K),:);
                    %Count bit errors
                   bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
end
toc
                 BER = bit_errors/(symbols*3)

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Matt J 2013-6-25

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/80191-parfor-loop-is-too-slow#answer_89893

编辑：Matt J 2013-6-25

在 MATLAB Online 中打开

You have a lot of on-the-fly memory allocation, which probably isn't helping things. H{k},y{k}, y000{k},etc... are not pre-allocated and get generated element-by-element in the loop. The contents of each argmin{K}, apparently a length 8 vector, are also not pre-allocated. Finally, it doesn't seem to make much sense holding these things in cell arrays. It looks like they all contain same-sized quantities and could be held instead in numeric arrays.

As another miscellaneous note, you can make things faster by skipping calls to find(),

bit_errors = bit_errors + sum( est_bin{K}~=bin_ip{K} );

3 个评论
显示 1更早的评论隐藏 1更早的评论

Matt J 2013-6-25

在 MATLAB Online 中打开

sum(.) is definitely slower!

No way.

    x=rand(1e8,1)>.5;
    tic;
      sum(x);
    toc
    %Elapsed time is 0.053452 seconds.
    tic;
     length(find(x));
    toc
    %Elapsed time is 0.904927 seconds.

rambiz 2013-7-21

编辑：rambiz 2013-7-21

在 MATLAB Online 中打开

Hi guys, tyvm for your comments so far. OK, I changed the original code, taking Matt's suggestions into consideration, i.e. I preallocated memory and used normal arrays instead of cells. The code runs in about 105 seconds (for 10^5 runs) and it is still very slow. Using sum(.) or length(find(.)) doesn't change the speed considerably. Any new comments, please?!

% Nt Transmit Antenna
% Nr Receive Antenna
% 3 bits per Transmission
% clc;
clear all;
tic
Nt = 1;                         % Tx antennas
Nr = 1;                         % Rx antennas
Ebit_N0_dB =0;
Es_N0_dec =1/Nr*3*10.^(Ebit_N0_dB/10);  %Multiply by 3 because Symbol    Energy is trice as much as Bit Energy for 3-Bit-Transmission
                                      %divide by 1/Nr to normalize the received signal energy
                                      %in other words every receiver should receive 1/Nr of the total energy received at the receiver side
BER = zeros(1,length(Ebit_N0_dB));
symbol_errors = zeros(1,length(Ebit_N0_dB));
bit_errors = zeros(1,length(Ebit_N0_dB));
% symbols = zeros(1,length(Ebit_N0_dB));
% Defining the Gray Coded Constellation
demap  =[0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
% %  8PSK Symbols
S000=1;
S001=exp(1j*1*pi/4);
S010=exp(1j*3*pi/4);
S011=exp(1j*2*pi/4);
S100=exp(1j*7*pi/4);
S101=exp(1j*6*pi/4);
S110=exp(1j*4*pi/4);
S111=exp(1j*5*pi/4);
N=5*10^5;
symbols=0;
bin_ip=round(rand(N,3));
xjq=zeros(1,N); 
n= (1/sqrt(2))*(randn(Nr,N) + 1i*randn(Nr,N));
H=(1/sqrt(2)*( randn(Nr,Nt,N)+1i*randn(Nr,Nt,N)));
y=zeros(Nr,N);
y000=zeros(Nr,N);
y001=zeros(Nr,N);
y010=zeros(Nr,N);
y011=zeros(Nr,N);
y100=zeros(Nr,N);
y101=zeros(Nr,N);
y110=zeros(Nr,N);
y111=zeros(Nr,N);
argmin=zeros(8,N);
parfor K=1:N
symbols=symbols+1;
%-------------------------------
                  if     bin_ip(K,:)==[0 0 0]  
                      xjq(K)=S000;
                  elseif bin_ip(K,:)==[0 0 1]
                      xjq(K)=S001;
                  elseif bin_ip(K,:)==[0 1 0]
                      xjq(K)=S010;
                  elseif bin_ip(K,:)==[0 1 1]
                      xjq(K)=S011;
                  elseif bin_ip(K,:)==[1 0 0]  
                      xjq(K)=S100;
                  elseif bin_ip(K,:)==[1 0 1]
                      xjq(K)=S101;
                  elseif bin_ip(K,:)==[1 1 0]
                      xjq(K)=S110;
                  elseif bin_ip(K,:)==[1 1 1]
                      xjq(K)=S111;
                  end
y(:,K) = sqrt(Es_N0_dec)*H(:,:,K)*xjq(K) + n(:,K);        %8PSK
                   %Optimal Detection Scheme
%  =============================================================
p = sqrt(Es_N0_dec); %8PSK
y000(:,K)=H(:,:,K)*p*S000; 
y001(:,K)=H(:,:,K)*p*S001; 
y010(:,K)=H(:,:,K)*p*S010; 
y011(:,K)=H(:,:,K)*p*S011; 
y100(:,K)=H(:,:,K)*p*S100; 
y101(:,K)=H(:,:,K)*p*S101; 
y110(:,K)=H(:,:,K)*p*S110; 
y111(:,K)=H(:,:,K)*p*S111; 
argmin(:,K)=[norm(y(:,K)-y000(:,K));
             norm(y(:,K)-y001(:,K));
             norm(y(:,K)-y010(:,K));
             norm(y(:,K)-y011(:,K));
             norm(y(:,K)-y100(:,K));
             norm(y(:,K)-y101(:,K));
             norm(y(:,K)-y110(:,K));
             norm(y(:,K)-y111(:,K))];
                    [~,est(K)] = min(argmin(:,K));
                    est_bin{K} = demap(est(K),:);
                    %Count bit errors
  bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip(K,:)));
  %bit_errors = bit_errors + sum( est_bin{K}~=bin_ip(K,:));
end
toc
                 BER = bit_errors/(symbols*3)

请先登录，再进行评论。

Answer 2

Muthu Annamalai 2013-6-25

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/80191-parfor-loop-is-too-slow#answer_89904

编辑：Muthu Annamalai 2013-6-25

在 MATLAB Online 中打开

Key to using parfor must be the independence of each iteration; i.e. you cannot write variable updates like,

a = a + 1

So change the lines,

   bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
   bit_errors(k) = length(find( est_bin{K}~=bin_ip{K}));

and rewrite

BER = bit_errors/(symbols*3)

as

BER = sum(bit_errors)/(symbols*3)

and remove the line,

symbols=symbols+1

because this variable is going to be N at end of all iterations.

This should give you some speedup.

2 个评论
显示无隐藏无

Matt J 2013-6-25

you cannot write variable updates like a=a+1

Why not? Don't they qualify as legitimate reduction assignments as described here?

http://www.mathworks.com/help/distcomp/getting-started-with-parfor.html#brdqn6p-1

Muthu Annamalai 2013-6-25

@Matt I probably stand corrected in that case, while still my approach is explicitly parallelizable.

请先登录，再进行评论。

PARFOR loop is too slow

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

回答（2 个）

3 个评论
显示 1更早的评论隐藏 1更早的评论

2 个评论
显示无隐藏无

另请参阅

类别

标签

产品

Community Treasure Hunt

PARFOR loop is too slow

0 个评论 显示 -2更早的评论隐藏 -2更早的评论

回答（2 个）

3 个评论 显示 1更早的评论隐藏 1更早的评论

2 个评论 显示 无隐藏 无

另请参阅

类别

标签

产品

Community Treasure Hunt

0 个评论
显示 -2更早的评论隐藏 -2更早的评论

3 个评论
显示 1更早的评论隐藏 1更早的评论

2 个评论
显示无隐藏无