PARFOR loop is too slow
Hi all, this is the code I wrote to simulate 8PSK in a Rayleigh channel. It works and gives the correct results, but it is too slow compared to an ordinary for-loop running on only one core. Right now I run the parfor version on 4 cores. Is it because I use too much memory for all the {K}-indexed variables? How can I avoid this problem? Thank you very much.
% 1 Transmit Antenna
% Nr Receive Antenna
% 3 bits per Transmission
clc;
clear all;
Nt = 1; % Tx antennas
Nr = 1; % Rx antennas
Ebit_N0_dB =0;
Es_N0_dec =1/Nr*3*10.^(Ebit_N0_dB/10); %Multiply by 3 because symbol energy is three times the bit energy for 3-bit transmission
%multiply by 1/Nr to normalize the received signal energy
%in other words every receiver should receive 1/Nr of the total energy received at the receiver side
BER = zeros(1,length(Ebit_N0_dB));
symbol_errors = zeros(1,length(Ebit_N0_dB));
bit_errors = zeros(1,length(Ebit_N0_dB));
% Defining the Gray Coded Constellation
demap =[0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
% 8PSK Symbols
S000=1;
S001=exp(1j*1*pi/4);
S010=exp(1j*3*pi/4);
S011=exp(1j*2*pi/4);
S100=exp(1j*7*pi/4);
S101=exp(1j*6*pi/4);
S110=exp(1j*4*pi/4);
S111=exp(1j*5*pi/4);
N=10^6;
symbols=0;
xjq=cell(1,N);
argmin=cell(1,N);
tic
parfor K=1:N
symbols=symbols+1
bin_ip{K} = round(rand(1,3)); % Generate Binary Input
%-------------------------------
if bin_ip{K}==[0 0 0]
xjq{K}=S000
elseif bin_ip{K}==[0 0 1]
xjq{K}=S001
elseif bin_ip{K}==[0 1 0]
xjq{K}=S010
elseif bin_ip{K}==[0 1 1]
xjq{K}=S011
elseif bin_ip{K}==[1 0 0]
xjq{K}=S100
elseif bin_ip{K}==[1 0 1]
xjq{K}=S101
elseif bin_ip{K}==[1 1 0]
xjq{K}=S110
elseif bin_ip{K}==[1 1 1]
xjq{K}=S111
end
n{K} = (1/sqrt(2))*(randn(Nr,1) + 1i*randn(Nr,1)); % white Gaussian Noise with variance of 1
%==================================================================
%Generate Channel Matrix
%==================================================================
H{K}=(1/sqrt(2)*( randn(Nr,Nt)+1i*randn(Nr,Nt)));
y{K} = sqrt(Es_N0_dec)*H{K}*xjq{K} + n{K};
%=================================================================
%Optimal Detection Scheme
%=================================================================
p = sqrt(Es_N0_dec); %8PSK
y000{K}=H{K}*p*S000; argmin{K}(1)=norm(y{K}-y000{K});
y001{K}=H{K}*p*S001; argmin{K}(2)=norm(y{K}-y001{K});
y010{K}=H{K}*p*S010; argmin{K}(3)=norm(y{K}-y010{K});
y011{K}=H{K}*p*S011; argmin{K}(4)=norm(y{K}-y011{K});
y100{K}=H{K}*p*S100; argmin{K}(5)=norm(y{K}-y100{K});
y101{K}=H{K}*p*S101; argmin{K}(6)=norm(y{K}-y101{K});
y110{K}=H{K}*p*S110; argmin{K}(7)=norm(y{K}-y110{K});
y111{K}=H{K}*p*S111; argmin{K}(8)=norm(y{K}-y111{K});
[~,est(K)] = min(argmin{K});
est_bin{K} = demap(est(K),:);
%Count bit errors
bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
end
toc
BER = bit_errors/(symbols*3)
Answers (2)
Matt J
2013-6-25
Edited: Matt J
2013-6-25
You have a lot of on-the-fly memory allocation, which probably isn't helping things. H{K}, y{K}, y000{K}, etc. are not pre-allocated and get generated element-by-element in the loop. The contents of each argmin{K}, apparently a length-8 vector, are also not pre-allocated. Finally, it doesn't seem to make much sense to hold these things in cell arrays. It looks like they all contain same-sized quantities and could be held in numeric arrays instead.
As another miscellaneous note, you can make things faster by skipping calls to find(),
bit_errors = bit_errors + sum( est_bin{K}~=bin_ip{K} );
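As a rough sketch of the suggestions above (not the original poster's code), the loop body can keep everything in loop-local numeric temporaries rather than {K}-indexed cell arrays. The constellation vector S, the use of randi(8) to pick a symbol in place of three random bits, and the reduced N are assumptions made for illustration. A plain for-loop is shown; since bit_errors is only updated as a reduction, the outer loop should also be usable as a parfor loop.

```matlab
% Sketch only: 8PSK over Rayleigh fading with loop-local numeric temporaries.
Nr = 1; Nt = 1;
Ebit_N0_dB = 0;
p = sqrt(1/Nr*3*10^(Ebit_N0_dB/10));           % same scaling as sqrt(Es_N0_dec)
demap = [0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
% Constellation in the same row order as demap (S000 ... S111); assumed helper
S = exp(1j*[0 1 3 2 7 6 4 5]*pi/4);
N = 1e4;                                       % reduced from 10^6 for a quick run
bit_errors = 0;
for K = 1:N
    idx = randi(8);                            % index of the transmitted symbol
    H = (randn(Nr,Nt) + 1i*randn(Nr,Nt))/sqrt(2);   % Rayleigh channel
    n = (randn(Nr,1) + 1i*randn(Nr,1))/sqrt(2);     % unit-variance complex noise
    y = p*H*S(idx) + n;
    d = zeros(1,8);                            % pre-allocated distance metrics
    for m = 1:8
        d(m) = norm(y - p*H*S(m));
    end
    [~, est] = min(d);                         % minimum-distance detection
    bit_errors = bit_errors + sum(demap(est,:) ~= demap(idx,:));
end
BER = bit_errors/(3*N);
```

Nothing here is stored per iteration, so memory use no longer grows with N.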
3 comments
Matt J
2013-6-25
"sum(.) is definitely slower!"
No way.
x=rand(1e8,1)>.5;
tic;
sum(x);
toc
%Elapsed time is 0.053452 seconds.
tic;
length(find(x));
toc
%Elapsed time is 0.904927 seconds.
Muthu Annamalai
2013-6-25
Edited: Muthu Annamalai
2013-6-25
The key to using parfor is that each iteration must be independent; i.e., you cannot write variable updates like,
a = a + 1
So change the line,
bit_errors = bit_errors + length(find( est_bin{K}~=bin_ip{K}));
to
bit_errors(K) = length(find( est_bin{K}~=bin_ip{K}));
and rewrite
BER = bit_errors/(symbols*3)
as
BER = sum(bit_errors)/(symbols*3)
and remove the line,
symbols=symbols+1
because this variable is going to be N at the end of all iterations.
This should give you some speedup.
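A minimal sketch of this per-iteration approach, assuming the detector is replaced by placeholder random bits (tx and rx are illustrative names, not from the original code) and N is reduced for a quick run. Each iteration writes only its own slot of bit_errors, and the total is summed after the loop. Without Parallel Computing Toolbox, parfor simply runs serially.

```matlab
% Sketch only: per-iteration error counts in a sliced output variable.
N = 1000;                                      % reduced for illustration
demap = [0 0 0;0 0 1;0 1 0;0 1 1;1 0 0;1 0 1;1 1 0;1 1 1];
bit_errors = zeros(1,N);                       % one slot per iteration
parfor K = 1:N
    tx = demap(randi(8),:);                    % placeholder transmitted bits
    rx = demap(randi(8),:);                    % placeholder detected bits
    bit_errors(K) = sum(rx ~= tx);             % written only at index K
end
BER = sum(bit_errors)/(3*N);
```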
2 comments
Matt J
2013-6-25
"you cannot write variable updates like a=a+1"
Why not? Don't they qualify as legitimate reduction assignments as described here?
Muthu Annamalai
2013-6-25
@Matt I probably stand corrected in that case, though my approach is still explicitly parallelizable.
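For comparison, a small sketch of the reduction form discussed in these comments, again with placeholder random bits standing in for the detector (tx and rx are illustrative names) and a reduced N. An update of the form X = X + expr is treated by parfor as a reduction assignment, so the accumulator can stay a scalar.

```matlab
% Sketch only: scalar accumulator updated as a parfor reduction variable.
N = 1000;                                      % reduced for illustration
bit_errors = 0;
parfor K = 1:N
    tx = round(rand(1,3));                     % placeholder transmitted bits
    rx = round(rand(1,3));                     % placeholder detected bits
    bit_errors = bit_errors + sum(rx ~= tx);   % reduction assignment
end
BER = bit_errors/(3*N);
```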