Inefficiency in MEX function when passing data into output

Hi,
I have a relatively slow function that I'm trying to speed up by converting it to a MEX function.
I'll give a shorter, simpler, example function to illustrate my problem (note that it doesn't do anything meaningful - I'm only interested in computational efficiency for now).
function output = simplefunction %#codegen
rng(10)
output = zeros(10000,1);
for i = 1:10000
    x = randi(10,1000,1);
    y = randi(10,1,1000);
    xy = bsxfun(@times, x, y);
    output(i) = sum(xy(:));
end
end
simplefunction.m takes the outer product of two vectors, x and y, and sums over all the elements, storing these sums in 'output'.
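For reference, the value stored in output(i) has a simple closed form, which also makes the cost of the loop body explicit. With x and y each of length 1000, forming xy takes 10^6 multiplications per iteration (10^10 over all 10,000 iterations), while the right-hand side needs only about 2,000 additions per iteration. This identity is specific to the toy example:

```latex
\sum_{j=1}^{1000}\sum_{k=1}^{1000} x_j\, y_k
  = \left(\sum_{j=1}^{1000} x_j\right)\left(\sum_{k=1}^{1000} y_k\right)
```

So the timings above are most likely dominated by building and reducing the 1000x1000 array xy, not by the random-number generation.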
What's puzzling me is that, when I convert this to a MEX file, the MEX version is actually slower:
codegen simplefunction
% MATLAB function
tic; simplefunction; toc;
Elapsed time is 5.018660 seconds.
% MEX function
tic; simplefunction_mex; toc;
Elapsed time is 15.600499 seconds.
But this behaviour only happens when I include the last line in the for-loop:
output(i) = sum(xy(:));
If I comment this line out, this behaviour disappears and the MEX function is faster:
codegen simplefunction
% MATLAB function
tic; simplefunction; toc;
Elapsed time is 5.852922 seconds.
% MEX function
tic; simplefunction_mex; toc;
Elapsed time is 0.181966 seconds.
This behaviour also doesn't depend on the operation. If I replace that line with something else, e.g.:
output(i) = xy(1,1);
the MEX function is still substantially slower.
Can anyone offer an explanation for why simply passing a value into the variable 'output' slows down my MEX function?
Thanks!
4 comments
Matthew Anderson 2020-5-20
Thanks for responding Walter.
So if I understand you correctly, the MEX function was only faster than the equivalent MATLAB function because the compiler, knowing that the loop operations didn't affect the output, skipped those lines.
I can see why the sum() operation might be faster in MATLAB if it uses multithreading by default (and the MEX function doesn't). However, I'm still not sure why my MEX function would be slower when I just pass a single value from xy to output, e.g.:
% output(i) = sum(xy(:));
output(i) = xy(1,1);
How could I go about making the MEX function faster, then? I tried passing in a zero array as an input (which is modified to produce the output), but the problem persists.
Thanks for your help.
James Tursa 2020-5-22
"... However, I'm still not sure why my MEX function would be slower when I just pass a single value from xy to output ..."
For the same reason as before. As long as you don't use the xy output, the optimizing compiler might not even calculate it and hence the timings look great. But if you use even one element of xy, then the optimizer can't eliminate calculating xy and the timings go back up.
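When comparing a MEX build against the .m file, it may also be worth ruling out overhead that only the MEX version carries. By default, MEX code generated by MATLAB Coder includes run-time integrity checks (such as bounds checking) and periodic checks for Ctrl+C. The sketch below, which was not suggested in this thread, turns both off through the coder.config('mex') object before rebuilding. With IntegrityChecks off, an out-of-bounds index can crash MATLAB instead of raising an error, so only do this for code that already runs correctly with the checks on.

```matlab
% Sketch: rebuild the MEX file with run-time checks disabled, then re-time it.
% IntegrityChecks and ResponsivenessChecks are properties of the
% coder.MexCodeConfig object returned by coder.config('mex').
cfg = coder.config('mex');
cfg.IntegrityChecks = false;       % remove memory-integrity (bounds/size) checks
cfg.ResponsivenessChecks = false;  % remove periodic Ctrl+C polling
codegen -config cfg simplefunction

tic; simplefunction_mex; toc;
```

This won't undo the dead-code effect described above; it only removes checks that the interpreted .m version never pays for.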


Answers (1)

James Tursa 2020-5-22 (edited 2020-5-22)
I am not sure you have enough insight into how codegen is going to code the randi, bsxfun, times, and sum functions to get what you want. At the m-file level, some of those functions are multi-threaded for sure (e.g., sum and times). And at the m-file level, xy(:) produces a shared data copy (takes very little time) but maybe codegen produces a deep copy for this step since it is not smart enough to see that with sum(xy(:)) you are just trying to sum all of the elements.
Without knowing these details about codegen, and given the fact that it probably changes from release to release, you might not have enough information.
I'm curious if this would change the timing (i.e., maybe avoid a deep copy if that is what is happening currently):
output(i) = sum(sum(sum(xy)));
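Another way to test the deep-copy hypothesis is to avoid creating xy at all. The function below is a hypothetical variant (the name simplefunction_loop is mine) that accumulates the outer-product sum directly in nested loops, so there is neither a 1000x1000 temporary nor an xy(:) copy for codegen to make. It still performs 10^10 multiply-adds, and whether it beats the vectorized .m version depends on the C code codegen generates, which I haven't inspected. The interpreted .m version of this triple loop will be slow, so time only the MEX build.

```matlab
function output = simplefunction_loop %#codegen
% Hypothetical variant of simplefunction: sums the outer product of x and y
% element by element instead of materialising xy = bsxfun(@times, x, y).
% It makes the same randi calls in the same order after rng(10), and all
% values are small integers (exact in double), so the results should match.
rng(10)
output = zeros(10000,1);
for i = 1:10000
    x = randi(10,1000,1);
    y = randi(10,1,1000);
    s = 0;
    for k = 1:numel(y)          % columns of the would-be xy
        for j = 1:numel(x)      % rows: inner loop runs down one column
            s = s + x(j)*y(k);
        end
    end
    output(i) = s;
end
end
```

Build it with codegen simplefunction_loop and compare simplefunction_loop_mex against the original timings.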
You mentioned that this was just example code. What functions is your "real" code using?
1 comment
Matthew Anderson 2020-5-25
Hi James,
You are correct - I have no experience with C or with building MEX functions in MATLAB. I guess if I really wanted to optimize this function, I'd need a deeper understanding of these things. I have been using codegen as a quick way of speeding up some of my other functions, but it doesn't work in this case.
The nested sum didn't change anything, I'm afraid!
In my actual function, I iteratively compute the element-wise product between K rows of an NxN adjacency matrix (type=logical) and a KxN matrix (type=double). Then I take the row sums of the resulting KxN matrix. So the main functions are .*, +, bsxfun(), nansum() and randperm(). This is a serial optimization problem, so I'm invoking these functions hundreds of thousands of times.
Something peculiar I have also found is that the MEX function is faster when xy is computed as follows:
function output = simplefunction2 %#codegen
rng(10)
output = zeros(1e+5,1);
for i = 1:1e+5
    x = randi(10,1000,1);
    y = randi(10,1000,1);
    xy = bsxfun(@eq, x, y);
    output(i) = sum(xy);
end
end
Timings for .m and MEX respectively:
Elapsed time is 25.291566 seconds.
Elapsed time is 21.792464 seconds.
In this second function, there's no array expansion going on: x and y are both 1000x1, so xy is a 1000x1 vector rather than a 1000x1000 matrix. Maybe this sheds light on the matter? Does it suggest that multithreading is the explanation here?
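One way to test the multithreading hypothesis directly is to cap MATLAB's computational threads at one and repeat the original timings. If the .m timing moves toward the MEX timing, multithreaded built-ins are a plausible part of the explanation; if it barely changes, something else (such as temporaries or copies in the generated code) is more likely. This is a sketch that wasn't tried in the thread.

```matlab
% Sketch: re-time both versions with MATLAB limited to one computational thread.
% maxNumCompThreads(N) sets the limit and returns the previous limit.
nPrev = maxNumCompThreads(1);
tic; simplefunction;     toc;   % MATLAB version, single-threaded built-ins
tic; simplefunction_mex; toc;   % generated code, for comparison
maxNumCompThreads(nPrev);       % restore the original setting
```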
Thanks for your help by the way.


Category: Generating Code

Release: R2019b
