Inefficiency in MEX function when passing data into output
Hi,
I have a relatively slow function that I'm trying to speed up by converting it to a MEX function.
I'll give a shorter, simpler example function to illustrate my problem (note that it doesn't do anything meaningful; I'm only interested in computational efficiency for now).
function output = simplefunction %#codegen
    rng(10)
    output = zeros(10000,1);
    for i = 1:10000
        x = randi(10,1000,1);
        y = randi(10,1,1000);
        xy = bsxfun(@times, x, y);
        output(i) = sum(xy(:));
    end
end
simplefunction.m takes the outer product of two vectors, x and y, and sums over all the elements, storing these sums in 'output'.
What's puzzling me is that, when I convert this to a MEX file, the MEX version is actually slower:
codegen simplefunction
% matlab function
tic; simplefunction; toc;
Elapsed time is 5.018660 seconds.
% MEX function
tic; simplefunction_mex; toc;
Elapsed time is 15.600499 seconds.
But this behaviour only happens when I include the last line in the for-loop:
output(i) = sum(xy(:));
If I comment this line out, this behaviour disappears, and the MEX function is faster:
codegen simplefunction
% matlab function
tic; simplefunction; toc;
Elapsed time is 5.852922 seconds.
% mex function
tic; simplefunction_mex; toc;
Elapsed time is 0.181966 seconds.
This behaviour also does not depend on the operation. If I replace the commented-out line with something else, e.g.:
output(i) = xy(1,1);
Then, the MEX function is still substantially slower.
Can anyone offer an explanation for why simply passing a value into the variable 'output' slows down my MEX function?
Thanks!
4 comments
James Tursa
2020-5-22
"... However, I'm still not sure why my MEX function would be slower when I just pass a single value from xy to output ..."
For the same reason as before. As long as you don't use the xy output, the optimizing compiler might not even calculate it and hence the timings look great. But if you use even one element of xy, then the optimizer can't eliminate calculating xy and the timings go back up.
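One way to probe the explanation above is a small test function that does the same work in both modes but only reads xy in one of them. This is a sketch, not something from the thread: the name xy_probe and the useXY flag are hypothetical, and whether the compiler really drops the unused computation depends on the release and compiler settings.

```matlab
function out = xy_probe(n, useXY) %#codegen
% Hypothetical probe (not from the thread): the loop body is identical in
% both modes, except that xy is only read when useXY is true.
out = zeros(n,1);
for i = 1:n
    x = randi(10,1000,1);
    y = randi(10,1,1000);
    xy = bsxfun(@times, x, y);
    if useXY
        out(i) = xy(1,1);   % reading a single element makes xy a needed result
    end
end
end
```

For the compiler to have a chance of removing the unused work, useXY would presumably need to be fixed at code generation time, for example by passing it as coder.Constant(true) or coder.Constant(false) in the -args list and building two MEX files. If the explanation holds, the build with useXY false should time much like the original function with the output line commented out.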
Answers (1)
James Tursa
2020-5-22
Edited: James Tursa
2020-5-22
I am not sure you have enough insight into how codegen is going to code the randi, bsxfun, times, and sum functions to get what you want. At the m-file level, some of those functions are multi-threaded for sure (e.g., sum and times). And at the m-file level, xy(:) produces a shared data copy (takes very little time) but maybe codegen produces a deep copy for this step since it is not smart enough to see that with sum(xy(:)) you are just trying to sum all of the elements.
Without knowing these details about codegen, and given that they probably change from release to release, you might not have enough information to predict how the generated code will perform.
I'm curious if this would change the timing (i.e., maybe avoid a deep copy if that is what is happening currently):
output(i) = sum(sum(sum(xy)));
You mentioned that this was just example code. What functions is your "real" code using?
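If a temporary copy created for xy(:) really is the cost (this is the hypothesis in the answer above, not something confirmed in the thread), another way to sidestep it is to accumulate the sum with explicit loops, so that neither xy nor a reshaped copy of it has to be built. A minimal sketch, with outer_sum as a hypothetical helper name:

```matlab
function s = outer_sum(x, y) %#codegen
% Hypothetical helper: sums every element of the outer product of x and y
% without forming the numel(x)-by-numel(y) matrix or a reshaped copy of it.
s = 0;
for c = 1:numel(y)
    for r = 1:numel(x)
        s = s + x(r) * y(c);
    end
end
end
```

Inside the loop of simplefunction, the assignment would then read output(i) = outer_sum(x, y); and the bsxfun line could be dropped. Explicit loops like this are usually slow in interpreted MATLAB, so the comparison only makes sense for the generated MEX code.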