Inefficiency in MEX function when passing data into output

Question

0 votes

Hi,

I have a relatively slow function that I'm trying to speed up by converting it to a MEX function.

I'll give a shorter, simpler example function to illustrate my problem (note that it doesn't do anything meaningful - I'm only interested in computational efficiency for now).

function output = simplefunction %#codegen
    rng(10)
    output = zeros(10000,1);
    for i = 1:10000
        x = randi(10,1000,1);
        y = randi(10,1,1000);
        xy = bsxfun(@times, x, y);
        output(i) = sum(xy(:));
    end
end

simplefunction.m takes the outer product of two vectors, x and y, and sums over all the elements, storing these sums in 'output'.
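As a small aside on what each iteration computes: the sum over all elements of an outer product equals the product of the two vector sums. The check below is only a sketch of that identity for one iteration; the names full_sum and fast_sum are made up for illustration and do not appear in the original function.

```matlab
% Check that summing the full outer product matches sum(x)*sum(y).
rng(10)
x = randi(10,1000,1);        % 1000x1 column vector
y = randi(10,1,1000);        % 1x1000 row vector
xy = bsxfun(@times, x, y);   % 1000x1000 outer product
full_sum = sum(xy(:));       % the value simplefunction stores in output(i)
fast_sum = sum(x) * sum(y);  % same value without forming xy
assert(full_sum == fast_sum) % exact here: all values are small integers
```

This does not explain the timing difference; it only makes explicit how much work the 1000x1000 intermediate array represents for a single scalar result.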

What's puzzling me is that, when I convert this to a MEX file, the MEX version is actually slower:

codegen simplefunction
% matlab function
tic; simplefunction; toc;
Elapsed time is 5.018660 seconds.
% MEX function
tic; simplefunction_mex; toc;
Elapsed time is 15.600499 seconds.

But this behaviour only happens when I include the last line in the for-loop:

output(i) = sum(xy(:));

If I comment this line out, this behaviour disappears, and the MEX function is faster:

codegen simplefunction
% matlab function
tic; simplefunction; toc;
Elapsed time is 5.852922 seconds.
% mex function
tic; simplefunction_mex; toc;
Elapsed time is 0.181966 seconds.

This behaviour also does not depend on the operation. If I replace the commented-out line with something else, e.g.:

output(i) = xy(1,1);

Then, the MEX function is still substantially slower.

Can anyone offer an explanation for why simply passing a value into the variable 'output' slows down my MEX function?

Thanks!

4 comments (the 2 earlier comments are not shown)

Matthew Anderson 2020-5-20

Thanks for responding Walter.

So if I understand you correctly, the MEX function was only faster than the equivalent MATLAB function because the compiler, knowing that the loop operations didn't affect the output, ignored these lines.

I can see why using the sum() operation might be faster in MATLAB if it uses multithreading by default (and not in the MEX function). However, I'm still not sure why my MEX function would be slower when I just pass a single value from xy to output, e.g.:

% output(i) = sum(xy(:));
output(i) = xy(1,1);

How could I go about making the MEX function faster, then? I tried passing in a zero array as an input (which is modified to produce the output), but my problem seems to persist.

Thanks for your help.

James Tursa 2020-5-22

"... However, I'm still not sure why my MEX function would be slower when I just pass a single value from xy to output ..."

For the same reason as before. As long as you don't use the xy output, the optimizing compiler might not even calculate it and hence the timings look great. But if you use even one element of xy, then the optimizer can't eliminate calculating xy and the timings go back up.
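One way to look into this, as a sketch rather than a definitive diagnosis, is to ask codegen for a code generation report and read the generated C for the loop. The -report option belongs to MATLAB Coder's codegen command; what the generated code looks like depends on the release, and any elimination done later by the C compiler will not be visible in the report.

```matlab
% Build the MEX function and produce a code generation report.
codegen simplefunction -report

% Open the report from the link printed in the Command Window and
% compare the generated loop with and without the line
%     output(i) = sum(xy(:));
% If the loop body is missing or reduced in one version, the two
% timings are not measuring the same work.
```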

Answer 1

James Tursa 2020-5-22

Edited: James Tursa 2020-5-22

1 vote

I am not sure you have enough insight into how codegen is going to code the randi, bsxfun, times, and sum functions to get what you want. At the m-file level, some of those functions are multi-threaded for sure (e.g., sum and times). And at the m-file level, xy(:) produces a shared data copy (takes very little time) but maybe codegen produces a deep copy for this step since it is not smart enough to see that with sum(xy(:)) you are just trying to sum all of the elements.

Without knowing these details about codegen, and given the fact that it probably changes from release to release, you might not have enough information.

I'm curious if this would change the timing (i.e., maybe avoid a deep copy if that is what is happening currently):

output(i) = sum(sum(sum(xy)));
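Another experiment along the same lines, assuming the temporary array is the cost, is to accumulate the sum with explicit loops so that xy is never formed. This is only a sketch and has not been timed; the variable s and the loop indices r and c are introduced here for illustration.

```matlab
% Accumulate the sum of the outer product without building xy.
x = randi(10,1000,1);   % 1000x1 column vector
y = randi(10,1,1000);   % 1x1000 row vector
s = 0;
for c = 1:numel(y)
    for r = 1:numel(x)
        s = s + x(r) * y(c);   % one element of the outer product
    end
end
```

Inside simplefunction, these loops would take the place of the bsxfun and sum lines, followed by output(i) = s. Plain scalar loops like this tend to translate directly to C, but whether the result is actually faster would need to be measured.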

You mentioned that this was just example code. What functions is your "real" code using?

1 comment

Matthew Anderson 2020-5-25

Hi James,

You are correct - I have no experience with C or with building MEX functions in MATLAB. I guess if I really wanted to optimize this function then I'd need a deeper understanding of these things. I have been using codegen as a quick method of increasing efficiency for some of my other functions, but it doesn't work in this case.

Doing the double-sum didn't change anything I'm afraid!

In my actual function, I iteratively compute the element-wise product between K rows of an NxN adjacency matrix (type=logical) and a KxN matrix (type=double). Then I take the row-sum over the resulting KxN matrix. So the main functions are .*, +, bsxfun(), nansum(), and randperm(). This is a serial optimization problem, so I'm invoking these functions hundreds of thousands of times.
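A single step of that computation might look roughly like the sketch below. The sizes, the data, and the names A, W, rows, P and rowSums are all made up for illustration, and sum(...,'omitnan') stands in for nansum.

```matlab
% Hypothetical sizes and data, for illustration only.
N = 4; K = 2;
A = logical([0 1 1 0; 1 0 0 1; 1 0 0 1; 0 1 1 0]);  % NxN adjacency matrix
W = [1 2 NaN 4; 5 6 7 8];                           % KxN matrix of doubles
rows = [1 3];                                       % K selected rows of A
P = A(rows,:) .* W;                                 % element-wise product, KxN
rowSums = sum(P, 2, 'omitnan');                     % row sums, skipping NaN
% rowSums is [2; 13] for this data
```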

Something peculiar I have also found is that the MEX function is faster when xy is computed as follows:

function output = simplefunction2 %#codegen
    rng(10)
    output = zeros(1e+5,1);
    for i = 1:1e+5
        x = randi(10,1000,1);
        y = randi(10,1000,1);
        xy = bsxfun(@eq,x, y);
        output(i) = sum(xy);
    end
end

Timings for .m and MEX respectively:

Elapsed time is 25.291566 seconds.
Elapsed time is 21.792464 seconds.

In this second function, there's no array expansion going on: x and y are both 1000x1, so xy is a 1000x1 vector. Maybe this sheds light on the matter? Does this suggest that multithreading is the explanation here?

Thanks for your help by the way.
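One way to probe the multithreading question, as a sketch, is to restrict MATLAB to a single computational thread and time the array-expanding operation again. maxNumCompThreads and timeit are standard MATLAB functions; the names f, nOld, tSingle and tMulti are introduced here for illustration.

```matlab
x = randi(10,1000,1);
y = randi(10,1,1000);
f = @() sum(sum(bsxfun(@times, x, y)));  % the operation from simplefunction

nOld = maxNumCompThreads(1);  % limit to one thread, keep the previous setting
tSingle = timeit(f);          % single-threaded timing
maxNumCompThreads(nOld);      % restore the previous setting
tMulti = timeit(f);           % timing with the original thread count

fprintf('single: %.4g s, original setting: %.4g s\n', tSingle, tMulti);
```

If the two timings are close, multithreading is unlikely to account for the gap between the .m and MEX versions. The same maxNumCompThreads calls can be wrapped around the tic/toc runs of simplefunction and simplefunction_mex.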
