Why is this MATLAB code faster than the C++ code below? I want to understand what MATLAB internally does better and faster than C++

Question

Thomas 2022-6-17

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1742610-why-is-this-matlab-code-faster-than-the-c-code-below-i-want-to-understand-what-matlab-internally

Commented: Thomas 2022-6-19

Why is this MATLAB code

function primes = sieve_era2(N)
% sieve of Eratosthenes without upper bound of search space (could theoretically run forever)
if nargin == 0
    N = Inf;
end
primes.number(1) = 2;
primes.counter(1) = primes.number(1);
k = 2;
k1 = 2;
tic;
while k <= N
    k = k + 1;                                          % check next k if it is prime 
    if mod(k,100000) == 0
        fprintf("numbers checked: %i, number of primes found: %i, largest prime found: %i, time: %.2f seconds \n", k, k1, primes.number(end), toc);
    end
    primes.counter = primes.counter - 1;                % all counters reduced by 1
    if min(primes.counter) == 0
        primes.counter((primes.counter == 0)) =  primes.number((primes.counter == 0));
        continue;                                       % current number is not a prime
    end
    k1 = k1 + 1;                                    % no counter was reduced to zero --> current number is a new prime
    primes.number(k1-1) = k;
    primes.counter(k1-1) = primes.number(k1-1);
    
end
end

faster than this C++ code:

// sieve_era2.cpp : sieve of Eratosthenes without upper bound of search space (could theoretically run forever)
#include <iostream>
#include <vector>
#include <algorithm>
#include <time.h>
#include <chrono>
using namespace std; 
using namespace std::chrono;
struct primes
{
    std::vector<int> number {2};
    std::vector<int> counter {2};
} p;
primes sieve_era2(int N)
{
    int k = 2;
    bool prime{ true };
    while (k <= N)
    {
        k = k + 1;
        for (int j = 0; j <= p.counter.size()-1; j++)
        {
            p.counter[j] = p.counter[j] - 1;
            if (p.counter[j] == 0)
            {
                p.counter[j] = p.number[j];
                prime = false;
            }
        }
        if (prime == false)
        {
            prime = true;
            continue;
        }
        p.number.push_back(k);
        p.counter.push_back(p.number.back());
    }
    return p;
}
int main()
{
    primes p;
    int N = 200000;
    unsigned __int64 tic = duration_cast<milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
    p = sieve_era2(N);
    unsigned __int64 toc = duration_cast<milliseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
    cout << (toc - tic) / 1000 << " Seconds " << std::endl;
    system("pause");
    return 0;
}

The MATLAB version runs in 12 sec, the C++ version in about 55 sec.

I want to understand what Matlab internally might be doing better and faster than C++
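A note on the measurement itself: the C++ program computes `(toc - tic) / 1000` on integer millisecond counts, so the printed value is truncated to whole seconds, and `system_clock` can jump if the system time is adjusted. The following is a minimal sketch of an alternative using `std::chrono::steady_clock`. The helper name `time_seconds` is hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed wall-clock time in seconds, including the fraction.
template <typename F>
double time_seconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> converts to seconds without truncating to an integer
    const double elapsed = std::chrono::duration<double>(stop - start).count();
    assert(elapsed >= 0.0);  // steady_clock is monotonic
    return elapsed;
}
```

It could be used as `double s = time_seconds([&] { p = sieve_era2(N); });` and printed directly. Timings taken this way are only comparable to MATLAB's `tic`/`toc` if the C++ program is built with compiler optimizations enabled.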


Answer 1

Chris 2022-6-17

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1742610-why-is-this-matlab-code-faster-than-the-c-code-below-i-want-to-understand-what-matlab-internally#answer_987920

Edited: Chris 2022-6-17

I see one efficiency gain in primes.counter = primes.counter - 1;

MATLAB runs vector operations like this in optimized, compiled library code (it relies on libraries such as LAPACK for matrix computations), which I think should be faster than a for loop in C.

The same goes for the if block that follows, and even more so: in MATLAB the if runs once per outer-loop iteration, whereas in the C++ code it runs many times per outer-loop iteration.

You could try timing those operations separately, a few thousand at a time. In Matlab, for instance:

counter = rand(10000,1);
timeit(@() counterTest(counter))
ans = 0.0034
function counterTest(counter)
    for idx = 1:1000
        counter = counter-1;
    end
end
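The point about testing once per outer-loop iteration can be sketched in C++ as well. The code below is an illustration, not the original poster's program: `sieve_step` and `primes_up_to` are hypothetical names, and the step is split into a branch-free decrement pass followed by a single zero check, mirroring `primes.counter = primes.counter - 1` and `min(primes.counter) == 0`. Whether this is actually faster than the single fused loop depends on the compiler and its optimization settings, so it should be timed rather than assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: one sieve step for candidate k, structured like the
// MATLAB version. Returns true if k is a new prime (and appends it).
bool sieve_step(std::vector<int>& number, std::vector<int>& counter, int k)
{
    assert(number.size() == counter.size());

    // Pass 1: decrement every counter, with no branch inside the loop
    for (int& c : counter) c -= 1;

    // Pass 2: a single test per step, like `min(primes.counter) == 0`
    const bool any_zero =
        std::find(counter.begin(), counter.end(), 0) != counter.end();
    if (any_zero) {
        // k is divisible by number[j] wherever counter[j] reached zero
        for (std::size_t j = 0; j < counter.size(); ++j)
            if (counter[j] == 0) counter[j] = number[j];
        return false;
    }
    number.push_back(k);
    counter.push_back(k);
    return true;
}

// Collects all primes <= n by stepping through k = 3..n
std::vector<int> primes_up_to(int n)
{
    assert(n >= 2);
    std::vector<int> number{2};
    std::vector<int> counter{2};
    for (int k = 3; k <= n; ++k) sieve_step(number, counter, k);
    return number;
}
```

The simple decrement loop in pass 1 is the kind of loop an optimizing compiler can often turn into SIMD instructions, which is one plausible reason the vectorized MATLAB statement performs well.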

4 comments

Chris 2022-6-17

Edited: Chris 2022-6-18

Expanding a little on that test, I get 0.10 seconds on my computer with this:

number = int64(randi(999999,10000,1));
tic
for z = 1:10000
    number = number-1;
end
toc

Compare that to what I think is roughly equivalent C++ code (listed at the end of this comment), which takes 0.22 seconds.

Some other points of interest besides LAPACK:

Using MATLAB double arrays (the default) is twice as fast as using int64/uint64.
Calling counter.size() in your loop condition appears to almost double the run time for me.
Using a standard C array appears to be faster than using a vector.

#include <chrono>
#include <cstdint>
#include <iostream>
#include <random>

std::int64_t number[10000];
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_int_distribution<std::int64_t> dist(0, 999999);

// MATLAB-style tic/toc: tic() stores the start time, toc() prints the elapsed time
void tic(int mode = 0) {
    static std::chrono::high_resolution_clock::time_point t_start;

    if (mode == 0)
        t_start = std::chrono::high_resolution_clock::now();
    else {
        auto t_end = std::chrono::high_resolution_clock::now();
        // duration<double> yields seconds regardless of the clock's tick period
        std::cout << "Elapsed time is "
                  << std::chrono::duration<double>(t_end - t_start).count()
                  << " seconds\n";
    }
}
void toc() { tic(1); }

int main()
{
    // fill the array with random start values
    for (int j = 0; j < 10000; j++)
    {
        number[j] = dist(gen);
    }
    tic();
    // 10000 passes, each decrementing all 10000 elements
    for (int z = 0; z < 10000; z++)
    {
        for (int j = 0; j < 10000; j++)
        {
            number[j] = number[j] - 1;
        }
    }
    toc();
    return 0;
}
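The remarks about `counter.size()` and plain arrays can be tried out with a few loop variants. This is a sketch with hypothetical function names, not a benchmark result. It also avoids the `j <= size() - 1` pattern from the question, which wraps around to a huge unsigned value if the vector is ever empty.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Variant 1: index loop with the size read once before the loop
void decrement_indexed(std::vector<std::int64_t>& v)
{
    const std::size_t n = v.size();  // hoisted out of the loop condition
    for (std::size_t j = 0; j < n; ++j)
        v[j] -= 1;
}

// Variant 2: range-based for, no explicit index or size call
void decrement_ranged(std::vector<std::int64_t>& v)
{
    for (std::int64_t& x : v)
        x -= 1;
}

// Variant 3: raw pointer and length, comparable to a plain C array
void decrement_raw(std::int64_t* data, std::size_t n)
{
    assert(data != nullptr || n == 0);
    for (std::size_t j = 0; j < n; ++j)
        data[j] -= 1;
}
```

Each variant can be dropped into the timing loop above in place of the inner `for`. With optimizations enabled the three may well perform similarly, while an unoptimized build tends to penalize the `std::vector` accessors more, so the build settings are worth noting next to any timing.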

Chris 2022-6-19

Interesting. Are you saying that 32-bit integer operations should take the same time as 64-bit integer operations when using BLAS functions?

It appears to me that the PetscBLASInt type allows BLAS to handle both data types separately.

https://petsc.org/release/docs/manualpages/Sys/PetscBLASInt.html

Thomas 2022-6-19

This makes sense - thank you very much for your help! I will try the timing approach you suggested.

Answer 2

Jan 2022-6-17

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1742610-why-is-this-matlab-code-faster-than-the-c-code-below-i-want-to-understand-what-matlab-internally#answer_987885

Edited: Jan 2022-6-19

Not an answer, but an improvement of the MATLAB code. For N=2e5 it runs in 10.7 sec on my R2018b i5m, instead of 13.0 sec for the original version:

function primes = sieve_era2m(N)
% sieve of Eratosthenes without upper bound of search space (could theoretically run forever)
if nargin == 0
   N = Inf;
end
number(1)  = 2;
counter(1) = number(1);
k1 = 2;
show = 1e5;
tic;
for k = 3:N+1
   if k == show
      fprintf('checked: %i, primes found: %i, largest: %i, time: %.2f s\n', ...
         k, k1, number(end), toc);
      show = show + 1e5;
   end
   counter  = counter - 1;        % all counters reduced by 1
   if all(counter)               % no counter reached zero --> k is a new prime
      number(k1) = k;
      counter(k1) = number(k1);
      k1 = k1 + 1;               % index of the next free slot
   else                          % k is not a prime: reset the counters that hit zero
      ncounter          = ~counter;
      counter(ncounter) = number(ncounter);
   end
end
primes.number  = number;
primes.counter = counter;
end

And with UINT32 and without output it runs in 7.7 sec:

function primes = sieve_era2i(N)
number(1)  = uint32(2);
counter(1) = uint32(2);
one        = uint32(1);
tic;
k1 = uint32(1);
for k = uint32(3):uint32(N)
   counter = counter - one;
   if all(counter)
      k1          = k1 + one;
      number(k1)  = k;
      counter(k1) = k;
   else
      for u = one:k1
         if ~counter(u)
            counter(u) = number(u);
         end
      end
   end
end
primes.number  = number;
primes.counter = counter;
end

EDITED: And a version taking 6.8 sec:

function primes = sieve_era2j(N)
piN     = ceil(N / log(N));         % rough estimate of the number of primes <= N
number  = zeros(1, piN, 'uint32');  % Pre-allocation (arrays still grow if the estimate is too small)
counter = zeros(1, piN, 'uint32');  % Pre-allocation
number(1)  = uint32(2);
counter(1) = uint32(2);
one        = uint32(1);
tic;
k1 = uint32(1);
for k = uint32(3):uint32(N)
   new = 1;
   for u = one:k1
      counter(u) = counter(u) - one;
      if counter(u) == 0
         counter(u) = number(u);
         new = 0;
         break;
      end
   end
   
   for v = u+1:k1  % Count the rest without setting [new] again
      counter(v) = counter(v) - one;
      if counter(v) == 0
         counter(v) = number(v);
      end
   end
   
   if new  % New prime found:
      k1          = k1 + one;
      number(k1)  = k;
      counter(k1) = k;
   end
end
primes.number  = number(one:k1);
primes.counter = counter(one:k1);
end

Now to an answer: I cannot profile your C++ code. My guess is that this is the bottleneck:

p.number.push_back(k);
p.counter.push_back(p.number.back());

Growing arrays iteratively is expensive. It looks like MATLAB's strategies for reducing this cost are more effective.

1 comment
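If array growth is indeed a cost, the pre-allocation idea from the MATLAB version above has a direct C++ counterpart in `std::vector::reserve`. The sketch below is a hypothetical variant (`sieve_reserved` and the `Primes` struct are illustrative names), using the same `N / log(N)` estimate as a capacity hint and checking candidates up to `n`. Note that `push_back` is already amortized constant time, so how much `reserve` helps here is something only a profile can settle.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Primes {
    std::vector<int> number;   // primes found so far
    std::vector<int> counter;  // steps until the next multiple of each prime
};

// Hypothetical variant of sieve_era2 that reserves capacity up front
Primes sieve_reserved(int n)
{
    assert(n >= 2);
    Primes p;
    // Capacity hint only: the vectors still grow if the estimate is too small
    const auto guess = static_cast<std::size_t>(
        std::ceil(n / std::log(static_cast<double>(n))));
    p.number.reserve(guess);
    p.counter.reserve(guess);
    p.number.push_back(2);
    p.counter.push_back(2);

    for (int k = 3; k <= n; ++k) {
        bool is_prime = true;
        for (std::size_t j = 0; j < p.counter.size(); ++j) {
            if (--p.counter[j] == 0) {       // k is a multiple of number[j]
                p.counter[j] = p.number[j];  // restart the countdown
                is_prime = false;
            }
        }
        if (is_prime) {
            p.number.push_back(k);
            p.counter.push_back(k);
        }
    }
    return p;
}
```

Returning a local `Primes` object instead of copying a global one, as here, also avoids one full copy of both vectors at the end of the run.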

Thomas 2022-6-19

Many good ideas - thank you very much!
