Random Numbers Generation - For Loop VS 3D Matrix
2 次查看(过去 30 天)
显示 更早的评论
Hello, I have a question regarding MATLAB's performance while generating random numbers. This is a pretty straightforward situation, I have two versions of the same code, one generates random number vectors in a for loop and saves them in a 3D matrix, and the other version generates the random numbers all at once in a 3D Matrix using simply rand() function. It just so happens that the "for loop" version is faster than the "no for loop" version after a certain size of the random 3D matrix. I have discussed this with my peers and no one seems to understand why this is. Does anyone know why this happens? Thanks!
PS: This code was tried on different machines running Windows and macOS, enough RAM was avaliable and capable processors were used.
Code:
clear all
m = 10^4;
N = 10^3;
tic
r = zeros(N,3,m);
for n = 1:m
r(:,1:2,n) = rand([N,2]) - 1/2;
r(:,3,n) = rand([N,1]) - 1;
end
toc
clear r
tic
r = rand(N,3,m) - 1;
r(:,1:2,:) = r(:,1:2,:) + 1/2;
toc
0 个评论
采纳的回答
John D'Errico
2020-5-9
编辑:John D'Errico
2020-5-9
It really is impossible to know something like this, unless you have the code sitting there in front of you. And that is something we do not have. Also, tic and toc are generally a bad way to time things. They tend to not use JIT as well, since they are used in a script. And they don't warm thing up properly. And since you should learn to use functions anyway, use timeit. Anyway, let me see what I can learn here. First, reconstitute things into functions. Use of functions can improve how JIT works, at least that was true in the past. Not so true today, but it also helps us to use the profiler.
function r = test1(m,N)
r = zeros(N,3,m);
for n = 1:m
r(:,1:2,n) = rand([N,2]) - 1/2;
r(:,3,n) = rand([N,1]) - 1;
end
function r = test2(m,N)
r = rand(N,3,m) - 1;
r(:,1:2,:) = r(:,1:2,:) + 1/2;
Having stuffed the code into a function, now call timeit.
timeit(@() test1(1e4,1e3))
ans =
0.2452168234265
timeit(@() test2(1e4,1e3))
ans =
0.3289476464265
It too shows the same behavior. (Note that I used scientific notation there to define m and N. Easier to write, and as you write more MATLAB, it is easier to do.)
So, next, run the profile tool. USE THE TOOLS OF MATLAB. This is the only way we can learn what is happening. When I do that, I can look into the code for test1 and for test2, to learn where MATLAB seems to be taking more time. For multiple calls to test1 (remember that timeit is looping over calls to the function to get an average time used) test1 had a total time of 3.532 seconds.
2.297 seconds: r(:,1:2,n) = rand([N,2]) - 1/2;
1.226 seconds: r(:,3,n) = rand([N,1]) - 1;
Everything else in test1 was negligable. There were 150000 occurrences booked for each of those lines. But they were all small batches of random numbers in each case.
Next, we look at test2. It was called only 15 times.
3.234 seconds: r = rand(N,3,m) - 1;
1.515 seconds: r(:,1:2,:) = r(:,1:2,:) + 1/2;
However, each call was more substantial. As you can see there, the big calls do indeed seem to be less efficient. But realistically, the calls in test2 ARE inefficient. Why? They are inefficient because you computed a set of random numbers, a large array, and then subtracted 1 from ALL of them. But then for SOME of them, you added 1/2 back in.
Effectively, you did an extra set of adds on some of the numbers, when there was no need at all to do TWO additions on the same numbers.
Can we be more efficient ourselves? For example, consider test3.
function r = test3(m,N)
r = rand(N,3,m) - [1/2 1/2 1]; % Fixed per Ameer's comment
Testing that code, we see:
timeit(@() test3(1e4,1e3))
ans =
0.2259027794265
In test3, I used an implicit expansion of the vector [1 1 1/2] to accomplish the desired operation in one simpler call. Now only one add per element is done. Again there were only 15 calls to rand, but here we needed to do only ONE arithmetic operation per element, therefore it took less time. And that ONE call to rand with ONE add per element is faster than even the JIT accelerated loop. Here the profiler tells me, for again 15 calls to test3:
3.345 seconds: r = rand(N,3,m) - [1 1 1/2];
Using those calls, it determines the time per call was 0.2259 seconds per call. (As I crecall, timeit throws out the first few calls, then computes a median time estimate of those that remain. This allows timeit to be more robust. than just tic/toc.)
The point is, learn to use the profiler in MATLAB. It can tell you where there are problems, but also tell you how to improve them. Here we needed to recognize the real problem was not in terms of an inherant inefficiency in RAND when called for large arrays of numbers, but in performing more additions than you needed to do.
4 个评论
John D'Errico
2020-5-9
编辑:John D'Errico
2020-5-9
I did not see it at first either, so it was not obvious. In fact, it was not until I had written almost all of my own answer until I noticed the subtle double addition in there. I've fixed the code for test3 so that if someone else sees this answer in the future they will not be confused.
更多回答(1 个)
Ameer Hamza
2020-5-6
编辑:Ameer Hamza
2020-5-9
It not unheard of and definitely not surprising. Thanks to JIT, for loops are not as slow as one might thinks they are. I guess that rand() might be creating some temporary arrays, since memory operations are expensive, which might explain the extra time. Or it might have some extra logic to maintain the uniformity of random numbers. However, the exact reason is difficult to pinpoint since the source code for rand() is not available. Read here for examples
Read about JIT here
2 个评论
John D'Errico
2020-5-9
编辑:John D'Errico
2020-5-9
+1. I've added an answer of my own only because the use of the profiler is an important piece of information, and to explain the use of timeit, and finally, to show that there is a third way to write this, that does in fact improve the time required below that of the loops.
Since my answer explains that really, the problem is not in the one large call to rand, but in the fact that there were multiple additions done when only one addition needed to be done for each array element. The looped version did only one add per element, but then so does my test3 code, and it was the most efficient version of all.
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Performance and Memory 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!