How to correct for CPU clock speed changes when doing code profiling / optimization
A while ago, I needed to optimize some code to gain speed. It was a relatively large collection of functions, and I could only gain a few percent of speed from any small part. This was OK, but the problem was that the measured timings were often highly variable and non-reproducible. One minute (using e.g. tic/toc) one version of an algorithm was faster than the other; the next minute it was the reverse. Of course, I always make sure to run the code multiple times (often up to several minutes of execution time) for each tic/toc pair, with the aim of getting consistent results.
Then I discovered that my CPU (Intel Core i9, MacBook 2019, macOS) was all over the place with its clock speed. As soon as I ran the tests, the clock speed first went into Turbo Boost mode at around 3.8 GHz; then, as the CPU temperature increased, the clock speed decreased, sometimes all the way down to 1.7 GHz (I guess via SpeedStep?). Have a look at this example:
I searched extensively for a solution but did not come across one. So I came up with the functions below to correct for CPU clock speed variation during testing.
The method is far from perfect, so my questions are: do other people have the problem described above, and how do you correct for it? Also, do you find the method below viable, or do you think the results are misleading? Any suggestions for improvement?
PS: I monitored my clock speed using Turbo Boost Switcher. The program also allows you to disable Turbo Boost mode, which partly ameliorates the above problem but does not eliminate it.
PPS: The calls to "system" below work on macOS Ventura. Something similar could be done for Windows. Before running the functions below, you need to issue some command to the terminal from MATLAB using "sudo" (it does not matter which command), just to invoke the login prompt and enter your password; otherwise the functions will just hang.
ST = MyTic;
%code to be tested for speed here
R = MyToc(ST);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function R = MyTic
    warning("Run system('sudo ping') first. This is just to invoke the login prompt.")
    % powermetrics command taken from the GitHub code of Turbo Boost Switcher
    [~, result] = system('sudo powermetrics -n 1 -i 10 | grep "System Average"');
    R.CPUFreqStart = str2double(result(end-12:end-6)); % in MHz
    R.CPUFracStart = str2double(result(end-21:end-16))/100; % as a fraction of nominal frequency
    R.WallTimeStart = tic;
    R.CPUTimeStart = cputime; % note this is adjusted for number of threads (but not number of processes) when running parpool
end
function R = MyToc(StartVals)
    % Stop both timers first so that the powermetrics call below is not counted in the measured times
    R.WallTime = toc(StartVals.WallTimeStart);
    R.CPUTime = cputime - StartVals.CPUTimeStart; % note this is adjusted for number of threads (but not number of processes) when running parpool
    % powermetrics command taken from the GitHub code of Turbo Boost Switcher
    [~, result] = system('sudo powermetrics -n 1 -i 10 | grep "System Average"');
    CPUFreqNow = str2double(result(end-12:end-6)); % in MHz
    CPUFracNow = str2double(result(end-21:end-16)); % in percent of nominal frequency
    R.CurrentCPUFrac = CPUFracNow/100; % as a fraction of nominal frequency
    R.meanCPUFrac = (StartVals.CPUFracStart + R.CurrentCPUFrac)/2;
    R.meanCPUFreq = (StartVals.CPUFreqStart + CPUFreqNow)/2;
    R.normWallTime = R.WallTime*R.meanCPUFrac;
    R.normCPUTime = R.CPUTime*R.meanCPUFrac;
end
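The fixed character offsets used above (end-12:end-6 and end-21:end-16) only line up when both numbers have the expected number of digits. A regular expression may be more forgiving. The sketch below is only an illustration: the sample line is an assumed format inferred from those offsets, not output captured from a real powermetrics run, and the variable names are hypothetical.

```matlab
% Sample line in the format ASSUMED from the fixed offsets above
% (not captured from a real powermetrics run).
sampleLine = sprintf('System Average frequency as fraction of nominal: 62.03%% (1426.64 Mhz)\n');

% Capture the percentage and the MHz value regardless of how many digits each has.
tokens = regexp(sampleLine, '([\d.]+)%\s*\(([\d.]+)\s*MHz\)', 'tokens', 'once', 'ignorecase');

if isempty(tokens)
    error('Could not parse powermetrics output: %s', sampleLine);
end

cpuFrac = str2double(tokens{1}) / 100;   % fraction of nominal frequency
cpuFreq = str2double(tokens{2});         % MHz

fprintf('Fraction of nominal: %.4f, frequency: %.2f MHz\n', cpuFrac, cpuFreq);
```

If the format assumption holds, the same pattern could be applied to the "result" string returned by system in MyTic and MyToc. The explicit isempty check makes a parsing failure visible instead of silently producing NaN.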
2 Comments
Dyuman Joshi
2023-12-11
Output from single runs of tic-toc will vary drastically. If you want to use tic-toc, run your function multiple times and then get the average time by dividing the total time by the number of iterations.
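A minimal sketch of that idea, with a single tic/toc pair around the whole loop. The workload handle and the iteration count are placeholders chosen for illustration, not part of the original code.

```matlab
% Hypothetical workload standing in for the code under test.
workload = @() sort(rand(1e5, 1));

numIterations = 50;          % arbitrary; raise it until the total run time is stable

workload();                  % one untimed warm-up call
timerStart = tic;
for k = 1:numIterations
    workload();
end
totalTime = toc(timerStart);

% Average time per call = total elapsed time / number of iterations.
averageTime = totalTime / numIterations;
fprintf('Average time per call: %.6f s over %d calls\n', averageTime, numIterations);
```

Timing the whole loop once keeps the tic/toc overhead out of the individual calls, at the cost of losing the per-run spread.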
Accepted Answer
Abhishek Kumar Singh
2023-12-18
Edited: Abhishek Kumar Singh
2023-12-28
Hi Rozh,
Correcting for CPU clock speed changes when profiling and optimizing code is inherently challenging, both in theory and in practice. That said, the approach you took to account for CPU performance fluctuations is commendable.
Modern processors use out-of-order execution to keep their pipelines busy, so instructions are not executed strictly in sequence. It is also difficult to control power management and thermal throttling from the application level, which results in non-linear performance scaling.
There are also practical challenges: the 'system' calls add overhead, which can distort the timing of the code being profiled.
I would suggest the following approaches in this scenario:
- Longer-running code: Design your benchmarks to run for a longer period. This reduces the relative impact of transient CPU state changes.
- Profiler Usage: Utilize MATLAB's built-in profiler to identify bottlenecks and optimize the code. Read more about the profiler at: Profile Your Code to Improve Performance - MATLAB & Simulink - MathWorks
- Control CPU State: If possible, disable dynamic frequency scaling features like Turbo Boost during benchmarking to achieve a consistent CPU clock speed. This can often be done through system BIOS settings or with third-party tools.
- As suggested by Dyuman, consider multiple runs and averaging the results:
num_runs = 100;
timings = zeros(1, num_runs);
for i = 1:num_runs
    tic;
    your_function(); % Replace 'your_function' with the function you want to time
    timings(i) = toc;
end
average_time = mean(timings);
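As a possible variation on the loop above, MATLAB's built-in timeit function handles the repetition itself, and the median or minimum of the per-run timings is usually less sensitive to occasional slow runs than the mean. This is a sketch under assumptions: the workload handle is a placeholder for your_function, and the number of runs is arbitrary.

```matlab
% Hypothetical workload standing in for your_function.
workload = @() sort(rand(1e5, 1));

% Option 1: timeit calls the handle several times and returns a typical time in seconds.
tTimeit = timeit(workload);

% Option 2: keep the per-run timings and compare summary statistics.
numRuns = 30;
timings = zeros(1, numRuns);
for k = 1:numRuns
    t0 = tic;
    workload();
    timings(k) = toc(t0);
end

fprintf('timeit: %.6f s\n', tTimeit);
fprintf('mean:   %.6f s, median: %.6f s, min: %.6f s\n', ...
    mean(timings), median(timings), min(timings));
```

A large gap between the mean and the median hints that a few runs were disturbed, for example by a clock speed change partway through the benchmark.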
Your solution should largely work, and by combining it with the suggestions above you can further improve the accuracy and reliability of your optimization efforts.
I hope it helps!