How to determine the arithmetic operations in a code?

Question

Tony Cheng 2022-4-17

Edited: Bruno Luong 2022-4-18

Accepted answer: John D'Errico

Hi there,

I want to know: are there any commands in MATLAB that report the number of arithmetic operations in a piece of code?

Many thanks!

Answer 1

John D'Errico 2022-4-17

Edited: John D'Errico 2022-4-17

Sorry, but not easily. Flop counts were removed from MATLAB over 20 years ago.

https://www.mathworks.com/matlabcentral/answers/812750-how-to-use-flops-in-r2021a

Unfortunately, the flops counting tool was from the dark ages, when computers were far simpler things.

Can you do an approximate flop count? Possibly. Many operations have a known theoretical complexity. Whether the tools linked above still work, I cannot tell.
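As a rough illustration of an approximate count, here is a minimal sketch that uses the usual textbook operation counts for a few dense operations. The variable names and the size n = 500 are made up for this example, and none of this is a MATLAB built-in. The formulas count only multiplications and additions and ignore everything the hardware or the underlying libraries actually do.

```matlab
% Textbook flop estimates for a few dense operations (not a MATLAB built-in).
n = 500;                               % assumed problem size, for illustration only
flopsDot    = 2*n - 1;                 % n multiplications + (n-1) additions
flopsMatVec = n * (2*n - 1);           % one length-n dot product per output row
flopsMatMul = n^2 * (2*n - 1);         % one length-n dot product per output entry
flopsLU     = (2/3) * n^3;             % leading-order term only

% Cross-check the dot-product formula by counting operations in an explicit loop.
x = rand(n, 1);
y = rand(n, 1);
acc = x(1) * y(1);
count = 1;                             % one multiplication so far
for k = 2:n
    acc = acc + x(k) * y(k);
    count = count + 2;                 % one multiplication and one addition
end
assert(count == flopsDot);
fprintf('dot: %d, matvec: %d, matmul: %d, LU: about %.3g\n', ...
    flopsDot, flopsMatVec, flopsMatMul, flopsLU);
```

Counting by hand like this only works for code whose inner operations you can enumerate. For calls into built-in functions, the theoretical formula is usually the best you can do.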

3 comments (1 earlier comment not shown)

Walter Roberson 2022-4-18

@Tony Cheng

Until about 1994, CPUs executed all instructions in order: either they finished one instruction before starting the next, or at the very least they held off parts of the next instruction until it was known for sure that it did not rely on the previous instruction's results (and the results of the next instruction were not finalized out of microcode until the previous instruction had been finalized).

This is not the same topic as multiprocessing: multiprocessing has completely independent CPUs running at the same time, together with protocols for resolving which CPU had control of resources (such as memory addresses).

Around 1994, SGI introduced the MIPS R8000 chip (https://en.wikipedia.org/wiki/R8000), which was interesting because it had multiple logic units within the same core, with internal pipelines and the ability to calculate multiple items at the same time, along with "speculative execution".

Speculative execution is a technique where a CPU sees that there is a conditional branch, and that the next instruction in both cases does not require outputs of the comparison, and the processor goes ahead and calculates both results at the same time while it waits for the comparison to finish and for the branch logic to decide which branch to take -- and once the branch logic decides, the speculative execution throws away the results of the branch not taken.

Consider, for example, code that looked something like

if task_code == 1783
   supervisor_overrides = supervisor_overrides + 1;
else
   non_overridden_total = non_overridden_total + this_cost;
end

While the branch logic is busy calculating whether task_code == 1783, the CPU goes ahead and starts calculating both supervisor_overrides + 1 and non_overridden_total + this_cost, and finalizes only one of the two according to what the branch logic eventually says.

In a case like this, where a calculation could be happening at the same time as another calculation, how do you calculate the floating point operation rate? Do you calculate only the graduated operations? Do you count the speculative ones that ended up getting discarded?

The R8000 had multiple integer logic units and multiple floating point logic units. Suppose the instruction sequence is such that the CPU can look ahead and start an integer instruction and a double precision instruction at the same time, then start a second simultaneous double precision instruction when the integer instruction completes, and then start four simultaneous integer instructions when the first double precision instruction finishes... how do you count flops for that? How do you count SIMD (Single Instruction Multiple Data)?

With the R8000, the internal pipelines could have on the order of 8 instructions in the queue that the results had been calculated for but which had not yet been finalized ("graduated"). One of the slowest paths was a double precision division. The code might look like

P = A ./ B;
T1 = T1 + 1;
T2 = T1 * 4;
T3 = C(T2);

A human looking at that might say "T1, T2, T3 depend on each other, but they do not depend upon the division, so you should be able to re-arrange those so that you start the division first, but the effect would be as-if the division had been written last in the code." The problem with that is that if B happens to be 0, then the processor might be configured to interrupt on division by 0, and if you interrupt, then the T1, T2, T3 results must not have been finalized! So the processor might go ahead and calculate those values but not flush the results out of microcode into registers until the division operator indicates that everything is fine. You cannot consider the results of the T calculations to be finished until the division completes, but as soon as the division does complete, before the next bus cycle even starts, the floating point operations can be considered to have completed. There is a sense in which the several floating point operations completed in less than one internal bus cycle. How do you count the flop rate for that?

Bruno Luong 2022-4-18

Edited: Bruno Luong 2022-4-18

Flop rate is flop rate: it is just a measure that quantifies the efficiency of an algorithm, at least in terms of the energy needed to carry out a task (and thus the CO2 footprint, to be in line with humanity's current preoccupation). People still speak about O() notation, don't they? It is still a very fundamental and useful metric.

Whether it is directly proportional to execution time on a parallel architecture is another question entirely.
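To make that last point concrete, here is a minimal sketch that compares how the textbook flop count and the measured run time of a dense matrix product grow when the size doubles. The sizes are arbitrary assumptions, and the timing numbers depend entirely on the machine, the BLAS library and the number of threads, so the two ratios need not agree.

```matlab
% Sketch: compare how the textbook flop count and the measured time scale
% when the matrix size doubles. Sizes are arbitrary assumptions.
sizes = [256 512 1024];
flops = 2*sizes.^3 - sizes.^2;          % textbook count for an n-by-n product
times = zeros(size(sizes));
for k = 1:numel(sizes)
    n = sizes(k);
    A = rand(n);
    B = rand(n);
    times(k) = timeit(@() A*B);         % typical wall-clock time of the product
end
flopRatio = flops(2:end) ./ flops(1:end-1);   % close to 8 for each doubling
timeRatio = times(2:end) ./ times(1:end-1);   % hardware dependent
disp(table(sizes(2:end).', flopRatio.', timeRatio.', ...
    'VariableNames', {'n', 'flopRatio', 'timeRatio'}));
```

If the time ratio stays near 8, the flop count is a fair predictor of run time at these sizes on that machine. If it drifts away, memory traffic, caching or threading is dominating.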
