How can I get the computational cost (FLOPs) of a deep network? The Analyze Network app does not seem to report this metric.

Question

xingxingcui 2019-8-9


https://ww2.mathworks.cn/matlabcentral/answers/475571-how-to-get-the-calculation-amount-of-deep-network-flops-analyze-network-app-does-not-seem-to-count

Commented: xingxingcui 2021-9-24

In MATLAB's analyzeNetwork app, a typical CNN model shows the number of parameters and the size of each feature map, but not the FLOPs. Is there a way to get them?

1 comment

xingxingcui 2021-9-24

Still not supported in version R2021b!


Answer 1

Walter Roberson 2021-9-24


This is quite unlikely to happen in the near future, if ever.

The translation of CUDA calls into machine instructions depends on the optimization level, the capability of the compiler, and the CUDA version. The translation of machine instructions into GFLOPS depends on the other instructions scheduled and on the exact GPU model, because even within one architecture NVIDIA releases models with different numbers of streaming multiprocessors (SMs) and very different implementations of double precision. The models with the highest double precision performance are often not the models with the highest single precision performance, and it is not uncommon for the previous architecture's best double precision model to outperform most models of the new architecture in double precision.
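To make the dependence on the exact model concrete, here is a rough sketch of the usual back-of-the-envelope peak estimate: number of SMs, times floating-point lanes per SM, times 2 operations per fused multiply-add, times clock rate. The two "models" and all of their numbers are invented for illustration and do not describe any real GPU, and `peakGflops` is a hypothetical helper, not a MATLAB function.

```matlab
% Peak throughput estimate in GFLOPS (clock given in GHz).
% An FMA is counted as two floating-point operations.
peakGflops = @(numSM, lanesPerSM, clockGHz) numSM * lanesPerSM * 2 * clockGHz;

% Hypothetical model A: many SMs, double precision at half the single rate.
fp32A = peakGflops(80, 64, 1.5);
fp64A = peakGflops(80, 32, 1.5);

% Hypothetical model B: same architecture, half the SMs,
% double precision at 1/32 of the single precision rate.
fp32B = peakGflops(40, 64, 1.5);
fp64B = peakGflops(40, 2, 1.5);

fprintf('Model A: %g GFLOPS single, %g GFLOPS double\n', fp32A, fp64A);
fprintf('Model B: %g GFLOPS single, %g GFLOPS double\n', fp32B, fp64B);
```

In this made-up case, halving the SM count halves the single precision figure, while the double precision figure is set almost entirely by how many double precision lanes each SM has. Real achieved rates are lower still and depend on the instruction mix.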

3 comments (1 older comment not shown)

Walter Roberson 2021-9-24

If MATLAB cannot predict GFLOPS, can it at least measure them? That clearly depends on what tools NVIDIA provides.

NVIDIA provides counters for several different classes of instructions. It also provides a performance graph built by assigning a weight to each class of instructions. The person running the tool can configure the weights.

But the default weights do not correspond to any actual GPU model. And all the instructions in the same class are given the same weight, even though different instructions may have different graduation rates. That is, some instructions are limited in how many can execute simultaneously, at rates much lower than their clock cycles per instruction would suggest. The handling of square root and reciprocal square root is especially odd, because of the extra work needed to handle 0 and infinity according to the IEEE 754 standard.

So you cannot convert between the counters and GFLOPS without knowing which instructions were actually executed, because members of the same class can have quite different performance.
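A minimal sketch of the problem, using invented counter values and invented weights (none of these numbers come from an NVIDIA tool, and the class names are only labels): the same counters give different GFLOPS figures depending on the weight chosen for each instruction class.

```matlab
% Hypothetical instruction counts for one run, by class:
%          add   mul   fma   special (sqrt, rsqrt, ...)
counts  = [2e9   1e9   4e9   5e8];
elapsed = 0.5;                       % seconds, also hypothetical

% Two weightings (floating-point operations charged per instruction).
weightsA = [1 1 2 1];                % special functions charged as 1 operation
weightsB = [1 1 2 8];                % special functions charged as 8 operations

gflopsA = sum(counts .* weightsA) / elapsed / 1e9;
gflopsB = sum(counts .* weightsB) / elapsed / 1e9;

fprintf('Weighting A: %.1f GFLOPS\n', gflopsA);
fprintf('Weighting B: %.1f GFLOPS\n', gflopsB);
```

Neither result is "the" GFLOPS of the run. Both are consequences of the chosen weights, which is why a single class-level weight cannot stand in for the actual instruction mix.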

The architecture of NVIDIA's 3000 series has some interesting changes for integer work that have to be taken into consideration when measuring GFLOPS.

Remember, though, that GFLOPS measures FLOATING point operations per second, but models might be implemented with integer arithmetic. If a model is mostly integer, should the GFLOPS measure be near zero, since few floating point operations were done?

xingxingcui 2021-9-24

Thanks for the detailed analysis!


Answer 2

Shuyue JIA 2021-7-5


Edited: Shuyue JIA 2021-7-5

Have you found a solution?

1 comment

xingxingcui 2021-7-7

I did a simple experiment, as follows.


Answer 3

xingxingcui 2021-7-7


Edited: xingxingcui 2021-7-30

%% MATLAB R2021a
net50 = resnet50;
h = 224;
w = 224;
layer = 'fc1000';
dev = gpuDevice(1);   % select the GPU before creating any gpuArray

%% evaluate
% Warm-up call so that one-time initialization is not included in the timings.
X = gpuArray(rand(h,w,3));
features = activations(net50,X,layer);

ElapseTime = [];
availableMem = [];
sizeInput = [];
for i = 1:100
    scalar = i;   % scale factor applied to the input height and width
    % X = dlarray(X);
    try   % stop at the first failure, typically "Out of memory on device."
        X = gpuArray(rand(h*scalar,w*scalar,3));
        t1 = tic;
        features = activations(net50,X,layer,...
            'Acceleration','none',...
            'ExecutionEnvironment','gpu');
        wait(dev);   % make sure all queued GPU work has finished before timing
        t = toc(t1);
        [H,W,~] = size(X);
        ElapseTime(i) = t;
        availableMem(i) = dev.AvailableMemory/(1024^2);   % bytes -> MB
        sizeInput(i) = H;
        fprintf('input size: (%i*%i), elapsed: %.2f s, available GPU memory: %g MB\n',...
            H,W,ElapseTime(i),availableMem(i));
    catch
        break
    end
end

%% plot
figure('Color','white');
yyaxis left;
plot(sizeInput,ElapseTime,'-o','LineWidth',2);
xlabel('input image size')
ylabel('Elapsed time (s)')
yyaxis right;
plot(sizeInput,availableMem,'-o','LineWidth',2);
ylabel('Available memory (MB)')
grid on;
title('Indirect evaluation of deep network computational cost and number of parameters')

Elapsed time and available GPU memory serve as indirect proxies for FLOPs and the number of parameters, respectively.

The answer can be read indirectly from the resulting plot.

Run in MATLAB R2021a on Windows 10.
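For comparison, a hardware-independent count can be estimated directly from layer shapes. The sketch below relies on a common convention that is an assumption here, not something MATLAB reports: one multiply-accumulate counts as 2 FLOPs, and bias, batch normalization, and activation functions are ignored. `convFlops` and `fcFlops` are hypothetical helper names. The example shapes are those of the first convolution and the final fully connected layer of ResNet-50 with a 224*224*3 input.

```matlab
% FLOPs of a 2-D convolution: every output element needs kH*kW*cIn
% multiply-accumulates, and each multiply-accumulate is counted as 2 FLOPs.
convFlops = @(hOut, wOut, kH, kW, cIn, cOut) 2 * hOut * wOut * kH * kW * cIn * cOut;

% FLOPs of a fully connected layer with nIn inputs and nOut outputs.
fcFlops = @(nIn, nOut) 2 * nIn * nOut;

% First convolution of ResNet-50: 7x7 kernel, 3 -> 64 channels, 112x112 output.
conv1 = convFlops(112, 112, 7, 7, 3, 64);

% Final fully connected layer of ResNet-50 (fc1000): 2048 -> 1000.
fc = fcFlops(2048, 1000);

fprintf('conv1:  %.3f GFLOPs\n', conv1 / 1e9);
fprintf('fc1000: %.4f GFLOPs\n', fc / 1e9);
```

Summing such terms over all layers gives a total that does not depend on the GPU, unlike the timing plot, which mixes the operation count with the achieved throughput of the particular device.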
