Hi,

I created an image processing function and converted it to a .mex file to speed up my processing. Since I have an NVIDIA GPU with CUDA compute capability 6.1, I also generated a .mex file with GPU Coder. But why are the outputs of the two .mex files not exactly the same? I passed the same input image and the same other parameters, but the output image from the GPU Coder mex is slightly blurrier. Compare the images below carefully.

[Image: CPU mex output vs. GPU mex output]

Can somebody explain why?

I used the commands below to create the GPU Coder mex:

cfg = coder.gpuConfig('mex');
cfg.GpuConfig.CompilerFlags = '--fmad=false';
cfg.GpuConfig.ComputeCapability = '6.1';
codegen -args {inputImage, otherParameters} -config cfg imProcessFunction

For a more detailed analysis, I subtracted the two images above and got the difference image below:

difference = CPUimage-GPUimage;

[Image: difference between CPU and GPU outputs]

Why using .mex created by MATLAB Coder and GPU Coder doesn't give e...

Walter Roberson 2018-8-24

How does the output look when you run the same computation on the GPU interactively rather than through a mex file? In other words, we need to isolate whether this is due to using the GPU at all, or due to mexing the GPU code.

Differences in gpu results are generally expected due to different order of operations.

dpb 2018-8-24

NVIDIA GPU hardware floating point isn't necessarily identical to x86, is it? There are also the rounding modes of the Intel coprocessor; I don't have any idea what TMW uses, but I don't think it's at all surprising that the two aren't identical.

There may be things you can do with compiler switches to make the results more nearly alike, but I don't think there's any guarantee they can be made the same. Transcendentals can be a factor if there are any of those in the algorithm.

Walter Roberson 2018-8-24

Edited: dpb 2018-8-24

https://docs.nvidia.com/cuda/floating-point/index.html

JAI PRAKASH 2018-8-24

@Walter I didn't understand your first comment. I mean, what can I do?

But let me go through the link in your second comment.

dpb 2018-8-24

Edited: dpb 2018-8-24

My quick search didn't uncover the link Walter shows, and while a detailed read will provide a lot of info, I very quickly found two points...

"4.5. Differences from x86 NVIDIA GPUs differ from the x86 architecture in that rounding modes are encoded within each floating point instruction instead of dynamically using a floating point control word. Trap handlers for floating point exceptions are not supported. On the GPU there is no status flag to indicate when calculations have overflowed, underflowed, or have involved inexact arithmetic."

Simply put, different compilers can use different optimization levels or build a different execution chain for the same calculation, so the order of operations isn't necessarily the same. Floating-point arithmetic isn't exactly associative, so those effects can show up in the results.
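To make the order-of-operations point concrete, here is a minimal sketch in single precision. The data type is an assumption (the thread does not say which type the image function uses), and the three values are chosen purely for illustration. The same numbers are summed with two groupings and give two different answers.

```matlab
% Three single-precision values chosen only for illustration.
a = single(1e8);
b = single(-1e8);
c = single(1);

% Near 1e8 the spacing between adjacent singles is 8,
% so b + c rounds straight back to b.
leftToRight = (a + b) + c;   % (1e8 - 1e8) + 1
rightToLeft = a + (b + c);   % 1e8 + (-1e8 + 1)

fprintf('(a + b) + c = %g\n', leftToRight);
fprintf('a + (b + c) = %g\n', rightToLeft);
fprintf('spacing near 1e8 in single: %g\n', eps(a));
```

The first grouping gives 1 and the second gives 0. A parallel reduction on a GPU typically groups partial sums differently from a sequential CPU loop, which is one way differences of this kind can arise in practice.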

"5.1. Mathematical Function Accuracy ... The consequence is that different math libraries cannot be expected to compute exactly the same result for a given input. This applies to GPU programming as well. Functions compiled for the GPU will use the NVIDIA CUDA math library implementation while functions compiled for the CPU will use the host compiler math library implementation (e.g., glibc on Linux). Because these implementations are independent and neither is guaranteed to be correctly rounded, the results will often differ slightly."

As noted, you may be able to do something with the compiler to try to make rounding more nearly the same (providing you can determine what mode TMW is using) but there's probably nothing you can do about any differences in the libraries.
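The `--fmad=false` flag in the question is one such compiler switch: it tells nvcc not to contract a multiply followed by an add into a single fused multiply-add (FMA), which rounds once instead of twice. The sketch below shows why that contraction can change a result. As far as I know base MATLAB has no single-precision FMA function, so the fused result is emulated by doing the arithmetic in double and rounding to single at the end. That emulation is an assumption of this sketch; it matches a true FMA for these particular values, not necessarily for all inputs.

```matlab
% Values chosen so that the rounding of the product is visible.
a = single(1 + 2^-12);
c = -single(1 + 2^-11);

% Unfused: the product is rounded to single, then the sum is rounded again.
p = a * a;
unfused = p + c;

% Emulated fused multiply-add: exact product and sum in double,
% followed by one final rounding to single.
fused = single(double(a) * double(a) + double(c));

fprintf('unfused a*a + c = %g\n', unfused);
fprintf('emulated FMA    = %g\n', fused);
```

The unfused result is 0, while the emulated FMA keeps the low-order term 2^-24. With `--fmad=false` the GPU code should follow the unfused behavior; whether the host compiler used for the CPU mex emits FMA instructions depends on its own flags, which the thread does not state.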

What Walter was suggesting in his first comment was to try to duplicate the GPU code interactively instead of compiled, but I don't think that is likely to help, because the GPU instructions have to run in that environment either way, so whatever is different is, well, "just different".

You could look at pieces of the algorithm perhaps and try to isolate particular calculations and maybe eventually isolate which part is the culprit but I would not hold out much hope for "fixing" it.
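One way to start isolating the culprit is to quantify the difference rather than judge it by eye. The sketch below uses small synthetic arrays as stand-ins for the two mex outputs; the names `cpuOut` and `gpuOut` and the data are hypothetical, not the asker's images. It converts to double before subtracting because, if the images are an unsigned integer type such as uint8 (an assumption, the thread does not say), a direct `CPUimage-GPUimage` saturates at 0 and hides every pixel where the GPU value is larger.

```matlab
% Hypothetical stand-ins for the outputs of the CPU and GPU mex files.
cpuOut = uint8([10 200; 50 120]);
gpuOut = uint8([12 199; 50 121]);

% Integer subtraction saturates: negative differences are clipped to 0.
naiveDiff = cpuOut - gpuOut;

% A signed difference computed in double keeps both directions.
signedDiff = double(cpuOut) - double(gpuOut);

maxAbsDiff  = max(abs(signedDiff(:)));
numMismatch = nnz(signedDiff);

disp(naiveDiff);
disp(signedDiff);
fprintf('max |difference| = %g, differing pixels = %d of %d\n', ...
    maxAbsDiff, numMismatch, numel(signedDiff));
```

Here the naive difference reports only one differing pixel, while the signed difference shows three. Running the same comparison on intermediate results, stage by stage, would show where the two versions first diverge.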

JAI PRAKASH 2018-8-24

Edited: JAI PRAKASH 2018-8-25

Thanks @dpb, for your consideration and crafted summary.

What can I do with the compiler, and how?

And what is TMW?

dpb 2018-8-25

Edited: dpb 2018-8-25

TMW --> The MathWorks, publishers of MATLAB

As for the compilers, you'd have to study their documentation to see what options they have; I have neither product, so I have no knowledge of either. And they're so paranoid that they won't even let me read the documentation without a license, so I can't go looking to see what I can find, sorry... :(

JAI PRAKASH 2018-8-25

@dpb

Which documentation exactly are you referring to?

Maybe I can attach a copy here.

Walter Roberson 2018-8-25

https://www.mathworks.com/matlabcentral/answers/1032-why-do-some-calculations-like-the-fft-produce-different-results-when-performed-on-a-gpu#answer_1518

https://www.mathworks.com/matlabcentral/answers/389548-complex-numbers-in-ifft-fft-x#answer_311228

https://www.mathworks.com/matlabcentral/answers/408536-default-floating-point-precision-with-mexcuda#answer_327366

Walter Roberson 2018-8-25

Documentation:

https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html

JAI PRAKASH 2018-8-25

Thanks @Walter Roberson

The links above were helpful.

Why using .mex created by MATLAB Coder and GPU Coder doesn't give exactly same results?
