How to speed up our code to be implemented on GPU

Question

moh mor 2024-7-2

Direct link to this question

https://ww2.mathworks.cn/matlabcentral/answers/2133936-how-to-speed-up-our-code-to-be-implemented-on-gpu

Commented: Chao Luo 2024-7-10

Hello, I previously generated a MEX file from my code to speed up its execution on the GPU. It became about 5 times faster, but I would like to know whether there is a way to make it run even faster. Here is my code:

function BPmimo2C(Efield) %#codegen
coder.gpu.kernelfun;
image = complex(zeros(17,54,54));
%% creating kaiser window
numT = 16;
numR= 16;
f = 10e9:0.5e9:20e9;
numF = numel(f);
w = ones(numel(f),1);
viq = repmat(w.', [1,numT*numR]);
c = physconst('LightSpeed');
%% grid points
xf = (-8:0.3:8)*0.01;
yf = (-8:0.3:8)*0.01;
[uf , vf] = meshgrid(xf,yf);
x1f = uf(:);
y1f = vf(:);
%% initialization 
ArrRadius = 30;
TX = [ArrRadius.*cosd((360/15)*(0:14))*0.01 0];
TY = [ArrRadius.*sind((360/15)*(0:14))*0.01 0];
K = 2*pi*f/c;
z = 0.36:0.003:0.41;
% z = 0.4;
for dep = 1:numel(z)
    
    %% making the matrix of <transmitter-grid point> distance
    XYPos = [TX.' TY.' ones(size(TX,2),1)*(z(dep))];
    UVPos = [x1f(:), y1f(:), zeros(size(y1f(:),1),1)];
    dtXYUV = pdist2( XYPos, UVPos);
    dtXYUV2 = zeros(numR,numel(x1f(:)));
    expTerm1 = bsxfun(@times,dtXYUV(:)' , K');
    expT1 = reshape(expTerm1,[numel(K),numel(TX),numel(x1f)]);
    expT2 = zeros(numel(K),numR,numel(x1f),numel(TX));
    for i = 1:numel(TX)
        expT2(:,:,:,i) = repmat(expT1(:,i,:),[1 numR 1]);
        dtXYUV2(:,:,i) = repmat(dtXYUV(i,:),[numR,1]);
        
    end
    expT = permute(reshape(permute(expT2,[1 3 2 4]),[numel(K),numel(x1f),numR*numel(TX)]),[1 3 2]);
     
    
    %% making the matrix of <receiver-grid point> distance
    XYPos = [real(Efield(1:numR,2,1)) , real(Efield(1:numR,3,1)), ones(numR,1)*(z(dep))];
    UVPos = [x1f(:), y1f(:), zeros(size(y1f(:),1),1)];
    dXYUV = pdist2( XYPos, UVPos);
    expTerm1 = bsxfun(@times,dXYUV(:)' , K');
    expR = repmat(reshape(expTerm1,[numel(K),numR,numel(x1f)]),[1 numel(TX) 1]);
    
    
    %% making the exponential term
    EXP = exp(1i*(expT + expR));
    EXP2 = reshape(EXP,[numel(K)*numel(TX)*numR,numel(x1f)]);
    Efield2 = reshape(permute(Efield(1:numT*numR,:,:),[3 1 2]),[numel(f)*numT*numR,6]);
    image2 = reshape(((viq.').*Efield2(:,6)).'*EXP2,[sqrt(numel(x1f)),sqrt(numel(x1f))]);
    %% gather to change the matrix from a GPU array to a normal array
    image(dep,:,:) = image2;
   
end
image = abs(image);
uf = repmat(reshape(uf,[1,numel(xf),numel(yf)]),[numel(z) 1 1]);
vf = repmat(reshape(vf,[1,numel(xf),numel(yf)]),[numel(z) 1 1]);
hf = uf;
for j = 1:numel(z)
    hf(j,:,:) = z(j);
end
figure(1);
er = squeeze((image(13,:,:)));
h = surf(squeeze(uf(1,:,:)),squeeze(vf(1,:,:)),er);
colormap(jet);
set(h,'LineStyle','none');
view(2);
end

In addition to the speed issue, the code sometimes fails with an "out of memory" error, which is caused by the huge size of some arrays. I could implement it with multiple nested "for" loops; however, I understood that on the CPU it is faster to use MATLAB's matrix multiplication capability, so I preferred matrix-based code over nested "for" loops.

Any advice, whether general or specific, would be appreciated.

Thank you

2 Comments

Joss Knight 2024-7-8

Can I just check that you are aware that you do not need to use Code Generation to accelerate your code on GPU? You only need to adapt your code to use gpuArray data. GPU Coder can be useful for converting code that must be written as a loop; but if you can vectorize your loops and make them matrix, vector or pagewise operations instead, you could get better performance without needing to use coder intrinsics or configure a compiler.
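A minimal sketch of the gpuArray approach described in this comment, assuming Parallel Computing Toolbox and a supported GPU are available (the sketch falls back to CPU arrays otherwise). The sizes and the random data are hypothetical stand-ins, not the measured Efield from the question:

```matlab
% Assumption: Parallel Computing Toolbox with a supported GPU; if none is
% found, the same code simply runs on ordinary CPU arrays.
useGPU = exist('gpuDeviceCount') > 0 && gpuDeviceCount > 0;

% Hypothetical stand-in data (random numbers, not the measured field).
numRows = 5376;  numPix = 2916;
weights = complex(rand(1, numRows, 'single'), rand(1, numRows, 'single'));
phase   = rand(numRows, numPix, 'single');

if useGPU
    weights = gpuArray(weights);   % copy the inputs to device memory once
    phase   = gpuArray(phase);
end

% The same vectorized code runs on the GPU when its inputs are gpuArrays.
img = abs(weights * exp(1i * phase));   % 1 x numPix row vector

if useGPU
    img = gather(img);             % copy the result back to host memory
end
img = reshape(img, 54, 54);
```

No code generation or compiler setup is involved here; whether this outperforms a generated MEX file depends on the hardware and the problem size, so it is worth timing both.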

moh mor 2024-7-9

Thank you @Joss Knight ,

Actually, I implemented this computation using a friend's Python code. I first had to import the "dll" file of my function and then install CUDA and gcc on my computer. Its speed was much better than mine. Furthermore, it did not have any "out of memory" problem, whereas I cannot increase the size of my arrays as much as I want. I am trying to overcome this problem in my code. Previously I implemented my arrays using gpuArray and saw the speed of my function increase, but I think it is not enough.

Answer 1

Umar 2024-7-3

Hi Moh,

Please see my suggestions below. I analyzed your code to identify potential bottlenecks and areas for optimization.

1. Your code initializes a complex array, image, with dimensions 17x54x54 to store the results of the calculations.
2. A window w is created (the comment calls it a Kaiser window, although w is set to all ones).
3. The grid points xf and yf are defined using a range and a step size.
4. Variables and arrays are initialized for the later calculations.
5. A loop iterates over the values of z.
6. Within the loop, the code calculates the distances between the transmitters and the grid points (dtXYUV) and copies them into dtXYUV2.
7. It calculates the exponential term expT from these distances and the wavenumber K.
8. It calculates the distances between the receivers and the grid points (dXYUV) and stores the corresponding phase term in expR.
9. expT and expR are combined into the overall exponential term EXP.
10. The arrays are reshaped and rearranged so that the image can be obtained with a matrix multiplication.
11. The result is stored in the image array.
12. Steps 5-11 repeat for each value of z.
13. The final image is obtained by taking the absolute value of the image array.
14. The image is plotted with the surf function.

To optimize the code for speed, there are several suggestions to consider. First, preallocate arrays with their final dimensions (for example with zeros) rather than letting them grow during loop iterations, since resizing an array inside a loop slows the code down. In your code, dtXYUV2 is preallocated as a 2-D array but written to as dtXYUV2(:,:,i), so it grows inside the loop. Second, vectorize calculations wherever possible: MATLAB's matrix operations are usually more efficient than explicit loops. Third, look for redundant calculations or unnecessary operations that can be eliminated, such as quantities that do not depend on z but are recomputed in every iteration. Finally, if your system has multiple CPU cores, consider MATLAB's parallel computing capabilities to distribute the workload.
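As one illustration of the vectorization advice, the sketch below builds a phase array of the kind used in the question in two ways: with explicit repmat copies and with implicit expansion (available since R2016b). The sizes and the random path lengths are made-up stand-ins, not the original data:

```matlab
% Hypothetical sizes and random path lengths, chosen only for illustration.
numF = 21; numT = 16; numR = 16;
K  = 2*pi*linspace(10e9, 20e9, numF).' / 299792458;   % numF x 1 wavenumbers
dT = rand(1, numT);                                   % transmitter path lengths
dR = rand(1, 1, numR);                                % receiver path lengths

% Explicit copies: every operand is expanded to numF x numT x numR first.
phaseRepmat = repmat(K, [1 numT numR]) .* ...
    (repmat(dT, [numF 1 numR]) + repmat(dR, [numF numT 1]));

% Implicit expansion: singleton dimensions are expanded on the fly,
% so the three temporary copies above are never materialized.
phaseImplicit = K .* (dT + dR);

maxDiff = max(abs(phaseRepmat(:) - phaseImplicit(:)));
fprintf('largest difference: %g\n', maxDiff);
```

Both variants produce the same numF x numT x numR array; the second one avoids the intermediate copies, which matters most when the arrays are large.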

In terms of memory management, reducing array sizes where possible can help address "out of memory" errors. Adjusting step sizes or grid point ranges can help minimize memory usage and prevent these errors from occurring. Using data types with smaller memory footprints, such as single precision instead of double precision, can also help conserve memory. If memory limitations are still a concern, consider splitting calculations into smaller chunks and processing them sequentially to avoid exceeding available memory.
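The chunking idea can be sketched as follows, under the assumption that the image is a sum over frequencies and channels that can be evaluated independently for each pixel. The names (dist, signal, chunk) and the random data are hypothetical and only stand in for the real quantities:

```matlab
% Hypothetical stand-ins: random data in place of the measured field.
numK   = 21;  numCh = 256;  numPix = 2916;
K      = single(2*pi*linspace(10e9, 20e9, numK).' / 299792458);  % numK x 1
dist   = rand(numCh, numPix, 'single');      % path length per channel and pixel
signal = complex(rand(numK, numCh, 'single'), rand(numK, numCh, 'single'));

chunk = 500;                                 % pixels per block; tune to the available memory
img   = complex(zeros(1, numPix, 'single')); % preallocated result
for first = 1:chunk:numPix
    idx = first:min(first + chunk - 1, numPix);
    % Phase block of size numK x numCh x numel(idx); only this block exists in memory.
    phase = K .* reshape(dist(:, idx), [1, numCh, numel(idx)]);
    block = signal .* exp(1i * phase);
    % Sum over frequencies and channels to get one value per pixel.
    img(idx) = reshape(sum(sum(block, 1), 2), 1, []);
end
```

The peak memory is now set by chunk rather than by the total number of pixels, at the cost of a short loop. Everything stays in single precision, which halves the footprint compared with double.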

By implementing these optimizations and memory management techniques, you can improve both the speed and memory usage of your code significantly.

2 Comments

moh mor 2024-7-6

Hi @Umar,

I did what you suggested to increase the computation speed on my system. I also made a few minor changes to make my code dynamic. For example, I now define "start", "stop", and "step" values for the grid-point coordinates in the "test" function and pass them to the "BPmimo2C" function. As far as I could, I performed the computation in single precision. All functions are GPU compatible. The problem is that, for example, the computation time for a normal run was 38 s, while it was 26 s for the MEX file. I expected more from my implementation.

This is the test file:

clear all
load Efield
%% grid points
xf_str = -8;  % up to three decimal points
xf_end = 8; % up to three decimal points
yf_str = -8; % up to three decimal points
yf_end = 8; % up to three decimal points
xy_step = 0.2; % up to three decimal points
z_str = 0.398; % up to three decimal points
z_end = 0.401;% up to three decimal points
z_step = 0.001; % up to three decimal points
xyz_par = [xf_str, xf_end, xy_step, yf_str, yf_end, xy_step, z_str, z_end, z_step]*1000;
f_str = 10e9;
f_end = 20e9;
f_step = 0.2e9;
f_par = [f_str, f_end, f_step];
ArrRadius = 15;
TX = [ArrRadius.*cosd((360/15)*(0:14))*0.01 0];
TY = [ArrRadius.*sind((360/15)*(0:14))*0.01 0];
tic
BPmimo2C_mex( Efield, f_par, xyz_par, TX, TY, numT, numR)
toc

I don't know if this will help, but here are the warnings that appeared in MATLAB after compilation:

Thank you for your help.

Umar 2024-7-6

Moved: Walter Roberson 2024-7-8

Hi Moh Mor,

Have you considered reaching out to MathWorks support for further assistance? Provide them with detailed information about your system configuration, your MATLAB version, and the steps leading to the internal error.

Answer 2

Chao Luo 2024-7-3

The generated code is already quite well optimized for the GPU. I tried rewriting the code using explicit for-loops, which results in similar performance. On top of that, I converted the data type from double to single, which speeds up execution about 10 times. Do the conversion if single precision is good enough for you. Here is the code I rewrote, with the plotting part removed, for your reference:

function image = BPmimo2C4(Efield) %#codegen
    coder.gpu.kernelfun;
    %% creating kaiser window
    numT = 16;
    numR= 16;
    f = 10e9:0.5e9:20e9;
    numF = numel(f);
    w = ones(numel(f),1);
    viq = repmat(w.', [1,numT*numR]);
    c = physconst('LightSpeed');
    %% grid points
    xf = (-8:0.3:8)*0.01;
    yf = (-8:0.3:8)*0.01;
    [uf , vf] = meshgrid(xf,yf);
    x1f = uf(:);
    y1f = vf(:);
    %% initialization
    ArrRadius = 30;
    TX = [ArrRadius.*cosd((360/15)*(0:14))*0.01 0];
    TY = [ArrRadius.*sind((360/15)*(0:14))*0.01 0];
    K = 2*pi*f/c;
    z = 0.36:0.003:0.41;
    Efield2 = reshape(permute(Efield(1:numT*numR,:,:),[3 1 2]),[numel(f)*numT*numR,6]); % 5376x6
    Efield2_6 = single(Efield2(:,6).');
    % z = 0.4;
    XYPos1 = single([TX.', TY.']);
    UVPos = single([x1f(:), y1f(:)]);
    dtXYUV1 = pdist2(XYPos1, UVPos);
    XYPos2 = single([real(Efield(1:numR,2,1)) , real(Efield(1:numR,3,1))]);
    dtXYUV2 = pdist2(XYPos2, UVPos);
    EXP = coder.nullcopy(single((ones(21,16,16,17,2916) * 1i)));
    for f_idx = 1:numel(x1f)
        for dep = 1:17
            for r_idx = 1:numR
                for t_idx = 1:numel(TX)
                    for k_idx = 1:numel(K)
                        z2 = z(dep) * z(dep);
                        dt1 = dtXYUV1(r_idx,f_idx) * dtXYUV1(r_idx,f_idx) + z2;
                        dt1 = sqrt(dt1);
                        dt2 = dtXYUV2(t_idx,f_idx) * dtXYUV2(t_idx,f_idx) + z2;
                        dt2 = sqrt(dt2);
                        expV = exp((dt1 + dt2) * K(k_idx) * 1i);
                        EXP(k_idx, t_idx, r_idx, dep, f_idx) = expV;
                    end
                end
            end
        end
    end
    EXP_resh = reshape(EXP, [21*16*16, 17*2916]);
    image = Efield2_6 * EXP_resh;
    image = reshape(image, [17,54,54]);
end
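For a sense of scale, the following sketch estimates the memory that the EXP array in the code above would occupy. The only inputs are the dimensions passed to coder.nullcopy; the byte counts assume 4 bytes per single and 8 bytes per double, doubled for complex values:

```matlab
% Dimensions taken from the code above: frequencies, the two 16-element
% index dimensions, depths, and the pixels of the 54 x 54 grid.
dims = [21 16 16 17 2916];
numElements = prod(dims);                 % 266,499,072 elements
bytesSingle = numElements * 8;            % complex single: 2 x 4 bytes per element
bytesDouble = numElements * 16;           % complex double: 2 x 8 bytes per element
fprintf('complex single: %.2f GB\n', bytesSingle / 1e9);   % prints 2.13 GB
fprintf('complex double: %.2f GB\n', bytesDouble / 1e9);   % prints 4.26 GB
```

An array of this size may not fit on a GPU with only a few gigabytes of memory, which is one plausible source of "out of memory" errors. Processing one depth or one block of pixels at a time would reduce the peak footprint.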

8 Comments

Umar 2024-7-6

Edited: Walter Roberson 2024-7-8

Hi Moh mor,

Sorry I couldn't respond to your most recent comment, but I do appreciate Chao's help. The crash and performance issues you are encountering may stem from incorrect data type definitions or from mismatches between CPU and GPU data types. When moving computations to the GPU, it is important to specify the data types correctly so that the GPU's parallel processing is used effectively. Here is an example of how you can specify the data type when working with GPU arrays in MATLAB:

% Define input data
inputData = rand(100, 'single'); % Single precision data
% Transfer data to GPU
gpuData = gpuArray(inputData);
% Perform computations on GPU (exp stands in for any gpuArray-enabled function)
result = exp(1i * gpuData);
% Retrieve results back to CPU
resultCPU = gather(result);

By explicitly defining the data type (e.g., 'single' for single precision) when creating GPU arrays and performing computations, you may avoid crashes and improve performance during GPU processing, as shown in Mr. Luo's code.

I will wait for Mr. Luo's comments and recommendations on how to proceed to the next stage and on running his code on his system.

moh mor 2024-7-9

Thank you @Chao Luo ,

Here is the crash report:

MATLAB crash file:C:\Users\mohammad\AppData\Local\Temp\matlab_crash_dump.14844-1:
--------------------------------------------------------------------------------
              Assertion detected at Wed Jul 10 01:05:23 2024 +0430
--------------------------------------------------------------------------------
Configuration:
  Crash Decoding           : Disabled - No sandbox or build area path
  Crash Mode               : continue (default)
  Default Encoding         : windows-1252
  Deployed                 : false
  Graphics Driver          : Unknown hardware 
  Graphics card 1          : Intel Corporation ( 0x8086 ) Intel(R) UHD Graphics 620 Version 30.0.101.1340 (2022-2-3)
  Graphics card 2          : NVIDIA ( 0x10de ) NVIDIA GeForce GTX 1050 Version 27.21.14.6133 (2021-1-19)
  Java Version             : Java 1.8.0_152-b16 with Oracle Corporation Java HotSpot(TM) 64-Bit Server VM mixed mode
  MATLAB Architecture      : win64
  MATLAB Entitlement ID    : 6257193
  MATLAB Root              : C:\Program Files\MATLAB\R2018b
  MATLAB Version           : 9.5.0.944444 (R2018b)
  OpenGL                   : hardware
  Operating System         : Microsoft Windows 10 Enterprise
  Process ID               : 14844
  Processor ID             : x86 Family 6 Model 142 Stepping 10, GenuineIntel
  Session Key              : f4ca2c37-5411-4a01-88d1-a823b50fc923
  Window System            : Version 10.0 (Build 19044)
Fault Count: 1
Assertion at b:\matlab\src\cgir_gpu\parfortogpulowering.cpp line 613
Register State (captured):
  RAX = 0000000010a4edd8  RBX = 0000000010a4edd8
  RCX = 00000000043e9080  RDX = 0000000000000000
  RSP = 00000000043e8ff0  RBP = 00000000f4e85010
  RSI = 00000000f4e85010  RDI = 0000000000000000
 
   R8 = 0000000000000003   R9 = 00000000043e8f28
  R10 = 0000000000000014  R11 = 0000000010a3ed9e
  R12 = 00000000043e9cb0  R13 = 0000000000000000
  R14 = 0000000010a3ed08  R15 = 00000000043e96b0
 
  RIP = 00000000107f292a  EFL = 00000206
 
   CS = 0033   FS = 0053   GS = 002b
Stack Trace (captured):
[  0] 0x00000000107eb2c3                              bin\win64\libmwfl.dll+00045763 foundation::core::diag::thread_context::unspecified_bool+00000051
[  1] 0x00000000107e9288                              bin\win64\libmwfl.dll+00037512 foundation::core::diag::stacktrace_base::capture+00000024
[  2] 0x00000000107edb80                              bin\win64\libmwfl.dll+00056192 foundation::core::diag::symbols::getSymbolAddress+00009632
[  3] 0x00000000107ed468                              bin\win64\libmwfl.dll+00054376 foundation::core::diag::symbols::getSymbolAddress+00007816
[  4] 0x00000000107f228f                              bin\win64\libmwfl.dll+00074383 foundation::core::diag::terminate+00000063
[  5] 0x00000000f1a25680                             bin\win64\emlcoder.dll+00743040 mwboost::serialization::singleton_module::unlock+00438160
[  6] 0x00000000f4d49817                             bin\win64\cgir_gpu.dll+00301079 CG::lowering::ParForToGpuLowering::operator=+00043735
[  7] 0x00000000f4d4f24c                             bin\win64\cgir_gpu.dll+00324172 CG::lowering::ParForToGpuLowering::processScope+00000220
[  8] 0x00000000f473fa12                         bin\win64\cgir_support.dll+03013138 CG::ScopeTransform::apply+00000098
[  9] 0x00000000f1aa4362                             bin\win64\emlcoder.dll+01262434 mwboost::serialization::singleton_module::unlock+00957554
[ 10] 0x00000000f1aa39f6                             bin\win64\emlcoder.dll+01260022 mwboost::serialization::singleton_module::unlock+00955142
[ 11] 0x00000000f1aa70ae                             bin\win64\emlcoder.dll+01274030 mwboost::serialization::singleton_module::unlock+00969150
[ 12] 0x00000000f1aa4d80                             bin\win64\emlcoder.dll+01265024 mwboost::serialization::singleton_module::unlock+00960144
[ 13] 0x00000000f1aa2bd2                             bin\win64\emlcoder.dll+01256402 mwboost::serialization::singleton_module::unlock+00951522
[ 14] 0x00000000f1a07f27                             bin\win64\emlcoder.dll+00622375 mwboost::serialization::singleton_module::unlock+00317495
[ 15] 0x00000000f1a01353                             bin\win64\emlcoder.dll+00594771 mwboost::serialization::singleton_module::unlock+00289891
[ 16] 0x00000000f1a00b34                             bin\win64\emlcoder.dll+00592692 mwboost::serialization::singleton_module::unlock+00287812
[ 17] 0x00000000f1a006b4                             bin\win64\emlcoder.dll+00591540 mwboost::serialization::singleton_module::unlock+00286660
[ 18] 0x00000000f1cdc4f8                             bin\win64\emlcoder.dll+03589368 QueryMLFcnTable_emlcoder+00567272
[ 19] 0x0000000021b93c60                        bin\win64\pgo\mcos_impl.dll+00408672
[ 20] 0x0000000021b93232                        bin\win64\pgo\mcos_impl.dll+00406066
[ 21] 0x0000000021b9234b                        bin\win64\pgo\mcos_impl.dll+00402251
[ 22] 0x0000000021b90d12                        bin\win64\pgo\mcos_impl.dll+00396562
[ 23] 0x0000000021be6f30                        bin\win64\pgo\mcos_impl.dll+00749360
[ 24] 0x0000000021de3585                        bin\win64\pgo\mcos_impl.dll+02831749 mwboost::serialization::singleton_module::unlock+01534417
[ 25] 0x0000000021decf01                        bin\win64\pgo\mcos_impl.dll+02871041 mwboost::serialization::singleton_module::unlock+01573709
[ 26] 0x0000000010e13655                                 bin\win64\mcos.dll+00144981 omDirectCallMethod+00000069
[ 27] 0x000000001af275c2                 bin\win64\pgo\libmwlxeindexing.dll+00357826 MathWorks::lxe::MatrixModuleImplementation::SetHeterogeneousArray+00014702
[ 28] 0x000000001afa8936                 bin\win64\pgo\libmwlxeindexing.dll+00887094 MathWorks::lxe::assign_paren_shared_xvalue_ptr_uninitialized_to_struct+00006890
[ 29] 0x000000001aedc70f                 bin\win64\pgo\libmwlxeindexing.dll+00050959 MathWorks::lxe::at_rbrace_nargout+00006831
[ 30] 0x000000001aedc70f                 bin\win64\pgo\libmwlxeindexing.dll+00050959 MathWorks::lxe::at_rbrace_nargout+00006831
[ 31] 0x000000001aee00ee                 bin\win64\pgo\libmwlxeindexing.dll+00065774 MathWorks::lxe::concatenate+00001810
[ 32] 0x000000001aee02ef                 bin\win64\pgo\libmwlxeindexing.dll+00066287 MathWorks::lxe::at_rparen+00000159
[ 33] 0x0000000018385c46                            bin\win64\pgo\m_lxe.dll+02317382 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,std::vector<MathWorks::utl::attach_ptr<ir::IrTree>,std::allocator<MathWorks::utl::attach_ptr<ir::IrTree> > > >::load_object_data+00140922
[ 34] 0x000000001829fd3c                            bin\win64\pgo\m_lxe.dll+01375548 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00493568
[ 35] 0x00000000182a091c                            bin\win64\pgo\m_lxe.dll+01378588 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00496608
[ 36] 0x00000000182a1c92                            bin\win64\pgo\m_lxe.dll+01383570 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501590
[ 37] 0x00000000182a28f8                            bin\win64\pgo\m_lxe.dll+01386744 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00504764
[ 38] 0x00000000182a1ddf                            bin\win64\pgo\m_lxe.dll+01383903 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501923
[ 39] 0x00000000182a1ede                            bin\win64\pgo\m_lxe.dll+01384158 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00502178
[ 40] 0x00000000181a9a7d                            bin\win64\pgo\m_lxe.dll+00367229
[ 41] 0x00000000181bb265                            bin\win64\pgo\m_lxe.dll+00438885
[ 42] 0x00000000181b8e26                            bin\win64\pgo\m_lxe.dll+00429606
[ 43] 0x00000000181b8a24                            bin\win64\pgo\m_lxe.dll+00428580
[ 44] 0x00000000173ce007                     bin\win64\pgo\m_dispatcher.dll+00057351 Mfh_file::dispatch_fh_impl+00001111
[ 45] 0x00000000173cdaf2                     bin\win64\pgo\m_dispatcher.dll+00056050 Mfh_file::dispatch_fh_with_reuse+00000066
[ 46] 0x0000000019cde19a                            bin\win64\pgo\m_lxe.dll+28893594 mwboost::archive::detail::pointer_oserializer<mwboost::archive::binaryTerm_oarchive,MathWorks::lxe::MatlabIrTree>::save_object_ptr+00752766
[ 47] 0x00000000181b478f                            bin\win64\pgo\m_lxe.dll+00411535
[ 48] 0x00000000182733a5                            bin\win64\pgo\m_lxe.dll+01192869 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00310889
[ 49] 0x00000000175ebab1                    bin\win64\pgo\m_interpreter.dll+00047793 inFullFevalFcn+00000705
[ 50] 0x00000000173cb724                     bin\win64\pgo\m_dispatcher.dll+00046884 Mdispatcher::getDispatcher+00002228
[ 51] 0x00000000173cca07                     bin\win64\pgo\m_dispatcher.dll+00051719 Mfh_MATLAB_fn_impl::dispatch_fh_with_reuse+00000343
[ 52] 0x000000001827263c                            bin\win64\pgo\m_lxe.dll+01189436 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00307456
[ 53] 0x00000000181ab8e7                            bin\win64\pgo\m_lxe.dll+00375015
[ 54] 0x000000001836b5b9                            bin\win64\pgo\m_lxe.dll+02209209 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,std::vector<MathWorks::utl::attach_ptr<ir::IrTree>,std::allocator<MathWorks::utl::attach_ptr<ir::IrTree> > > >::load_object_data+00032749
[ 55] 0x00000000181a4ceb                            bin\win64\pgo\m_lxe.dll+00347371
[ 56] 0x0000000018335237                            bin\win64\pgo\m_lxe.dll+01987127 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01105147
[ 57] 0x0000000018335159                            bin\win64\pgo\m_lxe.dll+01986905 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01104925
[ 58] 0x00000000181aab16                            bin\win64\pgo\m_lxe.dll+00371478
[ 59] 0x00000000181a8e1c                            bin\win64\pgo\m_lxe.dll+00364060
[ 60] 0x00000000181bb265                            bin\win64\pgo\m_lxe.dll+00438885
[ 61] 0x00000000181ba88c                            bin\win64\pgo\m_lxe.dll+00436364
[ 62] 0x00000000181b9535                            bin\win64\pgo\m_lxe.dll+00431413
[ 63] 0x00000000181b8f72                            bin\win64\pgo\m_lxe.dll+00429938
[ 64] 0x00000000181b8a24                            bin\win64\pgo\m_lxe.dll+00428580
[ 65] 0x00000000173ce007                     bin\win64\pgo\m_dispatcher.dll+00057351 Mfh_file::dispatch_fh_impl+00001111
[ 66] 0x00000000173cdaf2                     bin\win64\pgo\m_dispatcher.dll+00056050 Mfh_file::dispatch_fh_with_reuse+00000066
[ 67] 0x000000001827263c                            bin\win64\pgo\m_lxe.dll+01189436 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00307456
[ 68] 0x00000000181ab8e7                            bin\win64\pgo\m_lxe.dll+00375015
[ 69] 0x000000001836b5b9                            bin\win64\pgo\m_lxe.dll+02209209 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,std::vector<MathWorks::utl::attach_ptr<ir::IrTree>,std::allocator<MathWorks::utl::attach_ptr<ir::IrTree> > > >::load_object_data+00032749
[ 70] 0x00000000181a4ceb                            bin\win64\pgo\m_lxe.dll+00347371
[ 71] 0x0000000018335237                            bin\win64\pgo\m_lxe.dll+01987127 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01105147
[ 72] 0x0000000018335159                            bin\win64\pgo\m_lxe.dll+01986905 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01104925
[ 73] 0x000000001829fd3c                            bin\win64\pgo\m_lxe.dll+01375548 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00493568
[ 74] 0x00000000182a091c                            bin\win64\pgo\m_lxe.dll+01378588 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00496608
[ 75] 0x00000000182a1c92                            bin\win64\pgo\m_lxe.dll+01383570 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501590
[ 76] 0x00000000182a28f8                            bin\win64\pgo\m_lxe.dll+01386744 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00504764
[ 77] 0x00000000182a1ddf                            bin\win64\pgo\m_lxe.dll+01383903 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501923
[ 78] 0x00000000182a1ede                            bin\win64\pgo\m_lxe.dll+01384158 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00502178
[ 79] 0x00000000181a9a7d                            bin\win64\pgo\m_lxe.dll+00367229
[ 80] 0x00000000181bb265                            bin\win64\pgo\m_lxe.dll+00438885
[ 81] 0x00000000181ba88c                            bin\win64\pgo\m_lxe.dll+00436364
[ 82] 0x00000000181b9535                            bin\win64\pgo\m_lxe.dll+00431413
[ 83] 0x00000000181b8f72                            bin\win64\pgo\m_lxe.dll+00429938
[ 84] 0x00000000181b8a49                            bin\win64\pgo\m_lxe.dll+00428617
[ 85] 0x00000000173ce007                     bin\win64\pgo\m_dispatcher.dll+00057351 Mfh_file::dispatch_fh_impl+00001111
[ 86] 0x00000000173cda9e                     bin\win64\pgo\m_dispatcher.dll+00055966 Mfh_file::dispatch_fh+00000062
[ 87] 0x0000000021b99d31                        bin\win64\pgo\mcos_impl.dll+00433457
[ 88] 0x0000000021b94826                        bin\win64\pgo\mcos_impl.dll+00411686
[ 89] 0x0000000021b9347f                        bin\win64\pgo\mcos_impl.dll+00406655
[ 90] 0x0000000021b9234b                        bin\win64\pgo\mcos_impl.dll+00402251
[ 91] 0x0000000021b914ea                        bin\win64\pgo\mcos_impl.dll+00398570
[ 92] 0x0000000021b98faa                        bin\win64\pgo\mcos_impl.dll+00429994
[ 93] 0x0000000021b99081                        bin\win64\pgo\mcos_impl.dll+00430209
[ 94] 0x0000000021b91a3a                        bin\win64\pgo\mcos_impl.dll+00399930
[ 95] 0x00000000173cca07                     bin\win64\pgo\m_dispatcher.dll+00051719 Mfh_MATLAB_fn_impl::dispatch_fh_with_reuse+00000343
[ 96] 0x000000001827263c                            bin\win64\pgo\m_lxe.dll+01189436 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00307456
[ 97] 0x00000000181ab8e7                            bin\win64\pgo\m_lxe.dll+00375015
[ 98] 0x000000001836b5b9                            bin\win64\pgo\m_lxe.dll+02209209 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,std::vector<MathWorks::utl::attach_ptr<ir::IrTree>,std::allocator<MathWorks::utl::attach_ptr<ir::IrTree> > > >::load_object_data+00032749
[ 99] 0x00000000181a4ceb                            bin\win64\pgo\m_lxe.dll+00347371
[100] 0x0000000018335a0f                            bin\win64\pgo\m_lxe.dll+01989135 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01107155
[101] 0x000000001833597d                            bin\win64\pgo\m_lxe.dll+01988989 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+01107009
[102] 0x000000001829fd3c                            bin\win64\pgo\m_lxe.dll+01375548 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00493568
[103] 0x00000000182a091c                            bin\win64\pgo\m_lxe.dll+01378588 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00496608
[104] 0x00000000182a1c92                            bin\win64\pgo\m_lxe.dll+01383570 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501590
[105] 0x00000000182a28f8                            bin\win64\pgo\m_lxe.dll+01386744 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00504764
[106] 0x00000000182a1ddf                            bin\win64\pgo\m_lxe.dll+01383903 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501923
[107] 0x00000000182a1ede                            bin\win64\pgo\m_lxe.dll+01384158 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00502178
[108] 0x00000000181a9a7d                            bin\win64\pgo\m_lxe.dll+00367229
[109] 0x00000000181bb265                            bin\win64\pgo\m_lxe.dll+00438885
[110] 0x00000000181ba88c                            bin\win64\pgo\m_lxe.dll+00436364
[111] 0x00000000181b42a1                            bin\win64\pgo\m_lxe.dll+00410273
[112] 0x00000000181b39c6                            bin\win64\pgo\m_lxe.dll+00408006
[113] 0x00000000181b3ace                            bin\win64\pgo\m_lxe.dll+00408270
[114] 0x000000001827332b                            bin\win64\pgo\m_lxe.dll+01192747 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00310767
[115] 0x0000000018273bb4                            bin\win64\pgo\m_lxe.dll+01194932 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00312952
[116] 0x00000000175f0f22                    bin\win64\pgo\m_interpreter.dll+00069410 inFullEvalFcn+00000162
[117] 0x00000000fb5f475f                          bin\win64\libmwbridge.dll+00018271 inIsInEvalc+00000687
[118] 0x00000000173cb724                     bin\win64\pgo\m_dispatcher.dll+00046884 Mdispatcher::getDispatcher+00002228
[119] 0x00000000173ccbe7                     bin\win64\pgo\m_dispatcher.dll+00052199 Mfh_MATLAB_fn_impl::dispatch_fh+00000343
[120] 0x00000000181b68f2                            bin\win64\pgo\m_lxe.dll+00420082
[121] 0x000000001829fd3c                            bin\win64\pgo\m_lxe.dll+01375548 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00493568
[122] 0x00000000182a091c                            bin\win64\pgo\m_lxe.dll+01378588 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00496608
[123] 0x00000000182a1c92                            bin\win64\pgo\m_lxe.dll+01383570 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501590
[124] 0x00000000182a28f8                            bin\win64\pgo\m_lxe.dll+01386744 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00504764
[125] 0x00000000182a1ddf                            bin\win64\pgo\m_lxe.dll+01383903 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00501923
[126] 0x00000000182a1ede                            bin\win64\pgo\m_lxe.dll+01384158 mwboost::archive::detail::iserializer<mwboost::archive::binaryTerm_iarchive,MathWorks::lxe::function_descriptor>::load_object_data+00502178
[127] 0x00000000181a9a7d                            bin\win64\pgo\m_lxe.dll+00367229

moh mor 2024-7-9

Thank you again @Chao Luo ,

I ran my code in MATLAB successfully. Moreover, I generated a MEX file from my code using MATLAB Coder, and it works without any error.

Chao Luo 2024-7-10

R2018b is quite old, so I cannot debug it and give you a workaround. Is it possible for you to upgrade MATLAB to at least R2019b?

Answer 3

Umar 2024-7-6

Hi Moh Mor,

Have you considered reaching out to MathWorks support for further assistance? Provide them with detailed information about your system configuration, your MATLAB version, and the steps leading to the internal error.
