mexcuda compiler error: "__global__" does not apply here
I'm writing a MEX CUDA function that evaluates the distance formula d = sqrt((x - x0)^2 + (z - z0)^2) for each pixel of an image, where the arrays x and z hold the pixel x and y locations respectively, and x0 and z0 hold the locations of an array of transducers. The function details aren't important for now, because I cannot compile when this kernel is in the code:
__global__ void distance_formula_index_units(uint32_t * delay, const double * x, const double * z, const double * x0, const double * z0, params * p) {
    int x_px = threadIdx.x;
    int y_px = threadIdx.y;
    int sen = blockIdx.x;
    int m_id = blockDim.x * blockDim.y + sen;
    double x_dist = x[x_px] - x0[sen];
    double y_dist = z[y_px] - z0[sen];
    double distance = sqrt(x_dist*x_dist + y_dist*y_dist);
    delay[m_id] = (uint32_t)(distance / p->c / p->dt);
}
Other explanations for context: delay is the output matrix of time delays, p is a typedef'd struct from earlier in the code that contains parameters about the image and system such as the wavespeed c and sample time resolution dt.
typedef struct params {
    size_t n_xp;
    size_t n_yp;
    size_t n_sens;
    double dt;
    double c;
} params;
I run "mexcuda calculate_delays.cu", and instead of compiling, I get:
Error using mex
.\06_MEX_Functions\calculate_delays.cu(14): error: attribute "__global__" does not apply here
__declspec(__global__) void distance_formula_distance_units(uint32_t * delay, const double * xx, ...
^
I have only included this one function from the code, because it successfully compiles when I comment out this specific kernel, and the logic gate that activates it. There's another kernel in the code as well (distance_formula_time_units). The code that calls this function ALSO throws a different error:
if (*use_index_units) {
    mxGPUArray * delaymatrix = mxGPUCreateGPUArray(3, output_dimensions, mxUINT32_CLASS, mxREAL, MX_GPU_DO_NOT_INITIALIZE);
    uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
    distance_formula_index_units
        <<<input_parameters->n_xp*input_parameters->n_yp, input_parameters->n_sens>>>
        (delay_dvc_int, x_arr_dvc, y_arr_dvc, x0_dvc, z0_dvc, input_parameters);
    cudaDeviceSynchronize();
    // copy delay matrix from device back to host, set output
    outputs[0] = mxGPUCreateMxArrayOnCPU(delaymatrix);
    mxGPUDestroyGPUArray(delaymatrix);
} else { ...(call the other kernel)
Commenting out the distance_formula_index_units kernel gives a compiler error on this block:
.\06_MEX_Functions\calculate_delays.cu(101): error: identifier "uint32_t" is undefined
uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
^
This is wild, because uint32_t is completely defined in line 24 of stdint.h, and this is in the include tree of mex.h.
As far as I can tell, my code is completely valid, and intellisense in VSCode thinks so, too. ChatGPT doesn't find any code errors, either. What is going on here?
Accepted Answer
Joss Knight
2024-6-24
Edited: Joss Knight
2024-6-24
You cannot call a function declared __global__ from another function declared __global__. Declare the second function as __device__ and that should work.
Your device functions and kernels need to be declared using supported CUDA device types, whereas you are using host-side types defined by the MEX headers. Try uint32_T instead (i.e. capital T).
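For illustration only, here is a minimal sketch of how the two suggestions might look together. It is not the original poster's code and has not been built with mexcuda. The names delay_in_samples and delays_kernel are hypothetical, the flat column-major indexing (x fastest, then z, then sensor) is an assumption about the output layout, and passing c and dt by value instead of through the params pointer is my own simplification. Instead of uint32_T it includes <cstdint> explicitly and uses std::uint32_t, which I would expect to remove the dependence on whatever mex.h pulls in, though that is an assumption and not something confirmed in this thread. The helper is marked __host__ __device__ so it can be called from a kernel and also checked on the CPU.

```cpp
// Sketch only: delay_in_samples and delays_kernel are hypothetical names.
#include <cmath>
#include <cstddef>
#include <cstdint>  // explicit include, so std::uint32_t does not depend on mex.h

// Allow the helper to build with a plain C++ compiler for host-side checks.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Helper callable from a kernel (and from the host).
// Converts a pixel-to-sensor distance into a whole number of samples.
__host__ __device__ inline std::uint32_t delay_in_samples(
    double x, double z, double x0, double z0, double c, double dt)
{
    const double dx = x - x0;
    const double dz = z - z0;
    const double distance = std::sqrt(dx * dx + dz * dz);
    // The cast truncates toward zero, as in the original kernel.
    return static_cast<std::uint32_t>(distance / c / dt);
}

#ifdef __CUDACC__
// Kernel using a grid-stride loop over all (pixel, sensor) combinations.
// Assumed output layout: n_xp x n_zp x n_sens, column-major (x index fastest).
__global__ void delays_kernel(std::uint32_t* delay,
                              const double* x, const double* z,
                              const double* x0, const double* z0,
                              std::size_t n_xp, std::size_t n_zp,
                              std::size_t n_sens, double c, double dt)
{
    const std::size_t total = n_xp * n_zp * n_sens;
    const std::size_t stride =
        static_cast<std::size_t>(gridDim.x) * blockDim.x;
    for (std::size_t i =
             static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
         i < total; i += stride) {
        const std::size_t ix = i % n_xp;            // x pixel index
        const std::size_t iz = (i / n_xp) % n_zp;   // z pixel index
        const std::size_t s  = i / (n_xp * n_zp);   // sensor index
        delay[i] = delay_in_samples(x[ix], z[iz], x0[s], z0[s], c, dt);
    }
}
#endif
```

With this layout the launch configuration no longer has to match the image size; something like delays_kernel<<<num_blocks, 256>>>(...) with any reasonable num_blocks should cover every element, because each thread strides over the flat index.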