Error was detected while a MEX-file was running and MATLAB is exiting because of fatal error

I am trying to run the batched version of QR (dgeqrfbatched) in MATLAB using cuBLAS by calling it from a MEX file. I am stuck on this error and have not been able to find a solution. Is there any workaround for this problem? I am attaching the code that I am running and the crash report.
#include "mex.h"
#include "cublas_v2.h"

// The MEX gateway function.
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Get input variables from Matlab (host variables).
    double **A;

    // Get dimensions of input variables from Matlab.
    size_t m, n, k;
    const mwSize *Adims;
    Adims = mxGetDimensions(prhs[0]);
    //Bdims = mxGetDimensions(prhs[1]);
    m = Adims[0];
    n = Adims[1];
    k = Adims[2];
    A = (double**)mxGetPr(prhs[0]);
    int lda = m;
    const int batchSize = k;

    //step -1 Allocate storage for batch count
    double **tau;
    tau = (double**)malloc(batchSize * sizeof(double*));
    for (int i = 0; i < batchSize; i++)
        tau[i] = (double*)malloc(n * sizeof(double));
    int *info;
    info = (int*)malloc(batchSize * sizeof(int));

    //step -2 create host pointer array to the gpu array
    double **d_A, **d_TAU, **h_d_A, **h_d_TAU;
    h_d_A = (double**)malloc(batchSize * sizeof(double*));
    h_d_TAU = (double**)malloc(batchSize * sizeof(double*));
    for (int i = 0; i < batchSize; i++) {
        cudaMalloc((double**)&h_d_A[i], m*n * sizeof(double));
        cudaMalloc((double**)&h_d_TAU[i], n * sizeof(double));
    }

    //step -3 copy host array of pointers to device
    cudaMalloc((double**)&d_A, batchSize * sizeof(double*));
    cudaMalloc((double**)&d_TAU, batchSize * sizeof(double));
    cudaMemcpy(d_A, h_d_A, batchSize * sizeof(double*), cudaMemcpyHostToDevice);
    cudaMemcpy(d_TAU, h_d_TAU, batchSize * sizeof(double*), cudaMemcpyHostToDevice);
    for (int i = 0; i < batchSize; i++) {
        cudaMemcpy(h_d_A[i], A[i], m *n * sizeof(double), cudaMemcpyHostToDevice);
        cudaMemcpy(h_d_TAU[i], tau[i], n * sizeof(double), cudaMemcpyHostToDevice);
    }

    // --- CUBLAS initialization
    cublasHandle_t cublas_handle;
    cublasDgeqrfBatched(cublas_handle, m, n, d_A, lda, d_TAU, info, batchSize);
    for (int i = 0; i < batchSize; i++)
        cudaMemcpy(A[i], h_d_A[i], m*n * sizeof(double), cudaMemcpyDeviceToHost);

    //print the A matrix
    for (int k = 0; k < batchSize; k++) {
        for (int j = 0; j < m; j++) {
            for (int i = 0; i < n; i++) {
                int index = j * m + i;//not tested
                //count = count + 1;
                printf("\n %d The values are %lf",k+index, A[k][index]);
            } // i
        } // j
    } // k
}
When I execute the above program, this is the crash report I get:
Segmentation violation detected at Thu Mar 14 08:52:21 2019 -0700
This error was detected while a MEX-file was running. If the MEX-file
is not an official MathWorks function, please examine its source code
for errors. Please consult the External Interfaces Guide for information
on debugging MEX-files.
Edric Ellis 2019-3-15
Not really an answer to your question as such - but note that if you have Parallel Computing Toolbox, you might be able to use pagefun - it doesn't support QR directly, but it does support batched mldivide...



James Tursa 2019-3-14
Edited: James Tursa 2019-3-14
Can you explain what you intended with these lines for A:
double **A;
A = (double**)mxGetPr(prhs[0]);
If you pass in a regular double array, there are doubles in the data area of prhs[0], not pointers to doubles. You've got one too many levels of indirection here. What were your intentions with this?
Downstream in your code you appear to use A[i] as a pointer in a memory copy. Since there are doubles behind A, and not pointers to doubles behind A, you would be using a floating point double bit pattern as a pointer and this will crash MATLAB.
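As a rough illustration of the layout described above, here is a minimal sketch in plain C (no MEX or CUDA headers). It assumes the input is a real double m-by-n-by-k array, which MATLAB stores as one contiguous column-major block of doubles. The helper name elem_3d is hypothetical and not part of any API.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the MEX API): read element (r, c, p)
 * of an m-by-n-by-k array stored as one contiguous column-major block
 * of doubles. All indices are 0-based. Note the single level of
 * indirection: A points at doubles, not at pointers to doubles. */
double elem_3d(const double *A, size_t m, size_t n,
               size_t r, size_t c, size_t p)
{
    /* Rows vary fastest, then columns, then pages. */
    return A[r + m * c + m * n * p];
}
```

In the MEX file, the matching declaration would be double *A = mxGetPr(prhs[0]); with one level of indirection rather than two.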
James Tursa 2019-3-15
Edited: James Tursa 2019-3-21
A by itself points to the first batch (we typically use the term "plane" or "page" here to refer to a 2D slice of a multi-dimensional array). To point to the next plane, simply increment the pointer by the appropriate amount. E.g.,
A points to the first plane
A+m*n points to the second plane
A+m*n*2 points to the third plane
A+m*n*3 points to the fourth plane
So, programmatically, you would simply use A+m*n*i as your pointer to the plane you want to process, where i is a 0-based index (like you currently have in your for-loop).
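The pointer arithmetic above might look like the following sketch in plain C. The helper names page_ptr and copy_pages are hypothetical, and memcpy only stands in for the per-page cudaMemcpy call in the question so that the example needs no CUDA headers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pointer to the i-th m-by-n plane (0-based) of a
 * contiguous column-major m-by-n-by-k array that starts at A. */
double *page_ptr(double *A, size_t m, size_t n, size_t i)
{
    return A + m * n * i;
}

/* Copy every plane into its own destination buffer. memcpy is a
 * stand-in for the host-to-device cudaMemcpy in the question;
 * dst[i] must already point to room for m*n doubles. */
void copy_pages(double *A, size_t m, size_t n, size_t k, double **dst)
{
    for (size_t i = 0; i < k; i++) {
        memcpy(dst[i], page_ptr(A, m, n, i), m * n * sizeof(double));
    }
}
```

In the original loop, this would amount to passing A+m*n*i where the code currently passes A[i], with A declared as double * instead of double **.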




