mex code to read a large tab delimited file
1 次查看(过去 30 天)
显示 更早的评论
I am writing a mex file to read a tab delimited file with 16 columns each with about five lac rows. The file is tab-delimited.The first column is a date string. The second contains integers. The third contains a character which may be empty. The rest of the columns are either integers or characters. I have a sample image attached. Here is the code I wrote. It does not seem to work. Also I want to ignore the first 3 header lines. How can I do that?
#include "mex.h"
#include<stdio.h>
void mexFunction(int nlhs,mxArray *plhs[],int nrhs,const mxArray *prhs[])
{
double *Load1,*Tamb1,*TOT1,*WindA1,*WindB1,*WindC1,*Tamb2;
int M,i,loop;
mxChar *filename,*QCLoad,*QCTamb,*QCTOT,*QCWindA1,*QCWindB1,*QCWindC1,*QCTamb2;
filename = mxGetChars (prhs[0]);
plhs[0]=mxCreateDoubleMatrix(M,1,mxREAL);
Date1=mxGetChars(plhs[0]);
plhs[1]=mxCreateDoubleMatrix(M,1,mxREAL);
Load1=mxGetPr(plhs[1]);
plhs[2]=mxCreateDoubleMatrix(M,1,mxREAL);
QCLoad=mxGetChars(plhs[2]);
plhs[3]=mxCreateDoubleMatrix(M,1,mxREAL);
Tamb1=mxGetPr(plhs[3]);
plhs[4]=mxCreateDoubleMatrix(M,1,mxREAL);
QCTamb=mxGetChars(plhs[4]);
plhs[5]=mxCreateDoubleMatrix(M,1,mxREAL);
TOT1=mxGetPr(plhs[5]);
plhs[6]=mxCreateDoubleMatrix(M,1,mxREAL);
QCTOT=mxGetChars(plhs[6]);
plhs[7]=mxCreateDoubleMatrix(M,1,mxREAL);
WindA1=mxGetPr(plhs[7]);
plhs[8]=mxCreateDoubleMatrix(M,1,mxREAL);
QCWindA1=mxGetChars(plhs[8]);
plhs[9]=mxCreateDoubleMatrix(M,1,mxREAL);
WindB1=mxGetPr(plhs[9]);
plhs[10]=mxCreateDoubleMatrix(M,1,mxREAL);
QCWindB1=mxGetChars(plhs[10]);
plhs[11]=mxCreateDoubleMatrix(M,1,mxREAL);
WindC1=mxGetPr(plhs[11]);
plhs[12]=mxCreateDoubleMatrix(M,1,mxREAL);
QCWindC1=mxGetChars(plhs[12]);
plhs[13]=mxCreateDoubleMatrix(M,1,mxREAL);
Tamb2=mxGetPr(plhs[13]);
plhs[14]=mxCreateDoubleMatrix(M,1,mxREAL);
QCTamb2=mxGetChars(plhs[14]);
FILE *ptr_file;
char buf[1000000];
ptr_file =fopen(filename,"r");
fscanf(ptr_file,"%s %f %s %f %s %f %s %f %s %f %s %f %s %f %s",&Date1,&Load1,&QCLoad,&Tamb1,&QCTamb,&TOT1,&QCTOT,&WindA1,&QCWindA1,&WindB1,&QCWindA1,&WindC1,&QCWindA1,&Tamb2,&QCTamb2);
}
0 个评论
回答(1 个)
dpb
2014-1-11
doc textscan % maybe
What's the point of mex for a builtin, anyway? Or, if you want less overhead than textscan, use fscanf directly
4 个评论
dpb
2014-1-12
编辑:dpb
2014-1-12
I wrote the following earlier but don't see it -- if it shows up later duplicated, sorry...
See if can get the other party to provide the files unformatted instead or in addition to the formatted ones.
That being not possible, you can test the idea of your mex-ing skills being better than TMW's by writing a standalone Fortran or C app that just reads the files and time it. The mex overhead will only add to that minimum.
My thinking is that those formatted i/o calls will be eventually translated to the compiler runtime i/o library/system calls anyway, just as, finally, are those of the builtin Matlab routines. The only reason I could see for that root level to be significantly slower in Matlab would be overhead in cell handling and the facility to skip columns, etc. that they have incorporated. The former is why I suggested you might want to look at textread instead of textscan to eliminate the cell stuff but if you try to implement the skip at the reading level you'll have the same problems that they do for that portion.
There is, of course, one other "trick" -- read the whole file as stream binary (via fread) as character and do the translation all internally.
ADDENDUM:
Just how big is "big" for a typical file?
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Data Import and Export 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!