ncread performance (slowing down) when reading multiple large NetCDF files, and file corruption

12 views (last 30 days)
I am running MATLAB R2017a, typically working with ~250 large (20 GB each) NetCDF files located on a remote HPC server. Transferring them to my local machine is not an option.
These files usually have variables with (lat, lon, time) dimensions, but some may also have depth as a fourth dimension, which I am currently not worried about.
I am able to execute MATLAB scripts on the remote server, and I need to read multiple variables from these files and do spatial and temporal interpolations.
The variable size for this particular simulation is 300 x 400 x 5900; the last dimension depends on the number of timesteps, obviously.
As the title suggests, I am having performance issues with ncread (and also with netcdf.getVar, which I tested just to make sure).
I am using the ncread start, size, and stride arguments to save memory (although I abandoned stride, which decreased performance); for instance, I am currently reading only an 80 x 70 x 3500 portion of the data in the following code (it uses netcdf.getVar because I was testing that today):
Ens.u3rho_ens = zeros(np, coord.nt);   % nt: number of time steps
Ens.v3rho_ens = zeros(np, coord.nt);
Ens.u2rho_ens = zeros(np, coord.ne);   % ne: number of files
Ens.v2rho_ens = zeros(np, coord.ne);
Ens.hH_ens    = zeros(np, coord.ne);
Ens.h_ens     = zeros([size(coord.lon_rho) coord.ne]);
for ii = 1:coord.ne
    tmp.v3 = zeros([SizeTrm coord.nt]);
    tmp.u3 = zeros([SizeTrm coord.nt]);
    disp(['reading ', num2str(ii), ' of ', num2str(coord.ne)]);
    Ens.h_ens(:,:,ii) = ncread(fn{ii}, 'h', StrtTrm, SizeTrm);
    % tic
    % tmp.v3 = ncread(fn{ii}, Names.var1, [StrtTrm Time.timemin], ...
    %     [SizeTrm coord.nt]);
    % toc
    tic
    % Note: netcdf.getVar expects ZERO-based start indices, unlike
    % ncread, which is one-based; [StrtTrm Time.timemin] must be
    % adjusted when switching between the two interfaces.
    ncid  = netcdf.open(fn{ii}, 'NC_NOWRITE');
    varid = netcdf.inqVarID(ncid, Names.var1);
    tmp.v3 = netcdf.getVar(ncid, varid, [StrtTrm Time.timemin], ...
        [SizeTrm coord.nt]);
    netcdf.close(ncid)
    toc
    tic
    ncid  = netcdf.open(fn{ii}, 'NC_NOWRITE');
    varid = netcdf.inqVarID(ncid, Names.var2);
    tmp.u3 = netcdf.getVar(ncid, varid, [StrtTrm Time.timemin], ...
        [SizeTrm coord.nt]);
    netcdf.close(ncid)
    toc
    Ens.v3rho_ens(:, 1:coord.nt) = griddata4(coord.lon_v, coord.lat_v, ...
        tmp.v3(:,:,1:coord.nt), lon_rho_p, lat_rho_p, Trm.tri_v);
    Ens.u3rho_ens(:, 1:coord.nt) = griddata4(coord.lon_u, coord.lat_u, ...
        tmp.u3(:,:,1:coord.nt), lon_rho_p, lat_rho_p, Trm.tri_u);
    for pp = 1:np
        u2 = squeeze(Ens.u3rho_ens(pp,:));
        v2 = squeeze(Ens.v3rho_ens(pp,:));
        Ens.u2rho_ens(pp,ii) = interp1(Time.timeroms, u2, timegmt(pp));
        Ens.v2rho_ens(pp,ii) = interp1(Time.timeroms, v2, timegmt(pp));
    end
    % 'clear tmp.v3' is a no-op (clear does not take struct fields);
    % assign empty instead to actually release the memory
    tmp.v3 = [];
    tmp.u3 = [];
    % Ens.u3rho_ens = [];
    % Ens.v3rho_ens = [];
    clear mex
    close all force
end
The griddata4 part is a modified function and works just fine.
My issue is that ncread gets slower over time and the performance fluctuates widely (20 seconds to 90 seconds per read). I have looked at multiple threads about clearing the buffer and the cache, but to no avail. The interesting part is that if I pause the running code and restart it, the files are then processed rapidly (within 4-5 seconds), which suggests MATLAB or the OS is caching this information somewhere, but I couldn't figure out where.
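One avenue I have not ruled out is the netCDF-4 chunk cache; if the files are NetCDF-4/HDF5, enlarging it via netcdf.setChunkCache before opening the files may smooth out subset reads. A minimal sketch that also logs per-file read times (the cache size, slot count, and preemption value are illustrative guesses, not tuned values):
% Sketch: enlarge the netCDF-4 chunk cache and time each read.
% 1e9 bytes / 2003 slots / 0.75 preemption are illustrative guesses.
netcdf.setChunkCache(1e9, 2003, 0.75);
readTimes = zeros(coord.ne, 1);          % per-file read-time log
for ii = 1:coord.ne
    t0 = tic;
    ncid  = netcdf.open(fn{ii}, 'NC_NOWRITE');
    varid = netcdf.inqVarID(ncid, Names.var1);
    v3 = netcdf.getVar(ncid, varid, [StrtTrm Time.timemin], ...
        [SizeTrm coord.nt]);
    netcdf.close(ncid);
    readTimes(ii) = toc(t0);
end
plot(readTimes)   % a steady upward trend would point at caching or memory growth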
Things I've tried that didn't help:
1) Preallocating arrays
2) Minimizing the number of global variables by using structures
3) clear mex, close all force, fclose('all'), clearing temporary variables (note that clear cannot remove struct fields; see the sketch after this list)
4) Opening and closing files manually to make sure they are not left open.
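For reference, a minimal sketch of what actually frees a struct field, since clear cannot (the array sizes are stand-ins):
% 'clear' only accepts workspace variable names, so 'clear tmp.v3' is a no-op.
tmp.v3 = rand(80, 70, 10);      % stand-in for the large arrays
tmp.u3 = rand(80, 70, 10);
tmp.v3 = [];                    % assigning empty frees the field's memory
tmp = rmfield(tmp, 'u3');       % or remove the field entirely
clear tmp                       % this clears the whole struct variable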
Some might suggest using NCO tools to extract the necessary variables into smaller files; that is my last resort because a) it also takes a long time to process, and b) I would like to keep everything in single files. So I will probably decrease the number of timesteps as the ultimate solution, or permute the variables. (A sketch of the extract-once approach is below, in case it helps.)
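If extraction does become necessary, it can be done from MATLAB itself rather than NCO: read the subset once, write it to a small local file, and read all later passes from that. A sketch, where the output filename and dimension names are illustrative assumptions:
% Read the (80 x 70 x 3500) subset once and write it to a small file.
sub = ncread(fn{ii}, Names.var1, [StrtTrm Time.timemin], [SizeTrm coord.nt]);
outfile = 'subset_v3.nc';                     % hypothetical output name
nccreate(outfile, 'v3', ...
    'Dimensions', {'x', size(sub,1), 'y', size(sub,2), 't', size(sub,3)}, ...
    'Datatype', 'double');
ncwrite(outfile, 'v3', sub);
% Later reads come from the small local file and should be fast:
sub = ncread(outfile, 'v3');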
Waiting for the code to run is not an issue; however, MATLAB has been corrupting random NetCDF files that I am trying to read with the same code I am having performance issues with (say after 30, 74, or 90 loops; it happens inconsistently), and this has happened four times so far. I am not sure why this is happening; it might be related to the OS. I read those files with the same code before and it wasn't happening. I have the originals of the corrupted files in the work folder, and they work just fine. Any idea why this could be happening?

Answers (1)

Jean Marie, 2020-10-22
Hi,
I am having quite similar issues when using NetCDF files, but with the Parallel Computing Toolbox. My code works perfectly when using few workers (4 on my laptop). The same code often fails on a workstation with 24 workers. If I reduce the number of workers on the workstation, everything is fine. Of course, these tests were made with the same set of data, same parameters, etc.
My conclusion is that the MATLAB netCDF driver has some unsafe elements. I suppose that some delays are not controlled if many I/O operations are running against the same disk from the workers at the same time.
JMA
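A minimal sketch of the workaround described above, capping the pool size so fewer workers hit the disk at once (the pool size of 4, the variable name 'v', and the processing step are illustrative, not from the original code):
% Sketch: use a small pool so concurrent netCDF I/O on one disk is limited.
delete(gcp('nocreate'));            % close any existing pool
parpool('local', 4);                % 4 workers worked; 24 often failed
parfor ii = 1:numel(fn)
    ncid  = netcdf.open(fn{ii}, 'NC_NOWRITE');
    varid = netcdf.inqVarID(ncid, 'v');
    data  = netcdf.getVar(ncid, varid);
    netcdf.close(ncid);
    % ... process 'data' for this file ...
end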
