Loading large binary files in Matlab, quickly
I have some pretty massive data files (256 channels, on the order of 75-100 million samples) in int16 format. They are written in flat binary format with the channels interleaved, so the structure is something like: CH1S1,CH2S1,CH3S1 ... CH256S1,CH1S2,CH2S2,...
I need to read in each channel separately, filter and offset-correct it, then save. My current bottleneck is loading each channel, which takes about 7-8 minutes. Scale that up 256 times, and I'm looking at roughly 30 to 34 hours just to load the data! I am trying to use the skip argument of fread to step over the other channels' bytes as I read each channel; I have the following code in a loop over all 256 channels to do this:
offset = i - 1;
fseek(fid,offset*2,'bof');
dat = fread(fid,[1,nSampsTotal],'*int16',(nChan-1)*2);
From what I have read, this is typically the fastest way to load parts of a large binary file. Is the file simply too large to do this any faster? Any suggestions would be much appreciated!
System details: MATLAB 2017a, Windows 7, 64bit
4 Comments
How fast is this?
tic
dat = fread(fid, [256 Inf], '*int16');  % semicolon keeps the array from being printed
toc
Test it on a smaller data set first. Do you have 256 channels x 100 million samples each? That would be 256 x 2 bytes x 100E6 = 51.2 GB, which will require a lot of RAM. If instead you have 100 million samples in total (0.2 GB), it should be fast to load.
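The size question can be settled from the file itself before reading any data. A minimal sketch, assuming the file contains nothing but interleaved int16 samples (no header); the file name `demo_interleaved.bin` and the small demo dimensions are made up so that the example runs on its own.

```matlab
% Hypothetical demo file: 4 channels x 1000 samples of interleaved int16.
nChan = 4;
nSampsDemo = 1000;
fname = fullfile(tempdir, 'demo_interleaved.bin');
fid = fopen(fname, 'w');
% Column-major write of a [nChan x nSamps] matrix gives CH1S1,CH2S1,...
fwrite(fid, zeros(nChan, nSampsDemo, 'int16'), 'int16');
fclose(fid);

% Samples per channel from the file size (assumes no header bytes).
info = dir(fname);
bytesPerSample = 2;                                   % int16
nSampsTotal = info.bytes / (nChan * bytesPerSample);
fprintf('%d samples per channel, %d bytes on disk\n', nSampsTotal, info.bytes);

% The same arithmetic for the two cases discussed above.
fullBytes = 256 * 2 * 100e6;    % 256 channels x 2 bytes x 100 million samples
oneChanBytes = 2 * 100e6;       % 100 million int16 samples in total
fprintf('256 channels: %.1f GB, single channel: %.1f GB\n', ...
    fullBytes / 1e9, oneChanBytes / 1e9);

delete(fname);
```

Running the `dir` step on the real file and comparing `info.bytes` with the installed RAM shows whether a single `fread` of the whole file is realistic.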
dpb
2018-8-20
How much RAM do you actually have? It sounds like the performance hit is probably the data being swapped in and out of virtual memory; fread is pretty quick for straight data transfer to/from memory.
Is the processing required dependent upon having the whole timeseries in memory or can you do it piecewise on each channel?
You may just have a system limitation here...
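If the processing can be done piecewise, one option is to read contiguous blocks that contain every channel, so the file is read from disk once instead of in 256 strided passes. A rough sketch, assuming a headerless interleaved int16 file; the demo file, the chunk length, and the filter coefficients `b` and `a` are placeholders rather than the original poster's settings. The filter state is carried from one chunk to the next so the output matches filtering the whole record in one call.

```matlab
% Hypothetical demo data: 4 channels x 1000 samples, interleaved int16.
nChan = 4;
nSampsTotal = 1000;
raw = int16(randi([-1000 1000], nChan, nSampsTotal));
fname = fullfile(tempdir, 'demo_chunked.bin');
fid = fopen(fname, 'w');
fwrite(fid, raw, 'int16');
fclose(fid);

% Placeholder filter (5-point moving average); substitute the real one.
b = ones(1, 5) / 5;
a = 1;

chunkLen = 256;                                   % samples per channel per read
zi = zeros(max(numel(a), numel(b)) - 1, nChan);   % one state column per channel
out = zeros(nChan, nSampsTotal);                  % demo only; save to disk for real data

fid = fopen(fname, 'r');
pos = 0;
while pos < nSampsTotal
    % One contiguous read returns all channels for this block of samples.
    blk = fread(fid, [nChan, chunkLen], '*int16');
    n = size(blk, 2);
    if n == 0
        break;
    end
    % filter runs along dimension 1, so pass samples x channels.
    [y, zi] = filter(b, a, double(blk).', zi, 1);
    out(:, pos + (1:n)) = y.';
    pos = pos + n;
end
fclose(fid);
delete(fname);

% Compare with filtering the full record at once.
ref = filter(b, a, double(raw).', [], 1).';
maxErr = max(abs(out(:) - ref(:)));
```

A larger `chunkLen` means fewer reads but more memory: each block holds `nChan * chunkLen` values at 2 bytes each as int16, and 8 bytes each once converted to double. An offset correction that needs the mean of a whole channel would need a separate pass or a running sum, which this sketch does not cover.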