Is matfile read speed affected by how file is constructed?

Question

Cameron Lee 2018-11-2

Direct link to this question: https://ww2.mathworks.cn/matlabcentral/answers/427628-is-matfile-read-speed-affected-by-how-file-is-constructed

Edited: Cameron Lee 2018-12-5

I have a dataset of int16 values with dimensions 259000x94000x6. This is far too big to fit into memory or load at once (about 276 GB). The main issue is that the data can only be downloaded in 94000 separate chunks of 259000x6 each, but I need to analyze it in 259000 separate chunks of 94000x6 each.
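As a rough check on the scale involved, the uncompressed size follows directly from the dimensions and the 2-byte width of int16. This is only back-of-the-envelope arithmetic with illustrative variable names; the result lands in the same range as the figure quoted above.

```matlab
% Raw (uncompressed) size of a 259000x94000x6 int16 array
dims = [259000 94000 6];
bytesPerElement = 2;                     % int16 occupies 2 bytes
nBytes = prod(dims) * bytesPerElement;   % total bytes, held exactly in a double
fprintf('%.1f GB (decimal), %.1f GiB (binary)\n', nBytes/1e9, nBytes/2^30);
```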

For the past two weeks I have been trying various big data techniques in MATLAB to find the best way to read all of this data. The fastest way seems to be to turn it into one large file containing all the data, which MUST be built by appending 94000 files of 259000x6 arrays (and not the other way around, due to the native structure of the data). However, one very peculiar thing I have found is that no matter how I build my giant .mat file (e.g. 259000x94000x6 or 94000x259000x6), the read speed using matfile is ALWAYS an order of magnitude faster when reading it in 259000x6 chunks rather than 94000x6 chunks. I've tried using '-v7.3' with and without compression, I've tried splitting it into smaller files of 3 GB each and looping through them, and I've tried turning it into a fileDatastore, but nothing lets me read the data in 94000x6 chunks as fast as I can in 259000x6 chunks! Has anyone else experienced this, does anyone know why it happens, and/or is there a workaround?

Thanks!

1 Comment

Rik 2018-11-2

Is it possible to either share some of the data or to write some code that generates representative data?
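A small, self-contained reproduction along these lines might look like the sketch below. The array sizes, variable names, and temporary file are assumptions chosen so that it runs quickly; real timings depend on the disk, the MATLAB release, and the file layout, so no particular result is implied.

```matlab
% Assumed small stand-in for the real 259000x94000x6 data
nA = 2000; nB = 1000; nC = 6;
fname = [tempname '.mat'];
alldata = randi([-100 100], nA, nB, nC, 'int16');
save(fname, 'alldata', '-v7.3');    % -v7.3 is needed for partial reads via matfile
clear alldata

m = matfile(fname);

% Read one slice along the first dimension (nA x 1 x nC)
tic; a = m.alldata(:, 1, :); tA = toc;

% Read one slice along the second dimension (1 x nB x nC)
tic; b = m.alldata(1, :, :); tB = toc;

fprintf('dim-1 slice: %.4f s, dim-2 slice: %.4f s\n', tA, tB);
delete(fname);                      % remove the temporary file
```

Repeating each read several times and swapping nA and nB would help separate the effect of slice size from the effect of slice direction.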


Answer 1

Cameron Lee 2018-12-5

Direct link to this answer: https://ww2.mathworks.cn/matlabcentral/answers/427628-is-matfile-read-speed-affected-by-how-file-is-constructed#answer_350757

Edited: Cameron Lee 2018-12-5


I thought I'd follow this up. The short answer to the question is that read speed must be affected by the way the file is constructed. However, I found a way around this. First, I had to build the data files in chunks (I made about 80 chunk files) of 1175x259000x6 each. After all of these were finished, I used the matfile command in a for-loop to bring the data in and permute the dimensions:

% Permute each of the 80 chunk files in place (takes some time, cannot parfor)
for x=1:80
    xstr=num2str(x);
    filename=strcat('location\ChunkFolder\AlldataCH',xstr,'.mat');
    m=matfile(filename,'Writable',true);
    % Loads the whole 1175x259000x6 variable and writes it back as 259000x1175x6
    m.alldata=permute(m.alldata,[2 1 3]);
end
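The chunk-building step that precedes this loop is not shown in the post. A minimal sketch of one way it could be done is given here, using tiny assumed sizes and synthetic data in place of the real downloads. The variable names and the temporary file are hypothetical, and this is not necessarily how the original files were produced.

```matlab
% Assumed small sizes; the post uses 1175 downloads per chunk file,
% each download being a 259000x6 int16 array.
nPerChunk = 5; nRows = 300; nCols = 6;
fname = [tempname '.mat'];

m = matfile(fname, 'Writable', true);   % created as a v7.3 file on first write
m.alldata = zeros(nPerChunk, nRows, nCols, 'int16');   % preallocate on disk

for k = 1:nPerChunk
    % Stand-in for one downloaded nRows x nCols piece (hypothetical data)
    piece = k * ones(nRows, nCols, 'int16');
    % Store the piece as slice k along the first dimension
    m.alldata(k, :, :) = reshape(piece, [1 nRows nCols]);
end
% delete(fname) once the file is no longer needed
```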

I was then able to read the data in and analyze it much faster:

%% Build m (cell array of matfile connections to use repeatedly below)
nFiles=80;
m=cell(1,nFiles);
for x=1:nFiles
    xstr=num2str(x);
    filename=strcat('location\ChunkFolder\AlldataCH',xstr,'.mat');
    m{x}=matfile(filename);
end
% Read data into MATLAB one 94000x6 slice at a time and analyze
parfor y=1:259920
    newdata=cell(nFiles,1);
    for x=1:nFiles
        % Each permuted file contributes a 1175x6 block for row y
        newdata{x}=squeeze(m{x}.alldata(y,:,:));
    end
    % Stack the 80 blocks into 94000x6, then transpose to 6x94000
    finaldata=vertcat(newdata{:})';
    
    %%%% DO ALL ANALYSIS HERE %%%%
    
end

For whatever reason, this is the only way I could find that let me read the data into MATLAB the way I needed to, and in a timely manner (about a 30x improvement vs. reading it without permuting the dimensions).

As a side note, I tried doing the permute BEFORE saving the original chunks, and that did not work. As I mentioned in my original post, saving the data as 259000x94000x6 (or in 259000x1175x6 chunks) did not work either. It only worked after I made the chunks, closed the file, brought the file back into MATLAB, and permuted it. Anyway, I hope this helps anyone out there with a similar problem. Also, if anyone can find an even faster way to do this, please let me know.
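One way to dig into why the layout matters, assuming the files were saved with '-v7.3' (which is HDF5-based), is to inspect how the variable is stored using MATLAB's h5info. The sketch below uses a small made-up array and a temporary file; whether the chunk shape actually explains the timings in this thread is not established here.

```matlab
% Small assumed array saved as a v7.3 (HDF5-based) MAT-file
fname = [tempname '.mat'];
alldata = zeros(200, 300, 6, 'int16');
save(fname, 'alldata', '-v7.3');

% Top-level variables are stored as datasets at the root of the file
info = h5info(fname, '/alldata');
disp(info.Dataspace.Size);   % stored dimensions
disp(info.ChunkSize);        % chunk dimensions (empty if stored contiguously)

delete(fname);               % remove the temporary file
```

Comparing ChunkSize for a chunk file before and after the permute step might show whether the two versions are laid out differently on disk.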
