read and divide HDF5 data into chunks
23 views (last 30 days)
I have 1000+ HDF5 files, each containing an 1800-by-3600 matrix. I want to divide each 1800 × 3600 matrix into 4 chunks and store each chunk, together with an ID, into an array, repeating this process for all 1000+ files. Can someone explain how to use H5P.set_chunk or H5S.select_hyperslab? I used H5S.select_hyperslab to get a single slab; how should I repeat the process?
0 comments
Accepted Answer
Dinesh Iyer
2018-10-12
Edited: Dinesh Iyer
2018-10-12
H5P.set_chunk specifies the chunk dimensions of a dataset, i.e., what the size of each chunk should be when it is stored in the file. H5S.select_hyperslab specifies the portion of the dataset that you want to read. If you are reading a portion of the data from a dataset, the latter is probably what you need.
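For the storage side, the high-level h5create function exposes chunked layout through its ChunkSize parameter, which corresponds to H5P.set_chunk in the low-level API. A minimal sketch (untested; the file name 'out.h5' and dataset name '/mydataset' are assumptions):

```matlab
% Hedged sketch: creating a chunked dataset with the high-level API.
% 'out.h5' and '/mydataset' are placeholder names, not from the question.
h5create('out.h5', '/mydataset', [1800 3600], 'ChunkSize', [1800 900]);
h5write('out.h5', '/mydataset', rand(1800, 3600));
h5disp('out.h5');   % the dataset details include the chunked layout
```

Each [1800 900] chunk is then stored contiguously in the file, so reads that align with the chunk boundaries are efficient.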
When you say that you want to store each chunk with an ID into an array, do you mean that you want to read it into MATLAB, or that you want to store it in another HDF5 file?
For starters, you can use the high-level h5read function to read a portion of the dataset. I am not sure how you want to divide the data into 4 chunks, so I will assume that each chunk is 1800-by-900. This assumption does not affect the structure of the code.
The code below provides an idea on how you can do this.
fileNames = dir('*.h5');
fileNames = {fileNames.name}';
numChunks = 4;
chunkSize = [1800 900];
for fileCnt = 1:numel(fileNames)
    fileToRead = fileNames{fileCnt};
    s = struct();
    for chunkCnt = 1:numChunks
        % Build a valid struct field name from the file name and chunk index
        ID = sprintf('%s_Chunk_%02d', matlab.lang.makeValidName(fileToRead), chunkCnt);
        startLoc = [1 chunkSize(2)*(chunkCnt-1)+1];
        s.(ID) = h5read(fileToRead, '/mydataset', startLoc, chunkSize);
    end
end
I have not run the above code, so apologies for any errors, but it should give you an idea of how to proceed.
If you want to use the low-level functions such as H5D.read, you have to loop and update the h5_start input argument to point to the location within the dataset that you want to read.
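The low-level loop can be sketched as below (untested; 'myfile.h5', '/mydataset', and the [1800 900] chunk geometry are assumptions carried over from above). The main pitfall is that the low-level interface uses HDF5's C-style ordering, so the MATLAB [row col] offsets must be flipped and made zero-based:

```matlab
% Hedged sketch: reading four 1800x900 chunks via H5S.select_hyperslab.
fid       = H5F.open('myfile.h5', 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
dsetID    = H5D.open(fid, '/mydataset');
fileSpace = H5D.get_space(dsetID);

chunkSize = [1800 900];
for chunkCnt = 1:4
    % Convert MATLAB one-based [row col] offsets to zero-based,
    % row-major HDF5 order by subtracting one and flipping.
    h5_start = fliplr([0 chunkSize(2)*(chunkCnt-1)]);
    h5_count = fliplr(chunkSize);
    H5S.select_hyperslab(fileSpace, 'H5S_SELECT_SET', h5_start, [], h5_count, []);
    memSpace = H5S.create_simple(2, h5_count, []);
    data = H5D.read(dsetID, 'H5ML_DEFAULT', memSpace, fileSpace, 'H5P_DEFAULT');
    H5S.close(memSpace);
    % ... store or process the 1800x900 'data' block here ...
end

H5S.close(fileSpace);
H5D.close(dsetID);
H5F.close(fid);
```

Each pass through the loop re-selects a different hyperslab on the same file dataspace, which is the repetition the question asks about.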
3 comments
Dinesh Iyer
2018-10-12
The code that I have provided should help you get started. It results in 4 chunks because I have taken a chunk size of [1800 900]. You can modify this.
If you want to speed up the operation, you can use a PARFOR loop to parallelize across the files that you are processing.
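As a sketch of that suggestion (untested; requires Parallel Computing Toolbox, and '/mydataset' plus the [1800 900] chunk size are assumptions carried over from the answer), the outer file loop can become a parfor with a sliced cell-array output:

```matlab
% Hedged sketch: parallelizing the per-file work with parfor.
fileNames = dir('*.h5');
fileNames = {fileNames.name}';
numChunks = 4;
chunkSize = [1800 900];
results = cell(numel(fileNames), 1);
parfor fileCnt = 1:numel(fileNames)
    fileToRead = fileNames{fileCnt};
    s = struct();
    for chunkCnt = 1:numChunks
        ID = sprintf('%s_Chunk_%02d', matlab.lang.makeValidName(fileToRead), chunkCnt);
        startLoc = [1 chunkSize(2)*(chunkCnt-1)+1];
        s.(ID) = h5read(fileToRead, '/mydataset', startLoc, chunkSize);
    end
    results{fileCnt} = s;   % sliced output, so the loop is parfor-safe
end
```

Each worker opens its own file independently, which is why parallelizing over files (rather than over chunks within one file) is the natural split here.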
More Answers (0)