How to read just a part of a binary file with a predefined end position or a predefined number of bytes?

Hi. I have searched a lot for an answer, without success.
I want to read data records ({'uint16' 'uint16' 'uint16' 'uint8' 'uint8'} = 8 bytes each) out of a binary file.
The files contain millions of records at 1-minute time steps, beginning at a given start date.
So far I can set the start position with fseek by skipping the bytes that correspond to the desired time span (1 record of 8 bytes = 1 min).
My problem is that I cannot find a way to define the end position or the number of records for fread.
One solution would be a loop that advances fseek by the record length on each pass and skips the rest of the file after every record. But I guess this is grossly inefficient and would likely take even longer than reading the whole file and picking the wanted part out of the resulting matrix.
I hope it's clear what I'm asking...
I need something like fread(fileID, start_position, end_position or number of records).
Thanks in advance.
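The start/length arithmetic described in the question can be sketched as follows. The dates are made up for illustration, and the sketch assumes the first record in the file belongs to the start date and that no minutes are missing.

```matlab
% Hypothetical dates: t0 is the timestamp of the first record in the file,
% tFirst and tLast delimit the window that should be read.
t0     = datetime(2010, 1, 1, 0, 0, 0);
tFirst = datetime(2010, 1, 2, 0, 0, 0);
tLast  = datetime(2010, 1, 2, 0, 59, 0);

recordbytes = 8;                                      % 2 + 2 + 2 + 1 + 1 bytes per record

recordstart = round(minutes(tFirst - t0)) + 1;        % 1-based index of the first wanted record
numrecords  = round(minutes(tLast - tFirst)) + 1;     % both end points included
offset      = (recordstart - 1) * recordbytes;        % byte offset to pass to fseek
```

With these values, `offset` is what goes into fseek and `numrecords` is the number of records to request from fread.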
  3 Comments
Sebastian 2018-12-12
Edited: Sebastian 2018-12-12
Hi Image Analyst,
I don't have evidence, but since fetching every element of a matrix in a loop takes longer than indexing the matrix directly, I concluded that a loop would also be slower for this purpose.
I just read the memmapfile documentation. As far as I understand, this function also only lets me define an offset to skip the first n bytes, not a number of wanted records or an end position. I just can't understand why they did not add such an input argument when they implemented the offset argument...
The thing is, I know that it does not take ages to read a binary file. In my case it takes 15 to 20 seconds... But I just started a new project and will have to use this function many times over the next 3 years, so saving a few seconds each time will add up to a significant amount of time.
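For reference, memmapfile does take a 'Repeat' argument next to 'Offset' and 'Format', and it limits how many times the record format is applied. A minimal sketch, assuming the 8-byte record layout from the question; the scratch file, its contents, and the field names a to e are made up for the demonstration.

```matlab
% Write a small synthetic file of 100 records (values are arbitrary test data).
filepath = [tempname, '.bin'];
fid = fopen(filepath, 'w');
for k = 1:100
    fwrite(fid, [k, 2*k, 3*k], 'uint16');     % three uint16 fields
    fwrite(fid, [mod(k, 256), 7], 'uint8');   % two uint8 fields
end
fclose(fid);

recordstart = 41;   % 1-based index of the first wanted record
numrecords  = 5;    % number of records to map

% Map only numrecords records, starting at the byte offset of recordstart.
m = memmapfile(filepath, ...
    'Offset', (recordstart - 1) * 8, ...
    'Format', {'uint16', [1 1], 'a'; ...
               'uint16', [1 1], 'b'; ...
               'uint16', [1 1], 'c'; ...
               'uint8',  [1 1], 'd'; ...
               'uint8',  [1 1], 'e'}, ...
    'Repeat', numrecords);

records = m.Data;     % numrecords-by-1 struct array, one element per record
a = [records.a]';     % first field of every mapped record as a column vector

clear m               % release the mapping before deleting the file
delete(filepath);
```

Whether this is faster than fseek plus fread for a given file is something to measure rather than assume.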
Image Analyst 2018-12-12
I deal with 3-D CT images of up to 20 GB in size and I use fseek() and fread() to read slices out of the middle of the file and it's pretty quick, like a second or two. I'm not aware of any other ways, so you might call the Mathworks and ask them. How big are your files?


Accepted Answer

Guillaume 2018-12-12
I'm not entirely sure I understand completely, but maybe this is what you want:
recordstart = ??? %some integer value. 1-based index of first desired record
numrecords = ??? %how many records to get
filepath = ??? %path of the file
recordtypes = {'uint16', 'uint16', 'uint16', 'uint8', 'uint8'};
recordsizes = [2, 2, 2, 1, 1]; %size of each type in bytes. Must match recordtypes
fid = fopen(filepath, 'r');
fseek(fid, (recordstart - 1) * sum(recordsizes), 'bof'); %skip the records before recordstart
data = fread(fid, [sum(recordsizes), numrecords], '*uint8'); %read numrecords records as uint8, one record per column
fclose(fid);
data = mat2cell(data, recordsizes, numrecords); %split the rows into one block of bytes per field
data = cellfun(@(bytes, type) typecast(bytes(:), type), data', recordtypes, 'UniformOutput', false); %convert each block to its type (native byte order)
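One way to try the approach above without the real data is to run it against a small synthetic file. This is only a sketch: the scratch file, the record values, and the chosen window (records 41 to 45) are made up, and the file is written and read on the same machine, so byte order is not an issue.

```matlab
% Build a synthetic file with 100 records of uint16, uint16, uint16, uint8, uint8.
filepath = [tempname, '.bin'];                % hypothetical scratch file
fid = fopen(filepath, 'w');
for k = 1:100
    fwrite(fid, [k, 2*k, 3*k], 'uint16');     % three uint16 fields
    fwrite(fid, [mod(k, 256), 7], 'uint8');   % two uint8 fields
end
fclose(fid);

% Read records 41..45 only.
recordstart = 41;
numrecords  = 5;
recordtypes = {'uint16', 'uint16', 'uint16', 'uint8', 'uint8'};
recordsizes = [2, 2, 2, 1, 1];

fid = fopen(filepath, 'r');
fseek(fid, (recordstart - 1) * sum(recordsizes), 'bof');
raw = fread(fid, [sum(recordsizes), numrecords], '*uint8');   % 8-by-5 matrix of raw bytes
fclose(fid);
delete(filepath);

% Split the byte rows per field, then reinterpret each block as its type.
fields = mat2cell(raw, recordsizes, numrecords);
fields = cellfun(@(bytes, type) typecast(bytes(:), type), ...
    fields', recordtypes, 'UniformOutput', false);
```

Each cell of `fields` then holds one column vector per field, with one element per record in the window.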
  1 Comment
Sebastian 2018-12-13
Edited: Sebastian 2018-12-13
Thanks a lot!
That's what I wanted. When I started writing my function I came across the input argument 'sizeA', but I kept searching and forgot about it. I think I wrongly assumed that it would still read the whole binary file and then just rearrange the output...
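A quick way to check how much fread consumes when a size argument is given is to look at the file position with ftell afterwards. A small sketch with a made-up scratch file of 1000 dummy records:

```matlab
filepath = [tempname, '.bin'];            % hypothetical scratch file
fid = fopen(filepath, 'w');
fwrite(fid, zeros(1, 8000), 'uint8');     % 1000 dummy records of 8 bytes each
fclose(fid);

fid = fopen(filepath, 'r');
fseek(fid, 10 * 8, 'bof');                % skip the first 10 records (80 bytes)
raw = fread(fid, [8, 3], '*uint8');       % request 3 records of 8 bytes
posAfter = ftell(fid);                    % file position after the read
fclose(fid);
delete(filepath);
```

Here `posAfter` is 104, that is 80 skipped bytes plus the 24 requested ones, so the read stops after the requested records instead of running to the end of the file.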


Release

R2018b
