Is it possible to create a sparse binary (.bin) file on disk?

Question

Anthony Barone 2017-3-10

https://ww2.mathworks.cn/matlabcentral/answers/329241-is-it-possible-to-create-a-sparse-binary-bin-file-on-disk

Edited: Anthony Barone 2018-5-25

I have a project where I would like to save my results to a binary (.bin) file that is stored on disk. Results need to be saved as they are generated (so that memory can be cleared), but the order in which these results are added to the binary file is not necessarily sequential (e.g., first I write to bytes 1-100, then 1001-1100, then 301-400, etc.).

In order to write non-sequentially to a binary file, I believe the file needs to be pre-allocated on disk in some form or another. Is it possible to create a "sparse" binary file that has an area on disk set aside but does not require writing zeros to every byte of the .bin file? I know how many bytes the file will take up when I am done saving to it, so that isn't a problem. Alternatively, is there a way to write non-sequentially to a binary file without pre-allocating it first?

Thanks.

Answer 1

Anthony Barone 2018-5-25

https://ww2.mathworks.cn/matlabcentral/answers/329241-is-it-possible-to-create-a-sparse-binary-bin-file-on-disk#answer_321955

Edited: Anthony Barone 2018-5-25

In case anyone comes across this question looking for the same thing: at some point in the last year I figured out a much better way to do this. Make a system call to one of the following:

fallocate (Linux/UNIX - create or extend file)
fsutil file createnew (Windows - create file)
fsutil file seteof (Windows - extend file)
mkfile -n (MacOS - create file)
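A minimal sketch of how such a system call might look from MATLAB, assuming the tools listed above are installed and on the system PATH and that the file name contains no quote characters. The function name AllocateFile is hypothetical, and fsutil may require administrator rights on some Windows setups.

```matlab
function AllocateFile(File, nBytes)
% Hypothetical helper: create File with a size of nBytes bytes without
% writing the data from MATLAB, by calling the platform's allocation tool.
if ispc
  cmd = sprintf('fsutil file createnew "%s" %d', File, nBytes);  % Windows
elseif ismac
  cmd = sprintf('mkfile -n %d "%s"', nBytes, File);              % macOS
else
  cmd = sprintf('fallocate -l %d "%s"', nBytes, File);           % Linux/UNIX
end
[status, msg] = system(cmd);
if status ~= 0
  error('AllocateFile: command "%s" failed: %s', cmd, msg);
end
end
```

For example, AllocateFile('results.bin', 4*2^30) would request a 4 GB file. Extending an existing file with fsutil file seteof would follow the same pattern.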

I haven't figured out how to extend a file on macOS. Since this is a very unusual use case for me, when a file on macOS needs to be sparse-extended I have it set up to either zero-write to the end of the file, or to read the data, delete the file, allocate a larger file, and re-write the data.

This is effectively instant, since it is true write-less allocation. For example, as a test I just allocated a 4 GB file in 0.05 seconds.

That said, writing non-sequentially to a file like this can be very slow, so you might be better off appending zeros and writing data to the end of the file on the fly as needed. Either way, write-less allocation is possible to implement from within MATLAB.

Answer 2

Jan 2017-3-13

https://ww2.mathworks.cn/matlabcentral/answers/329241-is-it-possible-to-create-a-sparse-binary-bin-file-on-disk#answer_258544

You can use FileResize from the File Exchange (FEX) to expand (or shrink) a file efficiently. It is twice as fast as appending zeros with fwrite.

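function InsertData(File, Data, Format, Pos)
% Write Data at the 0-based byte offset Pos of File, growing the file first
% if it is shorter than Pos. FileResize is from the File Exchange (FEX).
Info = dir(File);
if isempty(Info)
  error('*** %s: Cannot find file: %s', mfilename, File);
end
if Pos > Info.bytes
  FileResize(File, Pos);   % Extend the file to Pos bytes
end
fid = fopen(File, 'r+');   % 'r+' allows writing at arbitrary positions
if fid == -1
  error('*** %s: Cannot open file: %s', mfilename, File);
end
fseek(fid, Pos, 'bof');    % Move to the target offset before writing
fwrite(fid, Data, Format);
fclose(fid);
end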

If multiple workers write to the same file... Hm. I'm not sure what happens when two workers access the same file and one writes into a section that the other is currently expanding.

What about inventing your own "sparse" file format?

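function InsertData(File, Data, Format, Pos)
% Append one record to a custom "sparse" file: a uint64 header holding the
% target position, the number of dimensions and the size, followed by Data.
fid = fopen(File, 'a');
if fid == -1
  error('*** %s: Cannot open file: %s', mfilename, File);
end
Header = [Pos, ndims(Data), size(Data)];  % Pos is stored for post-processing
fwrite(fid, Header, 'uint64');
fwrite(fid, Data, Format);
fclose(fid);
end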

A function for reading this format, or for creating the full file from it in a post-processing step, is equally easy to write. The file is then read or spooled in blocks, but this should not be dramatically slower.
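A sketch of that post-processing step, assuming each record's header is written as [Pos, ndims(Data), size(Data)] in uint64 (that is, the target byte offset Pos is stored first) and that all records share one numeric element type. The name RebuildFullFile is hypothetical. Gaps between records are zero-filled, and a later record overwrites an earlier one where they overlap.

```matlab
function RebuildFullFile(SparseFile, FullFile, Format)
% Hypothetical post-processing: copy each appended record's payload to
% byte offset Pos of FullFile. Assumed record layout (uint64 header):
% [Pos, ndims, size(1..ndims)], then prod(size) elements of class Format.
in = fopen(SparseFile, 'r');
if in == -1
  error('RebuildFullFile: cannot open %s', SparseFile);
end
out = fopen(FullFile, 'w');
if out == -1
  fclose(in);
  error('RebuildFullFile: cannot open %s', FullFile);
end
while true
  head = fread(in, 2, 'uint64');          % [Pos; ndims]
  if numel(head) < 2
    break;                                % no more records
  end
  dims = fread(in, head(2), 'uint64')';   % size vector of this record
  Data = fread(in, prod(dims), ['*' Format]);
  fseek(out, 0, 'eof');
  gap = head(1) - ftell(out);
  if gap > 0
    fwrite(out, zeros(gap, 1, 'uint8'), 'uint8');  % zero-fill up to Pos
  else
    fseek(out, head(1), 'bof');           % Pos already lies inside the file
  end
  fwrite(out, Data, Format);
end
fclose(in);
fclose(out);
end
```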

1 Comment

Anthony Barone 2017-3-13

Thanks for suggesting FileResize. I will have to test whether it works correctly with multiple workers.

As far as making my own type of sparse file goes: there is a specific format for the file I am writing that allows it to be used in other applications (in particular .segy, which stores binary data along with a pre-defined list of header information). Making my own format would just require me to convert it into the desired format when the code finishes, so it wouldn't save me any time or trouble.

That said, even if I didn't have a target format, I'm not sure this would be a good idea. The data is written such that sequential blocks of information are likely to be loaded together when you load part of the data (they represent data from locations that are physically close to each other). This type of sparse format would help initially, but it seems like it would create significantly more work for accessing the data once a large amount has been added to the file, since reads would have to jump around the file instead of proceeding sequentially.

Answer 3

Walter Roberson 2017-3-10

https://ww2.mathworks.cn/matlabcentral/answers/329241-is-it-possible-to-create-a-sparse-binary-bin-file-on-disk#answer_258236

Unfortunately, no.

The POSIX-standard way to create a sparse file is to fseek() to a location past the end of the file and write data there; the file system is then permitted to leave "holes" in the regions where nothing has been written.

Unfortunately, in MATLAB, if you fseek() beyond the end of the file, the position "sticks" at the end of the file.

Therefore, in MATLAB, if you want to write to scattered locations, the general write procedure is:

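fopen() without the 't' (text) attribute (important!), with 'r+' access (not 'w' or 'w+', which discard the existing contents, and not 'a' or 'a+', which send every write to the end of the file regardless of fseek()); if the file does not exist yet, create it once with fopen(..., 'w') and fclose() it
fseek() to the end of the file
ftell() to determine the position of the end of the file, in bytes
if the current end of file is before the place you need to be, fwrite() zeros up to the place you need to be; otherwise fseek() to the place you need to be
fwrite() the data you want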
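The write steps above can be sketched as a MATLAB function. This is a minimal sketch under stated assumptions: the name WriteAt and its argument order are hypothetical, Pos is a 0-based byte offset, and the file is opened with 'r+' because in C-style file I/O, which MATLAB's fopen follows, append modes ('a', 'a+') force every write to the end of the file regardless of fseek(). A large gap is zero-filled with a single in-memory vector; for multi-gigabyte gaps you would write the zeros in chunks.

```matlab
function WriteAt(File, Pos, Data, Format)
% Hypothetical helper: write Data at 0-based byte offset Pos of File,
% zero-filling any gap between the current end of file and Pos.
if exist(File, 'file') ~= 2
  fid = fopen(File, 'w');              % create an empty file once
  fclose(fid);
end
fid = fopen(File, 'r+');               % binary (no 't'), read/write in place
if fid == -1
  error('WriteAt: cannot open %s', File);
end
fseek(fid, 0, 'eof');                  % go to the end of the file
Len = ftell(fid);                      % current file length in bytes
if Len < Pos
  fwrite(fid, zeros(Pos - Len, 1, 'uint8'), 'uint8');  % pad up to Pos
else
  fseek(fid, Pos, 'bof');              % Pos lies inside the file
end
fwrite(fid, Data, Format);
fclose(fid);
end
```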

The general read procedure is:

fopen() without the 't' (text) attribute (important!), with 'r' or 'r+' access (not 'w' or 'w+'); it is fine to keep the file open with 'r+' access for both reading and writing
fseek() to the position you need to be
ftell() to determine the position you ended up in, in bytes
if the current position is before the place you need to be, the data has not been written yet, so act appropriately
otherwise fread() the data, keeping in mind that you might encounter end of file if you were not consistent about the block size, or even if the end of file happened to be exactly at the place you want to start reading

You can modify this procedure to test that the entire block of data is available before you read it.
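One way to apply that modification is sketched below. Rather than seeking and then checking ftell(), it compares the requested block against the current file length before reading, since fseek() cannot move past the end of file in MATLAB. The helper name ReadAt is hypothetical, Pos is assumed to be a 0-based byte offset, and Format must be a numeric class name such as 'double' or 'int16', because the element size is derived with typecast.

```matlab
function [Data, ok] = ReadAt(File, Pos, Count, Format)
% Hypothetical helper: read Count elements of class Format starting at
% byte offset Pos. ok is false if the block is not (fully) written yet.
Data = [];
ok = false;
fid = fopen(File, 'r');
if fid == -1
  error('ReadAt: cannot open %s', File);
end
cleaner = onCleanup(@() fclose(fid));  % close the file on every exit path
fseek(fid, 0, 'eof');
Len = ftell(fid);                      % current file length in bytes
elemBytes = numel(typecast(cast(0, Format), 'uint8'));  % bytes per element
if Pos + Count * elemBytes > Len
  return;                              % block extends past end of file
end
fseek(fid, Pos, 'bof');
Data = fread(fid, Count, ['*' Format]);
ok = true;
end
```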

3 Comments

Walter Roberson 2017-3-13

For that kind of situation, perhaps memmapfile() would be suitable.
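A minimal memmapfile sketch, assuming a file of doubles whose final size is known in advance; the file name and sizes are illustrative. It pre-allocates by writing zeros (one of the OS allocation commands mentioned earlier in the thread could be used instead), then maps the file as writable and assigns blocks in arbitrary order.

```matlab
% Illustrative sizes; the file holds nValues doubles in total.
File = [tempname '.bin'];
nValues = 1e6;

% Pre-allocate: memmapfile cannot grow a file, so it must already exist.
fid = fopen(File, 'w');
fwrite(fid, zeros(nValues, 1), 'double');
fclose(fid);

% Map the file writable; m.Data is a column vector of doubles.
m = memmapfile(File, 'Format', 'double', 'Writable', true);
m.Data(1001:1100) = (1:100)';    % blocks can be written in any order
m.Data(1:100)     = (101:200)';
clear m                          % unmap; the values remain in the file
```

Because the mapping covers only the existing file, any index beyond nValues would raise an error rather than extend the file.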

Anthony Barone 2017-3-13

To clarify, when I referred to "accessing the files," the only access required is a single access to write the data. After the data is written I won't need to access it again until the code has finished running and all results have been written to disk.

This makes me think that using memmapfile would just add unnecessary entries to the virtual address space, and wouldn't give any actual benefit since I don't need to access the data again after it is written. Am I correct in thinking this, or am I misunderstanding something?
