What is the difference between readall and read+hasdata?
I found that readall and read+hasdata seem to do exactly the same thing. Since read+hasdata needs a loop, is it less efficient? Should read+hasdata be avoided in all cases? And why does MATLAB provide the hasdata function at all?
In what scenario is it more meaningful to use read+hasdata?
ds = datastore('mapredout.mat');
while hasdata(ds)
T = read(ds);
end
ds = datastore('mapredout.mat');
readall(ds)
Answers (1)
Mrutyunjaya Hiremath
2023-7-21
The functions `readall` and `read` with `hasdata` are used for reading data from datastores. These functions are not exactly the same, and they serve different purposes.
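Using `read` with `hasdata` can be more meaningful and efficient in scenarios where memory efficiency is important, where data is processed in a streaming fashion, or where the work is split up for parallel processing.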
`readall` is suitable for smaller datasets that can fit into memory, while `read` with `hasdata` is more appropriate for larger datasets or scenarios where memory efficiency and streaming processing are important.
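To make the contrast concrete, here is a minimal sketch of both approaches computing the same mean. It assumes a small temporary CSV file and a `tabularTextDatastore` instead of the `mapredout.mat` file from the question; the file, the variable name `x`, and the `ReadSize` of 100 are hypothetical choices for illustration only.

```matlab
% Write a small CSV file to a temporary location (hypothetical sample data).
csvFile = [tempname '.csv'];
writetable(table((1:1000)', 'VariableNames', {'x'}), csvFile);

% Streaming: read at most 100 rows per call and keep only running totals,
% so only one chunk is held in memory at a time.
ds = tabularTextDatastore(csvFile);
ds.ReadSize = 100;
total = 0;
count = 0;
while hasdata(ds)
    chunk = read(ds);                  % table with at most ReadSize rows
    total = total + sum(chunk.x);
    count = count + height(chunk);
end
streamMean = total / count;

% All at once: the whole table is held in memory.
reset(ds);
allData = readall(ds);
fullMean = mean(allData.x);

delete(csvFile);                       % clean up the temporary file
```

Both approaches should give the same mean here. The difference is that the loop never needs the full table in memory, which is what matters once the data no longer fits.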
10 Comments
Walter Roberson
2023-7-23
Your file name mapredout.mat hints that the .mat file might be the output of a mapreduce() call. If so, then it is a Key-Value Datastore (https://www.mathworks.com/help/matlab/ref/matlab.io.datastore.keyvaluedatastore.html). Key-Value datastores default to
ReadSize — Maximum number of key-value pairs to read
1 (default) | positive integer
Maximum number of key-value pairs to read in a call to the read or preview
functions, specified as a positive integer.
So any one read() call on the datastore is not going to read all of the data.
The particular datastore you are using might have been configured for a larger ReadSize, but the ReadSize cannot be set to be infinite. In general, when you read() from a datastore, even one configured with only a single .mat file, the read() might not read in all of the data if the datastore is large enough. In contrast, readall() will always read all of the data, provided that it does not run out of memory.
For testing purposes, I suggest you experiment with
while hasdata(ds)
    T = read(ds)
end
T
and see whether the read() is being called more than once, and if so whether the T at the end has all of the data that was read in. Depending on the kind of datastore and how big it is, sometimes a single read() is enough to read in all of the data; other datastores might need to read the data in chunks when you read(), and other datastores might only read one file at a time if the datastore has multiple files.
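A variant of that experiment, sketched under assumptions: it uses a temporary CSV file and a `tabularTextDatastore` with a deliberately small `ReadSize`, not the Key-Value datastore from the question, and the file and variable names are hypothetical. It counts the read() calls and compares the rows left in T after the loop with the rows returned by readall().

```matlab
% Hypothetical sample data: 250 rows in a temporary CSV file.
csvFile = [tempname '.csv'];
writetable(table((1:250)', 'VariableNames', {'x'}), csvFile);

ds = tabularTextDatastore(csvFile);
ds.ReadSize = 100;                     % upper bound on rows per read() call

numReads = 0;
while hasdata(ds)
    T = read(ds);                      % T is overwritten on every pass
    numReads = numReads + 1;
end
lastChunkRows = height(T);             % rows from the final read() only

reset(ds);
allRows = height(readall(ds));         % rows in the whole datastore

fprintf('read() calls: %d, rows in last T: %d, rows via readall: %d\n', ...
    numReads, lastChunkRows, allRows);
delete(csvFile);                       % clean up the temporary file
```

If more than one read() call is reported, T after the loop holds only the last chunk, so the loop has to accumulate or process each chunk itself to cover everything readall() returns.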