Sometimes the datastore read() function reads a number of rows different from the 'ReadSize' parameter
Hello,
I have created a 2D array (144000x52) in CSV format.
When I call readmatrix('M.csv'), the variable appears in the workspace with the correct size (144000x52).
The issue is that when I create a datastore and try to read the rows of the matrix batch by batch, the number of rows read is sometimes not equal to the 'ReadSize' parameter.
For example:
ds=datastore('M.csv','ReadSize',2000);
for i=1:72
i
size(read(ds))
end
What I expect from the code above is that, since ReadSize is 2000 and the total number of rows is 144000, there are 144000/2000 = 72 batches to read, and the returned size should be 2000x52 for every value of i.
However, for i=19 and i=39, size(read(ds)) returns 186x52 and 193x52, respectively.
For all other values of i it returns 2000x52.
Answers (1)
Steven Lord
2022-7-4
If you look at the description of the ReadSize property of the tabularTextDatastore class, the sentence describing the behavior when the property is a positive integer value is "If ReadSize is a positive integer, then each call to read reads at most ReadSize rows." [I added the emphasis on "at most".] There is no guarantee that read will read exactly that many rows.
I believe you should call hasdata on the datastore to determine whether there is still data to be read from it, rather than assuming that a certain number of read calls will read the entire data set. This will also make your code more robust to changes in the size of your data. Suppose that instead of reading data collected (as an example) at 1 row per second for 40 hours:
hours(seconds(144000))
you instead collect 1 row per second for 60 hours. Or suppose whatever process you're measuring finishes more quickly than you expected and you only have 30 hours of data.
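As a minimal sketch of the hasdata approach, the loop below keeps reading until the datastore is exhausted and counts rows as it goes, instead of assuming a fixed number of read calls. The temporary file, its 500x4 size, and the variable names (csvFile, totalRows, numReads) are assumptions made for illustration and stand in for M.csv; ReadVariableNames is set to false on the assumption that the file has no header row.

```matlab
% Hypothetical stand-in for M.csv: a small numeric matrix in a temp file.
csvFile = [tempname '.csv'];
M = rand(500, 4);
writematrix(M, csvFile);

% ReadSize is an upper bound on the rows returned per call, not an exact size.
ds = datastore(csvFile, 'ReadSize', 200, 'ReadVariableNames', false);

totalRows = 0;
numReads = 0;
while hasdata(ds)
    T = read(ds);                          % table with at most ReadSize rows
    numReads = numReads + 1;
    totalRows = totalRows + height(T);
    fprintf('read %d returned %d rows\n', numReads, height(T));
end
```

With this pattern, totalRows should match the number of rows in the file regardless of how many rows each individual read returns, so a short batch like the ones seen at i=19 and i=39 does not cause rows to be missed.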