Datastore readsize - unexpected behavior

Anders 2023-6-23
Commented: Rik 2023-6-23
I would expect the code below to read 40k lines from my datastore on each pass, but for reasons unknown to me the number of lines varies between passes.
ds = tabularTextDatastore(filename,'ReadSize',40000);
c = 0;
while hasdata(ds)
    c = c + 1;
    TT = read(ds);
    T = height(TT);
    if c==1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
This produces the output:
Done with 40000 ticks.
Done with 45096 ticks.
Done with 85096 ticks.
Done with 90190 ticks.
Done with 130190 ticks.
I would expect the increment to be 40k each time. The data is timestamped, and judging by the timestamps the data in the csv file "filename" does not seem to be corrupt in any way. That is, there are no missing timestamps when reading the data. Is there anything I can do to get 40k lines on each pass (except the last pass, of course)?
3 comments
Anders 2023-6-23
Sorry, I should have been more careful with the code example. Fixed that now. The actual data I'm using is proprietary so I'm not allowed to share it. Would it be helpful with an example file with the same structure?
Rik 2023-6-23
Anything that reproduces this problem is fine. You care about the actual data, we don't. For this problem, the only thing that matters is that the data produces the same results.


Answers (1)

Sanskar 2023-6-23
Hi Anders!
As I understand your question, you want to read 40k lines from your datastore on each pass, but after the first iteration of the loop you get a varying number of lines.
The 'ReadSize' property you are using tells read to return at most the number of rows given as its argument.
The 'hasdata' function, however, doesn't guarantee that exactly 'ReadSize' rows will be returned.
Instead of 'hasdata' you can use 'isDone()' to check whether all the data has been read from the datastore.
The modified code follows:
ds = tabularTextDatastore(filename, 'ReadSize', 40000);
c = 0;
while ~isDone(ds) % Use isDone instead of hasdata
    c = c + 1;
    data = read(ds); % Read the next block of rows
    T = height(data); % Number of rows returned by this read
    if c == 1
        t_total = T;
    else
        t_total = t_total + T;
    end
    disp("Done with " + t_total + " ticks.")
end
The following is the link to the documentation for isDone():
1 comment
Anders 2023-6-23
Edited: Anders 2023-6-23
Hi Sanskar,
I get an "Unrecognized function or variable 'isDone'" error. Is isDone part of some toolbox? When I type which isDone I get a 'not found' message.
If I understand the documentation correctly, isDone is used for System objects and cannot be used with datastores.
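Since 'ReadSize' is described as an upper bound ("at most" that many rows per read), one possible workaround is to buffer the rows returned by read and hand them on in blocks of a fixed size. The following is only a minimal sketch under assumptions: the small temporary CSV file, the variable names t and x, and blockSize = 300 are made up for illustration and stand in for the real file and the 40000-row block size.

```matlab
% Hypothetical stand-in for the proprietary CSV file: 1000 rows, two columns.
filename = [tempname '.csv'];
demo = table((1:1000)', rand(1000,1), 'VariableNames', {'t','x'});
writetable(demo, filename);

blockSize = 300;   % assumption for the demo; use 40000 for the real file
ds = tabularTextDatastore(filename, 'ReadSize', blockSize);

buffer = table();      % rows read from the datastore but not yet processed
t_total = 0;           % running count of processed rows
blockHeights = [];     % rows per pass, kept only to check the result

while hasdata(ds) || ~isempty(buffer)
    % Top up the buffer until it holds a full block or the data runs out.
    while height(buffer) < blockSize && hasdata(ds)
        buffer = [buffer; read(ds)]; %#ok<AGROW>
    end

    % Take one block from the front of the buffer (the last may be shorter).
    n = min(blockSize, height(buffer));
    TT = buffer(1:n, :);
    buffer(1:n, :) = [];

    t_total = t_total + height(TT);
    blockHeights(end+1) = height(TT); %#ok<AGROW>
    disp("Done with " + t_total + " ticks.")
end

delete(filename);   % remove the temporary demo file
```

With this pattern every pass except possibly the last processes exactly blockSize rows, regardless of how many rows each individual read call returns. The cost is the extra copying when rows are appended to and removed from the buffer.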

Release

R2022b
