In memory calculations with tall arrays from different databases

Suppose I have two datastores (each holding a table of double values):
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
I then create my tall arrays as
X = tall(ds_1);
Y = tall(ds_2);
Now suppose I have trained a model, mdl, with fitlm, and I want to use it to predict from X and Y as
Answer = predict(mdl, [X,Y]);
This is the error I receive:
Error using tall/horzcat (line 23)
Incompatible tall array arguments. The tall arrays must be based on the
same datastore.
How can I solve this problem without gathering the data, using only the out-of-memory capabilities of tall arrays?

Answers (1)

Guillaume 2019-7-16
Edited: Guillaume 2019-7-16
If you have R2019a or later, you can combine your two datastores.
ds_1 = tabularTextDatastore('file_1.txt');
ds_2 = tabularTextDatastore('file_2.txt');
ds_combined = combine(ds_1, ds_2);
Answer = predict(mdl, tall(ds_combined));
In previous versions, I'm not sure that there's a way to do it other than creating your own custom datastore that would keep track of both datastores (essentially recreating the R2019a CombinedDatastore).
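As a rough, self-contained sketch of the combine approach (assuming R2019a or later and the Statistics and Machine Learning Toolbox for fitlm): the demo file names, the variable names x, y and z, and the toy model below are all made up for illustration and stand in for file_1.txt, file_2.txt and the real mdl. The mapreducer(0) call is only there to keep tall evaluation in the local MATLAB session rather than a parallel pool.

```matlab
% Hypothetical demo data standing in for file_1.txt and file_2.txt.
rng(0);
x = rand(20, 1);
y = rand(20, 1);
writetable(table(x), 'demo_1.txt');   % one column with header 'x'
writetable(table(y), 'demo_2.txt');   % one column with header 'y'

% Hypothetical model trained in memory on the same two predictors.
z = 2*x - y + 0.01*randn(20, 1);
mdl = fitlm(table(x, y, z), 'z ~ x + y');

% Evaluate tall arrays in the local session (no parallel pool).
mapreducer(0);

% Combine the datastores so each read returns the columns side by side.
ds_1 = tabularTextDatastore('demo_1.txt');
ds_2 = tabularTextDatastore('demo_2.txt');
ds_combined = combine(ds_1, ds_2);

% One tall table backed by a single datastore, so no horzcat is needed.
XY = tall(ds_combined);
Answer = predict(mdl, XY);            % still tall, evaluation is deferred

% Bring only the first few predictions into memory to inspect them.
firstPredictions = gather(head(Answer));
disp(firstPredictions);
```

Note that predict matches table variables to the model's predictor names, so the names coming out of the combined datastore need to agree with the ones used when training.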
6 Comments
TOSA2016 2019-7-17
I also noticed that there might be an easier way. What if I create the datastore as
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
I still cannot build a single tall array from X1 and X2 with
X_NEW = tall([X1, X2]);
as it gives me the following error:
Error using tall/horzcat (line 21)
Duplicate table variable name: 'Var1'.
Guillaume 2019-7-18
"The tall array generation from combined datastores is not compatible with parallel computation"
I would recommend raising a service request with MathWorks then, since they should make it possible to create a combined datastore that has exactly the same properties as the source datastores (if they are compatible). I don't have the Parallel Computing Toolbox, so I'm not sure what these properties are. Since you now have access to the source code of CombinedDatastore (in fullfile(matlabroot, 'toolbox\matlab\datastoreio\+matlab\+io\+datastore')), you could also copy it and make the required modifications.
I'm not sure you will be able to concatenate two tall arrays from the same datastore: by necessity they will have the same variable names, so horizontal concatenation will indeed create duplicate variable names, which is not allowed. The only way this could work is if you are allowed to modify the variable names of the tall array. See if this works:
DS = tabularTextDatastore({'file_1.txt', 'file_2.txt'});
X1 = tall(datastore(DS.Files{1}));
X2 = tall(datastore(DS.Files{2}));
X2.Properties.VariableNames = compose('X2Var%d', 1:width(X2));
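If renaming on the tall array turns out not to be allowed, one alternative worth trying is to rename at the datastore level, before any tall array exists, and then combine. The sketch below is only an illustration under assumptions: the headerless demo files are invented, the 'X2' prefix is arbitrary, and it relies on the VariableNames property of tabularTextDatastore being settable and on combine from R2019a or later.

```matlab
% Hypothetical headerless demo files, so both datastores default to 'Var1'.
writetable(table((1:5)'), 'demo_a.txt', 'WriteVariableNames', false);
writetable(table((6:10)'), 'demo_b.txt', 'WriteVariableNames', false);

ds_a = tabularTextDatastore('demo_a.txt', 'ReadVariableNames', false);
ds_b = tabularTextDatastore('demo_b.txt', 'ReadVariableNames', false);

% Give the second datastore its own variable names up front.
ds_b.VariableNames = strcat('X2', ds_b.VariableNames);

% Combine, then read one chunk to check the resulting column names.
ds_ab = combine(ds_a, ds_b);
firstChunk = read(ds_ab);
disp(firstChunk.Properties.VariableNames);
```

Because the names already differ when the two tables are placed side by side, the duplicate 'Var1' clash should not arise, and tall(ds_ab) can then be passed on as a single tall table.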
