VertCat unique rows of multiple tables.

32 次查看(过去 30 天)
Hi All,
I've got myself in a bit of a muddle and am looking for some help.
I have a number of tables containing the same variables but different rows of data. Some of the data is duplicated between tables.
The first column contains filenames stored in a categorical array.
I would like to vertically concatenate the tables after filtering out duplicate rows of data defined by duplicate filenames.
Is there a way to check for duplicates within two separate category arrays, then use that as a logical mask to select the rows to append.
%% Example Table structure.
tAll = table( 'Size' ,[0,5],...
'VariableNames' ,{'FileName' ,'SampleNumber' ,'DataType' ,'Value' ,'cellData'},...
'VariableTypes' ,{'categorical' ,'uint8' ,'categorical' ,'double' ,'cell'});
So what I want to do is something like...
tAll = [ t1 ; t2(t2.FileName ~= categories(t1.FileName))] % this doesn't work because t1.FileName and t2.FileName are arrays
I understand Joining tables wouldn't be appropriate, as there is no relationship as such, they are all the same variables.
I have also tried...
tAll = union( t1 ; t2 ); % this give an error because one variable contains random numbers of cells.
% Error using tabular/union (line 42)
% Unable to group rows using unique values of the table variable 'cellData'. UNIQUE returned an error.
% Caused by:
% Error using matlab.internal.math.uniqueCellstrHelper
% Cell array input must be a cell array of character vectors.
TIA, MC.

采纳的回答

Bruno Luong
Bruno Luong 2022-4-25
% dummy test data
Filename1=["a"; "b"; "c"];
Data1=["a1"; "b1"; "c1"];
T1=table(Filename1,Data1,'VariableNames',{'Filename','Data'})
T1 = 3×2 table
Filename Data ________ ____ "a" "a1" "b" "b1" "c" "c1"
Filename2=["a"; "d"];
Data2=["a2"; "d2"];
T2=table(Filename2,Data2,'VariableNames',{'Filename','Data'})
T2 = 2×2 table
Filename Data ________ ____ "a" "a2" "d" "d2"
[~,i]=setdiff(T2.Filename, T1.Filename);
[T1; T2(i,:)]
ans = 4×2 table
Filename Data ________ ____ "a" "a1" "b" "b1" "c" "c1" "d" "d2"
  2 个评论
MC
MC 2022-4-25
编辑:MC 2022-4-25
Thanks, this put me on the right track!
I had to add some steps because setdiff() only returns the first unique value and my data contains many rows with the same filename.
% find new unique filenames to copy.
t2_uniqueFileNames = setdiff( categories(t2.Filename)), categories(t1.Filename));
% copy all rows with each unique filename.
tAll = t1;
for i = 1:length(t2_uniqueFileNames)
tAll = [tAll; t2(t2.Filename == t2_uniqueFileNames(i),:)]
end
I imagine there is a non-loop version but this is good for now.
Thanks again, MC.
Bruno Luong
Bruno Luong 2022-4-25
You migh try
tAll = [T1; T2(~ismember(T2.Filename, T1.Filename),:)];

请先登录,再进行评论。

更多回答(1 个)

Matt J
Matt J 2022-4-25
Why not as follows?
tAll = unique( [t1 ; t2] )
  1 个评论
MC
MC 2022-4-25
Thanks.
I tried this on my actual data but it gives the same error as the union() example.
Error using tabular/unique (line 39)
Unable to group rows using unique values of the table variable 'cellData'. UNIQUE returned an error.
Caused by:
Error using matlab.internal.math.uniqueCellstrHelper
Cell array input must be a cell array of character vectors.

请先登录,再进行评论。

产品


版本

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by