Create tables based on partial match in variable names in other table

19 次查看(过去 30 天)
Hi Matlab-gurus,
I have a question on creating new tables based on partial matches in variable names from an original table. I'm trying with the strcmp-function and and eval-function. I know this is not optimal but it's my best shot for now. See below for my attached code. matchVals are the partial matches in the variable names that I consider when creating the new tables. The matchvals are on position 5 to 8 (see strcat).
Thanks in advance.
names5 = Table_A.Properties.VariableNames;%1
names6 = Table_B.Properties.VariableNames;%1
matchVals = {'1234', '5678','9101','1123'}; %part of the variable names that match => four new tables
numTables = numel(matchVals);
%%
tableNames_ = cell(numTables,1);
for k = 1:numel(matchVals)
idl5 = cellfun(@(x) strcmp(x(5),matchVals{k})... %assessing position 5 to 10 for correct variable names
&&strcmp(x(6),matchVals{k})...
&&strcmp(x(7),matchVals{k})...
&&strcmp(x(8),matchVals{k}),names5);
idl6 = cellfun(@(x) strcmp(x(5),matchVals{k})...%assessing position 5 to 10 for correct variable names
&&strcmp(x(6),matchVals{k})...
&&strcmp(x(7),matchVals{k})...
&&strcmp(x(8),matchVals{k}),names6);
%
eval(['Summary',matchVals{k},' = [Table_A(:,idl5) Table_B(:,idl6)]']);
tableNames_EC2217{k} = ['Summary',matchVals{k}]; %trying to create new table
end

采纳的回答

Voss
Voss 2022-5-6
It's not clear why you want to split a single table into 4 tables, but here's one way:
T = readtable('m.xlsx');
head(T)
ans = 8×8 table
absd1234xs absd5678xs aood9101xs aood1123xs adad1234xs aiid5678xs adsd9101xs addd1123xs __________ __________ __________ __________ __________ __________ __________ __________ 1 2 3 4 5 6 8 1 3 2 2 312 1 2 1 5 4 2 1 2 1 1 2 6 5.6667 2 0 104 -1.6667 -2 -2.3333 9 7.1667 2 -1 103 -3.6667 -4.5 -5.3333 11.5 8.6667 2 -2 102 -5.6667 -7 -8.3333 14 10.167 2 -3 101 -7.6667 -9.5 -11.333 16.5 11.667 2 -4 100 -9.6667 -12 -14.333 19
matchVals = {'1234', '5678','9101','1123'};
n_match = numel(matchVals);
T_new = cell(1,n_match);
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
T_new{k} = T(:,idx);
end
T_new % cell array of tables
T_new = 1×4 cell array
{10000×2 table} {10000×2 table} {10000×2 table} {10000×2 table}
head(T_new{1})
ans = 8×2 table
absd1234xs adad1234xs __________ __________ 1 5 3 1 4 1 5.6667 -1.6667 7.1667 -3.6667 8.6667 -5.6667 10.167 -7.6667 11.667 -9.6667
head(T_new{2})
ans = 8×2 table
absd5678xs aiid5678xs __________ __________ 2 6 2 2 2 1 2 -2 2 -4.5 2 -7 2 -9.5 2 -12
It would be better to index into the existing table:
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
% do what you need to do with T(:,idx)
end
  11 个评论
Voss
Voss 2022-5-15
T.Properties is a TableProperties object, essentially a scalar struct, so it doesn't make sense to try to index its columns with T.Properties(:,idx).
T = readtable('m.xlsx');
T.Properties
ans =
TableProperties with properties: Description: '' UserData: [] DimensionNames: {'Row' 'Variables'} VariableNames: {'absd1234xs' 'absd5678xs' 'aood9101xs' 'aood1123xs' 'adad1234xs' 'aiid5678xs' 'adsd9101xs' 'addd1123xs'} VariableDescriptions: {} VariableUnits: {} VariableContinuity: [] RowNames: {} CustomProperties: No custom properties are set. Use addprop and rmprop to modify CustomProperties.
You can sum all the columns of T with variable names matching each element of matchVals, by doing sum(T{:,idx},2) (here I'm storing the sums in the struct array and also in new columns of T - two different options):
matchVals = {'1234', '5678','9101','1123'};
n_match = numel(matchVals);
S = struct('name',matchVals,'data',cell(1,n_match));
for k = 1:n_match
idx = contains(T.Properties.VariableNames,matchVals{k});
% sum all columns of T matching matchVals{k},
% store in data field of S(k):
S(k).data = sum(T{:,idx},2);
% --- or ---
% sum all columns of T matching matchVals{k},
% store in a new column of T called "sum_1234", etc.:
T.(['sum_' matchVals{k}]) = sum(T{:,idx},2);
end
head(T) % now T has new columns, sum_1234, etc.
ans = 8×12 table
absd1234xs absd5678xs aood9101xs aood1123xs adad1234xs aiid5678xs adsd9101xs addd1123xs sum_1234 sum_5678 sum_9101 sum_1123 __________ __________ __________ __________ __________ __________ __________ __________ ________ ________ ________ ________ 1 2 3 4 5 6 8 1 6 8 11 5 3 2 2 312 1 2 1 5 4 4 3 317 4 2 1 2 1 1 2 6 5 3 3 8 5.6667 2 0 104 -1.6667 -2 -2.3333 9 4 0 -2.3333 113 7.1667 2 -1 103 -3.6667 -4.5 -5.3333 11.5 3.5 -2.5 -6.3333 114.5 8.6667 2 -2 102 -5.6667 -7 -8.3333 14 3 -5 -10.333 116 10.167 2 -3 101 -7.6667 -9.5 -11.333 16.5 2.5 -7.5 -14.333 117.5 11.667 2 -4 100 -9.6667 -12 -14.333 19 2 -10 -18.333 119
S(1).data % now each S(k).data is a single column
ans = 10000×1
6.0000 4.0000 5.0000 4.0000 3.5000 3.0000 2.5000 2.0000 1.5000 1.0000
Vlatko Milic
Vlatko Milic 2022-5-15
thank you. This is preferrable compared to working with tons of tables. I even managed to include the matching variables as headings in the corresponding plots. Not the easiest to do include legends for the columns in each plot haha, but I will manage

请先登录,再进行评论。

更多回答(1 个)

Sean de Wolski
Sean de Wolski 2022-5-6
Convert your cellstrs to strings
string(t.Properties.VariableNames)
Then you can use any of the easy string matching functions like matches startsWith or any of the patterns.
  2 个评论
Vlatko Milic
Vlatko Milic 2022-5-6
编辑:Vlatko Milic 2022-5-6
I changed the cellstr to strings but cannot manage to identify the patters. Do you maybe know how i could do this within my loop attached above?
My four table outputs are of type: 10 000×0 empty table (i.e. the rows are empty which they should not be)
Vlatko Milic
Vlatko Milic 2022-5-6
I attached the file I'm working with. I want to create four new columns based on the coloring of the variables (and the matches are based on the strings in bold)

请先登录,再进行评论。

类别

Help CenterFile Exchange 中查找有关 Tables 的更多信息

产品


版本

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by