Merge columns in two tables depending on column names

Question

Vlatko Milic 2021-12-8

0
链接

此问题的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1606290-merge-columns-in-two-tables-depending-on-column-names

评论： Vlatko Milic 2022-1-11

Hi,

I have two tables consisting of six and 21 columns. I want to create three new tables based on the column names in the aforementioned tables. This I want to perform depending on the 10th and 12th position in the first table with six columns (see Mat_question-file). The 10th and 12th position contains values of 1/2, 3/4 or 5/6. These values (i.e. 1/2, 3/4 or 5/6) in the first table, I want to merge with the columns in table no.2 based on position 7. Position 7 in the second table (21 columns) contains the values of 2,3 or 6 ( see Mat_question2-file). I want to merge all the columns in the second table with the same value, i.e. 2,3 and 6, on the 7th position. Thereafter, I want to to merge the already merged columns from the second table with the first table based on if the 10th or 12th value in each column (table 1) contains a 1 or 2, a 3 or 4, or a 5 or 6. Hence, as an example, the columns in table 2 that contain a 3 on the 7th position should be connected with the columns in table 1 that contain a 3 on the 10th or 12th position. Based on this I should get three new tables with the conditions described above.

Quite a messy problem. Happy to elaborate.

The files are attached in the question.

Thanks in advance MatLab-friends

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

Alex Normanie 2022-1-6

Following - I need to learn how to use conditions for some merging operations too! thanks for posting

请先登录，再进行评论。

请先登录，再回答此问题。

Answer 1

Jon 2021-12-8

0
链接

此回答的直接链接

https://ww2.mathworks.cn/matlabcentral/answers/1606290-merge-columns-in-two-tables-depending-on-column-names#answer_850680

I think you can probably accomplish what you want using one or more of MATLAB's join, innerjoin, and outerjoin commands.

Please look at the documentation for those, try to code up as far as you can get and if you are still stuck post for more help.

31 个评论
显示 29更早的评论隐藏 29更早的评论

Vlatko Milic 2021-12-8

在 MATLAB Online 中打开

I was thinking of making a "for" and an "if" statement based on my conditions. However, I noticed that I cannot have the same variable names in different columns due to Matlab. I was thinking of a code similar to the one below (please see comment also in the code). Here I have problem of investigating a certain position, e.g. position 5 in the variable name, and not the entire variable name. As mentioned above, Matlab does not allow the same variable name for multiple columns.

Variable_Names = Test_table.Properties.VariableNames; 
columns = width(Test_table);
j = 1;
for i = 1:columns
    if contains(Variable_Names{i}, '2')   %I do not know how to specify that I want to extract this value at a certain position, but only for the entire variable name...
        new_table(j) = Test_table(:,i);
        new_table.Properties.VariableNames{j} = Variable_Names{i};
        j = j + 1;    
    end
end

I would like to make this procedure automatic since it will be used multiple times, hence the use of the for and if statement

Thank you

Jon 2021-12-9

编辑：Jon 2021-12-9

在 MATLAB Online 中打开

I think this will get you what you want or at least will be close. Important note, I had to edit your Mat_question2.xlsx file and put in a column name for the last column. Otherwise it didn't read the other column names correctly. I named it 'lastColumn' but you could call it anything as long as it doesn't have any numbers in the name.

% Read the data from input files into tables
T1 = readtable('Mat_question.xlsx','VariableNamingRule','preserve')
T2 = readtable('Mat_question2.xlsx','VariableNamingRule','preserve')
% Get the column names
names1 = T1.Properties.VariableNames;
names2 = T2.Properties.VariableNames;
% Loop through matching characters building new tables
matchVals = {'2','3','6'};
numTables = numel(matchVals); % the number of new tables to be created
% make new tables based upon specific characters found in column names
newTables = cell(numTables,1); % make cell array to hold new tables
for k = 1:numel(matchVals)
    % First make new table by selecting out columns from table 2
    % use cellfun to go through each element of names2 and make a logical
    % vector whose elements are true where the desired characters match
    idl2 = cellfun(@(x) strcmp(x(7),matchVals{k}),names2); 
    % Similarly, find columns in table 1 to include in new table
    idl1 = cellfun(@(x) strcmp(x(11),matchVals{k})...
        ||strcmp(x(13),matchVals{k}),names1); 
    % add the matching columns to the new table
    newTables{k} = [T1(:,idl1) T2(:,idl2)];
end

Jon 2021-12-10

编辑：Jon 2021-12-10

在 MATLAB Online 中打开

The code I provided runs without error for me using your two example files, 'Mat_question.xlsx','Mat_question2.xlsx'. So I'm assuming the error you are getting results when running on different files. If you could provide a self contained example (code and input files) that demonstrates the problem I could help you debug it further.

I'll also try to explain a little better how the code I gave works, and that might help you adapt what you are doing to your situation. Let's look at this line of code:

    idl1 = cellfun(@(x) strcmp(x(11),matchVals{k})...
        ||strcmp(x(13),matchVals{k}),names1); 

First there is the concept of a cellfun. A cellfun loops through every element of a cell array applying the supplied function to each element of the cell arrray. So in this case it applies the same function to every element of names1, the cell array of column names. The function that it applies to each element is defined by the anonymous function

    @(x) strcmp(x(11),matchVals{k})...
        ||strcmp(x(13),matchVals{k})

The argument for this function, x, is an element of names1, so a column name. The variables, x(11) and x(13) are the 11th and 13th characters in the column name. These are each compared to matchVals{k}, which for a given loop has the values '2', '3' or '6'. The function strcmp compares the character in question with matchVals{k} and returns a true if it matches. So the expression

strcmp(x(11),matchVals{k})||strcmp(x(13),matchVals{k}

will be true if either the 11th character of the column name or the 13th character of the column name matches the current value of matchVals{k}.

So let's say we are on the first loop, with k = 1, matchVals{1} will equal '2', and the expression will be true for each column name where the 11th or 13th character is a 2.

Since the cellfun applies this to each element in the names1 cellarray the result is a logical vector,idl1, of 0's and 1's (false and true) indicating for each column name whether the 11th or 13th character is a 2. So stopping in the debugger at the end of the first loop, with k = 1 we have

 K>> names1,matchVals,k, matchVals{k},idl1 
names1 =
  1×6 cell array
  Columns 1 through 4
    {'XXXXXXX M 1-2 V…'}    {'XXXXXXX M 1-2 V…'}    {'XXXXXXX M 3-4 V…'}    {'XXXXXXX M 3-4 V…'}
  Columns 5 through 6
    {'XXXXXXX M 5-6 V…'}    {'XXXXXXX M 5-6 V…'}
matchVals =
  1×3 cell array
    {'2'}    {'3'}    {'6'}
k =
     1
ans =
    '2'
idl1 =
  1×6 logical array
   1   1   0   0   0   0

The main loop, which increments k then repeats this for the next value to be tested matchVals{2}='3' etc.

I hope this helps you understand better how the code works.

Jon 2021-12-13

在 MATLAB Online 中打开

Hi Vlatko,

Glad to hear you were able to get your program working.

Actually it is really better not to name your tables for example Table2, Table3 and Table6 encoding the information about what is in the tables with the table name. You will run into the same kind of issues you did when the column names for your varaible held that kind of information.

Here is a good reference that explains in detail why this practice is discouraged https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval

So in the above code, leaving the individual tables in the cellarray of tables newTables, you could select a particular table, corresponding to the characters '2', '3', or '6' using indexing. For example to find the table corresponding to '2' you could use

T = newTables(strcmp('2',matchVals))

If the matching criteria is always numeric, you might want to make a numeric array to hold your matching criteria, so

matchNums = [2,3,6]
% to find the table that had data corresponding to 3,
T = newTables(3 == matchNums)

That being said, if having read all of the above, you really think it is better for your purposes to dynamically name the variables, MATLAB will let you do it using the eval function. For example within the main loop instead of using

    % add the matching columns to the new table
    newTables{k} = [T1(:,idl1) T2(:,idl2)];

you could use

eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])

which would make tables named Table_2, Table_3, Table_6

I really would not recommend this approach though.

Jon 2021-12-14

编辑：Jon 2021-12-14

在 MATLAB Online 中打开

Note, in the above example all of the tables Table_2, Table_3 and Table_6 are the same. So this isn't a very realisitic example. I just wanted to show that the dynamic assignment worked.

Here is another way to do it. Still frowned upon, but maybe a little nicer than using eval. Instead use a structure to hold the tables with dynamically named fields. So the idea is create for example a structure called Tables and give it fields t_2, t_3, t_6 etc. You could then access for example a particular table as Tables.t_2

Here's an example showing this. To make it a little more realistice here I made all of the tables different, by randomly selecting columns that are included in each table. Not really important, mostly just look at the last few of lines of code to see the dynamically assigned structure

% make up some data for the example
T1 = array2table(rand(4,4))
T2 = array2table(rand(4,5))
T1.Properties.VariableNames = {'a','b','c','d'}
T2.Properties.VariableNames = {'v','w','x','y','z'}
idl1 = logical(randi(1,[3,4]));
idl2 = logical(randi(1,[3,5]));
matchVals = {'2','3','6'};
% make structure with dynamically named fields to hold different tables
% so, Tables.t_2, Tables.t_3, etc.
Tables = struct; % preallocate structure to hold tables
for k = 1:3
    fieldName = ['t_',matchVals{k}]; % generate field name of form t_2, t_3 etc.
    Tables.(fieldName) = [T1(:,idl1(k,:)) T2(:,idl2(k,:))]
end

Vlatko Milic 2022-1-6

编辑：Vlatko Milic 2022-1-6

在 MATLAB Online 中打开

Of course. I had a number of tables titled with certain integers. I wanted to join these tables based on similarities in the table names, such as if the tables have a "2" in the table name. This was performed successfully using the code:

T1 = array2table(rand(4,4))
T2 = array2table(rand(4,5))
T1.Properties.VariableNames = {'a','b','c','d'}
T2.Properties.VariableNames = {'v','w','x','y','z'}
idl1 = logical(randi(1,[3,4]));
idl2 = logical(randi(1,[3,5]));
matchVals = {'2','3','6'};
for k = 1:3
    eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end

As an output from the code above I get three tables titled "Table 2", "Table 3" and "Table 6". Until this point everything is working superb with your guidance.

What I want to do now is to perform simple mathematical operations within each table, basically by multiplying the values from two columns and creating a new column in each table . As mentioned above, I do not think this is possible to perform within the "for loop" in the code above. Therefore, I am thinking about creating a new function with "Table 2, Table 3 and Table 6" as input (if possible, directly connected to matchVals of 2,3 and 6). Moreover, the output I want is basically new versions of "Table 2, Table 3 and Table 6" which include a new column. However, the problem for me lies in creating a new function with the old "Table 2, Table 3 and Table 6" as input. I tried this segment of code (does not work unfortunately).

V = {Table_2,Table_3,Table_6}
%%
for k = 1:numel(V)
     [R_W_L(V,:)] = my_calc(V{k})
    
end

I get the error "A table row subscript must be a numeric array containing real positive integers, a logical array, a character vector, a string array, or a cell array of character vectors." When I look at the columns of the tables I see that they are doubles, e.g. 1000*1 double. I want to point out that each column has a variable name based on the code segment described at the top of this post (i.e. use of 2,3 or 6 in the column name). Do you think it is connected to this? Please let me know if I further explanation is needed.

Lastly, I am appending a screenshot of how Table 2 looks like in Matlab. It is the first and second columns I want to multiply and create a new column. When looking at the first column (1-2) it is possible to see that it consists of doubles as previously mentioned.

Jon 2022-1-6

在 MATLAB Online 中打开

If I understand you correctly, I think this illustates one way to do what you want. Note how incredibly awkward it is working with the data encoded in the table names and having to use the MATLAB eval. This is why it would really be better to modify your general approach on this. But for now I think this will work

% Make up some example data
T1 = array2table(rand(3,4));
T2 = array2table(rand(3,5));
T1.Properties.VariableNames = {'a','b','c','d'};
T2.Properties.VariableNames = {'v','w','x','y','z'};
matchVals = {'2','3','6'};
numTables = numel(matchVals);
% generate table names for future reference
tableNames = cell(numTables,1);
for k = 1:numTables
    tableNames{k} = ['Table_',matchVals{k}];
end
for k = 1:numTables
    % Make random column selection
    % Make sure at least two columns are selected
    idl1 = false(1,4);
    idl2 = false(1,5);
    while sum(idl1) < 2 || sum(idl2) < 2
        idl1 = logical(randi([0,1],[1,4]));
        idl2 = logical(randi([0,1],[1,5]));
    end
    eval([tableNames{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
% 
% Add new column to each table.
% New column is product of first two
% columns of original table
for k = 1:numTables
    eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
    % Compute product.
    % Note: use curly braces to extract numerical values from
    % the table, round parentheses would make a one column table, which
    % isn't what you want.
    % If you knew the column names it would be simpler.
    % For example if you knew the column names
    % were b and d you could just use T.myNewColumn = T.b .* T.d
    T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
    % put back into workspace with appropriate table name
    eval([tableNames{k},'= T']);
end

Vlatko Milic 2022-1-10

在 MATLAB Online 中打开

Of course. As performed by the code above, a new column is created in the original tables. See code:

% Make up some example data
T1 = array2table(rand(3,4));
T2 = array2table(rand(3,5));
T1.Properties.VariableNames = {'a','b','c','d'};
T2.Properties.VariableNames = {'v','w','x','y','z'};
matchVals = {'2','3','6'};
numTables = numel(matchVals);
% generate table names for future reference
tableNames = cell(numTables,1);
for k = 1:numTables
    tableNames{k} = ['Table_',matchVals{k}];
end
for k = 1:numTables
    % Make random column selection
    % Make sure at least two columns are selected
    idl1 = false(1,4);
    idl2 = false(1,5);
    while sum(idl1) < 2 || sum(idl2) < 2
        idl1 = logical(randi([0,1],[1,4]));
        idl2 = logical(randi([0,1],[1,5]));
    end
    eval([tableNames{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
% 
% Add new column to each table.
% New column is product of first two
% columns of original table
for k = 1:numTables
    eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
    % Compute product.
    % Note: use curly braces to extract numerical values from
    % the table, round parentheses would make a one column table, which
    % isn't what you want.
    % If you knew the column names it would be simpler.
    % For example if you knew the column names
    % were b and d you could just use T.myNewColumn = T.b .* T.d
    T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
    % put back into workspace with appropriate table name
    eval([tableNames{k},'= T']);
end

Let's say that I wanted to make a new table which only includes the new column or columns, and not the entire data which is stored in the original tables. Can I include this in the last for loop of the code, i.e.:

for k = 1:numTables
    eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
    % Compute product.
    % Note: use curly braces to extract numerical values from
    % the table, round parentheses would make a one column table, which
    % isn't what you want.
    % If you knew the column names it would be simpler.
    % For example if you knew the column names
    % were b and d you could just use T.myNewColumn = T.b .* T.d
    T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
    % put back into workspace with appropriate table name
    eval([tableNames{k},'= T']);
end

Or do I have to make a new function for this after the loop? I hope this made the question more clear.

Stephen23 2022-1-11

"However, the code cannot recognize my created columns when I do like this. I get the error ..."

Because your code design is liable to bugs and makes debugging much harder.

You are going to spend a lot of time fighting the otherwise trivial task of just trying to access your data.

Perhaps it is slowly time to learn the EVAL-sized lesson here: read the links we gave you and understand some of the reasons why the makers of MATLAB and all experienced MATLAB users advise against your approach to writing code.

Vlatko Milic 2022-1-11

I will do so Stephen, Jon appended some links before also. Thanks

请先登录，再进行评论。

Merge columns in two tables depending on column names

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

采纳的回答

31 个评论
显示 29更早的评论隐藏 29更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

Merge columns in two tables depending on column names

1 个评论 显示 -1更早的评论隐藏 -1更早的评论

采纳的回答

31 个评论 显示 29更早的评论隐藏 29更早的评论

更多回答（0 个）

另请参阅

类别

标签

产品

版本

Community Treasure Hunt

1 个评论
显示 -1更早的评论隐藏 -1更早的评论

31 个评论
显示 29更早的评论隐藏 29更早的评论