Merge columns in two tables depending on column names
6 次查看(过去 30 天)
显示 更早的评论
Hi,
I have two tables consisting of six and 21 columns. I want to create three new tables based on the column names in the aforementioned tables. This I want to perform depending on the 10th and 12th position in the first table with six columns (see Mat_question-file). The 10th and 12th position contains values of 1/2, 3/4 or 5/6. These values (i.e. 1/2, 3/4 or 5/6) in the first table, I want to merge with the columns in table no.2 based on position 7. Position 7 in the second table (21 columns) contains the values of 2,3 or 6 ( see Mat_question2-file). I want to merge all the columns in the second table with the same value, i.e. 2,3 and 6, on the 7th position. Thereafter, I want to to merge the already merged columns from the second table with the first table based on if the 10th or 12th value in each column (table 1) contains a 1 or 2, a 3 or 4, or a 5 or 6. Hence, as an example, the columns in table 2 that contain a 3 on the 7th position should be connected with the columns in table 1 that contain a 3 on the 10th or 12th position. Based on this I should get three new tables with the conditions described above.
Quite a messy problem. Happy to elaborate.
The files are attached in the question.
Thanks in advance MatLab-friends
1 个评论
Alex Normanie
2022-1-6
Following - I need to learn how to use conditions for some merging operations too! thanks for posting
采纳的回答
Jon
2021-12-8
I think you can probably accomplish what you want using one or more of MATLAB's join, innerjoin, and outerjoin commands.
Please look at the documentation for those, try to code up as far as you can get and if you are still stuck post for more help.
31 个评论
Vlatko Milic
2021-12-8
I was thinking of making a "for" and an "if" statement based on my conditions. However, I noticed that I cannot have the same variable names in different columns due to Matlab. I was thinking of a code similar to the one below (please see comment also in the code). Here I have problem of investigating a certain position, e.g. position 5 in the variable name, and not the entire variable name. As mentioned above, Matlab does not allow the same variable name for multiple columns.
Variable_Names = Test_table.Properties.VariableNames;
columns = width(Test_table);
j = 1;
for i = 1:columns
if contains(Variable_Names{i}, '2') %I do not know how to specify that I want to extract this value at a certain position, but only for the entire variable name...
new_table(j) = Test_table(:,i);
new_table.Properties.VariableNames{j} = Variable_Names{i};
j = j + 1;
end
end
I would like to make this procedure automatic since it will be used multiple times, hence the use of the for and if statement
Thank you
Jon
2021-12-9
编辑:Jon
2021-12-9
I think this will get you what you want or at least will be close. Important note, I had to edit your Mat_question2.xlsx file and put in a column name for the last column. Otherwise it didn't read the other column names correctly. I named it 'lastColumn' but you could call it anything as long as it doesn't have any numbers in the name.
% Read the data from input files into tables
T1 = readtable('Mat_question.xlsx','VariableNamingRule','preserve')
T2 = readtable('Mat_question2.xlsx','VariableNamingRule','preserve')
% Get the column names
names1 = T1.Properties.VariableNames;
names2 = T2.Properties.VariableNames;
% Loop through matching characters building new tables
matchVals = {'2','3','6'};
numTables = numel(matchVals); % the number of new tables to be created
% make new tables based upon specific characters found in column names
newTables = cell(numTables,1); % make cell array to hold new tables
for k = 1:numel(matchVals)
% First make new table by selecting out columns from table 2
% use cellfun to go through each element of names2 and make a logical
% vector whose elements are true where the desired characters match
idl2 = cellfun(@(x) strcmp(x(7),matchVals{k}),names2);
% Similarly, find columns in table 1 to include in new table
idl1 = cellfun(@(x) strcmp(x(11),matchVals{k})...
||strcmp(x(13),matchVals{k}),names1);
% add the matching columns to the new table
newTables{k} = [T1(:,idl1) T2(:,idl2)];
end
Jon
2021-12-9
Oh one other thing. In my original response I suggested using MATLAB's join, innerjoin and outerjoin. At that time I didn't really understand your question. Those functions are very useful for creating new tables based on criteria involving the actual contents of the table column. Here you are just basing your construction upon the names of the columns. I would say in general, in the future, if you have any control over how the data is stored it is better not to encode too much in the column names. As you can see from this example that is quite awkward. It is better to have very simple column names and keep the data in the contents of the columns. So "stack" the data. This is shown in the attached file, Mat_question2_stacked, where I have stacked your Mat_question2.xlsx file. You can now easily do operations by testing on the column values, rather than the names of the columns.
Vlatko Milic
2021-12-10
Thank you for the answer.
Unfortunately, I get the error:
Index exceeds the number of array elements (5).
Error in Main_code (line 74)
idl1 = cellfun(@(x) strcmp(x(11),matchVals{k})...
Error in Main_code (line 74)
idl1 = cellfun(@(x) strcmp(x(11),matchVals{k}).
I do not really understand the use of strcmp(x(11)) and cellfun(@(x). From where does the 11 and x come from? Can the error be connected to this?
Thank you for the clarification with regard to the use of "join".
Kind regards
Jon
2021-12-10
编辑:Jon
2021-12-10
The code I provided runs without error for me using your two example files, 'Mat_question.xlsx','Mat_question2.xlsx'. So I'm assuming the error you are getting results when running on different files. If you could provide a self contained example (code and input files) that demonstrates the problem I could help you debug it further.
I'll also try to explain a little better how the code I gave works, and that might help you adapt what you are doing to your situation. Let's look at this line of code:
idl1 = cellfun(@(x) strcmp(x(11),matchVals{k})...
||strcmp(x(13),matchVals{k}),names1);
First there is the concept of a cellfun. A cellfun loops through every element of a cell array applying the supplied function to each element of the cell arrray. So in this case it applies the same function to every element of names1, the cell array of column names. The function that it applies to each element is defined by the anonymous function
@(x) strcmp(x(11),matchVals{k})...
||strcmp(x(13),matchVals{k})
The argument for this function, x, is an element of names1, so a column name. The variables, x(11) and x(13) are the 11th and 13th characters in the column name. These are each compared to matchVals{k}, which for a given loop has the values '2', '3' or '6'. The function strcmp compares the character in question with matchVals{k} and returns a true if it matches. So the expression
strcmp(x(11),matchVals{k})||strcmp(x(13),matchVals{k}
will be true if either the 11th character of the column name or the 13th character of the column name matches the current value of matchVals{k}.
So let's say we are on the first loop, with k = 1, matchVals{1} will equal '2', and the expression will be true for each column name where the 11th or 13th character is a 2.
Since the cellfun applies this to each element in the names1 cellarray the result is a logical vector,idl1, of 0's and 1's (false and true) indicating for each column name whether the 11th or 13th character is a 2. So stopping in the debugger at the end of the first loop, with k = 1 we have
K>> names1,matchVals,k, matchVals{k},idl1
names1 =
1×6 cell array
Columns 1 through 4
{'XXXXXXX M 1-2 V…'} {'XXXXXXX M 1-2 V…'} {'XXXXXXX M 3-4 V…'} {'XXXXXXX M 3-4 V…'}
Columns 5 through 6
{'XXXXXXX M 5-6 V…'} {'XXXXXXX M 5-6 V…'}
matchVals =
1×3 cell array
{'2'} {'3'} {'6'}
k =
1
ans =
'2'
idl1 =
1×6 logical array
1 1 0 0 0 0
The main loop, which increments k then repeats this for the next value to be tested matchVals{2}='3' etc.
I hope this helps you understand better how the code works.
Vlatko Milic
2021-12-13
Wow, thanks for the detailed explanation. Very good explanation. I tried it again with your code and it worked. However, the input files I am using are exactly the same as the ones I uploaded here but with 90 000 rows. When I try the code on the large data set I get the error message mentioned, i.e.: Index exceeds the number of array elements (5). But this should not be connected to the number of rows (as I understand it). Moreover, when I read the input file according to the math-files uploades we get a 25*6 and a 25*21 table. The only difference for the large data set I am using is that it rows are 90 000 and not 25. Could it be connected to the rows? I think it is the variable names which are not correct. I will double check this and see If I make it work. Thanks again.
Vlatko Milic
2021-12-13
I made it work, thank you again. Just a short comment. Is it possible to name the three tables with 2', '3' or '6 depending on what integer is representative for each table? Hence, it is the "matchVals = {'2','3','6'}" that I want to name the three tables after the loop.
Thank you.
Jon
2021-12-13
Hi Vlatko,
Glad to hear you were able to get your program working.
Actually it is really better not to name your tables for example Table2, Table3 and Table6 encoding the information about what is in the tables with the table name. You will run into the same kind of issues you did when the column names for your varaible held that kind of information.
Here is a good reference that explains in detail why this practice is discouraged https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval
So in the above code, leaving the individual tables in the cellarray of tables newTables, you could select a particular table, corresponding to the characters '2', '3', or '6' using indexing. For example to find the table corresponding to '2' you could use
T = newTables(strcmp('2',matchVals))
If the matching criteria is always numeric, you might want to make a numeric array to hold your matching criteria, so
matchNums = [2,3,6]
% to find the table that had data corresponding to 3,
T = newTables(3 == matchNums)
That being said, if having read all of the above, you really think it is better for your purposes to dynamically name the variables, MATLAB will let you do it using the eval function. For example within the main loop instead of using
% add the matching columns to the new table
newTables{k} = [T1(:,idl1) T2(:,idl2)];
you could use
eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])
which would make tables named Table_2, Table_3, Table_6
I really would not recommend this approach though.
Vlatko Milic
2021-12-14
Thank you for the detailed answer Jon. The only problem I have is that I am working with many tables, more than the ones presented here. Therefore, I am thinking about the best possible solution. And I see the problems with dynamically naming the table names according to your answer. This is something I have to evaluate with time. I will probably need to access the tables once only and thereafter perform post-processing using the data within the tables.
I would still like to try the dynamic naming and thereafter make a choise what is the best solution depending on how many tables I will be working with. I tried you suggestion. However, I get the error "Invalid use of operator" on the line where I changed the code. Can this be connected to the purple colored letters to the right of the '=' sign. I am wondering if the ' is correctly placed because of this?
Thank you again. You are extremely helpful and I learned a lot
Jon
2021-12-14
The purple font color just highlights the quoted string. Here's a little standalone example which uses this same line of code without errors. So maybe you are doing something else wrong. Maybe attach a short example that shows the problem if you are still stuck
T1 = array2table(rand(3,4))
T2 = array2table(rand(3,5))
T1.Properties.VariableNames = {'a','b','c','d'}
T2.Properties.VariableNames = {'v','w','x','y','z'}
idl1 = [true false true false]
idl2 = [true true false true false]
matchVals = {'2','3','6'};
for k = 1:3
eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
Jon
2021-12-14
编辑:Jon
2021-12-14
Note, in the above example all of the tables Table_2, Table_3 and Table_6 are the same. So this isn't a very realisitic example. I just wanted to show that the dynamic assignment worked.
Here is another way to do it. Still frowned upon, but maybe a little nicer than using eval. Instead use a structure to hold the tables with dynamically named fields. So the idea is create for example a structure called Tables and give it fields t_2, t_3, t_6 etc. You could then access for example a particular table as Tables.t_2
Here's an example showing this. To make it a little more realistice here I made all of the tables different, by randomly selecting columns that are included in each table. Not really important, mostly just look at the last few of lines of code to see the dynamically assigned structure
% make up some data for the example
T1 = array2table(rand(4,4))
T2 = array2table(rand(4,5))
T1.Properties.VariableNames = {'a','b','c','d'}
T2.Properties.VariableNames = {'v','w','x','y','z'}
idl1 = logical(randi(1,[3,4]));
idl2 = logical(randi(1,[3,5]));
matchVals = {'2','3','6'};
% make structure with dynamically named fields to hold different tables
% so, Tables.t_2, Tables.t_3, etc.
Tables = struct; % preallocate structure to hold tables
for k = 1:3
fieldName = ['t_',matchVals{k}]; % generate field name of form t_2, t_3 etc.
Tables.(fieldName) = [T1(:,idl1(k,:)) T2(:,idl2(k,:))]
end
Vlatko Milic
2021-12-14
thank you so much. I tried the code and it works perfectly! This example gave me more clarification in order perform this in a nicer was, as you also mentioned. And this should works on tables which are different also.
All the best Jon!
Vlatko Milic
2022-1-5
Hi again Jon,
Hope you are doing well.
I was working a bit with the code shown above. The output is three tables which are named "Table 2, Table 3 and Table 6" ( see matchVals). This was achieved using the code you suggested, i.e.
for k = 1:3
eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
I need to add a column in each table (by basically multiplying two values). I wonder if this is possible to perform directly in the code presented above. From my perspective it is very difficult. However, do you have any advice on how to generate a function which considers only ""Table 2, Table 3 and Table 6" as input (if possible, directly connected to matchVals of 2,3 and 6). Hence, I want to specify that I only want to use the output from the code above (Table 2, Table 3 and Table 6) as input to the new function, and thereafter to perform a simple mathematical operation.
Thank you in advance.
Kind regards
Vlatko Milic
2022-1-5
编辑:Vlatko Milic
2022-1-6
I tried something in accordance to the text below:
V = {Table_2,Table_3,Table_6}
%%
for k = 1:numel(V)
[R_W_L(V,:)] = my_calc(V{k})
end
However, I get the error "A table row subscript must be a numeric array containing real positive integers, a logical array, a character vector, a string array, or a cell array of character vectors." When I look at the columns of the tables I see that they doubles, e.g. 1000*1 double. Do you think it is connected to this?
Jon
2022-1-5
编辑:Jon
2022-1-5
The original problem is quite complicated, and I haven't looked at it for awhile. It would take me some time just to get back to understanding all of the details. If you could possibly make just a simple, small, standalone example of what you were trying to do, and what errors you get that would be really helpful.
Vlatko Milic
2022-1-6
编辑:Vlatko Milic
2022-1-6
Of course. I had a number of tables titled with certain integers. I wanted to join these tables based on similarities in the table names, such as if the tables have a "2" in the table name. This was performed successfully using the code:
T1 = array2table(rand(4,4))
T2 = array2table(rand(4,5))
T1.Properties.VariableNames = {'a','b','c','d'}
T2.Properties.VariableNames = {'v','w','x','y','z'}
idl1 = logical(randi(1,[3,4]));
idl2 = logical(randi(1,[3,5]));
matchVals = {'2','3','6'};
for k = 1:3
eval(['Table_',matchVals{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
As an output from the code above I get three tables titled "Table 2", "Table 3" and "Table 6". Until this point everything is working superb with your guidance.
What I want to do now is to perform simple mathematical operations within each table, basically by multiplying the values from two columns and creating a new column in each table . As mentioned above, I do not think this is possible to perform within the "for loop" in the code above. Therefore, I am thinking about creating a new function with "Table 2, Table 3 and Table 6" as input (if possible, directly connected to matchVals of 2,3 and 6). Moreover, the output I want is basically new versions of "Table 2, Table 3 and Table 6" which include a new column. However, the problem for me lies in creating a new function with the old "Table 2, Table 3 and Table 6" as input. I tried this segment of code (does not work unfortunately).
V = {Table_2,Table_3,Table_6}
%%
for k = 1:numel(V)
[R_W_L(V,:)] = my_calc(V{k})
end
I get the error "A table row subscript must be a numeric array containing real positive integers, a logical array, a character vector, a string array, or a cell array of character vectors." When I look at the columns of the tables I see that they are doubles, e.g. 1000*1 double. I want to point out that each column has a variable name based on the code segment described at the top of this post (i.e. use of 2,3 or 6 in the column name). Do you think it is connected to this? Please let me know if I further explanation is needed.
Lastly, I am appending a screenshot of how Table 2 looks like in Matlab. It is the first and second columns I want to multiply and create a new column. When looking at the first column (1-2) it is possible to see that it consists of doubles as previously mentioned.
Jon
2022-1-6
If I understand you correctly, I think this illustates one way to do what you want. Note how incredibly awkward it is working with the data encoded in the table names and having to use the MATLAB eval. This is why it would really be better to modify your general approach on this. But for now I think this will work
% Make up some example data
T1 = array2table(rand(3,4));
T2 = array2table(rand(3,5));
T1.Properties.VariableNames = {'a','b','c','d'};
T2.Properties.VariableNames = {'v','w','x','y','z'};
matchVals = {'2','3','6'};
numTables = numel(matchVals);
% generate table names for future reference
tableNames = cell(numTables,1);
for k = 1:numTables
tableNames{k} = ['Table_',matchVals{k}];
end
for k = 1:numTables
% Make random column selection
% Make sure at least two columns are selected
idl1 = false(1,4);
idl2 = false(1,5);
while sum(idl1) < 2 || sum(idl2) < 2
idl1 = logical(randi([0,1],[1,4]));
idl2 = logical(randi([0,1],[1,5]));
end
eval([tableNames{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
%
% Add new column to each table.
% New column is product of first two
% columns of original table
for k = 1:numTables
eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
% Compute product.
% Note: use curly braces to extract numerical values from
% the table, round parentheses would make a one column table, which
% isn't what you want.
% If you knew the column names it would be simpler.
% For example if you knew the column names
% were b and d you could just use T.myNewColumn = T.b .* T.d
T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
% put back into workspace with appropriate table name
eval([tableNames{k},'= T']);
end
Vlatko Milic
2022-1-7
Wow, I made it to work now! Thanks to you! I was struggling a bit to apply it on my code but I learned a lot...
I agree with you about the complexity of the code. I will sure look into the solution you were proposing earlier. This will be especially more appropriate when working with more tables in future. Thank you for all the advice!
One short question - If I would like to create new tables based on the created column (I am creating more than one column in my script) from the script above (and not the entire content from T1 and T2 that was included in the beginning), is this possible to perform within the last for loop in the script? Or do I need to create a separate function for this?
Jon
2022-1-7
Glad you got that part working.
I'm sorry. I'm not understanding your last "One short question". Could you please clarify further?
Vlatko Milic
2022-1-10
Of course. As performed by the code above, a new column is created in the original tables. See code:
% Make up some example data
T1 = array2table(rand(3,4));
T2 = array2table(rand(3,5));
T1.Properties.VariableNames = {'a','b','c','d'};
T2.Properties.VariableNames = {'v','w','x','y','z'};
matchVals = {'2','3','6'};
numTables = numel(matchVals);
% generate table names for future reference
tableNames = cell(numTables,1);
for k = 1:numTables
tableNames{k} = ['Table_',matchVals{k}];
end
for k = 1:numTables
% Make random column selection
% Make sure at least two columns are selected
idl1 = false(1,4);
idl2 = false(1,5);
while sum(idl1) < 2 || sum(idl2) < 2
idl1 = logical(randi([0,1],[1,4]));
idl2 = logical(randi([0,1],[1,5]));
end
eval([tableNames{k},' = [T1(:,idl1) T2(:,idl2)]' ])
end
%
% Add new column to each table.
% New column is product of first two
% columns of original table
for k = 1:numTables
eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
% Compute product.
% Note: use curly braces to extract numerical values from
% the table, round parentheses would make a one column table, which
% isn't what you want.
% If you knew the column names it would be simpler.
% For example if you knew the column names
% were b and d you could just use T.myNewColumn = T.b .* T.d
T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
% put back into workspace with appropriate table name
eval([tableNames{k},'= T']);
end
Let's say that I wanted to make a new table which only includes the new column or columns, and not the entire data which is stored in the original tables. Can I include this in the last for loop of the code, i.e.:
for k = 1:numTables
eval(['T = ',tableNames{k},';']); % e.g. Table = Table_6
% Compute product.
% Note: use curly braces to extract numerical values from
% the table, round parentheses would make a one column table, which
% isn't what you want.
% If you knew the column names it would be simpler.
% For example if you knew the column names
% were b and d you could just use T.myNewColumn = T.b .* T.d
T.myNewColumn = T{:,1} .* T{:,2}; % note .* for element by element multiplication
% put back into workspace with appropriate table name
eval([tableNames{k},'= T']);
end
Or do I have to make a new function for this after the loop? I hope this made the question more clear.
Stephen23
2022-1-10
编辑:Stephen23
2022-1-10
"I was struggling a bit to apply it on my code but I learned a lot..."
Even better to learn: https://www.mathworks.com/help/matlab/matlab_prog/string-evaluation.html
Meta-data is data, and should be stored in variables, not in variable names. Instead of awkwardly forcing a pseudo-index into a variable name, code is much simpler and more efficient using actual indices.
@Jon: this is not a comment on your advice in this thread, which I think is thorough, balanced, and exemplary. My comment here is just to ... emphasize the point again.. and again... and again :)
Jon
2022-1-10
Actually I earlier provided a link to the OP of your nice article https://www.mathworks.com/matlabcentral/answers/304528-tutorial-why-variables-should-not-be-named-dynamically-eval
Jon
2022-1-10
编辑:Jon
2022-1-10
Vlatko, back on your "short question". Yes you could certainly save just the last two columns in a new table. You will need to generate the appropriate name for your new tables, lets say you this new list of names, newTableNamesand then just substitute that at this point
% put back into workspace with appropriate table name
eval([tableNames{k},'= T']);
I would really try to take @Stephen's advice and now figure out how to avoid encoding data in you variable names, and put the data into your tables instead. I think it will probably be easier to do this than you think, once you start thinking this new way about it. You can then really apply the full power of MATLAB for searching and extracting what ever parts you want out of these tables without dealing with all of this awful eval stuff.
Vlatko Milic
2022-1-10
thanks to both u and Stephen on the reply. And I will try to figure out how to put the data directly in the tables, instead of in the variable names.
Just a short follow up question - I was thinking of including
newtableNames = cell(numTables,1);
%
prior to the loop. And just include a code in the loop according to:
eval(['newtablenames_',tableName{v},' = [Table_(:,Column1) Table_(:,Column2)]' ]);%these columns are created in the loop we were discussing above
However, the code cannot recognize my created columns when I do like this. I get the error "Error using eval
Unrecognized function or variable 'Column...."
Jon
2022-1-10
Sorry, it's getting to hard to follow the details of all of this. I think you should have all of the building blocks now. Try to understand what all of the code you already have is really doing, and I think you will be able to see how to appropriately modify it to get what you want.
If you really get stuck, please try to make a simple example that demonstrates the problem and maybe I can see what is causing the errors.
I think at this point it would be better to put your efforts into organizing your data better into useful tables, than to continue to just keep doing one more thing with your current approach.
Stephen23
2022-1-11
"However, the code cannot recognize my created columns when I do like this. I get the error ..."
Because your code design is liable to bugs and makes debugging much harder.
You are going to spend a lot of time fighting the otherwise trivial task of just trying to access your data.
Perhaps it is slowly time to learn the EVAL-sized lesson here: read the links we gave you and understand some of the reasons why the makers of MATLAB and all experienced MATLAB users advise against your approach to writing code.
更多回答(0 个)
另请参阅
类别
在 Help Center 和 File Exchange 中查找有关 Text Files 的更多信息
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!发生错误
由于页面发生更改,无法完成操作。请重新加载页面以查看其更新后的状态。
您也可以从以下列表中选择网站:
如何获得最佳网站性能
选择中国网站(中文或英文)以获得最佳网站性能。其他 MathWorks 国家/地区网站并未针对您所在位置的访问进行优化。
美洲
- América Latina (Español)
- Canada (English)
- United States (English)
欧洲
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
亚太
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)