I want to avoid dynamically naming my tables after a join.

4 次查看(过去 30 天)
I certianly have a lot to learn, but I do know I am not supposed to dynamically create variable names- and I assume that applies to table names as well.
My question is twofold:
  1. How do I get out of the situation I am in- the real tables are large and very complicated with all kinds of issues and my process is actually working.
  2. I definitelty would also like to know what I can do better next time to approach this so I don't end up where I am now :)
My current situation has me creating new tables for export to another application based on an initial filter step. I am going to do many iterations of the filter, then use the filtered tables to join with other tables and on and on, and I'd like the final tables and exported files to be named somehow that I and other humans can easily recognize them (ie descriptive terms like 'Steve' and 'Jim'.)
My code is something like this:
T = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"; "DDD"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11');...
datetime('2023-03-01 02:00:11')], ...
[9999; 12; 34; 1111; 2222; 3333; 5556], ...
'VariableNames', {'var1', 'var2', 'var3'});
OtherTable = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11')], ...
[9999; 12; 34; 1111; 2222; 3333], ...
'VariableNames', {'var1', 'var2', 'var3'});
%% now I want to group and filter the data
onlyA=T(T.var1=='AAA',:)
onlyA = 1×3 table
var1 var2 var3 _____ ____________________ ____ "AAA" 07-Jul-2023 10:55:19 9999
BandC=T(T.var1=='BBB' | T.var1=='CCC',:)
BandC = 5×3 table
var1 var2 var3 _____ ____________________ ____ "BBB" 28-Jul-2023 09:31:00 12 "BBB" 28-Jul-2023 14:02:11 34 "CCC" 23-Jan-2023 17:35:24 1111 "CCC" 23-Jan-2023 17:31:48 2222 "CCC" 12-Feb-2023 03:10:11 3333
CinFeb23=T(T.var1=='CCC' & T.var2=='2023-02-12 03:10:11',:)
CinFeb23 = 1×3 table
var1 var2 var3 _____ ____________________ ____ "CCC" 12-Feb-2023 03:10:11 3333
and I have lots of these groupings already written out and they don't seem to lend themselves to automation.
Then I go on to a join and export (lots more data wrangling steps of course):
%% This section needs to loop (or something) over all of the above filtered/grouped tables
% (onlyA, BandC, CinFeb23) etc. - I need to findd and replace onlyA
% essentially.
onlyA_joined=innerjoin(onlyA,OtherTable,"LeftKeys",["var1"],"RightKeys",["var1"]);
onlyA_joined.Properties.VariableNames = {'onlyA_name','onlyA_date','onlyA_value',...
'onlyA_date_again','onlyA_value_again'};
onlyA_joined
onlyA_joined = 1×5 table
typeA_name typeA_date typeA_value typeA_date_again typeA_value_again __________ ____________________ ___________ ____________________ _________________ "AAA" 07-Jul-2023 10:55:19 9999 07-Jul-2023 10:55:19 9999
writetable(onlyA_joined,'final_file.xlsx','sheet','onlyA');
% now I want to do it all over again, except have 'BandC', and then 'CinFeb23' replace all the
% places where 'onlyA' is used as a name.
Lots more steps involved, but I'd like to somehow not have to go in and manually change 'onlyA' every single time as I have a lot of iterations.
My guess is I want to loop and use a cell of strings with the names etc. but not sure how to handle the differewnt cases such as table name, variable name, sheet name etc.
Thansk for any insight!
  6 个评论
dpb
dpb 2023-10-18
If your task is important to the organization, then it should be able to be escalated to the level needed...I guess the risk in that is them deciding it isn't that important, after all! (In a prior life I was known for being the one who raised issues nobody knew existed before :) )
Stephen23
Stephen23 2023-10-19
编辑:Stephen23 2023-10-19
Some comments to your two specific questions:
  1. Use arrays. MATLAB is designed around arrays, so rather than storing lots of separate variables you use arrays and indexing. When faced with any challenge, always as yourself "how can I do this with an array?". E.g. use cell arrays and structure arrays to store other arrays.
  2. Understand that meta-data is data, and data belongs in variables (not in variable names). Once you start forcing meta-data into variable names (e.g. "onlyA", "BandC") then you have already started down a dead-end road to a place you really don't want to be at: there be dragons!
Follow these very simple principles and you will find that your code becomes more generalizable and expandable: i.e. rather than painfully copy-and-pasting-and-modifying code (having copies of code is a sign something is wrong) you use e.g. loops... and then (much as Voss showed) processing more than just three filter criteria does not require more copy-and-pasting but simply expanding one single array to add some new criteria (which the rest of your code automatically processes).

请先登录,再进行评论。

采纳的回答

Voss
Voss 2023-10-18
编辑:Voss 2023-10-18
Maybe something like this will help:
T = table(["AAA"; "BB"; "BBB"; "CCC"; "CCC"; "CCC"; "DDD"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11');...
datetime('2023-03-01 02:00:11')], ...
[9999; 12; 34; 1111; 2222; 3333; 5556], ...
'VariableNames', {'var1', 'var2', 'var3'})
T = 7×3 table
var1 var2 var3 _____ ____________________ ____ "AAA" 07-Jul-2023 10:55:19 9999 "BB" 28-Jul-2023 09:31:00 12 "BBB" 28-Jul-2023 14:02:11 34 "CCC" 23-Jan-2023 17:35:24 1111 "CCC" 23-Jan-2023 17:31:48 2222 "CCC" 12-Feb-2023 03:10:11 3333 "DDD" 01-Mar-2023 02:00:11 5556
OtherTable = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11')], ...
[9999; 12; 34; 1111; 2222; 3333], ...
'VariableNames', {'var1', 'var2', 'var3'})
OtherTable = 6×3 table
var1 var2 var3 _____ ____________________ ____ "AAA" 07-Jul-2023 10:55:19 9999 "BBB" 28-Jul-2023 09:31:00 12 "BBB" 28-Jul-2023 14:02:11 34 "CCC" 23-Jan-2023 17:35:24 1111 "CCC" 23-Jan-2023 17:31:48 2222 "CCC" 12-Feb-2023 03:10:11 3333
% define the filters you want to use:
filters = struct( ...
"onlyA", T.var1=='AAA', ...
"BandC", T.var1=='BBB' | T.var1=='CCC', ...
"CinFeb23", T.var1=='CCC' & T.var2=='2023-02-12 03:10:11')
filters = struct with fields:
onlyA: [7×1 logical] BandC: [7×1 logical] CinFeb23: [7×1 logical]
% apply each filter to T, join the result with OtherTable, and
% write the joined table to a sheet in the output file:
filter_names = fieldnames(filters);
var_suffixes = {'_name','_date','_value','_date_again','_value_again'};
output_file = 'final_file.xlsx';
for ii = 1:numel(filter_names)
filtered_T = T(filters.(filter_names{ii}),:);
joined_T = innerjoin(filtered_T,OtherTable,"LeftKeys",["var1"],"RightKeys",["var1"]);
joined_T.Properties.VariableNames = strcat(filter_names{ii},var_suffixes);
writetable(joined_T,output_file,'sheet',filter_names{ii});
end
% check the output file:
sheet_names = sheetnames(output_file);
for ii = 1:numel(sheet_names)
disp(sheet_names(ii) + ":");
readtable(output_file,'Sheet',sheet_names(ii))
end
onlyA:
ans = 1×5 table
onlyA_name onlyA_date onlyA_value onlyA_date_again onlyA_value_again __________ ____________________ ___________ ____________________ _________________ {'AAA'} 07-Jul-2023 10:55:19 9999 07-Jul-2023 10:55:19 9999
BandC:
ans = 11×5 table
BandC_name BandC_date BandC_value BandC_date_again BandC_value_again __________ ____________________ ___________ ____________________ _________________ {'BBB'} 28-Jul-2023 14:02:11 34 28-Jul-2023 09:31:00 12 {'BBB'} 28-Jul-2023 14:02:11 34 28-Jul-2023 14:02:11 34 {'CCC'} 23-Jan-2023 17:35:24 1111 23-Jan-2023 17:35:24 1111 {'CCC'} 23-Jan-2023 17:35:24 1111 23-Jan-2023 17:31:48 2222 {'CCC'} 23-Jan-2023 17:35:24 1111 12-Feb-2023 03:10:11 3333 {'CCC'} 23-Jan-2023 17:31:48 2222 23-Jan-2023 17:35:24 1111 {'CCC'} 23-Jan-2023 17:31:48 2222 23-Jan-2023 17:31:48 2222 {'CCC'} 23-Jan-2023 17:31:48 2222 12-Feb-2023 03:10:11 3333 {'CCC'} 12-Feb-2023 03:10:11 3333 23-Jan-2023 17:35:24 1111 {'CCC'} 12-Feb-2023 03:10:11 3333 23-Jan-2023 17:31:48 2222 {'CCC'} 12-Feb-2023 03:10:11 3333 12-Feb-2023 03:10:11 3333
CinFeb23:
ans = 3×5 table
CinFeb23_name CinFeb23_date CinFeb23_value CinFeb23_date_again CinFeb23_value_again _____________ ____________________ ______________ ____________________ ____________________ {'CCC'} 12-Feb-2023 03:10:11 3333 23-Jan-2023 17:35:24 1111 {'CCC'} 12-Feb-2023 03:10:11 3333 23-Jan-2023 17:31:48 2222 {'CCC'} 12-Feb-2023 03:10:11 3333 12-Feb-2023 03:10:11 3333
  2 个评论
dpb
dpb 2023-10-18
"Maybe something like this will help:"
Precisely!!! the lesson I've been preaching--separate data from code.
Marcus Glover
Marcus Glover 2023-10-18
编辑:Marcus Glover 2023-10-18
Thanks, this helps a lot. I had read about struct as a possible solution to the problem and this works well. I should be able to incorporate it.
Never worked with them, and I am I am having a strange issue with adding an if loop based on the fieldnames- probably simple fix (yes it was, fixed myself :)
Thanks again!!!
filters = struct( ...
"onlyA", 'stuff', ...
"BandC", 'morestuff', ...
"CinFeb23", 'everything_else');
filter_names = fieldnames(filters);
% for ii = 1:numel(filter_names)
% if filter_names{ii} == 'BandC'
% disp('working on BandC')
% else
% disp('not working on BandC')
% end
% end
%% this is how you do it
for ii = 1:numel(filter_names)
if matches(filter_names{ii},["onlyA","CinFeb23"])
disp('not working on BandC')
else
disp('working on BandC')
end
end
not working on BandC
working on BandC
not working on BandC

请先登录,再进行评论。

更多回答(0 个)

类别

Help CenterFile Exchange 中查找有关 Images 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by