How to rename identical variables under one common name?

1 次查看(过去 30 天)
I've read in a Excel file through MATLAB that is a 1191x12 Table.
The goals is to rename identical variables to one common variable name, Example below:
For any type that start with ABCD, Id like to rename to only ABCD removing the following characters/numbers.
For any type that starts with BACA, Id like to rename to only BACA removing the following characters/numbers.
For any type that starts with CABD, Id like to rename to only CABD removing the following characters/numbers.
For any type that starts with DABC, Id like to rename to only CABD removing the following characters/numbers.
Current Table Example below:
Expected Table Below:
  4 个评论
Matt J
Matt J 2023-7-6
It is better to attach your examples as .mat files with actual data variables in them.
Rookie Programmer
Thanks, the code below worked for this issue.
numrows = height(Table)
NewType = table
NewType.Types = cell(numrows,1)
for IdxR = 1:numrows
if contains(table.type{IdxR}, 'ABCD')
NewType.Types{IdxR} = 'ABCD'
elseif contains(table.type{IdxR}, 'BACA')
NewType.Types{IdxR} = 'BACA'
elseif contains(table.type{IdxR}, 'CABD')
NewType.Types{IdxR} = 'CABD'
elseif contains(table.type{IdxR}, 'DABC')
NewType.Types{IdxR} = 'DABC'
end
end
Table.Type = NewType.Types;

请先登录,再进行评论。

采纳的回答

Stephen23
Stephen23 2023-7-7
编辑:Stephen23 2023-7-7
T = readtable('myData.xlsx')
T = 4×3 table
Place Day Type ______ ___ ___________ {'NC'} 1 {'ABCD123'} {'NY'} 2 {'BACA654'} {'TX'} 3 {'CABD154'} {'WV'} 4 {'DABC987'}
T.Type = regexp(T.Type,'^[A-Z]{4}','match','once')
T = 4×3 table
Place Day Type ______ ___ ________ {'NC'} 1 {'ABCD'} {'NY'} 2 {'BACA'} {'TX'} 3 {'CABD'} {'WV'} 4 {'DABC'}
  2 个评论
Jon
Jon 2023-7-7
Great that regexp operates on whole cell array so no need to use cellfun.
On the other hand you seem to only look at the first 4 elements of the string. Your code won't strip of the numerical part of the strings if there are more than 4 leading characters as shown below for the 3rd row.
T = readtable('myData2.xlsx')
T = 4×3 table
Place Day Type ______ ___ ____________ {'NC'} 1 {'ABCD123' } {'NY'} 2 {'BACA654' } {'TX'} 3 {'CABDE154'} {'WV'} 4 {'DABC987' }
T.Type = regexp(T.Type,'^[A-Z]{4}','match','once')
T = 4×3 table
Place Day Type ______ ___ ________ {'NC'} 1 {'ABCD'} {'NY'} 2 {'BACA'} {'TX'} 3 {'CABD'} {'WV'} 4 {'DABC'}
If you already know that you just want the first 4 characters, no need to use regexp, or cell fun, just use extractBetween
T = readtable('myData.xlsx')
T = 4×3 table
Place Day Type ______ ___ ___________ {'NC'} 1 {'ABCD123'} {'NY'} 2 {'BACA654'} {'TX'} 3 {'CABD154'} {'WV'} 4 {'DABC987'}
T.Type = extractBetween(T.Type,1,4)
T = 4×3 table
Place Day Type ______ ___ ________ {'NC'} 1 {'ABCD'} {'NY'} 2 {'BACA'} {'TX'} 3 {'CABD'} {'WV'} 4 {'DABC'}
Stephen23
Stephen23 2023-7-7
编辑:Stephen23 2023-7-8
"On the other hand you seem to only look at the first 4 elements of the string."
That is exactly why I asked the OP for clarification, what their specific requirements are:
So far the OP has not stated how many leading characters they want to retain, we only have their examples to go by: are they complete and representative? Are they always uppercase? I doubt it... but no one here knows.
"Your code won't strip of the numerical part of the strings if there are more than 4 leading characters as shown below for the 3rd row."
The OP states that they wish to "removing the following characters/numbers", so your assumption that the characters to remove are "numerical parts" seems to be inconsistent with what the OP states (if they only mean "numerical parts" as you wrote what do they mean by "characters"?). But again, this is why I asked for clarification. I see little point in developing and testing regular expressions until I have a reasonably clear specification.
"If you already know that you just want the first 4 characters, no need to use regexp, or cell fun, just use extractBetween"

请先登录,再进行评论。

更多回答(3 个)

sushma swaraj
sushma swaraj 2023-7-6
编辑:sushma swaraj 2023-7-6
Hi, You can use the 'startsWith' function instead of 'contains' to get the appropriate result.
Hope it helps you in modifying your code!

Jon
Jon 2023-7-6
Here's a general way to handle this. Note no need for all those if statements
% Read in the data
T = readtable('myData.xlsx'); % put the name of your data file here
% Strip off the numeric part of the data in the 'Type' column
T.Type = cellfun(@(x) x(~isstrprop(x,'digit')),T.Type,'UniformOutput',false)
% Save the data back into a new Excel file (not sure if you need to do
% this)
writetable(T,'myData_stripped.xlsx')
  3 个评论
Jon
Jon 2023-7-7
As noted in my comment to @Stephen23
If you already know that you just want the first 4 characters, no need to use regexp, or cellfun, just use extractBetween
T = readtable('myData.xlsx')
T = 4×3 table
Place Day Type ______ ___ ___________ {'NC'} 1 {'ABCD123'} {'NY'} 2 {'BACA654'} {'TX'} 3 {'CABD154'} {'WV'} 4 {'DABC987'}
T.Type = extractBetween(T.Type,1,4)
T = 4×3 table
Place Day Type ______ ___ ________ {'NC'} 1 {'ABCD'} {'NY'} 2 {'BACA'} {'TX'} 3 {'CABD'} {'WV'} 4 {'DABC'}
Jon
Jon 2023-7-10
Did any of our responses answer your question? If so please accept an answer so that others will know an answer is available. If not please let us know what aspect of your problem we are missing.

请先登录,再进行评论。


Peter Perkins
Peter Perkins 2023-7-17
This just SCREAMS for using categorical. Screams. This might seem like more work, but untimately you will be happier.
TextData = ["aa";"bb";"cc";"aa";"bb";"cc";"aa";"bb";"cc";"aa"] + randi([100 200],10,1);
t = table(rand(10,1),TextData)
t = 10×2 table
Var1 TextData ________ ________ 0.22574 "aa165" 0.32424 "bb170" 0.064198 "cc157" 0.39656 "aa124" 0.49629 "bb123" 0.55708 "cc191" 0.16259 "aa152" 0.36202 "bb116" 0.7459 "cc108" 0.68575 "aa125"
t.CatData = categorical(t.TextData)
t = 10×3 table
Var1 TextData CatData ________ ________ _______ 0.22574 "aa165" aa165 0.32424 "bb170" bb170 0.064198 "cc157" cc157 0.39656 "aa124" aa124 0.49629 "bb123" bb123 0.55708 "cc191" cc191 0.16259 "aa152" aa152 0.36202 "bb116" bb116 0.7459 "cc108" cc108 0.68575 "aa125" aa125
oldCats = string(categories(t.CatData));
newCats = unique(extractBetween(oldCats,1,2))
newCats = 3×1 string array
"aa" "bb" "cc"
for i = 1:length(newCats)
toBeMerged = startsWith(oldCats,newCats(i));
t.CatData = mergecats(t.CatData,oldCats(toBeMerged),newCats(i));
end
t.TextData = []
t = 10×2 table
Var1 CatData ________ _______ 0.22574 aa 0.32424 bb 0.064198 cc 0.39656 aa 0.49629 bb 0.55708 cc 0.16259 aa 0.36202 bb 0.7459 cc 0.68575 aa

类别

Help CenterFile Exchange 中查找有关 Characters and Strings 的更多信息

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by