Renaming categories with accents
I have a categorical array t in which some categories contain diacritics (accents), such as circumflexes. I want to standardize everything by removing the diacritics.
I tried this code:
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
newcat = cat;
for i = 1:numel(str)
newcat = regexprep(newcat, str{i}, strreplace{i});
end
B = renamecats(t,cat,newcat)
However, after removing the accents, some categories become identical, for example VERMELHO and VERMÊLHO.
So I receive the following error:
Error using categorical/renamecats (line 39)
NEWNAMES contains duplicated values.
Is there any way around this?
This is just an example. I need very efficient code, since my categorical array t comes from a very long table with approximately 500 categories.
Thanks,
2 Comments
Jan
2018-11-30
500 does not sound like big data.
You did not mention what you want to happen, if the names of the categoricals are equal. So it is hard to suggest a solution.
By the way, strrep is much faster than regexprep.
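As a rough sketch of that suggestion (not Jan's own code), the loop from the question could call strrep in place of regexprep. The variable names old, new and names are illustrative assumptions. Note that the result still contains duplicate names, so this change alone does not avoid the renamecats error.

```matlab
% Accented characters and their unaccented replacements (taken from the question)
old = {'Á', 'É', 'Í', 'Ó', 'Ú', 'Ã', 'Ç', 'Â', 'Ê', 'Ô'};
new = {'A', 'E', 'I', 'O', 'U', 'A', 'C', 'A', 'E', 'O'};

% Illustrative category names (hypothetical subset of the question's data)
names = {'VÉRDE'; 'VERDE'; 'VERMÊLHO'};

% strrep does literal replacement and accepts a cell array of char vectors
for k = 1:numel(old)
    names = strrep(names, old{k}, new{k});
end
```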
Accepted Answer
Guillaume
2018-11-30
str = {'Á', 'É', 'Í', 'Ó', 'Ú','Ã','Ç','Â','Ê','Ô'};
strreplace = {'A', 'E', 'I', 'O', 'U','A','C','A','E','O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';
cat = categories(t);
% Compute the new category names in one call; no loop needed
newcat = replace(cat, str, strreplace);
% Rebuild the array from newcat, indexed by each element's original category number in t.
% Identical names are merged automatically when the new categorical array is created.
newt = categorical(newcat(double(t)))
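One possible variation, offered as a sketch rather than as part of the accepted answer: if t may contain <undefined> elements, double(t) returns NaN for them and cannot be used as an index directly. The version below remaps the category codes instead and leaves undefined elements undefined. The names old, new, oldNames, newNames, uniqueNames, map, codes and valid are assumptions introduced here; oldNames is used instead of cat to avoid shadowing the built-in cat function.

```matlab
% Accented characters and their unaccented replacements (taken from the question)
old = {'Á', 'É', 'Í', 'Ó', 'Ú', 'Ã', 'Ç', 'Â', 'Ê', 'Ô'};
new = {'A', 'E', 'I', 'O', 'U', 'A', 'C', 'A', 'E', 'O'};
t = categorical({'VÉRDE','VERDE','AZUL','AMARELO','VERMELHO','VERMÊLHO'})';

oldNames = categories(t);                 % one entry per category, not per element
newNames = replace(oldNames, old, new);   % may now contain duplicates

% map(k) is the position of newNames{k} in the sorted list of unique names
[uniqueNames, ~, map] = unique(newNames);

codes = double(t);                        % category numbers; NaN for <undefined>
valid = ~isnan(codes);
codes(valid) = map(codes(valid));         % translate old codes to merged codes

% Codes outside the value set (here, NaN) become <undefined>
newt = categorical(codes, 1:numel(uniqueNames), uniqueNames);
```

As in the accepted answer, the text replacement runs only on the category names (about 500 here), not once per table row.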