Efficient way to standardize large amounts of text
Hello,
I have a table with around 1 million rows. One column contains strings in different formats, mixing letters and numbers, like:
abc_123
cdf_123
123_cdf
123 (abc)
There are around 120 different text formats, which repeat. Most of them can be converted to a standard format like aa_11. Any string that cannot be converted should get a standard undef value.
Any suggestions on how I can handle such a large dataset without a for loop over 1 million rows that checks each cell?
Thanks in advance :)
Accepted Answer
Duncan Po
2021-10-19
You may be able to use patterns. For example, if the standard format is letters followed by an underscore followed by digits, you can detect it like this:
>> x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"]; % create an example string array
>> matches(x, lettersPattern + "_" + digitsPattern) % check if the strings match the standard pattern
ans =
1×4 logical array
1 1 0 0
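Building on the detection step above, here is a minimal sketch of how the conversion itself might be vectorized. It is only an illustration under assumptions: the two rewrite rules cover just the example formats from the question, the variable names (x, u, out, ok) are made up, and the remaining formats would each need their own rule. Because the same strings repeat many times, the sketch works on the unique values only and maps the results back to all rows.

```matlab
% Example data as a column, like a table variable (values are illustrative)
x = ["abc_123"; "cdf_123"; "123_cdf"; "123 (abc)"; "??"; "abc_123"];

% Work on the distinct strings only; idx maps each row back to its unique value
[u, ~, idx] = unique(x);

% Rewrite rules (assumed, one per known format); each call handles the whole array
u = regexprep(u, "^(\d+)_([A-Za-z]+)$", "$2_$1");       % 123_cdf   -> cdf_123
u = regexprep(u, "^(\d+) \(([A-Za-z]+)\)$", "$2_$1");   % 123 (abc) -> abc_123

% Anything that still does not match the standard pattern becomes "undef"
ok = matches(u, lettersPattern + "_" + digitsPattern);
u(~ok) = "undef";

% Expand back to one value per original row
out = u(idx);
```

For this input, out is ["abc_123"; "cdf_123"; "cdf_123"; "abc_123"; "undef"; "abc_123"]. There is no loop over the rows; the only repetition is one regexprep call per format, so roughly 120 calls at most rather than 1 million iterations.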