Efficient way to standardize large amounts of text

Question

André Kucharzewski 2021-10-19

Direct link to this question:

https://ww2.mathworks.cn/matlabcentral/answers/1567448-efficient-way-to-standardize-large-amounts-of-text

Commented: André Kucharzewski 2021-10-24

Accepted Answer: Duncan Po

Hello,

I have a table with around 1 million rows. One column contains strings in many different formats that mix letters and numbers, for example:

abc_123

cdf_123

123_cdf

123 (abc)

About 120 distinct text formats recur in the column. Most of them can be converted to a standard format such as aa_11. Any string that cannot be converted should get a standard undef value.

Any suggestions on how I can handle such a large dataset without a for loop that checks each of the 1 million cells?

Thanks in advance :)

Answer 1

Duncan Po 2021-10-19

Direct link to this answer:

https://ww2.mathworks.cn/matlabcentral/answers/1567448-efficient-way-to-standardize-large-amounts-of-text#answer_812253

You may be able to use patterns. For example, if the standard format is letters followed by an underscore followed by digits, you can detect it like this:

>> x = ["abc_123", "cdf_123", "123_cdf", "123 (abc)"]; % create an example string array

>> matches(x, lettersPattern + "_" + digitsPattern) % check if the strings match the standard pattern

ans =

1×4 logical array

1 1 0 0
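
The same check can be applied to a whole table column in one call. The following is only a sketch under stated assumptions: the table T and the variable names Code and Standard are hypothetical, and lettersPattern and digitsPattern need R2020b or later. It marks every non-matching row as undef, so it covers only the detection step and does not convert the other formats.

```matlab
% Hypothetical table with one string column; T and Code are assumed names.
T = table(["abc_123"; "cdf_123"; "123_cdf"; "123 (abc)"], ...
    'VariableNames', {'Code'});

% One vectorized call tests every row, so no explicit loop over rows is needed.
isStandard = matches(T.Code, lettersPattern + "_" + digitsPattern);

% Copy the column, then overwrite rows that do not fit the standard format.
T.Standard = T.Code;
T.Standard(~isStandard) = "undef";
```

Logical indexing keeps the work inside built-in functions, which is usually what matters at a million rows.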

1 Comment

André Kucharzewski 2021-10-24

That should do the job, but it relies on a function introduced in R2019b, and I only have R2019a.

Kinda sad :(

But thank you for your input :)
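
For releases without matches or pattern objects, regular expressions may be a workable substitute, since regexp and regexprep accept string arrays in R2019a. The sketch below is an assumption-laden illustration, not the accepted answer's method: the rules table is hypothetical and covers only the formats shown in the question, and the assumed standard format is letters, underscore, digits. The loop runs once per format (about 120 iterations in the question's case), not once per row.

```matlab
% Example data from the question.
x = ["abc_123"; "cdf_123"; "123_cdf"; "123 (abc)"];

% Hypothetical rules: {regular expression for one input format, replacement}.
% $1 and $2 refer to the first and second captured token.
rules = { ...
    '^([A-Za-z]+)_(\d+)$',      '$1_$2'; ...   % already in standard form
    '^(\d+)_([A-Za-z]+)$',      '$2_$1'; ...   % digits first: swap the parts
    '^(\d+) \(([A-Za-z]+)\)$',  '$2_$1'};      % format like 123 (abc)

out  = repmat("undef", size(x));   % default for rows that match no rule
todo = true(size(x));              % rows not yet standardized

for k = 1:size(rules, 1)
    % cellstr makes regexp return a cell array even for a single row.
    starts = regexp(cellstr(x), rules{k, 1}, 'once');
    hit = todo & ~cellfun(@isempty, starts);
    if any(hit)
        out(hit) = regexprep(x(hit), rules{k, 1}, rules{k, 2});
        todo(hit) = false;
    end
end
```

If many values in the column are exact repeats, applying the rules to the output of unique and mapping the results back through its third output may reduce the work further.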
