Categorical Data preprocessing for Data mining

Samuel Katongole

2021-10-6

Hello friends

For ML practice, I have been working on the DrivenData problem about the state of water wells in Tanzania, using the Taarifa data. I am now trying to remove misspellings from the installer and funder columns. If anyone has tried this, please advise me on how to go about it. A faster way would be especially helpful.

Thanks.

I am trying to clean misspellings out of the installer and funder columns. For now I am using regular expressions, but the dataset is large and the cleaning is taking a long time.
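A common first step, sketched here under the assumption that `installer` and `funder` are cell arrays of character vectors (which is what `readtable` typically returns for text columns), is to normalize case and whitespace before any regular expression runs. Variants that differ only in capitalization or spacing then collapse into one spelling, and later patterns have fewer cases to handle. The sample table is hypothetical stand-in data, not the Taarifa file.

```matlab
% Hypothetical stand-in for the Taarifa table (assumption: text columns are cellstr)
newDataClean = table( ...
    {'World Bank'; '  world   bank'; 'WORLD BANK '; 'Government'}, ...
    {'world bank'; 'Word Bank'; 'Danida'; ' danida'}, ...
    'VariableNames', {'installer', 'funder'});

% Apply the same normalization to both text columns
textCols = {'installer', 'funder'};
for k = 1:numel(textCols)
    col = newDataClean.(textCols{k});
    col = lower(strtrim(col));          % case-fold and trim leading/trailing spaces
    col = regexprep(col, '\s+', ' ');   % collapse internal runs of whitespace
    newDataClean.(textCols{k}) = col;
end
disp(newDataClean)
```

After this step, genuine misspellings such as 'word bank' remain, but they are the only cases the regular expressions still need to handle.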

For instance, to correct the misspellings of "world bank", I tried the following expression, but it still fails:

pat11='wo(rd|rdl|uld|rld)?\s((b\w*|nk|divisio)$)?[^vd]';
newDataClean.installer=regexprep(newDataClean.installer,pat11,'world bank');
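The expression above matches only part of each entry. It has no `^`/`$` anchors around the whole name, the `$` sits inside an optional group, `[^vd]` consumes one extra character, `\s` allows exactly one whitespace character, and matching is case-sensitive by default. On 'world bank', for example, nothing can follow the optional group at the end of the string, so the engine skips that group, `[^vd]` matches the `b`, and only 'world b' is replaced, giving 'world bankank'. The `[^vd]` presumably exists to keep "world vision" and "world division" out; requiring the second word to start with `b` achieves that without consuming a character. Below is a minimal sketch of an anchored, case-insensitive alternative; the set of misspellings it accepts is an assumption, and the sample entries are hypothetical.

```matlab
% Hypothetical sample entries; 'world vision' and 'world division' must stay unchanged
installer = {'world bank'; 'World Bnk'; 'word bank'; 'wordl  bank'; ...
             'would bank '; 'world vision'; 'world division'; 'government'};

% Assumed misspellings to catch:
%   ^ ... $ force the WHOLE entry to match, so no fragments are left behind
%   wo(rld|rd|rdl|uld|ld) covers world, word, wordl, would, wold
%   \s* tolerates missing or repeated spaces
%   b\w* requires a second word starting with "b" (excludes vision/division)
pat11 = '^\s*wo(rld|rd|rdl|uld|ld)\s*b\w*\s*$';
installer = regexprep(installer, pat11, 'world bank', 'ignorecase');
disp(installer)
```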

I tested the expression in Atom, but it fails to replace the selected words correctly.
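Atom's find-and-replace does not use MATLAB's regular-expression engine, so a pattern that works there is not guaranteed to behave the same way in `regexprep`. One way to check a pattern is to test it in MATLAB itself against a few known variants, including entries that must not match. The pattern and sample strings below are illustrative assumptions.

```matlab
% Illustrative anchored pattern and test strings (assumptions, not from the data)
pat = '^\s*wo(rld|rd|rdl|uld|ld)\s*b\w*\s*$';
samples = {'world bank'; 'wordl bank'; 'world vision'; 'world division'};

% With 'once', regexp returns the matched text per cell, or '' when nothing matches
matched = regexp(samples, pat, 'match', 'once', 'ignorecase');
isMatch = ~cellfun(@isempty, matched);
disp(table(samples, isMatch))
```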

I am also wondering whether there is a faster way to approach the problem.
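On speed: in a column like installer, the same spellings usually repeat many times. One option, which assumes that repetition and is not a measured result, is to run the regular expressions only on the distinct values returned by `unique` and then map the results back to every row with its third output. `regexprep` accepts cell arrays of patterns and replacements and applies them in order, so all rules can run in one call. The rule list and sample values below are hypothetical.

```matlab
% Hypothetical raw column with repeated spellings
installer = {'World Bank'; 'world bank'; 'Word Bank'; 'DANIDA'; 'danid'; ...
             'World Bank'; 'world vision'; 'Danida'};

% Hypothetical rule list: each pattern maps to one canonical name
patterns  = {'^\s*wo(rld|rd|rdl|uld|ld)\s*b\w*\s*$', '^\s*danid\w*\s*$'};
canonical = {'world bank', 'danida'};

% 1) Reduce the column to its distinct values; idx maps each row back to one
[names, ~, idx] = unique(lower(strtrim(installer)));

% 2) Run the regular expressions on the distinct values only
names = regexprep(names, patterns, canonical, 'ignorecase');

% 3) Expand the cleaned values back to the full column
installer = names(idx);

% 4) Optional: store as categorical so each distinct name is kept once
installerCat = categorical(installer);
disp(categories(installerCat))
```

The saving grows with the ratio of rows to distinct spellings, since the regular-expression work no longer scales with the number of rows.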

1 Comment

KSSV 2021-10-6

Question is not clear. Can you elaborate with an example?

Release

R2017b
