How to cluster similar strings?

Serbring, 2020-01-26
Commented: Serbring, 2020-01-29
Hi all,
I have long lists of strings that I collected automatically with a brute-force web-scraping routine. However, many of the strings are very similar, and I would like to shorten the list so that it shows only the genuinely different names. Is there a way to cluster the strings together? A sample of the list is below.
Thank you so much.
Best regards.
{'microbiologia agraria' }
{'microbiologia forestale e ambientale' }
{'microbiologia generale' }
{'microbiologia agraria' }
{'microbiologia generale e ambientale' }
{'microbiologia del suolo e del sottosuolo' }
{'nutrition and health: the functional foods'}
{'microbiologia generale e ambientale' }
{'microbial biotechnologies in agroforestry' }
{'microbiologia generale ed ambientale' }
{'microbiologia agraria e forestale' }
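One possible approach, sketched in Python rather than MATLAB and not taken from the thread, is a greedy "leader" clustering on the similarity score from Python's standard `difflib.SequenceMatcher.ratio()`. Each name is compared with the leaders of the groups found so far and joins the first group whose leader is similar enough; otherwise it starts a new group. The 0.9 cut-off is an assumption that would need tuning on the real data, and the grouping depends on the order of the input.

```python
from difflib import SequenceMatcher

# Hypothetical cut-off: names at or above this similarity are treated as one group.
THRESHOLD = 0.9


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from difflib's ratio(), ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def leader_cluster(names: list[str], threshold: float = THRESHOLD) -> dict[str, list[str]]:
    """Greedy leader clustering: each name joins the first existing group whose
    leader is similar enough; otherwise it becomes the leader of a new group."""
    groups: dict[str, list[str]] = {}
    for name in names:
        for leader, members in groups.items():
            if similarity(name, leader) >= threshold:
                members.append(name)
                break
        else:  # no existing leader was close enough
            groups[name] = [name]
    return groups


names = [
    "microbiologia agraria",
    "microbiologia forestale e ambientale",
    "microbiologia generale",
    "microbiologia agraria",
    "microbiologia generale e ambientale",
    "microbiologia del suolo e del sottosuolo",
    "nutrition and health: the functional foods",
    "microbiologia generale e ambientale",
    "microbial biotechnologies in agroforestry",
    "microbiologia generale ed ambientale",
    "microbiologia agraria e forestale",
]

groups = leader_cluster(names)
for leader, members in groups.items():
    print(f"{leader} ({len(members)} entries)")
```

On this sample, exact duplicates and the "generale e" / "generale ed" spelling variants collapse into one group. With a lower cut-off of about 0.85, "microbiologia generale e ambientale" would already be merged into the "microbiologia forestale e ambientale" group, so the threshold decides how aggressive the reduction is.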

Answers (1)

Image Analyst, 2020-01-26
Serbring, 2020-01-29
Thanks for your reply. I already knew about those distances; the real problem is what to do with the resulting numbers. Let me be more specific so that you can understand the basic idea of the algorithm I have developed.
Let's assume I have three strings A, B and C. I computed the pairwise distances between them (A-B, A-C and B-C), and then, for each string, I summed its distances to the other two (for A, that is A-B plus A-C). However, I have no idea what to do with those sums. Any suggestion is appreciated.
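Summing each string's distances collapses the pairwise information: a large sum says a string is far from the others on average, but not which strings it is close to. A common alternative, not something proposed in the thread, is to threshold the pairwise distance matrix itself and take connected components (single-linkage clustering). The Python sketch below assumes Levenshtein edit distance normalised by the longer string's length and a hypothetical cut-off of 0.2; A, B and C are three strings picked from the sample list.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance scaled by the longer string: 0.0 means identical."""
    return levenshtein(a, b) / max(len(a), len(b), 1)


def distance_matrix(names: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise normalized distances (zeros on the diagonal)."""
    n = len(names)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = normalized_distance(names[i], names[j])
    return d


def threshold_clusters(names: list[str], max_dist: float = 0.2) -> list[list[str]]:
    """Single-linkage clustering via union-find: two strings share a cluster when a
    chain of pairs, each with distance <= max_dist (hypothetical cut-off), links them."""
    d = distance_matrix(names)
    parent = list(range(len(names)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(names)), 2):
        if d[i][j] <= max_dist:
            parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


A = "microbiologia generale e ambientale"
B = "microbiologia generale ed ambientale"
C = "nutrition and health: the functional foods"
names = [A, B, C]

# The per-string sums described above: row i adds the distances from string i to the others.
row_sums = [sum(row) for row in distance_matrix(names)]
print(row_sums)

print(threshold_clusters(names))  # A and B share a cluster, C is alone
```

The row sums can still be useful afterwards, for example to pick as each cluster's representative the member with the smallest sum of distances to the other members, but the grouping itself comes from the thresholded pairs.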
Cheers
Michele
