Remove stop words from a cell array

Question

I am trying to remove German stop words from a cell array. Below is the relevant code:

if removeStopWords == 1
    stop_words = cellstr(stopWords('Language','de'));
    split1 = regexp(chatM, '\s','Split');
    split1 = cellfun(@(x)strjoin(x), split1, 'Uni', 0);
    split1 = cellfun(@(x)convertStringsToChars(x), split1, 'Uni', 0);
    chatM = strjoin(split1(~ismember(split1, stop_words)), ' ');
end

I applied a solution from a similar question on the forum, but it did not work properly for me and I do not understand why.

The problem is that it removes patterns rather than whole words: imagine you want to remove "Ok" from "Ok Okay OOk". The result will be "ay O", but I want to get "Okay OOk".

By the way, I also tried removeStopWords(str), but the result was the same. In the end I want to count the occurrences of "relevant" words, so I need to remove the stop words first.

Answer 1

Cris LaPierre 2020-7-24

Do you have the Text Analytics Toolbox? You must, because you are using stopWords. Try using the removeStopWords function.
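
For illustration, here is a minimal sketch of whole-word removal with the toolbox. It assumes the text is tokenized first with tokenizedDocument, so that words are compared token by token rather than as substrings. The sample contents of chatM are hypothetical, and removeWords with an explicit German list is used here in place of removeStopWords so that the custom word "Ok" can be handled the same way.

```matlab
% Sketch only: requires Text Analytics Toolbox.
% chatM is hypothetical sample data standing in for the cell array in the question.
chatM = {'Ok Okay OOk'; 'Das ist ein Test'};

% Tokenize first so that every word becomes a separate token.
docs = tokenizedDocument(string(chatM), 'Language', 'de');

% Remove a custom word: only tokens that equal "Ok" exactly are dropped,
% so "Okay" and "OOk" are left alone.
docs = removeWords(docs, "Ok");

% Remove the German stop word list used in the question.
% Matching is case sensitive unless 'IgnoreCase' is set to true.
docs = removeWords(docs, stopWords('Language', 'de'), 'IgnoreCase', true);

% Convert back to a cell array of character vectors, one entry per row.
cleaned = cellstr(joinWords(docs));

% Count the remaining words, which is the stated end goal.
bag = bagOfWords(docs);
counts = topkwords(bag, 10);
```

If the input is passed as whole sentences instead of tokens, each element is treated as a single unit, which may explain why an earlier attempt did not behave as expected.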

Sergiu Panainte 2020-7-25

Thanks, that generally solved my problem. But I recently found out that MATLAB supports parallel computing.

Is there a way to apply it to your solution? That would be really great, because my original table has about 90k rows.

Cris LaPierre 2020-7-25

I'm not familiar with the requirements of parallel computing. Generally, if the process can be split into pieces without affecting the results (the result of each piece is independent of the results of the other pieces), then it should technically be possible to parallelize it. If you look at the product requirements for Text Analytics Toolbox, you can see that the Parallel Computing Toolbox is recommended, which suggests that it is supported.

Do you have access to the Parallel Computing Toolbox? The simplest approach would be to test. Note that there is an initial load time while the parallel workers start up (~30 seconds?). Test with and without parallelization. You may find it quicker to process the text without it.
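
One way to run that comparison is sketched below. It is only an illustration under assumptions: textData is hypothetical data standing in for the 90k-row table column, the chunk count is arbitrary, and parfor falls back to running serially if the Parallel Computing Toolbox is not installed. Each row is processed independently, so splitting the rows into chunks should not change the result.

```matlab
% Sketch only: requires Text Analytics Toolbox; parfor uses a parallel pool
% if Parallel Computing Toolbox is available.
% textData is hypothetical data standing in for the real table column.
textData = repmat("Das ist nur ein kleiner Test", 90000, 1);
stopList = stopWords('Language', 'de');

% Serial version: process all rows in one call.
tic
docs = tokenizedDocument(textData, 'Language', 'de');
serialResult = joinWords(removeWords(docs, stopList, 'IgnoreCase', true));
tSerial = toc;

% Split the rows into chunks so that each worker gets one piece.
nChunks = 8;   % assumed value, tune to the number of workers
edges = round(linspace(0, numel(textData), nChunks + 1));
chunks = cell(nChunks, 1);
for k = 1:nChunks
    chunks{k} = textData(edges(k)+1:edges(k+1));
end

% Parallel version: the first run also includes the pool start-up time.
parts = cell(nChunks, 1);
tic
parfor k = 1:nChunks
    d = tokenizedDocument(chunks{k}, 'Language', 'de');
    parts{k} = joinWords(removeWords(d, stopList, 'IgnoreCase', true));
end
parallelResult = vertcat(parts{:});
tParallel = toc;

fprintf('serial: %.2f s, parallel: %.2f s\n', tSerial, tParallel);
```

Running the parallel section a second time, once the pool is already open, gives a fairer picture of the steady-state cost.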
