How was the exampleWordEmbedding example in the text analytics toolbox trained, in detail?

Question

William Smith 2017-11-19

Answered: Christopher Creutzig 2020-3-9

The documentation for readWordEmbedding gives a pre-trained embedding, saying only that it was "derived by analyzing text from Wikipedia".

How was it trained?

Should we consider it a 'high quality' word embedding, better than anything a user could generate without extensive work and CPU time? Or is it a quick and dirty starting point, and we are encouraged to train our own for better performance?

Answer 1

Christopher Creutzig 2020-3-9

The embedding is rather low-dimensional (50 dimensions) and has a small vocabulary (with 9999 words). It is unlikely to be “high quality” unless your analysis just happens to need precisely this dataset.
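To check these numbers on your own installation, something like the following should work. It is a minimal sketch that assumes Text Analytics Toolbox is installed and that the example file is named exampleWordEmbedding.vec, as in the readWordEmbedding documentation. The test words are arbitrary picks for illustration, not part of the original answer.

```matlab
% Load the example embedding that ships with Text Analytics Toolbox.
% Assumption: the file name matches the one used in the readWordEmbedding docs.
emb = readWordEmbedding("exampleWordEmbedding.vec");

% Inspect the two properties discussed above.
dim = emb.Dimension;                 % length of each word vector
vocabSize = numel(emb.Vocabulary);   % number of words in the embedding
fprintf("Dimension: %d, vocabulary size: %d\n", dim, vocabSize);

% Check whether the words you care about are covered at all.
% These words are arbitrary examples.
words = ["king" "queen" "embedding"];
covered = isVocabularyWord(emb, words);
for k = 1:numel(words)
    fprintf("%s in vocabulary: %d\n", words(k), covered(k));
end
```

If many of the words in your own data come back as not in the vocabulary, that is a good sign this embedding is too small for your analysis.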

For production use, you are much more likely to find fastTextWordEmbedding useful. It downloads its data from https://www.mathworks.com/matlabcentral/fileexchange/66229-text-analytics-toolbox-model-for-fasttext-english-16-billion-token-word-embedding for you.
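A minimal sketch of what using the fastText embedding might look like, assuming the support package linked above is installed (if it is not, fastTextWordEmbedding should point you to the download). The analogy words are only an illustration and are assumed to be in the vocabulary.

```matlab
% Load the pretrained fastText embedding.
% Assumption: the support package from the File Exchange link is installed.
emb = fastTextWordEmbedding;

% Compare its size with the example embedding.
fprintf("Dimension: %d, vocabulary size: %d\n", ...
    emb.Dimension, numel(emb.Vocabulary));

% Map words to vectors and do simple vector arithmetic on them.
king  = word2vec(emb, "king");
man   = word2vec(emb, "man");
woman = word2vec(emb, "woman");
vec = king - man + woman;

% Find the vocabulary word closest to the resulting vector.
word = vec2word(emb, vec);
disp(word)
```

The same word2vec and vec2word calls also work on the example embedding, so you can swap the two and compare the results on your own data.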
