- Use the context around the OOV word. You can use the word embeddings of the words immediately before and after the OOV word.
- Use a synonym or a similar word, and take its word embedding as the embedding for the OOV word.
Handling out-of-vocabulary words in word embeddings
I'm using FastText and my own word embedding on a set of documents. The embeddings are used to classify each word token as an abbreviation or not (Y/N).
During testing, words that do not have vectors (out-of-vocabulary, or OOV, words) are discarded and not included in the performance measures (precision, recall, etc.), which distorts the results. How do you handle this?
Should the words with NaN vectors be included in the performance measures? Can the NaN values be replaced with a vector? If so, how would you decide which vector to use?
Answers (1)
Prince Kumar
2021-8-16
From my understanding, you want to handle OOV (out-of-vocabulary) words in your abbreviation detection task. For now, MATLAB's fastTextWordEmbedding does not handle OOV words.
There are many ways to do it. Two popular ones are to use the context around the OOV word and to use a synonym or similar word.
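Both ideas, borrowing a synonym's vector and combining the vectors of the neighbouring words, can be sketched in a few lines. This is only an illustration under stated assumptions, not what MATLAB or FastText does internally: it is written in Python with a toy dictionary standing in for the trained embedding, the `SYNONYMS` map and all function names are hypothetical, averaging is just one way to combine neighbours, and the zero-vector fallback is an arbitrary choice for tokens with no usable context.

```python
# Toy embedding table (assumption); a real one would come from a trained model.
EMBEDDING = {
    "the": [0.1, 0.3, 0.0],
    "patient": [0.7, 0.2, 0.5],
    "doctor": [0.6, 0.1, 0.4],
    "visited": [0.2, 0.8, 0.1],
}
DIM = 3

# Hypothetical hand-written synonym map: OOV word -> in-vocabulary word.
SYNONYMS = {"physician": "doctor"}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def context_vector(tokens, i, window=1):
    """Average the vectors of in-vocabulary neighbours within `window` of position i.

    Returns None when no neighbour has a vector.
    """
    lo = max(0, i - window)
    hi = min(len(tokens), i + window + 1)
    neighbours = [
        EMBEDDING[tokens[j]]
        for j in range(lo, hi)
        if j != i and tokens[j] in EMBEDDING
    ]
    return mean_vector(neighbours) if neighbours else None


def embed(tokens, window=1):
    """Return one vector per token, so no token is dropped from evaluation."""
    vectors = []
    for i, token in enumerate(tokens):
        if token in EMBEDDING:
            vec = EMBEDDING[token]                 # in-vocabulary word
        elif SYNONYMS.get(token) in EMBEDDING:
            vec = EMBEDDING[SYNONYMS[token]]       # borrow the synonym's vector
        else:
            vec = context_vector(tokens, i, window)
            if vec is None:
                vec = [0.0] * DIM                  # arbitrary fallback (assumption)
        vectors.append(vec)
    return vectors


sentence = ["the", "physician", "visited", "the", "pt"]
vectors = embed(sentence)
```

Because `embed` returns a vector for every token, OOV tokens stay in the test set and count toward precision and recall instead of being silently discarded. Trying the synonym before the context is a design choice made for this sketch; the order can be swapped.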