Creating the matrix of GloVe embedded vocabulary
Per the documentation, the file contains 400k vocabulary words, each of which is represented as a 300d vector.
I want to create a 400k × 300 matrix in MATLAB that holds all 400k embedding vectors of the vocabulary. I do not need to keep the word that corresponds to each vector.
What is the simplest MATLAB code to create such a matrix from glove.6B.zip?
Thanks in advance for your help!
Accepted Answer
Shantanu Dixit
2025-4-30
Edited: Shantanu Dixit
2025-4-30
Hi Amos,
You can build an embedding matrix for the GloVe vectors by preallocating a 400K × 300 matrix with 'zeros' (https://www.mathworks.com/help/matlab/ref/zeros.html). The 300-dimensional vectors are in glove.6B.300d.txt inside glove.6B.zip. Read that file line by line and store only the numeric part of each line in the matrix, discarding the word. Because the file is plain text, 'str2double' (https://www.mathworks.com/help/matlab/ref/str2double.html) can convert the text fields to numbers. Each line in the file looks like this:
the 0.04656 0.21318 -0.0074364 -0.45854 ...
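To see what this conversion does to a single line, here is a minimal sketch that uses only the four values shown above (a real line has 300). str2double applied to a cell array of strings returns a numeric array of the same size, so dropping the first token leaves exactly the vector.

```matlab
% Parse one GloVe line: drop the word, convert the remaining fields to numbers.
% Truncated to the 4 values shown above for illustration; a real line has 300.
lineText = 'the 0.04656 0.21318 -0.0074364 -0.45854';
tokens = strsplit(lineText);           % split on whitespace: {'the', '0.04656', ...}
vec = str2double(tokens(2:end))        % 1-by-4 double row vector, word discarded
```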
After reading each line, the corresponding vector can be stored as follows:
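% Extract the text files from the archive once, if not already present
if ~isfile('glove.6B.300d.txt')
    unzip('glove.6B.zip');
end
fid = fopen('glove.6B.300d.txt', 'r', 'n', 'UTF-8');   % GloVe files are UTF-8
assert(fid ~= -1, 'Could not open glove.6B.300d.txt');
embeddingMatrix = zeros(400000, 300);   % preallocate one row per word
for i = 1:400000
    lineText = fgetl(fid);              % e.g. 'the 0.04656 0.21318 ...'
    tokens = strsplit(lineText);        % split on whitespace
    embeddingMatrix(i, :) = str2double(tokens(2:end));  % drop the word, keep 300 numbers
end
fclose(fid);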
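For the full 400,000-line file, a per-line loop with strsplit and str2double can be slow. A minimal alternative sketch, assuming the .txt file has already been extracted from glove.6B.zip, lets textscan parse the whole file in one call: the format '%*s' reads and discards the word, and repmat(' %f', 1, dim) reads the numbers that follow. The function name readGloveMatrix is hypothetical, not a MathWorks function.

```matlab
function E = readGloveMatrix(filename, dim)
%READGLOVEMATRIX Read a GloVe text file into an N-by-dim numeric matrix.
%   Hypothetical helper. Each line is "word v1 v2 ... vdim"; the word is
%   skipped with '%*s' and only the dim numeric fields are kept.
fid = fopen(filename, 'r', 'n', 'UTF-8');
assert(fid ~= -1, 'Could not open %s', filename);
cleaner = onCleanup(@() fclose(fid));   % close the file even if parsing errors
fmt = ['%*s' repmat(' %f', 1, dim)];    % skip the word, then read dim doubles
C = textscan(fid, fmt, 'CollectOutput', true);
E = C{1};                               % CollectOutput merges the columns into N-by-dim
end
```

Usage: embeddingMatrix = readGloveMatrix('glove.6B.300d.txt', 300); A 400000 × 300 double matrix takes about 960 MB; changing ' %f' to ' %f32' makes textscan return single precision and halves that to about 480 MB.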
Hope this helps!