matlab_word2vec_binary_reader

Version 1.2 (6.2 KB) by Toru Ikegami
readW2Vbin - MATLAB utility to read binary word2vec embedding model file
55 downloads
Updated 2020/6/25

Use `readW2Vbin` to read a pre-trained word2vec word embedding model in the binary format. It assumes that the file is written in the following format.

- The data before the first `0x20` (space) are ASCII characters giving the vocabulary size of the model, and the data between the first `0x20` and the first `0x0A` (newline) are ASCII characters giving the dimension of the word vectors. For example, the decimal byte values `[51 48 48 48 48 48 48 32 51 48 48 10]` spell `3000000 300` followed by a newline, meaning 3 million words embedded in 300 dimensions.
- The main body, a sequence of word-vector pairs, begins right after the newline character. Each pair consists of the bytes of the word, a space (`0x20`), and the binary data of the corresponding embedding vector in single-precision (32-bit) format. The vector data occupy 4 bytes times the number of dimensions (e.g., 1200 bytes for 300 dimensions).
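The layout described above can be sketched in a few lines of code. The following is a minimal illustration in Python, not the `readW2Vbin` implementation; the function names `read_header` and `read_records` are hypothetical. It assumes little-endian floats and UTF-8 encoded words, and it skips a newline byte that may precede a word (the original word2vec tool writes one after each vector), none of which is stated in the description above.

```python
import io
import struct


def read_header(stream):
    # Collect bytes up to the first newline (0x0A); they form the ASCII header.
    header = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("file ended inside the header")
        if byte == b"\n":
            break
        header += byte
    # The header holds two integers separated by a space (0x20).
    vocab_size, dim = (int(field) for field in header.decode("ascii").split())
    return vocab_size, dim


def read_records(stream, vocab_size, dim):
    # Assumption: vectors are stored as little-endian 32-bit floats.
    fmt = "<%df" % dim
    nbytes = 4 * dim  # e.g. 1200 bytes for 300 dimensions
    for _ in range(vocab_size):
        word = bytearray()
        while True:
            byte = stream.read(1)
            if not byte:
                raise ValueError("file ended inside a word")
            if byte == b" ":  # 0x20 separates the word from its vector
                break
            if byte == b"\n" and not word:
                # Assumption: skip a newline left over from the previous record.
                continue
            word += byte
        raw = stream.read(nbytes)
        if len(raw) != nbytes:
            raise ValueError("file ended inside a vector")
        # Assumption: words are UTF-8; undecodable bytes are replaced.
        yield word.decode("utf-8", errors="replace"), struct.unpack(fmt, raw)
```

To try it on a real file, open the file in binary mode (`open(path, "rb")`), call `read_header` once, then iterate over `read_records` with the returned values. Reading byte by byte keeps the sketch close to the format description; a practical reader would buffer its input.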

This function was tested with the "GoogleNews-vectors-negative300.bin" file from the word2vec website (https://code.google.com/archive/p/word2vec/). Reading the 3.5 GB file took about a minute.

Cite As

Toru Ikegami (2024). matlab_word2vec_binary_reader (https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.2), GitHub. Retrieved .

MATLAB Release Compatibility
Created with R2019b
Compatible with R2019b to R2020a
Platform Compatibility
Windows macOS Linux

| Version | Published | Release Notes |
| --- | --- | --- |
| 1.2 | | See release notes for this release on GitHub: https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.2 |
| 1.1 | | See release notes for this release on GitHub: https://github.com/mathworks/matlab_word2vec_binary_reader/releases/tag/v1.1 |
| 1.0 | | |

To view or report issues in this GitHub add-on, visit its GitHub repository.