readWordEmbedding

Read word embedding from file

Description

emb = readWordEmbedding(filename) reads the pretrained word embedding stored in the file filename. The input file must be either a UTF-8 encoded text file in the word2vec or GloVe text embedding format, or a zip file containing a text file in one of these formats.
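As a rough illustration of the input requirements described above, the following sketch writes a tiny word2vec-style text file (a header line giving the number of words and the dimension, followed by one word and its vector per line), places it in a zip file, and reads the zip file. The file names, words, and vector values are made up for this sketch, and writelines is assumed to be available (R2022a or later).

```matlab
% Hypothetical file names in the temporary folder (not part of the toolbox).
txtFile = fullfile(tempdir,"tinyEmbedding.vec");
zipFile = fullfile(tempdir,"tinyEmbedding.zip");

% word2vec-style text: header "numWords dimension", then one word per line.
% writelines writes UTF-8 text by default.
lines = [
    "2 3"
    "king 0.1 0.2 0.3"
    "queen 0.4 0.5 0.6"];
writelines(lines,txtFile);

% Put the text file in a zip file and read the embedding from the zip file.
zip(zipFile,txtFile);
emb = readWordEmbedding(zipFile)
```

Reading txtFile directly instead of zipFile should give an equivalent wordEmbedding object.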

If the word embedding file contains duplicate words, then the software uses the word vector corresponding to the last duplicate entry.
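The duplicate-word behavior can be checked with a small made-up file. This sketch assumes a GloVe-style text layout (no header line, one word and its vector per line); the file name and vector values are hypothetical, and writelines is assumed to be available (R2022a or later).

```matlab
% Hypothetical file in the temporary folder; "king" appears twice.
filename = fullfile(tempdir,"duplicateWords.txt");
lines = [
    "king 1 0 0"
    "queen 0 1 0"
    "king 0 0 1"];
writelines(lines,filename);

emb = readWordEmbedding(filename);

% According to the rule above, this should be the vector from the
% last "king" entry in the file.
vec = word2vec(emb,"king")
```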

Examples

Read the example word embedding. This model was derived by analyzing text from Wikipedia.

filename = "exampleWordEmbedding.vec";
emb = readWordEmbedding(filename)
emb = 
  wordEmbedding with properties:

     Dimension: 50
    Vocabulary: ["utc"    "first"    "new"    "two"    "time"    "up"    "school"    "article"    "world"    "years"    "university"    "talk"    "many"    "national"    "later"    "state"    "made"    "born"    "city"    "de"    ...    ] (1x9999 string)

Explore the word embedding using word2vec and vec2word.

king = word2vec(emb,"king");
man = word2vec(emb,"man");
woman = word2vec(emb,"woman");
word = vec2word(emb,king - man + woman)
word = 
"queen"

Input Arguments

filename

Name of the file, specified as a string scalar, character vector, or a 1-by-1 cell array containing a character vector.

Data Types: string | char | cell
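For illustration, the three accepted forms of the file name might be used as follows. This sketch assumes the example file exampleWordEmbedding.vec from the Examples section is on the path; the variable names are arbitrary.

```matlab
% String scalar
embFromString = readWordEmbedding("exampleWordEmbedding.vec");

% Character vector
embFromChar = readWordEmbedding('exampleWordEmbedding.vec');

% 1-by-1 cell array containing a character vector
embFromCell = readWordEmbedding({'exampleWordEmbedding.vec'});

% All three calls read the same file, so the vocabularies should match.
isequal(embFromString.Vocabulary,embFromChar.Vocabulary,embFromCell.Vocabulary)
```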

Output Arguments

emb

Output word embedding, returned as a wordEmbedding object.

Version History

Introduced in R2017b