documentEmbedding

Document embedding model to map documents to vectors

Since R2024a

    Description

    A document embedding maps documents to real vectors.

    The vectors attempt to capture the semantic content of the full document, so similar documents have similar vectors. The document can be a sentence, a paragraph, or a longer text.

    Creation

    Create a documentEmbedding object by loading a pretrained model with the documentEmbedding function.

    Description

    emb = documentEmbedding returns a document embedding using the all-MiniLM-L6-v2 sentence transformers model.

    This function requires Deep Learning Toolbox™.

    emb = documentEmbedding(Model=modelName) returns the document embedding model specified by the Model name-value argument.

    Input Arguments

    Model name, specified by the Model name-value argument as one of these values:

    • "all-MiniLM-L6-v2" (default): Sentence transformer model with six self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package.

    • "all-MiniLM-L12-v2": Sentence transformer model with twelve self-attention layers. This model outputs a 1-by-384 embedding vector. This option requires the Text Analytics Toolbox™ Model for all-MiniLM-L12-v2 Network support package.

    If the required support package is not installed, then the function provides a download link.
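
As a rough illustration of the Model argument, the following sketch loads the twelve-layer model by name and embeds a single sentence. It assumes that Deep Learning Toolbox and the corresponding support package are installed. The sample sentence and the variable names vec and embeddingSize are arbitrary choices for this sketch.

```matlab
% Load the twelve-layer model by name (requires its support package).
emb = documentEmbedding(Model="all-MiniLM-L12-v2");

% Embed one sentence with the embed object function.
vec = embed(emb,"the quick brown fox jumped over the lazy dog");

% Per the model description above, this should be a 1-by-384 vector.
embeddingSize = size(vec)
```

Both listed models produce vectors of the same length, so switching between them should not require changes to code that only depends on the embedding size.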

    Object Functions

    embed    Map document to embedding vector

    Examples

    Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.

    emb = documentEmbedding;

    Create an array of input documents.

    documents = [
        "the quick brown fox jumped over the lazy dog"
        "the fast brown fox jumped over the lazy dog"
        "the lazy dog sat there and did nothing"];

    Map the input documents to vectors using the embed function.

    embeddedDocuments = embed(emb,documents);

    To estimate how similar the documents are, compute the pairwise cosine similarities using cosineSimilarity.

    similarities = cosineSimilarity(embeddedDocuments)
    similarities = 3×3
    
        1.0000    0.9840    0.5505
        0.9840    1.0000    0.5524
        0.5505    0.5524    1.0000
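
To make the similarity matrix easier to interpret, the sketch below reproduces a pairwise cosine similarity computation by hand and then picks out the most similar pair of rows. It uses a small stand-in matrix X with made-up values instead of real model output, so it runs without any toolboxes. The variable names are hypothetical, and this is one possible formulation rather than a description of how cosineSimilarity is implemented.

```matlab
% Stand-in embedding matrix: one row per document.
% These values are invented for illustration, not produced by a model.
X = [1 0 1.0
     1 0 0.9
     0 1 0.2];

% Scale each row to unit Euclidean length.
Xn = X ./ vecnorm(X,2,2);

% Cosine similarity of two rows is the dot product of their unit vectors.
S = Xn*Xn.';

% Mask the diagonal (self-similarity) before searching for the best pair.
masked = S;
masked(logical(eye(size(S)))) = -Inf;
[bestScore,idx] = max(masked(:));
[r,c] = ind2sub(size(S),idx);

% Report the pair with the lower index first.
closestPair = sort([r c])
```

With real embeddings, replacing X with the matrix returned by embed should give the same kind of result: the diagonal is 1 and the largest off-diagonal entry marks the two documents the model considers closest.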
    
    

    References

    [1] Reimers, Nils, and Iryna Gurevych. "Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks." Preprint, submitted August 27, 2019. https://doi.org/10.48550/arXiv.1908.10084.

    Version History

    Introduced in R2024a