embed
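Description
M = embed(emb,documents) maps the input documents to vectors using the document embedding emb.
M = embed(emb,documents,Name=Value) specifies additional options using one or more name-value arguments.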
Examples
Map Documents to Vectors
Load the pretrained document embedding all-MiniLM-L6-v2 using the documentEmbedding function. This model requires the Text Analytics Toolbox™ Model for all-MiniLM-L6-v2 Network support package. If this support package is not installed, then the function provides a download link.
emb = documentEmbedding;
Create an array of input documents.
documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];
Map the input documents to vectors using the embed function.
embeddedDocuments = embed(emb,documents);
To estimate how similar the documents are, compute the pairwise cosine similarities using cosineSimilarity.
similarities = cosineSimilarity(embeddedDocuments)
similarities = 3×3
1.0000 0.9840 0.5505
0.9840 1.0000 0.5524
0.5505 0.5524 1.0000
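Cosine similarity between two vectors is their dot product divided by the product of their lengths, which is why the diagonal entries are 1. As a sketch of that calculation (assuming cosineSimilarity applies this standard definition to the rows of its input matrix), the following code normalizes each embedding vector to unit length and compares the resulting dot products with the output of cosineSimilarity. The variable names U, manualSimilarities, builtinSimilarities, and maxDifference are illustrative only.

```matlab
% Sketch: reproduce the pairwise cosine similarities by hand.
emb = documentEmbedding;
documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];

% Run on the CPU to keep this sketch simple.
M = embed(emb,documents,ExecutionEnvironment="cpu");

% Divide each row (one document vector) by its Euclidean length.
U = M ./ vecnorm(M,2,2);

% Dot products of unit-length vectors are cosine similarities.
manualSimilarities = double(U * U');

% Compare with the built-in function; expect only rounding-level differences.
builtinSimilarities = double(full(cosineSimilarity(M)));
maxDifference = max(abs(manualSimilarities - builtinSimilarities),[],"all")
```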
Input Arguments
emb
— Input document embedding
documentEmbedding object
Input document embedding, specified as a documentEmbedding object.
documents
— Input documents
tokenizedDocument array | string array | cell array of character vectors
Input documents, specified as a tokenizedDocument array, a string array of documents, or a cell array of character vectors. If documents is a string array, then each string represents a document. If documents is a cell array of character vectors, then each character vector represents a document.
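The three accepted input forms look like this in practice. The short documents and the variable names strDocs, cellDocs, and tokDocs are made up for illustration. This sketch does not assume that the three forms produce identical vectors, only that each call returns one row per document.

```matlab
% Sketch: the same two documents in each supported input form.
emb = documentEmbedding;

% String array: each string is one document.
strDocs = ["the quick brown fox"; "the lazy dog sat there"];

% Cell array of character vectors: each character vector is one document.
cellDocs = {'the quick brown fox'; 'the lazy dog sat there'};

% tokenizedDocument array: one element per document.
tokDocs = tokenizedDocument(strDocs);

M1 = embed(emb,strDocs);
M2 = embed(emb,cellDocs);
M3 = embed(emb,tokDocs);

% Each result has one row per input document.
numRows = [size(M1,1) size(M2,1) size(M3,1)]
```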
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Example: embed(emb,documents,MiniBatchSize=64) embeds the specified documents using mini-batches of size 64.
MiniBatchSize
— Mini-batch size
32 (default) | positive integer
Mini-batch size to use for embedding, specified as a positive integer. Larger mini-batch sizes require more memory but can reduce the total time needed to embed the documents.
Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
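For larger collections, you can set MiniBatchSize explicitly. This sketch builds a hypothetical collection of 500 placeholder documents purely for illustration. The best batch size depends on the available memory and hardware, so 64 is only an example value.

```matlab
% Sketch: embed a larger collection in mini-batches of 64 documents.
emb = documentEmbedding;

% Hypothetical collection of 500 placeholder documents (illustration only).
documents = "document number " + string((1:500)');

% Larger batches use more memory; smaller batches use less.
M = embed(emb,documents,MiniBatchSize=64);

% One embedding vector (row) per document.
numDocuments = size(M,1)
```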
Acceleration
— Performance optimization
"auto" (default) | "mex" | "none"
Performance optimization, specified as one of these values:
- "auto": Automatically apply a number of optimizations that are suitable for the input network and hardware resources.
- "mex": Compile and execute a MEX function. This option is available only when you use a GPU. Using a GPU requires a Parallel Computing Toolbox™ license and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.
- "none": Disable all acceleration.
When you use the "auto" or "mex" option, the software can offer performance benefits at the expense of an increased initial run time. Subsequent calls to the function are typically faster. Use performance optimization when you call the function multiple times using different input data.
When Acceleration is "mex", the software generates and executes a MEX function based on the model and parameters you specify in the function call. A single model can have several associated MEX functions at one time. Clearing the model variable also clears any MEX functions associated with that model.
When Acceleration is "auto", the software does not generate a MEX function.
The "mex" option is available only when you use a GPU. You must also have a C/C++ compiler installed and the GPU Coder™ Interface for Deep Learning support package, which you can install using the Add-On Explorer in MATLAB®. For setup instructions, see MEX Setup (GPU Coder). A GPU Coder license is not required.
MATLAB Compiler™ software does not support compiling models when you use the "mex" option.
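A rough way to observe the trade-off described above is to time successive calls. This is a sketch only: timings vary with hardware, the first call in a session can include setup costs unrelated to acceleration, and the two small batches (batchA and batchB) are hypothetical inputs chosen to follow the advice of calling the function repeatedly with different data.

```matlab
% Sketch: time repeated embed calls with and without acceleration.
% The printed timings depend on hardware and are not fixed values.
emb = documentEmbedding;

batchA = ["the quick brown fox jumped over the lazy dog"; "the lazy dog sat there"];
batchB = ["a fast fox ran past a sleeping dog"; "the dog did nothing at all"];

% With "auto", the first call may include extra setup time.
tic; M1 = embed(emb,batchA,Acceleration="auto"); tAutoFirst = toc;

% A later call with different input data is typically faster.
tic; M2 = embed(emb,batchB,Acceleration="auto"); tAutoSecond = toc;

% With "none", acceleration is disabled.
tic; M3 = embed(emb,batchB,Acceleration="none"); tNone = toc;

fprintf("auto, first call:  %.3f s\n",tAutoFirst);
fprintf("auto, second call: %.3f s\n",tAutoSecond);
fprintf("none:              %.3f s\n",tNone);
```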
ExecutionEnvironment
— Hardware resource
"auto" (default) | "gpu" | "cpu"
Hardware resource, specified as one of these values:
- "auto": Use a GPU if one is available. Otherwise, use the CPU.
- "gpu": Use the GPU. Using a GPU requires a Parallel Computing Toolbox license and a supported GPU device. For information about supported devices, see GPU Computing Requirements (Parallel Computing Toolbox). If Parallel Computing Toolbox or a suitable GPU is not available, then the software returns an error.
- "cpu": Use the CPU.
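Because "gpu" returns an error when no suitable GPU is available, one option is to check for a GPU first and pass the result to ExecutionEnvironment. This sketch uses the MATLAB function canUseGPU (R2019b and later) for the check. The default "auto" already makes the same choice, so an explicit check is mainly useful when you want to record or control which hardware is used. The variable env is illustrative.

```matlab
% Sketch: pick the hardware explicitly instead of relying on "auto".
emb = documentEmbedding;
documents = ["the quick brown fox"; "the lazy dog sat there"];

% canUseGPU is true only when Parallel Computing Toolbox and a supported GPU are available.
if canUseGPU
    env = "gpu";
else
    env = "cpu";
end

M = embed(emb,documents,ExecutionEnvironment=env);
fprintf("Embedded %d documents using the %s.\n",size(M,1),upper(env));
```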
Output Arguments
M
— Document embedding vectors
matrix
Document embedding vectors, returned as an N1-by-N2 matrix, where N1 is the number of input documents and N2 is the embedding dimension. M(i,:) is the embedding vector for the ith document in documents.
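To inspect the shape of the output, query its size. In this sketch, the number of rows matches the number of input documents. The number of columns is the embedding dimension of the loaded model, which the code reads from the result rather than assuming a particular value. The variable names numDocuments, embeddingDimension, and v2 are illustrative.

```matlab
% Sketch: relate the size of M to the input documents.
emb = documentEmbedding;
documents = [
    "the quick brown fox jumped over the lazy dog"
    "the fast brown fox jumped over the lazy dog"
    "the lazy dog sat there and did nothing"];
M = embed(emb,documents);

% Rows correspond to documents; columns to embedding dimensions.
[numDocuments,embeddingDimension] = size(M)

% Row 2 is the embedding vector of the second document.
v2 = M(2,:);
```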
Version History
Introduced in R2024a
See Also
documentEmbedding | wordEmbedding | word2vec | cosineSimilarity | tokenizedDocument | doc2sequence