encode
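Syntax
[tokenCodes,segments] = encode(tokenizer,str)
[tokenCodes,segments] = encode(tokenizer,str1,str2)
___ = encode(___,AddSpecialTokens=tf)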
Description
[tokenCodes,segments] = encode(tokenizer,str) tokenizes and encodes the text in str using the specified tokenizer and returns the token codes and segments. This syntax automatically adds padding, start, unknown, and separator tokens to the input.
[tokenCodes,segments] = encode(tokenizer,str1,str2) tokenizes and encodes the sentence pair str1,str2. This syntax automatically adds padding, start, unknown, and separator tokens to the input.
___ = encode(___,AddSpecialTokens=tf) specifies whether to add padding, start, unknown, and separator tokens to the input.
Examples
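The following is a minimal sketch of the three syntaxes described above, not a definitive example. It assumes that Text Analytics Toolbox is installed and that the pretrained model returned by bert is available (the first call may prompt a download). The example strings and the variable names pairCodes, pairSegments, and rawCodes are illustrative only.

```matlab
% Load a pretrained BERT model and keep only its tokenizer.
% Assumption: the bert function and its pretrained model are available.
[~,tokenizer] = bert;

% Encode a single string. Both outputs are cell arrays with one
% element per input string.
str = "Coolant is pooling underneath sorter.";
[tokenCodes,segments] = encode(tokenizer,str);

% Encode a sentence pair. The segments output indicates which
% sentence each token belongs to.
str1 = "Sorter blows fuses at start up.";
str2 = "The fuses were replaced last week.";
[pairCodes,pairSegments] = encode(tokenizer,str1,str2);

% Encode without special tokens. For a single string, this result
% should be shorter than tokenCodes by the start and separator tokens.
rawCodes = encode(tokenizer,str,AddSpecialTokens=false);

% Map the token codes back to text to inspect the added tokens.
decoded = decode(tokenizer,tokenCodes);
```

Comparing decoded with the original str is a quick way to see which special tokens the tokenizer added.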
Input Arguments
Output Arguments
Algorithms
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun et al. "Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation." Preprint, submitted October 8, 2016. https://doi.org/10.48550/arXiv.1609.08144.
Version History
Introduced in R2023b
See Also
bertTokenizer | bpeTokenizer | bert | bertDocumentClassifier | decode | encodeTokens | subwordTokenize | wordTokenize
Topics
- Train BERT Document Classifier
- Classify Text Data Using Deep Learning
- Create Simple Text Model for Classification
- Analyze Text Data Using Topic Models
- Analyze Text Data Using Multiword Phrases
- Sequence Classification Using Deep Learning (Deep Learning Toolbox)
- Deep Learning in MATLAB (Deep Learning Toolbox)