bertTokenizer
Description
A Bidirectional Encoder Representations from Transformers (BERT) neural network WordPiece tokenizer maps text data to sequences of integers.
Creation
Description
tokenizer = bertTokenizer(vocabulary) creates a bertTokenizer object for the specified vocabulary.
tokenizer = bertTokenizer(vocabulary,Name=Value) sets additional properties using one or more name-value arguments.
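As a minimal sketch of the two creation syntaxes, the following code builds a tokenizer from a small vocabulary. The vocabulary is a hypothetical toy example, not one shipped with any model. The sketch assumes that the vocabulary must contain the special tokens "[PAD]", "[CLS]", "[UNK]", and "[SEP]", and that IgnoreCase is one of the supported name-value arguments.

```matlab
% Hypothetical toy vocabulary for illustration only. The special tokens
% are included on the assumption that the tokenizer requires them.
% Entries starting with "##" are WordPiece continuation pieces.
vocabulary = ["[PAD]" "[CLS]" "[UNK]" "[SEP]" "the" "token" "##izer" "splits" "text"];

% Create a tokenizer using the default property values.
tokenizer = bertTokenizer(vocabulary);

% Create a second tokenizer and set a property with a name-value
% argument. IgnoreCase is assumed here to be a settable property.
tokenizerCased = bertTokenizer(vocabulary,IgnoreCase=false);
```

In practice, the vocabulary usually comes from the pretrained model that the tokenizer is paired with, so that the integer codes match the model's embedding table.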
Input Arguments
Properties
Object Functions
encode | Tokenize and encode text for transformer neural network |
decode | Convert token codes to tokens |
encodeTokens | Convert tokens to token codes |
subwordTokenize | Tokenize text into subwords using BERT tokenizer |
wordTokenize | Tokenize text into words using tokenizer |
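The object functions listed above might be combined as in the following sketch. The vocabulary and input text are hypothetical, and the exact output types (for example, whether each function returns a cell array) are not confirmed by this page, so the code does not show expected output. Passing the output of subwordTokenize directly to encodeTokens is also an assumption.

```matlab
% Hypothetical toy vocabulary, including the special tokens that the
% tokenizer is assumed to require.
vocabulary = ["[PAD]" "[CLS]" "[UNK]" "[SEP]" "the" "token" "##izer" "splits" "text"];
tokenizer = bertTokenizer(vocabulary);

str = "The tokenizer splits text";

% Split the text into words, then into WordPiece subwords.
words = wordTokenize(tokenizer,str);
subwords = subwordTokenize(tokenizer,str);

% Map the text directly to integer token codes.
tokenCodes = encode(tokenizer,str);

% Map the token codes back to tokens.
decoded = decode(tokenizer,tokenCodes);

% Map already-tokenized text to token codes (input form assumed).
subwordCodes = encodeTokens(tokenizer,subwords);
```

Use encode when starting from raw text, and encodeTokens when the text has already been split into tokens by an earlier step.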
Examples
Algorithms
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, et al. "Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation." Preprint, submitted October 8, 2016. https://doi.org/10.48550/arXiv.1609.08144.
Version History
Introduced in R2023b
See Also
bpeTokenizer | bert | bertDocumentClassifier | encode | decode | encodeTokens | subwordTokenize | wordTokenize
Topics
- Train BERT Document Classifier
- Classify Text Data Using Deep Learning
- Create Simple Text Model for Classification
- Analyze Text Data Using Topic Models
- Analyze Text Data Using Multiword Phrases
- Sequence Classification Using Deep Learning (Deep Learning Toolbox)
- Deep Learning in MATLAB (Deep Learning Toolbox)