bert
Description
A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.
[net,tokenizer] = bert returns a pretrained BERT-Base model and the corresponding tokenizer.
Tip
For document classification workflows, use a bertDocumentClassifier object with the trainBERTDocumentClassifier function.
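For reference, a minimal sketch of that workflow. The sample data, the ClassNames name-value argument, and the trainBERTDocumentClassifier input order (text, labels, classifier, options) are assumptions; see the reference pages for those functions for the exact syntax.

% Assumed sample data: text with corresponding class labels.
textData = ["The service was excellent."; "The product broke after a day."];
labels = categorical(["positive"; "negative"]);

% Create a BERT document classifier (ClassNames argument is an assumption).
classifier = bertDocumentClassifier(ClassNames=categories(labels));

% Fine-tune using Deep Learning Toolbox training options (input order assumed).
options = trainingOptions("adam",MaxEpochs=4);
classifier = trainBERTDocumentClassifier(textData,labels,classifier,options);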
[net,tokenizer] = bert(Name=Value) specifies additional options using one or more name-value arguments.
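For example, a hypothetical call that loads a smaller pretrained variant. The Model name-value argument and the "tiny" option shown here are assumptions; check the Input Arguments section for the supported names and values.

% Load a smaller BERT variant (Model option is an assumption).
[net,tokenizer] = bert(Model="tiny");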
Examples
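Load Pretrained BERT Model

A minimal sketch that loads the pretrained BERT-Base model and encodes a sample sentence, assuming the returned bertTokenizer supports the encode method:

% Load the pretrained model and its tokenizer.
[net,tokenizer] = bert;

% Encode a sentence into token codes for the network (encode method assumed).
str = "Bidirectional encoder representations from transformers.";
tokenCodes = encode(tokenizer,str);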
Input Arguments
Output Arguments
References
[1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.
[2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.
[3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.
Version History
Introduced in R2023b
See Also
bertDocumentClassifier | trainBERTDocumentClassifier | dlnetwork (Deep Learning Toolbox) | bertTokenizer