bert

Pretrained BERT model

Since R2023b

    Description

    A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that you can fine-tune for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.

    [net,tokenizer] = bert returns a pretrained BERT-Base model and the corresponding tokenizer.

    Tip

    For document classification workflows, use a bertDocumentClassifier object with the trainBERTDocumentClassifier function.

    [net,tokenizer] = bert(Name=Value) specifies additional options using one or more name-value arguments.

    Examples

    Load a pretrained BERT-Base neural network and the corresponding tokenizer using the bert function. If the Text Analytics Toolbox™ Model for BERT-Base Network support package is not installed, then the function provides a link to the required support package in the Add-On Explorer. To install the support package, click the link, and then click Install.

    [net,tokenizer] = bert;

    View the network properties.

    net
    net = 
      dlnetwork with properties:
    
             Layers: [129x1 nnet.cnn.layer.Layer]
        Connections: [164x2 table]
         Learnables: [197x3 table]
              State: [0x3 table]
         InputNames: {'input_ids'  'attention_mask'  'seg_ids'}
        OutputNames: {'enc12_layernorm2'}
        Initialized: 1
    
      View summary with summary.
    
    

    View the tokenizer.

    tokenizer
    tokenizer = 
      bertTokenizer with properties:
    
            IgnoreCase: 1
          StripAccents: 1
          PaddingToken: "[PAD]"
           PaddingCode: 1
            StartToken: "[CLS]"
             StartCode: 102
          UnknownToken: "[UNK]"
           UnknownCode: 101
        SeparatorToken: "[SEP]"
         SeparatorCode: 103
           ContextSize: 512
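
    To see how the tokenizer prepares text for the network, encode a few sentences and pad the token codes to the same length. This is a minimal sketch: the input text is made up for illustration, and the padsequences call assumes Deep Learning Toolbox is installed.

    str = [
        "The quick brown fox jumps over the lazy dog."
        "Bidirectional context improves language understanding."];

    % Encode the text. Each element of tokenCodes is a vector of token codes
    % that includes the start ([CLS]) and separator ([SEP]) tokens.
    tokenCodes = encode(tokenizer,str);

    % Pad the sequences to the same length using the tokenizer padding code so
    % that they can be batched together and passed to the network.
    paddedCodes = padsequences(tokenCodes,2,PaddingValue=tokenizer.PaddingCode);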
    
    

    Input Arguments

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: [net,tokenizer] = bert(Model="tiny") returns a pretrained BERT-Tiny model and the corresponding tokenizer.

    BERT model, specified as one of these options:

    • "base" — BERT-Base model. This option requires the Text Analytics Toolbox™ Model for BERT-Base Network support package. This model has 108.8 million learnable parameters.

    • "tiny" — BERT-Tiny model. This option requires the Text Analytics Toolbox Model for BERT-Tiny Network support package. This model has 4.3 million learnable parameters.

    • "mini" — BERT-Mini model. This option requires the Text Analytics Toolbox Model for BERT-Mini Network support package. This model has 11.1 million learnable parameters.

    • "small" — BERT-Small model. This option requires the Text Analytics Toolbox Model for BERT-Small Network support package. This model has 28.5 million learnable parameters.

    • "large" — BERT-Large model. This option requires the Text Analytics Toolbox Model for BERT-Large Network support package. This model has 334 million learnable parameters.

    • "multilingual" — BERT-Base multilingual model. This option requires the Text Analytics Toolbox Model for BERT-Base Multilingual Cased Network support package. This model has 177.2 million learnable parameters.

    Model head, specified as one of these values:

    • "document-classifier" — Return a model with a document classification head. The head contains a fully connected layer with an output size of NumClasses and a softmax layer.

    • "none" — Return a headless model.

    Number of classes for the document classification head, specified as a positive integer.

    This argument applies only when Head is "document-classifier".
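
    For example, this call returns a model with a document classification head for a hypothetical three-class problem (the class count here is only illustrative):

    [net,tokenizer] = bert(Head="document-classifier",NumClasses=3);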

    Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).

    When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

    This operation helps to prevent the network from overfitting [2], [3]. A higher probability results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).

    When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores is the array of attention scores and p is the layer dropout probability. The layer then scales the remaining scores by 1/(1-p).

    This operation helps to prevent the network from overfitting [2], [3]. A higher probability results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
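
    Both dropout arguments apply the same masking operation during training. The following snippet is an illustration of that operation on a generic array, not code from the bert function itself; the array X and probability p are placeholders.

    p = 0.1;                           % dropout probability
    X = rand(4,3,"single");            % stand-in for a layer input or attention scores
    mask = rand(size(X)) < p;          % elements to drop
    Xdropout = (X .* ~mask) ./ (1-p);  % scale the remaining elements by 1/(1-p)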

    Output Arguments

    Pretrained BERT model, returned as a dlnetwork (Deep Learning Toolbox) object.

    BERT tokenizer, returned as a bertTokenizer object.

    References

    [1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.

    [2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.

    [3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.

    Version History

    Introduced in R2023b