bertDocumentClassifier

BERT document classifier

Since R2023b

    Description

    A Bidirectional Encoder Representations from Transformers (BERT) model is a transformer neural network that can be fine-tuned for natural language processing tasks such as document classification and sentiment analysis. The network uses attention layers to analyze text in context and capture long-range dependencies between words.

    Creation

    Description

    mdl = bertDocumentClassifier creates a bertDocumentClassifier object.

    mdl = bertDocumentClassifier(net,tokenizer) creates a bertDocumentClassifier object from the specified BERT neural network and tokenizer.

    mdl = bertDocumentClassifier(___,Name=Value) sets the ClassNames property and additional options using one or more name-value arguments.

    Input Arguments

    net

    BERT neural network, specified as a dlnetwork (Deep Learning Toolbox) object.

    If you specify the net argument, then you must not specify the Model argument. The network must have three sequence input layers with input sizes of one. The output size of the network must match the number of classes in the ClassNames property. The inputs in net.InputNames(1), net.InputNames(2), and net.InputNames(3) must be the inputs for the input data, the attention mask, and the segments, respectively.
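
    For example, assuming that net is a candidate network in your workspace, you can inspect it before constructing the classifier:

    % Check that the network has three sequence inputs, ordered as input
    % data, attention mask, and segments.
    net.InputNames

    % The output size must match the number of classes in ClassNames. One
    % way to check is to inspect the final layer.
    net.Layers(end)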

    tokenizer

    BERT tokenizer, specified as a bertTokenizer object.

    If you specify the tokenizer argument, then you must not specify the Model argument.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: bertDocumentClassifier(Model="tiny") creates a BERT-Tiny document classifier.

    Model

    BERT model, specified as one of these options:

    • "base" — BERT-Base model. This option requires the Text Analytics Toolbox™ Model for BERT-Base Network support package. This model has 108.8 million learnable parameters.

    • "tiny" — BERT-Tiny model. This option requires the Text Analytics Toolbox Model for BERT-Tiny Network support package. This model has 4.3 million learnable parameters.

    • "mini" — BERT-Mini model. This option requires the Text Analytics Toolbox Model for BERT-Mini Network support package. This model has 11.1 million learnable parameters.

    • "small" — BERT-Small model. This option requires the Text Analytics Toolbox Model for BERT-Small Network support package. This model has 28.5 million learnable parameters.

    • "large" — BERT-Large model. This option requires the Text Analytics Toolbox Model for BERT-Large Network support package. This model has 334 million learnable parameters.

    • "multilingual" — BERT-Base multilingual model. This option requires the Text Analytics Toolbox Model for BERT-Base Multilingual Cased Network support package. This model has 177.2 million learnable parameters.

    If you specify the Model argument, then you must not specify the net and tokenizer arguments.

    Tip

    To customize the BERT neural network architecture, modify the dlnetwork (Deep Learning Toolbox) object output of the bert function and use the net and tokenizer arguments.
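
    For example, one possible workflow is sketched below. The customization step is left schematic because the layers you add depend on your task, and the final call assumes the modified network output size matches the number of classes.

    % Load a pretrained BERT-Tiny encoder and its tokenizer.
    [net,tokenizer] = bert(Model="tiny");

    % ... customize the dlnetwork here, for example with addLayers,
    % removeLayers, and connectLayers (Deep Learning Toolbox), so that
    % the network output size matches the number of classes ...

    % Create the classifier from the customized network and tokenizer.
    mdl = bertDocumentClassifier(net,tokenizer, ...
        ClassNames=["positive" "negative"]);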

    DropoutProbability

    Probability of dropping out input elements in dropout layers, specified as a scalar in the range [0, 1).

    When you train a neural network with dropout layers, the layer randomly sets input elements to zero using the dropout mask rand(size(X)) < p, where X is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

    This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
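
    The masking and rescaling operation described above can be sketched on an arbitrary array (p and X here are placeholder values, not part of the classifier API):

    p = 0.1;                    % dropout probability
    X = rand(4,3,"single");     % placeholder layer input

    mask = rand(size(X)) < p;   % elements to drop
    Y = X;
    Y(mask) = 0;                % zero out the dropped elements
    Y = Y./(1-p);               % rescale the remaining elements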

    AttentionDropoutProbability

    Probability of dropping out input elements in attention layers, specified as a scalar in the range [0, 1).

    When you train a neural network with attention layers, the layer randomly sets attention scores to zero using the dropout mask rand(size(scores)) < p, where scores is the layer input and p is the layer dropout probability. The layer then scales the remaining elements by 1/(1-p).

    This operation helps to prevent the network from overfitting [2], [3]. A higher number results in the network dropping more elements during training. At prediction time, the output of the layer is equal to its input.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
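
    For example, a minimal creation sketch that sets both dropout probabilities, assuming the DropoutProbability and AttentionDropoutProbability argument names used above:

    % Create a BERT-Tiny classifier with nondefault dropout probabilities.
    mdl = bertDocumentClassifier(Model="tiny", ...
        DropoutProbability=0.2, ...
        AttentionDropoutProbability=0.2);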

    Properties

    Network

    This property is read-only.

    Pretrained BERT model, specified as a dlnetwork (Deep Learning Toolbox) object corresponding to the net or Model argument.

    Tokenizer

    This property is read-only.

    BERT tokenizer, specified as a bertTokenizer object corresponding to the tokenizer or Model argument.

    ClassNames

    Class names, specified as a categorical vector, a string array, or a cell array of character vectors.

    If you specify the net argument, then the output size of the network must match the number of classes.

    To set this property, use the corresponding name-value argument when you create the bertDocumentClassifier object. After you create a bertDocumentClassifier object, this property is read-only.

    Data Types: string | cell | categorical

    Object Functions

    classify    Classify document using BERT document classifier
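
    For example, a minimal prediction sketch (the input text is illustrative, and an untrained classifier returns essentially arbitrary labels until you fine-tune it):

    mdl = bertDocumentClassifier(ClassNames=["positive" "negative"]);
    str = "The product arrived quickly and works well.";
    label = classify(mdl,str)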

    Examples

    Create a BERT document classifier that is ready for training.

    mdl = bertDocumentClassifier
    mdl = 
      bertDocumentClassifier with properties:
    
           Network: [1x1 dlnetwork]
         Tokenizer: [1x1 bertTokenizer]
        ClassNames: ["positive"    "negative"]
    
    

    View the class names.

    mdl.ClassNames
    ans = 1x2 string
        "positive"    "negative"
    
    

    Create a BERT document classifier for the classes "Electrical Failure", "Leak", "Mechanical Failure", and "Software Failure".

    classNames = ["Electrical Failure" "Leak" "Mechanical Failure" "Software Failure"];
    mdl = bertDocumentClassifier(ClassNames=classNames)
    mdl = 
      bertDocumentClassifier with properties:
    
           Network: [1x1 dlnetwork]
         Tokenizer: [1x1 bertTokenizer]
        ClassNames: ["Electrical Failure"    "Leak"    "Mechanical Failure"    "Software Failure"]
    
    

    View the class names.

    mdl.ClassNames
    ans = 1x4 string
        "Electrical Failure"    "Leak"    "Mechanical Failure"    "Software Failure"
    
    

    References

    [1] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding." Preprint, submitted May 24, 2019. https://doi.org/10.48550/arXiv.1810.04805.

    [2] Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." The Journal of Machine Learning Research 15, no. 1 (January 1, 2014): 1929–58.

    [3] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." Communications of the ACM 60, no. 6 (May 24, 2017): 84–90. https://doi.org/10.1145/3065386.

    Version History

    Introduced in R2023b