predict

Predict entities using named entity recognition (NER) model

Since R2023a

    Description

    The predict function detects named entities in text using a hmmEntityModel object.

    To add entity details to documents using a custom NER model, use addEntityDetails and set the Model option to the custom model.

    tbl = predict(mdl,documents) predicts the named entities of the tokens in the specified documents using the NER model mdl.

    Examples

    Load the trained example hmmEntityModel object.

    load exampleEntityModel
    mdl
    mdl = 
      hmmEntityModel with properties:
    
        Entities: [3x1 categorical]
    
    

    Create a tokenized document object of text data.

    str = "MathWorks develops MATLAB and Simulink.";
    document = tokenizedDocument(str);

    Make predictions using the predict function.

    tbl = predict(mdl,document)
    tbl=6×2 table
           Token           Entity    
        ___________    ______________
    
        "MathWorks"    B-organization
        "develops"     non-entity    
        "MATLAB"       B-product     
        "and"          non-entity    
        "Simulink"     B-product     
        "."            non-entity    
    
    

    Input Arguments

    mdl: Custom NER model, specified as a hmmEntityModel object. To train a custom NER model, use the trainHMMEntityModel function.

    For an example, see Train Custom Named Entity Recognition Model.

    documents: Input documents, specified as a tokenizedDocument array.

    Output Arguments

    tbl: Predicted entities, returned as a table with these variables:

    • Token: Token text.

    • Entity: Predicted entity tag of the token.

    Algorithms

    Inside, Outside, Beginning (IOB) Labeling Schemes

    The inside, outside (IO) labeling scheme tags each token either with "O" or with an entity tag that has the prefix "I-". The tag "O" (outside) denotes nonentities. For each token in an entity, the tag is prefixed with "I-" (inside), which signifies that the token is part of an entity.

    The IO labeling scheme does not specify entity boundaries between adjacent entities of the same type. The inside, outside, beginning (IOB) labeling scheme, also known as the beginning, inside, outside (BIO) labeling scheme, addresses this limitation by introducing a "beginning" prefix.

    The IOB labeling scheme has two variants: IOB1 and IOB2.

    IOB2 Labeling Scheme

    For each token in an entity, the tag is prefixed with one of these values:

    • "B-" (beginning) — The token is a single-token entity or the first token of a multitoken entity.

    • "I-" (inside) — The token is a subsequent token of a multitoken entity.

    For a list of entity tags Entity, the IOB2 labeling scheme helps identify boundaries between adjacent entities of the same type by using this logic:

    • If Entity(i) has the prefix "B-" and Entity(i+1) is "O" or has the prefix "B-", then Token(i) is a single-token entity.

    • If Entity(i) has the prefix "B-", Entity(i+1), ..., Entity(N) have the prefix "I-", and Entity(N+1) is "O" or has the prefix "B-", then the phrase Token(i:N) is a multitoken entity.
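
    The boundary logic above can be sketched as a small helper that groups IOB2 tags into entity spans. This is a minimal illustration, not part of the toolbox: the function name iob2Spans and the output variable names Start, End, and Type are hypothetical, and the sketch assumes that the "I-" tags of a multitoken entity carry the same entity type as the opening "B-" tag.

```matlab
function spans = iob2Spans(tags)
% iob2Spans  Group IOB2 tags into entity spans.
%   Illustrative helper only; this is not a toolbox function.
%   tags  : string array of tags such as "B-product", "I-product", or "O"
%   spans : table with variables Start, End, and Type, one row per entity
tags = string(tags(:));
n = numel(tags);
startIdx = zeros(0,1);
endIdx = zeros(0,1);
types = strings(0,1);
i = 1;
while i <= n
    if startsWith(tags(i),"B-")
        entityType = extractAfter(tags(i),2);
        last = i;
        % Extend the span over the following "I-" tags.
        % Assumption: the "I-" tags must have the same entity type.
        while last < n && tags(last+1) == "I-" + entityType
            last = last + 1;
        end
        startIdx(end+1,1) = i;        %#ok<AGROW>
        endIdx(end+1,1) = last;       %#ok<AGROW>
        types(end+1,1) = entityType;  %#ok<AGROW>
        i = last + 1;
    else
        % Skip "O" tags and any "I-" tag that does not follow a "B-" tag.
        i = i + 1;
    end
end
spans = table(startIdx,endIdx,types,VariableNames=["Start" "End" "Type"]);
end
```

    For example, calling iob2Spans(["B-person" "I-person" "O" "B-product" "B-product"]) yields three spans: tokens 1 to 2 of type "person", and tokens 4 and 5 as two separate single-token entities of type "product". Any tag without a "B-" or "I-" prefix, such as the "non-entity" tag shown in the example output on this page, is treated like "O".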

    IOB1 Labeling Scheme

    The IOB1 labeling scheme does not use the prefix "B-" when an entity token follows an "O" tag. In this case, an entity token that is the first token in a list or that follows a nonentity token is the first token of an entity. That is, if Entity(i) has the prefix "I-" and i is equal to 1 or Entity(i-1) is "O", then Token(i) is a single-token entity or the first token of a multitoken entity.
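
    To make the difference between the two variants concrete, the following sketch converts IOB1 tags to IOB2 tags. The function name iob1ToIob2 is hypothetical and not part of the toolbox. The sketch assumes that every tag is either "O" or has a "B-" or "I-" prefix. It also assumes, following common IOB1 usage rather than anything stated on this page, that an "I-" tag that follows an entity of a different type begins a new entity.

```matlab
function tags = iob1ToIob2(tags)
% iob1ToIob2  Convert IOB1 tags to IOB2 tags.
%   Illustrative helper only; this is not a toolbox function.
%   tags : string array of tags such as "I-product", "B-product", or "O"
tags = string(tags);
prevType = "";   % entity type of the previous token, "" for a nonentity
for i = 1:numel(tags)
    if tags(i) == "O"
        prevType = "";
        continue
    end
    entityType = extractAfter(tags(i),2);
    % An "I-" tag at the start of the list or after an "O" tag begins an entity.
    % Assumption: an "I-" tag after an entity of a different type also begins one.
    if startsWith(tags(i),"I-") && prevType ~= entityType
        tags(i) = "B-" + entityType;
    end
    prevType = entityType;
end
end
```

    With this helper, the IOB1 sequence ["I-person" "I-person" "O" "I-product" "B-product"] becomes ["B-person" "I-person" "O" "B-product" "B-product"]. Only the "I-" tags that open an entity change; existing "B-" tags are left as they are.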

    Alternative Functionality

    To add entity details to documents using a custom NER model, use addEntityDetails and set the Model option to the custom model.
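
    A minimal sketch of this alternative follows. It assumes Text Analytics Toolbox (R2023a or later) and reuses the exampleEntityModel MAT-file and the sample sentence from the example on this page. The variable names documents and details are chosen for illustration.

```matlab
% Load the trained example model. This creates the hmmEntityModel object mdl.
load exampleEntityModel

% Tokenize the same sample sentence used in the predict example.
documents = tokenizedDocument("MathWorks develops MATLAB and Simulink.");

% Add entity details using the custom model.
documents = addEntityDetails(documents,Model=mdl);

% View the token details, which now include the entity information.
details = tokenDetails(documents)
```

    Unlike predict, which returns a separate table, this approach stores the entity information in the documents themselves, which can be convenient when later steps already work with token details.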

    Version History

    Introduced in R2023a