addEntityDetails
Add entity tags to documents
Syntax
updatedDocuments = addEntityDetails(documents)
updatedDocuments = addEntityDetails(documents,Name,Value)
Description
Use addEntityDetails to detect person names, locations, organizations, and other named entities in text and add the corresponding entity tags to documents. This process is known as named entity recognition.
The function supports English, Japanese, German, and Korean text.
updatedDocuments = addEntityDetails(documents) detects the named entities in documents. The function adds details only to the tokens with missing entity details. To get the entity details from updatedDocuments, use tokenDetails.
updatedDocuments = addEntityDetails(documents,Name,Value) also specifies additional options using one or more name-value pairs.
Tip
Use addEntityDetails before using the lower, upper, normalizeWords, removeWords, and removeStopWords functions, as addEntityDetails uses information that is removed by these functions.
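The ordering described in this tip can be sketched as follows. This is a minimal illustration, not a prescribed pipeline: the sample sentence and the choice of preprocessing steps are assumptions made for the example.

```matlab
% Tokenize the raw text (the sample sentence is an assumption for illustration).
str = "Mary moved to Natick, Massachusetts.";
documents = tokenizedDocument(str);

% Add entity details first, while capitalization and stop words are still present.
documents = addEntityDetails(documents);

% Only afterwards apply preprocessing that removes information the tagger uses.
documents = removeStopWords(documents);
documents = lower(documents);

% Inspect the token details of the preprocessed documents.
tdetails = tokenDetails(documents);
```

Calling addEntityDetails after lower or removeStopWords would give the tagger text that has already lost information it relies on, so the tags may be less reliable.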
Examples
Add Named Entity Tags to Documents
Create a tokenized document array.
str = [
    "Mary moved to Natick, Massachusetts."
    "John uses MATLAB at MathWorks."];
documents = tokenizedDocument(str);
Add the entity details to the documents using the addEntityDetails function. This function detects the named entities in the text and adds the details to the table returned by the tokenDetails function. View the updated token details.
documents = addEntityDetails(documents);
tdetails = tokenDetails(documents)
tdetails=13×8 table
Token DocumentNumber SentenceNumber LineNumber Type Language PartOfSpeech Entity
_______________ ______________ ______________ __________ ___________ ________ ____________ ____________
"Mary" 1 1 1 letters en proper-noun person
"moved" 1 1 1 letters en verb non-entity
"to" 1 1 1 letters en adposition non-entity
"Natick" 1 1 1 letters en proper-noun location
"," 1 1 1 punctuation en punctuation non-entity
"Massachusetts" 1 1 1 letters en proper-noun location
"." 1 1 1 punctuation en punctuation non-entity
"John" 2 1 1 letters en proper-noun person
"uses" 2 1 1 letters en verb non-entity
"MATLAB" 2 1 1 letters en proper-noun other
"at" 2 1 1 letters en adposition non-entity
"MathWorks" 2 1 1 letters en proper-noun organization
"." 2 1 1 punctuation en punctuation non-entity
View the words tagged with the entities "person", "location", "organization", or "other". These words are the words not tagged with "non-entity".
idx = tdetails.Entity ~= "non-entity";
tdetails.Token(idx)
ans = 6x1 string
"Mary"
"Natick"
"Massachusetts"
"John"
"MATLAB"
"MathWorks"
Add Named Entity Tags to Japanese Text
Tokenize Japanese text using tokenizedDocument.
str = [
    "マリーさんはボストンからニューヨークに引っ越しました。"
    "駅へ鈴木さんを迎えに行きます。"
    "東京は大阪より大きいですか?"
    "東京に行った時、新宿や渋谷などいろいろな所を訪れました。"];
documents = tokenizedDocument(str);
For Japanese text, the software automatically adds named entity tags, so you do not need to use the addEntityDetails function. The software detects person names, locations, organizations, and other named entities. To view the entity details, use the tokenDetails function.
tdetails = tokenDetails(documents);
head(tdetails)
Token DocumentNumber LineNumber Type Language PartOfSpeech Lemma Entity
____________ ______________ __________ _______ ________ ____________ ____________ __________
"マリー" 1 1 letters ja proper-noun "マリー" person
"さん" 1 1 letters ja noun "さん" person
"は" 1 1 letters ja adposition "は" non-entity
"ボストン" 1 1 letters ja proper-noun "ボストン" location
"から" 1 1 letters ja adposition "から" non-entity
"ニューヨーク" 1 1 letters ja proper-noun "ニューヨーク" location
"に" 1 1 letters ja adposition "に" non-entity
"引っ越し" 1 1 letters ja verb "引っ越す" non-entity
View the words tagged with the entities "person", "location", "organization", or "other". These words are the words not tagged with "non-entity".
idx = tdetails.Entity ~= "non-entity";
tdetails(idx,:).Token
ans = 11x1 string
"マリー"
"さん"
"ボストン"
"ニューヨーク"
"鈴木"
"さん"
"東京"
"大阪"
"東京"
"新宿"
"渋谷"
Add Named Entity Tags to German Text
Tokenize German text using tokenizedDocument.
str = [
    "Ernst zog von Frankfurt nach Berlin."
    "Besuchen Sie Volkswagen in Wolfsburg."];
documents = tokenizedDocument(str);
To add entity tags to German text, use the addEntityDetails function. This function detects person names, locations, organizations, and other named entities.
documents = addEntityDetails(documents);
To view the entity details, use the tokenDetails function.
tdetails = tokenDetails(documents);
head(tdetails)
Token DocumentNumber SentenceNumber LineNumber Type Language PartOfSpeech Entity
___________ ______________ ______________ __________ ___________ ________ ____________ __________
"Ernst" 1 1 1 letters de proper-noun person
"zog" 1 1 1 letters de verb non-entity
"von" 1 1 1 letters de adposition non-entity
"Frankfurt" 1 1 1 letters de proper-noun location
"nach" 1 1 1 letters de adposition non-entity
"Berlin" 1 1 1 letters de proper-noun location
"." 1 1 1 punctuation de punctuation non-entity
"Besuchen" 2 1 1 letters de verb non-entity
View the words tagged with the entities "person", "location", "organization", or "other". These words are the words not tagged with "non-entity".
idx = tdetails.Entity ~= "non-entity";
tdetails(idx,:)
ans=5×8 table
Token DocumentNumber SentenceNumber LineNumber Type Language PartOfSpeech Entity
____________ ______________ ______________ __________ _______ ________ ____________ ____________
"Ernst" 1 1 1 letters de proper-noun person
"Frankfurt" 1 1 1 letters de proper-noun location
"Berlin" 1 1 1 letters de proper-noun location
"Volkswagen" 2 1 1 letters de noun organization
"Wolfsburg" 2 1 1 letters de proper-noun location
Input Arguments
documents — Input documents
tokenizedDocument array
Input documents, specified as a tokenizedDocument array.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: DiscardKnownValues=true specifies to discard previously computed details and recompute them.
RetokenizeMethod — Method to retokenize documents
"entity" (default) | "none"
Method to retokenize documents, specified as one of the following:
"entity" – Transform the tokens for named entity recognition. The function merges tokens from the same entity into a single token.
"none" – Do not retokenize the documents.
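To see the effect of the two settings, you can compare the token counts they produce. The sketch below assumes R2021a or later for the Name=Value syntax; the sample sentence and the variable names docsMerged and docsAsIs are chosen only for illustration.

```matlab
% Sample text (an assumption for illustration).
str = "Mary moved to Natick, Massachusetts.";
documents = tokenizedDocument(str);

% Default behavior: RetokenizeMethod is "entity", so tokens that belong to
% the same entity can be merged into a single token.
docsMerged = addEntityDetails(documents);

% Keep the original tokenization unchanged.
docsAsIs = addEntityDetails(documents,RetokenizeMethod="none");

% Compare the number of tokens per document under each setting.
numMerged = doclength(docsMerged)
numAsIs = doclength(docsAsIs)
```

If the counts differ, the default setting merged some tokens. Using "none" can be useful when later steps depend on the original token boundaries.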
DiscardKnownValues — Option to discard previously computed details
false (default) | true
Option to discard previously computed details and recompute them, specified as true or false.
Data Types: logical
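As a rough sketch of how this option might be used, the following code tags a document and then forces the tags to be recomputed. The sample sentence is an assumption, and the Name=Value syntax assumes R2021a or later.

```matlab
% Sample text (an assumption for illustration).
documents = tokenizedDocument("John uses MATLAB at MathWorks.");

% First call: computes entity details for all tokens.
documents = addEntityDetails(documents);

% A second call with default options adds details only to tokens that are
% missing them. To recompute the details for every token, discard the
% known values.
documents = addEntityDetails(documents,DiscardKnownValues=true);

% View the recomputed details.
tdetails = tokenDetails(documents);
```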
Model — NER model
"auto" (default) | hmmEntityModel object
Since R2023a
NER model, specified as one of these values:
"auto" – Use the built-in NER model.
hmmEntityModel object – Use the specified custom NER model. To train a custom NER model, use the trainHMMEntityModel function. For an example, see Train Custom Named Entity Recognition Model.
Output Arguments
updatedDocuments — Updated documents
tokenizedDocument array
Updated documents, returned as a tokenizedDocument array. To get the token details from updatedDocuments, use tokenDetails.
Algorithms
Language Details
tokenizedDocument objects contain details about the tokens including language details. The language details of the input documents determine the behavior of addEntityDetails. The tokenizedDocument function, by default, automatically detects the language of the input text. To specify the language details manually, use the Language option of tokenizedDocument. To view the token details, use the tokenDetails function.
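For short or ambiguous text, automatic language detection might not pick the language you intend. The sketch below sets the language explicitly before adding entity details. The sample sentence is an assumption, and the Name=Value syntax assumes R2021a or later.

```matlab
% Sample German text (an assumption for illustration).
str = "Besuchen Sie Volkswagen in Wolfsburg.";

% Specify the language manually instead of relying on automatic detection.
documents = tokenizedDocument(str,Language="de");

% addEntityDetails uses the language details stored in the documents.
documents = addEntityDetails(documents);

% Confirm which language the token details record.
tdetails = tokenDetails(documents);
unique(tdetails.Language)
```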
Version History
Introduced in R2019a
R2023a: Specify custom NER model
To specify a custom NER model, use the Model name-value argument. To train a custom NER model, use the trainHMMEntityModel function. For an example, see Train Custom Named Entity Recognition Model.