The inside, outside (IO) labeling scheme tags each token either with "O" or with an entity tag that has the prefix "I-". The tag "O" (outside) denotes nonentities. For each token in an entity, the tag is prefixed with "I-" (inside), which signifies that the token is part of an entity.
The IO labeling scheme does not specify entity boundaries between adjacent entities of the
same type. The inside, outside, beginning (IOB) labeling scheme, also
known as the beginning, inside, outside (BIO) labeling scheme,
addresses this limitation by introducing a "beginning" prefix.
The IOB labeling scheme has two variants: IOB1 and IOB2.
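The limitation of the IO scheme can be illustrated with a short Python sketch. The helper names `to_io` and `io_spans` and the entity type `LOC` are hypothetical, chosen only for this illustration. Indices are 0-based with an exclusive end, unlike the 1-based notation used elsewhere in this section.

```python
def to_io(tags):
    """Collapse IOB-style tags to IO tags by replacing each "B-" prefix with "I-"."""
    return ["I-" + t[2:] if t.startswith("B-") else t for t in tags]


def io_spans(tags):
    """Group maximal runs of identical "I-" tags into (start, end, type) spans."""
    tags = list(tags)
    spans, start = [], None
    # A trailing "O" sentinel closes any run that reaches the end of the list.
    for i, tag in enumerate(tags + ["O"]):
        if start is not None and tag != tags[start]:
            spans.append((start, i, tags[start][2:]))
            start = None
        if start is None and tag.startswith("I-"):
            start = i
    return spans


# Two adjacent single-token entities of the same type, then two nonentities.
with_boundaries = ["B-LOC", "B-LOC", "O", "O"]
io_tags = to_io(with_boundaries)
merged = io_spans(io_tags)
```

With IO tags alone, the two adjacent `LOC` entities are indistinguishable from one two-token entity, so `merged` contains a single span.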
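IOB2 Labeling Scheme
For each token in an entity, the tag is prefixed with one of these values:
"B-" (beginning): The token is a single-token entity or the first token of a multitoken entity.
"I-" (inside): The token is a subsequent token of a multitoken entity.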
For a list of entity tags Entity, the IOB labeling scheme helps identify boundaries between adjacent entities of the same type by using this logic:
If Entity(i) has the prefix "B-" and Entity(i+1) is "O" or has the prefix "B-", then Token(i) is a single-token entity.
If Entity(i) has the prefix "B-", Entity(i+1), ..., Entity(N) have the prefix "I-", and Entity(N+1) is "O" or has the prefix "B-", then the phrase Token(i:N) is a multitoken entity.
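The logic above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `iob2_spans` and the entity types in the example are hypothetical, the input is assumed to be well-formed IOB2, and a stray "I-" tag with no preceding "B-" is ignored. Indices are 0-based with an exclusive end, so the span `(0, 2, "PER")` corresponds to Token(1:2) in the notation above.

```python
def iob2_spans(tags):
    """Return (start, end, type) spans from a list of IOB2 tags."""
    tags = list(tags)
    spans, start = [], None
    # A trailing "O" sentinel closes an entity that ends at the last token.
    for i, tag in enumerate(tags + ["O"]):
        # An open entity ends when the next tag is "O" or starts with "B-".
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, tags[start][2:]))
            start = None
        # A "B-" prefix always opens a new entity.
        if tag.startswith("B-"):
            start = i
    return spans


example = ["B-PER", "I-PER", "O", "B-LOC", "B-LOC"]
spans = iob2_spans(example)
```

Here `spans` holds one multitoken `PER` entity followed by two separate single-token `LOC` entities, because each "B-" prefix marks a new boundary.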
IOB1 Labeling Scheme
The IOB1 labeling scheme does not use the prefix "B-" when an entity token follows an "O" tag. In this case, an entity token that is the first token in a list or that follows a nonentity token is the first token of an entity. That is, if Entity(i) has the prefix "I-" and i is equal to 1 or Entity(i-1) is "O", then Token(i) is a single-token entity or the first token of a multitoken entity.
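One way to make the IOB1 rule concrete is to convert IOB1 tags to IOB2 tags, as in the Python sketch below. The function name `iob1_to_iob2` and the entity types are hypothetical. The sketch also treats an "I-" tag that follows an entity of a different type as the start of a new entity; that case is not covered by the text above and is an assumption based on common IOB1 usage.

```python
def iob1_to_iob2(tags):
    """Rewrite IOB1 tags so that every entity starts with a "B-" prefix."""
    out = []
    prev = "O"  # treat the position before the first token as a nonentity
    for tag in tags:
        starts_entity = tag.startswith("I-") and (
            prev == "O"              # first token, or follows a nonentity
            or prev[2:] != tag[2:]   # assumption: follows a different entity type
        )
        out.append("B-" + tag[2:] if starts_entity else tag)
        prev = tag
    return out


iob1 = ["I-PER", "I-PER", "O", "I-LOC", "B-LOC", "I-ORG"]
iob2 = iob1_to_iob2(iob1)
```

In this example, the only "B-" prefix in the IOB1 input separates two adjacent `LOC` entities. After conversion, every entity begins with "B-".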