Tagging Scheme for NER

For Named Entity Recognition, several tagging schemes are in use. The most common is IOB2, better known as BIO, where the B- tag marks the beginning of every chunk, the I- tag marks the inner tokens of a multi-token entity, and O marks non-entity tokens.
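
For example, a sentence with a one-token person, a two-token organization, and a one-token location comes out like this under BIO (the sentence is made up for illustration; PER/ORG/LOC are the usual CoNLL-style labels):

```python
# BIO (IOB2) tagging of one sentence: every chunk starts with B-,
# the remaining tokens of a multi-token chunk get I-, everything else is O.
tokens = ["Alex", "works", "for", "Acme", "Corp", "in", "Berlin", "."]
tags   = ["B-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```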

IO

No boundary tags, so it cannot distinguish between two entities of the same type that are right next to each other.
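
A quick sketch of the failure case: with only I- and O tags, two adjacent persons collapse into what looks like a single four-token entity (made-up sentence, standard PER labels):

```python
tokens = ["Barack", "Obama", "Angela", "Merkel", "met", "."]

# IO: no boundary marker, so the two persons read as one long entity.
io_tags  = ["I-PER", "I-PER", "I-PER", "I-PER", "O", "O"]

# BIO: the B- tag on "Angela" recovers the boundary between them.
bio_tags = ["B-PER", "I-PER", "B-PER", "I-PER", "O", "O"]
```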

IOB - Ramshaw and Marcus (1995)

The B- tag is used only when a chunk immediately follows another chunk of the same type, i.e., with no O tokens between them.

IOB2 (aka BIO)

Same as the IOB format except that the B- tag is used at the beginning of every chunk.
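
To make the IOB vs. IOB2 difference concrete, here are two adjacent one-token LOC chunks under both conventions (an illustrative example, not from any corpus):

```python
tokens = ["Paris", "France", "is", "beautiful"]

# IOB (IOB1): "Paris" starts a chunk but is not preceded by another LOC
# chunk, so it gets I-; "France" starts a new LOC chunk right after a LOC
# chunk, so it gets B-.
iob1_tags = ["I-LOC", "B-LOC", "O", "O"]

# IOB2 (BIO): every chunk begins with B-, regardless of what precedes it.
iob2_tags = ["B-LOC", "B-LOC", "O", "O"]
```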

IOE

The E- tag is used only when a chunk is immediately followed by another chunk of the same type, i.e., with no O tokens between them.

IOE2

Same as the IOE format except that the E- tag is used at the end of every chunk.
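
Mirroring the IOB example above, the same two adjacent LOC chunks look like this under IOE and IOE2:

```python
tokens = ["Paris", "France", "is", "beautiful"]

# IOE (IOE1): E- is used only when a chunk is immediately followed by
# another chunk of the same type, so only "Paris" gets E-.
ioe1_tags = ["E-LOC", "I-LOC", "O", "O"]

# IOE2: the last token of every chunk gets E-.
ioe2_tags = ["E-LOC", "E-LOC", "O", "O"]
```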

BILOU

IOBES - same as BILOU (Begin/Inside/End/Single correspond to Begin/Inside/Last/Unit).

In particular, motivated by the works on Chinese word segmentation, the tags E and S, which stand for “End of the entity” and “Single-word entity”, are added to the IOB tag set to form a four-(IOBE) and five-(IOBES) tag sets. - Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.
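
The relationship to BIO is mechanical: IOBES can be derived from a BIO sequence by looking one tag ahead. A minimal sketch (the function name and signature are my own, assuming valid BIO input):

```python
def bio_to_iobes(tags):
    """Convert a BIO (IOB2) tag sequence into IOBES."""
    iobes = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        if tag == "O":
            iobes.append("O")
            continue
        entity_type = tag[2:]
        chunk_continues = nxt == "I-" + entity_type
        if tag.startswith("B-"):
            # Single-token chunk becomes S-, otherwise keep B-.
            iobes.append(tag if chunk_continues else "S-" + entity_type)
        else:  # I- tag
            # Last token of the chunk becomes E-, otherwise keep I-.
            iobes.append(tag if chunk_continues else "E-" + entity_type)
    return iobes


print(bio_to_iobes(["B-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O"]))
# ['S-PER', 'O', 'B-ORG', 'I-ORG', 'E-ORG', 'O']
```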

BMEWO - Borthwick (1999)

For any particular N.E. category x from the set of n categories, we could be in one of 4 states: x_start, x_continue, x_end, and x_unique. In addition, a token could be tagged as “other” to indicate that it is not part of a named entity. - Borthwick (1999)

Useful with more powerful machine learning models (e.g., maximum entropy).
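
The BMEWO states partition chunks exactly the way IOBES does, just with different letters (B = start, M = continue, E = end, W = unique). A small mapping, assuming IOBES-style prefixed tags:

```python
# Letter-for-letter correspondence with Borthwick's x_start, x_continue,
# x_end, x_unique, and "other" states.
IOBES_TO_BMEWO = {"B": "B", "I": "M", "E": "E", "S": "W", "O": "O"}

def iobes_to_bmewo(tags):
    return ["O" if t == "O" else IOBES_TO_BMEWO[t[0]] + t[1:] for t in tags]

print(iobes_to_bmewo(["S-PER", "O", "B-ORG", "I-ORG", "E-ORG", "O"]))
# ['W-PER', 'O', 'B-ORG', 'M-ORG', 'E-ORG', 'O']
```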

BMEWO+ - Bob Carpenter

Carpenter adds the following on top of BMEWO:

I introduced the BMEWO+ encoding for the LingPipe HMM-based chunkers. Because of the conditional independence assumptions in HMMs, they can’t use information about preceding or following words. Adding finer-grained information to the tags themselves implicitly encodes a kind of longer-distance information. This allows a different model to generate words after person entities (e.g. John said), for example, than generates words before location entities (e.g. in Boston). The tag transition constraints (B_X must be followed by M_X or E_X, etc.) propagate decisions, allowing a strong location-preceding word to trigger a location.

Note that it also adds a begin and end of sequence subcategorization to the out tags. This helped reduce the confusion between English sentence capitalization and proper name capitalization.

Source: Coding Chunkers as Taggers: IO, BIO, BMEWO, and BMEWO+
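
The exact tag inventory is given in Carpenter's post; the sketch below only illustrates the idea with made-up tag names (O_PER for an out token following a person, O_LOC for one preceding a location, O_EOS for the end of the sequence), not LingPipe's actual labels:

```python
tokens = ["John", "said", "he", "lives", "in", "Boston", "."]

# Plain BMEWO: every non-entity token gets the same undifferentiated O tag.
bmewo_tags      = ["W-PER", "O", "O", "O", "O", "W-LOC", "O"]

# BMEWO+-style idea (illustrative tag names only): out tags carry extra
# context, so an HMM can give "said" after a person and "in" before a
# location their own emission distributions, and sequence boundaries
# their own out tags.
bmewo_plus_tags = ["W-PER", "O_PER", "O", "O", "O_LOC", "W-LOC", "O_EOS"]
```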

Performance Comparison

A few papers discuss performance under different tagging schemes.

… the less used BILOU formalism significantly outperforms the widely adopted BIO tagging scheme. - Design Challenges and Misconceptions in Named Entity Recognition

Moreover, of all the tag sets used, delicate tag schema such as the five-tag scheme IOBES provided better performance than the others. - Enhancing of chemical compound and drug name recognition using representative tag scheme and fine-grained tokenization.

However, we did not observe a significant improvement over the IOB tagging scheme. - Neural Architectures for Named Entity Recognition

In our experiments, models using BIOES are significantly (p < 0.05) better than BIO. - Design Challenges and Misconceptions in Neural Sequence Labeling

References