The IOB format (short for inside, outside, beginning), also commonly referred to as the BIO format, is a common format for tagging tokens in a chunking task in computational linguistics (e.g. named-entity recognition). It was presented by Ramshaw and Marcus in their paper "Text Chunking using Transformation-Based Learning" (1995). The I- prefix before a tag indicates that the token is inside a chunk. An O tag indicates that a token belongs to no chunk. The B- prefix before a tag indicates that the token is the beginning of a chunk that immediately follows another chunk without O tags between them. B- is used only in that case: when a chunk comes after an O tag (or at the start of the sentence), its first token takes the I- prefix.
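The IOB rule above can be sketched as an encoder from chunk spans to tags. This is a minimal illustration, not a reference implementation: it assumes chunks are given as hypothetical non-overlapping `(start, end, label)` tuples with `end` exclusive, and the function name `spans_to_iob1` is likewise hypothetical. By default B- is used after any adjacent chunk, as described above; the `same_type_only` flag covers corpora such as the CoNLL-2003 NER data, which use B- only when the preceding adjacent chunk has the same type.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def spans_to_iob1(n_tokens: int, spans: List[Span],
                  same_type_only: bool = False) -> List[str]:
    """Encode non-overlapping chunk spans as IOB (IOB1) tags."""
    tags = ["O"] * n_tokens
    # Sorting by start guarantees the chunk before each span is already tagged.
    for start, end, label in sorted(spans):
        for i in range(start, end):
            tags[i] = "I-" + label
        prev = tags[start - 1] if start > 0 else "O"
        # B- is needed only when this chunk directly follows another chunk,
        # otherwise the boundary between the two chunks would be invisible.
        if prev != "O" and (not same_type_only or prev[2:] == label):
            tags[start] = "B-" + label
    return tags


tokens = ["Alex", "is", "going", "to", "Los", "Angeles", "in", "California"]
print(spans_to_iob1(len(tokens), [(0, 1, "PER"), (4, 6, "LOC"), (7, 8, "LOC")]))
# ['I-PER', 'O', 'O', 'O', 'I-LOC', 'I-LOC', 'O', 'I-LOC']
```

Tagging the stop-word-filtered sentence "Alex going Los Angeles California" with spans `(0, 1, "PER")`, `(2, 4, "LOC")` and `(4, 5, "LOC")` gives `B-LOC` for "California", since that chunk now touches the preceding LOC chunk.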
Attributes | Values |
---|---|
rdfs:label | Inside–outside–beginning (tagging) (en) |
Another widely used format is IOB2, which is the same as the IOB format except that the B- tag is used at the beginning of every chunk (i.e. all chunks start with the B- tag).

A readable introduction to entity tagging is given in Bob Carpenter's blog post, "Coding Chunkers as Taggers".

An example in IOB format:

Token | Tag |
---|---|
Alex | I-PER |
is | O |
going | O |
to | O |
Los | I-LOC |
Angeles | I-LOC |
in | O |
California | I-LOC |

Notice how "Alex", "Los" and "California", although they are the first tokens of their chunks, have the I- prefix.

The same example after filtering out stop words:

Token | Tag |
---|---|
Alex | I-PER |
going | O |
Los | I-LOC |
Angeles | I-LOC |
California | B-LOC |

Notice how "California" now has the B- prefix, because it immediately follows another LOC chunk.

The same example in IOB2 format; here, filtering out stop words would not change the tags of the remaining tokens:

Token | Tag |
---|---|
Alex | B-PER |
is | O |
going | O |
to | O |
Los | B-LOC |
Angeles | I-LOC |
in | O |
California | B-LOC |

Related tagging schemes sometimes include START/END, which "consists of the tags B, E, I, S or O where S is used to represent a chunk containing a single token. Chunks of length greater than or equal to two always start with the B tag and end with the E tag."
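Because IOB2 marks every chunk start explicitly, IOB tags can be converted to IOB2 in a single left-to-right pass. The following is a minimal Python sketch, assuming tags are plain strings such as `"I-LOC"` or `"O"`; the function name `iob1_to_iob2` is hypothetical. It reproduces the IOB2 example for "Alex is going to Los Angeles in California" and shows why IOB2 needs no retagging when stop words are dropped.

```python
from typing import List


def iob1_to_iob2(tags: List[str]) -> List[str]:
    """Rewrite IOB (IOB1) tags so that every chunk starts with B-."""
    out: List[str] = []
    prev = "O"
    for tag in tags:
        if tag.startswith("I-"):
            label = tag[2:]
            # I-X starts a new chunk unless the previous token is in an X chunk.
            # Existing B- tags are kept, since they already mark a chunk start.
            if prev == "O" or prev[2:] != label:
                tag = "B-" + label
        out.append(tag)
        prev = tag
    return out


iob1 = ["I-PER", "O", "O", "O", "I-LOC", "I-LOC", "O", "I-LOC"]
iob2 = iob1_to_iob2(iob1)
print(iob2)
# ['B-PER', 'O', 'O', 'O', 'B-LOC', 'I-LOC', 'O', 'B-LOC']

# Dropping the stop words "is", "to", "in" (indices 1, 3, 6) leaves the IOB2
# tags of the remaining tokens unchanged, whereas in IOB1 "California" had
# to be retagged from I-LOC to B-LOC.
kept = [0, 2, 4, 5, 7]
print([iob2[i] for i in kept])
# ['B-PER', 'O', 'B-LOC', 'I-LOC', 'B-LOC']
```

Converting the filtered IOB1 tags `["I-PER", "O", "I-LOC", "I-LOC", "B-LOC"]` with the same function yields exactly that filtered IOB2 list.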
Other tagging schemes include BIOES/BILOU, where E (End) and L (Last) mark the last token of a multi-token chunk, and S (Single) and U (Unit) mark a chunk that consists of a single token. An example in BIOES format:

Token | Tag |
---|---|
Alex | S-PER |
is | O |
going | O |
with | O |
Marty | B-PER |
A. | I-PER |
Rick | E-PER |
to | O |
Los | B-LOC |
Angeles | E-LOC |
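Given IOB2 tags, BIOES tags follow from one token of lookahead: a chunk token is the last one of its chunk when the next tag is not I- with the same label. The sketch below is a minimal Python illustration, assuming well-formed IOB2 input (for example, the output of an IOB-to-IOB2 conversion); the names `iob2_to_bioes` and `bioes_to_bilou` are hypothetical. It reproduces the BIOES example for "Alex is going with Marty A. Rick to Los Angeles".

```python
from typing import List


def iob2_to_bioes(tags: List[str]) -> List[str]:
    """Convert IOB2 tags to BIOES (B/I/E for multi-token chunks, S for single)."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        # In IOB2 a chunk continues only while the next tag is I- with the same label.
        continues = nxt == "I-" + label
        if prefix == "B":
            out.append(("B-" if continues else "S-") + label)
        else:  # prefix "I": a middle token, or the chunk's last token
            out.append(("I-" if continues else "E-") + label)
    return out


def bioes_to_bilou(tags: List[str]) -> List[str]:
    """Rename E- to L- and S- to U-; B-, I- and O are unchanged."""
    return [{"E": "L", "S": "U"}.get(t[0], t[0]) + t[1:] for t in tags]


iob2 = ["B-PER", "O", "O", "O", "B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC"]
print(iob2_to_bioes(iob2))
# ['S-PER', 'O', 'O', 'O', 'B-PER', 'I-PER', 'E-PER', 'O', 'B-LOC', 'E-LOC']
```

BILOU uses the same chunk boundaries as BIOES and differs only in the letters, which is why `bioes_to_bilou` is a pure renaming step.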