A treebank is a text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name treebank. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers.
| Property | Value |
| p:abstract
| - A treebank is a text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name treebank. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers.
Treebanks are often created on top of a corpus that has already been annotated with part-of-speech tags. In turn, treebanks are sometimes enhanced with semantic or other linguistic information.
Treebanks can be created completely manually, where linguists annotate each sentence with syntactic structure, or semi-automatically, where a parser assigns some syntactic structure which linguists then check and, if necessary, correct.
Some treebanks follow a specific linguistic theory in their syntactic annotation (e.g. the BulTreeBank, http://www.bultreebank.org/, follows Head-driven Phrase Structure Grammar, HPSG), but most try to be less theory-specific. Even so, two main groups can be distinguished: treebanks that annotate phrase structure (for example the Penn Treebank, http://www.cis.upenn.edu/~treebank/) and those that annotate dependency structure (for example the Prague Dependency Treebank, http://ufal.mff.cuni.cz/pdt/).
The syntactic structure in a treebank can be represented in many different ways, for example using simple labelled brackets in a text file, like this (following the Penn Treebank, http://www.cis.upenn.edu/~treebank/):
(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
or a treebank-specific XML scheme. (en)
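- A treebank (English: Treebank) is a type of corpus in which each sentence is annotated with syntactic structure. Because syntactic structure is usually represented as a tree, such a corpus is called a treebank. Treebanks are used in corpus linguistics to study grammatical phenomena, and in computational linguistics to evaluate and train parsers. (ja)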
|
| rdfs:comment
| - A treebank is a text corpus in which each sentence has been annotated with syntactic structure. Syntactic structure is commonly represented as a tree structure, hence the name treebank. Treebanks can be used in corpus linguistics for studying syntactic phenomena or in computational linguistics for training or testing parsers. (en)
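- A treebank (English: Treebank) is a type of corpus in which each sentence is annotated with syntactic structure. Because syntactic structure is usually represented as a tree, such a corpus is called a treebank. Treebanks are used in corpus linguistics to study grammatical phenomena, and in computational linguistics to evaluate and train parsers. (ja)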
|
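The labelled-bracket notation quoted in the abstract is simple enough to read with a small recursive-descent parser. The following is a minimal sketch, not the Penn Treebank's own tooling: the names `Node`, `tokenize`, `parse`, and `tagged_words` are hypothetical, and the code assumes well-formed input in which each innermost bracket holds exactly one part-of-speech tag and one word. It ignores real-corpus details such as traces, function tags, and empty outer brackets.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One bracketed constituent, e.g. (NP ...) or (NNP John)."""
    label: str
    word: Optional[str] = None  # set only on tag-plus-word nodes
    children: List["Node"] = field(default_factory=list)


def tokenize(text: str) -> List[str]:
    # Each parenthesis is its own token; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(text: str) -> Node:
    """Parse a single labelled-bracket tree into nested Node objects."""
    tokens = tokenize(text)
    pos = 0

    def read_node() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' but found {tokens[pos]!r}")
        pos += 1
        node = Node(label=tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read_node())
            else:
                node.word = tokens[pos]  # the word under a part-of-speech tag
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    root = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return root


def tagged_words(node: Node) -> List[Tuple[str, str]]:
    """Return (word, part-of-speech tag) pairs in sentence order."""
    if node.word is not None:
        return [(node.word, node.label)]
    pairs: List[Tuple[str, str]] = []
    for child in node.children:
        pairs.extend(tagged_words(child))
    return pairs


EXAMPLE = """
(S (NP (NNP John))
   (VP (VBZ loves)
       (NP (NNP Mary)))
   (. .))
"""

tree = parse(EXAMPLE)
print(tree.label)                          # S
print([c.label for c in tree.children])    # ['NP', 'VP', '.']
print(tagged_words(tree))
# [('John', 'NNP'), ('loves', 'VBZ'), ('Mary', 'NNP'), ('.', '.')]
```

Keeping the tree as nested nodes rather than a flat token list preserves the phrase structure, so the same object can serve both uses named in the abstract: querying constituents for corpus study and supplying tagged words as parser training data.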