The Polish PDB-UD treebank is automatically converted from the Polish Dependency Bank 2.0 (PDB 2.0). Both treebanks, PDB 2.0 and PDB-UD, were created at the Institute of Computer Science, Polish Academy of Sciences in Warsaw (Poland). As of release 2.4, PDB-UD is included in the Universal Dependencies collection (UD Polish PDB), where it replaced the first UD Polish SZ treebank (UD releases 1.1-2.3).
The PDB trees (i.e. their morphological, syntactic and semantic annotations) were automatically converted to PDB-UD trees. The conversion procedure is rule-based and partly builds on the conversion of the UD PL-SZ trees. The following dependency labels are used in PDB-UD:
There are two versions of the PDB-UD trees:
- standard UD-like trees with enhanced edges that encode the shared dependents and the shared governors of coordinated conjuncts (9,141 trees contain enhanced edges),
- enhanced graphs that additionally carry the semantic roles of some dependents.
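As a rough illustration of how trees with enhanced edges could be identified, the sketch below scans CoNLL-U text and counts sentences in which at least one token has an edge in the DEPS column (column 9) that is not its basic HEAD/DEPREL edge. This is a minimal sketch under stated assumptions, not the procedure used to obtain the figure above: the function names `has_extra_enhanced_edges` and `count_enhanced`, the toy sentences, and the definition of "tree with enhanced edges" as "DEPS contains more than the basic edge" are all illustrative.

```python
def has_extra_enhanced_edges(sentence_lines):
    """Return True if any token has a DEPS edge other than its basic HEAD:DEPREL edge."""
    for line in sentence_lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        tok_id, head, deprel, deps = cols[0], cols[6], cols[7], cols[8]
        if "-" in tok_id or deps == "_":
            continue  # multiword token range, or no enhanced annotation
        for edge in deps.split("|"):
            # Split at the first colon only, so subtyped labels stay intact.
            enh_head, _, enh_rel = edge.partition(":")
            if (enh_head, enh_rel) != (head, deprel):
                return True
    return False


def count_enhanced(conllu_text):
    """Return (sentences with extra enhanced edges, total sentences)."""
    blocks = [b.splitlines() for b in conllu_text.strip().split("\n\n") if b.strip()]
    return sum(has_extra_enhanced_edges(b) for b in blocks), len(blocks)


def _row(*cols):
    return "\t".join(cols)


# Toy data (assumption): "Jan" is a shared subject of two coordinated verbs.
SAMPLE = "\n".join([
    "# text = Jan śpi i chrapie",
    _row("1", "Jan", "Jan", "PROPN", "_", "_", "2", "nsubj", "2:nsubj|4:nsubj", "_"),
    _row("2", "śpi", "spać", "VERB", "_", "_", "0", "root", "0:root", "_"),
    _row("3", "i", "i", "CCONJ", "_", "_", "4", "cc", "4:cc", "_"),
    _row("4", "chrapie", "chrapać", "VERB", "_", "_", "2", "conj", "2:conj", "_"),
    "",
    "# text = Jan śpi",
    _row("1", "Jan", "Jan", "PROPN", "_", "_", "2", "nsubj", "2:nsubj", "_"),
    _row("2", "śpi", "spać", "VERB", "_", "_", "0", "root", "0:root", "_"),
    "",
])

enhanced, total = count_enhanced(SAMPLE)
```

In the toy data only the first sentence has an extra edge (the shared subject `4:nsubj`), so it is the only one counted. If enhanced labels in a given file are augmented relative to the basic labels, this comparison would also flag those tokens, which may or may not be the desired behaviour.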
PDB-UD is divided into three parts: training (17,722 trees), test (2,215 trees) and development (2,215 trees) data sets. Dependency trees are assigned to the data sets randomly, while the proportion of data from the individual sources is maintained. There is one constraint on the splitting procedure: if a sentence occurs in the test, dev or train subset of the UD Polish LFG treebank, it is assigned to the test, dev or train set of the Polish PDB-UD treebank, respectively.
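The splitting procedure described above can be sketched as a per-source random split with pinned assignments. The code below is only a hypothetical reconstruction: the function name `split_treebank`, the input shapes, and the 80/10/10 ratios (approximated from the reported set sizes) are assumptions, and pinned sentences are simply excluded from the per-source proportions, which the actual procedure may handle differently.

```python
import random
from collections import defaultdict


def split_treebank(sentences, pinned, ratios=(0.8, 0.1, 0.1), seed=0):
    """Assign each sentence to 'train', 'dev' or 'test'.

    sentences: list of (sent_id, source) pairs.
    pinned:    dict sent_id -> subset, e.g. taken from the UD Polish LFG split.
    ratios:    assumed (train, dev, test) proportions.
    """
    rng = random.Random(seed)
    assignment = {}
    by_source = defaultdict(list)

    for sent_id, source in sentences:
        if sent_id in pinned:
            # Constraint: keep the subset the sentence already has elsewhere.
            assignment[sent_id] = pinned[sent_id]
        else:
            by_source[source].append(sent_id)

    # Split each source separately so that its share is kept in every subset.
    for source, ids in by_source.items():
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_dev = round(len(ids) * ratios[1])
        for i, sent_id in enumerate(ids):
            if i < n_train:
                assignment[sent_id] = "train"
            elif i < n_train + n_dev:
                assignment[sent_id] = "dev"
            else:
                assignment[sent_id] = "test"
    return assignment


# Toy input (assumption): two sources, one sentence pinned to the test set.
toy = [(f"news-{i}", "news") for i in range(20)] + \
      [(f"fiction-{i}", "fiction") for i in range(10)]
result = split_treebank(toy, pinned={"news-0": "test"})
```

Splitting within each source, rather than over the pooled data, is what keeps the source proportions roughly equal across the three subsets; the fixed seed makes the assignment reproducible.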
Polish PUD treebank
The Polish Parallel Universal Dependencies treebank (PUD-PL) consists of 1,000 Polish sentences (18,389 tokens) in the same order as in the parallel treebanks for other languages. The morphosyntactic annotations were automatically predicted and then manually corrected. 459 of the PUD-PL trees contain enhanced edges.
The Polish PUD treebank is distributed under the CC BY-SA 4.0 license.
We would like to thank all of the contributors to the original Polish Dependency Bank. The development of PDB-UD was funded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.
- Alina Wróblewska (2018) Extended and Enhanced Polish Dependency Bank in Universal Dependencies Format. In Proceedings of Universal Dependencies Workshop 2018 (UDW 2018).