Commit 85167180bcbe0565a09269257456961365cf6ff3

Authored by Alina Wróblewska
1 parent f7a92fb7

Typos corrected

Showing 1 changed file with 5 additions and 5 deletions
README.md
1 1 # Summary
2 2  
3   -The Polish PDB-UD treebank is autmatically converted of the [Polish Dependency Bank 2.0 (PDB 2.0)](http://zil.ipipan.waw.pl/PDB). The both treebanks, PDB 2.0 and PDB-UD, were created at the [Institute of Computer Science, Polish Academy of Sciences](https://ipipan.waw.pl/en/) in Warsaw (Poland). As of the release 2.4, PDB-UD is included in the [Universal Dependencies](https://universaldependencies.org) collection ([UD Polish PDB](https://universaldependencies.org/treebanks/pl_pdb/index.html)) and it substituted for the first UD Polish SZ treebank (UD releases 1.1-2.3).
  3 +The Polish PDB-UD treebank is automatically converted from the [Polish Dependency Bank 2.0 (PDB 2.0)](http://zil.ipipan.waw.pl/PDB). Both treebanks, PDB 2.0 and PDB-UD, were created at the [Institute of Computer Science, Polish Academy of Sciences](https://ipipan.waw.pl/en/) in Warsaw (Poland). As of the release 2.4, PDB-UD is included in the [Universal Dependencies](https://universaldependencies.org) collection ([UD Polish PDB](https://universaldependencies.org/treebanks/pl_pdb/index.html)) and it substituted for the first UD Polish SZ treebank (UD releases 1.1-2.3).
4 4  
5 5 NKJP1M-UD is a part of NKJP with the manual morpho-syntactic annotations and the partially manual dependency trees automatically converted into Universal Dependencies format.
6 6  
7 7  
8 8 # PDB-UD
9 9  
10   -The treebank consists of 22,152 sentences (350,036 tokens) from [Polish National Corpus](http://nkjp.pl), [Europarl](http://www.statmt.org/europarl), [DGT-Translation Memory](https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory), [OPUS](http://opus.nlpl.eu), [Pelcra Prallel Corpus](http://metashare.dfki.de/repository/browse/pelcra-polish-english-parallel-corpus-of-literary-works-cc-by/e99fa4c063f111e2bff4525400d761472dc239ffeb6f47bda0553af53ddd5ef0/), [CDSCorpus](http://zil.ipipan.waw.pl/Scwad/CDSCorpus) and literature.
  10 +The treebank consists of 22,152 sentences (350,036 tokens) from [Polish National Corpus](http://nkjp.pl), [Europarl](http://www.statmt.org/europarl), [DGT-Translation Memory](https://ec.europa.eu/jrc/en/language-technologies/dgt-translation-memory), [OPUS](http://opus.nlpl.eu), [Pelcra Parallel Corpus](http://metashare.dfki.de/repository/browse/pelcra-polish-english-parallel-corpus-of-literary-works-cc-by/e99fa4c063f111e2bff4525400d761472dc239ffeb6f47bda0553af53ddd5ef0/), [CDSCorpus](http://zil.ipipan.waw.pl/Scwad/CDSCorpus) and literature.
11 11  
12 12 The PDB trees (i.e. morphological, syntactic and semantic annotations) were automatically converted to the PDB-UD trees. The conversion procedure is rule-based and it is partly based on the conversion of the UD PL-SZ trees. The following dependency labels are used in PDB-UD: `acl`, `acl:relcl`, `advcl`, `advcl:cmpr`, `advcl:relcl`, `advmod`, `advmod:arg`, `advmod:emph`, `advmod:neg`, `amod`, `amod:flat`, `appos`, `aux`, `aux:clitic`, `aux:cnd`, `aux:imp`, `aux:pass`, `case`, `cc`, `cc:preconj`, `ccomp`, `ccomp:cleft`, `ccomp:obj`, `conj`, `cop`, `csubj`, `csubj:pass`, `dep`, `det`, `det:numgov`, `det:nummod`, `det:poss`, `discourse:intj`, `expl:pv`, `fixed`, `flat`, `flat:foreign`, `iobj`, `list`, `mark`, `nmod`, `nmod:arg`, `nmod:flat`, `nmod:poss`, `nmod:pred`, `nsubj`, `nsubj:pass`, `nummod`, `nummod:flat`, `nummod:gov`, `obj`, `obl`, `obl:agent`, `obl:arg`, `obl:cmpr`, `obl:orphan`, `orphan`, `parataxis:insert`, `parataxis:obj`, `punct`, `root`, `vocative`, `xcomp`, `xcomp:cleft`, `xcomp:pred`, and `xcomp:subj`.
13 13  
14 14 ## Enhanced PDB-UD graphs
15   -The standard UD-like trees are enhanced with the edges encoding the shared dependents and the shared governors of the coordinated conjuncts (9141 trees with the enhanced edges).
  15 +The standard UD-like trees are enhanced with the edges encoding the shared dependents and the shared governors of the coordinated conjuncts (9141 trees with enhanced edges).
16 16  
17 17 ## Data Split
18 18  
... ... @@ -20,7 +20,7 @@ PDB-UD is divided into three parts – training (17,722 trees), test (2215 trees
20 20  
21 21 # PUD-PL treebank
22 22  
23   -The Polish Parallel Universal Dependencies treebank (PUD-PL) consists of 1000 Polish sentences (18,389 tokens) in the same order as in parallel treebanks in other languages. Morpho-syntactic annotations are automatically predicted and then manually corrected. 459 of PUD-PL trees contain enhanced edges.
  23 +The Polish Parallel Universal Dependencies treebank (PUD-PL) consists of 1000 Polish sentences (18,389 tokens) in the same order as in parallel treebanks in other languages. Morpho-syntactic annotations are automatically predicted and then manually corrected. 459 PUD-PL trees contain enhanced edges.
24 24  
25 25 # NKJP1M-UD
26 26  
... ... @@ -36,7 +36,7 @@ NKJP1M-UD is available on [GNU GPL v.3](https://www.gnu.org/licenses/gpl-3.0.en.
36 36  
37 37 # Acknowledgments
38 38  
39   -We would like to thank all of the contributors to the original Polish Dependency Bank. The development of PDB-UD was founded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.
  39 +We would like to thank all of the contributors to the original Polish Dependency Bank. The development of PDB-UD was founded by the Polish Ministry of Science and Higher Education as part of the investment in the CLARIN-PL research infrastructure.
40 40  
41 41 ## References
42 42  
... ...
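
The README text in this diff lists the dependency labels used in PDB-UD and reports how many trees carry enhanced edges. The sketch below shows one way such figures could be recomputed from a CoNLL-U file. It rests on several assumptions: the data follows the standard ten-column CoNLL-U layout, a tree counts as "enhanced" when any token's DEPS column contains an edge other than its basic `HEAD:DEPREL` pair, and multiword-token ranges and empty nodes are skipped. The two sample sentences are made-up toy data, not taken from the treebank, and the function names (`read_sentences`, `has_enhanced_edges`, `summarize`) are hypothetical, not part of any PDB-UD tooling.

```python
from collections import Counter

# Toy CoNLL-U data invented for illustration (NOT from PDB-UD).
# Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
SAMPLE = """\
# text = Jan śpi i chrapie .
1\tJan\tJan\tPROPN\t_\t_\t2\tnsubj\t2:nsubj|4:nsubj\t_
2\tśpi\tspać\tVERB\t_\t_\t0\troot\t0:root\t_
3\ti\ti\tCCONJ\t_\t_\t4\tcc\t4:cc\t_
4\tchrapie\tchrapać\tVERB\t_\t_\t2\tconj\t0:root|2:conj\t_
5\t.\t.\tPUNCT\t_\t_\t2\tpunct\t2:punct\t_

# text = Pada .
1\tPada\tpadać\tVERB\t_\t_\t0\troot\t0:root\t_
2\t.\t.\tPUNCT\t_\t_\t1\tpunct\t1:punct\t_
"""


def read_sentences(text):
    """Yield each sentence as a list of 10-column token rows."""
    sentence = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            # Simplification: keep only plain integer IDs, i.e. skip
            # multiword-token ranges (1-2) and empty nodes (1.1).
            if len(cols) == 10 and cols[0].isdigit():
                sentence.append(cols)
    if sentence:
        yield sentence


def has_enhanced_edges(sentence):
    """Assumed criterion: some DEPS edge differs from the basic HEAD:DEPREL."""
    for cols in sentence:
        basic = f"{cols[6]}:{cols[7]}"
        deps = cols[8]
        if deps != "_" and any(edge != basic for edge in deps.split("|")):
            return True
    return False


def summarize(text):
    """Return (number of trees, trees with enhanced edges, DEPREL counts)."""
    labels = Counter()
    total = enhanced = 0
    for sentence in read_sentences(text):
        total += 1
        labels.update(cols[7] for cols in sentence)
        enhanced += has_enhanced_edges(sentence)
    return total, enhanced, labels


total, enhanced, labels = summarize(SAMPLE)
print(total, enhanced, sorted(labels))
```

To apply the same check to a real file, the file contents could be read with `open(path, encoding="utf-8").read()` and passed to `summarize`. Whether the resulting count matches the figure reported in the README depends on how the authors define a tree with enhanced edges, which this diff does not specify.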