Commit 9020263248075415d44207bc8d1c51b3ee96a6a2
1 parent: 50f3afb6
Update README.
Showing 1 changed file with 10 additions and 6 deletions
README.md
1 | 1 | # The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego # |
2 | 2 | |
3 | -The *[Polish Parliamentary Corpus (PPC)](http://clip.ipipan.waw.pl/PPC)* is a large collection of linguistically analysed documents from the proceedings of *Polish Parliament*, [Sejm](http://opis.sejm.gov.pl/en/) and [Senate](http://www.senat.gov.pl/en/). It is based on the [Polish Sejm Corpus](http://clip.ipipan.waw.pl/PSC) co-funded by project [CESAR](http://clip.ipipan.waw.pl/CESAR) and is currently being updated by [CLARIN-PL](http://clip.ipipan.waw.pl/CLARIN-PL-3) infrastructure. | |
3 | +The *[Polish Parliamentary Corpus (PPC)](http://clip.ipipan.waw.pl/PPC)* is a large collection of linguistically analysed documents from the proceedings of the *Polish Parliament*, [Sejm](http://opis.sejm.gov.pl/en/) and [Senate](http://www.senat.gov.pl/en/). It is based on the [Polish Sejm Corpus](http://clip.ipipan.waw.pl/PSC) co-funded by project [CESAR](http://clip.ipipan.waw.pl/CESAR) and is currently being updated by [CLARIN-PL](http://clip.ipipan.waw.pl/CLARIN-PL-3) infrastructure. | |
4 | 4 | |
5 | 5 | ## Corpus data ## |
6 | 6 | |
7 | -The current size of the corpus amounts over 700M segments. Apart from the stenographic records of plenary sittings and committee sittings, the corpus contains also interpellations and questions. | |
7 | +The current size of the corpus amounts to over 700M segments. Apart from the stenographic records of plenary sittings and committee sittings, the corpus also contains interpellations and questions. | |
8 | 8 | |
9 | 9 | Corpus files are made available in *XML TEI P5* format compatible with the annotation used by the [National Corpus of Polish](http://nkjp.pl/index.php?page=0&lang=1). This repository contains *Unannotated TEI version* of the corpora. For annotated version please go to the [PPC homepage](http://clip.ipipan.waw.pl/PPC). |
10 | 10 | |
... | ... | @@ -19,15 +19,19 @@ The parliamentary data is public domain. The corpus annotations are available on |
19 | 19 | |
20 | 20 | ## Publications ## |
21 | 21 | |
22 | -[Maciej Ogrodniczuk and Bartłomiej Nitoń. *New developments in the Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the Second ParlaCLARIN Workshop, pages 1–4, Marseille, France, 2020. European Language Resources Association (ELRA).](https://www.aclweb.org/anthology/2020.parlaclarin-1.1.pdf) | |
22 | + * [Maciej Ogrodniczuk and Bartłomiej Nitoń. *New developments in the Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the Second ParlaCLARIN Workshop, pages 1–4, Marseille, France, 2020. European Language Resources Association (ELRA).](https://www.aclweb.org/anthology/2020.parlaclarin-1.1.pdf) | |
23 | 23 | |
24 | 24 | |
25 | -[Maciej Ogrodniczuk. *Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the LREC 2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora, pages 15–19, Paris, France, 2018. European Language Resources Association (ELRA).](http://lrec-conf.org/workshops/lrec2018/W2/pdf/11_W2.pdf) | |
25 | + * [Maciej Ogrodniczuk. *Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the LREC 2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora, pages 15–19, Paris, France, 2018. European Language Resources Association (ELRA).](http://lrec-conf.org/workshops/lrec2018/W2/pdf/11_W2.pdf) | |
26 | 26 | |
27 | 27 | |
28 | -[Maciej Ogrodniczuk. *The Polish Sejm Corpus*. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).](http://www.lrec-conf.org/proceedings/lrec2012/pdf/653_Paper.pdf) | |
28 | + * [Maciej Ogrodniczuk. *The Polish Sejm Corpus*. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).](http://www.lrec-conf.org/proceedings/lrec2012/pdf/653_Paper.pdf) | |
29 | + | |
30 | +## See also ## | |
31 | + | |
32 | +* [The slides](https://www.clarin.eu/sites/default/files/2-ogrodniczuk.pdf) from [CLARIN-PLUS Workshop "Working with Parliamentary Records"](https://www.clarin.eu/event/2017/clarin-plus-workshop-working-parliamentary-records). Sofia, 27–29 March 2017. | |
33 | +* [ParlaMint](https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora) project reusing data from the Polish Parliamentary Corpus in a multilingual setting. | |
29 | 34 | |
30 | -Please see also [the slides](https://www.clarin.eu/sites/default/files/2-ogrodniczuk.pdf) from [CLARIN-PLUS Workshop "Working with Parliamentary Records"](https://www.clarin.eu/event/2017/clarin-plus-workshop-working-parliamentary-records). Sofia, 27–29 March 2017. | |
31 | 35 | |
32 | 36 | ## Contact ## |
33 | 37 | |
... | ... |