# The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego #

The *[Polish Parliamentary Corpus (PPC)](http://clip.ipipan.waw.pl/PPC)* is a large collection of linguistically analysed documents from the proceedings of the *Polish Parliament*: the [Sejm](http://opis.sejm.gov.pl/en/) and the [Senate](http://www.senat.gov.pl/en/). It is based on the [Polish Sejm Corpus](http://clip.ipipan.waw.pl/PSC), co-funded by the [CESAR](http://clip.ipipan.waw.pl/CESAR) project, and is currently being updated by the [CLARIN-PL](http://clip.ipipan.waw.pl/CLARIN-PL-3) infrastructure.

## Corpus data ##

The corpus currently contains over 700M segments. Apart from the stenographic records of plenary and committee sittings, it also contains interpellations and questions.

Corpus files are available in *XML TEI P5* format, compatible with the annotation used by the [National Corpus of Polish](http://nkjp.pl/index.php?page=0&lang=1). This repository contains the *unannotated TEI version* of the corpora. For the annotated version, please go to the [PPC homepage](http://clip.ipipan.waw.pl/PPC).
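Because the files are TEI P5 XML, they can be read with any standard XML parser. The snippet below is a minimal sketch, not an official loader: it assumes that utterances are stored in TEI `<u>` elements with an optional `who` attribute identifying the speaker, and the function name `extract_utterances` is hypothetical. Please check the actual element layout of the files in this repository before relying on it.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace, in ElementTree's "{uri}tag" notation.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def extract_utterances(source):
    """Yield (speaker, text) pairs from a TEI document.

    `source` is a file path or a file-like object.
    Assumption: utterances live in <u> elements, and the speaker
    reference (if any) is in the `who` attribute.
    """
    tree = ET.parse(source)
    for u in tree.iter(TEI_NS + "u"):
        # Join all nested text and normalise whitespace.
        text = " ".join("".join(u.itertext()).split())
        if text:
            yield u.get("who"), text
```

Calling `list(extract_utterances(path))` on a single file returns its utterances in document order; empty `<u>` elements are skipped.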

## Searching the corpus ##

 * [using the search engine](https://kdp.nlp.ipipan.waw.pl/)
 * [using the ngram viewer](http://ngram.kdp.nlp.ipipan.waw.pl/)

## Licence ##

The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.

## Publications ##

[Maciej Ogrodniczuk and Bartłomiej Nitoń. *New developments in the Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the Second ParlaCLARIN Workshop, pages 1–4, Marseille, France, 2020. European Language Resources Association (ELRA).](https://www.aclweb.org/anthology/2020.parlaclarin-1.1.pdf)


[Maciej Ogrodniczuk. *Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the LREC 2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora, pages 15–19, Paris, France, 2018. European Language Resources Association (ELRA).](http://lrec-conf.org/workshops/lrec2018/W2/pdf/11_W2.pdf)


[Maciej Ogrodniczuk. *The Polish Sejm Corpus*. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).](http://www.lrec-conf.org/proceedings/lrec2012/pdf/653_Paper.pdf)

Please also see [the slides](https://www.clarin.eu/sites/default/files/2-ogrodniczuk.pdf) from the [CLARIN-PLUS Workshop "Working with Parliamentary Records"](https://www.clarin.eu/event/2017/clarin-plus-workshop-working-parliamentary-records) (Sofia, 27–29 March 2017).

## Contact ##

[Maciej Ogrodniczuk](http://zil.ipipan.waw.pl/MaciejOgrodniczuk), Institute of Computer Science, Polish Academy of Sciences

For more information, please go to the [PPC homepage](http://clip.ipipan.waw.pl/PPC).