The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego
The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, which was co-funded by the CESAR project, and is currently being updated within the CLARIN-PL infrastructure.
Corpus data
The corpus currently contains over 700M segments. Apart from the stenographic records of plenary sittings and committee sittings, it also contains interpellations and questions.
Corpus files are made available in the XML TEI P5 format, compatible with the annotation used by the National Corpus of Polish. This repository contains the unannotated TEI version of the corpus. For the annotated version, please go to the PPC homepage.
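As a rough illustration of working with TEI P5 files, the sketch below reads utterances from a TEI document using only the Python standard library. The sample document, the use of `<u>` elements with a `who` attribute, and the function name `extract_utterances` are assumptions made for this example; the actual corpus files may be organised differently, so please check a real file before relying on it.

```python
import xml.etree.ElementTree as ET

# TEI P5 elements live in this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, hand-written sample; real corpus files may be structured differently.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <u who="#speaker1">Otwieram posiedzenie.</u>
        <u who="#speaker2">Dziękuję, panie marszałku.</u>
      </div>
    </body>
  </text>
</TEI>"""


def extract_utterances(xml_text):
    """Return (speaker, text) pairs for every <u> element in a TEI document."""
    root = ET.fromstring(xml_text)
    pairs = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        speaker = u.get("who", "")
        # Join nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(u.itertext()).split())
        pairs.append((speaker, text))
    return pairs


if __name__ == "__main__":
    for speaker, text in extract_utterances(SAMPLE):
        print(speaker, text)
```

To try this on a file from the repository, the file contents could be read and passed to `extract_utterances`, adjusting the element name if the file uses a different one.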
Searching the corpus
Licence
The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
Publications
Please also see the slides from the CLARIN-PLUS Workshop "Working with Parliamentary Records" (Sofia, 27–29 March 2017).
Contact
Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences
For more information, please go to the PPC homepage.