The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego

The Polish Parliamentary Corpus (PPC) is a large collection of linguistically analysed documents from the proceedings of the Polish Parliament (the Sejm and the Senate). It is based on the Polish Sejm Corpus, co-funded by the CESAR project, and is currently being updated by the CLARIN-PL research infrastructure.

Corpus data

The corpus currently contains over 700 million segments. In addition to the stenographic records of plenary and committee sittings, it also contains interpellations and questions.

Corpus files are made available in XML TEI P5 format, compatible with the annotation used by the National Corpus of Polish. This repository contains the unannotated TEI version of the corpus. For the annotated version, please go to the PPC homepage.
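As a rough illustration of working with TEI P5 files, the following Python sketch extracts speaker IDs and utterance text using only the standard library. It assumes, without confirmation from this README, that transcripts encode speech as TEI `<u who="...">` elements in the standard TEI namespace (`http://www.tei-c.org/ns/1.0`). The helper name `extract_utterances` and the sample document are hypothetical and are not taken from the actual corpus files.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; PPC files are assumed (not confirmed) to use it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_utterances(xml_text):
    """Return (speaker, text) pairs for every TEI <u> element.

    Hypothetical helper: assumes utterances are encoded as <u who="...">.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for u in root.iterfind(".//tei:u", TEI_NS):
        speaker = u.get("who", "")
        # Join all nested text nodes and collapse runs of whitespace.
        text = " ".join("".join(u.itertext()).split())
        pairs.append((speaker, text))
    return pairs


# Hypothetical minimal document, for illustration only.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><div>
    <u who="#Marszalek">Otwieram posiedzenie.</u>
    <u who="#Posel1">Dziękuję,   panie marszałku.</u>
  </div></body></text>
</TEI>"""

if __name__ == "__main__":
    for speaker, text in extract_utterances(SAMPLE):
        print(speaker, text)
```

If the real files use different element names or split text and structure across several files, as the National Corpus of Polish does, the XPath expression and the file-reading logic would need to be adjusted accordingly.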

Searching the corpus

Licence

The parliamentary data is in the public domain. The corpus annotations are available under the CC BY (Attribution) licence.

Publications

See also

Contact

Maciej Ogrodniczuk, Institute of Computer Science, Polish Academy of Sciences

For more information, please visit the PPC homepage.