Commit ab5dbe37545fba606c018bbc28a092cb46ad5358

Authored by Bartłomiej Nitoń
1 parent 2100e842

Add README.

Showing 1 changed file with 36 additions and 0 deletions
README.md 0 → 100644
  1 +# The Polish Parliamentary Corpus / Korpus Dyskursu Parlamentarnego #
  2 +
  3 +The *[Polish Parliamentary Corpus (PPC)](http://clip.ipipan.waw.pl/PPC)* is a large collection of linguistically analysed documents from the proceedings of the *Polish Parliament*: the [Sejm](http://opis.sejm.gov.pl/en/) and the [Senate](http://www.senat.gov.pl/en/). It is based on the [Polish Sejm Corpus](http://clip.ipipan.waw.pl/PSC), co-funded by the [CESAR](http://clip.ipipan.waw.pl/CESAR) project, and is currently being updated within the [CLARIN-PL](http://clip.ipipan.waw.pl/CLARIN-PL-3) infrastructure.
  4 +
  5 +## Corpus data ##
  6 +
  7 +The corpus currently contains over 700M segments. Apart from the stenographic records of plenary and committee sittings, it also contains interpellations and questions.
  8 +
  9 +Corpus files are made available in the *XML TEI P5* format, compatible with the annotation used by the [National Corpus of Polish](http://nkjp.pl/index.php?page=0&lang=1). This repository contains the *unannotated TEI version* of the corpus. For the annotated version, please go to the [PPC homepage](http://clip.ipipan.waw.pl/PPC).
  10 +
  11 +## Searching the corpus ##
  12 +
  13 + * [using the search engine](https://kdp.nlp.ipipan.waw.pl/)
  14 + * [using the ngram viewer](http://ngram.kdp.nlp.ipipan.waw.pl/)
  15 +
  16 +## Licence ##
  17 +
  18 +The parliamentary data is in the public domain. The corpus annotations are available under the CC-BY (attribution) licence.
  19 +
  20 +## Publications ##
  21 +
  22 +[Maciej Ogrodniczuk and Bartłomiej Nitoń. *New developments in the Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the Second ParlaCLARIN Workshop, pages 1–4, Marseille, France, 2020. European Language Resources Association (ELRA).](https://www.aclweb.org/anthology/2020.parlaclarin-1.1.pdf)
  23 +
  24 +
  25 +[Maciej Ogrodniczuk. *Polish Parliamentary Corpus*. In Darja Fišer, Maria Eskevich, and Franciska de Jong, editors, Proceedings of the LREC 2018 Workshop ParlaCLARIN: Creating and Using Parliamentary Corpora, pages 15–19, Paris, France, 2018. European Language Resources Association (ELRA).](http://lrec-conf.org/workshops/lrec2018/W2/pdf/11_W2.pdf)
  26 +
  27 +
  28 +[Maciej Ogrodniczuk. *The Polish Sejm Corpus*. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, pages 2219–2223, Istanbul, Turkey, 2012. European Language Resources Association (ELRA).](http://www.lrec-conf.org/proceedings/lrec2012/pdf/653_Paper.pdf)
  29 +
  30 +Please see also [the slides](https://www.clarin.eu/sites/default/files/2-ogrodniczuk.pdf) from the [CLARIN-PLUS Workshop "Working with Parliamentary Records"](https://www.clarin.eu/event/2017/clarin-plus-workshop-working-parliamentary-records), Sofia, 27–29 March 2017.
  31 +
  32 +## Contact ##
  33 +
  34 +[Maciej Ogrodniczuk](http://zil.ipipan.waw.pl/MaciejOgrodniczuk), Institute of Computer Science, Polish Academy of Sciences
  35 +
  36 +For more information, please go to the [PPC homepage](http://clip.ipipan.waw.pl/PPC).