
ODT to TEI converter

A converter for ODT documents containing the proceedings of the Polish Parliament.

The documents are first converted to Markdown using pandoc. Most of the formatting is then removed, because it is unreliable, and speakers and events are detected and reformatted.
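A minimal Python sketch of this first stage, assuming pandoc is installed and on PATH. The function names, the formatting-stripping rules and the speaker and event patterns (a speaker label such as "Marszałek:" or "Poseł Jan Kowalski:", an event such as "(Oklaski)" on its own line) are illustrative assumptions, not the actual logic of odt2tei.py.

```python
import re
import subprocess


def odt_to_markdown(path):
    """Convert an ODT file to Markdown by calling pandoc (must be on PATH)."""
    result = subprocess.run(
        ["pandoc", "--from", "odt", "--to", "markdown", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Pandoc backslash escapes (e.g. "1\.") and emphasis markers (*, **, __) are
# dropped, because the source formatting is not reliable.
ESCAPE_RE = re.compile(r"\\(.)")
EMPHASIS_RE = re.compile(r"\*+|_{2,}")

# Hypothetical patterns: a speaker label ending with a colon, and an event
# written in parentheses on its own line.
SPEAKER_RE = re.compile(
    r"^((?:Marszałek|Wicemarszałek|Poseł|Posłanka|Minister)\b[^:]{0,80}):\s*(.*)$"
)
EVENT_RE = re.compile(r"^\((.+)\)$")


def strip_formatting(line):
    """Remove pandoc escapes, inline emphasis and surrounding whitespace."""
    line = ESCAPE_RE.sub(r"\1", line)
    line = EMPHASIS_RE.sub("", line)
    return line.strip()


def classify(line):
    """Return (kind, speaker, text) for a Markdown line, or None if it is empty."""
    line = strip_formatting(line)
    if not line:
        return None
    match = SPEAKER_RE.match(line)
    if match:
        return ("speech", match.group(1).strip(), match.group(2))
    match = EVENT_RE.match(line)
    if match:
        return ("event", None, match.group(1))
    return ("text", None, line)
```

Each line of pandoc's output would be passed through classify(); lines that match neither pattern are kept as plain text.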

Next, multi-session documents are split so that each resulting document contains exactly one session. The metadata of each session are detected automatically.
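Session splitting and metadata detection could look roughly like the sketch below. It assumes, hypothetically, that every session opens with a header line such as "45. posiedzenie Sejmu w dniu 12 lipca 2017 r."; the real documents and odt2tei.py may use different markers. Text that precedes the first header ends up in a session whose metadata are unknown (None).

```python
import re
from datetime import date

# Polish month names in the genitive case, as used in dates ("12 lipca 2017").
POLISH_MONTHS = {
    "stycznia": 1, "lutego": 2, "marca": 3, "kwietnia": 4,
    "maja": 5, "czerwca": 6, "lipca": 7, "sierpnia": 8,
    "września": 9, "października": 10, "listopada": 11, "grudnia": 12,
}

# Hypothetical session header, e.g. "45. posiedzenie Sejmu w dniu 12 lipca 2017 r."
SESSION_RE = re.compile(
    r"(\d+)\.\s+posiedzeni[ae]\b.*?\bw dniu\s+(\d{1,2})\s+(\w+)\s+(\d{4})",
    re.IGNORECASE,
)


def parse_polish_date(day, month_name, year):
    """Return a date, or None if the month name or the day is invalid."""
    month = POLISH_MONTHS.get(month_name.lower())
    if month is None:
        return None
    try:
        return date(int(year), month, int(day))
    except ValueError:
        return None


def split_sessions(lines):
    """Split lines into sessions, each a dict with number, date and lines."""
    sessions = []
    current = None
    for line in lines:
        match = SESSION_RE.search(line)
        # A header starts a new session; leading text gets its own session.
        if match or current is None:
            current = {"number": None, "date": None, "lines": []}
            sessions.append(current)
            if match:
                current["number"] = int(match.group(1))
                current["date"] = parse_polish_date(*match.group(2, 3, 4))
        current["lines"].append(line)
    return sessions
```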

Finally, a TEI document in the PPC format is created for each session.
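A sketch of building one TEI document per session with Python's xml.etree.ElementTree. It does not reproduce the PPC schema: the minimal header and the use of the generic TEI spoken-text elements `u` (utterance) and `incident` are assumptions made for illustration, as is the input format of (kind, speaker, text) tuples.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title, items):
    """Build a minimal TEI document from (kind, speaker, text) tuples."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    ET.SubElement(ET.SubElement(file_desc, tei("publicationStmt")), tei("p")).text = "Unpublished"
    ET.SubElement(ET.SubElement(file_desc, tei("sourceDesc")), tei("p")).text = "Converted from ODT"

    div = ET.SubElement(ET.SubElement(ET.SubElement(root, tei("text")), tei("body")), tei("div"))
    for kind, speaker, text in items:
        if kind == "speech":
            # One utterance per speaker turn.
            ET.SubElement(div, tei("u"), who=speaker).text = text
        elif kind == "event":
            # Non-verbal events such as applause.
            incident = ET.SubElement(div, tei("incident"))
            ET.SubElement(incident, tei("desc")).text = text
        else:
            ET.SubElement(div, tei("p")).text = text
    return ET.tostring(root, encoding="unicode")
```

The returned string starts with `<TEI xmlns="http://www.tei-c.org/ns/1.0">` and can be written to one file per session.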

Documents with missing data are marked with question marks, and warnings are printed to standard output.
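Marking missing metadata and warning about it might be sketched as follows; the field names in REQUIRED_FIELDS and the warning wording are hypothetical.

```python
# Hypothetical list of metadata fields that every session should have.
REQUIRED_FIELDS = ("number", "date", "title")


def mark_missing(metadata, source_name):
    """Return a copy of metadata with missing fields set to "?", warning on stdout."""
    marked = dict(metadata)
    for field in REQUIRED_FIELDS:
        if marked.get(field) in (None, ""):
            marked[field] = "?"
            print(f"WARNING: {source_name}: missing {field}")
    return marked
```

For example, `mark_missing({"number": 3, "date": None}, "sejm.odt")` returns `{"number": 3, "date": "?", "title": "?"}` and prints one warning each for `date` and `title`.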