MARCELL-TOOLS
Prerequisites
This project requires a Conda installation. For instructions on installing Miniconda, see https://conda.io/miniconda.html.
Creating a new environment
To install the project for the first time, create a new conda environment:
On Linux/MacOS machines, create a new marcell-tools environment with:
conda env create -n marcell-tools -f environment.yml
source activate marcell-tools
pip install -e .
On Windows machines the activation command is slightly different:
conda env create -n marcell-tools -f environment.yml
activate marcell-tools
pip install -e .
Updating environment requirements
To apply changes to the requirements (as defined in the environment.yml file) or setup.py:
Linux/MacOS:
conda env update -n marcell-tools -f environment.yml
source activate marcell-tools
pip install -e .
Windows:
conda env update -n marcell-tools -f environment.yml
activate marcell-tools
pip install -e .
Pyfasttext
The pyfasttext module has to be installed separately within the environment:
- source activate marcell-tools
- pip install pyfasttext
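To confirm that pyfasttext was installed into the marcell-tools environment rather than into another Python, you can run a quick standard-library check with the environment's interpreter. This is a minimal sketch and not part of marcell-tools; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if module_name can be found by the current interpreter."""
    # find_spec returns None for a top-level module that cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run with the Python from the activated marcell-tools environment.
    print("pyfasttext installed:", is_installed("pyfasttext"))
```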
Collector
To install the Collector, first create a marcell database:
- sudo -u postgres createdb marcell -E UTF8 -T template0 -l pl_PL.utf8 [-p 5432]
Create a marcell user:
- sudo -u postgres createuser marcell [-p 5432]
Access a PostgreSQL interactive terminal:
- sudo -u postgres psql [-p 5432]
Create a password for the marcell user:
- postgres=# alter user marcell with encrypted password '';
Grant the marcell user rights to the marcell database:
- postgres=# grant all privileges on database marcell to marcell;
Grant the marcell user rights for creating new databases (for testing purposes):
- postgres=# alter user marcell createdb;
Update the 'PASSWORD' key and, if needed, the 'PORT' key of the 'default' entry in DATABASES in the collector/settings.py file.
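For orientation, the 'default' entry of a Django DATABASES setting typically looks like the sketch below. Only NAME and USER follow from the commands above; the ENGINE and HOST values are assumptions, and the password is a placeholder. Keep whatever the project's collector/settings.py already contains and change only PASSWORD and, if you passed -p to the commands above, PORT.

```python
# Sketch of the 'default' entry of DATABASES in collector/settings.py.
# ENGINE and HOST are assumed values; check them against the real file.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # assumed PostgreSQL backend
        "NAME": "marcell",        # database created with createdb
        "USER": "marcell",        # user created with createuser
        "PASSWORD": "change-me",  # placeholder: the password set with ALTER USER
        "HOST": "localhost",      # assumed: PostgreSQL on the same machine
        "PORT": "5432",           # match the -p value used above, if any
    }
}
```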
Create database tables:
- python ./collector/manage.py makemigrations
- python ./collector/manage.py migrate
Create a superuser (if needed):
- python ./collector/manage.py createsuperuser
Configure pipelines:
- python ./collector/manage.py configure_[PROJECT_NAME]_pipelines
Download documents:
- python ./collector/manage.py download_[PROJECT_NAME]_documents
Extract text from documents:
- python ./collector/manage.py extract_[PROJECT_NAME]_documents
Write documents in the defined output format:
- python ./collector/manage.py write_[PROJECT_NAME]_documents
where PROJECT_NAME is either 'marcell' or 'ppc'.
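The four Collector steps above always run in the same order, so they can be chained. The following is a hypothetical helper (for example saved as run_pipeline.py in the repository root; neither the file nor its functions are part of marcell-tools) that builds the manage.py commands for a given PROJECT_NAME and runs them, stopping at the first failure. It assumes it is launched from the repository root inside the activated environment.

```python
import subprocess
import sys
from typing import List

PROJECTS = ("marcell", "ppc")
# Collector steps in the order listed in this README.
STEPS = (
    "configure_{}_pipelines",
    "download_{}_documents",
    "extract_{}_documents",
    "write_{}_documents",
)


def pipeline_commands(project: str, manage_py: str = "./collector/manage.py") -> List[List[str]]:
    """Build the manage.py commands for one project, in execution order."""
    if project not in PROJECTS:
        raise ValueError(f"unknown project {project!r}; expected one of {PROJECTS}")
    # sys.executable is the Python of the currently active environment.
    return [[sys.executable, manage_py, step.format(project)] for step in STEPS]


def run_pipeline(project: str, dry_run: bool = False) -> None:
    """Print each command and, unless dry_run is set, run it; stop on the first error."""
    for cmd in pipeline_commands(project):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Usage: python run_pipeline.py [marcell|ppc] [--run]
    # Without --run the commands are only printed.
    args = sys.argv[1:]
    project = args[0] if args and not args[0].startswith("-") else "marcell"
    run_pipeline(project, dry_run="--run" not in args)
```

For example, `python run_pipeline.py ppc` only prints the commands, while `python run_pipeline.py ppc --run` executes them. Using check=True means a failed download is not silently followed by extraction of incomplete data.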
To use the annotation module, also install:
- the latest version of Morfeusz2 (http://sgjp.pl/morfeusz/download/)
- Concraft2
- Liner2 (https://github.com/CLARIN-PL/Liner2) with the PolEval 2018 model (https://clarin-pl.eu/dspace/bitstream/handle/11321/598/liner26_model_ner_nkjp.zip)
- COMBO (https://github.com/360er0/COMBO) with the latest version of the desired model (http://zil.ipipan.waw.pl/PDB/PDBparser)
Then set the paths to the programs and the models they use (CONCRAFT2_PATH, CONCRAFT2_MODEL_PATH, LINER2_PATH, LINER2_MODEL_PATH, COMBO_PATH, COMBO_MODEL_PATH) in the collector/settings.py file.
You can also define the port assigned to the Concraft2 server (CONCRAFT2_PORT), the number of cores it uses (CONCRAFT2_CORES) and an additional dictionary that supports disambiguation (FREQ_1M_PATH).
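The annotation settings might look like the sketch below. The setting names come from this README; every value is a placeholder, and whether CONCRAFT2_PORT and CONCRAFT2_CORES are expected as integers is an assumption. The missing_paths helper is hypothetical and not part of the project; it is meant for a separate scratch script rather than settings.py, and simply lists the *_PATH settings that do not exist on disk, which helps catch typos before running the annotation module.

```python
import os

# Placeholder values: replace each one with the location of your own install.
CONCRAFT2_PATH = "/path/to/concraft2"
CONCRAFT2_MODEL_PATH = "/path/to/concraft2-model"
CONCRAFT2_PORT = 3000   # hypothetical port number
CONCRAFT2_CORES = 4     # hypothetical core count
FREQ_1M_PATH = "/path/to/freq-1m-dictionary"
LINER2_PATH = "/path/to/liner2"
LINER2_MODEL_PATH = "/path/to/liner26_model_ner_nkjp"
COMBO_PATH = "/path/to/COMBO"
COMBO_MODEL_PATH = "/path/to/combo-model"


def missing_paths(settings: dict) -> list:
    """Return the names of *_PATH settings whose value does not exist on disk."""
    return [
        name
        for name, value in settings.items()
        if name.endswith("_PATH") and not os.path.exists(value)
    ]


if __name__ == "__main__":
    # Copy globals() first so the dict is not modified while it is iterated.
    current = {name: value for name, value in dict(globals()).items() if name.isupper()}
    print("Missing paths:", missing_paths(current) or "none")
```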
Remember to re-activate
When running the project, remember to activate the marcell-tools environment:
On Linux/MacOS:
source activate marcell-tools
Windows:
activate marcell-tools
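If you are unsure whether the environment is active, you can check from Python. This sketch relies on the CONDA_DEFAULT_ENV variable, which conda activation sets to the name of the active environment; the helper is hypothetical and not part of marcell-tools.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is set."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env == "marcell-tools":
        print("marcell-tools is active")
    else:
        # Only warn; the activation command depends on the operating system.
        print(f"Active conda environment is {env!r}; activate marcell-tools first.",
              file=sys.stderr)
```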
Verify and test
TBA