MARCELL-TOOLS

Prerequisites

This project requires a Conda installation. For instructions on how to install miniconda see: https://conda.io/miniconda.html.

Creating a new environment

To install the project for the first time, create a new conda environment.

On Linux/MacOS machines, create a new marcell-tools environment with:

conda env create -n marcell-tools -f environment.yml
source activate marcell-tools
pip install -e .

On Windows machines the activation command is slightly different:

conda env create -n marcell-tools -f environment.yml
activate marcell-tools
pip install -e .

Updating environment requirements

To apply changes to the requirements (as defined in the environment.yml file) or to setup.py:

Linux/MacOS:

conda env update -n marcell-tools -f environment.yml
source activate marcell-tools
pip install -e .

Windows:

conda env update -n marcell-tools -f environment.yml
activate marcell-tools
pip install -e .

Pyfasttext

The pyfasttext module has to be installed separately within the environment:

  • source activate marcell-tools
  • pip install pyfasttext

Collector

To install the Collector, first create a marcell database:

  • sudo -u postgres createdb marcell -E UTF8 -T template0 -l pl_PL.utf8 [-p 5432]

Create a marcell user:

  • sudo -u postgres createuser marcell [-p 5432]

Access a PostgreSQL interactive terminal:

  • sudo -u postgres psql [-p 5432]

Set a password for the marcell user (put the chosen password between the quotes):

  • postgres=# alter user marcell with encrypted password '';

Grant the marcell user rights to the marcell database:

  • postgres=# grant all privileges on database marcell to marcell;

Grant the marcell user rights for creating new databases (for testing purposes):

  • postgres=# alter user marcell createdb;

Update the 'PASSWORD' key and, if needed, the 'PORT' key in the 'default' entry of DATABASES in the collector/settings.py file.
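
For orientation, the 'default' entry might look roughly like the sketch below. It assumes the standard Django PostgreSQL backend and a server on the local machine; the actual contents of collector/settings.py may differ. The MARCELL_DB_PASSWORD environment variable is a hypothetical name, used here only to avoid hard-coding the password.

```python
import os

# Sketch of the DATABASES setting in collector/settings.py (assumed layout).
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",  # Django's PostgreSQL backend
        "NAME": "marcell",    # database created with createdb
        "USER": "marcell",    # role created with createuser
        # Hypothetical environment variable; a literal password string also works.
        "PASSWORD": os.environ.get("MARCELL_DB_PASSWORD", ""),
        "HOST": "localhost",  # assumption: PostgreSQL runs on the same machine
        "PORT": "5432",       # change if the server listens on another port
    }
}
```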

Create database tables:

  • python ./collector/manage.py makemigrations
  • python ./collector/manage.py migrate

Create a superuser (if needed):

  • python ./collector/manage.py createsuperuser

Configure pipelines:

  • python ./collector/manage.py configure_[PROJECT_NAME]_pipelines

Download documents:

  • python ./collector/manage.py download_[PROJECT_NAME]_documents

Extract text from documents:

  • python ./collector/manage.py extract_[PROJECT_NAME]_documents

Write documents in the defined output format:

  • python ./collector/manage.py write_[PROJECT_NAME]_documents

where [PROJECT_NAME] is replaced with 'marcell' or 'ppc'.
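
The four pipeline commands are meant to be run in the order listed. As a convenience, they could be chained with a small script like the sketch below. The helper names (pipeline_commands, run_pipeline) are hypothetical and not part of the project; the script assumes it is started from the repository root with the marcell-tools environment active.

```python
import subprocess
import sys

# Management command templates, in the order given above.
STAGES = (
    "configure_{}_pipelines",
    "download_{}_documents",
    "extract_{}_documents",
    "write_{}_documents",
)
PROJECTS = ("marcell", "ppc")


def pipeline_commands(project):
    """Build the manage.py invocations for one project, in order."""
    if project not in PROJECTS:
        raise ValueError(f"unknown project: {project!r}")
    # sys.executable is the interpreter of the currently active environment.
    return [
        [sys.executable, "./collector/manage.py", stage.format(project)]
        for stage in STAGES
    ]


def run_pipeline(project):
    """Run each stage in turn; stop at the first command that fails."""
    for command in pipeline_commands(project):
        subprocess.run(command, check=True)
```

Calling run_pipeline("marcell") would then configure the pipelines, download, extract and write the documents in one go; check=True makes a failing stage raise an exception instead of letting later stages run on incomplete data.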

To use the annotation module, also install the external tools it relies on (Morfeusz2, Concraft2, Liner2 and COMBO).

Then set the paths to these programs, their models and dictionaries (MORFEUSZ2_DICT_PATH, MORFEUSZ2_DICT_NAME, CONCRAFT2_PATH, CONCRAFT2_MODEL_PATH, LINER2_PATH, LINER2_MODEL_PATH, COMBO_PATH, COMBO_MODEL_PATH) in the collector/settings.py file.

You can also define the port assigned to the Concraft2 server (CONCRAFT2_PORT), the number of cores it uses (CONCRAFT2_CORES) and an additional dictionary that supports disambiguation (FREQ_1M_PATH).
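
As an illustration, the annotation settings could be filled in along the lines of the sketch below. Only the setting names come from this document. Every path and value is a placeholder for a hypothetical installation under /opt and must be replaced with the locations on your machine.

```python
# Sketch of the annotation settings in collector/settings.py.
# All values are placeholders, not project defaults.
MORFEUSZ2_DICT_PATH = "/opt/morfeusz2/dict"
MORFEUSZ2_DICT_NAME = "sgjp"

CONCRAFT2_PATH = "/opt/concraft2/concraft-pl"
CONCRAFT2_MODEL_PATH = "/opt/concraft2/model.gz"

LINER2_PATH = "/opt/liner2"
LINER2_MODEL_PATH = "/opt/liner2/models"

COMBO_PATH = "/opt/combo"
COMBO_MODEL_PATH = "/opt/combo/models"

# Optional settings.
CONCRAFT2_PORT = 3000   # port the Concraft2 server listens on
CONCRAFT2_CORES = 4     # number of cores used by Concraft2
FREQ_1M_PATH = "/opt/dictionaries/freq_1m"  # extra dictionary for disambiguation
```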

To use the XIX-century annotation pipeline, set values for XIX_MORFEUSZ2_DICT_PATH and XIX_MORFEUSZ2_DICT_NAME in the collector/settings.py file. The default values, which point to the "parlamentareusz" dictionary, will be fine.

To index documents, install MTAS (https://github.com/mwasiluk/mtas) and Solr (https://lucene.apache.org/solr/), and provide SOLR_URL and SOLR_TIMEOUT values in the collector/settings.py file. The name of the Solr core for a given project should be the same as the project name in the database.
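
A minimal sketch of the two Solr settings follows, assuming a local Solr instance on its default port 8983 and a timeout expressed in seconds (this document does not state the unit). The core_url helper is hypothetical; it only illustrates the naming rule above, under the assumption that a core is addressed by appending its name to SOLR_URL.

```python
# Sketch of the Solr settings in collector/settings.py (placeholder values).
SOLR_URL = "http://localhost:8983/solr"  # 8983 is Solr's default port
SOLR_TIMEOUT = 10  # assumption: timeout in seconds


def core_url(project_name: str) -> str:
    """Hypothetical helper: URL of the core named after the project."""
    return f"{SOLR_URL.rstrip('/')}/{project_name}"
```

Under these assumptions, core_url("ppc") gives http://localhost:8983/solr/ppc, so a project stored as 'ppc' in the database needs a Solr core that is also named 'ppc'.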

Remember to re-activate

When running the project, remember to activate the marcell-tools environment:

On Linux/MacOS:

source activate marcell-tools

Windows:

activate marcell-tools

Verify and test

TBA