ENIAMsubsyntax Version 1.1:
---------------------------

ENIAMsubsyntax is a library that
- performs tokenization, lemmatization, and part-of-speech tagging;
- detects multiword expressions (MWEs) and abbreviations;
- recognizes named entities;
- splits text into sentences.

Install
-------

ENIAMsubsyntax requires the OCaml compiler, version 4.02.3,
together with the Xlib library version 3.2 or later,
the ENIAMtokenizer library version 1.1, and the ENIAMmorphology library version 1.1.

To install, type:

make install

By default, ENIAMsubsyntax is installed in the eniam subdirectory of the
directory reported by 'ocamlc -where'.
You can change this location by editing the makefile.

To test the library, type:

make test
./test

To compile a command-line interface to the library, type:

make interface

Running ./interface --help prints information on the command-line options.

Both test and interface require Graphviz to be installed.

By default, ENIAMsubsyntax looks for resources in the /usr/share/eniam directory.
This behaviour can be changed by setting and exporting the ENIAM_RESOURCE_PATH
environment variable.
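
The lookup rule described above can be sketched in shell as follows. This is
only an illustration of the documented behaviour, not the library's own code:
the function name eniam_resource_dir and the example path are hypothetical,
and treating an empty variable the same as an unset one is an assumption.

```shell
# Sketch only: mirrors the lookup rule documented in this README,
# not the library's actual implementation.
# Default location, as stated in the README.
DEFAULT_RESOURCE_PATH=/usr/share/eniam

# Print the directory that would be searched for resources:
# ENIAM_RESOURCE_PATH if it is set and non-empty (assumption),
# otherwise the default location.
eniam_resource_dir() {
  printf '%s\n' "${ENIAM_RESOURCE_PATH:-$DEFAULT_RESOURCE_PATH}"
}

# To point the library at another location, set and export the variable
# before running ./test or ./interface (hypothetical path):
#   export ENIAM_RESOURCE_PATH="$HOME/eniam-resources"
eniam_resource_dir
```

The variable has to be exported, not merely set, so that programs started
from the shell inherit it.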

Credits
-------
Copyright © 2016 Wojciech Jaworski <wjaworski atSPAMfree mimuw dot edu dot pl>
Copyright © 2016 Institute of Computer Science Polish Academy of Sciences

The library uses the following licensed resources:

NKJP1M: the manually annotated 1-million-word subcorpus sampled
from texts of a subset of the National Corpus of Polish, version 1.2

SGJP: Grammatical Dictionary of Polish, version 20151020
Copyright © 2007–2015 Zygmunt Saloni, Włodzimierz Gruszczyński, Marcin
Woliński, Robert Wołosz, Danuta Skowrońska

Licence
-------

This library is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.