
How to develop a new morphology
===============================

1. Get acquainted with XMOR
   - Read the SFST manual
   - Read the article on the German SMOR morphology
   - Look at the XMOR files and try to understand how XMOR works

2. Do the linguistic work, i.e.
   a) Define the set of word classes
   b) Define the inflectional classes
   c) Identify the derivational prefixes and their features
   d) Identify the derivational suffixes and their features
   e) Decide which features will be used in the morphological analyses

3. Modify symbols.fst to include your word classes, inflectional classes
   and morphosyntactic features

4. Write lexicon entries and store them in the file "lexicon".
   Add entries for derivational prefixes and suffixes.

   The Perl script morph-match.perl can be used to do the necessary
   alignment of lemma and surface form automatically. Try for instance:

   printf "mice\tmouse\tN base native NounPl\n" | ./morph-match.perl

5. Define the different inflectional paradigms in "inflection.fst".

6. Write the morpho-phonological rules which transform the morpheme
   sequences to the correct surface strings and store them in "phon.fst".

7. Compile and debug the morphology.


Debugging
=========

If the result is not what you expected, you can use the following
tricks:

Check whether the compiler produces warnings of the form "assignment
of empty transducer to:". If so, verify that the transducer generated
at this point is really meant to be empty. The respective source code
line is given at the beginning of the line with the compiler warning.
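
When the compiler output is long, these warnings can be filtered out
of a saved log. The following is only a minimal sketch and not part of
XMOR: the helper name empty_warnings and the file names in the usage
note are assumptions, and the helper merely searches for the warning
text quoted above.

```shell
# Hypothetical helper: print only the empty-transducer warnings
# found in a saved compiler log (the path is the first argument).
empty_warnings() {
    # -F treats the pattern as a fixed string, not a regular expression.
    grep -F 'assignment of empty transducer' "$1"
}
```

Assuming the grammar is stored in a file called morph.fst (a
hypothetical name), the helper could be used like this:
	fst-compiler morph.fst morph.a > compile.log 2>&1
	empty_warnings compile.log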

To check an intermediate result of the compilation that is stored in
the variable $X$, proceed as follows:

1. Insert the command
	$X$ >> "ValueofX.a"
   in the source code after the commands which compute $X$

2. Compile

3. Examine the file "ValueofX.a":
   You can print the transducer with the command
	fst-print ValueofX.a
   You can generate with the transducer using the command
	fst-generate ValueofX.a
   You can use the command
	fst-mor ValueofX.a
   to check interactively how the transducer maps the input strings.

4. Often it is also useful to reduce the lexicon to the smallest number
   of entries needed to test something. (But don't forget to make a
   backup of the full lexicon first!)
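
The backup-and-reduce step in item 4 can be sketched as a small shell
function. This is a sketch under assumptions, not part of XMOR: the
function name reduce_lexicon and the backup name "lexicon.full" are
made up for illustration, and the lexicon is assumed to hold one entry
per line.

```shell
# Hypothetical helper: keep only the lexicon entries matching a pattern.
# Assumes one entry per line; the backup is written to "<lexicon>.full".
reduce_lexicon() {
    pattern=$1
    lexicon=${2:-lexicon}
    backup="$lexicon.full"
    # Create the backup only once, so that repeated calls never
    # overwrite the full lexicon with an already reduced one.
    [ -e "$backup" ] || cp "$lexicon" "$backup" || return 1
    # Always filter from the backup, not from the current reduced file.
    # grep exits with status 1 if nothing matches; the lexicon is then empty.
    grep -e "$pattern" "$backup" > "$lexicon"
}
```

For example, "reduce_lexicon mouse" keeps only the entries containing
"mouse". The full lexicon can be restored afterwards with
	cp lexicon.full lexicon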