.TH fst-infl 1 "November 2004" "" "fst-infl"
.SH NAME
fst-infl, fst-infl2, fst-infl3 \- morphological analysers
.SH SYNOPSIS
.B fst-infl [ options ]
.I file
[
.I input-file
[
.I output-file
]
]
.br
.B fst-infl2 [ options ]
.I file
[
.I input-file
[
.I output-file
]
]
.br
.B fst-infl3 [ options ]
.I file
[
.I input-file
[
.I output-file
]
]
.SH OPTIONS
.TP
.B \-t file
Read an alternative transducer from
.I file
and use it if the main transducer fails to find an analysis. By
iterating this option, a cascade of transducers may be tried to find
an analysis.
.TP
.B \-b
Print surface and analysis symbols. (fst-infl2 only)
.TP
.B \-n
Print multi-character symbols without the enclosing angle brackets.
(fst-infl only)
.TP
.B \-d
The analyses are symbolically disambiguated by returning only analyses
with a minimal number of morphemes. This option requires that morpheme
boundaries are marked with the tag <X>. If no <X> tag is found in the
analysis string, then the program (basically) counts the number of
multi-character symbols consisting entirely of upper-case characters
and uses this count for disambiguation. The latter heuristic was
developed for the German SMOR morphology. (This option is only
available with fst-infl2 and fst-infl3.)
.TP
.B \-e n
If no regular analysis is found, do robust matching and print analyses
with up to
.I n
edit errors. The set of edit operations currently includes
replacement, insertion and deletion. Each operation currently has a
fixed error weight of 1. (fst-infl2 only)
.TP
.B \-% f
Disambiguates the analyses statistically and prints the most likely
analyses with at least f % of the total probability mass of the
analyses. The transducer weights are read from a file obtained by
appending
.I .prob
to the name of the transducer file. The weight files are created with
.IR fst-train .
(fst-infl2 only)
.TP
.B \-p
Print the probability of each analysis. (fst-infl2 only)
.TP
.B \-c
Use this option if the transducer was compiled on a computer with a
different endianness. For example, if you have a transducer which was
compiled on a Sparc computer and you want to use it on a Pentium, you
need this option. (fst-infl2 only)
.TP
.B \-q
Suppress status messages.
.TP
.B \-h
Print usage information.
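.PP
The disambiguation heuristic described for
.B \-d
can be sketched roughly as follows. This is only an illustration in
Python, not the code used by fst-infl2 or fst-infl3. The function names
and the sample analysis strings are invented for the example, and the
exact counting rule of the real program may differ.
.PP
.nf
```python
import re

# Multi-character symbols are assumed to be written in angle brackets,
# for example <X> or <NN>.
SYMBOL = re.compile(r"<([^<>]+)>")


def morpheme_score(analysis):
    """Return the count used to compare analyses (hypothetical helper)."""
    symbols = SYMBOL.findall(analysis)
    boundaries = symbols.count("X")
    if boundaries > 0:
        # Morpheme boundaries are marked explicitly with <X>.
        return boundaries
    # Fallback: count symbols consisting entirely of upper-case letters.
    return sum(1 for s in symbols if s.isalpha() and s.isupper())


def disambiguate(analyses):
    """Keep only the analyses with the minimal score."""
    if not analyses:
        return []
    scores = [morpheme_score(a) for a in analyses]
    best = min(scores)
    return [a for a, s in zip(analyses, scores) if s == best]


# Invented sample strings, not real transducer output.
print(disambiguate(["a<X>b<X>c", "ab<X>c"]))
```
.fi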
.SH DESCRIPTION
.I fst-infl
is a morphological analyser. The first argument is the name of a
transducer file which was generated by
.IR fst-compiler .
The second argument is the name of the input file. The third argument
is the name of the output file. If the third argument is missing,
output is written to
.IR stdout .
If the second argument is missing as well, input is read from
.IR stdin .
.PP
.I fst-infl2
is similar to
.I fst-infl
but needs a transducer in compact format (see the man pages for
.I fst-compiler
and
.IR fst-compact ).
.I fst-infl2
is implemented differently from
.I fst-infl
and is usually much faster.
.PP
.I fst-infl3
is also similar to
.I fst-infl
but needs a transducer in lowmem format (see the man pages for
.I fst-compiler
and
.IR fst-lowmem ).
.I fst-infl3
accesses the transducer on disc rather than reading it into memory. It
starts very fast and needs very little memory, but is slower than
.IR fst-infl2 .
.PP
.I fst-infl
reads the transducer which is stored in the argument file. Then it
reads the input file line by line. Each line is analysed with the
transducer and all resulting analyses are printed (see also the man
page for
.IR fst-mor ).
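.PP
The processing loop, combined with the fallback cascade set up by
repeated
.B \-t
options, can be pictured with the following Python sketch. It is an
illustration under stated assumptions, not the actual implementation:
each transducer is modelled as a hypothetical function that maps a word
to a list of analyses, the input lines are assumed to be already split
into words, and the sample data is invented.
.PP
.nf
```python
def analyse_words(words, transducers):
    """Analyse each word with the first transducer that succeeds.

    transducers is a list of callables, main transducer first; each
    returns a (possibly empty) list of analyses for a word.
    """
    results = []
    for word in words:
        analyses = []
        for lookup in transducers:
            analyses = lookup(word)
            if analyses:
                break  # later transducers are tried only on failure
        results.append((word, analyses))
    return results


# Invented toy data standing in for a main and an alternative transducer.
main = {"cats": ["cat<N><pl>"]}
backup = {"dogz": ["dog<N><pl>"]}
cascade = [lambda w: main.get(w, []), lambda w: backup.get(w, [])]

for word, analyses in analyse_words(["cats", "dogz", "xyz"], cascade):
    print(word, analyses)
```
.fi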
.SH BUGS
No bugs are known so far.
.SH "SEE ALSO"
fst-compiler, fst-compact, fst-lowmem, fst-train, fst-mor
.SH AUTHOR
Helmut Schmid,
Institute for Computational Linguistics,
University of Stuttgart,
Email: schmid@ims.uni-stuttgart.de.
This software is available under the GNU General Public License.