CHANGES
6.28 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
-----------------------------------------------------------------------
v0.5.1
- Issue 10 - loadModel() method from DepdendencyParser should also be
able to receive an InputStream
- Issue 9 - Add a method to DependencyParser which return the Parse
Trees
- Issue 7 - Update source folder at ant script
- Issue 6 - Change visibility of some methods and attributes to
facilitate wrapping
- Issue 2 - Convert project to maven
-----------------------------------------------------------------------
v0.5.0
UNKNOWN
-----------------------------------------------------------------------
v0.4.3b
- Fixed bug: DependencyInstance serialization was not handling the
feats. This caused errors when using the non-projective decoder with
second order. (JMB 4-APR-07)
-----------------------------------------------------------------------
v0.4.3
- Forest files are created in the tmp directory. Without this, two
instances of MSTParser being run on the same data set would
overwrite each other's feature forest files. Also, the forest files
created in tmp are deleted when the Java VM exits. (JMB, 21-JAN-07).
- Separated out the standard sentential parsing features from extra
features used for discourse parsing. (JMB, 23-MAR-07)
- Created ParserOptions so that it is easier to pass various options
between the parser and the pipes. (JMB, 23-MAR-07)
- Fixed bug in serialization of DependencyInstances -- lemmas were not
being written out, and this caused the 2nd order stuff to
crash. (JMB 23-MAR-07)
-----------------------------------------------------------------------
v0.4.2
- Results have improved slightly over previous testbed results. This
may be due to the fact that FeatureVector.dotProduct would have got
-1 return values on keys not held in the TIntDoubleHashMap for the
second vector in the previous version of Trove. Now that Trove
returns 0, this is actually the right behavior in this case. Another
possible explanation is that there is some minor change in the
features which are generated. Since the output has changed so
little, and for the better, I'll leave it at that for now. The
testbed results and output have been updated to reflect the current
version. (JMB, 17-JAN-07)
- Uncommented a line in DependencyPipe that removed some features from
the parsing models in the previous release. (Need to come up with a
better way of defining different pipes!) (JMB, 17-JAN-07)
- Changed the FeatureVector implementation to be a TLinkedList of
Feature objects, with two optional sub-FeatureVectors contained
within. This supports fast concatenation of two FeatureVectors since
it is no longer necessary to copy entire lists. Also, rather than
explicitly negating features for the getDistVector() method, a
boolean value is set that can optionally indicate the second
sub-FeatureVector as negated. The logic of the other methods then
preserves the negation (and negation with negation). Again, this
means we don't have to make copies for this operation. These changes
led sped up training by a factor of 2 to 4 (depedending on the
number of features used in the parsing model) and parsing by up to
1.5 times. (JMB, 17-JAN-07)
- Updated to Trove v1.1b5. Changed default return value of
TObjectIntHashMap to be -1 rather than 0, so it is important to use
the included trove.jar rather than downloading and using one from
the Trove project. (Note: I tried to update to v2.0a2, but the test
suites broke with that version. Attempts to sort out the problem
were unsuccessful, so V1.1b5 will just have to do for now.) (JMB,
16-JAN-07)
- Removed addIfNotPresent boolean from lookupIndex in Alphabet since
it isn't used in MSTParser and it incurs an extra method call and
boolean check on a very common method. (JMB, 16-JAN-07)
- Added support for relational features, which hold between two
utterances. These features are defined as an NxN matrix (N=number of
parsing units) below the main CoNLL format declarations. This is
mainly introduced for discourse parsing to allow for features like
whether two parsing units are in the same sentence or paragraph, or
if they both contain references to the same entity. It can be
ignored for sentence parsing -- everything continues to work as
before. (The distance between two units is an example of such a
feature in sentence parsing, but this can be computed on the fly, so
it isn't necessary to use such a matrix.) (JMB, 14-JAN-07)
-----------------------------------------------------------------------
v0.4.0
- Cleaned up Pipes considerably; eg, Pipe2O doesn't replicate so much
code from Pipe. Many of the createFeatureVector methods were renamed
to things like addCoreFeatures. (JMB)
- If one uses MST format, the creation of posA and the
5-character-substring features now are put into dependency instances
in MSTReader as the course pos tags and lemmas, respectively. Then
in the feature extraction code, rather than creating posA etc on the
fly, it just references those fields in the dependency
instance. That way, if you use conll format, you get to use lemma
and course tag values supplied by the annotations. (JMB)
- Can utilize the FEAT1|FEAT2|...|FEATN field of the CONLL format to
allow abitrary features. See addCoreFeatures() in the DependencyPipe
class. (JMB)
-----------------------------------------------------------------------
v0.2.2
- MSTParser now works with both MST and CONLL formats. Pipes are now
passed a parameter for which format they use, and they call upon
Readers and Writers that know how to handle each format. CONLL is
the default format. (JMB)
- Added a subset of the Portuguese data from CONLL to test the CONLL
format and to have another data set for the testbed. See TESTBED
(JMB)
- Included an Ant build system that does some nice things, but which
can be ignored if make is preferred. Highlights of the additional
capabilities are: (1) class files are put in a location
(./output/classes) separate from the .java files; (2) you can get
javadocs (./doc/api) by running "sh build.sh javadoc"; (3) you can
make a release with "sh build.sh release" You don't need to install
anything extra ( ant.jar in in ./lib); the only additional steps
needed to use the Ant build setup is to set the JAVA_HOME and
MSTPARSER_DIR environment variables appropriately. (JMB)