CHANGES 6.28 KB

Edit Raw Blame History

-----------------------------------------------------------------------
v0.5.1

- Issue 10 - loadModel() method from DepdendencyParser should also be
  able to receive an InputStream
- Issue 9 - Add a method to DependencyParser which return the Parse
  Trees
- Issue 7 - Update source folder at ant script
- Issue 6 - Change visibility of some methods and attributes to
  facilitate wrapping
- Issue 2 - Convert project to maven

-----------------------------------------------------------------------
v0.5.0

  UNKNOWN

-----------------------------------------------------------------------
v0.4.3b

- Fixed bug: DependencyInstance serialization was not handling the
  feats. This caused errors when using the non-projective decoder with
  second order. (JMB 4-APR-07)

-----------------------------------------------------------------------
v0.4.3

- Forest files are created in the tmp directory. Without this, two
  instances of MSTParser being run on the same data set would
  overwrite each other's feature forest files. Also, the forest files
  created in tmp are deleted when the Java VM exits. (JMB, 21-JAN-07).

- Separated out the standard sentential parsing features from extra
  features used for discourse parsing. (JMB, 23-MAR-07)

- Created ParserOptions so that it is easier to pass various options
  between the parser and the pipes. (JMB, 23-MAR-07)

- Fixed bug in serialization of DependencyInstances -- lemmas were not
  being written out, and this caused the 2nd order stuff to
  crash. (JMB 23-MAR-07)


-----------------------------------------------------------------------
v0.4.2

- Results have improved slightly over previous testbed results. This
  may be due to the fact that FeatureVector.dotProduct would have got
  -1 return values on keys not held in the TIntDoubleHashMap for the
  second vector in the previous version of Trove. Now that Trove
  returns 0, this is actually the right behavior in this case. Another
  possible explanation is that there is some minor change in the
  features which are generated. Since the output has changed so
  little, and for the better, I'll leave it at that for now. The
  testbed results and output have been updated to reflect the current
  version. (JMB, 17-JAN-07)

- Uncommented a line in DependencyPipe that removed some features from
  the parsing models in the previous release. (Need to come up with a
  better way of defining different pipes!) (JMB, 17-JAN-07)

- Changed the FeatureVector implementation to be a TLinkedList of
  Feature objects, with two optional sub-FeatureVectors contained
  within. This supports fast concatenation of two FeatureVectors since
  it is no longer necessary to copy entire lists. Also, rather than
  explicitly negating features for the getDistVector() method, a
  boolean value is set that can optionally indicate the second
  sub-FeatureVector as negated. The logic of the other methods then
  preserves the negation (and negation with negation). Again, this
  means we don't have to make copies for this operation. These changes
  led sped up training by a factor of 2 to 4 (depedending on the
  number of features used in the parsing model) and parsing by up to
  1.5 times. (JMB, 17-JAN-07)

- Updated to Trove v1.1b5. Changed default return value of
  TObjectIntHashMap to be -1 rather than 0, so it is important to use
  the included trove.jar rather than downloading and using one from
  the Trove project. (Note: I tried to update to v2.0a2, but the test
  suites broke with that version. Attempts to sort out the problem
  were unsuccessful, so V1.1b5 will just have to do for now.) (JMB,
  16-JAN-07)

- Removed addIfNotPresent boolean from lookupIndex in Alphabet since
  it isn't used in MSTParser and it incurs an extra method call and
  boolean check on a very common method. (JMB, 16-JAN-07)

- Added support for relational features, which hold between two
  utterances. These features are defined as an NxN matrix (N=number of
  parsing units) below the main CoNLL format declarations. This is
  mainly introduced for discourse parsing to allow for features like
  whether two parsing units are in the same sentence or paragraph, or
  if they both contain references to the same entity. It can be
  ignored for sentence parsing -- everything continues to work as
  before. (The distance between two units is an example of such a
  feature in sentence parsing, but this can be computed on the fly, so
  it isn't necessary to use such a matrix.) (JMB, 14-JAN-07)


-----------------------------------------------------------------------
v0.4.0

- Cleaned up Pipes considerably; eg, Pipe2O doesn't replicate so much
  code from Pipe. Many of the createFeatureVector methods were renamed
  to things like addCoreFeatures. (JMB)

- If one uses MST format, the creation of posA and the
  5-character-substring features now are put into dependency instances
  in MSTReader as the course pos tags and lemmas, respectively. Then
  in the feature extraction code, rather than creating posA etc on the
  fly, it just references those fields in the dependency
  instance. That way, if you use conll format, you get to use lemma
  and course tag values supplied by the annotations. (JMB)

- Can utilize the FEAT1|FEAT2|...|FEATN field of the CONLL format to
  allow abitrary features. See addCoreFeatures() in the DependencyPipe
  class. (JMB)

-----------------------------------------------------------------------
v0.2.2

- MSTParser now works with both MST and CONLL formats.  Pipes are now
  passed a parameter for which format they use, and they call upon
  Readers and Writers that know how to handle each format. CONLL is
  the default format. (JMB)

- Added a subset of the Portuguese data from CONLL to test the CONLL
  format and to have another data set for the testbed. See TESTBED
  (JMB)

- Included an Ant build system that does some nice things, but which
  can be ignored if make is preferred. Highlights of the additional
  capabilities are: (1) class files are put in a location
  (./output/classes) separate from the .java files; (2) you can get
  javadocs (./doc/api) by running "sh build.sh javadoc"; (3) you can
  make a release with "sh build.sh release" You don't need to install
  anything extra ( ant.jar in in ./lib); the only additional steps
  needed to use the Ant build setup is to set the JAVA_HOME and
  MSTPARSER_DIR environment variables appropriately. (JMB)