Training Parsers on Incompatible Treebanks
Richard Johansson
We consider the problem of training a statistical parser when
multiple treebanks are available but are annotated according to
different linguistic conventions.
To address this problem, we present two simple adaptation methods:
the first method is based on the idea of using a shared feature
representation when parsing multiple treebanks, and the second is
based on guided parsing, where the output of one parser provides
features for a second one.
To evaluate and analyze the adaptation methods, we train parsers
on treebank pairs in four languages: German, Swedish, Italian, and English.
We see significant improvements for all eight treebanks when training
on the full training sets. However, the clearest benefits are seen when we
consider smaller training sets. Our experiments were carried out with
unlabeled dependency parsers, but the methods can easily be
generalized to other feature-based parsers.
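
The two adaptation ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the feature-string format, and the choice to realize the shared representation by copying each feature into a shared and a treebank-specific version (a feature-augmentation scheme) are all assumptions made for illustration. The guided-parsing part likewise assumes that the guide parser's output is available as a list of head indices.

```python
from typing import Dict, List

Features = Dict[str, float]


def shared_features(base: Features, treebank: str) -> Features:
    """Hypothetical shared representation: every base feature is emitted
    twice, once under a prefix common to all treebanks and once under a
    prefix specific to the treebank the sentence comes from."""
    out: Features = {}
    for name, value in base.items():
        out[f"shared:{name}"] = value       # weight learned from both treebanks
        out[f"{treebank}:{name}"] = value   # weight learned from one treebank only
    return out


def guided_features(base: Features, head: int, dep: int,
                    guide_heads: List[int]) -> Features:
    """Hypothetical guided-parsing features for a candidate arc head -> dep.
    guide_heads[i] is the head that the guide parser (trained on the other
    treebank) assigned to token i; index 0 is a placeholder for the root."""
    out = dict(base)
    agrees = guide_heads[dep] == head
    out[f"guide_has_arc={agrees}"] = 1.0
    return out


# Toy arc features for "She reads books" (tokens numbered 1..3, root = 0).
base = {"head_pos=VERB|dep_pos=NOUN": 1.0}

augmented = shared_features(base, "tb1")
guided = guided_features(base, head=2, dep=3, guide_heads=[-1, 2, 0, 2])
```

In the first function a linear model can place weight on the shared copy where the two annotation schemes agree and on the treebank-specific copy where they differ. In the second, the first parser's prediction is only a feature, so the second parser can learn when to trust it and when to override it.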