Improving Syntax-Augmented Machine Translation by Coarsening the Label Set
Greg Hanneman and Alon Lavie
We present a new variant of the Syntax-Augmented Machine Translation (SAMT) formalism with a category-coarsening algorithm originally developed for tree-to-tree grammars. We induce bilingual labels into the SAMT grammar, use them for category coarsening, then project back to monolingual labeling as in standard SAMT. The result is a "collapsed" grammar with the same expressive power and format as the original, but with many fewer nonterminal labels. We show that the smaller label set improves translation scores by 1.14 BLEU on two Chinese–English test sets while reducing the sparsity and ambiguity problems common to large label sets.
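At a high level, the coarsening step clusters nonterminal labels by how similarly they behave with respect to the other language side, then collapses the most similar labels together. The short Python sketch below is only an illustration of that general idea: the function names and input format are hypothetical, and the greedy L1-distance merging criterion is a stand-in, not the paper's actual coarsening objective.

    from collections import Counter
    from itertools import combinations

    def cooccurrence_distributions(labeled_pairs):
        """For each target-side label, compute its distribution over
        co-occurring source-side labels.  Input (hypothetical format):
        an iterable of (source_label, target_label) pairs, one per
        grammar rule occurrence."""
        counts = {}
        for src, tgt in labeled_pairs:
            counts.setdefault(tgt, Counter())[src] += 1
        dists = {}
        for tgt, ctr in counts.items():
            total = sum(ctr.values())
            dists[tgt] = {src: n / total for src, n in ctr.items()}
        return dists

    def l1_distance(p, q):
        """L1 distance between two sparse distributions."""
        keys = set(p) | set(q)
        return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

    def coarsen_labels(labeled_pairs, target_size):
        """Greedily merge the two most similar labels until only
        target_size labels remain.  Returns a map from each original
        label to its collapsed label."""
        dists = cooccurrence_distributions(labeled_pairs)
        mapping = {label: label for label in dists}
        while len(dists) > target_size:
            # Find the pair of current labels with the most similar
            # co-occurrence distributions.
            a, b = min(combinations(dists, 2),
                       key=lambda pair: l1_distance(dists[pair[0]],
                                                    dists[pair[1]]))
            merged = a + "+" + b
            # Average the two distributions for the merged label
            # (a simplification for the sketch).
            keys = set(dists[a]) | set(dists[b])
            dists[merged] = {k: 0.5 * (dists[a].get(k, 0.0)
                                       + dists[b].get(k, 0.0))
                             for k in keys}
            del dists[a], dists[b]
            for label in mapping:
                if mapping[label] in (a, b):
                    mapping[label] = merged
        return mapping

    # Hypothetical usage: collapse four labels down to two.
    rules = [("NP", "NP"), ("NP", "NP+VB"), ("VP", "VP"), ("VP", "S/NP")]
    label_map = coarsen_labels(rules, target_size=2)

The resulting mapping could then be applied to relabel the grammar's nonterminals, giving a collapsed grammar in the same format but with the smaller label set described in the abstract.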