Improving Syntax-Augmented Machine Translation by Coarsening the Label Set

Greg Hanneman and Alon Lavie

We present a new variant of the Syntax-Augmented Machine Translation (SAMT) formalism with a category-coarsening algorithm originally developed for tree-to-tree grammars. We first induce bilingual labels in the SAMT grammar, use them for category coarsening, and then project back to monolingual labeling as in standard SAMT. The result is a "collapsed" grammar with the same expressive power and format as the original, but with far fewer nonterminal labels. We show that the smaller label set improves translation scores by 1.14 BLEU on two Chinese–English test sets while reducing the occurrence of the sparsity and ambiguity problems common to large label sets.
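The abstract only summarizes the coarsening step. As a rough illustration of how such a step might work, the sketch below greedily merges the pair of labels whose co-occurrence distributions (over labels on the other language side) are most similar, until a target label-set size is reached. This is a minimal sketch assuming a greedy pairwise-merging strategy in the spirit of the tree-to-tree coarsening work the abstract refers to; the function names (`coarsen_labels`, `l1_distance`), the L1 similarity measure, the one-sided view of the bilingual labels, and the stopping criterion are illustrative assumptions, not the paper's exact algorithm.

```python
from collections import Counter, defaultdict

def l1_distance(dist_a, dist_b):
    """L1 distance between two probability distributions (dicts label -> prob)."""
    keys = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)

def coarsen_labels(bilingual_labels, target_size):
    """Greedily collapse source-side labels until only `target_size` remain.

    `bilingual_labels` is a list of (source_label, target_label) pairs, one per
    nonterminal instance in the grammar. Labels whose distributions over the
    other side's labels are most similar (smallest L1 distance) are merged first.
    (Hypothetical sketch; not the paper's exact procedure.)
    """
    # Co-occurrence counts of each source-side label with target-side labels.
    cooc = defaultdict(Counter)
    for src, tgt in bilingual_labels:
        cooc[src][tgt] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {k: v / total for k, v in counter.items()}

    labels = set(cooc)
    while len(labels) > target_size:
        dists = {lab: normalize(cooc[lab]) for lab in labels}
        # Find the pair of labels with the closest co-occurrence distributions.
        best_pair, best_d = None, float("inf")
        labs = sorted(labels)
        for i, a in enumerate(labs):
            for b in labs[i + 1:]:
                d = l1_distance(dists[a], dists[b])
                if d < best_d:
                    best_pair, best_d = (a, b), d
        a, b = best_pair
        merged = f"{a}+{b}"                      # name of the collapsed label
        cooc[merged] = cooc.pop(a) + cooc.pop(b)  # pool their co-occurrence counts
        labels.remove(a)
        labels.remove(b)
        labels.add(merged)
    return labels

# Example: merging English-side labels by their Chinese-side co-occurrences
# (toy data; real SAMT grammars would supply these label pairs).
if __name__ == "__main__":
    pairs = [("NP", "NP"), ("NP", "NP"), ("NN", "NP"), ("NN", "NN"),
             ("VP", "VP"), ("VBZ", "VP"), ("VBZ", "VV"), ("VP", "VV")]
    print(coarsen_labels(pairs, target_size=2))
```

After coarsening, each collapsed label would be projected back to a single monolingual label as described in the abstract, so the grammar keeps the standard SAMT format.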
