Large-Scale Discriminative Training for Statistical Machine Translation Using Held-Out Line Search
Jeffrey Flanigan, Chris Dyer and Jaime Carbonell
We introduce a new large-scale discriminative learning algorithm for machine
translation that is capable of learning parameters in models with extremely
sparse features. To ensure reliable parameter estimation and to prevent
overfitting, we use a two-phase learning algorithm. First, the contributions of
individual sparse features are estimated using large amounts of parallel data.
Second, a small development corpus is used to determine the relative
contributions of the sparse features and the standard dense features. Not only
does this two-phase approach prevent overfitting, but the second phase also
directly optimizes the corpus-level BLEU of the decoder's Viterbi translations. We demonstrate
significant improvements using sparse rule indicator features in three
different translation tasks. To our knowledge, this is the first large-scale
discriminative training algorithm capable of showing improvements over the MERT
baseline with only rule indicator features in addition to the standard MERT
features.
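As a rough illustration of the two-phase idea only (not the paper's actual algorithm), the sketch below uses a perceptron-style pass to estimate weights for many sparse features, then a simple line search that picks a single mixing weight by maximizing a held-out objective. All data, the update rule, and the dev objective here are invented for illustration.

```python
# Toy sketch of two-phase training: phase 1 estimates weights for many
# sparse features on "large" data; phase 2 line-searches one mixing
# weight against a held-out objective (standing in for corpus BLEU).
# Everything here is a hypothetical stand-in, not the paper's method.

def phase1_estimate_sparse(examples, n_feats, lr=0.1, epochs=5):
    """Perceptron-style estimation of per-feature sparse weights.

    `examples` is a list of (active_feature_ids, label) pairs with
    labels in {+1, -1}.
    """
    w = [0.0] * n_feats
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(w[i] for i in feats)
            pred = 1 if score >= 0 else -1
            if pred != label:  # mistake-driven update
                for i in feats:
                    w[i] += lr * label
    return w

def phase2_line_search(candidates, dev_objective):
    """Pick the mixing weight that maximizes a held-out objective."""
    return max(candidates, key=dev_objective)
```

For example, `phase2_line_search([i / 10 for i in range(11)], lambda a: -(a - 0.3) ** 2)` scans eleven candidate mixing weights and returns the one where the (here, synthetic) held-out objective peaks.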