Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
Thomas Schoenemann
The 51st Annual Meeting of the Association for Computational Linguistics (ACL 2013)
Sofia, Bulgaria, August 4-9, 2013
Abstract
We derive variants of the fertility-based models IBM-3 and IBM-4 that, while maintaining their zero-order and first-order parameters, are nondeficient. We then derive a method to compute a likely alignment and its neighbors, as well as a solution for EM training. The arising M-step energies are non-trivial and are handled via projected gradient ascent (sketched below).
Our evaluation on gold alignments shows substantial improvements (in weighted F-measure) for the IBM-3. For the IBM-4, there are no consistent improvements. Training the nondeficient IBM-5 in the regular way gives surprisingly good results.
Using the resulting alignments for phrase-based translation systems offers no clear insights w.r.t. BLEU scores.
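Because the M-step energies are non-trivial, they are maximized numerically via projected gradient ascent. The following is a minimal, generic sketch of that technique over a probability simplex, written in Python; it is not the paper's implementation, and the function names, fixed step size, and iteration count are illustrative assumptions.

import numpy as np

def project_to_simplex(v):
    # Euclidean projection of v onto the probability simplex
    # (sort-based algorithm, cf. Duchi et al., 2008).
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def projected_gradient_ascent(energy_grad, p0, step=0.01, iters=200):
    # Maximize an M-step energy over a distribution p: take a gradient
    # step, then project back onto the simplex. A fixed step size is
    # used here for brevity; a backtracking line search (e.g. an
    # Armijo rule) would be the more robust choice in practice.
    p = p0.copy()
    for _ in range(iters):
        p = project_to_simplex(p + step * energy_grad(p))
    return p

This is meant only as a generic illustration of the optimization technique named in the abstract; the actual energies and parameter groupings are defined in the paper.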