Simultaneous Word-Morpheme Alignment for Statistical Machine Translation

Elif Eyigoz, Daniel Gildea and Kemal Oflazer

Current word alignment models for statistical machine translation do not address morphology beyond merely splitting words. We present a two-level alignment model that distinguishes between words and morphemes, in which we embed an IBM Model 1 inside an HMM-based word alignment model. The model jointly induces word and morpheme alignments using an EM algorithm. We evaluated our model on Turkish-English parallel data and obtained a significant improvement in BLEU scores over IBM Model 4. Our results indicate that utilizing information from morphology improves the quality of word alignments.
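
For readers unfamiliar with the building blocks named above, the following is a minimal sketch of EM training for a plain IBM Model 1 lexical translation table. It is not the two-level word-morpheme model of this paper: the HMM component and the joint word-morpheme objective are omitted, and the function name `train_ibm_model1`, the `<NULL>` token, and the toy input format are assumptions made for illustration only. The tokens could be words or morphemes, depending on how the input is segmented.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate t[(f, e)] = p(f | e) with EM for a standard IBM Model 1.

    pairs: list of (source_tokens, target_tokens) tuples (hypothetical format).
    """
    src_vocab = {f for fs, _ in pairs for f in fs}
    # Start from a uniform translation table over the source vocabulary.
    t = defaultdict(lambda: 1.0 / len(src_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts for (f, e) pairs
        total = defaultdict(float)  # expected counts for each e

        # E-step: distribute each source token over the target tokens
        # (plus an empty NULL token) in proportion to the current t.
        for fs, es in pairs:
            es_null = ["<NULL>"] + list(es)
            for f in fs:
                z = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c

        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t
```

Called on a list of tokenized sentence pairs, the function returns a table whose entries for each target token sum to one over the source tokens it co-occurred with.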
