A Systematic Bayesian Treatment of the IBM Alignment Models
Yarin Gal and Phil Blunsom
The dominant yet ageing IBM and HMM
word alignment models underpin most
popular Statistical Machine Translation
implementations in use today. Though
beset by the limitations of implausible
independence assumptions, intractable
optimisation problems, and an excess of
tunable parameters, these models provide
a scalable and reliable starting point for
inducing translation systems. In this paper we
build upon this venerable base by recasting
these models in the non-parametric Bayesian
framework. By replacing the categorical
distributions at their core with hierarchical
Pitman-Yor processes, and through the use
of collapsed Gibbs sampling, we provide a
more flexible formulation and sidestep the
original heuristic optimisation techniques.
The resulting models are highly extendible,
naturally permitting the introduction of
phrasal dependencies. We present extensive
experimental results showing improvements
in both AER and BLEU when benchmarked
against GIZA++, including significant
improvements over IBM Model 4.
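
As an illustration of the idea of replacing a categorical distribution with a Pitman-Yor process, the following is a minimal sketch of a single-level Pitman-Yor "Chinese restaurant" with collapsed add/remove operations, the building block a collapsed Gibbs sampler would use when resampling an alignment. It is not the authors' implementation: the class name `PitmanYorRestaurant`, its method names, and the uniform base distribution in the usage lines are assumptions made for illustration only. A hierarchical model would additionally use a parent restaurant's `prob` as `base_prob` and forward a customer to the parent whenever a new table opens, which this sketch omits.

```python
import random
from collections import defaultdict


class PitmanYorRestaurant:
    """Single-level Pitman-Yor process in its Chinese restaurant form (illustrative)."""

    def __init__(self, discount, strength, base_prob, rng=None):
        # Assumed parameter ranges: 0 <= discount < 1 and strength > 0.
        assert 0.0 <= discount < 1.0 and strength > 0.0
        self.d = discount
        self.theta = strength
        self.base_prob = base_prob        # callable: dish -> base probability
        self.tables = defaultdict(list)   # dish -> customer count per table
        self.n_customers = 0
        self.n_tables = 0
        self.rng = rng or random.Random(0)

    def prob(self, dish):
        """Predictive probability of `dish` given the current seating."""
        counts = self.tables.get(dish, [])
        existing = sum(counts) - self.d * len(counts)
        new = (self.theta + self.d * self.n_tables) * self.base_prob(dish)
        return (existing + new) / (self.theta + self.n_customers)

    def add(self, dish):
        """Seat one customer eating `dish`, possibly opening a new table."""
        counts = self.tables[dish]
        weights = [c - self.d for c in counts]
        weights.append((self.theta + self.d * self.n_tables) * self.base_prob(dish))
        r = self.rng.random() * sum(weights)
        chosen = len(weights) - 1
        for i, w in enumerate(weights):
            r -= w
            if r < 0:
                chosen = i
                break
        if chosen == len(counts):         # last slot means "open a new table"
            counts.append(1)
            self.n_tables += 1
        else:
            counts[chosen] += 1
        self.n_customers += 1

    def remove(self, dish):
        """Remove one customer eating `dish`, chosen uniformly at random."""
        counts = self.tables.get(dish, [])
        if not counts:
            raise ValueError("no customer is eating this dish")
        r = self.rng.random() * sum(counts)
        chosen = len(counts) - 1
        for i, c in enumerate(counts):
            r -= c
            if r < 0:
                chosen = i
                break
        counts[chosen] -= 1
        if counts[chosen] == 0:           # an emptied table is closed
            del counts[chosen]
            self.n_tables -= 1
        self.n_customers -= 1


# Usage: a toy four-word target vocabulary with a uniform base distribution.
restaurant = PitmanYorRestaurant(0.5, 1.0, lambda word: 0.25, random.Random(1))
for word in ["a", "a", "b", "a", "c"]:
    restaurant.add(word)
total = sum(restaurant.prob(word) for word in ["a", "b", "c", "d"])
```

In a collapsed Gibbs sweep, one would typically `remove` the counts tied to the current alignment of a word, score each candidate alignment with `prob`, sample one, and `add` the corresponding counts back. The predictive probabilities over the vocabulary sum to one, so `total` above equals 1 up to floating-point error.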