Supervised Bilingual Lexicon Induction with Multiple Monolingual Signals
Ann Irvine and Chris Callison-Burch
Prior research into learning translations from source and target language
monolingual texts has treated the task as an unsupervised learning problem.
Although many techniques take advantage of a seed bilingual lexicon, this work
is the first to use that data for supervised learning to combine a diverse set
of signals derived from a pair of monolingual corpora into a single
discriminative model. Even in a low-resource machine translation setting, where
induced translations have the potential to improve performance substantially,
it is reasonable to assume access to some amount of data to perform this kind
of optimization. Our work shows that only a few hundred translation pairs are
needed to achieve strong performance on the bilingual lexicon induction task,
and our approach yields an average relative gain in accuracy of nearly 50% over
an unsupervised baseline. Large gains in accuracy hold for all 22 languages
(both low- and high-resource) that we investigate.
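
As a rough illustration of the idea described above, the following minimal sketch trains a small logistic regression classifier that combines several monolingual similarity signals into a single score, then ranks candidate translations by that score. Everything specific here is an assumption made for illustration and is not taken from the paper: the three feature columns (labelled as contextual, temporal, and orthographic similarity), the toy feature values, the candidate names, the hyperparameters, and the choice of plain stochastic gradient descent. The abstract itself states only that a seed lexicon supervises a discriminative model over signals derived from monolingual corpora.

```python
import math

# Hypothetical toy data: each row holds similarity scores for one
# (source word, candidate translation) pair. The three columns are
# illustrative placeholders, e.g. contextual, temporal, orthographic.
TRAIN_X = [
    [0.9, 0.8, 0.7],  # pairs found in the seed lexicon (label 1)
    [0.8, 0.6, 0.9],
    [0.7, 0.9, 0.6],
    [0.2, 0.1, 0.3],  # randomly paired words (label 0)
    [0.1, 0.3, 0.2],
    [0.3, 0.2, 0.1],
]
TRAIN_Y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, features):
    """Probability that a candidate pair is a correct translation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(examples, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = score(weights, bias, x) - y  # gradient of the log loss
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def rank_candidates(weights, bias, candidates):
    """Sort candidate names from most to least likely translation."""
    return sorted(
        candidates,
        key=lambda name: score(weights, bias, candidates[name]),
        reverse=True,
    )


weights, bias = train(TRAIN_X, TRAIN_Y)

# Hypothetical candidate translations for a single source word.
CANDIDATES = {
    "cand_a": [0.85, 0.70, 0.80],
    "cand_b": [0.20, 0.20, 0.20],
    "cand_c": [0.50, 0.40, 0.50],
}
ranking = rank_candidates(weights, bias, CANDIDATES)
print(ranking)
```

In this sketch the learned weights play the role of the supervised combination: rather than fixing by hand how much each signal counts, the seed pairs determine the weighting. An unsupervised alternative would rank candidates by a single signal or by an unweighted combination of them.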