Knowledge-Rich Morphological Priors for Bayesian Language Models
Victor Chahuneau, Noah A. Smith and Chris Dyer
We present a morphology-aware nonparametric Bayesian model of language whose
prior distribution uses manually constructed finite-state transducers to
capture the word formation processes of particular languages. This relaxes the
word independence assumption and enables sharing of statistical strength
across, for example, stems or inflectional paradigms in different contexts. Our
model can replace the multinomial distribution over words in virtually any
generative model of text. We obtain state-of-the-art results in
language modeling, word alignment, and unsupervised morphological
disambiguation in a variety of morphologically rich languages.