A Comparative Investigation of Morphological Language Modeling for the Languages of the European Union

Thomas Mueller, Hinrich Schuetze, Helmut Schmid
IMS


Abstract

We investigate a language model that combines morphological and shape features with a Kneser-Ney model and test it in a large crosslingual study of European languages. Although the model is generic, with the same architecture and features used for all languages, it reduces perplexity for all 21 languages represented in the Europarl corpus, with reductions ranging from 3% to 11%. We show that almost all of this perplexity reduction can be achieved by identifying suffixes by frequency.
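To make the idea of identifying suffixes by frequency concrete, the following is a minimal sketch and not the procedure used in this work, which the abstract does not specify. It assumes that a suffix is any word-final character sequence of bounded length, counted over word types, and that the most frequent ones are kept. The function names and the parameters `max_len`, `top_k` and `min_stem_len` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequent_suffixes(vocabulary, max_len=4, top_k=100, min_stem_len=2):
    """Return the top_k most frequent word-final character sequences.

    Assumption (not from the paper): every ending of length 1..max_len
    that leaves at least min_stem_len characters of stem is a candidate,
    and candidates are ranked by the number of word types they end.
    """
    counts = Counter()
    for word in vocabulary:
        for n in range(1, max_len + 1):
            if len(word) - n >= min_stem_len:
                counts[word[-n:]] += 1
    return [suffix for suffix, _ in counts.most_common(top_k)]


def suffix_feature(word, suffixes):
    """Return the longest listed suffix that properly ends word, or None."""
    suffix_set = set(suffixes)
    # Try longer endings first so that "ing" is preferred over "g".
    for n in range(len(word) - 1, 0, -1):
        if word[-n:] in suffix_set:
            return word[-n:]
    return None


if __name__ == "__main__":
    vocab = ["walking", "talking", "jumping", "walked", "talked", "jumped"]
    suffixes = frequent_suffixes(vocab, max_len=3, top_k=5)
    print(suffixes)
    print(suffix_feature("singing", suffixes))
```

In a sketch like this, the value returned by `suffix_feature` would serve as one morphological feature of a word; how such features are combined with the Kneser-Ney model is left out here.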