LAMB: A Good Shepherd of Morphologically Rich Languages

Sebastian Ebert1, Thomas Müller2, Hinrich Schütze1
1Center for Information and Language Processing, University of Munich, 2CIS, University of Munich


Abstract

This paper introduces STEM and LAMB, embeddings trained on stems and lemmata instead of on surface forms. For morphologically rich languages, they perform significantly better than standard embeddings on word similarity and polarity evaluations. On a new WordNet-based evaluation, STEM and LAMB are up to 50% better than standard embeddings. We show that both embeddings retain high quality even with small dimensionality and small training corpora.
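The core idea can be sketched as a preprocessing step: map every token in the training corpus to its lemma before training a standard embedding model, so that all inflected variants of a word share a single vector. The snippet below is a minimal illustration of this idea; the toy lemma dictionary is a hypothetical stand-in for a real morphological analyzer, not the paper's actual pipeline.

```python
# Hypothetical toy lemma dictionary standing in for a real
# morphological analyzer (e.g. one produced by a lemmatizer).
TOY_LEMMA_DICT = {
    "walked": "walk", "walks": "walk", "walking": "walk",
    "houses": "house", "mice": "mouse",
}

def lemmatize_corpus(sentences):
    """Replace each token with its lemma, falling back to the surface form."""
    return [[TOY_LEMMA_DICT.get(tok, tok) for tok in sent]
            for sent in sentences]

corpus = [["he", "walked", "past", "the", "houses"],
          ["mice", "walking", "around"]]
print(lemmatize_corpus(corpus))
```

The lemmatized corpus would then be fed to a standard embedding trainer (such as word2vec), so that "walked", "walks", and "walking" all contribute to one "walk" vector. STEM works analogously, but maps tokens to stems rather than lemmata.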