The Effects of Data Size and Frequency Range on Distributional Semantic Models

Magnus Sahlgren¹ and Alessandro Lenci²
¹Gavagai, ²University of Pisa


Abstract

This paper investigates the effects of data size and frequency range on Distributional Semantic Models. We compare the performance of several representative models across a number of test settings, over data of varying sizes and over test items of varying frequency. Our results show that neural network-based models underperform when the data is small, and that the most reliable model across data sizes and frequency ranges is the inverted factorized model.