Supervised All-Words Lexical Substitution using Delexicalized Features
György Szarvas, Chris Biemann and Iryna Gurevych
We propose a supervised lexical substitution system that does not use separate
classifiers per word and is therefore applicable to any word in the vocabulary.
Instead of learning word-specific substitution patterns, a global model for
lexical substitution is trained on delexicalized (i.e., non-lexical) features,
which makes it possible to exploit the power of supervised methods while
generalizing beyond the target words in the training set.
In this way, our approach remains technically straightforward while providing
better performance and similar coverage compared to unsupervised approaches.
Using features from lexical resources, as well as a variety of features
computed from large corpora (n-gram counts, distributional similarity) and a
ranking method based on the posterior probabilities obtained from a Maximum
Entropy classifier, we improve over the state of the art in the LexSub
Best-Precision metric and the Generalized Average Precision measure. Robustness
of our approach is demonstrated by evaluating it successfully on two different
datasets.
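The ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's actual system: it assumes scikit-learn's LogisticRegression as the maximum-entropy classifier, and the feature values, candidate words, and training data are invented placeholders standing in for the delexicalized features (n-gram counts, distributional similarity, lexical-resource features) mentioned in the abstract.

```python
# Sketch: rank substitution candidates by the posterior probability of a
# maximum-entropy classifier trained on delexicalized features.
# All data and feature names below are illustrative assumptions.
from sklearn.linear_model import LogisticRegression

# Each row holds delexicalized features for one (target, candidate) pair,
# e.g. [n-gram frequency ratio, distributional similarity, resource overlap].
# Because no feature encodes the word identity itself, the trained model
# applies to target words never seen during training.
X_train = [
    [0.9, 0.8, 1.0],  # acceptable substitution
    [0.1, 0.2, 0.0],  # unacceptable substitution
    [0.7, 0.6, 1.0],
    [0.2, 0.1, 0.0],
]
y_train = [1, 0, 1, 0]  # binary label: substitution acceptable or not

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Rank candidates for an unseen target word by P(acceptable | features).
candidates = ["strong", "powerful", "weak"]
X_test = [[0.8, 0.7, 1.0], [0.6, 0.5, 1.0], [0.1, 0.3, 0.0]]
scores = clf.predict_proba(X_test)[:, 1]
ranking = sorted(zip(candidates, scores), key=lambda pair: -pair[1])
```

Ranking by posterior probability (rather than by the hard class decision) yields the graded candidate list that metrics such as Generalized Average Precision evaluate.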