Cross-Lingual Semantic Similarity of Words as the Similarity of Their Semantic Word Responses
Ivan Vulić and Marie-Francine Moens
We propose a new approach to identifying semantically similar words across
languages. The approach is based on an idea that two words in different
languages are similar if they are likely to generate similar words (which
includes both source and target language words) as their top semantic word
responses. Semantic word responding is a concept from cognitive science which
addresses detecting most likely words that humans output as free word
associations given some cue word. The method consists of two main steps: (1) it
utilizes a probabilistic multilingual topic model trained on comparable data to
learn and quantify the semantic word responses, (2) it provides ranked lists of
similar words according to the similarity of their semantic word response
vectors. We evaluate our approach in the task of bilingual lexicon extraction
(BLE) for a variety of language pairs. We show that in the cross-lingual
settings without any language pair dependent knowledge the response-based
method of similarity is more robust and outperforms current state-of-the art
methods that directly operate in the semantic space of latent cross-lingual
concepts/topics.
Back to Papers Accepted