Representing Topics Using Images
Nikolaos Aletras and Mark Stevenson
Topics generated automatically, e.g. using LDA, are now widely used in
Computational Linguistics. Topics are normally represented as a set of
keywords, often the $n$ terms in a topic with the highest marginal
probabilities. We introduce an alternative approach in which topics are
represented using images. Candidate images for each topic are retrieved from
the web by querying a search engine using the top $n$ terms. The most suitable
image is selected from this set using a graph-based algorithm which makes use
of textual information from the metadata associated with each image and
features extracted from the images themselves. We show that the proposed
approach significantly outperforms several baselines and can provide images
that are useful for representing a topic.
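
As an illustration only, the pipeline described above might be sketched as follows. The abstract does not specify the graph algorithm, so this sketch assumes a PageRank-style random walk over a graph whose nodes are the candidate images and whose edge weights mix textual and visual similarity. The function names (top_terms, cosine, rank_images), the alpha mixing weight, the damping value, and the non-negative feature vectors are all hypothetical and not taken from the paper.

```python
from math import sqrt


def top_terms(term_probs, n):
    """Return the n terms with the highest probability (ties broken alphabetically)."""
    ranked = sorted(term_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_images(text_vecs, visual_vecs, alpha=0.5, damping=0.85, iters=50):
    """Score candidate images with a PageRank-style walk (an assumed algorithm).

    text_vecs[i] and visual_vecs[i] are hypothetical non-negative feature
    vectors for image i (metadata text and image content, respectively).
    """
    n = len(text_vecs)
    # Edge weight: convex combination of textual and visual similarity.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                weights[i][j] = (alpha * cosine(text_vecs[i], text_vecs[j])
                                 + (1 - alpha) * cosine(visual_vecs[i], visual_vecs[j]))
    out_weight = [sum(row) for row in weights]

    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * weights[i][j] / out_weight[i]
                            for i in range(n) if out_weight[i] > 0)
            for j in range(n)
        ]
    return scores


# Toy usage: the query would be built from the topic's top terms, and the
# image whose features are most central among the candidates is selected.
query = " ".join(top_terms({"film": 0.5, "movie": 0.3, "actor": 0.2}, 2))
text_vecs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
visual_vecs = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
scores = rank_images(text_vecs, visual_vecs)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this sketch the selected image is the one most similar, on average, to the other candidates, which is one plausible reading of "most suitable"; retrieval from a search engine and feature extraction are left out because the abstract gives no details about them.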