Features automatically extracted from images constitute a new and rich source of semantic knowledge that can complement information extracted from text. The convergence between vision- and text-based information can be exploited in scenarios where the two modalities must be combined to solve a target task (e.g., generating verbal descriptions of images, or finding the right images to illustrate a story). However, the potential applications for integrated visual features go beyond mixed-media scenarios: Because of their complementary nature with respect to language, visual features might provide perceptually grounded semantic information that can be exploited in purely linguistic domains.
The tutorial will first introduce basic techniques for encoding image content in terms of low-level features, such as the widely adopted SIFT descriptors. We will then show how these low-level descriptors are used to induce more abstract features, focusing on the well-established bag-of-visual-words method for representing images, but also briefly introducing more recent developments, including capturing spatial information with pyramid representations, soft visual word clustering via Fisher encoding, and attribute-based image representation. Next, we will discuss some example applications, and we will conclude with a brief practical illustration of visual feature extraction using a software package we developed.
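To make the bag-of-visual-words pipeline concrete, the sketch below shows its three standard steps: extracting local SIFT descriptors, clustering them into a visual vocabulary, and representing each image as a histogram over the resulting visual words. This is a minimal illustration using OpenCV and scikit-learn, not the tutorial's own software package; all function and parameter names are illustrative.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_sift(image_path):
    """Return the local SIFT descriptors of one image (N x 128 array)."""
    image = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, descriptors = sift.detectAndCompute(image, None)
    # Each descriptor is a 128-dimensional vector describing a local patch.
    return descriptors if descriptors is not None else np.empty((0, 128), np.float32)

def build_vocabulary(training_paths, n_visual_words=1000):
    """Cluster descriptors from a training set; centroids act as 'visual words'."""
    all_descriptors = np.vstack([extract_sift(p) for p in training_paths])
    vocabulary = KMeans(n_clusters=n_visual_words, n_init=3, random_state=0)
    vocabulary.fit(all_descriptors)
    return vocabulary

def bag_of_visual_words(image_path, vocabulary):
    """Represent an image as a normalized histogram over visual words."""
    descriptors = extract_sift(image_path)
    words = vocabulary.predict(descriptors)  # assign each descriptor to a word
    histogram = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return histogram / max(histogram.sum(), 1.0)
```

The resulting histogram is an orderless summary of local appearance, which is what motivates the spatial pyramid extension mentioned above: computing such histograms over increasingly fine image subregions and concatenating them restores coarse spatial information.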