Reasoning about Pragmatics with Neural Listeners and Speakers

Jacob Andreas and Dan Klein
UC Berkeley


Abstract

We present a model for contrastively describing scenes, in which contrastive behavior results from a combination of inference-driven pragmatics and learned semantics. Like previous learned approaches to language generation, our model uses a simple feature-driven architecture (here a pair of neural ``listener'' and ``speaker'' models) to ground language in the world. Like inference-driven approaches to pragmatics, our model actively reasons about listener behavior when selecting utterances. For training, our approach requires only ordinary captions, collected without any demonstration of the pragmatic behavior the model ultimately exhibits. In human evaluations on a referring expression game, our approach succeeds 81% of the time, compared to 69% using existing techniques.
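To make the utterance-selection step concrete, the following is a minimal sketch of listener-based reranking at inference time: a base speaker proposes candidate descriptions of the target scene, and a listener model scores how reliably each candidate picks out the target against a distractor. The helper names (speaker_propose, listener_score), the sampling budget, and the rationality blend are hypothetical stand-ins, not the paper's actual trained neural models or tuned hyperparameters.

```python
import math

def pragmatic_speak(target_scene, distractor_scene,
                    speaker_propose, listener_score,
                    num_samples=100, rationality=1.0):
    """Pick an utterance that best identifies target_scene to a listener.

    Hypothetical interfaces (assumptions, not the paper's API):
      speaker_propose(scene, n) -> list of (utterance, log p_S(u | scene))
      listener_score(utterance, scenes) -> log p_L(target | u, scenes)
    """
    candidates = speaker_propose(target_scene, n=num_samples)
    best_utt, best_score = None, -math.inf
    for utt, speaker_lp in candidates:
        # How likely the listener is to choose the target over the distractor,
        # optionally blended with speaker fluency; `rationality` trades the
        # two off (1.0 = rerank purely by the listener).
        listener_lp = listener_score(utt, [target_scene, distractor_scene])
        score = rationality * listener_lp + (1 - rationality) * speaker_lp
        if score > best_score:
            best_utt, best_score = utt, score
    return best_utt
```

In this sketch, pragmatic behavior falls out of inference rather than training: the speaker is trained only on ordinary captions, and contrastiveness arises because candidates are reranked by how well a listener could use them to distinguish the target from the distractor.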