Using Out-of-Domain Data for Lexical Addressee Detection in Human-Human-Computer Dialog
Heeyoung Lee, Andreas Stolcke and Elizabeth Shriberg
Addressee detection (AD) is an important problem for dialog systems in
human-human-computer scenarios (contexts involving multiple people and a
system) because system-directed speech must be distinguished from
human-directed speech. Recent work on AD (Shriberg et al., 2012) showed good
results using prosodic and lexical features trained on in-domain data.
In-domain data, however, is expensive to collect for each new domain. In this
study we focus on lexical models and investigate how well out-of-domain data
(data from other domains, or from single-user scenarios) can fill in for
matched in-domain data. We find that human-addressed speech can be modeled
using out-of-domain conversational speech transcripts, and that human-computer
utterances can be modeled using single-user data: the resulting AD system
outperforms a system trained only on matched in-domain data. Further gains (up
to a 4% reduction in equal error rate) are obtained when in-domain and
out-of-domain models are interpolated. Finally, we examine which parts of an
utterance are most useful. We find that the first 1.5 seconds of an utterance
contain most of the lexical information for AD, and analyze which lexical items
convey this. Overall, we conclude that the H-H-C (human-human-computer) scenario can be approximated
by combining data from H-C (human-computer) and H-H (human-human) scenarios only.
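The lexical approach summarized above can be thought of as scoring each utterance against two language models, one for computer-directed and one for human-directed speech, where each model may itself interpolate in-domain and out-of-domain estimates. The sketch below is a minimal illustration of that idea, assuming add-one smoothed unigram models and a single linear interpolation weight; the corpus names, the weight, and the toy data are hypothetical and are not taken from the paper.

    # Hypothetical sketch: likelihood-ratio addressee scoring with interpolated
    # in-domain / out-of-domain unigram language models. All names and values
    # below are illustrative assumptions, not the paper's actual setup.
    import math
    from collections import Counter

    class UnigramLM:
        """Add-one smoothed unigram LM estimated from a list of utterances."""
        def __init__(self, utterances, vocab):
            self.vocab = vocab
            self.counts = Counter(w for u in utterances for w in u.split())
            self.total = sum(self.counts.values())

        def prob(self, word):
            # Add-one smoothing over a shared vocabulary.
            return (self.counts[word] + 1) / (self.total + len(self.vocab))

    def interpolate(p_in, p_out, lam):
        """Linear interpolation of in-domain and out-of-domain probabilities."""
        return lam * p_in + (1.0 - lam) * p_out

    def addressee_score(utterance, computer_lms, human_lms, lam=0.7):
        """Log-likelihood ratio; positive values favor computer-directed speech."""
        score = 0.0
        for word in utterance.split():
            p_c = interpolate(computer_lms[0].prob(word), computer_lms[1].prob(word), lam)
            p_h = interpolate(human_lms[0].prob(word), human_lms[1].prob(word), lam)
            score += math.log(p_c) - math.log(p_h)
        return score

    if __name__ == "__main__":
        # Toy stand-ins for the four training sources discussed in the abstract:
        # in-domain H-C and H-H utterances, single-user (out-of-domain) H-C data,
        # and out-of-domain conversational (H-H) transcripts.
        in_hc = ["computer play some music", "show me the weather"]
        in_hh = ["so what did you think of that", "yeah i agree"]
        ood_hc = ["call mom", "set a timer for ten minutes"]
        ood_hh = ["well you know how it goes", "that was a good movie"]

        vocab = set(w for corpus in (in_hc, in_hh, ood_hc, ood_hh)
                    for u in corpus for w in u.split())
        computer_lms = (UnigramLM(in_hc, vocab), UnigramLM(ood_hc, vocab))
        human_lms = (UnigramLM(in_hh, vocab), UnigramLM(ood_hh, vocab))

        for utt in ["computer show me the weather", "yeah that was good"]:
            print(utt, "->", addressee_score(utt, computer_lms, human_lms))

A threshold on this score (swept to find the equal-error-rate operating point) then yields the addressee decision; the interpolation weight controls how much the out-of-domain data is trusted relative to the matched in-domain data.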