DISCRIMINATIVE JOINT MODELING OF LEXICAL VARIATION AND ACOUSTIC CONFUSION FOR AUTOMATED NARRATIVE RETELLING ASSESSMENT

Maider Lehr, Izhak Shafran, Emily Prud'hommeaux and Brian Roark

Automatically assessing the fidelity of a retelling to the original narrative, a task of growing clinical importance, is challenging because speakers paraphrase extensively during retelling and because automatic speech recognition (ASR) errors cascade into downstream processing. We present a word tagging approach using conditional random fields (CRFs) that allows a diversity of features to be considered during inference, including some capturing acoustic confusions encoded in word confusion networks. We evaluate the approach under several scenarios, including both supervised and unsupervised training, the latter achieved by training on the output of a baseline automatic word-alignment model. We also adapt the ASR models to the domain and evaluate the impact of ASR error rate on performance. We find strong robustness to ASR errors, even when using just the 1-best system output. A hybrid approach that makes use of both automatic alignment and CRF-trained tagging models achieves the best performance, yielding strong improvements over using either approach alone.
