Segmentation Strategies for Streaming Speech Translation
Vivek Kumar Rangarajan Sridhar, John Chen, Srinivas Bangalore, Andrej Ljolje and Rathinavelu Chengalvarayan
This work presents a first effort at real-time speech translation of TED talks, a compendium of public talks by different speakers addressing a variety of topics. Our goal is a system that balances translation accuracy and latency. In order to improve ASR performance
for our diverse data set, adaptation techniques such as constrained model
adaptation and vocal tract length normalization are found to be useful. In
order to improve machine translation (MT) performance, techniques that can be employed in real time, such as monotonic and partial translation retention, are found to be beneficial. We also experiment with inserting text segmenters of
various types between ASR and MT in a series of real-time translation
experiments. Among other results, our experiments demonstrate that a good
segmentation is useful, and a novel conjunction-based segmentation strategy
improves translation quality nearly as much as other strategies such as
comma-based segmentation. Synchronizing the various pipeline components also proves important for minimizing latency.
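To make the ASR-segmenter-MT pipeline concrete, the following Python code is a minimal sketch of the kind of comma- or conjunction-based segmenter that can be inserted between ASR and MT. It assumes a whitespace-tokenized ASR output stream; the conjunction list, the function name, and the length cap are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of a text segmenter placed between ASR and MT (illustrative
    # only; the conjunction list, interface, and length cap are assumptions, not
    # the authors' implementation).

    CONJUNCTIONS = {"and", "but", "or", "so", "because", "although", "while"}

    def segment_stream(tokens, strategy="conjunction", max_len=15):
        """Yield translation-ready segments from a stream of ASR tokens.

        strategy:
          "comma"       - emit a segment after any token ending in a comma
          "conjunction" - emit a segment before a conjunction
        max_len bounds segment length (and hence latency) for either strategy.
        """
        buffer = []
        for token in tokens:
            if strategy == "conjunction" and token.lower() in CONJUNCTIONS and buffer:
                yield " ".join(buffer)
                buffer = []
            buffer.append(token)
            if (strategy == "comma" and token.endswith(",")) or len(buffer) >= max_len:
                yield " ".join(buffer)
                buffer = []
        if buffer:
            yield " ".join(buffer)

    if __name__ == "__main__":
        asr_output = ("the talk covers climate policy and the speaker argues that "
                      "local action matters because national progress is slow").split()
        for segment in segment_stream(asr_output, strategy="conjunction"):
            print(segment)  # each segment would be handed to the MT engine

In a streaming setting, each emitted segment would be passed to the MT engine as soon as it is produced, trading some translation context for lower latency.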