Text Alignment for Real-Time Crowd Captioning
Iftekhar Naim, Daniel Gildea, Walter Lasecki and Jeffrey Bigham
The primary way of providing real-time captioning for deaf and hard of hearing
people is to employ expensive professional stenographers who can type as fast
as natural speaking rates. Recent work has shown that a feasible alternative is
to combine the partial captions of ordinary typists, each of whom types part of
what they hear. In this paper, we describe an improved method for combining
partial captions into a final output based on weighted A* search and multiple
sequence alignment (MSA). In contrast to prior work, our method allows the
tradeoff between accuracy and speed to be tuned, and provides formal error
bounds. Our method outperforms the current state-of-the-art on Word Error Rate
(WER) (29.6%), BLEU Score (41.4%), and F-measure (36.9%). The end goal is for
these captions to be used by people, and so we also compare how these metrics
correlate with the judgments of 50 study participants, which may assist others
looking to make further progress on this problem.
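To illustrate the core idea, the sketch below shows a weighted A* alignment of two overlapping partial captions, the minimal two-typist case of the multiple sequence alignment described above. It is not the paper's full system: the function name, the unit edit costs, and the example captions are illustrative assumptions, and the weight w plays the role of the tunable accuracy/speed tradeoff (solutions are within a factor w of the optimal alignment cost).

```python
import heapq
from itertools import count

def weighted_astar_align(a, b, w=1.0):
    """Align two partial caption word lists with weighted A* search (a sketch).

    States are positions (i, j) in the two sequences; moves are
    match/substitute, or keeping a word that appears in only one caption,
    with unit costs. The heuristic is the difference in remaining lengths,
    an admissible lower bound on remaining edit cost. With w > 1 the search
    expands fewer states but may return a slightly suboptimal alignment.
    """
    goal = (len(a), len(b))
    h = lambda i, j: abs((len(a) - i) - (len(b) - j))
    tie = count()  # tiebreaker so the heap never compares alignments
    frontier = [(w * h(0, 0), 0, next(tie), (0, 0), [])]
    best_g = {(0, 0): 0}
    while frontier:
        f, g, _, (i, j), path = heapq.heappop(frontier)
        if (i, j) == goal:
            return g, path
        if g > best_g.get((i, j), float("inf")):
            continue  # stale queue entry
        succs = []
        if i < len(a) and j < len(b):
            cost = 0 if a[i] == b[j] else 1
            succs.append(((i + 1, j + 1), cost, (a[i], b[j])))
        if i < len(a):
            succs.append(((i + 1, j), 1, (a[i], None)))  # word only in caption a
        if j < len(b):
            succs.append(((i, j + 1), 1, (None, b[j])))  # word only in caption b
        for (ni, nj), cost, step in succs:
            ng = g + cost
            if ng < best_g.get((ni, nj), float("inf")):
                best_g[(ni, nj)] = ng
                heapq.heappush(
                    frontier,
                    (ng + w * h(ni, nj), ng, next(tie), (ni, nj), path + [step]),
                )
    return None

# Two hypothetical overlapping partial captions from different typists.
cap1 = "the quick brown fox jumps".split()
cap2 = "quick brown fox jumps over".split()
cost, alignment = weighted_astar_align(cap1, cap2, w=1.5)
print(cost, alignment)
```

Setting w = 1.0 recovers standard A* (an exact alignment), while larger weights trade alignment quality for fewer expanded states; the paper's method generalizes this search to more than two caption sequences.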