Massively Parallel Suffix Array Queries and On-Demand Phrase Extraction for Statistical Machine Translation Using GPUs
Hua He, Jimmy Lin and Adam Lopez
Translation models in statistical machine translation can be scaled to large
corpora and arbitrarily long phrases by looking up translations of source
phrases "on the fly" in an indexed parallel corpus using suffix arrays.
However, this can be slow because on-demand extraction of phrase tables is
computationally expensive. We address this problem by developing novel
algorithms for general-purpose graphics processing units (GPUs), which enable
suffix array queries for phrase lookup and phrase extraction to be massively
parallelized. Compared to a highly optimized, state-of-the-art serial CPU-based
implementation, our techniques achieve at least an order of magnitude
improvement in throughput. This work demonstrates the promise of
massively parallel architectures and the potential of GPUs for tackling
computationally demanding problems in statistical machine translation and
language processing.
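
To make the phrase lookup step concrete, here is a minimal serial sketch of a suffix array query over a tokenized corpus. It is not the GPU algorithm described in the paper, only an illustration of the underlying lookup, and it assumes a naive construction that is practical only for tiny inputs. The names `build_suffix_array`, `find_phrase`, and `occurrences` are hypothetical and chosen for this example.

```python
from typing import List, Tuple


def build_suffix_array(tokens: List[str]) -> List[int]:
    # Naive construction (assumption: toy-sized corpus). Sort suffix start
    # positions by the token sequence that begins at each position.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens: List[str], sa: List[int], phrase: List[str]) -> Tuple[int, int]:
    """Return the half-open range [start, end) of suffix array entries
    whose suffixes begin with `phrase`, using two binary searches."""
    m = len(phrase)

    # Lower bound: first suffix whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < phrase:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= phrase:
            lo = mid + 1
        else:
            hi = mid
    return start, lo


def occurrences(tokens: List[str], sa: List[int], phrase: List[str]) -> List[int]:
    # Corpus positions at which the phrase occurs, in ascending order.
    start, end = find_phrase(tokens, sa, phrase)
    return sorted(sa[start:end])


tokens = "the cat sat on the mat the cat".split()
sa = build_suffix_array(tokens)
print(occurrences(tokens, sa, ["the", "cat"]))
```

Each query reads the index without modifying it, so lookups for different source phrases are independent of one another. That independence is the property that makes batches of such queries a natural candidate for parallel execution.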