Grouping Language Model Boundary Words to Speed K-Best Extraction from Hypergraphs

Kenneth Heafield, Philipp Koehn and Alon Lavie

We propose a new algorithm to approximately extract top-scoring hypotheses from a hypergraph when the score includes an N-gram language model. In the popular cube pruning algorithm, every hypothesis is annotated with boundary words and permitted to recombine only if all boundary words are equal. However, many hypotheses share some, but not all, boundary words. We use these common boundary words to group hypotheses and do so recursively, resulting in a tree of hypotheses. This tree forms the basis for our new search algorithm that iteratively refines groups of boundary words on demand. Machine translation experiments show our algorithm makes translation 1.50 to 3.51 times as fast as with cube pruning in common cases.
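To make the grouping concrete, the following is a minimal Python sketch of the underlying idea: hypotheses annotated with boundary words are partitioned recursively, so hypotheses sharing a boundary-word prefix share a subtree, which a search can then refine on demand instead of enumerating every distinct boundary-word combination up front. This is an illustrative assumption of the data structure only, not the paper's implementation (which performs the refinement lazily inside best-first search); the (score, boundary-words) representation and all names are hypothetical.

```python
from collections import defaultdict

def build_group_tree(hypotheses, depth=0):
    """Recursively partition hypotheses by shared boundary words.

    `hypotheses` is a list of (score, boundary) pairs, where `boundary`
    is a tuple of the words the language model can see at the hypothesis
    edge.  Hypotheses that agree on their first `depth` boundary words
    share a subtree; a leaf (a plain list) is reached once every boundary
    in the group is fully consumed, i.e. the group satisfies cube
    pruning's all-boundary-words-equal recombination test.
    """
    if all(depth >= len(boundary) for _, boundary in hypotheses):
        return sorted(hypotheses, key=lambda h: h[0], reverse=True)
    children = defaultdict(list)
    for score, boundary in hypotheses:
        word = boundary[depth] if depth < len(boundary) else None
        children[word].append((score, boundary))
    return {word: build_group_tree(group, depth + 1)
            for word, group in children.items()}

# Illustrative use: "the cat" and "the dog" share the boundary word
# "the", so they fall into one group that search can refine on demand.
hyps = [(-1.2, ("the", "cat")), (-1.5, ("the", "dog")), (-2.0, ("a", "cat"))]
tree = build_group_tree(hyps)
# {'the': {'cat': [(-1.2, ('the', 'cat'))], 'dog': [(-1.5, ('the', 'dog'))]},
#  'a':   {'cat': [(-2.0, ('a', 'cat'))]}}
```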
