Learning a Lexicon and Translation Model from Phoneme Lattices

Oliver Adams1, Graham Neubig2, Trevor Cohn3, Steven Bird3, Quoc Truong Do4, Satoshi Nakamura5
1The University of Melbourne, 2Carnegie Mellon University, 3The University of Melbourne, 4Graduate School of Information and Science, Nara Institute of Science and Technology, 5Nara Institute of Science and Technology


Abstract

Language documentation begins by gathering speech. Manual or automatic
transcription at the word level is typically not possible because of the
absence of an orthography or prior lexicon, and though manual phonemic
transcription is possible, it is very slow. On the other hand, translations of
the minority language into a major language are more easily acquired. We
propose a method to harness such translations to improve automatic phoneme
recognition. The method assumes no prior lexicon or translation model, instead
learning them from phoneme lattices and translations of the speech being
transcribed. Experiments demonstrate phoneme error rate improvements against
two baselines and the model's ability to learn useful bilingual lexical
entries.