Supersense Tagging for Arabic: the MT-in-the-Middle Attack

Nathan Schneider, Behrang Mohit, Chris Dyer, Kemal Oflazer and Noah A. Smith

We consider the task of tagging Arabic nouns with WordNet supersenses and evaluate three approaches. The first combines heuristics with Arabic WordNet, an expert-crafted but limited-coverage lexicon. The second uses unsupervised sequence modeling. The third and most successful approach uses machine translation to render the Arabic into English, tags the English output automatically with supersenses, and projects the resulting tags back onto the Arabic. Analysis shows gains and remaining obstacles in four Wikipedia topical domains.
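The projection step of the third approach can be illustrated with a minimal sketch. It assumes that the translation system exposes word alignments as (Arabic index, English index) pairs and that conflicts are resolved by keeping the first tag seen; the abstract specifies neither, and the function name project_supersenses is hypothetical. The tag strings follow WordNet's noun supersense naming (for example noun.person).

```python
from typing import List, Optional, Tuple


def project_supersenses(
    num_source_tokens: int,
    english_tags: List[Optional[str]],
    alignment: List[Tuple[int, int]],
) -> List[Optional[str]]:
    """Copy supersense tags from English tokens onto aligned Arabic tokens.

    Assumptions (not from the paper): `alignment` holds
    (arabic_index, english_index) pairs, and the first tag found
    for an Arabic token wins.
    """
    projected: List[Optional[str]] = [None] * num_source_tokens
    for src_idx, tgt_idx in alignment:
        tag = english_tags[tgt_idx]
        # Skip untagged English tokens and Arabic tokens already tagged.
        if tag is not None and projected[src_idx] is None:
            projected[src_idx] = tag
    return projected


# Toy input: three Arabic tokens, four English tokens (two of them tagged).
english_tags = [None, "noun.person", None, "noun.artifact"]
alignment = [(0, 3), (1, 1), (2, 1)]
arabic_tags = project_supersenses(3, english_tags, alignment)
```

Arabic tokens with no alignment link, or aligned only to untagged English tokens, stay untagged in this sketch; how such gaps are handled is a design choice it leaves open.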
