Processing Spontaneous Orthography
Ramy Eskander, Nizar Habash, Owen Rambow and Nadi Tomeh
When a language or language variant has no standard orthography, written
texts display a variety of orthographic choices. This is
problematic for natural language processing (NLP) because it creates spurious
data sparseness. We study the transformation of spontaneously spelled Egyptian
Arabic into a conventionalized orthography that we have previously proposed
for NLP purposes. We show that a two-stage process can reduce divergences from
this standard by 69%, making subsequent processing of Egyptian Arabic easier.