Classification of South African languages using text and acoustic based methods: A case of six selected languages
Peleira Nicholas Zulu
Language variations are generally known to have a severe impact on the
performance of Human Language Technology Systems. In order to predict or
improve system performance, a thorough investigation into these variations,
similarities and dissimilarities, is required. Distance measures have been used
in several speech processing applications to analyze varying speech
attributes. However, little work has been done on language distance
measures, and even less work has been done involving South African languages.
This study explores two methods for measuring the linguistic distance between
six South African languages: a text-based method (the Levenshtein distance)
and an acoustic approach using extracted mean pitch values. The Levenshtein
distance method uses parallel word transcriptions from all six languages, with
as few as 144 words, whereas the pitch method is text-independent and compares
differences in mean pitch between languages. Cluster analysis of the distance
matrices from both methods correlates closely with human perceptual distances
and with existing literature on the six languages.
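
As an illustration of the text-based method, the following is a minimal sketch of how a Levenshtein-based distance between two languages could be computed from parallel word lists. It is not the study's implementation: the length normalisation, the averaging over word pairs, and the names language_distance and distance_matrix are assumptions made here for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions
    and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def language_distance(words_a, words_b):
    """Mean length-normalised Levenshtein distance over parallel word pairs.

    Assumption: each pair is normalised by the longer word's length and the
    results are averaged; the study may combine word distances differently.
    """
    if not words_a or len(words_a) != len(words_b):
        raise ValueError("word lists must be non-empty and parallel")
    total = 0.0
    for wa, wb in zip(words_a, words_b):
        longest = max(len(wa), len(wb))
        total += levenshtein(wa, wb) / longest if longest else 0.0
    return total / len(words_a)


def distance_matrix(word_lists):
    """Pairwise distances for a dict mapping language name -> parallel word list."""
    names = sorted(word_lists)
    return {(x, y): language_distance(word_lists[x], word_lists[y])
            for x in names for y in names}
```

A matrix produced this way is symmetric with zeros on the diagonal, so it can be passed to a standard hierarchical clustering routine to group the languages.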