Learning a Part-of-Speech Tagger from Two Hours of Annotation
Dan Garrette and Jason Baldridge
Most work on weakly-supervised learning for part-of-speech taggers has been
based on unrealistic assumptions about the amount and quality of training data.
In this paper, we attempt to create true low-resource scenarios by allowing a
linguist just two hours to annotate data and evaluating on the languages
Kinyarwanda and Malagasy. Given these severely limited amounts of either type
supervision (tag dictionaries) or token supervision (labeled sentences), we are
able to dramatically improve the learning of a hidden Markov model through our
method of automatically generalizing the annotations, reducing noise, and
inducing word-tag frequency information.
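
As a rough illustration of the two kinds of supervision named above, the following minimal sketch contrasts a tag dictionary (type supervision) with labeled sentences (token supervision). The toy English data, the tag names, and the helper functions `tag_dict_from_tokens` and `allowed_tags` are hypothetical and are not taken from the paper; the sketch does not implement the authors' generalization or noise-reduction method.

```python
from collections import defaultdict

# Hypothetical tagset for illustration only.
ALL_TAGS = {"DET", "NOUN", "VERB"}

# Type supervision: a tag dictionary maps word types to their permitted tags.
tag_dictionary = {"the": {"DET"}, "dog": {"NOUN"}}

# Token supervision: full sentences in which every token carries a tag.
labeled_sentences = [
    [("the", "DET"), ("dog", "NOUN"), ("walks", "VERB")],
    [("the", "DET"), ("walks", "NOUN"), ("end", "VERB")],
]


def tag_dict_from_tokens(sentences):
    """Collect, for each word type, the set of tags it was seen with."""
    observed = defaultdict(set)
    for sentence in sentences:
        for word, tag in sentence:
            observed[word].add(tag)
    return dict(observed)


def allowed_tags(word, tag_dict, all_tags):
    """Return the tags a tagger may consider for a word.

    Words missing from the dictionary fall back to the full tagset,
    which is one common convention, assumed here for simplicity.
    """
    return tag_dict.get(word, all_tags)


derived = tag_dict_from_tokens(labeled_sentences)
print(sorted(derived["walks"]))
print(sorted(allowed_tags("cat", tag_dictionary, ALL_TAGS)))
```

Labeled sentences can always be collapsed into a tag dictionary in this way, but the reverse is not possible, since a dictionary carries no information about tag sequences or frequencies.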