Fast Coupled Sequence Labeling on Heterogeneous Annotations via Context-aware Pruning

Zhenghua Li1, Jiayuan Chao1, Min Zhang2, Jiwen Yang1
1Soochow University, 2Suda


Abstract

The recently proposed coupled sequence labeling approach has been shown to effectively exploit multiple labeled datasets with heterogeneous annotations, but it suffers from severe inefficiency due to the large bundled tag space (Li et al., 2015). In their case study of part-of-speech (POS) tagging, Li et al. (2015) reduce the tag space with context-free tag-to-tag mapping rules, which are designed manually and require considerable effort.

This paper proposes a context-aware pruning approach that imposes token-wise constraints on the tag space based on contextual evidence, making the coupled approach efficient enough to be applied, for the first time, to the more complex task of joint word segmentation (WS) and POS tagging. Experiments show that, using the large-scale People Daily as auxiliary heterogeneous data, the coupled approach improves the F-score on Penn Chinese Treebank by 0.67\% on WS (from 94.88\% to 95.55\%) and by 1.09\% on joint WS\&POS (from 89.49\% to 90.58\%).
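
To make the idea of token-wise pruning concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each token comes with a tag distribution from one baseline tagger per annotation scheme, and that pruning keeps the top-k tags above a probability threshold on each side before pairing them into bundled tags. The function names, the parameters `top_k` and `min_prob`, the tag labels, and all probabilities are hypothetical and chosen only for illustration.

```python
from itertools import product


def prune_tags(marginals, top_k=2, min_prob=0.01):
    """Keep at most top_k tags whose probability is at least min_prob."""
    ranked = sorted(marginals.items(), key=lambda kv: kv[1], reverse=True)
    kept = [tag for tag, prob in ranked[:top_k] if prob >= min_prob]
    # Always keep the best tag so that no token is left without candidates.
    return kept or [ranked[0][0]]


def bundled_candidates(side_a, side_b, top_k=2, min_prob=0.01):
    """For each token, pair the surviving tags of the two annotation schemes."""
    return [
        list(product(prune_tags(a, top_k, min_prob),
                     prune_tags(b, top_k, min_prob)))
        for a, b in zip(side_a, side_b)
    ]


# Toy per-token tag distributions for a two-token sentence (made-up numbers).
scheme_a = [{"NN": 0.90, "VV": 0.09, "JJ": 0.01}, {"VV": 0.60, "NN": 0.40}]
scheme_b = [{"n": 0.95, "v": 0.04, "a": 0.01}, {"v": 0.55, "vn": 0.44, "n": 0.01}]

candidates = bundled_candidates(scheme_a, scheme_b, top_k=2, min_prob=0.05)

# Size of the unpruned bundled tag space for the first token (3 x 3 tags).
full_size = len(scheme_a[0]) * len(scheme_b[0])

for position, pairs in enumerate(candidates):
    print(position, pairs)
```

In this toy setting the first token keeps 2 of its 9 possible bundled tags and the second keeps 4 of 6. The candidate set differs from token to token because it depends on the context-dependent distributions, whereas a context-free mapping allows the same tag pairs at every position.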