Parser lexicalisation through self-learning
Marek Rei and Ted Briscoe
We describe a new self-learning framework for parser lexicalisation that
requires only a plain-text, in-domain corpus. The method first creates
augmented versions of dependency graphs by applying a series of modifications
designed to directly capture higher-order lexical path dependencies. Scores are
assigned to each edge in the graph using statistics from an automatically
parsed background corpus. As bilexical dependencies are sparse, a novel
directed distributional word similarity measure is used to smooth edge score
estimates. Edge scores are then combined into graph scores and used for
reranking the top-$n$ analyses found by the unlexicalised parser. The approach
achieves significant improvements on WSJ and biomedical text over the
unlexicalised baseline parser, which was originally trained on a subset of the
Brown corpus.
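
The reranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the edge counts are invented toy data, the function names (edge_score, graph_score, rerank) are hypothetical, add-alpha smoothing stands in for the directed distributional similarity smoothing, and averaging log edge scores is only one assumed way of combining edge scores into a graph score.

```python
import math
from collections import Counter

# Toy stand-in for statistics from an automatically parsed background corpus:
# counts of (head, relation, dependent) edges. All values are invented.
background_edges = Counter({
    ("eat", "dobj", "pizza"): 8,
    ("eat", "dobj", "fork"): 1,
    ("eat", "prep_with", "fork"): 6,
    ("pizza", "prep_with", "fork"): 1,
    ("pizza", "prep_with", "cheese"): 5,
})


def edge_score(edge, counts, alpha=0.1):
    """Estimate P(dependent | head, relation) with add-alpha smoothing.

    Assumption: add-alpha smoothing is used here only as a simple
    placeholder for the similarity-based smoothing named in the abstract.
    """
    head, rel, _ = edge
    total = sum(c for (h, r, _), c in counts.items() if h == head and r == rel)
    vocab = len({d for (_, _, d) in counts})
    return (counts[edge] + alpha) / (total + alpha * vocab)


def graph_score(edges, counts):
    """Combine edge scores as the mean log score (an assumed combination)."""
    return sum(math.log(edge_score(e, counts)) for e in edges) / len(edges)


def rerank(candidates, counts):
    """Sort candidate analyses from highest to lowest graph score."""
    return sorted(candidates, key=lambda edges: graph_score(edges, counts),
                  reverse=True)


# Two candidate analyses of "eat pizza with fork", in the order a
# hypothetical unlexicalised parser might return them.
candidates = [
    [("eat", "dobj", "pizza"), ("pizza", "prep_with", "fork")],  # noun attachment
    [("eat", "dobj", "pizza"), ("eat", "prep_with", "fork")],    # verb attachment
]

best = rerank(candidates, background_edges)[0]
```

With these toy counts, the verb-attachment analysis is ranked first, because the background corpus supports "eat ... with fork" far more strongly than "pizza with fork".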