Improved Information Structure Analysis of Scientific Documents Through Discourse and Lexical Constraints
Yufan Guo, Roi Reichart and Anna Korhonen
Inferring the information structure of scientific documents is useful for many
downstream applications. Existing feature-based machine learning approaches to
this task require substantial training data and suffer from limited
performance. We propose guiding feature-based models with declarative domain
knowledge encoded as posterior distribution constraints. We explore a rich set
of discourse and lexical constraints, which we incorporate through the
Generalized Expectation (GE) criterion. Our constrained model improves the
performance of existing fully and lightly supervised models. Even a fully
unsupervised version of this model outperforms lightly supervised feature-based
models, showing that our approach can be useful even when no labeled data is
available.