The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Held at the Portland Marriott Downtown Waterfront in Portland, Oregon, USA, June 19-24, 2011


Rich Prior Knowledge in Learning for Natural Language Processing

PRESENTERS: Gregory Druck, Kuzman Ganchev, Joao Graca

ABSTRACT:

We possess a wealth of prior knowledge about most prediction problems, and particularly so for many of the fundamental tasks in natural language processing. Unfortunately, it is often difficult to make use of this type of information during learning, as it typically does not come in the form of labeled examples, may be difficult to encode as a prior on parameters in a Bayesian setting, and may be impossible to incorporate into a tractable model. Instead, we usually have prior knowledge about the values of output variables. For example, linguistic knowledge or an out-of-domain parser may provide the locations of likely syntactic dependencies for grammar induction.

Motivated by the prospect of being able to naturally leverage such knowledge, four different groups have recently developed similar, general frameworks for expressing and learning with side information about output variables. These frameworks are Constraint-Driven Learning (UIUC), Posterior Regularization (UPenn), Generalized Expectation Criteria (UMass Amherst), and Learning from Measurements (UC Berkeley). This tutorial describes how to encode side information about output variables, and how to leverage this encoding and an unannotated corpus during learning. We survey the different frameworks, explaining how they are connected and the trade-offs between them.

We also survey several applications that have been explored in the literature, including applications to grammar and part-of-speech induction, word alignment, information extraction, text classification, and multi-view learning. The prior knowledge used in these applications ranges from structural information that cannot be efficiently encoded in the model, to knowledge about the approximate expectations of some features, to knowledge of some incomplete and noisy labelings. These applications also address several different problem settings, including unsupervised, lightly supervised, and semi-supervised learning, and utilize both generative and discriminative models. The diversity of tasks, types of prior knowledge, and problem settings explored demonstrates the generality of these approaches, and suggests that they will become an important tool for researchers in natural language processing.

The tutorial will provide the audience with the theoretical background to understand why these methods have been so effective, as well as practical guidance on how to apply them. Specifically, we discuss issues that come up in implementation, and describe a toolkit that provides "out-of-the-box" support for the applications described in the tutorial and is extensible to other applications and new types of prior knowledge.
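For concreteness, the frameworks above share a common schematic form: side information is encoded as (approximate) constraints on expectations of features of the output variables, and learning trades off data likelihood against satisfying those constraints. In posterior regularization, for instance, the objective is roughly of the following form (generic notation, not the exact formulation covered in the tutorial), where \mathcal{L}(\theta) is the marginal log-likelihood, \phi(X, Z) are constraint features over inputs X and output variables Z, and the bounds b encode the prior knowledge:

% Schematic sketch of a posterior-regularization-style objective (generic notation).
J(\theta) \;=\; \mathcal{L}(\theta) \;-\; \min_{q \in Q} \mathrm{KL}\big( q(Z) \,\big\|\, p_\theta(Z \mid X) \big),
\qquad Q \;=\; \{\, q \;:\; \mathbb{E}_{q}[\phi(X, Z)] \le b \,\}

Generalized expectation criteria similarly score the model's own expectations of such features against target values, and learning from measurements treats noisy observations of these expectations as evidence in a Bayesian model; the tutorial covers the precise formulations and the connections between them.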
OUTLINE:

Introduction (30 minutes):
- Introduction to different types of prior knowledge about NLP problems
- Limitations of previous methods for incorporating prior knowledge, including Bayesian and heuristic approaches
- Motivation for constraining the output variables directly
- Examples and demonstrations of the potential of this approach

Recent Frameworks for Learning with Prior Knowledge (45 minutes):
- Brief theoretical overview of and discussion of connections between:
  - Learning from Measurements (University of California, Berkeley)
  - Generalized Expectation (University of Massachusetts, Amherst)
  - Posterior Regularization (University of Pennsylvania)
  - Constraint-Driven Learning (University of Illinois, Urbana-Champaign)

Coffee Break (15 minutes)

Applications (65 minutes):
- Unstructured problems:
  - Document Classification: labeled features, multi-view learning
- Sequence problems:
  - Information Extraction: labeled features, multi-view learning, long-range dependencies
  - Word Alignment: bijectivity, symmetry
  - POS Tagging: posterior sparsity
- Tree problems:
  - Dependency Parsing: linguistic knowledge, noisy labels, posterior sparsity

Implementation (20 minutes):
- Guidance on implementation
- Description and walk-through of existing software packages

Closing Remarks and Discussion (5 minutes)

PRESENTER BIOS:

Joao Graca
joao.graca@l2f.inesc-id.pt
http://www.cis.upenn.edu/~graca/

Joao Graca is a postdoctoral researcher at the University of Pennsylvania. He obtained his PhD in Computer Science Engineering at Instituto Superior Tecnico, Technical University of Lisbon, where he was advised jointly by Luisa Coheur, Fernando Pereira, and Ben Taskar. His main research interests are machine learning and natural language processing. His current research focuses on unsupervised learning with high-level supervision in the form of domain-specific prior knowledge, and on the utility of unsupervised methods for real-world applications.

Gregory Druck
gdruck@cs.umass.edu
http://www.cs.umass.edu/~gdruck/

Gregory Druck is a final-year PhD student in Computer Science at the University of Massachusetts Amherst, advised by Andrew McCallum. His research interests include semi-supervised and active machine learning for natural language processing and information extraction. His dissertation focuses on leveraging prior knowledge to reduce annotation effort.

Kuzman Ganchev
kuzman@google.com
http://www.seas.upenn.edu/~kuzman/

Kuzman Ganchev is a research scientist at Google Inc. He obtained his PhD in Computer and Information Science at the University of Pennsylvania, where he was jointly advised by Fernando Pereira and Ben Taskar. His research interests are in machine learning applied to natural language processing, in particular the use of partial supervision to guide learning. He has worked on problems in biomedical information extraction, machine translation, unsupervised and supervised dependency parsing, semi-supervised learning for NLP, and computational finance.


