The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Held at the Portland Marriott Downtown Waterfront in Portland, Oregon, USA, June 19-24, 2011
Rich Prior Knowledge in Learning for Natural Language Processing
PRESENTERS: Gregory Druck, Kuzman Ganchev, Joao Graca
ABSTRACT:
We possess a wealth of prior knowledge about most prediction problems,
and particularly so for many of the fundamental tasks in natural
language processing. Unfortunately, it is often difficult to make
use of this type of information during learning, as it typically does
not come in the form of labeled examples, may be difficult to encode
as a prior on parameters in a Bayesian setting, and may be impossible
to incorporate into a tractable model. Instead, we usually have prior
knowledge about the values of output variables. For example, linguistic
knowledge or an out-of-domain parser may provide the locations of
likely syntactic dependencies for grammar induction. Motivated by
the prospect of being able to naturally leverage such knowledge, four
different groups have recently developed similar, general frameworks
for expressing and learning with side information about output variables.
These frameworks are Constraint-Driven Learning (UIUC), Posterior
Regularization (UPenn), Generalized Expectation Criteria (UMass Amherst),
and Learning from Measurements (UC Berkeley).
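All four can be viewed as replacing a prior on model parameters with
constraints or penalties on expectations of the output variables under the
model's posterior distribution. As a rough illustration (a schematic form,
not the specific notation of any one paper), the posterior regularization
objective can be written

  \max_\theta \; \mathcal{L}(\theta) \;-\; \min_{q \in \mathcal{Q}} \mathrm{KL}\bigl(q(\mathbf{y}) \,\|\, p_\theta(\mathbf{y} \mid \mathbf{x})\bigr),
  \qquad \mathcal{Q} = \{\, q : \mathbb{E}_q[\phi(\mathbf{x}, \mathbf{y})] \le \mathbf{b} \,\},

where \mathcal{L}(\theta) is the data likelihood, \phi are constraint
features encoding the side information (for instance, "most tokens should
have a nearby syntactic parent" in grammar induction), and \mathbf{b} are
user-specified bounds.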
This tutorial describes how to encode side information about output
variables, and how to leverage this encoding and an unannotated
corpus during learning. We survey the different frameworks, explaining
how they are connected and the trade-offs between them. We also survey
several applications that have been explored in the literature,
including applications to grammar and part-of-speech induction, word
alignment, information extraction, text classification, and multi-view
learning. Prior knowledge used in these applications ranges from
structural information that cannot be efficiently encoded in the model,
to knowledge about the approximate expectations of some features, to
knowledge of some incomplete and noisy labelings. These applications
also address several different problem settings, including unsupervised,
lightly supervised, and semi-supervised learning, and utilize both
generative and discriminative models. The diversity of tasks, types of
prior knowledge, and problem settings explored demonstrate the generality
of these approaches, and suggest that they will become an important tool
for researchers in natural language processing.
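When the knowledge takes the form of approximate feature expectations, as in
the generalized expectation and measurements frameworks, the objective
schematically adds a penalty on the divergence between target and model
expectations (again only a sketch, with \mathcal{U} denoting the unannotated
corpus):

  O(\theta) = \sum_{(x,y) \in \mathcal{D}} \log p_\theta(y \mid x)
              \;-\; \lambda \, \Delta\bigl(\tilde{\mathbf{f}}, \; \mathbb{E}_{x \sim \mathcal{U}}\, \mathbb{E}_{p_\theta(y \mid x)}[\mathbf{f}(x, y)]\bigr),

where \mathcal{D} is a (possibly empty) labeled set, \tilde{\mathbf{f}} are
the target expectations supplied as prior knowledge (e.g., "documents
containing 'touchdown' are usually labeled sports"), and \Delta is a
divergence such as squared error or KL.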
The tutorial will provide the audience with the theoretical background to
understand why these methods have been so effective, as well as practical
guidance on how to apply them. Specifically, we discuss issues that come
up in implementation, and describe a toolkit that provides "out-of-the-box"
support for the applications described in the tutorial, and is extensible
to other applications and new types of prior knowledge.
OUTLINE:
Introduction (30 minutes):
- Introduction to different types of prior knowledge about NLP problems
- Limitations of previous methods for incorporating prior knowledge,
including Bayesian and heuristic approaches
- Motivation for constraining the output variables directly
- Examples and demonstrations of the potential of this approach
Recent Frameworks for Learning with Prior Knowledge (45 minutes):
- Brief theoretical overview of and discussion of connections between:
- Learning from Measurements (University of California, Berkeley)
- Generalized Expectation (University of Massachusetts, Amherst)
- Posterior Regularization (University of Pennsylvania)
- Constraint-Driven Learning (University of Illinois, Urbana-Champaign)
Coffee Break (15 minutes)
Applications (65 minutes):
- Unstructured problems:
- Document Classification: labeled features, multi-view learning
- Sequence problems:
- Information Extraction: labeled features, multi-view learning, long-range
dependencies
- Word Alignment: bijectivity, symmetry
- POS Tagging: posterior sparsity
- Tree problems:
- Dependency Parsing: linguistic knowledge, noisy labels, posterior sparsity
Implementation (20 minutes):
- Guidance on implementation
- Description and walk-through of existing software packages
Closing Remarks and Discussion (5 minutes)
PRESENTER BIOS:
Joao Graca
joao.graca@l2f.inesc-id.pt
http://www.cis.upenn.edu/~graca/
Joao Graca is a postdoctoral researcher at the University of Pennsylvania.
He obtained his PhD in Computer Science Engineering at Instituto Superior
Tecnico, Technical University of Lisbon, where he was advised jointly by
Luisa Coheur, Fernando Pereira, and Ben Taskar. His main research interests
are Machine Learning and Natural Language Processing. His current research
focuses on unsupervised learning with high-level supervision in the form of
domain-specific prior knowledge, and on the utility of unsupervised methods
for real-world applications.
Gregory Druck
gdruck@cs.umass.edu
http://www.cs.umass.edu/~gdruck/
Gregory Druck is a final year PhD student in Computer Science at the
University of Massachusetts Amherst, advised by Andrew McCallum. His
research interests include semi-supervised and active machine learning
for natural language processing and information extraction. His
dissertation focuses on leveraging prior knowledge to reduce annotation
effort.
Kuzman Ganchev
kuzman@google.com
http://www.seas.upenn.edu/~kuzman/
Kuzman Ganchev is a research scientist at Google Inc. He obtained his PhD
in Computer and Information Science at the University of Pennsylvania,
where he was jointly advised by Fernando Pereira and Ben Taskar. His research
interests are in machine learning applied to natural language processing,
and in particular to the use of partial supervision to guide learning. He
has worked on problems in biomedical information extraction, machine
translation, unsupervised and supervised dependency parsing, semi-supervised
learning for NLP and computational finance.