Much of NLP tries to map structured input (sentences) to some form of structured output (tag sequences, parse trees, semantic graphs, or translated/paraphrased/compressed sentences). Thus structured prediction and its learning algorithms are of central importance to us NLP researchers. However, when applying traditional machine learning to structured domains, we often face scalability issues for two reasons:
1. Exact structured inference (such as parsing and translation) is too slow for repeated use on the training data, while approximate search (such as beam search) unfortunately breaks the nice theoretical properties (such as convergence guarantees) of existing machine learning algorithms.
2. Even with inexact search, the scale of the training data in NLP still makes pure online learning (such as the perceptron and MIRA) too slow on a single CPU.
This tutorial reviews recent advances that address these two challenges. In particular, we will cover principled machine learning methods designed to work under vastly inexact search, and parallelization algorithms that speed up learning on multiple CPUs. We will also extend structured learning to the latent-variable setting, where, in many NLP applications such as translation, the gold-standard derivation is hidden.
Contents:
1. Overview of Structured Learning
* key challenge 1: search efficiency
* key challenge 2: interactions between search and learning
2. Structured Perceptron
* the basic algorithm
* convergence proof -- a geometric approach (updated in 2015)
* voted perceptron, averaged perceptron, and efficient implementation (see the sketch after this list)
* applications in tagging, parsing, etc.
* inseparability and generalization bounds (new in 2015)
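To make the basic algorithm concrete, here is a minimal Python sketch of the averaged structured perceptron. The feature map phi(x, y) (returning a sparse dict) and the exact decoder argmax_y(w, x) are hypothetical placeholders, not code from the tutorial, and the averaging shown is the straightforward version rather than the more efficient lazy-update trick discussed in the tutorial.

```python
from collections import defaultdict

def averaged_perceptron(train, phi, argmax_y, epochs=5):
    """Averaged structured perceptron (naive averaging, for clarity)."""
    w = defaultdict(float)        # current weight vector (sparse)
    w_sum = defaultdict(float)    # running sum of weight vectors, for averaging
    n = 0                         # number of examples seen
    for _ in range(epochs):
        for x, y_gold in train:
            y_hat = argmax_y(w, x)                    # exact inference, e.g. Viterbi
            if y_hat != y_gold:                       # mistake-driven update
                for f, v in phi(x, y_gold).items():
                    w[f] += v
                for f, v in phi(x, y_hat).items():
                    w[f] -= v
            for f, v in w.items():                    # accumulate current weights
                w_sum[f] += v                         # (the tutorial covers a lazier,
            n += 1                                    #  more efficient implementation)
    return {f: v / n for f, v in w_sum.items()}       # averaged model
```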
3. Structured Perceptron under Inexact Search
* convergence theory breaks under inexact search
* early update (see the sketch after this list)
* violation-fixing perceptron
* applications in tagging, parsing, etc.
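As a rough illustration of early update, the sketch below runs beam search and, as soon as the gold prefix falls off the beam, updates against the best partial hypothesis and stops decoding that example. The helpers init_beam, beam_step, and gold_prefix, and the use of phi on partial outputs, are hypothetical stand-ins for a task-specific incremental decoder; this is a sketch of the idea, not a reference implementation.

```python
from collections import defaultdict

def early_update_epoch(train, phi, init_beam, beam_step, gold_prefix, w=None):
    """One epoch of the structured perceptron with early update under beam search."""
    w = w if w is not None else defaultdict(float)
    for x, y_gold in train:
        beam = init_beam(x)                       # partial hypotheses, best first
        for t in range(1, len(y_gold) + 1):
            beam = beam_step(w, beam, x, t)       # expand and prune one step
            gold_t = gold_prefix(y_gold, t)       # gold partial output of length t
            if gold_t not in beam:                # gold pruned: a guaranteed violation
                best = beam[0]
                for f, v in phi(x, gold_t).items():
                    w[f] += v                     # update toward the gold prefix ...
                for f, v in phi(x, best).items():
                    w[f] -= v                     # ... away from the best wrong prefix
                break                             # "early update": stop this example
        else:                                     # search finished without early update
            best = beam[0]
            if best != y_gold:
                for f, v in phi(x, y_gold).items():
                    w[f] += v
                for f, v in phi(x, best).items():
                    w[f] -= v
    return w
```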
break
4. Large-Margin Structured Learning with Latent Variables
* examples: machine translation, semantic parsing, sequence labeling
* separability condition and convergence proof (updated in 2015)
* latent-variable perceptron under inexact search (see the sketch after this list)
* applications in machine translation
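A hedged sketch of the latent-variable perceptron: since the gold derivation is hidden, the update contrasts the highest-scoring derivation that reaches the gold output (found by constrained, "forced" decoding) with the model's best derivation overall. The decoders best_gold_derivation and best_derivation and the helper output_of are hypothetical placeholders for a system-specific decoder such as an MT decoder.

```python
from collections import defaultdict

def latent_perceptron_epoch(train, phi, best_derivation, best_gold_derivation,
                            output_of, w=None):
    """One epoch of the latent-variable structured perceptron."""
    w = w if w is not None else defaultdict(float)
    for x, y_gold in train:
        d_plus = best_gold_derivation(w, x, y_gold)   # best derivation yielding the gold
        d_hat = best_derivation(w, x)                 # model's best derivation overall
        if output_of(d_hat) != y_gold:                # mistake on the visible output
            for f, v in phi(x, d_plus).items():
                w[f] += v
            for f, v in phi(x, d_hat).items():
                w[f] -= v
    return w
```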
5. Parallelizing Large-Margin Structured Learning
* iterative parameter mixing (IPM); see the sketch after this list
* minibatch perceptron and MIRA
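A minimal sketch of iterative parameter mixing: the training data is split into shards, each shard is trained for one epoch in parallel starting from the current mixed weights, and the resulting weight vectors are averaged before the next iteration. The single-shard trainer perceptron_epoch(shard, weights) is a hypothetical top-level (picklable) function, and the multiprocessing setup is illustrative, not the tutorial's code.

```python
from collections import defaultdict
from multiprocessing import Pool

def ipm(shards, perceptron_epoch, iterations=10):
    """Iterative parameter mixing with uniform (1/k) mixing weights."""
    w = {}                                            # start from the zero vector
    for _ in range(iterations):
        with Pool(len(shards)) as pool:
            # one perceptron epoch per shard, in parallel, each from the mixed weights
            results = pool.starmap(perceptron_epoch,
                                   [(shard, dict(w)) for shard in shards])
        mixed = defaultdict(float)                    # mix: average the shard weights
        for w_i in results:
            for f, v in w_i.items():
                mixed[f] += v / len(results)
        w = mixed
    return dict(w)
```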
Instructor Bios:
Liang Huang is an Assistant Professor at the City University of New York (CUNY). He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. His work is mainly on the theoretical aspects (algorithms and formalisms) of computational linguistics, as well as the theory and algorithms of structured learning. He has received a Best Paper Award at ACL 2008, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), and a University Graduate Teaching Prize at Penn (2005). He has given three tutorials, at COLING 2008, NAACL 2009, and ACL 2014.
James Cross is a Ph.D. candidate at the City University of New York (CUNY) Graduate Center, working under the direction of Liang Huang in the area of natural language processing. He has undergraduate degrees in computer science and French from Auburn University, a law degree from New York University, and a master’s degree in computer science from the City College of New York. He was a summer intern at the IBM T.J. Watson Research Center in 2014 and is a summer research intern at Facebook in 2015.