

Tutorial 1: Bayesian Nonparametric Structured Models

Percy Liang and Dan Klein


Probabilistic modeling is a dominant approach for both supervised and unsupervised learning tasks in NLP. One constant challenge for models with latent variables is determining the appropriate model complexity, i.e. the question of "how many clusters." While cross-validation can be used to select between a limited number of options, it cannot be feasibly applied in the context of larger hierarchical models where we must balance complexity in many parts of the model at the same time. Nonparametric "infinite" priors such as Dirichlet processes are powerful tools from the Bayesian statistics literature which address exactly this issue. Such priors, which have seen increasing use in recent NLP work, allow the complexity of the model to adapt to the data and admit more tractable and elegant inference methods than traditional model selection approaches.
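
For readers new to these priors, here is an illustrative formulation (added for reference; it is not part of the tutorial abstract): a Dirichlet process mixture model with concentration parameter α and base distribution G_0 can be written as

    \[
      G \sim \mathrm{DP}(\alpha, G_0), \qquad
      \theta_i \mid G \sim G, \qquad
      x_i \mid \theta_i \sim F(\theta_i), \qquad i = 1, \dots, n,
    \]

where the number of distinct values among the θ_i, and hence the number of clusters, is not fixed in advance but grows with the amount of data.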

In explaining how to do inference in these new models, we try to dispel two myths: first, that Bayesian methods are too slow and cumbersome, and second, that Bayesian techniques require a whole new set of algorithmic ideas. We depart from the traditional sampling methodology which has dominated past expositions and focus on variational inference, an efficient technique which is a natural extension of EM. This approach allows us to tackle structured models such as HMMs and PCFGs with the benefits of Bayesian nonparametrics while maintaining much of the existing EM machinery so familiar to this community. In addition to our foundational presentation, we both discuss concrete implementation issues and demonstrate the empirical advantages of these methods.
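
To make the connection to EM concrete (an illustrative sketch, not taken from the tutorial text): for a multinomial parameter with a Dirichlet prior with pseudocounts α_k, mean-field variational Bayes leaves the E-step dynamic programs (forward-backward for HMMs, inside-outside for PCFGs) intact and only changes how the expected counts c_k are turned into parameter weights:

    \[
      \text{EM:}\;\; \hat{\theta}_k = \frac{c_k}{\sum_{k'} c_{k'}}
      \qquad\longrightarrow\qquad
      \text{VB:}\;\; \tilde{\theta}_k =
      \frac{\exp\{\psi(\alpha_k + c_k)\}}{\exp\{\psi(\sum_{k'} (\alpha_{k'} + c_{k'}))\}},
    \]

where ψ is the digamma function; the VB weights are then used in the next E-step exactly where EM would use the maximum-likelihood estimates.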

Tutorial Outline

  1. Bayesian priors
    • Properties of Dirichlet priors
    • Marginalization and sampling
    • Variational inference: from Viterbi EM to EM to variational Bayes
  2. Dirichlet processes
    • Limit of finite mixture models
    • Stick-breaking construction (see the code sketch after this outline)
    • Chinese restaurant process
    • Properties of DP: decaying cluster sizes, etc.
    • Inference: variational inference, why EM won't work, sampling
  3. Structured models
    • Latent Dirichlet allocation
    • Word alignment models
    • Hidden Markov models
    • Probabilistic context-free grammars
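
The following short Python sketch illustrates two of the constructions listed in the outline: a truncated stick-breaking draw of DP mixture weights and a Chinese restaurant process sampler. It is an illustration added here, not part of the tutorial materials, and the function names and parameters are our own choices.

    import numpy as np

    def stick_breaking_weights(alpha, truncation, rng=None):
        """Truncated stick-breaking construction of DP mixture weights.

        Draw beta_k ~ Beta(1, alpha) and set pi_k = beta_k * prod_{j<k} (1 - beta_j);
        the last weight absorbs whatever stick mass is left at the truncation level.
        """
        rng = rng if rng is not None else np.random.default_rng()
        betas = rng.beta(1.0, alpha, size=truncation)
        remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
        weights = betas * remaining
        weights[-1] = 1.0 - weights[:-1].sum()  # absorb leftover stick mass
        return weights

    def chinese_restaurant_process(alpha, n, rng=None):
        """Sample a seating arrangement of n customers from the CRP.

        Each customer joins an existing table with probability proportional to
        its current size, or opens a new table with probability proportional to alpha.
        """
        rng = rng if rng is not None else np.random.default_rng()
        table_sizes = []
        assignments = []
        for _ in range(n):
            probs = np.array(table_sizes + [alpha], dtype=float)
            probs /= probs.sum()
            table = rng.choice(len(probs), p=probs)
            if table == len(table_sizes):
                table_sizes.append(1)       # open a new table
            else:
                table_sizes[table] += 1
            assignments.append(table)
        return assignments

For example, chinese_restaurant_process(1.0, 1000) typically yields only a handful of tables (on the order of log 1000 ≈ 7), with a few large tables and many small ones, which is the "decaying cluster sizes" property mentioned in the outline.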

Slides are available online as a PDF document.


Percy Liang is a Ph.D. student in computer science at UC Berkeley. He has a BS in math and a BS/MS in computer science from MIT. His research interests include probabilistic modeling for semi-supervised learning in NLP, especially using Bayesian nonparametrics, and approximate inference algorithms for such models. He holds an NSF Graduate Fellowship and a National Defense Science and Engineering Graduate Fellowship.

Dan Klein is an assistant professor of computer science at the University of California, Berkeley (PhD Stanford, MS Oxford, BA Cornell). Professor Klein's research focuses on statistical natural language processing, including unsupervised methods, syntactic parsing, and machine translation. His academic honors include a British Marshall Fellowship, an inaugural Microsoft New Faculty Fellowship, and best paper awards at the ACL, NAACL, and EMNLP conferences.



Organized by:

ACL and UFAL


