The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Held at the Portland Marriott Downtown Waterfront in
Portland, Oregon, USA, June 19-24, 2011


Beyond Structured Prediction: Inverse Reinforcement Learning

PRESENTER: Hal Daume III ABSTRACT: Machine learning is all about making predictions; language is full of complex rich structure. Structured prediction marries these two. However, structured prediction isn't always enough: sometimes the world throws even more complex data at us, and we need reinforcement learning techniques. This tutorial is all about the *how* and the *why* of structured prediction and inverse reinforcement learning (aka inverse optimal control): participants should walk away comfortable that they could implement many structured prediction and IRL algorithms, and have a sense of which ones might work for which problems. The first half of the tutorial will cover the "basics" of structured prediction the structured perceptron and Magerman's incremental parsing algorithm. It will then build up to more advanced algorithms that are shockingly reminiscent of these simple approaches: maximum margin techniques and search-based structured prediction. The second half of the tutorial will ask the question: what happens when our standard assumptions about our data are violated? This is what leads us into the world of reinforcement learning (the basics of which we'll cover) and then to inverse reinforcement learning and inverse optimal control. Throughout the tutorial, we will see examples ranging from simple (part of speech tagging, named entity recognition, etc.) through complex (parsing, machine translation). The tutorial does not assume attendees know anything about structured prediction or reinforcement learning (though it will hopefully be interesting even to those who know some!), but *does* assume some knowledge of simple machine learning (eg., binary classification). OUTLINE: PART I (90 minutes): (10 mins) What is structure? What is structured prediction? (20 mins) Refresher on binary classification - What does it mean to learn? - Linear models for classification - Batch versus stochastic optimization (30 mins) From perceptron to structured perceptron - Linear models for structured prediction - The "argmax" problem - From perceptron to margins (30 mins) Search-based structured prediction - Stacking - Training classifiers to make parsing decisions - Searn and generalizations PART II (90 minutes): (25 mins) Refersher on reinforcement learning - Markov decision processes - Q learning (30 mins) Inverse optimal control and A* search - Maximum margin planning - Learning to search - Dagger (25 mins) Apprenticeship learning (10 mins) Open problems INSTRUCTOR: Hal Daume III Department of Computer Science University of Maryland A.V. Williams Building #3227 College Park, MD 20742 me@hal3.name Hal Daume III is an Assistant Professor in Computer Science at the University of Maryland, with a joint appointment in Linguistics. He is primarily interested in the interface between natural language processing, computational linguistics and machine learning. His work in statistical modeling spans multiple aspects of language processing, including structured prediction, Bayesian methods, domain adaptation, and linguistic typology. He associates himself most with conferences like ACL, ICML, NIPS and EMNLP. He earned his PhD at the University of Southern Californian with a thesis on structured prediction for language (his advisor was Daniel Marcu). He spent the summer of 2003 working with Eric Brill in the machine learning and applied statistics group at Microsoft Research. Prior to that, he studied math (mostly logic) at Carnegie Mellon University. PREVIOUS PRESENTATIONS A previous version of this tutorial was given at ACL 2010 in Stockholm, Sweden. The attendance was over 100, making it by far the largest tutorial that year. The first half of these two tutorials is 90\% the same; the second half is only 50\% the same. The current version will differ in several ways from the version that was presented at ACL. Several new techniques (such as "Dagger") will be discussed that make the SP/IRL connection even more obvious, and some new work on unsupervised structured prediction will be added.



acl2011.conference@gmail.com   ♦   Oregon Health & Science University