TITLE

Deep Learning for NLP (without Magic)

PRESENTERS

Richard Socher, Yoshua Bengio and Christopher D. Manning

ABSTRACT

Machine learning is everywhere in today's NLP, but by and large machine learning amounts to numerical optimization of weights for human-designed representations and features. The goal of deep learning is to explore how computers can take advantage of data to develop features and representations appropriate for complex interpretation tasks. This tutorial aims to cover the basic motivation, ideas, models and learning algorithms in deep learning for natural language processing. Recently, these methods have been shown to perform very well on various NLP tasks such as language modeling, POS tagging, named entity recognition, sentiment analysis and paraphrase detection, among others. The most attractive quality of these techniques is that they can perform well without any external hand-designed resources or time-intensive feature engineering. Despite these advantages, many researchers in NLP are not familiar with these methods. Our focus is on insight and understanding, using graphical illustrations and simple, intuitive derivations. The goal of the tutorial is to make the inner workings of these techniques transparent and intuitive, and their results interpretable, rather than black boxes labeled "magic here".

The first part of the tutorial presents the basics of neural networks, neural word vectors, several simple models based on local windows, and the math and algorithms of training via backpropagation. Applications in this part include language modeling and POS tagging.

In the second part we present recursive neural networks, which can learn structured tree outputs as well as vector representations for phrases and sentences. We cover both the equations and their applications, and show how training can be achieved by a modified version of the backpropagation algorithm introduced before; these modifications allow the algorithm to work on tree structures. Applications include sentiment analysis and paraphrase detection. We also draw connections to recent work on semantic compositionality in vector spaces. The principal goal, again, is to make these methods appear intuitive and interpretable rather than mathematically confusing. By this point in the tutorial, the audience members should have a clear understanding of how to build a deep learning system for word-, sentence- and document-level tasks.

The last part of the tutorial gives a general overview of the different applications of deep learning in NLP, including bag-of-words models. We close with a discussion of NLP-oriented issues in modeling, interpretation, representational power, and optimization.

OUTLINE

  PART I: The Basics
    • Motivation
    • From logistic regression to neural networks
    • Theory: Backpropagation training (see the sketch after this part of the outline)
    • Applications: Word vector learning, POS, NER
    • Unsupervised pre-training, multi-task learning, and learning relations
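
  A minimal sketch of the Part I material, assuming plain Python/NumPy and made-up sizes (vocabulary, window width, layer widths): a window-based neural network tagger whose word vectors and weights are all trained by backpropagation. It is illustrative only, not the presenters' reference code.

    # Window-based neural network tagger trained by backpropagation (sketch).
    import numpy as np

    rng = np.random.RandomState(0)
    V, d, win, H, C = 100, 50, 3, 64, 5        # vocab, vector dim, window, hidden, classes
    L  = 0.01 * rng.randn(V, d)                # word-vector lookup table (also learned)
    W1 = 0.01 * rng.randn(H, win * d); b1 = np.zeros(H)
    W2 = 0.01 * rng.randn(C, H);       b2 = np.zeros(C)

    def forward(window_ids):
        x = L[window_ids].reshape(-1)          # concatenate the window's word vectors
        h = np.tanh(W1 @ x + b1)               # hidden layer
        s = W2 @ h + b2                        # unnormalized tag scores
        p = np.exp(s - s.max()); p /= p.sum()  # softmax over tags
        return x, h, p

    def train_step(window_ids, y, lr=0.1):
        global W1, b1, W2, b2
        x, h, p = forward(window_ids)
        ds = p.copy(); ds[y] -= 1.0            # d(cross-entropy)/d(scores)
        dW2, db2 = np.outer(ds, h), ds
        dh = W2.T @ ds
        dz = dh * (1.0 - h ** 2)               # backprop through tanh
        dW1, db1 = np.outer(dz, x), dz
        dx = W1.T @ dz                         # gradient also reaches the word vectors
        W2 -= lr * dW2; b2 -= lr * db2
        W1 -= lr * dW1; b1 -= lr * db1
        L[window_ids] -= lr * dx.reshape(win, d)
        return -np.log(p[y])

    # Toy usage: repeat one (window, tag) example and watch the loss shrink.
    for step in range(50):
        loss = train_step(np.array([3, 17, 42]), y=2)
    print("final loss on the toy example:", loss)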

  PART II: Recursive Neural Networks
    • Motivation
    • Definition of RNNs
    • Theory: Backpropagation through structure (see the sketch after this part of the outline)
    • Applications: Sentiment analysis, paraphrase detection, relation classification
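
  A minimal sketch of the Part II composition step, again with illustrative dimensions and toy word vectors rather than anything from the tutorial: each parent vector in the tree is computed from its two children with one shared weight matrix, and a softmax on any node vector can predict a label such as sentiment.

    # Recursive neural network composition over a toy binary tree (sketch).
    import numpy as np

    rng = np.random.RandomState(0)
    d, C = 10, 2                                       # vector size, sentiment classes
    W,  b  = 0.01 * rng.randn(d, 2 * d), np.zeros(d)   # shared composition parameters
    Ws, bs = 0.01 * rng.randn(C, d),     np.zeros(C)   # softmax classifier on node vectors

    def compose(c1, c2):
        # Parent vector p = tanh(W [c1; c2] + b), the core recursive step.
        return np.tanh(W @ np.concatenate([c1, c2]) + b)

    def predict(p):
        s = Ws @ p + bs
        e = np.exp(s - s.max())
        return e / e.sum()

    # Toy tree for "not (very good)": leaves are (randomly initialized) word vectors.
    not_, very, good = (0.1 * rng.randn(d) for _ in range(3))
    phrase = compose(very, good)     # vector for "very good"
    root   = compose(not_, phrase)   # vector for "not very good"
    print("root sentiment distribution:", predict(root))

    # Backpropagation through structure trains W, b, Ws, bs: each node's softmax
    # error is propagated down the tree, split among the children through W, and
    # the gradients for the shared parameters are summed over all nodes.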

  PART III: Applications and Discussion
    • Overview of various NLP applications
    • Efficient reconstruction or prediction of high-dimensional sparse vectors (see the sketch after this part of the outline)
    • Discussion of future directions, advantages and limitations
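
  As a pointer to the reconstruction item above, here is a small illustrative sketch (names and sizes are assumptions, not the tutorial's notation): rather than scoring every vocabulary entry, an autoencoder over bag-of-words vectors can compute its reconstruction loss only on the observed nonzero indices plus a small random sample of the zeros, which keeps the cost far below the full dimensionality.

    # Sampled reconstruction of a high-dimensional sparse vector (sketch).
    import numpy as np

    rng = np.random.RandomState(0)
    V, H, k = 10000, 128, 50                        # vocab size, code size, sampled zeros
    We, be = 0.01 * rng.randn(H, V), np.zeros(H)    # encoder
    Wd, bd = 0.01 * rng.randn(V, H), np.zeros(V)    # decoder (used row-wise)

    def sampled_reconstruction_loss(nonzero_idx, values):
        # Encode using only the nonzero entries of the sparse input.
        h = np.tanh(We[:, nonzero_idx] @ values + be)
        # Reconstruct only the observed indices plus k sampled "negative" zeros
        # (collisions with the nonzero set are ignored in this sketch).
        neg_idx = rng.choice(V, size=k, replace=False)
        idx     = np.concatenate([nonzero_idx, neg_idx])
        target  = np.concatenate([values, np.zeros(k)])
        recon   = Wd[idx] @ h + bd[idx]
        return np.mean((recon - target) ** 2)

    # Toy usage: a document with five word counts out of a 10,000-word vocabulary.
    doc_idx = np.array([12, 700, 1523, 4031, 9998])
    doc_val = np.array([2.0, 1.0, 1.0, 3.0, 1.0])
    print("sampled reconstruction loss:", sampled_reconstruction_loss(doc_idx, doc_val))
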
PRESENTER BIOS

• Richard Socher is a PhD student at Stanford working with Chris Manning and Andrew Ng. His research interests are machine learning for NLP and vision. He is interested in techniques that learn semantic features, capture recursive structure in multiple modalities and perform well across multiple supervised tasks. Most recently he developed several recursive deep learning models for compositionality in vector spaces, parsing, sentiment analysis, paraphrasing and word relation classification. In 2011, he was awarded the Yahoo! Key Scientific Challenges Program Award, the Distinguished Application Paper Award at ICML and a Microsoft Research Fellowship.

Richard Socher, Stanford University,
353 Serra Mall Rm 228, Stanford, CA 94305-9040, USA
richard@socher.org
www.socher.org

• Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at the Université de Montréal, head of the Machine Learning Laboratory (LISA), CIFAR Fellow in the Neural Computation and Adaptive Perception program, Canada Research Chair in Statistical Learning Algorithms, and holder of the NSERC-Ubisoft industrial chair. His main research ambition is to understand principles of learning that yield intelligence. His research is widely cited (over 10,000 citations found by Google Scholar). Yoshua Bengio is currently an action editor for the Journal of Machine Learning Research, an editor for Foundations and Trends in Machine Learning, and a member of the NIPS board, and has been an associate editor for the Machine Learning Journal and the IEEE Transactions on Neural Networks.

Yoshua Bengio
Département d'Informatique et de recherche opérationnelle
Université de Montréal, P.O. Box 6128, Centre-Ville Branch
Montréal (QC), H3C 3J7, Canada
http://www.iro.umontreal.ca/~bengioy/yoshua_en/index.html

• Christopher Manning is an Associate Professor of Computer Science and Linguistics at Stanford University (PhD, Stanford, 1995). Manning has coauthored leading textbooks on statistical approaches to NLP (Manning and Schuetze 1999) and information retrieval (Manning et al. 2008). His recent work concentrates on machine learning and natural language processing, including applications such as statistical parsing and text understanding, joint probabilistic inference, clustering, and deep learning over text and images.

Christopher D. Manning, Stanford University,
353 Serra Mall Rm 158, Stanford, CA 94305-9040, USA
manning@cs.stanford.edu
http://nlp.stanford.edu/~manning/