Introduction to Classification: Likelihoods, Margins, Features, and Kernels


Statistical methods in NLP have exploited a variety of classification techniques as core building blocks for complex models and pipelines. In this tutorial, we will survey the basic techniques behind classification. We first consider the underlying principles, including maximum likelihood and maximum margin. We then discuss several core classification technologies: naive Bayes, perceptrons, logistic regression, and support vector machines. The discussion will include the key optimization ideas behind their training and the empirical trade-offs among the various classifiers. Finally, we consider the extension to kernels and kernelized classification: what can kernels offer, and what is their cost? The presentation is aimed at NLP researchers who are new to these methods or who want to understand better how these techniques are interconnected.
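
To make the feature-based view of linear classifiers concrete before the outline, here is a minimal sketch (in Python, not taken from the tutorial materials) of a multiclass perceptron over sparse feature dictionaries. The toy feature function, the sentiment-style labels, and the tiny data set are illustrative assumptions.

    from collections import defaultdict

    def features(x, y):
        """Toy feature function f(x, y): conjoin each token of the input with
        the label. (Illustrative only; real systems use richer templates.)"""
        return {(y, tok): 1.0 for tok in x.split()}

    def score(w, x, y):
        """Linear score w . f(x, y)."""
        return sum(w.get(feat, 0.0) * val for feat, val in features(x, y).items())

    def predict(w, x, labels):
        """Classify by picking the highest-scoring label."""
        return max(labels, key=lambda y: score(w, x, y))

    def train_perceptron(data, labels, epochs=5):
        """Multiclass perceptron: on each mistake, move the weights toward the
        gold label's features and away from the predicted label's features."""
        w = defaultdict(float)
        for _ in range(epochs):
            for x, y_gold in data:
                y_hat = predict(w, x, labels)
                if y_hat != y_gold:
                    for feat, val in features(x, y_gold).items():
                        w[feat] += val
                    for feat, val in features(x, y_hat).items():
                        w[feat] -= val
        return w

    # Tiny made-up data set, just to show the calling convention.
    data = [("good great fun", "POS"), ("bad awful boring", "NEG")]
    w = train_perceptron(data, labels=["POS", "NEG"])
    print(predict(w, "awful boring", ["POS", "NEG"]))  # NEG

The later techniques in the outline differ mainly in how w is chosen, not in this w . f(x, y) scoring form.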
Topics:
  1. Basics of classification
    1. Feature-based representations
    2. Linear classifiers
    3. Principles of classification: likelihood and margin (contrasted in the first sketch after this outline)
    4. Smoothing and regularization
    5. Structured classification
  2. Specific techniques
    1. Perceptrons
    2. Naive Bayes
    3. Logistic regression / maximum entropy
    4. Support vector machines
    5. Comparison and trade-offs
  3. Kernel methods
    1. Why kernels (and why not)?
    2. Kernelized linear classifiers
    3. Kernelized perceptrons (see the second sketch after this outline)
    4. Kernelizing SVMs and logistic regression
    5. Advanced kernels and structure
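
Pointing back to topics 1.3 and 2.5, the following sketch (illustrative Python with plain per-example updates and no regularization, which are simplifying assumptions, not the tutorial's own code) contrasts the two training principles for a binary linear classifier: the likelihood view updates on every example in proportion to the model's remaining doubt, while the margin view updates only when an example falls inside the margin.

    import math

    def logistic_update(w, x, y, lr=0.1):
        """Likelihood principle (logistic regression / maxent): gradient step
        on the log loss log(1 + exp(-y * w.x)). Every example moves the
        weights, scaled by how unsure the model still is."""
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        gradient_scale = 1.0 / (1.0 + math.exp(margin))   # sigma(-margin)
        return [wi + lr * gradient_scale * y * xi for wi, xi in zip(w, x)]

    def hinge_update(w, x, y, lr=0.1):
        """Margin principle (SVM-style): subgradient step on the hinge loss
        max(0, 1 - y * w.x). Only margin violations move the weights."""
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:
            return [wi + lr * y * xi for wi, xi in zip(w, x)]
        return list(w)

    # One update from zero weights on the same example, with y in {-1, +1}.
    w0 = [0.0, 0.0]
    print(logistic_update(w0, x=[1.0, 2.0], y=+1))  # half-strength step: [0.05, 0.1]
    print(hinge_update(w0, x=[1.0, 2.0], y=+1))     # full step:          [0.1, 0.2]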
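And for topic 3.3, here is a minimal kernelized (dual) perceptron sketch under the same caveats: the quadratic kernel and the XOR-style toy data are assumptions chosen so that a kernel is actually needed. Instead of an explicit weight vector, the learner keeps a mistake count alpha_i per training example and scores new points purely through kernel evaluations.

    def quadratic_kernel(x1, x2):
        """K(x1, x2) = (1 + x1 . x2)^2: equivalent to a feature map over all
        degree-<=2 conjunctions, computed without enumerating them."""
        dot = sum(a * b for a, b in zip(x1, x2))
        return (1.0 + dot) ** 2

    def train_kernel_perceptron(data, kernel, epochs=10):
        """Dual perceptron for labels y in {-1, +1}. The weight vector is kept
        implicitly as w = sum_i alpha_i * y_i * phi(x_i)."""
        alpha = [0] * len(data)
        for _ in range(epochs):
            for i, (x_i, y_i) in enumerate(data):
                s = sum(alpha[j] * y_j * kernel(x_j, x_i)
                        for j, (x_j, y_j) in enumerate(data))
                if y_i * s <= 0:       # mistake: remember this example
                    alpha[i] += 1
        return alpha

    def kernel_predict(alpha, data, kernel, x):
        s = sum(a * y_j * kernel(x_j, x) for a, (x_j, y_j) in zip(alpha, data))
        return 1 if s > 0 else -1

    # XOR-style data: not linearly separable in the original two dimensions,
    # but separable under the quadratic kernel.
    data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
    alpha = train_kernel_perceptron(data, quadratic_kernel)
    print([kernel_predict(alpha, data, quadratic_kernel, x) for x, _ in data])
    # -> [-1, -1, 1, 1]: all four points recovered correctly

The cost question raised in the abstract shows up directly here: every prediction now touches every stored training example, which is the price paid for never building the expanded feature space explicitly.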