Introduction to Classification: Likelihoods, Margins, Features, and Kernels
Statistical methods in NLP have exploited a variety of classification
techniques as core building blocks for complex models and pipelines.
In this tutorial, we will survey the basic techniques behind
classification. We first consider the underlying principles, including
maximum likelihood and maximum margin. We then discuss
several core classification technologies: naive Bayes, perceptrons,
logistic regression, and support vector machines. The discussion will
include the key optimization ideas behind their training and the
empirical trade-offs between the various classifiers. Finally, we
consider the extension to kernels and kernelized classification: what
can kernels offer, and what is their cost? The presentation is aimed at
NLP researchers who are new to these methods or who want to understand
more about how these techniques are interconnected.
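As a concrete companion to the techniques surveyed above, here is a minimal sketch (in Python) of a feature-based linear classifier trained with the perceptron rule. The bag-of-words feature template and the toy sentiment examples are assumptions made purely for illustration, not material from the tutorial itself.

```python
# A minimal perceptron sketch over sparse bag-of-words features.
# The feature template and toy sentiment data below are illustrative
# assumptions, not examples from the tutorial.
from collections import defaultdict

def features(text):
    """Map a sentence to a sparse feature vector (feature name -> count)."""
    feats = defaultdict(float)
    for word in text.lower().split():
        feats["word=" + word] += 1.0
    feats["bias"] = 1.0
    return feats

def score(weights, feats):
    """Linear score: dot product of the weight and feature vectors."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_perceptron(data, epochs=10):
    """Binary perceptron: on each mistake, move the weights toward the gold label."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:                     # label is +1 or -1
            feats = features(text)
            prediction = 1 if score(weights, feats) >= 0 else -1
            if prediction != label:
                for f, v in feats.items():
                    weights[f] += label * v
    return weights

if __name__ == "__main__":
    train = [("a truly great film", +1),
             ("great acting and a great script", +1),
             ("a dull and boring film", -1),
             ("boring plot , awful acting", -1)]
    w = train_perceptron(train)
    print(score(w, features("a great script")) >= 0)   # expect True (+1)
```

The same sparse feature representation and linear scoring function underlie naive Bayes, logistic regression, and support vector machines; what changes between them is the criterion used to set the weights.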
Topics:
- Basics of classification
  - Feature-based representations
  - Linear classifiers
  - Principles of classification: likelihood and margin
  - Smoothing and regularization
  - Structured classification
- Specific techniques
  - Perceptrons
  - Naive Bayes
  - Logistic regression / maximum entropy
  - Support vector machines
  - Comparison and trade-offs
- Kernel methods
  - Why kernels (and why not)?
  - Kernelized linear classifiers
  - Kernelized perceptrons (sketched below)
  - Kernelizing SVMs and logistic regression
  - Advanced kernels and structure
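To make the kernelized perceptron topic above concrete, the sketch below keeps one dual coefficient per training example and scores points through a kernel function instead of an explicit weight vector. The quadratic kernel and the tiny XOR-style dataset are illustrative assumptions chosen only to show a case where kernelization helps: the data are not linearly separable in the raw features but are separable in the implicit degree-2 feature space.

```python
# A minimal kernelized (dual) perceptron sketch. The quadratic kernel and the
# tiny XOR-style dataset are illustrative assumptions, not tutorial material.

def quadratic_kernel(x, y):
    """K(x, y) = (x . y + 1)^2: implicit space of all feature products up to degree 2."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** 2

def kernel_score(alphas, data, kernel, x):
    """Dual score: sum over training points of alpha_i * y_i * K(x_i, x)."""
    return sum(a * y_i * kernel(x_i, x) for a, (x_i, y_i) in zip(alphas, data))

def train_kernel_perceptron(data, kernel, epochs=20):
    """On a mistake on example i, increment its dual coefficient alpha_i."""
    alphas = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x_i, y_i) in enumerate(data):
            prediction = 1 if kernel_score(alphas, data, kernel, x_i) >= 0 else -1
            if prediction != y_i:
                alphas[i] += 1.0
    return alphas

if __name__ == "__main__":
    # XOR-style labels: not linearly separable in the two raw features,
    # but separable in the implicit quadratic feature space.
    data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), +1), ((1.0, 0.0), +1)]
    alphas = train_kernel_perceptron(data, quadratic_kernel)
    for x, y in data:                 # all four points end up classified correctly
        s = kernel_score(alphas, data, quadratic_kernel, x)
        print(x, "gold:", y, "predicted:", +1 if s >= 0 else -1)
```

The cost side of the trade-off raised in the abstract is also visible here: scoring a single point touches every stored training example, whereas the primal perceptron above touches only the features active in that example.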