Introduction to Classification: Likelihoods, Margins, Features, and Kernels
Statistical methods in NLP have exploited a variety of classification
techniques as core building blocks for complex models and pipelines.
In this tutorial, we will survey the basic techniques behind
classification. We first consider the underlying principles, including
maximum likelihood and maximum margin. We then discuss
several core classification technologies: naive Bayes, perceptrons,
logistic regression, and support vector machines. The discussion will
include the key optimization ideas behind their training and the
empirical trade-offs between the various classifiers. Finally, we
consider the extension to kernels and kernelized classification: what
can kernels offer, and what is their cost? The presentation is aimed at
NLP researchers who are new to these methods or who want to understand
more about how these techniques are interconnected.
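As a concrete companion to the techniques surveyed above, here is a minimal sketch (in Python) of a feature-based linear classifier trained with the perceptron rule. The bag-of-words feature template and the toy sentiment examples are assumptions made purely for illustration, not material from the tutorial itself.

```python
# A minimal perceptron sketch over sparse bag-of-words features.
# The feature template and toy sentiment data below are illustrative
# assumptions, not examples from the tutorial.
from collections import defaultdict

def features(text):
    """Map a sentence to a sparse feature vector (feature name -> count)."""
    feats = defaultdict(float)
    for word in text.lower().split():
        feats["word=" + word] += 1.0
    feats["bias"] = 1.0
    return feats

def score(weights, feats):
    """Linear score: dot product of the weight and feature vectors."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def train_perceptron(data, epochs=10):
    """Binary perceptron: on each mistake, move the weights toward the gold label."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in data:                     # label is +1 or -1
            feats = features(text)
            prediction = 1 if score(weights, feats) >= 0 else -1
            if prediction != label:
                for f, v in feats.items():
                    weights[f] += label * v
    return weights

if __name__ == "__main__":
    train = [("a truly great film", +1),
             ("great acting and a great script", +1),
             ("a dull and boring film", -1),
             ("boring plot , awful acting", -1)]
    w = train_perceptron(train)
    print(score(w, features("a great script")) >= 0)   # expect True (+1)
```

The same sparse feature representation and linear scoring function underlie naive Bayes, logistic regression, and support vector machines; what changes between them is the criterion used to set the weights.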
Topics:
- Basics of classification
  - Feature-based representations
  - Linear classifiers
  - Principles of classification: likelihood and margin
  - Smoothing and regularization
  - Structured classification
- Specific techniques
  - Perceptrons
  - Naive Bayes
  - Logistic regression / maximum entropy
  - Support vector machines
  - Comparison and trade-offs
- Kernel methods
  - Why kernels (and why not)?
  - Kernelized linear classifiers
  - Kernelized perceptrons (sketched below)
  - Kernelizing SVMs and logistic regression
  - Advanced kernels and structure
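To make the kernelized perceptron topic above concrete, the sketch below keeps one dual coefficient per training example and scores points through a kernel function instead of an explicit weight vector. The quadratic kernel and the tiny XOR-style dataset are illustrative assumptions chosen only to show a case where kernelization helps: the data are not linearly separable in the raw features but are separable in the implicit degree-2 feature space.

```python
# A minimal kernelized (dual) perceptron sketch. The quadratic kernel and the
# tiny XOR-style dataset are illustrative assumptions, not tutorial material.

def quadratic_kernel(x, y):
    """K(x, y) = (x . y + 1)^2: implicit space of all feature products up to degree 2."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** 2

def kernel_score(alphas, data, kernel, x):
    """Dual score: sum over training points of alpha_i * y_i * K(x_i, x)."""
    return sum(a * y_i * kernel(x_i, x) for a, (x_i, y_i) in zip(alphas, data))

def train_kernel_perceptron(data, kernel, epochs=20):
    """On a mistake on example i, increment its dual coefficient alpha_i."""
    alphas = [0.0] * len(data)
    for _ in range(epochs):
        for i, (x_i, y_i) in enumerate(data):
            prediction = 1 if kernel_score(alphas, data, kernel, x_i) >= 0 else -1
            if prediction != y_i:
                alphas[i] += 1.0
    return alphas

if __name__ == "__main__":
    # XOR-style labels: not linearly separable in the two raw features,
    # but separable in the implicit quadratic feature space.
    data = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), +1), ((1.0, 0.0), +1)]
    alphas = train_kernel_perceptron(data, quadratic_kernel)
    for x, y in data:                 # all four points end up classified correctly
        s = kernel_score(alphas, data, quadratic_kernel, x)
        print(x, "gold:", y, "predicted:", +1 if s >= 0 else -1)
```

The cost side of the trade-off raised in the abstract is also visible here: scoring a single point touches every stored training example, whereas the primal perceptron above touches only the features active in that example.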