TITLE

State-of-the-Art Kernels for Natural Language Processing

PRESENTER

Alessandro Moschitti

ABSTRACT
In recent years, machine learning (ML) has been used more and more to solve complex tasks in different disciplines, ranging from Data Mining to Information Retrieval and Natural Language Processing (NLP). These tasks often require the processing of structured input: for example, the ability to extract salient features from syntactic/semantic structures is critical to many NLP systems. Mapping such structured data into explicit feature vectors for ML algorithms requires considerable expertise, intuition, and deep knowledge of the target linguistic phenomena. Kernel Methods (KM) are powerful ML tools that can alleviate the data representation problem: they replace feature-based similarities with similarity functions, i.e., kernels, defined directly between training/test instances, e.g., syntactic trees, so explicit feature vectors are no longer needed. Additionally, kernel engineering, i.e., the composition or adaptation of several prototype kernels, facilitates the design of the effective similarities required by new tasks.
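
For concreteness, a kernel computes an inner product between the implicit feature-space representations of two instances, so the mapping phi never needs to be built explicitly; moreover, sums and products of valid (Mercer) kernels are again valid kernels, which is the formal basis of kernel engineering. In generic notation (a standard textbook formulation, not the tutorial's own):

    K(x, z) = \langle \phi(x), \phi(z) \rangle
    K_{sum}(x, z) = K_1(x, z) + K_2(x, z)
    K_{prod}(x, z) = K_1(x, z) \cdot K_2(x, z)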

Unfortunately, at the moment there is no comprehensive documentation describing the engineering techniques above, no application-oriented description, and no clear organization and explanation of the many kernels that have proved successful in NLP. Typically, what can be found is either documentation reporting complex theory, which obscures important practical aspects, or research papers reporting KM applications, which tend to be very specific and to use rather diverse notation and/or background. As a direct consequence, KM technology appears unappealing to most NLP researchers. This is unfortunate, since KM can easily be used to speed up the design of machine learning systems for NLP (especially those exploiting syntactic/semantic structures).

The tutorial aims at addressing the problems above. First, it will introduce essential, simplified theory of Support Vector Machines and KM, with the sole aim of motivating practical procedures and current best practices for designing applications based on effective kernels. For this purpose, it will survey state-of-the-art kernels for diverse NLP applications, reconciling the different approaches within a uniform, global notation and theory. This survey will benefit from practical expertise acquired through direct work on many natural language applications, ranging from Text Categorization to Syntactic/Semantic Parsing. Moreover, practical demonstrations using the SVM-Light-TK toolkit will support the application-oriented perspective of the tutorial. The latter will lead NLP researchers with heterogeneous backgrounds to acquire the KM know-how needed to design any target NLP application.
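
As a taste of the demonstrations, the sketch below shows how a tree-kernel classifier is typically trained and tested with SVM-Light-TK: each example is a label followed by a parenthesized parse tree enclosed in |BT| ... |ET| delimiters, and the -t 5 option selects the tree kernel. File names here are illustrative, and options may vary across versions, so the README of the distribution remains the authoritative reference:

    train.dat (one labeled parse tree per line):
      1 |BT| (S (NP (NNP Mary))(VP (VBD bought)(NP (DT a)(NN cat)))) |ET|
      -1 |BT| (S (NP (DT the)(NN cat))(VP (VBD slept))) |ET|

    svm_learn -t 5 train.dat model
    svm_classify test.dat model predictions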

Finally, the tutorial will present promising new best practices, e.g., recent methods for large-scale learning with structural kernels, structural lexical similarities, and reverse kernel engineering.

OUTLINE
   Motivations (5 min)
   Kernel Machines (20 min)
       - Perceptron
       - Support Vector Machines
       - Kernel Definition (Kernel Trick)
       - Mercer's Conditions
       - Kernel Operators
       - Efficiency Issue: when can we use kernels?
   Basic Kernels and their Feature Spaces (25 min)
       - Linear Kernels
       - Polynomial Kernels
       - Lexical Kernels
       - String and Word Sequence Kernels
        - Tree Kernels: Subtree, Syntactic, Partial Tree Kernels (PTK), and Smoothed PTK (a minimal sketch of the syntactic tree kernel follows this outline)
   NLP applications with single kernels (25 min)
       - Question Classification in Jeopardy!
       - Semantic Role Labeling (SRL): FrameNet and PropBank
       - Relation Extraction: ACE
       - Coreference Resolution
       - Opinion Mining
   Coffee Break (30 min)
   Applied Structural Kernels (20 min)
       - SVM-Light-TK
        - In-class experiments with SRL and QC
       - Inspection of the input, output and model files
   NLP applications with multiple kernels (25 min)
       - Question/Answer Classification
       - Textual Entailment Recognition
        - Reranking Kernels for:
            - Named Entities
            - Syntactic Parse Trees
            - Concept Segmentation and Labeling (from speech)
            - Answer Reranking
            - Hierarchy Reranking
   Advanced topics (25 min)
        - Kernel Engineering:
            - Kernel Combinations
            - Modeling Structural Features
        - Reverse Kernel Engineering:
            - Structural Feature Extraction
            - Model Linearization
       - Fast approaches using uSVM and DAGs
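
As a preview of the tree kernels listed in the outline above, here is a minimal Python sketch of the syntactic (subset) tree kernel of Collins and Duffy (2002), which computes the weighted number of tree fragments shared by two parses. The nested-tuple tree encoding, the function names, and the decay factor lam are illustrative choices for this sketch, not the tutorial's or SVM-Light-TK's API:

    # Trees are nested tuples: (label, child, ...); a leaf is a bare string.
    def label(t):
        return t if isinstance(t, str) else t[0]

    def delta(n1, n2, lam):
        # Weighted count of common fragments rooted at n1 and n2; nonzero
        # only when both nodes expand with exactly the same production.
        if isinstance(n1, str) or isinstance(n2, str):
            return 0.0
        if [label(c) for c in n1] != [label(c) for c in n2]:
            return 0.0
        prod = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            prod *= 1.0 + delta(c1, c2, lam)  # delta = 0 for leaf children
        return prod

    def nodes(t):
        # All internal nodes of t, in pre-order.
        return [] if isinstance(t, str) else \
               [t] + [n for c in t[1:] for n in nodes(c)]

    def tree_kernel(t1, t2, lam=0.4):
        # K(T1, T2) = sum of delta over all pairs of nodes.
        return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

    t1 = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "ran")))
    t2 = ("S", ("NP", ("DT", "the"), ("NN", "dog")), ("VP", ("VBD", "ran")))
    print(tree_kernel(t1, t2))  # grows with the amount of shared structure

A production implementation would memoize delta and restrict the double sum to node pairs sharing a production, which is what makes these kernels practical on full treebanks.
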
PRESENTER BIO
Alessandro Moschitti is a professor in the Department of Computer Science and Information Engineering at the University of Trento. He has worked as an associate researcher at the University of Texas at Dallas, as a visiting professor at the CCLS of Columbia University, and, more recently, as a visiting researcher at the IBM Watson Research Center, contributing to the Jeopardy! DeepQA project. His expertise concerns theoretical and applied machine learning in the areas of NLP, IR, and Data Mining. He has devised innovative kernels, within support vector and other kernel-based machines, for advanced syntactic/semantic processing, documented in about 150 articles. These have been published in the major conferences of several areas, e.g., ACL, ICML, SIGIR, CIKM, and ICDM, for which he is also an active area chair and PC member. Currently, he is PI of the EC Coordination Action EternalS and partner PI of the EC LiMoSINe project. He has received two IBM Faculty Awards, a Google Faculty Award, and other prestigious awards, including best paper awards.

Alessandro Moschitti
Department of Computer Science and Engineering,
University of Trento
Via Sommarive 5
38123 POVO (TN) - Italy
e-mail: moschitti@disi.unitn.it
http://disi.unitn.it/moschitti