The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Held at the Portland Marriott Downtown Waterfront in Portland, Oregon, USA, June 19-24, 2011
Formal and Empirical Grammatical Inference
PRESENTERS: Jeffrey Heinz, Colin de la Higuera and Menno van Zaanen
ABSTRACT:
Computational linguistics (CL) often faces learning problems: what
algorithm takes as input some finite dataset, such as (annotated)
corpora, and outputs a system that behaves "correctly" on some task?
Data chunking, dependency parsing, named entity recognition,
part-of-speech tagging, semantic role labelling, and the more general
problem of deciding which out of all possible (structured) strings are
grammatical are all learning problems in this sense.
The subfield of theoretical computer science known as grammatical
inference (GI) studies how grammars can be obtained from data.
Although there has been limited interaction between the fields of GI
and CL to date, research by the GI community directly impacts the
study of human language and in particular CL. The purpose of this
tutorial is to introduce GI to computational linguists.
This tutorial shows how the theories, algorithms, and models studied
by the GI community can benefit CL. Particular attention is paid to
both foundational issues (e.g. how learning ought to be defined),
which are relevant to all learning problems, even those outside CL, and
issues specific to problems within CL. One theme that runs throughout
the tutorial is how properties of natural languages can be used to
reduce the instance space of the learning problem, which can lead to
provably correct learning solutions. For example, some natural language
patterns are mildly context-sensitive (MCS), but many linguists agree
that not all MCS languages are possible natural languages. So instead
of trying to learn all MCS languages (or context-free, regular or
stochastic variants thereof), one approach is to try to define a
smaller problem space that better approximates natural languages.
Interestingly, many of the regions that the GI community has carved
out for natural language appear to have the right properties for
feasible learning even under the most stringent definitions of
learning.
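To make the theme above concrete, here is a small illustrative sketch (not taken from the tutorial materials): restricting the hypothesis space to strictly 2-local (SL2) languages, a sub-regular class studied in the GI and phonology literature, yields a learner that provably identifies any SL2 language in the limit from positive data alone. The function names and toy words below are invented for illustration.

```python
# Sketch of a strictly 2-local (SL2) learner. An SL2 grammar is just
# a set of permitted adjacent-symbol pairs (bigrams, with word
# boundaries); a word is grammatical iff all its bigrams are permitted.
# Because the class is so restricted, accumulating the observed
# bigrams is a provably correct learning strategy for SL2 targets.

BOUNDARY = "#"  # word-boundary marker

def bigrams(word):
    """All adjacent symbol pairs of a word, including boundary pairs."""
    padded = [BOUNDARY] + list(word) + [BOUNDARY]
    return set(zip(padded, padded[1:]))

def learn_sl2(sample):
    """Grammar = union of all bigrams observed in the positive sample."""
    grammar = set()
    for word in sample:
        grammar |= bigrams(word)
    return grammar

def generates(grammar, word):
    """A word is grammatical iff every one of its bigrams is permitted."""
    return bigrams(word).issubset(grammar)

# Toy phonotactics: positive data from a language banning the cluster "bn"
sample = ["ban", "nab", "banana"]
g = learn_sl2(sample)
print(generates(g, "naban"))  # True: every bigram of "naban" was observed
print(generates(g, "bna"))    # False: the pair ("b", "n") never occurred
```

Once the learner has seen every licit bigram of the target language, its grammar is exactly correct; no negative evidence is needed, which is the payoff of shrinking the hypothesis space.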
OUTLINE:
Formal GI and learning theory. Those learnability properties which
should be regarded as necessary (not sufficient) conditions for
"good" language learning are discussed. Additionally, the most
important proofs that use similar algorithmic ideas are
described.
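One widely used benchmark among the stringent definitions of learning mentioned here is Gold's (1967) identification in the limit from positive data. A compact statement of the criterion, in notation chosen here for illustration, is:

```latex
% A learner \varphi identifies a class \mathcal{L} in the limit from
% positive data iff, on every text (complete positive presentation)
% of every language in the class, its conjectures converge after
% finitely many examples to a correct grammar.
\varphi \text{ identifies } \mathcal{L} \text{ in the limit iff }
\forall L \in \mathcal{L},\ \forall \text{ texts } t \text{ for } L,\
\exists n \text{ such that } \forall m \ge n:\
\varphi(t[0..m]) = \varphi(t[0..n]) \text{ and } L(\varphi(t[0..n])) = L
```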
Empirical approaches to regular and sub-regular natural language
classes. This focuses on sub-regular classes that are learnable
under many (including the hardest) settings. General learning
strategies are introduced as well as probabilistic variants, which
are illustrated with natural language examples, primarily in the
domain of phonology.
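As a hedged sketch of the probabilistic variants referred to above: instead of a categorical set of licit bigrams, a learner can estimate bigram probabilities from positive data by maximum likelihood, assigning probability zero to unattested clusters. The symbols, toy words, and function names below are invented for illustration.

```python
# Sketch of a stochastic bigram (2-gram) model over phoneme strings,
# estimated by relative frequency from a positive sample. Unseen
# bigrams receive probability zero, mirroring the categorical ban
# of an SL2 grammar.

from collections import Counter

BOUNDARY = "#"  # word-boundary marker

def train_bigram_model(sample):
    """Relative-frequency estimates of P(next | previous) from a sample."""
    pair_counts, prev_counts = Counter(), Counter()
    for word in sample:
        padded = [BOUNDARY] + list(word) + [BOUNDARY]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[(prev, nxt)] += 1
            prev_counts[prev] += 1
    return {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}

def word_probability(model, word):
    """Product of conditional bigram probabilities; 0 for unseen pairs."""
    padded = [BOUNDARY] + list(word) + [BOUNDARY]
    prob = 1.0
    for pair in zip(padded, padded[1:]):
        prob *= model.get(pair, 0.0)
    return prob

# Toy phonotactics: the cluster "bn" never occurs in the sample
model = train_bigram_model(["ban", "nab", "banana"])
print(word_probability(model, "ban"))  # positive: all bigrams attested
print(word_probability(model, "bna"))  # 0.0: ("b", "n") was never seen
```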
Empirical approaches to non-regular natural language classes. This
treats learnability of context-free/sensitive formalisms where the
aim is to approximate the grammar of a specific language from
data, not to show learnability of classes of languages. An
overview of well-known empirical grammatical inference systems
will be given in the context of natural language syntax and
semantics.
PRESENTER BIOS:
Jeffrey Heinz
Department of Linguistics and Cognitive Science
University of Delaware
Email: heinz@udel.edu
Jeffrey Heinz's research is located at the intersection of theoretical
linguistics, theoretical computer science, computational linguistics,
grammatical inference and cognitive science. He has authored or
co-authored articles in these areas for the journals Phonology,
Linguistic Inquiry, and the Journal of Child Language; chapters in the
forthcoming Blackwell Companion to Phonology and Cambridge Handbook of
Developmental Linguistics; and papers in the proceedings of the ACL,
CoNLL, ICGI, MOL, and SIGMORPHON. His current research centers on the
hypotheses that phonology is subregular and that phonological learning
is modular. He holds a BS in Mathematics and a BA in Linguistics from
the University of Maryland, College Park, and a PhD in Linguistics
from the University of California, Los Angeles. Jeffrey Heinz is
currently an assistant professor at the University of Delaware.
Colin de la Higuera
Laboratoire LINA UMR CNRS 6241
Email: Colin.Delahiguera@univ-nantes.fr
Colin de la Higuera received his PhD from Bordeaux University, France.
He has been an associate professor at the University of Montpellier and
a professor at Saint-Etienne University, and is now a professor at
Nantes University.
His chief interest lies in grammatical inference; in this area he is
the author of more than 50 reviewed research papers and of a
monograph, 'Grammatical Inference: Learning Automata and Grammars',
published in 2010. He has been chairman of the International
Community in Grammatical Inference (2002-2007) and is now in charge of
the curriculum development programme in the European Network PASCAL 2.
He has also given tutorials in grammar induction at ICML 1999, CAP
2001, ECML/PKDD 2003, IJCAI 2005, ICML 2006 and at different summer
schools in machine learning and computational linguistics.
Menno van Zaanen
Tilburg centre for Cognition and Communication
Department of Communication and Information Sciences
Tilburg University
Email: mvzaanen@uvt.nl
Menno van Zaanen is an assistant professor at the Department of
Communication and Information Sciences at Tilburg University. He has
co-organized the machine translation and grammatical inference
competitions Tenjinno and Omphalos. He was chairman of the ICGI
community from 2007 until 2010 and co-organized the ICGI 2002
conference held in Amsterdam, in addition to several workshops (the
HCSNet data workshop 2005, the Workshop on Grammatical Inference
Applications at IJCAI 2005, and the ECML/PKDD Workshop and Tutorial on
Learning Context-Free Grammars 2003). His
main research aim is to apply symbolic machine learning to
computational linguistics. Based on his PhD research on learning
context-free grammars in a computational linguistic setting, he has
applied similar techniques to question answering, sentence
classification, learning structure in music, and machine translation.
His work on Alignment-Based Learning (ABL) has been adopted by several
researchers, not only in empirical syntax learning but also in machine
translation and morphology learning, and has led to further research
in formal grammatical inference.