TUTORIALS DETAILS
T1: Introduction to Non-Statistical Natural Language Processing
Graeme Hirst, University of Toronto
Tuesday May 27 morning
Many problems in natural language processing are best understood and
approached by symbolic rather than statistical methods, especially
problems in using the structure and meaning of sentences, paragraphs,
texts, and dialogues. This tutorial will introduce symbolic methods
in syntactic analysis, semantic structures and the representation of
meaning, discourse and dialogue structure, and the pragmatics of
language in use.
Target audience:
No background in computational linguistics or natural language
processing will be assumed. The tutorial will be particularly
suitable for people working in speech technology and information
retrieval who want to learn about more linguistically oriented methods of
processing.
Tutorial Outline:
- Symbolic NLP
- Words, lexicons, morphology
- Syntax, grammars
- Parsing algorithms
- Semantic representations
- Semantic analysis
- Generating language from meaning
- Discourse structure and relations
- Language in use, implicature, presupposition
Graeme Hirst's research has covered a
broad but integrated range of topics in computational linguistics and
natural language processing, including the resolution of ambiguity in
language understanding; the preservation of author's style in machine
translation; recovering from misunderstanding and non-understanding in
human-computer communication; and linguistic constraints on
knowledge-representation systems. His present research includes the
problem of near-synonymy in lexical choice in language generation;
computer assistance for collaborative writing; and applications of
semantic distance in intelligent spelling checkers. He is a member of
the editorial boards of Machine Translation and
Computational Linguistics, and has served as book review editor
of the latter since 1985. He is the author of two monographs:
Anaphora in Natural Language Understanding and Semantic
Interpretation and the Resolution of Ambiguity.
T2: Information Retrieval Systems as Integration Platforms for Language
Technologies
Douglas Oard, University of Maryland
Tuesday May 27 morning
At one time it might have been possible to think of natural language
processing, speech processing and information retrieval as separate
fields, but an increasing degree of interdependence is now clearly
evident. This tutorial will explore those connections from the
perspective of information retrieval system design. An overarching
framework for interactive retrieval will be introduced, and then
specialized to describe Web search, cross-language retrieval, and
retrieval from spoken word collections. These applications will then
be used to illustrate the critical dependence on component
technologies such as computational morphology, acquisition of
translation knowledge from corpora, speaker identification, automatic
speech recognition, and summarization. Information retrieval systems
can offer a useful environment for extrinsic evaluation of new
component capabilities, so the tutorial will conclude with a review of
evaluation techniques that can help to reveal the contribution of
specific components. Attendees will receive a copy of the
presentation slides and recommendations for further reading on each of
the major topics of the tutorial.
This tutorial is designed for participants who bring expertise in one
or more human language technologies. No prior exposure to information
retrieval research methods is assumed.
Douglas Oard is an Associate Professor at the University of Maryland,
College Park, with a joint appointment in the College of Information
Studies and the Institute for Advanced Computer Studies, and he
is presently on sabbatical at the Information Sciences Institute of
the University of Southern California. He holds a Ph.D. in Electrical
Engineering from the University of Maryland, and his research
interests center around the use of emerging technologies to support
information seeking. Dr. Oard is well known for his work on
cross-language information retrieval, retrieval from spoken word
collections, and the use of observable behavior to characterize
information content. Additional information is available at
http://www.glue.umd.edu/~oard/.
T3: Speech Recognition and Understanding
Alex Acero, Microsoft
Tuesday May 27 morning
This tutorial will introduce the main concepts behind modern speech
recognition and understanding systems. The state-of-the-art and the
assumptions and limitations of current technology will be presented. The
emphasis will be on describing an end-to-end system and how the
different components fit together. No background in speech technology
will be assumed.
Tutorial Outline
- A system overview. A block diagram of the different building blocks in
a spoken language system will be given.
- Signal Processing. The first step is to extract features from the
input signal. Noise robustness is needed for usable systems.
- Hidden Markov Models. The basic learning and decoding algorithms will
be explained. Basic topics such as discrete and continuous HMM,
maximum-likelihood vs. discriminative training and parameter smoothing
will be covered.
- Acoustic Modeling. I'll cover design issues in acoustic modeling:
isolated vs. continuous speech, phone-based vs. word-based recognition,
context-dependent vs. context-independent, speaker-dependent vs.
speaker-independent. I'll also cover adaptation techniques (MAP and
MLLR) and confidence issues.
- Language Modeling. I'll describe the use of both context-free grammars
and n-grams as language models used in speech recognizers.
- Search Algorithms for ASR. ASR systems need to evaluate millions of
hypotheses so efficient algorithms are needed.
- Speech Understanding. Semantic extraction from speech and text for
limited domains will be presented.
- Systems, Applications and User Interface. I'll describe the main
applications of the technology sprinkled with a few demos.
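The decoding step listed under Hidden Markov Models above is typically solved with the Viterbi algorithm. A minimal sketch for a discrete HMM follows; the toy two-state weather model is illustrative (after Eisner's classic teaching example), not part of the tutorial itself.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely hidden state sequence for a discrete HMM.
    log_init[s], log_trans[s][t], log_emit[s][o] are log-probabilities."""
    V = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: V[-1][s] + log_trans[s][t])
            scores[t] = V[-1][best_s] + log_trans[best_s][t] + log_emit[t][o]
            ptrs[t] = best_s
        V.append(scores)
        back.append(ptrs)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Toy model: hot/cold weather states emitting ice-cream counts.
lg = math.log
states = ['H', 'C']
log_init = {'H': lg(0.8), 'C': lg(0.2)}
log_trans = {'H': {'H': lg(0.7), 'C': lg(0.3)},
             'C': {'H': lg(0.4), 'C': lg(0.6)}}
log_emit = {'H': {1: lg(0.2), 2: lg(0.4), 3: lg(0.4)},
            'C': {1: lg(0.5), 2: lg(0.4), 3: lg(0.1)}}
best_path = viterbi([3, 1, 3], states, log_init, log_trans, log_emit)
```

Real recognizers apply the same dynamic program over much larger state spaces, which is why the efficient search algorithms in the outline matter.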
Alex Acero is the Manager of the Speech Research Group at Microsoft
Research. His interests lie in improving the accuracy and robustness of
speech recognition systems as well as building useful spoken language
systems. He is author of the textbook Spoken Language Processing, the
book Acoustical and Environmental Robustness in Automatic Speech
Recognition and over 70 publications.
T4: The State of the Art in Language Modeling
Joshua Goodman, Microsoft
Tuesday May 27 morning
This tutorial will cover the state-of-the-art in language modeling.
Language models give the probability of word sequences,
e.g., "recognize speech" is much more probable than "wreck a nice
beach." While most widely known for their use in speech recognition,
language models are useful in a large number of areas, including
information retrieval, machine translation, handwriting recognition,
context-sensitive spelling correction, and text entry for Chinese and
Japanese or on small input devices. Many language modeling techniques
can be applied to other areas or to modeling any discrete sequence.
This tutorial should be accessible to anyone with a basic knowledge of
probability.
The most basic language models -- n-gram models -- essentially just
count occurrences of words in training data. I will describe five
relatively simple improvements over this baseline: smoothing, caching,
skipping, sentence-mixture models, and clustering. I will talk a bit
about the applications of language modeling and then I will quickly
describe other recent promising work, and available tools and
resources.
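The counting baseline described above can be made concrete as a maximum-likelihood bigram model; a minimal sketch (the corpus and function name are illustrative):

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1, w2) / c(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ['<s>'] + sent.split() + ['</s>']
        unigrams.update(words[:-1])          # count each word as a history
        bigrams.update(zip(words[:-1], words[1:]))
    return lambda w1, w2: (bigrams[(w1, w2)] / unigrams[w1]
                           if unigrams[w1] else 0.0)

prob = train_bigram(['the cat sat', 'the cat ran', 'the dog sat'])
```

Unseen bigrams get probability zero here, which is exactly the data-sparsity problem the smoothing techniques below are designed to fix.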
I will begin by describing conventional-style language modeling
techniques.
- Smoothing addresses the problem of data sparsity: there is rarely
enough data to accurately estimate the parameters of a language model.
Smoothing gives a way to combine less specific, more accurate
information with more specific, but noisier data. I will describe two
classic techniques -- deleted interpolation and Katz (or Good-Turing)
smoothing -- and one recent technique, Modified Kneser-Ney smoothing,
which performs best among known techniques.
- Caching is a widely used technique that uses the observation that
recently observed words are likely to occur again. Models from
recently observed data can be combined with more general models to
improve performance.
- Skipping models use the observation that even words that are not
directly adjacent to the target word contain useful information.
- Sentence-mixture models use the observation that there are many
different kinds of sentences. By modeling each sentence type
separately, performance is improved.
- Clustering is one of the most useful language modeling techniques.
Words can be grouped together into clusters through various automatic
techniques; then the probability of a cluster can be predicted instead
of the probability of the word. Clustering can be used to make
smaller models or better performing ones. I will talk briefly about
clustering issues specific to the huge amounts of data used in
language modeling (hundreds of millions of words) to form thousands of
clusters.
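The smoothing idea from the first item above, in its simplest interpolated form, mixes a specific but noisy bigram estimate with a less specific but more robust unigram estimate. A minimal sketch (names are illustrative; the weight lam would normally be tuned on held-out data, as in deleted interpolation):

```python
from collections import Counter

def interpolated_bigram(sentences, lam=0.7):
    """Interpolated smoothing: lam * P_bigram + (1 - lam) * P_unigram,
    so unseen bigrams still receive nonzero probability."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for sent in sentences:
        words = sent.split()
        total += len(words)
        unigrams.update(words)
        bigrams.update(zip(words[:-1], words[1:]))
    def prob(w1, w2):
        p_uni = unigrams[w2] / total         # less specific, more robust
        p_bi = (bigrams[(w1, w2)] / unigrams[w1]
                if unigrams[w1] else 0.0)    # specific, but noisy
        return lam * p_bi + (1 - lam) * p_uni
    return prob

prob = interpolated_bigram(['the cat sat', 'the cat ran'])
```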
I will then talk about other language modeling applications, with an
emphasis on information retrieval, but also mentioning spelling
correction, machine translation, and entering text in Chinese or
Japanese.
I will briefly describe some recent successful techniques, including
Bellegarda's work using latent semantic analysis and Wang's SuperARV
language models. Finally, I will also talk about some practical
aspects of language modeling. I will describe how freely available,
off-the-shelf tools can be used to easily build language models, where
to get data to train a language model, and how to use methods such as
count cutoffs or relative-entropy techniques to prune language models.
Those who attend the tutorial should walk away with a broad
understanding of current language modeling techniques, the background
needed to build their own language models, and the ability to choose
the right language modeling techniques for their applications.
Joshua Goodman's research areas have previously included speech recognition and
statistical NLP, especially statistical parsing. He then focused on
language model research, particularly on smoothing, but later on the
other areas outlined in this tutorial. More recently, his interests
have moved to machine learning, especially maximum entropy models. In
particular, he has been applying machine learning techniques to
stopping spam.
T5: What's New in Statistical Machine Translation
Kevin Knight and Philipp Koehn, USC/ISI
Tuesday May 27 afternoon
Automatic translation from one human language to another using computers,
better known as machine translation (MT), is a long-standing goal of
computer science. Accurate translation requires a great deal of knowledge
about the usage and meaning of words, the structure of phrases, the meaning
of sentences, and which real-life situations are plausible. For
general-purpose translation, the amount of required knowledge is
staggering, and it is not clear how to prioritize knowledge acquisition
efforts.
Recently, there has been a fair amount of research into extracting
translation-relevant knowledge automatically from bilingual texts. In the
early 1990s, IBM pioneered automatic bilingual-text analysis. A 1999
workshop at Johns Hopkins University saw a re-implementation of many of the
core components of this work, aimed at attracting more researchers into the
field. In recent years, several statistical MT projects have appeared
in North America, Europe, and Asia, and the literature is growing
substantially.
Tutorial Outline:
- Data for MT
- bilingual corpora: what's out there?
- acquisition and cleaning
- what does three million words really mean?
- MT Evaluation
- manual and automatic
- word error rate, BLEU, NIST measures
- MT Evaluation versus MT
- Core Models and Decoders
- IBM Models 1-5 and HMM models, training, decoding
- word alignment and its evaluation
- alignment templates and phrase models
- syntax-based translation and language models
- weaknesses of existing models
- maximum entropy models, training, decoding
- Specialized Models
- named entity MT
- numbers and dates
- morphology
- noun phrase MT
- Available Resources
- tools and data
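Word error rate, one of the evaluation measures listed in the outline above, is word-level edit distance normalized by reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: minimum number of word substitutions,
    insertions, and deletions, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

BLEU and the NIST measure instead reward n-gram overlap with one or more references, which correlates better with human judgments of translation quality than raw edit distance.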
Kevin Knight is a Senior Research Scientist at the USC/Information Sciences
Institute and a Research Associate Professor in the Computer Science
Department at USC. He has written a number of articles on statistical MT,
plus a widely-circulated MT workbook
(http://www.isi.edu/natural-language/mt/wkbk.rtf). Dr. Knight gave an
invited talk "Statistical Machine Translation: Where Did It Go?" at
EMNLP-1998 and another invited talk "Every Time I Fire a Statistician, I
Get a Warm Fuzzy Feeling" at AMTA-2000.
Philipp Koehn is a Ph.D. candidate in Computer Science at the University of
Southern California. He has written a number of articles on topics in
statistical machine translation, including bilingual lexicon induction from
monolingual corpora, word-level translation models, and translation with
scarce resources. He has also worked at AT&T Laboratories on
text-to-speech systems, and at WhizBang! Labs on text categorization.
T6: Annotation of Temporal and Event Expressions
James Pustejovsky, Brandeis University and Inderjeet Mani,
MITRE
Tuesday May 27 afternoon
Humans live in a dynamic world, where actions bring about consequences,
and the facts and properties associated with entities change over time.
Without a robust ability to identify events in NL data and temporally situate
them, the real 'aboutness' of the information can be missed. In appreciation
of this need, there has recently been a renewed interest in temporal and
event-based reasoning for NLP, aimed at addressing challenges in areas
such as information extraction, question-answering, and summarization.
This tutorial will begin with an overview of theoretical work on tense,
aspect, and event structure in natural language, as well as the fundamentals
of temporal reasoning. It will then go on to discuss the annotation of
temporal and event expressions in corpora, including the TimeML specification
language and other results from the ARDA/NRRC Workshop on Temporal and
Event Recognition for Question Answering Systems (TERQAS). The tutorial
will examine how to formally distinguish events and their temporal anchoring
in documents, and will discuss algorithms for ordering events mentioned
in a document relative to each other and for computing closure over an
entire discourse of events.
Tutorial attendees can expect to learn about current methodologies and
computational resources, the outstanding problems in the area, as well
as obtain follow-up pointers to the research literature. Attendees should
be familiar with information extraction and the notion of corpus annotation.
The course should appeal to those with an interest in leveraging robust
semantic analysis for tasks like question-answering, information
extraction, and summarization.
James Pustejovsky is Professor of Computer Science at Brandeis
University, where he is Director of the Laboratory for Linguistics and
Computation. Pustejovsky conducts research in the areas of computational
linguistics, lexical semantics, knowledge representation, bioinformatics,
and information retrieval and extraction. He was organizer and PI for the
ARDA-sponsored research workshop that created the metadata markup language
TimeML. He has participated in numerous DARPA and NSF efforts in Knowledge
Extraction and Natural Language Engineering, including the MUC and TIPSTER
projects. His publications include numerous books on semantics and corpus
processing.
Inderjeet Mani is a Senior Principal Scientist at the MITRE Corporation
in McLean, Virginia, and an adjunct faculty in Computational Linguistics
at Georgetown University. Mani's research, funded by MITRE, NSF, DARPA,
and others, includes information extraction, automatic summarization, and
bioinformatics. Mani helped develop the TIMEX2 annotation scheme for representing
aspects of the meaning of temporal expressions in natural languages under
the DARPA TIDES research program. He has worked (with Georgetown University)
to develop TIMEX2-annotated corpora and taggers for different languages,
and has also (with Columbia University) investigated methods for ordering
events in news. His publications include two books on automatic summarization.
T7: NLP R&D and Commercial Deployment
Mark Wasson, Lexis Nexis
Tuesday May 27 afternoon
Over the past ten years, researchers in computational linguistics and
information retrieval have made a number of advances in document retrieval,
categorization, entity recognition and other areas. This drew the attention
of venture capitalists who provided the money needed to commercialize this
work. There have been some successes, but the landscape is littered with
failed startups and applications that didn't live up to expectations. The
value of NLP research is not based on its commercializability. But for
those who seek to commercialize their research, good research alone is not
enough.
The purpose of this tutorial will be to examine the role of NLP research
from the perspective of commercial deployment. Specifically, it will focus on
issues and concerns that must be addressed to meet the needs of potential
customers for NLP technology, customers who are eager for text processing
and retrieval solutions, but who often are disappointed with what they find.
Tutorial Outline:
- Academic and commercial perspectives on NLP research
a. The role of academic research
b. The role of R&D in industry
c. Knowledge and technology transfer
- What industry wants from NLP technology
a. Information overload and the silver bullet
b. Costs, productivity, competitive advantage, profit
c. Why industry likes simple alternatives to NLP
- Evaluating NLP
a. Evaluation considerations for commercial deployment
b. TREC, DUC, MUC, etc.: Pros and cons
c. What to measure, what to report
- Test data
a. Standard corpora aren't enough
b. Using representative data of sufficient scale
c. Where to get the data
- Functionality
a. What does the NLP component do
b. Turning NLP functionality into product functionality
c. Understanding the end user
- Performance and scale considerations
a. How fast is fast
b. How large is large scale
c. Why throwing hardware at it isn't a solution
- Integration
a. The production environment(s)
b. Customizing the application
c. Ongoing maintenance and support
d. What 24x7 service really means
- Selling NLP to industry
a. Know your customer
b. Know your technology
c. Know your competition
d. Showing how your technology benefits your customer
e. How good salespeople go bad
- What industry wants from NLP technology - specific R&D areas
- Why industry should value and support academic NLP research,
even that with no direct commercial value (and why we
too often don't)
- Concluding remarks
Mark Wasson has been a research scientist in computational linguistics with
LexisNexis since joining the company in 1986. He has created and deployed a
number of text processing technologies in categorization, indexing, document
retrieval enhancements, entity extraction and summarization that have been
applied to hundreds of millions of documents from more than 15,000 news,
business, legal, financial and other sources. He has led research in
information extraction (currently in development) and multidocument
information aggregation (in production). He has developed and coordinated
collaborative text processing R&D with other research teams in both academia
and industry. In recent years, his job scope has also included third-party
technology identification and evaluation. He has explored relevant
technologies at more than 200 third-party academic and commercial groups.
This tutorial draws from these experiences as well as those of his
colleagues at LexisNexis.
T8: Optimization, Maxent Models, and Conditional Estimation without Magic
Christopher Manning and Dan Klein, Stanford University
Tuesday May 27 afternoon
This tutorial aims to cover the basic ideas and algorithms behind
techniques such as maximum entropy modeling, conditional estimation of
generative probabilistic models, and issues regarding the use of
models more complex than simple Naive Bayes and Hidden Markov
Models. In recent years, these sophisticated probabilistic methods
have been used with considerable success on most of the core tasks of
natural language processing, for speech language models, and for
Information Retrieval tasks such as text filtering and categorization,
but the methods and their relationships are often not well understood
by practitioners. Our focus is on insight and understanding, using
graphical illustrations rather than detailed derivations whenever
possible. The goal of the tutorial is that the inner workings of these
modeling and estimation techniques be transparent and intuitive,
rather than black boxes labeled "magic here".
The tutorial decomposes these methods into optimization problems on the
one side, and optimization methods on the other. The first hour of
the tutorial presents the basics of non-linear optimization, assuming
only knowledge of basic calculus. We begin with a discussion of
convexity and unconstrained optimization, focusing on gradient
methods. We discuss in detail both simple gradient descent and the
much more practical conjugate gradient descent. The key ideas are
presented, including a comparison/contrast with alternative methods.
Next, the case of constrained optimization is presented, highlighting
the method of Lagrange multipliers and presenting several ways of
translating the abstract ideas into a concrete optimization method.
The principal goal, again, is to make Lagrange methods appear
intuitively natural, rather than as mathematical sleight-of-hand.
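A minimal sketch of the simple gradient descent discussed above, minimizing a convex quadratic (the objective, step size, and names are illustrative; conjugate gradient improves on this by choosing search directions that do not undo earlier progress):

```python
def gradient_descent(grad, x0, step=0.1, iters=1000):
    """Plain gradient descent: repeatedly step against the gradient.
    For a convex objective and a small enough step size, this
    converges to the global minimum."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x, y) = (x - 3)**2 + 2*(y + 1)**2, whose gradient is
# (2*(x - 3), 4*(y + 1)); the unique minimum is at (3, -1).
minimum = gradient_descent(lambda v: [2 * (v[0] - 3), 4 * (v[1] + 1)],
                           [0.0, 0.0])
```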
The second part of the tutorial begins with a presentation of maximum
entropy models from first principles, showing their equivalence to
exponential models (also known as log-linear models; particular
versions give logistic regression and conditional random fields). We
present many simple examples to build intuition for what
maxent models can and cannot do. Finally, we discuss how to find
parameters for maximum entropy models using the previously presented
optimization methods. We also discuss methods of smoothing, focusing on
how smoothing works differently for maxent models than for standard
relative-frequency-based distributions. By this point in the tutorial,
the audience members should have a clear understanding of how to build a
system for estimating maxent models. We conclude with a discussion of
NLP-oriented issues in modeling, including conditional estimation of
generative models, and the issues involved in choosing model structure
(such as independence, label and observation biases, and so on).
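The maxent-to-exponential equivalence discussed above can be summarized in the standard textbook form (this is the usual derivation, not necessarily the presenters' exact notation):

```latex
% Maximize conditional entropy subject to feature-expectation constraints:
%   \max_p \; H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
%   \text{s.t.} \quad E_p[f_i] = E_{\tilde{p}}[f_i] \quad \text{for each feature } f_i
% Introducing one Lagrange multiplier \lambda_i per constraint and solving
% \partial L / \partial p(y \mid x) = 0 yields the exponential (log-linear) form:
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)}
  \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)
```

Finding the multipliers \(\lambda_i\) is then exactly the kind of unconstrained convex optimization problem treated in the first part of the tutorial.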
The tutorial will run 3 hours, with a break in the middle.
Participants will be assumed to know basic calculus and basic probability
theory, and to have some exposure to models such as Naive Bayes and HMMs.
Chris Manning works on systems and formalisms that can intelligently
process and produce human languages. His research concentrates on
probabilistic models of language and statistical natural language
processing, information extraction, text understanding and text
mining, constraint-based theories of grammar (HPSG and LFG) and
probabilistic extensions of them, syntactic typology, computational
lexicography (involving work in XML, XSL, and information
visualization), and other topics in computational linguistics and
machine learning.
Dan Klein's research interests include the unsupervised learning of
language (structure and grammar induction), machine learning for NLP
(conditional models and estimation, the interaction of smoothing and
model estimation), designing efficient algorithms for NLP (A* methods
for exact inference, factoring weakly coupled models, and parser
design), and applications such as statistical parsing, and data
clustering.
T9: Automatic Speaker and Language Recognition
Doug Reynolds and Marc Zissman, MIT Lincoln Lab
Tuesday May 27 afternoon
The speech signal conveys several levels of information beyond the
words, such as information about the identity of the speaker and the
language being spoken. These other levels are often very useful for
augmenting the word transcripts to allow indexing and searching of audio
archives. In this tutorial we will provide an overview of
state-of-the-art techniques for extracting, modeling and evaluating
speaker and language information from the speech signal. We will provide
an overview of the area with some historical context and describe
different applications that use speaker and language recognition
technology. For both speaker and language recognition technology, we
will discuss the theory and the practice of how these systems are
designed, trained, and evaluated, from the extraction of features from
the speech signal, to the corpora used to gauge performance. A summary
of recent NIST evaluations will show expected performance levels. We
will also provide an idea of the open issues in
both areas and future research directions.
Tutorial Outline:
- Introduction/Background
- History
- Definitions
- Applications
- Speaker Recognition
- Approaches using acoustic information;
Speech features conveying speaker information
- Approaches using other levels of information;
Prosodics;
Idiolect;
Phonotactics;
Pronunciations;
- System fusion
- Computation issues
- Performance of speaker recognition techniques
- Language Recognition
- Approaches using phonotactics;
Using phone strings and statistical language models
- Approaches using acoustic information;
Speech features conveying language information
- System fusion
- Computation issues
- Performance of language recognition techniques
- Conclusions and future directions
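Acoustic approaches of the kind listed above typically score feature vectors against a Gaussian mixture model. A stdlib-only sketch of diagonal-covariance GMM scoring follows; the function name and parameters are illustrative, not from the Lincoln system.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance
    GMM: log sum_k w_k * prod_d N(x_d; mu_kd, var_kd)."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd)
                             + (xd - md) ** 2 / vd)
        comp_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(l - m) for l in comp_logs))
```

In a verification setting, a score like this for the claimed speaker's model is compared against the score under a universal background model, and frames are accepted or rejected by thresholding the ratio.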
Douglas Reynolds is a Senior Member of Technical Staff in the
Information and Systems Technology Group at MIT Lincoln Laboratory. He
has worked in the area of automatic speaker recognition since 1992 both
for government and commercial applications. His thesis work introduced
the use of Gaussian Mixture Models for text-independent speaker recognition
and he has since invented and developed several widely used techniques
in the area of speaker recognition, such as robust modeling with GMMs,
application of a universal background model to text-independent
recognition tasks, the use of Bayesian adaptation to train and update
speaker models, fast scoring techniques for GMM based systems, the
development and use of a handset/channel-type detector, and several
normalization techniques based on the handset/channel-type detector.
These and other ideas have been implemented in the Lincoln speaker
recognition system which has won several annual international speaker
recognition evaluations conducted by the National Institute of Standards
and Technology (NIST). He was the team leader for the 2002 JHU Summer
Workshop SuperSID project which focused on applying high-level
information to speaker recognition tasks
(http://www.clsp.jhu.edu/ws2002/groups/supersid/). He has also worked in
the area of language recognition, helping to apply GMM-based systems to
the 2003 LID evaluation.
Marc Zissman is the Associate Group Leader of the Information Systems
Technology Group at MIT Lincoln Laboratory. Marc's research has focused
on digital speech processing, including parallel computing for speech
coding and recognition, co-channel talker interference suppression,
language and dialect identification, and cochlear-implant processing for
the profoundly deaf. He has developed several different approaches
to language ID that have exhibited state-of-the-art performance in
a sequence of annual government-sponsored evaluations. The
Parallel-Phone-Recognition-Language-Model (PPRLM) system for LID has
become one of the standard approaches for automatic language recognition.