TUTORIALS DETAILS
T1: Introduction to Non-Statistical Natural Language Processing
Graeme Hirst, University of Toronto
Tuesday May 27 morning
Many problems in natural language processing are best understood and
approached by symbolic rather than statistical methods, especially
problems in using the structure and meaning of sentences, paragraphs,
texts, and dialogues. This tutorial will introduce symbolic methods
in syntactic analysis, semantic structures and the representation of
meaning, discourse and dialogue structure, and the pragmatics of
language in use.
Target audience:
No background in computational linguistics or natural language
processing will be assumed. The tutorial will be particularly
suitable for people working in speech technology and information
retrieval who want to learn about more linguistically oriented methods of
processing.
Tutorial Outline:
- Symbolic NLP
- Words, lexicons, morphology
- Syntax, grammars
- Parsing algorithms
- Semantic representations
- Semantic analysis
- Generating language from meaning
- Discourse structure and relations
- Language in use, implicature, presupposition
Graeme Hirst's research has covered a
broad but integrated range of topics in computational linguistics and
natural language processing, including the resolution of ambiguity in
language understanding; the preservation of author's style in machine
translation; recovering from misunderstanding and non-understanding in
human-computer communication; and linguistic constraints on
knowledge-representation systems. His present research includes the
problem of near-synonymy in lexical choice in language generation;
computer assistance for collaborative writing; and applications of
semantic distance in intelligent spelling checkers. He is a member of
the editorial boards of Machine Translation and
Computational Linguistics, and has served as book review editor
of the latter since 1985. He is the author of two monographs:
Anaphora in Natural Language Understanding and Semantic
Interpretation and the Resolution of Ambiguity.
T2: Information Retrieval Systems as Integration Platforms for Language
Technologies
Douglas Oard, University of Maryland
Tuesday May 27 morning
At one time it might have been possible to think of natural language
processing, speech processing and information retrieval as separate
fields, but an increasing degree of interdependence is now clearly
evident. This tutorial will explore those connections from the
perspective of information retrieval system design. An overarching
framework for interactive retrieval will be introduced, and then
specialized to describe Web search, cross-language retrieval, and
retrieval from spoken word collections. These applications will then
be used to illustrate the critical dependence on component
technologies such as computational morphology, acquisition of
translation knowledge from corpora, speaker identification, automatic
speech recognition, and summarization. Information retrieval systems
can offer a useful environment for extrinsic evaluation of new
component capabilities, so the tutorial will conclude with a review of
evaluation techniques that can help to reveal the contribution of
specific components. Attendees will receive a copy of the
presentation slides and recommendations for further reading on each of
the major topics of the tutorial.
This tutorial is designed for participants who bring expertise in one
or more human language technologies. No prior exposure to information
retrieval research methods is assumed.
Douglas Oard is an Associate Professor at the University of Maryland,
College Park, with a joint appointment in the College of Information
Studies and the Institute for Advanced Computer Studies, and he
is presently on sabbatical at the Information Sciences Institute of
the University of Southern California. He holds a Ph.D. in Electrical
Engineering from the University of Maryland, and his research
interests center around the use of emerging technologies to support
information seeking. Dr. Oard is well known for his work on
cross-language information retrieval, retrieval from spoken word
collections, and the use of observable behavior to characterize
information content. Additional information is available at
http://www.glue.umd.edu/~oard/.
T3: Speech Recognition and Understanding
Alex Acero, Microsoft
Tuesday May 27 morning
This tutorial will introduce the main concepts behind modern speech
recognition and understanding systems. The state-of-the-art and the
assumptions and limitations of current technology will be presented. The
emphasis will be on describing an end-to-end system and how the
different components fit together. No background in speech technology
will be assumed.
Tutorial Outline
- A system overview. A block diagram of the different building blocks in
a spoken language system will be given.
- Signal Processing. The first step is to extract features from the
input signal. Noise robustness is needed for usable systems.
- Hidden Markov Models. The basic learning and decoding algorithms will
be explained. Basic topics such as discrete and continuous HMM,
maximum-likelihood vs. discriminative training and parameter smoothing
will be covered.
- Acoustic Modeling. I'll cover design issues in acoustic modeling:
isolated vs. continuous speech, phone-based vs. word-based recognition,
context-dependent vs. context-independent, speaker-dependent vs.
speaker-independent. I'll also cover adaptation techniques (MAP and
MLLR) and confidence issues.
- Language Modeling. I'll describe the use of both context-free grammars
and n-grams as language models used in speech recognizers.
- Search Algorithms for ASR. ASR systems need to evaluate millions of
hypotheses so efficient algorithms are needed.
- Speech Understanding. Semantic extraction from speech and text for
limited domains will be presented.
- Systems, Applications and User Interface. I'll describe the main
applications of the technology sprinkled with a few demos.
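The decoding step listed under Hidden Markov Models above is typically solved with the Viterbi algorithm. A minimal sketch for a discrete HMM follows; the toy two-state weather model is illustrative (after Eisner's classic teaching example), not part of the tutorial itself.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely hidden state sequence for a discrete HMM.
    log_init[s], log_trans[s][t], log_emit[s][o] are log-probabilities."""
    V = [{s: log_init[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for t in states:
            best_s = max(states, key=lambda s: V[-1][s] + log_trans[s][t])
            scores[t] = V[-1][best_s] + log_trans[best_s][t] + log_emit[t][o]
            ptrs[t] = best_s
        V.append(scores)
        back.append(ptrs)
    # Backtrace from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Toy model: hot/cold weather states emitting ice-cream counts.
lg = math.log
states = ['H', 'C']
log_init = {'H': lg(0.8), 'C': lg(0.2)}
log_trans = {'H': {'H': lg(0.7), 'C': lg(0.3)},
             'C': {'H': lg(0.4), 'C': lg(0.6)}}
log_emit = {'H': {1: lg(0.2), 2: lg(0.4), 3: lg(0.4)},
            'C': {1: lg(0.5), 2: lg(0.4), 3: lg(0.1)}}
best_path = viterbi([3, 1, 3], states, log_init, log_trans, log_emit)
```

Real recognizers apply the same dynamic program over much larger state spaces, which is why the efficient search algorithms in the outline matter.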
Alex Acero is the Manager of the Speech Research Group at Microsoft
Research. His interests lie in improving the accuracy and robustness of
speech recognition systems as well as building useful spoken language
systems. He is author of the textbook Spoken Language Processing, the
book Acoustical and Environmental Robustness in Automatic Speech
Recognition and over 70 publications.
T4: The State of the Art in Language Modeling
Joshua Goodman, Microsoft
Tuesday May 27 morning
This tutorial will cover the state-of-the-art in language modeling.
Language models give the probability of word sequences,
e.g., "recognize speech" is much more probable than "wreck a nice
beach." While most widely known for their use in speech recognition,
language models are useful in a large number of areas, including
information retrieval, machine translation, handwriting recognition,
context-sensitive spelling correction, and text entry for Chinese and
Japanese or on small input devices. Many language modeling techniques
can be applied to other areas or to modeling any discrete sequence.
This tutorial should be accessible to anyone with a basic knowledge of
probability.
The most basic language models -- n-gram models -- essentially just
count occurrences of words in training data. I will describe five
relatively simple improvements over this baseline: smoothing, caching,
skipping, sentence-mixture models, and clustering. I will talk a bit
about the applications of language modeling and then I will quickly
describe other recent promising work, and available tools and
resources.
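The counting baseline described above can be made concrete as a maximum-likelihood bigram model; a minimal sketch (the corpus and function name are illustrative):

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1, w2) / c(w1)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        words = ['<s>'] + sent.split() + ['</s>']
        unigrams.update(words[:-1])          # count each word as a history
        bigrams.update(zip(words[:-1], words[1:]))
    return lambda w1, w2: (bigrams[(w1, w2)] / unigrams[w1]
                           if unigrams[w1] else 0.0)

prob = train_bigram(['the cat sat', 'the cat ran', 'the dog sat'])
```

Unseen bigrams get probability zero here, which is exactly the data-sparsity problem the smoothing techniques below are designed to fix.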
I will begin by describing conventional-style language modeling
techniques.
- Smoothing addresses the problem of data sparsity: there is rarely
enough data to accurately estimate the parameters of a language model.
Smoothing gives a way to combine less specific, more accurate
information with more specific, but noisier data. I will describe two
classic techniques -- deleted interpolation and Katz (or Good-Turing)
smoothing -- and one recent technique, Modified Kneser-Ney smoothing,
which performs best among known techniques.
- Caching is a widely used technique that uses the observation that
recently observed words are likely to occur again. Models from
recently observed data can be combined with more general models to
improve performance.
- Skipping models use the observation that even words that are not
directly adjacent to the target word contain useful information.
- Sentence-mixture models use the observation that there are many
different kinds of sentences. By modeling each sentence type
separately, performance is improved.
- Clustering is one of the most useful language modeling techniques.
Words can be grouped together into clusters through various automatic
techniques; then the probability of a cluster can be predicted instead
of the probability of the word. Clustering can be used to make
smaller models or better performing ones. I will talk briefly about
clustering issues specific to the huge amounts of data used in
language modeling (hundreds of millions of words) to form thousands of
clusters.
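The smoothing idea from the first item above, in its simplest interpolated form, mixes a specific but noisy bigram estimate with a less specific but more robust unigram estimate. A minimal sketch (names are illustrative; the weight lam would normally be tuned on held-out data, as in deleted interpolation):

```python
from collections import Counter

def interpolated_bigram(sentences, lam=0.7):
    """Interpolated smoothing: lam * P_bigram + (1 - lam) * P_unigram,
    so unseen bigrams still receive nonzero probability."""
    unigrams, bigrams = Counter(), Counter()
    total = 0
    for sent in sentences:
        words = sent.split()
        total += len(words)
        unigrams.update(words)
        bigrams.update(zip(words[:-1], words[1:]))
    def prob(w1, w2):
        p_uni = unigrams[w2] / total         # less specific, more robust
        p_bi = (bigrams[(w1, w2)] / unigrams[w1]
                if unigrams[w1] else 0.0)    # specific, but noisy
        return lam * p_bi + (1 - lam) * p_uni
    return prob

prob = interpolated_bigram(['the cat sat', 'the cat ran'])
```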
I will then talk about other language modeling applications, with an
emphasis on information retrieval, but also mentioning spelling
correction, machine translation, and entering text in Chinese or
Japanese.
I will briefly describe some recent successful techniques, including
Bellegarda's work using latent semantic analysis and Wang's SuperARV
language models. Finally, I will also talk about some practical
aspects of language modeling. I will describe how freely available,
off-the-shelf tools can be used to easily build language models, where
to get data to train a language model, and how to use methods such as
count cutoffs or relative-entropy techniques to prune language models.
Those who attend the tutorial should walk away with a broad
understanding of current language modeling techniques, the background
needed to build their own language models, and the ability to choose
the right language modeling techniques for their applications.
Joshua Goodman's research areas have previously included speech recognition and
statistical NLP, especially statistical parsing. He then focused on
language model research, particularly on smoothing, but later on the
other areas outlined in this tutorial. More recently, his interests
have moved to machine learning, especially maximum entropy models. In
particular, he has been applying machine learning techniques to
stopping spam.
T5: What's New in Statistical Machine Translation
Kevin Knight and Philipp Koehn, USC/ISI
Tuesday May 27 afternoon
Automatic translation from one human language to another using computers,
better known as machine translation (MT), is a long-standing goal of
computer science. Accurate translation requires a great deal of knowledge
about the usage and meaning of words, the structure of phrases, the meaning
of sentences, and which real-life situations are plausible. For
general-purpose translation, the amount of required knowledge is
staggering, and it is not clear how to prioritize knowledge acquisition
efforts.
Recently, there has been a fair amount of research into extracting
translation-relevant knowledge automatically from bilingual texts. In the
early 1990s, IBM pioneered automatic bilingual-text analysis. A 1999
workshop at Johns Hopkins University saw a re-implementation of many of the
core components of this work, aimed at attracting more researchers into the
field. In recent years, several statistical MT projects have appeared
in North America, Europe, and Asia, and the literature is growing
substantially.
Tutorial Outline:
- Data for MT
- bilingual corpora: what's out there?
- acquisition and cleaning
- what does three million words really mean?
- MT Evaluation
- manual and automatic
- word error rate, BLEU, NIST measures
- MT Evaluation versus MT
- Core Models and Decoders
- IBM Models 1-5 and HMM models, training, decoding
- word alignment and its evaluation
- alignment templates and phrase models
- syntax-based translation and language models
- weaknesses of existing models
- maximum entropy models, training, decoding
- Specialized Models
- named entity MT
- numbers and dates
- morphology
- noun phrase MT
- Available Resources
- tools and data
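Word error rate, one of the evaluation measures listed in the outline above, is word-level edit distance normalized by reference length; a minimal sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: minimum number of word substitutions,
    insertions, and deletions, divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution/match
    return d[len(ref)][len(hyp)] / len(ref)
```

BLEU and the NIST measure instead reward n-gram overlap with one or more references, which correlates better with human judgments of translation quality than raw edit distance.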
Kevin Knight is a Senior Research Scientist at the USC/Information Sciences
Institute and a Research Associate Professor in the Computer Science
Department at USC. He has written a number of articles on statistical MT,
plus a widely-circulated MT workbook
(http://www.isi.edu/natural-language/mt/wkbk.rtf). Dr. Knight gave an
invited talk "Statistical Machine Translation: Where Did It Go?" at
EMNLP-1998 and another invited talk "Every Time I Fire a Statistician, I
Get a Warm Fuzzy Feeling" at AMTA-2000.
Philipp Koehn is a Ph.D. candidate in Computer Science at the University of
Southern California. He has written a number of articles on topics in
statistical machine translation, including bilingual lexicon induction from
monolingual corpora, word-level translation models, and translation with
scarce resources. He has also worked at AT&T Laboratories on
text-to-speech systems, and at WhizBang! Labs on text categorization.
T6: Annotation of Temporal and Event Expressions
James Pustejovsky, Brandeis University and Inderjeet Mani,
MITRE
Tuesday May 27 afternoon
Humans live in a dynamic world, where actions bring about consequences,
and the facts and properties associated with entities change over time.
Without a robust ability to identify events in NL data and temporally situate
them, the real 'aboutness' of the information can be missed. In appreciation
of this need, there has recently been a renewed interest in temporal and
event-based reasoning for NLP, aimed at addressing challenges in areas
such as information extraction, question-answering, and summarization.
This tutorial will begin with an overview of theoretical work on tense,
aspect, and event structure in natural language, as well as the fundamentals
of temporal reasoning. It will then go on to discuss the annotation of
temporal and event expressions in corpora, including the TimeML specification
language and other results from the ARDA/NRRC Workshop on Temporal and
Event Recognition for Question Answering Systems (TERQAS). The tutorial
will examine how to formally distinguish events and their temporal anchoring
in documents, and will discuss algorithms for ordering events mentioned
in a document relative to each other and for computing closure over an
entire discourse of events.
Tutorial attendees can expect to learn about current methodologies and
computational resources, the outstanding problems in the area, as well
as obtain follow-up pointers to the research literature. Attendees should
be familiar with information extraction and the notion of corpus annotation.
The course should appeal to those with an interest in leveraging robust
semantic analysis for tasks like question-answering, information
extraction, and summarization.
James Pustejovsky is Professor of Computer Science at Brandeis
University, where he is Director of the Laboratory for Linguistics and
Computation. Pustejovsky conducts research in the areas of computational
linguistics, lexical semantics, knowledge representation, bioinformatics,
and information retrieval and extraction. He was organizer and PI for the
ARDA-sponsored research workshop that created the metadata markup language
TimeML. He has participated in numerous DARPA and NSF efforts in Knowledge
Extraction and Natural Language Engineering, including the MUC and TIPSTER
projects. His publications include numerous books on semantics and corpus
processing.
Inderjeet Mani is a Senior Principal Scientist at the MITRE Corporation
in McLean, Virginia, and an adjunct faculty in Computational Linguistics
at Georgetown University. Mani's research, funded by MITRE, NSF, DARPA,
and others, includes information extraction, automatic summarization, and
bioinformatics. Mani helped develop the TIMEX2 annotation scheme for representing
aspects of the meaning of temporal expressions in natural languages under
the DARPA TIDES research program. He has worked (with Georgetown University)
to develop TIMEX2-annotated corpora and taggers for different languages,
and has also (with Columbia University) investigated methods for ordering
events in news. His publications include two books on automatic summarization.
T7: NLP R&D and Commercial Deployment
Mark Wasson, Lexis Nexis
Tuesday May 27 afternoon
Over the past ten years, researchers in computational linguistics and
information retrieval have made a number of advances in document retrieval,
categorization, entity recognition and other areas. This drew the attention
of venture capitalists who provided the money needed to commercialize this
work. There have been some successes, but the landscape is littered with
failed startups and applications that didn't live up to expectations. The
value of NLP research is not based on its commercializability. But for
those who seek to commercialize their research, good research alone is not
enough.
The purpose of this tutorial will be to examine the role of NLP research
from the perspective of commercial deployment. Specifically, it will focus on
issues and concerns that must be addressed to meet the needs of potential
customers for NLP technology, customers who are eager for text processing
and retrieval solutions, but who often are disappointed with what they find.
Tutorial Outline:
- Academic and commercial perspectives on NLP research
a. The role of academic research
b. The role of R&D in industry
c. Knowledge and technology transfer
- What industry wants from NLP technology
a. Information overload and the silver bullet
b. Costs, productivity, competitive advantage, profit
c. Why industry likes simple alternatives to NLP
- Evaluating NLP
a. Evaluation considerations for commercial deployment
b. TREC, DUC, MUC, etc.: Pros and cons
c. What to measure, what to report
- Test data
a. Standard corpora aren't enough
b. Using representative data of sufficient scale
c. Where to get the data
- Functionality
a. What does the NLP component do
b. Turning NLP functionality into product functionality
c. Understanding the end user
- Performance and scale considerations
a. How fast is fast
b. How large is large scale
c. Why throwing hardware at it isn't a solution
- Integration
a. The production environment(s)
b. Customizing the application
c. Ongoing maintenance and support
d. What 24x7 service really means
- Selling NLP to industry
a. Know your customer
b. Know your technology
c. Know your competition
d. Showing how your technology benefits your customer
e. How good salespeople go bad
- What industry wants from NLP technology - specific R&D areas
- Why industry should value and support academic NLP research,
even that with no direct commercial value (and why we
too often don't)
- Concluding remarks
Mark Wasson has been a research scientist in computational linguistics with
LexisNexis since joining the company in 1986. He has created and deployed a
number of text processing technologies in categorization, indexing, document
retrieval enhancements, entity extraction and summarization that have been
applied to hundreds of millions of documents from more than 15,000 news,
business, legal, financial and other sources. He has led research in
information extraction (currently in development) and multidocument
information aggregation (in production). He has developed and coordinated
collaborative text processing R&D with other research teams in both academia
and industry. In recent years, his job scope has also included third-party
technology identification and evaluation. He has explored relevant
technologies at more than 200 third-party academic and commercial groups.
This tutorial draws from these experiences as well as those of his
colleagues at LexisNexis.
T8: Optimization, Maxent Models, and Conditional Estimation without Magic
Christopher Manning and Dan Klein, Stanford University
Tuesday May 27 afternoon
This tutorial aims to cover the basic ideas and algorithms behind
techniques such as maximum entropy modeling, conditional estimation of
generative probabilistic models, and issues regarding the use of
models more complex than simple Naive Bayes and Hidden Markov
Models. In recent years, these sophisticated probabilistic methods
have been used with considerable success on most of the core tasks of
natural language processing, for speech language models, and for
Information Retrieval tasks such as text filtering and categorization,
but the methods and their relationships are often not well understood
by practitioners. Our focus is on insight and understanding, using
graphical illustrations rather than detailed derivations whenever
possible. The goal of the tutorial is that the inner workings of these
modeling and estimation techniques be transparent and intuitive,
rather than black boxes labeled "magic here".
The tutorial decomposes these methods into optimization problems on the
one side, and optimization methods on the other. The first hour of
the tutorial presents the basics of non-linear optimization, assuming
only knowledge of basic calculus. We begin with a discussion of
convexity and unconstrained optimization, focusing on gradient
methods. We discuss in detail both simple gradient descent and the
much more practical conjugate gradient descent. The key ideas are
presented, including a comparison/contrast with alternative methods.
Next, the case of constrained optimization is presented, highlighting
the method of Lagrange multipliers and presenting several ways of
translating the abstract ideas into a concrete optimization method.
The principal goal, again, is to make Lagrange methods appear
intuitively natural, rather than as mathematical sleight-of-hand.
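A minimal sketch of the simple gradient descent discussed above, minimizing a convex quadratic (the objective, step size, and names are illustrative; conjugate gradient improves on this by choosing search directions that do not undo earlier progress):

```python
def gradient_descent(grad, x0, step=0.1, iters=1000):
    """Plain gradient descent: repeatedly step against the gradient.
    For a convex objective and a small enough step size, this
    converges to the global minimum."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x

# Minimize f(x, y) = (x - 3)**2 + 2*(y + 1)**2, whose gradient is
# (2*(x - 3), 4*(y + 1)); the unique minimum is at (3, -1).
minimum = gradient_descent(lambda v: [2 * (v[0] - 3), 4 * (v[1] + 1)],
                           [0.0, 0.0])
```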
The second part of the tutorial begins with a presentation of maximum
entropy models from first principles, showing their equivalence to
exponential models (also known as log-linear models; particular
versions give logistic regression and conditional random fields). We
present many simple examples to build intuition for what
maxent models can and cannot do. Finally, we discuss how to find
parameters for maximum entropy models using the previously presented
optimization methods. We also discuss methods of smoothing, focusing on
how smoothing works differently for maxent models than for standard
relative-frequency-based distributions. By this point in the tutorial,
the audience members should have a clear understanding of how to build a
system for estimating maxent models. We conclude with a discussion of
NLP-oriented issues in modeling, including conditional estimation of
generative models, and the issues involved in choosing model structure
(such as independence, label and observation biases, and so on).
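The maxent-to-exponential equivalence discussed above can be summarized in the standard textbook form (this is the usual derivation, not necessarily the presenters' exact notation):

```latex
% Maximize conditional entropy subject to feature-expectation constraints:
%   \max_p \; H(p) = -\sum_{x,y} \tilde{p}(x)\, p(y \mid x) \log p(y \mid x)
%   \text{s.t.} \quad E_p[f_i] = E_{\tilde{p}}[f_i] \quad \text{for each feature } f_i
% Introducing one Lagrange multiplier \lambda_i per constraint and solving
% \partial L / \partial p(y \mid x) = 0 yields the exponential (log-linear) form:
p_\lambda(y \mid x) = \frac{1}{Z_\lambda(x)}
  \exp\Big( \sum_i \lambda_i f_i(x, y) \Big),
\qquad
Z_\lambda(x) = \sum_{y'} \exp\Big( \sum_i \lambda_i f_i(x, y') \Big)
```

Finding the multipliers \(\lambda_i\) is then exactly the kind of unconstrained convex optimization problem treated in the first part of the tutorial.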
The tutorial will run 3 hours, with a break in the middle.
Participants will be assumed to know basic calculus and basic probability
theory, and to have some exposure to models such as Naive Bayes and HMMs.
Chris Manning works on systems and formalisms that can intelligently
process and produce human languages. His research concentrates on
probabilistic models of language and statistical natural language
processing, information extraction, text understanding and text
mining, constraint-based theories of grammar (HPSG and LFG) and
probabilistic extensions of them, syntactic typology, computational
lexicography (involving work in XML, XSL, and information
visualization), and other topics in computational linguistics and
machine learning.
Dan Klein's research interests include the unsupervised learning of
language (structure and grammar induction), machine learning for NLP
(conditional models and estimation, the interaction of smoothing and
model estimation), designing efficient algorithms for NLP (A* methods
for exact inference, factoring weakly coupled models, and parser
design), and applications such as statistical parsing, and data
clustering.
T9: Automatic Speaker and Language Recognition
Doug Reynolds and Marc Zissman, MIT Lincoln Lab
Tuesday May 27 afternoon
The speech signal conveys several levels of information beyond the
words, such as information about the identity of the speaker and the
language being spoken. These other levels are often very useful for
augmenting the word transcripts to allow indexing and searching of audio
archives. In this tutorial we will provide an overview of
state-of-the-art techniques for extracting, modeling and evaluating
speaker and language information from the speech signal. We will provide
an overview of the area with some historical context and describe
different applications that use speaker and language recognition
technology. For both speaker and language recognition technology, we
will discuss the theory and the practice of how these systems are
designed, trained, and evaluated, from the extraction of features from
the speech signal, to the corpora used to gauge performance. A summary
of recent NIST evaluations will show expected performance levels. We
will also provide an idea of the open issues in
both areas and future research directions.
Tutorial Outline:
- Introduction/Background
- History
- Definitions
- Applications
- Speaker Recognition
- Approaches using acoustic information;
Speech features conveying speaker information
- Approaches using other levels of information;
Prosodics;
Idiolect;
Phonotactics;
Pronunciations;
- System fusion
- Computation issues
- Performance of speaker recognition techniques
- Language Recognition
- Approaches using phonotactics;
Using phone strings and statistical language models
- Approaches using acoustic information;
Speech features conveying language information
- System fusion
- Computation issues
- Performance of language recognition techniques
- Conclusions and future directions
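Acoustic approaches of the kind listed above typically score feature vectors against a Gaussian mixture model. A stdlib-only sketch of diagonal-covariance GMM scoring follows; the function name and parameters are illustrative, not from the Lincoln system.

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance
    GMM: log sum_k w_k * prod_d N(x_d; mu_kd, var_kd)."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vd)
                             + (xd - md) ** 2 / vd)
        comp_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(l - m) for l in comp_logs))
```

In a verification setting, a score like this for the claimed speaker's model is compared against the score under a universal background model, and frames are accepted or rejected by thresholding the ratio.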
Douglas Reynolds is a Senior Member of Technical Staff in the
Information and Systems Technology Group at MIT Lincoln Laboratory. He
has worked in the area of automatic speaker recognition since 1992 both
for government and commercial applications. His thesis work introduced
the use of Gaussian Mixture Models for text-independent speaker recognition
and he has since invented and developed several widely used techniques
in the area of speaker recognition, such as robust modeling with GMMs,
application of a universal background model to text-independent
recognition tasks, the use of Bayesian adaptation to train and update
speaker models, fast scoring techniques for GMM based systems, the
development and use of a handset/channel-type detector, and several
normalization techniques based on the handset/channel-type detector.
These and other ideas have been implemented in the Lincoln speaker
recognition system which has won several annual international speaker
recognition evaluations conducted by the National Institute of Standards
and Technology (NIST). He was the team leader for the 2002 JHU Summer
Workshop SuperSID project which focused on applying high-level
information to speaker recognition tasks
(http://www.clsp.jhu.edu/ws2002/groups/supersid/). He has also worked in
the area of language recognition, helping to apply GMM-based systems to
the 2003 LID evaluation.
Marc Zissman is the Associate Group Leader of the Information Systems
Technology Group at MIT Lincoln Laboratory. Marc's research has focused
on digital speech processing, including parallel computing for speech
coding and recognition, co-channel talker interference suppression,
language and dialect identification, and cochlear-implant processing for
the profoundly deaf. He has developed several different approaches
to language ID that have exhibited state-of-the-art performance in
a sequence of annual government-sponsored evaluations. The
Parallel-Phone-Recognition-Language-Model (PPRLM) system for LID has
become one of the standard approaches for automatic language recognition.