IJCNLP 2013

Tutorials

T1) Open-domain Conversations with Humanoid Robots (9:30〜12:30)
T3) Text Simplification with a Purpose (13:30〜16:30)
T4) Dependency Parsing: Past, Present and Future (13:30〜16:30)

"T2) String data structure for NLP: theory and practice" was canceled due to health reason of the lecturer.

Tutorials

Open-domain Conversations with Humanoid Robots
Dr. Graham Wilcock and Kristina Jokinen
The tutorial deals with the possibilities and challenges of making interaction with a humanoid robot (a) more interesting, by exploiting Wikipedia articles as a source of open-domain world knowledge, and (b) more natural, by integrating this with multimodal dialogue technologies for situated agents.

Natural language is used to exchange information, and the transfer of information is often taken as the main criterion for the success of an interaction. However, one of the challenges for interactive systems lies in the social aspects of interaction: how to engage the partner in the interaction and keep their interest up, and how to show rapport so as to create a mutual bond and an understanding relationship.

In communication that is not directly oriented towards tasks such as making a reservation, the success of the interaction rests not on completion of a task but rather on the partner's enjoyment and on satisfying their quest for new information. One important aspect is that the topics the speakers converse about may be more or less interesting to the partners.

The focus is on interaction strategies that can follow the human's changing interests and keep the flow of information going. Two key concepts are Topic and New Information. The tutorial will survey mechanisms for topic management in conversations, and discuss how speakers react to the presentation of new information and construct a shared context for exchanging information about interesting topics. Using Wikipedia as a source of world knowledge allows the conversation to move to any topic the human is interested in. A great advantage of Wikipedia is that it is structured into clear topics, and the hyperlinks in the texts can be used to manage topic-switching moves from the current topic to a new topic.
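
As a rough illustration of this idea (not the tutorial's actual dialogue system), the short Python sketch below pulls an article's introduction and its outgoing links from the standard MediaWiki web API; the links serve as candidate targets for a topic-shifting move. The article title and the way candidates are presented are purely illustrative.

    # Minimal sketch: Wikipedia links as candidate topic shifts (illustrative only).
    import requests

    API = "https://en.wikipedia.org/w/api.php"

    def article_intro(title):
        """Fetch the plain-text introduction of an article (a source of New Information)."""
        params = {"action": "query", "format": "json", "prop": "extracts",
                  "exintro": 1, "explaintext": 1, "titles": title}
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return next(iter(pages.values())).get("extract", "")

    def linked_topics(title, limit=20):
        """Return titles hyperlinked from the article: candidate next topics."""
        params = {"action": "query", "format": "json", "prop": "links",
                  "pllimit": limit, "titles": title}
        pages = requests.get(API, params=params).json()["query"]["pages"]
        return [link["title"] for link in next(iter(pages.values())).get("links", [])]

    topic = "Humanoid robot"                   # current conversation topic (example)
    print(article_intro(topic)[:300])          # present some new information
    print("Possible next topics:", linked_topics(topic))

In a deployed system the next topic would of course be chosen by monitoring the partner's interest (speech, gaze, gestures) rather than by printing a flat list.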

The tutorial links text processing and speech processing with new areas. It is meant for students and researchers who are interested in combining NLP (processing Wikipedia texts) with speech technology (recognition and synthesis), and who are willing to combine these technologies with robotics (face recognition, head nodding, hand-and-arm gestures). Interaction with robots provides new applications and challenges for human language technologies, and pushes the field forward.

Tutorial Outline

Section 1: Introduction to situated natural language interaction (Jokinen)
- Review of the state-of-the-art technology and discourse theories
- Basic concepts and techniques (topic, new information, discourse relations, coherence, conversation)

Section 2: Towards open-domain conversations (Wilcock)
- Internet-based resources: Wikipedia
- Topic and new information in Wikipedia, topic chains, topic shifts

Section 3: Practical issues in human-robot interaction (Wilcock)
- Topic and new information in speech recognition
- Detecting if the human is interested in the topic (or not)

Section 4: Future challenges (Jokinen)
- Future views of combining language technology and robot communication
- Questions and discussion

Text Simplification with a Purpose
Dr. Horacio Saggion
Automatic text simplification as an NLP task arose from the need to make electronic textual content equally accessible to everyone. It is a complex task which encompasses a number of operations applied to a text at different linguistic levels. The aim is to turn a complex text into a simplified variant, taking into consideration the specific needs of a particular target user. Automatic text simplification has traditionally had a double purpose: it can serve as a preprocessing tool for other NLP applications, and it can serve a social function, making content accessible to different users such as foreign language learners, readers with aphasia, low-literacy individuals, etc.

The first attempts at text simplification were rule-based syntactic simplification systems; nowadays, however, with the availability of large parallel corpora such as the original English Wikipedia and Simple English Wikipedia, approaches to automatic text simplification have become more data-driven. Text simplification is a very active research topic where progress is still needed. This tutorial will give the audience a panorama of more than a decade of work in the area, also emphasizing the important social contribution that content simplification can make to the information society. The tutorial will be combined with a series of exercises and demonstrations of existing technologies.
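
To make the lexical side of the task concrete, the toy Python sketch below illustrates frequency-based lexical simplification: a rare word is replaced by a more frequent WordNet synonym. This is a deliberately naive illustration under simple assumptions, not a description of any of the systems covered in the tutorial; in particular it ignores word sense disambiguation and contextual fit, which the outline addresses separately.

    # Toy frequency-based lexical simplification (illustrative only).
    # Requires: nltk, plus nltk.download("wordnet") and nltk.download("brown").
    from nltk import FreqDist
    from nltk.corpus import brown, wordnet

    freq = FreqDist(w.lower() for w in brown.words())   # crude word-frequency model

    def simplify_word(word, min_freq=50):
        """Return a more frequent synonym of `word`, or `word` itself."""
        if freq[word.lower()] >= min_freq:               # already common enough
            return word
        candidates = {lemma.name().replace("_", " ")
                      for synset in wordnet.synsets(word)
                      for lemma in synset.lemmas()}
        candidates.discard(word)
        return max(candidates, key=lambda w: freq[w.lower()], default=word)

    print(simplify_word("commence"))   # e.g. "begin" or "start", frequency permitting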

Tutorial Outline

1 Introduction
1.1 What is text simplification?
1.2 Challenges and Opportunities

2 Problems in text simplification
2.1 Syntactic Simplification
2.2 Lexical Simplification

3 Measuring readability/simplicity/complexity

4 Rule-based Simplification Systems
4.1 Manual rules
4.2 Learning Rules

5 Data-driven approaches: Machine Translation, Corpus-based approaches
5.1 Learning simplification operations from aligned sentences
5.2 Pure machine translation approaches

6 Lexical Simplification
6.1 Lexical simplification task
6.2 Word Sense Disambiguation

7 Systems, Tools, and Projects
7.1 FIRST Project, Simplext Project, PorSimples, FACILITA, etc.
7.2 LexSiS, DysWebxia, etc.

8 Evaluation
8.1 Extrinsic Evaluation (user evaluation, reading comprehension)
8.2 Intrinsic Evaluation (readability, simplicity, etc.)

9 Closing

Dependency Parsing: Past, Present, and Future
Dr. Zhenghua Li, Dr. Wenliang Chen and Dr. Min Zhang
Dependency parsing has gained more and more interest in natural language processing in recent years due to its simplicity and general applicability to diverse languages. The Conference on Computational Natural Language Learning (CoNLL) organized shared tasks on multilingual dependency parsing successively from 2006 to 2009, which led to extensive progress on dependency parsing from both theoretical and practical perspectives. Meanwhile, dependency parsing has been successfully applied to machine translation, question answering, text mining, etc.

To date, research on dependency parsing has mainly focused on data-driven supervised approaches, and results show that supervised models can achieve satisfactory performance on in-domain texts for a variety of languages when large-scale manually labeled data is provided. In contrast, relatively little effort has been devoted to parsing out-of-domain texts and resource-poor languages, and few successful techniques have been proposed for such scenarios. This tutorial will cover the past, present, and future of dependency parsing and is composed of four major parts. In particular, we will survey recent progress in semi-supervised dependency parsing techniques and discuss some directions for future work.

In the first part, we will introduce the fundamentals of dependency parsing and supervised approaches to it. The fundamentals include examples of dependency trees, annotated treebanks, evaluation metrics, and comparisons with other syntactic formalisms such as constituency parsing. We then introduce a few mainstream supervised approaches, i.e., transition-based, graph-based, easy-first, and constituent-based dependency parsing. These approaches solve dependency parsing from different perspectives, but achieve comparable, state-of-the-art performance for a wide range of languages. We then move on to hybrid models that combine the advantages of the above approaches. We will also introduce recent work on efficient parsing techniques, joint lexical analysis and dependency parsing, multiple-treebank exploitation, etc.
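
As a minimal illustration of the fundamentals (the parsing algorithms themselves are beyond a short snippet), the Python sketch below encodes a dependency tree as one (head, label) pair per word, with 0 denoting the artificial root, and computes the standard unlabeled and labeled attachment scores (UAS/LAS); the example sentence and its analyses are invented for illustration.

    # A dependency tree as (head index, dependency label) per word; 0 = root.
    def attachment_scores(gold, pred):
        """UAS: fraction of words with the correct head.
           LAS: fraction with the correct head AND the correct label."""
        assert len(gold) == len(pred)
        uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
        las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
        return uas, las

    # "Economic news had little effect" -- invented gold and predicted analyses
    gold = [(2, "amod"), (3, "nsubj"), (0, "root"), (5, "amod"), (3, "dobj")]
    pred = [(2, "amod"), (3, "nsubj"), (0, "root"), (5, "amod"), (3, "iobj")]

    print(attachment_scores(gold, pred))   # (1.0, 0.8)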

In the second part, we will survey work on semi-supervised dependency parsing techniques for in-domain evaluations. Such work aims to exploit unlabeled data so that the parser can achieve higher performance on in-domain texts. The tutorial will present several successful techniques that utilize information at different levels: the whole-tree level, the partial-tree level, and the word level. We will discuss the advantages and limitations of the existing techniques.
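
As a small illustration of the word-level idea, the sketch below augments the features of a candidate head-modifier arc with word-cluster identifiers induced from unlabeled text (in the spirit of Brown-cluster features); the cluster strings and feature templates are invented for illustration and do not reproduce any particular published system.

    # Word-level semi-supervision: add cluster features to arc features (illustrative).
    # `cluster` maps a word to a bit-string path from a clustering of unlabeled text.
    cluster = {"had": "0110", "effect": "01111010", "impact": "01111011"}

    def arc_features(words, pos, h, m):
        """Features for a candidate arc from head h to modifier m."""
        c_h = cluster.get(words[h], "UNK")
        c_m = cluster.get(words[m], "UNK")
        return [
            f"w_h={words[h]}|w_m={words[m]}",   # lexical features: sparse on new text
            f"p_h={pos[h]}|p_m={pos[m]}",       # part-of-speech features
            f"c_h={c_h}|c_m={c_m}",             # full cluster identifiers
            f"c4_h={c_h[:4]}|c4_m={c_m[:4]}",   # short prefixes: coarser, generalize better
        ]

    # "impact" may be unseen in the treebank, yet shares a cluster prefix with "effect".
    print(arc_features(["had", "impact"], ["VBD", "NN"], 0, 1))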

In the third part, we will survey work on semi-supervised dependency parsing techniques for out-of-domain texts and resource-poor languages. To promote research on out-of-domain parsing, researchers have organized two shared tasks, i.e., the CoNLL 2007 shared task and the shared task on syntactic analysis of non-canonical language (SANCL 2012). Both shared tasks attracted many participants, who tried different techniques to adapt parsers trained on WSJ texts to out-of-domain texts with the help of large-scale unlabeled data. In a related line of work, researchers have recently tried to improve parsing performance for resource-poor languages by exploiting multilingually aligned data, achieving higher accuracy than purely unsupervised methods.

In the fourth part, we will conclude our talk by discussing some directions for future work.

Tutorial outline

Part A: Dependency parsing and supervised approaches
-A.1 Introduction to dependency parsing
-A.2 Supervised methods
-A.3 Non-projective dependency parsing
-A.4 Probabilistic and generative models for dependency parsing
-A.5 Other recent work

Part B: Semi-supervised dependency parsing for in-domain text
-B.1 Whole tree level
-B.2 Partial tree level
-B.3 Word level
-B.4 Bilingual text parsing

Part C: Semi-supervised methods for out-of-domain text and resource-poor languages
-C.1 CoNLL 2007 shared task (domain adaptation subtask)
-C.2 SANCL 2012 (parsing the web)
-C.3 Multilingual transfer learning for resource-poor languages
-C.4 Other work based on other grammar formalisms

Part D: Conclusion and open problems
